High performance computing (HPC) is the practice of aggregating computing resources to gain performance greater than that of a single workstation, server, or computer. HPC can take the form of custom-built supercomputers or groups of individual computers called clusters.
HPC can be run on-premises, in the cloud, or as a hybrid of both.
Each computer in a cluster is often called a node, with each node responsible for a different task. Controller nodes run essential services and coordinate work between nodes, interactive nodes or login nodes act as the hosts that users log in to, either by graphical user interface or command line, and compute or worker nodes execute the computations. Algorithms and software are run in parallel on each node of the cluster to help perform its given task.
WHAT IS A SUPERCOMPUTER?
A supercomputer is an extremely high-performance computer. Its processing speed exceeds that of an ordinary computer by billions (sometimes trillions).
TYPES OF HPC CLUSTERS
HPC usually has three main components: Compute, Network and Storage.
In basic terms, the nodes (compute) of the HPC system are connected to other nodes to run algorithms and software simultaneously, and are then connected (network) with data servers (storage) to capture the output. As HPC projects tend to be large and complex, the nodes of the system usually have to exchange the results of their computation with each other, which means they need fast disks, high-speed memory, and low-latency, high-bandwidth networking between the nodes and storage systems.
HPC can typically be broken down into two general design types: cluster computing and distributed computing.
Cluster computing
Parallel computing is done with a collection of computers (clusters) working together, such as a connected group of servers placed closely to one another both physically and in network topology, to minimize the latency between nodes.
Distributed computing
The distributed computing model connects the computing power of multiple computers in a network that is either in a single location (often on-premises) or distributed across several locations, which may include on-premises hardware and cloud resources.
HPC clusters can also be distinguished between homogeneous vs heterogeneous hardware models.
In homogenous clusters, all machines have similar performance and configuration, and are often treated as the same and interchangeable. In heterogeneous clusters, there is a collection of hardware with different characteristics (high CPU core-count, GPU-accelerated, and more), and the system is best utilized when nodes are assigned tasks to best leverage their distinct advantages.
HPC IN THE CLOUD
HPC can be performed on-premises with dedicated equipment, in the cloud, or a hybrid of each.
HPC in the cloud offers the benefit of flexibility and scalability without having to purchase and maintain expensive dedicated supercomputers. HPC in the cloud provides all the necessary infrastructure needed to perform large, complex tasks such as data storage, networking solutions, specialized compute resources, security, and artificial intelligence applications. Workloads can be performed on demand, which means that organizations can save money on equipment and time on computing cycles, only using the resources they need, when they need them.
WHY IS HPC IMPORTANT?
The modern ecosystem is flooded with data and compute-intensive tools to analyze it. HPC allows businesses and organizations to process all of that data in a timely manner, fueling new insights, innovations, and scientific discoveries. This allows companies to forecast business scenarios, predict market fluctuations, and make recommendations. The medical field, meanwhile, is being transformed with easy access to HPC in the cloud, helping to model potential outbreaks, decode the genome of cancerous cells, and understand how diseases evolve.
In short, HPC is accelerating the rate at which scientific, technological, and business progress is made, helping humanity craft a more prosperous future.
