High Performance Computing (HPC) is now offered by top cloud vendors. HPC can be deployed on-premises or in the cloud. As a cloud service, HPC gives organizations parallel processing at scale. In this article, you will learn what HPC is and what the benefits of cloud-based HPC are.
What Is HPC and Why Is It Moving to the Cloud?
High Performance Computing (HPC) is a category of computing systems that use parallel processing across clustered processing units to provide massive computing capacity. Organizations use HPC to solve problems with high computational requirements, or to perform large-scale analyses that cannot be performed on traditional workstations or servers.
While ordinary computing systems typically have one CPU with up to 18 cores, an HPC system may have two or more CPUs, each with 16 to 64 cores. Each core in an HPC CPU has its own high-speed memory, allowing the system to process more data at once. In addition, many HPC systems use Graphics Processing Units (GPUs), which accelerate specialized computing tasks such as image and video processing and artificial intelligence algorithms.
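To make the idea of parallel processing across cores concrete, here is a minimal Python sketch that splits one CPU-bound task into chunks and hands each chunk to a separate worker process. The prime-counting workload and the names `count_primes_in_range`, `split_range` and `parallel_count_primes` are illustrative assumptions, not part of any HPC product. Real HPC codes typically rely on cluster schedulers and message-passing libraries rather than a single-machine process pool.

```python
from concurrent.futures import ProcessPoolExecutor
from math import isqrt


def count_primes_in_range(bounds):
    """Count primes in the half-open interval [start, stop)."""
    start, stop = bounds
    count = 0
    for n in range(max(start, 2), stop):
        # n is prime if no divisor in [2, sqrt(n)] divides it evenly.
        if all(n % d for d in range(2, isqrt(n) + 1)):
            count += 1
    return count


def split_range(limit, parts):
    """Split [0, limit) into `parts` contiguous, near-equal chunks."""
    step, extra = divmod(limit, parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks each take one leftover element.
        stop = start + step + (1 if i < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks


def parallel_count_primes(limit, workers=4):
    """Count primes below `limit`, one chunk per worker process."""
    chunks = split_range(limit, workers)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # Each chunk is counted in its own process; partial counts are summed.
        return sum(pool.map(count_primes_in_range, chunks))
```

On platforms that start workers by spawning a fresh interpreter, call `parallel_count_primes` from under an `if __name__ == "__main__":` guard. Note that equal-sized chunks are not equally expensive here, because larger numbers take longer to test; balancing that kind of uneven load is one of the jobs an HPC scheduler takes on.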
HPC offers higher speed than traditional computing infrastructure, better scalability, and improved cost efficiency. HPC systems can distribute more computing jobs across pooled infrastructure, enabling near 100% utilization of resources. At the same time, HPC machines are very expensive compared to regular workstations or servers, so until recently they were out of reach for many small-to-medium organizations.
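The utilization point can be illustrated with a small scheduling sketch. It assigns jobs to a pool of identical nodes using a greedy longest-job-first heuristic (one common approach, not necessarily what any particular HPC scheduler does) and reports utilization as busy time divided by the total node time available until the last job finishes. The function names and any job durations passed in are hypothetical.

```python
import heapq


def schedule_jobs(durations, nodes):
    """Greedy longest-job-first assignment; returns per-node busy time."""
    # Min-heap of (busy_time, node_index): the least-loaded node pops first.
    heap = [(0.0, i) for i in range(nodes)]
    heapq.heapify(heap)
    for d in sorted(durations, reverse=True):
        busy, idx = heapq.heappop(heap)
        heapq.heappush(heap, (busy + d, idx))
    loads = [0.0] * nodes
    for busy, idx in heap:
        loads[idx] = busy
    return loads


def utilization(loads):
    """Fraction of node time spent busy until the last node finishes."""
    makespan = max(loads)
    if makespan == 0:
        return 0.0
    return sum(loads) / (makespan * len(loads))
```

For example, `utilization(schedule_jobs([8, 7, 6, 5, 4, 3, 2, 1], nodes=3))` packs eight jobs of varying length onto three nodes. The more jobs there are relative to nodes, and the smaller they are, the closer a pooled system can get to full utilization.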
Today, HPC is moving to the cloud. Organizations are already accustomed to provisioning servers, databases and other computing services on the public cloud. They can now also use high performance components and complete HPC systems as a service. This is transforming both the HPC field and the possibilities offered by mainstream cloud computing.
The big three cloud providers (Amazon Web Services, Microsoft Azure and Google Cloud) all support HPC and help customers orchestrate and manage workloads, which simplifies HPC operations. This means virtually anyone can spin up an HPC machine and run heavy computations with low complexity and no upfront investment.
Public Cloud and HPC Resources are Converging
Public cloud providers offer full-fledged HPC hardware on demand, such as Cray supercomputers on Azure, but they also provide lighter-weight computing options with HPC-like capabilities. A few examples:
- GPU machines: Amazon, Microsoft and Google all offer GPU-based machines as part of their Infrastructure as a Service (IaaS) offerings. These machines are useful for computing problems that involve a huge number of similar operations that need to be performed in parallel. However, GPU machines come at a substantial premium compared to regular cloud instance sizes.
- High performance machines: all clouds offer large instance sizes with a large amount of memory, strong traditional CPUs with a large number of cores, and other high performance hardware such as SSD drives. These machines won’t be enough for all HPC workloads, but can certainly handle many demanding computing scenarios. Due to the elastic scalability of the cloud, it is easy to split jobs across multiple high-performance machines.
- Field-programmable gate arrays (FPGAs): public clouds are starting to offer FPGA hardware; for example, Amazon provides the F1 instance category. FPGAs can be individually adapted to each computation job for higher efficiency, and can run in parallel with the virtual machine resources, providing an additional performance boost without putting additional load on the traditional CPU.
Benefits of HPC in the Cloud
There are several reasons HPC on the cloud is increasingly attractive for organizations of all sizes.
HPC as a service
Cloud HPC allows organizations to use HPC hardware without any upfront investment, and lets them run special HPC projects for a limited period of time without needing to purchase equipment. Due to the high cost of specialized HPC hardware, pay-as-you-go is especially compelling for high performance use cases.
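The pay-as-you-go trade-off comes down to simple arithmetic. The sketch below compares renting against buying for a given number of compute hours. Every figure in it (purchase price, yearly running cost, hourly rate) is a made-up placeholder chosen for illustration, not a real vendor price.

```python
def cloud_cost(hours, hourly_rate):
    """Total pay-as-you-go cost for a number of instance-hours."""
    return hours * hourly_rate


def on_prem_cost(purchase_price, yearly_running_cost, years):
    """Upfront purchase plus running costs; independent of hours used."""
    return purchase_price + yearly_running_cost * years


def break_even_hours(purchase_price, yearly_running_cost, years, hourly_rate):
    """Hours of use over `years` at which both options cost the same."""
    return on_prem_cost(purchase_price, yearly_running_cost, years) / hourly_rate


# Hypothetical placeholder figures, for illustration only.
PURCHASE = 200_000   # one-time hardware cost
RUNNING = 20_000     # power, cooling and staff per year
RATE = 25.0          # cloud price per instance-hour
YEARS = 3

threshold = break_even_hours(PURCHASE, RUNNING, YEARS, RATE)
print(f"Break-even at {threshold:,.0f} hours over {YEARS} years")
```

Under these assumed numbers, the cloud option is cheaper for any project that needs fewer than 10,400 instance-hours over three years, which is roughly 40% of the 26,280 hours available in that period. A short, one-off HPC project falls well below that threshold.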
Sharing HPC data with other business processes
Unlike traditional HPC systems, which were disconnected from other parts of the business, cloud HPC systems can share data much more easily across multiple cloud services. HPC systems typically read data from mainstream cloud storage services like Amazon S3 or Azure Blob Storage, which can also be integrated with a host of other business processes. You can have one team developing AI algorithms and another team running and testing those algorithms on HPC hardware, or give business analysts immediate access to the results of HPC calculations.
Easy extensibility and traceability
HPC systems operate on data, often sensitive customer data, and need to play well with compliance standards, security policies and IT operational processes. Cloud-based HPC is much easier to connect to standard organizational processes. From an IT perspective, HPC systems in the cloud look like any other cloud service, so they are easier to visualize and standardize, and it is easier to establish the required control structures around them.
Lower reliance on specialized skills
While traditional HPC systems often required specialized engineering skills, cloud-based solutions use a familiar operational and programming model. Developers, data scientists and IT operations can work with cloud-based HPC similarly to other cloud services. Existing software, analytics tools and algorithms can be applied to HPC data and processed by HPC hardware much more seamlessly. This can reduce the cost of HPC projects and reduce the time to market for HPC-driven data analysis.
HPC in the cloud is enabling new applications, closer integration with developer tools, access to massively scalable big data analytics services, and easier ways to manage and orchestrate large workloads. You can use HPC as a cloud service and grow at scale. In the cloud, HPC data can be shared with other collaborators and interested parties. You can also easily get support and management from your cloud vendor or third parties.
Gilad David Maayan is a technology writer who has worked with over 150 technology companies including SAP, Samsung NEXT, NetApp and Imperva, producing technical and thought leadership content that elucidates technical solutions for developers and IT leadership. LinkedIn: https://www.linkedin.com/in/giladdavidmaayan/