There is tremendous interest in hybrid clouds among enterprises today. Hybrid clouds afford the opportunity to combine the scalability, reliability and efficiencies of a public cloud with the security and control of a private cloud. Enterprises can partition workloads so that mission-critical, integrated applications reside on secure, high-performance private clouds, while stand-alone apps with low security requirements are hosted on public clouds. Hybrid clouds also happen to be today’s biggest driver of high-performance private clouds, the data center nirvana of speed, security and compute power. The appeal is real, and so are the countless pilot programs building apps that leverage these capabilities on open source cloud platforms such as OpenStack. So, what is the holdup? There is no single answer. Many factors contribute to the chasm between pilot and production. Here we take a look at five of them, as well as strategies for bridging the gap with Web-scale infrastructure management.
Why the Chasm?
First, let’s talk legacy. Projects requiring high-performance computing (HPC) were initially executed on large legacy systems deployed in traditional server farms and data centers. Legacy systems can process very large sets of data, but they fall short on speed and agility compared to today’s cluster environments. Clusters aggregate and share computing power and bandwidth among many less expensive, commodity servers. High performance computing in the cloud utilizes cluster architectures to deliver high bandwidth and very high compute capabilities over a low-cost infrastructure. The transition from legacy systems to commodity servers may represent the largest deterrent impeding high-performance private clouds. There can be a steep learning curve and lengthy installation process that often leads to an IT sinkhole.
Second, cluster architectures are inherently complex and have unique requirements. Servers in a traditional server farm accept external requests, process those requests, and respond to the requester. Each server completes its job independently, without any notion of the hundreds or thousands of other servers completing requests in neighboring racks. As new servers are added to the data center, they are installed and configured, then brought online without interrupting existing servers. Servers in a cluster, by contrast, behave collaboratively. Each server must be aware of every other server in the cluster so that the servers can collectively accept external requests, process them, and respond as a team. At an absolute minimum, each server in a cluster must have complete awareness of the other servers and have the exact same software configuration. New servers added to the cluster must also meet each of these requirements and, likewise, existing servers must become aware of new servers. Many enterprises find managing clusters of 10 to 20 servers to be fairly straightforward and similar to managing server farms. However, as they scale to hundreds or even thousands of servers, the complexity grows far faster than the server count, because every added server must be known to, and kept consistent with, every existing one.
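To make the scaling problem concrete, here is a minimal Python sketch of the two minimum requirements described above: complete awareness and identical software configuration. The data layout (a `config` label and a `peers` set per node), the function names and the choice of the first node as the configuration reference are all assumptions made for illustration, not part of any particular cluster manager.

```python
def pairwise_links(n: int) -> int:
    """Count server-to-server relationships when every server knows every other."""
    return n * (n - 1) // 2


def cluster_problems(nodes: dict[str, dict]) -> list[str]:
    """Return human-readable violations of the two minimum requirements."""
    problems = []
    names = set(nodes)
    # Assumption of this sketch: the first node's configuration is the reference.
    reference = next(iter(nodes.values()))["config"] if nodes else None
    for name, info in sorted(nodes.items()):
        # Complete awareness: every other node must appear in this node's peers.
        missing = names - {name} - set(info["peers"])
        if missing:
            problems.append(f"{name} is unaware of {sorted(missing)}")
        # Uniform software: the configuration label must match the reference.
        if info["config"] != reference:
            problems.append(f"{name} has config {info['config']}, expected {reference}")
    return problems


# Hypothetical three-node cluster with one awareness gap and one config mismatch.
cluster = {
    "node01": {"config": "v1", "peers": {"node02", "node03"}},
    "node02": {"config": "v1", "peers": {"node01"}},
    "node03": {"config": "v2", "peers": {"node01", "node02"}},
}
problems = cluster_problems(cluster)
```

With full awareness, 20 servers imply 190 pairwise relationships and 1,000 servers imply 499,500, which is one way to see why an approach that works for a small cluster stops working at scale.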
Third, infrastructure management of server-based systems is also extremely complex. This is nothing new. There are typically seven complete systems that must work in concert to manage all the layers of the infrastructure. Too often the infrastructure management platform is a disjointed toolchain, pieced together over time and as needed. Enterprises begin with something, perhaps the bare metal script, then build upon that, one application or service at a time. They may leverage the management features of some applications, develop a homegrown patch between others, or work with solution providers to develop a proprietary framework to hold it all together. Many vendors develop the technologies to manage their own product offerings, yet they typically don’t extend up or down the stack. To be fair, why should they? It’s not within their core competency. The resulting reactive style of infrastructure management can be effective in meeting short-term needs. However, it is typically not sustainable or scalable to the extent needed for today’s quickly evolving, data-intensive ecosystem.
Fourth, real-world infrastructures are built upon a highly diversified physical layer, while most pilot environments are homogeneous. In a pilot, every piece of hardware is known: every connection, every application, every component and system is documented and well-conceived. Once the management platform from the pilot environment is moved to a production scenario, things are less certain and admins don’t know what to expect. It’s difficult to plan for unforeseen variables and challenges.
Fifth, open source cloud environments are not conducive to migratory evolution. OpenStack is a revolutionary open source cloud platform that provides the mechanism for emerging applications to leverage cloud infrastructures like never before. With this opportunity and promise comes tremendous innovation. As the OpenStack Foundation continues to deliver improvements and exciting new capabilities, enterprises eagerly await upgrades but often opt to rebuild in order to get the most out of new releases without compromising performance, reliability, security or bandwidth on existing systems. In many cases, it simply makes more sense to rebuild than to migrate. This can be a daunting, time-consuming and costly prospect, especially if the updates in a particular release are not deemed mission-critical at the time.
Bridging the Gap with Infrastructure Management
Given all of these factors, how can enterprises bridge the gap to get from pilot to production? Put simply, it’s about infrastructure management. By implementing an agile, complete and easy infrastructure management platform from the ground up, enterprises can build, manage and run high performance private clouds at scale.
Building the physical layer is often viewed as the most challenging phase of a deployment. This is where the lion’s share of heavy lifting takes place. Installation is the foundation for the system, yet tools for installation are characteristically immature.
What features ensure an agile and complete installation? Automation tools can simplify and speed the installation process by drastically reducing the time spent on routine maintenance tasks and overcoming the complexity of a heterogeneous, cluster environment. Automation makes it easier to install with complete server awareness, uniform bits and identical software stacks.
Scripting tools and bare metal install tools are commonplace among enterprises, but they require manual processes that are time-intensive and error-prone. Parallel discovery and installation is another feature, critical in helping ensure complete server awareness and system reliability as well as expediting new rollouts and upgrades across the complex cluster landscape. Programmability, such as through an XML framework, provides accessibility for IT teams.
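As a rough illustration of how a programmable cluster definition and parallel installation fit together, the Python sketch below reads a small XML description and processes every node concurrently. The XML element and attribute names, the host names and the `install_node` function are hypothetical; `install_node` is only a stand-in that reports what a real bare metal installer would do.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor

# Hypothetical cluster definition; the schema is invented for this sketch.
CLUSTER_XML = """
<cluster name="pilot">
  <node host="node01" role="controller"/>
  <node host="node02" role="compute"/>
  <node host="node03" role="compute"/>
</cluster>
"""


def load_nodes(xml_text: str) -> list[dict]:
    """Parse the cluster definition into one attribute dict per node."""
    root = ET.fromstring(xml_text.strip())
    return [dict(node.attrib) for node in root.findall("node")]


def install_node(node: dict) -> tuple[str, str]:
    """Stand-in for a real installer: returns the host and a status string."""
    return node["host"], f"installed as {node['role']}"


def install_all(nodes: list[dict], workers: int = 8) -> dict[str, str]:
    """Run the install step for all nodes in parallel and collect the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(install_node, nodes))


results = install_all(load_nodes(CLUSTER_XML))
```

Because the node list comes from a declarative file rather than from hand-edited scripts, adding a server becomes a one-line change, and the same definition can drive both discovery and installation.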
Ongoing maintenance and monitoring is another important facet of the installation process that is typically overlooked or tacked on as an afterthought. Maintenance and monitoring of the full physical layer can help management systems identify the source more easily when problems occur. Pre-installation can be another important part of the process but is not yet standard procedure. Pre-installation capabilities can provide “good state” assurance before installation, saving critical time and avoiding false starts.
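A pre-installation “good state” check can be as simple as validating a few facts about each server before any installation work starts. The Python sketch below assumes that an earlier discovery step has already collected reachability, disk size and firmware version; the field names, the thresholds and the approved firmware list are illustrative assumptions only.

```python
from dataclasses import dataclass


@dataclass
class NodeFacts:
    """Facts about one server, assumed to come from a prior discovery step."""
    host: str
    reachable: bool
    disk_gb: int
    firmware: str


def preflight(node: NodeFacts, min_disk_gb: int = 100,
              allowed_firmware: frozenset = frozenset({"2.1", "2.2"})) -> list[str]:
    """Return the reasons a node is not in a good state (empty list means ready)."""
    failures = []
    if not node.reachable:
        failures.append("unreachable on management network")
    if node.disk_gb < min_disk_gb:
        failures.append(f"disk {node.disk_gb} GB below minimum {min_disk_gb} GB")
    if node.firmware not in allowed_firmware:
        failures.append(f"firmware {node.firmware} not in approved list")
    return failures


# Hypothetical inventory: one healthy node and two that should be held back.
inventory = [
    NodeFacts("node01", True, 500, "2.2"),
    NodeFacts("node02", True, 80, "2.2"),
    NodeFacts("node03", False, 500, "1.9"),
]
report = {n.host: preflight(n) for n in inventory}
ready = [host for host, failures in report.items() if not failures]
```

Only the hosts in `ready` would proceed to installation, and the others carry an explicit list of reasons, which is what avoids false starts.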
Agile management at the installation layer, especially including pre-installation assurance and parallelized discovery, significantly speeds time to rollout. A fast installation process can give enterprises the opportunity to evaluate new versions of their open source cloud platform to determine whether and when to upgrade. Rather than spending weeks installing a new update, enterprises with an automated process can complete the installation in hours and quickly evaluate new releases. This provides greater flexibility to introduce new and emerging applications and technologies on the high-performance private cloud, as well as to update business processes and implement value-added changes. In addition, an agile installation process can allow enterprises to integrate legacy systems or transition services from legacy machines to server clusters.
The configuration layer, or what’s commonly thought of as the home of “middleware”, presents a different set of challenges and solutions. The configuration layer manages the physical layer; it provides the environment to maximize resources for applications. At this layer, physical resources are abstracted and virtualized through software, to be shared among applications. Application hosting platforms reside at the configuration layer along with cloud programming environments and tools. Quality of service, service level agreements, security, accounting, billing and other key foundation and execution services have their roots in the configuration layer.
This is also the layer that can be overrun with script sprawl. As such, there are many solutions on the market today that compartmentalize various portions of the configuration layer to facilitate integration between the physical layer and the applications. These solutions provide pre-configured shortcuts to save IT administrators hours of scripting and make it easier for enterprises to realize the benefits of the application layer.
At the configuration layer, enterprises should look for solutions that offer some level of automation or package-based integration, auto-node repairs, compliance checks, performance diagnostics and configuration for all elements of the infrastructure. Site-specific configurations, updates and patches to all servers and clusters must be synchronized by the infrastructure management platform at this layer. While the configuration layer is its own beast, infrastructure management platforms that also extend to the physical and application layers afford significant benefits in maximizing resources for the full system.
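A compliance check at this layer boils down to comparing each server's actual settings against a desired baseline and flagging any drift. The Python sketch below shows that comparison under simple assumptions: settings are flat key/value pairs, and the setting names, values and host names are made up for illustration.

```python
def config_drift(desired: dict[str, str], actual: dict[str, str]) -> dict[str, tuple]:
    """Map each differing setting to a (desired, actual) pair; None means absent."""
    keys = desired.keys() | actual.keys()
    return {
        key: (desired.get(key), actual.get(key))
        for key in sorted(keys)
        if desired.get(key) != actual.get(key)
    }


def compliance_report(desired: dict[str, str],
                      fleet: dict[str, dict[str, str]]) -> dict[str, dict]:
    """Return only the hosts that deviate from the baseline, with their drift."""
    report = {}
    for host, actual in fleet.items():
        drift = config_drift(desired, actual)
        if drift:
            report[host] = drift
    return report


# Hypothetical baseline and fleet state.
baseline = {"ntp": "pool.example.org", "kernel": "5.15", "selinux": "enforcing"}
fleet = {
    "node01": {"ntp": "pool.example.org", "kernel": "5.15", "selinux": "enforcing"},
    "node02": {"ntp": "pool.example.org", "kernel": "5.4", "selinux": "enforcing"},
    "node03": {"ntp": "pool.example.org", "kernel": "5.15"},
}
out_of_compliance = compliance_report(baseline, fleet)
```

The hosts in `out_of_compliance` are the candidates for automated node repair, since each entry records exactly which settings need to be brought back in line.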
With the quickly evolving OpenStack ecosystem bringing new applications for deep data analysis and insights, the top of the stack, the application layer, is getting a lot of attention. The most critical feature an infrastructure management platform can bring to this layer is ease. An agile and complete infrastructure management system should make it easy to adopt, deploy and manage emerging technologies and applications. Real-time system visualization, monitoring and repair, along with the direct flow of application rules and policies into the physical layer, are all key features for ensuring reliable, fast and secure application service delivery.
While the journey from pilot to production may seem long and obstacle-laden, the benefits of utilizing a software-defined methodology are many. Enterprises can build powerful high-performance private clouds to seize new opportunities and embrace hybrid cloud strategies to gain high performance, high bandwidth and security. An agile, complete and easy infrastructure management platform that addresses critical requirements from the physical layer installation through configuration and to the running applications can ease the transition and ensure a scalable, reliable and flexible cloud infrastructure.
With an emphasis on easy and fast installation, infrastructure management platforms can help companies evaluate and keep pace with ongoing advancements in open source cloud platforms such as OpenStack. Infrastructure management can bridge the gap between pilot and production to get high performance private clouds off the ground.
Edited by Dominick Sorrentino