Why the future is cloud autonomics


Many great innovations have come out of cloud computing, such as on-demand infrastructure, consumption-based pricing and access to global computing resources. However, these powerful innovations have come at a high cost: complexity.

Managing cloud infrastructure today is substantially more complex than managing traditional data center infrastructure. While some of the complexity is a direct consequence of operating in a highly transient and shared computing environment, most has to do with the unintended side effects of cloud computing. For example, consumption-based pricing allows us to pay for only what we use, but requires careful monitoring and continuous optimization to avoid resource waste and poor cost management.

API-driven resources allow us to launch compute and storage with a few lines of code, but require that those resources be highly configurable to support the varied needs of their different users. Purpose-built services (e.g. Amazon S3) substantially reduce the barrier to building new types of applications, but require that we obtain the expertise needed to manage and operate each new service.

The rise of cloud management

These early experiences with the growing complexity of managing cloud environments spawned a new generation of open source and commercial products intended to help us contain this complexity. However, the continued pace of innovation in public, private and hybrid clouds, combined with the increasing importance of multi-cloud environments, has continued to widen the gap between the complexity of our infrastructure and our ability to manage that infrastructure. It has become increasingly clear that something needs to change to support the cloud environments of tomorrow.

The genesis of autonomic computing

In 2001, IBM released a manifesto predicting a looming software complexity crisis caused by our inability to manage the rapid growth in computation, communication and information technology. The solution proposed was autonomic computing, a term which took its name from the autonomic nervous system, essentially an automatic control system for the human body. In its manifesto, IBM defined autonomic computing as self-managing systems that can configure, optimize, heal and protect themselves without human intervention.

While the paper launched a field of research that remains active today, its impact has not been widely felt in the industry, in large part because the gap IBM forecasted did not become a substantial impediment to businesses until the early mainstream adoption of cloud computing, almost a decade later. But now, with the gap between the complexity of infrastructure and the ability of our software to manage that complexity continuing to widen, the IBM manifesto seems suddenly prescient.

The impact of cloud autonomics

Cloud autonomics is the use of autonomic computing to enable organizations to more effectively harness the power of cloud computing by automating management through business policies. Instead of relying on manual processes to optimize cost, usage, security and performance of cloud infrastructure, a CIO/CTO can define business policies describing how they want their infrastructure to be managed, and allow an autonomic system to execute those policies.

Cloud autonomics envisions a future in which businesses manage their clouds like brokerages manage equity trading: with policy-aware automation. A business will configure an autonomic system with governance rules. The system will then continuously monitor the business’s cloud environment, and when the environment goes out of compliance, will make the necessary changes to bring it back in line. Some sample policies cloud autonomics anticipates include:

  • The automated purchase of reserved capacity in support of organizational or functional needs (e.g. AWS reservations).
  • The automated movement of an idempotent workload from one cloud provider to another to obtain cost efficiencies.
  • The automated migration of data to another region in support of business service level agreements (SLAs).
  • The automated migration and/or backup of storage from one medium to another (e.g. migrating data in AWS EBS to S3 or Glacier).
  • The automated upsizing of the machine type for workloads that do not scale horizontally, to support more efficient operation.
  • The automated change of network and/or endpoint security to conform to business policies.
  • The automated shutdown of idle or long-running instances in support of business policies (e.g. shutdown idle development infrastructure running more than a week).
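To make the last policy concrete, here is a minimal sketch of how an idle-shutdown rule might be expressed and evaluated in code. The `Instance` shape, the 7-day window and the 5% CPU threshold are illustrative assumptions, not part of any particular product:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class Instance:
    instance_id: str
    environment: str        # e.g. "development", "production"
    launched_at: datetime
    cpu_utilization: float  # average over the evaluation window, in percent

# Hypothetical policy: shut down idle development instances
# that have been running for more than a week.
MAX_AGE = timedelta(days=7)
IDLE_CPU_THRESHOLD = 5.0  # percent

def instances_to_shut_down(instances, now=None):
    """Return the instances that violate the idle-development policy."""
    now = now or datetime.now(timezone.utc)
    return [
        i for i in instances
        if i.environment == "development"
        and now - i.launched_at > MAX_AGE
        and i.cpu_utilization < IDLE_CPU_THRESHOLD
    ]
```

In a real autonomic system the thresholds would come from the user-defined business policy rather than constants, and the result would feed a remediation step rather than being returned to a caller.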

Making cloud autonomics work for you

Cloud autonomics requires a system capable of executing the collect, analyze, decide and act loop defined by autonomic computing. Unlike autonomic computing, a cloud autonomic system relies on policy-driven optimizations instead of artificial intelligence. The system monitors one or more cloud providers and the customer infrastructure running within them, evaluates user-defined policies and, using optimization algorithms, identifies recommended changes that bring the infrastructure back into line with those policies. These recommendations may optionally require external approval before execution, and the system can seek privileges from an external system to execute approved changes.
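One pass of that loop can be sketched in a few functions. Everything here is an assumption for illustration: the `provider` object with `describe_resources`/`apply` methods, policy objects with an `evaluate` method, and violations carrying a `remediation` action are hypothetical interfaces, not a real cloud SDK:

```python
def collect(provider):
    """Gather a snapshot of the current infrastructure (hypothetical provider API)."""
    return provider.describe_resources()

def analyze(snapshot, policies):
    """Evaluate each user-defined policy against the snapshot, collecting violations."""
    return [violation for policy in policies for violation in policy.evaluate(snapshot)]

def decide(violations, approver=None):
    """Turn violations into actions, optionally gated by an external approval step."""
    actions = [v.remediation for v in violations]
    if approver is not None:
        actions = [a for a in actions if approver(a)]
    return actions

def act(provider, actions):
    """Execute the approved changes against the cloud provider."""
    for action in actions:
        provider.apply(action)

def autonomic_pass(provider, policies, approver=None):
    """Run one collect -> analyze -> decide -> act iteration and return the actions taken."""
    snapshot = collect(provider)
    violations = analyze(snapshot, policies)
    actions = decide(violations, approver)
    act(provider, actions)
    return actions
```

A production system would run this continuously rather than once, and would add the credential brokering and audit trail the paragraph above alludes to.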

A cloud autonomic system is capable of continual and automated optimization of cloud infrastructure based on business policies, both within a single cloud environment and across multiple clouds. It promises a future in which organizations can define policies once and have them enforced securely and continuously, resulting in consistent implementation and optimal resource utilization.