Bursting into the Clouds – Experimenting with Cloud Bursting

Guest Post by Dotan Horovits, Senior Solutions Architect at GigaSpaces

Dotan Horovits is one of the primary architects at GigaSpaces

Dotan Horovits is one of the primary architects at GigaSpaces

Who needs Cloud Bursting?

We see many organizations examining Cloud as replacement for their existing in-house IT. But we see interest in cloud even among organizations that have no plan of replacing their traditional data center.

One prominent use case is Cloud Bursting:

Cloud bursting is an application deployment model in which an application runs in a private cloud or data center and bursts into a public cloud when the demand for computing capacity spikes. The advantage of such a hybrid cloud deployment is that an organization only pays for extra compute resources when they are needed.
[Definition from SearchCloudComputing]

Cloud Bursting appears to be a prominent use case in cloud on-boarding projects. In a recent post, Nati Shalom summarizes nicely the economical rationale for cloud bursting and discusses theoretical approaches for architecture. In this post I’d like to examine the architectural challenges more closely and explore possible designs for Cloud Bursting.

Examining Cloud Bursting Architecture

Overflowing compute to the cloud is addressed by workload migration: when we need more compute power we just spin up more VMs in the cloud (the secondary site) and install instances of the application. The challenge in workload migration is around how to build a consistent environment in the secondary site as in the primary site, so the system can overflow transparently. This is usually addressed by DevOps tools such as ChefPuppetCFEngine and Cloudify, which capture the setup and are able to bootstrap the application stack on different environments. In my example I used Cloudify to provide consistent installation between EC2 and RackSpace clouds.

The Cloud Bursting problem becomes more interesting when data is concerned. In his post Nati mentions two approaches for handling data during cloud bursting:

1. The primary site approach – Use the private cloud as the primary data site, and then point all the burst activity to that site.
2. Federated site approach – This approach is similar to the way Content Distribution Networks (CDN) work today. With this approach we maintain a replica of the data available at each site and keep their replicas in sync.

The primary site approach incurs heavy penalty in latency, as each computation needs to make the round trip to the primary site to get the data for the computation. Such architecture is not applicable to online flows.

The federated site approach uses data synchronization to bring the data to the compute, which saves the above latency and enables online flows. But if we want to support “hot” bursting to the cloud, we have to replicate the data between the sites in an ongoing streaming fashion, so that the data is available on the cloud as soon as the peak occurs and we can spin up compute instances and immediately start to redirect load. Let’s see how it’s done.

Cloud Bursting – Examining the Federated Site Approach

Let’s put up our sleeves and start experimenting hands-on with the federated site approach for Cloud Bursting architecture. As reference application let’s take Spring’s PetClinic Sample Application and run it on an Apache Tomcat web container. The application will persist its data locally to a MySQL relational database.

The primary site, representing our private data center, will run the above stack and serve the PetClinic online service. The secondary site, representing the public cloud, will only have a MySQL database, and we will replicate data between the primary and secondary sites to keep data synchronized. As soon as the load on the primary site increases beyond a certain threshold, we will spin up a machine with an instance of Tomcat and the PetClinic application, and update the load balancer to offload some of the traffic to the secondary site.

On my experiment I used Amazon EC2 and RackSpace IaaS providers to simulate the two distinct environments of the primary and secondary sites, but any on-demand environments will do.

REPLICATING RDBMS DATA OVER WAN

How do we replicate data between the MySQL database instances over WAN? On this experiment we’ll use the following pattern:

1.     Monitor data mutating SQL statements on source site. Turn on the MySQL query log, and write a listener (“Feeder”) to intercept data mutating SQL statements, then write them to GigaSpaces In-Memory Data Grid.

2.     Replicate data mutating SQL statements over WAN. I used GigaSpaces WAN Replication to replicate the SQL statements  between the data grids of the primary and secondary sites in a real-time and transactional manner.

3.     Execute data mutating SQL statements on target site. Write a listener (“Processor”) to intercept incoming SQL statements on the data grid and execute them on the local MySQL DB.

 

 

 

 

 

 

 

 

 

 

 

 

To support bi-directional data replication we simply deploy both the Feeder and the Processor on each site.

AUTO-BOOTSTRAP SECONDARY SITE

When peak load occurs, we need to react immediately, and perform a series of operations to activate the secondary site:

1.     spin up compute nodes (VMs)

2.     download and install Tomcat web server

3.     download and install the PetClinic application

4.     configure the load balancer with the new node

5.     when peak load is over – perform the reverse flow to tear down the secondary site

We need to automate this bootstrap process to support real-time response to peak-load events. How do we do this automation? I used GigaSpacesCloudify open-source product as the automation tool for setting up and for taking down the secondary site, utilizing the out-of-the-box connectors for EC2 and RackSpace. Cloudify also provides self-healing  in case of VM or process failure, and can later help in scaling the application (in case of clustered applications).

Implementation Details

The result of the above experimentation is available on GitHub. It contains:

§  DB scripts for setting up the logging, schema and demo data for the PetClinic application

§  PetClinic application (.war) file

§  WAN replication gateway module

§  Cloudify recipe for automating the PetClinic deployment

See the documentation on GitHub for detailed instructions on how to configure the above with your specific deployment details.

Conclusion

Cloud Bursting is a common use case for cloud on-boarding, which requires good architecture patterns. In this post I tried to suggest some patterns and experiment with a simple demo, sharing it with the community to get feedback and raise discussion on these cloud architectures.

More information can be seen at an upcoming GigaSpaces webinar on Transactional Cross-Site Data Replication on June 20th (register at: http://bit.ly/IM0w9F)