Category Archives: Cloudify

GigaSpaces Cloudify Partners with OpsCode for Chef Onboarding

GigaSpaces Technologies, with its new release of the open source Cloudify product, has partnered with OpsCode for a dedicated Chef integration that caters to diverse application stacks and systems.

“The concept of DevOps and recipes can go well beyond setup, to actually manage the entire lifecycle of your applications—from setup, to monitoring, through maintaining high availability, and auto-scaling when required.  This is where Cloudify and Chef come together,” says Bryan Hale, Director of Business Development for OpsCode. “By enabling users to leverage the power and variety of Chef recipes and cookbooks to deploy services, Cloudify supports comprehensive application level orchestration on any cloud.”

In addition to the integration with Chef, this new release also includes the following features:

  • Complete application-level orchestration, allowing automated provisioning, deployment, management and scaling of complex multi-tier apps to any cloud environment
  • Built-in, ready to use recipes for common big data components, such as Hadoop, Cassandra and MongoDB.
  • Support for non-virtualized environments (AKA Bring Your Own Node), allowing you to treat an arbitrary set of servers as your “Cloud” and have Cloudify deploy and manage applications on these servers.
  • Comprehensive REST API for easy integration with third-party tooling and programmatic access.
  • Support for all the common cloud infrastructures, including OpenStack, HPCloud, RackSpace, Windows Azure, Apache CloudStack, Amazon AWS and vCloud.
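The REST API in particular lends itself to scripted deployments. As a rough sketch of what driving the manager programmatically could look like (the port, endpoint path, and payload fields here are illustrative assumptions, not the documented API):

```python
# Minimal sketch of scripting a deployment through a Cloudify manager's
# REST API. The port, endpoint path, and payload fields are illustrative
# assumptions, not the documented API.

def deploy_url(manager_host, app_name, port=8100):
    """Build a (hypothetical) REST URL for deploying an application."""
    return "http://%s:%d/service/applications/%s" % (manager_host, port, app_name)

def deploy_request(manager_host, app_name, recipe_path):
    """Assemble the URL and payload a client would POST to the manager."""
    url = deploy_url(manager_host, app_name)
    payload = {"recipe": recipe_path, "timeout": 600}
    return url, payload
```

A real client would POST that payload with an HTTP library and poll a status endpoint; the point is that everything the CLI does is reachable from scripts and third-party tooling.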

In addition, Cloudify now also simplifies the complexities involved with deploying big data applications to the cloud.  It is well known that the massive computing and storage resources needed to support big data deployments make cloud environments, public and private, an ideal fit.  But managing big data applications on the cloud is no easy feat, as these systems often include other services such as relational and non-relational databases, stream processing tools, web front ends and more, where each framework comes with its own management, installation, configuration, and scaling mechanisms.  With its new built-in recipes, Cloudify provides consistent management and cloud portability for popular big data tools, dramatically reducing the operational and infrastructure costs involved with running these systems.

“We’re seeing a growing market trend for the need to migrate applications – not just in one-off processes anymore – but on a much larger scale, by enterprises, managed service providers, and ISVs alike, who are looking to take advantage of the cloud promise—while until now, only about 5% have actually been able to do so,” says Uri Cohen, Vice President of Product Management at GigaSpaces. “The beauty of Cloudify and its recipe-based model is that it enables you to simply and smoothly take both new and existing applications to the cloud by the tens and hundreds through Cloudify’s built-in recipes and the new integration with OpsCode’s Chef, in very short time frames.”

You can fork the Cloudify code from GitHub, or download the Cloudify GA release.

Woz is Worried About “Everything Going to the Cloud” — the Real Issue is Giving Up Control

Guest Post By Nati Shalom, CTO and Founder of GigaSpaces

In a recent article, Steve Wozniak, who co-founded Apple with the late Steve Jobs, predicted “horrible problems” in the coming years as cloud-based computing takes hold. 

“I really worry about everything going to the cloud,” he said. “I think it’s going to be horrendous. I think there are going to be a lot of horrible problems in the next five years. … With the cloud, you don’t own anything. You already signed it away.”

When I first read the title I thought Wozniak sounded like Larry Ellison two years ago, when he dismissed the cloud as hype, before making a 180-degree turn to acknowledge that Oracle wished to be a cloud vendor too.

Reading it more carefully, I realized that the framing of the topic is simply misleading. Wozniak actually touches on something I hear more often as the cloud hype cycle moves from the Peak of Inflated Expectations into the Trough of Disillusionment.

Wozniak echoes an important lesson that, in my opinion, is a major part of the reason many of the companies that moved to the cloud have experienced outages during the past months. I addressed several of these aspects in a recent blog post: Lessons from the Heroku/Amazon Outage.

When we move our operations to the cloud, we often assume that we’re outsourcing our data center operations completely, including our disaster recovery procedures. The truth is that when we move to the cloud we’re only outsourcing the infrastructure, not our operations, and the responsibility for how we use this infrastructure remains ours.

Choosing better tradeoffs between productivity and control

For most companies, the main reason to move to the cloud in the first place was to gain better agility and productivity. But in starting this cloud journey, we found that we had to give up some measure of control to achieve that agility and productivity.

The good news is that as the industry matures, there are more choices that provide better tradeoffs between productivity and control:

  • Open source clouds such as OpenStack and CloudStack
  • Private cloud offerings
  • DevOps and automation tools such as Chef and Puppet
  • Open source PaaS such as Cloudify, OpenShift and Cloud Foundry
  • Combined DevOps and PaaS, such as Cloudify

As businesses shape their cloud strategy today, there is no longer a need to trade control for productivity. With technologies like Cloudify, businesses can get the best of both worlds.


Bursting into the Clouds – Experimenting with Cloud Bursting

Guest Post by Dotan Horovits, Senior Solutions Architect at GigaSpaces

Dotan Horovits is one of the primary architects at GigaSpaces

Who needs Cloud Bursting?

We see many organizations examining the cloud as a replacement for their existing in-house IT. But we see interest in the cloud even among organizations that have no plans to replace their traditional data center.

One prominent use case is Cloud Bursting:

Cloud bursting is an application deployment model in which an application runs in a private cloud or data center and bursts into a public cloud when the demand for computing capacity spikes. The advantage of such a hybrid cloud deployment is that an organization only pays for extra compute resources when they are needed.
[Definition from SearchCloudComputing]

Cloud Bursting appears to be a prominent use case in cloud on-boarding projects. In a recent post, Nati Shalom nicely summarizes the economic rationale for cloud bursting and discusses theoretical approaches to its architecture. In this post I’d like to examine the architectural challenges more closely and explore possible designs for Cloud Bursting.
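To make the economic rationale concrete, here is a back-of-the-envelope comparison between provisioning in-house for peak capacity and bursting only during peaks. All the numbers (server counts, hours, hourly rate) are made up for illustration:

```python
# Back-of-the-envelope comparison of owning peak capacity all month
# vs. bursting to the cloud only during peak hours. All inputs are
# illustrative assumptions, not real pricing.

def monthly_cost(baseline, peak, peak_hours, rate_per_hour, month_hours=720):
    """Return (cost of provisioning for peak, cost with bursting)."""
    owned = peak * month_hours * rate_per_hour
    burst = (baseline * month_hours + (peak - baseline) * peak_hours) * rate_per_hour
    return owned, burst
```

With a baseline of 20 servers and a peak of 100 servers for 40 hours a month, the bursting bill covers the extra 80 machines only while they actually run, which is exactly the advantage the definition above describes.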

Examining Cloud Bursting Architecture

Overflowing compute to the cloud is addressed by workload migration: when we need more compute power we just spin up more VMs in the cloud (the secondary site) and install instances of the application. The challenge in workload migration is building an environment in the secondary site consistent with the primary site, so the system can overflow transparently. This is usually addressed by DevOps tools such as Chef, Puppet, CFEngine and Cloudify, which capture the setup and are able to bootstrap the application stack on different environments. In my example I used Cloudify to provide consistent installation between the EC2 and RackSpace clouds.

The Cloud Bursting problem becomes more interesting when data is concerned. In his post Nati mentions two approaches for handling data during cloud bursting:

1. The primary site approach – Use the private cloud as the primary data site, and then point all the burst activity to that site.
2. Federated site approach – This approach is similar to the way Content Distribution Networks (CDN) work today. With this approach we maintain a replica of the data available at each site and keep their replicas in sync.

The primary site approach incurs a heavy latency penalty, as each computation needs to make the round trip to the primary site to fetch its data. Such an architecture is not applicable to online flows.

The federated site approach uses data synchronization to bring the data to the compute, which saves the above latency and enables online flows. But if we want to support “hot” bursting to the cloud, we have to replicate the data between the sites in an ongoing streaming fashion, so that the data is available on the cloud as soon as the peak occurs and we can spin up compute instances and immediately start to redirect load. Let’s see how it’s done.

Cloud Bursting – Examining the Federated Site Approach

Let’s roll up our sleeves and start experimenting hands-on with the federated site approach to Cloud Bursting architecture. As a reference application, let’s take Spring’s PetClinic Sample Application and run it on an Apache Tomcat web container. The application will persist its data locally to a MySQL relational database.

The primary site, representing our private data center, will run the above stack and serve the PetClinic online service. The secondary site, representing the public cloud, will only have a MySQL database, and we will replicate data between the primary and secondary sites to keep data synchronized. As soon as the load on the primary site increases beyond a certain threshold, we will spin up a machine with an instance of Tomcat and the PetClinic application, and update the load balancer to offload some of the traffic to the secondary site.

In my experiment I used the Amazon EC2 and RackSpace IaaS providers to simulate the two distinct environments of the primary and secondary sites, but any on-demand environments will do.


How do we replicate data between the MySQL database instances over the WAN? In this experiment we’ll use the following pattern:

1.     Monitor data-mutating SQL statements on the source site. Turn on the MySQL query log, and write a listener (“Feeder”) to intercept data-mutating SQL statements, then write them to the GigaSpaces In-Memory Data Grid.

2.     Replicate data-mutating SQL statements over the WAN. I used GigaSpaces WAN Replication to replicate the SQL statements between the data grids of the primary and secondary sites in a real-time and transactional manner.

3.     Execute data-mutating SQL statements on the target site. Write a listener (“Processor”) to intercept incoming SQL statements on the data grid and execute them on the local MySQL DB.
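The filtering at the heart of steps 1 and 3 can be sketched in a few lines of Python. The general-log line format is simplified here, and the write to the data grid is left out; this only shows how mutating statements are separated from reads:

```python
import re

# Sketch of the "Feeder" filtering step: scan MySQL general query log
# lines and keep only data-mutating statements for replication. The log
# format and the data-grid API are simplified assumptions.

MUTATING = re.compile(r"^\s*(INSERT|UPDATE|DELETE|REPLACE)\b", re.IGNORECASE)

def extract_mutations(log_lines):
    """Return the SQL statements that modify data, skipping SELECTs etc."""
    statements = []
    for line in log_lines:
        # General-log lines look roughly like: "<time> <id> Query <sql>"
        parts = line.split("Query", 1)
        if len(parts) != 2:
            continue  # not a query line (e.g. Connect, Quit)
        sql = parts[1].strip()
        if MUTATING.match(sql):
            statements.append(sql)
    return statements
```

The Processor side is the mirror image: take statements off the grid in arrival order and execute them against the local MySQL instance, so the replica converges to the same state as the source.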

To support bi-directional data replication we simply deploy both the Feeder and the Processor on each site.


When peak load occurs, we need to react immediately, and perform a series of operations to activate the secondary site:

1.     spin up compute nodes (VMs)

2.     download and install Tomcat web server

3.     download and install the PetClinic application

4.     configure the load balancer with the new node

5.     when peak load is over – perform the reverse flow to tear down the secondary site
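The trigger for this flow is a load threshold. A toy sketch of the decision logic follows, with hysteresis (separate high and low thresholds) so the secondary site doesn’t flap between bursting up and tearing down; the threshold values and the metric source are assumptions:

```python
# Toy sketch of the peak-load trigger for cloud bursting. Uses two
# thresholds (hysteresis) so a load hovering near a single threshold
# doesn't repeatedly activate and tear down the secondary site.
# The threshold values are illustrative assumptions.

def should_burst(load, bursted, high=0.8, low=0.4):
    """Decide whether the secondary site should be active.

    load    -- current load as a fraction of primary-site capacity
    bursted -- whether the secondary site is currently active
    """
    if not bursted and load > high:
        return True          # spike: activate the secondary site
    if bursted and load < low:
        return False         # calm again: tear it down
    return bursted           # otherwise keep the current state
```

A monitoring loop would evaluate this on each sample and, on a state change, kick off the bootstrap or teardown sequence above.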

We need to automate this bootstrap process to support real-time response to peak-load events. How do we do this? I used the GigaSpaces Cloudify open-source product as the automation tool for setting up and tearing down the secondary site, utilizing its out-of-the-box connectors for EC2 and RackSpace. Cloudify also provides self-healing in case of VM or process failure, and can later help in scaling the application (in the case of clustered applications).

Implementation Details

The result of the above experimentation is available on GitHub. It contains:

  • DB scripts for setting up the logging, schema and demo data for the PetClinic application
  • PetClinic application (.war) file
  • WAN replication gateway module
  • Cloudify recipe for automating the PetClinic deployment

See the documentation on GitHub for detailed instructions on how to configure the above with your specific deployment details.


Cloud Bursting is a common use case for cloud on-boarding, and one that requires good architecture patterns. In this post I suggested some patterns and experimented with a simple demo, sharing it with the community to get feedback and raise discussion of these cloud architectures.

More information can be seen at an upcoming GigaSpaces webinar on Transactional Cross-Site Data Replication on June 20th (register at: