Overcoming the data integration challenge in hybrid and cloud-based environments

Vivo, the Brazilian subsidiary of Spanish telco Telefónica, deployed TOA Technologies' cloud-based field service management software

Industry experts estimate that data volumes are doubling every two years. Managing all of this is a challenge for any enterprise, but it is not so much the volume of data as its variety that presents a problem. With SaaS and on-premises applications, machine data, and mobile apps all proliferating, we are seeing the rise of an increasingly complicated value-chain ecosystem. IT leaders need to take a portfolio-based approach and combine cloud and on-premises deployment models to sustain competitive advantage. Improving the scale and flexibility of data integration across both environments to deliver a hybrid offering is necessary to provide the right data to the right people at the right time.

The evolution of hybrid integration approaches creates requirements and opportunities for converging application and data integration. The definition of hybrid integration will continue to evolve, but its current trajectory is clearly headed to the cloud.

According to IDC, cloud IT infrastructure spending will grow at a compound annual growth rate (CAGR) of 15.6 percent between now and 2019, at which point it will reach $54.6 billion. In line with this, customers need to advance their hybrid integration strategy to make the best use of the cloud. At Talend, we have identified five phases of integration, starting from the oldest and most mature and running through to the most bleeding-edge and disruptive. Here we take a brief look at each and show how businesses can optimise their approach as they move from one phase to the next.

Phase 1: Replicating SaaS Apps to On-Premises Databases

The first stage in developing a hybrid integration platform is to replicate SaaS applications to on-premises databases. Companies in this stage typically either need analytics on some of the business-critical information contained in their SaaS apps, or are sending SaaS data to a staging database so that it can be picked up by other on-premises apps.
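The staging pattern described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records stand in for whatever a SaaS export or API call returns, SQLite stands in for the on-premises staging database, and the table name `staging_accounts` and its columns are hypothetical. Each replication run inserts or updates records by id, and an older copy of a record never overwrites a newer one.

```python
import sqlite3

def replicate(conn, records):
    """Upsert SaaS records into a local staging table, keyed by record id.

    `records` is a list of dicts with hypothetical keys: id, name, updated_at
    (ISO 8601 strings, so plain string comparison orders them correctly).
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS staging_accounts ("
        "id TEXT PRIMARY KEY, name TEXT, updated_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO staging_accounts (id, name, updated_at) "
        "VALUES (:id, :name, :updated_at) "
        "ON CONFLICT(id) DO UPDATE SET "
        "name = excluded.name, updated_at = excluded.updated_at "
        # Only overwrite when the incoming copy is newer than the staged one.
        "WHERE excluded.updated_at > staging_accounts.updated_at",
        records,
    )
    conn.commit()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    replicate(conn, [{"id": "a1", "name": "Acme", "updated_at": "2016-01-01T00:00:00"}])
    replicate(conn, [{"id": "a1", "name": "Acme Ltd", "updated_at": "2016-02-01T00:00:00"}])
    print(conn.execute("SELECT id, name FROM staging_accounts").fetchall())
```

Because the staging table is keyed by record id, the same run can be repeated safely, which is what lets downstream on-premises apps pick the data up on their own schedule.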

In order to increase the scalability of existing infrastructure, it’s best to move to a cloud-based data warehouse service within AWS, Azure, or Google Cloud. The scalability of these cloud-based services means organisations don’t need to spend cycles refining and tuning the databases. Additionally, they get all the benefits of utility-based pricing. However, with the myriad of SaaS apps today generating even more data, they may also need to adopt a cloud analytics solution as part of their hybrid integration strategy.

Phase 2: Integrating SaaS Apps Directly with On-Premises Apps

Each line of business has its preferred SaaS app: sales has Salesforce, marketing has Marketo, HR has Workday, and finance has NetSuite. However, these SaaS apps still need to connect to an on-premises back-office ERP system.

Due to the complexity of back-office systems, there isn’t yet a widespread SaaS solution that can serve as a replacement for ERP systems such as SAP R/3 and Oracle EBS. Businesses would be best advised not to try to integrate with every single object and table in these back-office systems, but rather to accomplish a few use cases really well so that their business can continue running while also benefiting from the agility of the cloud.

Phase 3: Hybrid Data Warehousing with the Cloud

Databases and data warehouses on a cloud platform are geared toward supporting data warehouse workloads, from low-cost, rapid proof-of-value projects to ongoing data warehouse solutions. As the volume and variety of data increase, enterprises need to have a strategy to move their data from on-premises warehouses to newer, Big Data-friendly cloud resources.

While enterprises take time to decide which Big Data protocols best serve their needs, they can start by creating a data lake in the cloud with a cloud-based service such as Amazon Web Services (AWS) S3 or Microsoft Azure Blob Storage. These lakes can relieve cost pressures imposed by on-premises relational databases and act as a “demo area”, enabling businesses to process information using their Big Data protocol of choice and then transfer it into a cloud-based data warehouse. Once enterprise data is held there, the business can enable self-service with data preparation tools capable of organising and cleansing the data prior to analysis in the cloud.

Phase 4: Real-time Analytics with Streaming Data

Businesses today need insight at their fingertips in real time. To reap the benefits of real-time analytics, they need an infrastructure to support it. These infrastructure needs may change depending on the use case, whether that is supporting weblogs, clickstream data, sensor data or database logs.

As big data analytics and ‘Internet of Things’ (IoT) data processing move to the cloud, companies require fast, scalable, elastic and secure platforms to transform that data into real-time insight. The combination of Talend Integration Cloud and AWS enables customers to easily integrate, cleanse, analyse, and manage batch and streaming data in the cloud.
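To make the idea of streaming analytics concrete, here is a minimal Python sketch of one common building block, the tumbling window: events are grouped into fixed, non-overlapping time slices and a summary is emitted as soon as each slice closes. It is an illustration only and does not show how Talend Integration Cloud or AWS implement streaming. The function name, the `(timestamp, event_type)` event shape and the 60-second window are assumptions, and events are assumed to arrive in timestamp order.

```python
from collections import Counter

def tumbling_windows(events, window_seconds=60):
    """Yield (window_start, counts_by_event_type) as each window closes.

    `events` is any iterable of (timestamp_seconds, event_type) pairs,
    assumed to arrive in timestamp order (e.g. clickstream or sensor data).
    """
    current_start = None
    counts = Counter()
    for timestamp, event_type in events:
        start = int(timestamp // window_seconds) * window_seconds
        if current_start is not None and start != current_start:
            # The previous window is complete: emit it and start a new one.
            yield current_start, dict(counts)
            counts = Counter()
        current_start = start
        counts[event_type] += 1
    if current_start is not None:
        # Flush the final, possibly partial, window when the stream ends.
        yield current_start, dict(counts)

if __name__ == "__main__":
    clicks = [(0, "click"), (10, "view"), (59, "click"), (60, "click"), (130, "view")]
    for window_start, summary in tumbling_windows(clicks):
        print(window_start, summary)
```

Because the function is a generator, results become available window by window instead of after the whole data set has been read, which is the essential difference between streaming and batch processing.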

Phase 5: Machine Learning for Optimized App Experiences

In the future, every experience will be delivered as an app through mobile devices. By providing the ability to discover patterns buried within data, machine learning has the potential to make applications more powerful and more responsive. Well-tuned algorithms allow value to be extracted from disparate data sources without the limits of human thinking and analysis. For developers, machine learning offers the promise of applying business-critical analytics to any application in order to accomplish everything from improving customer experience to serving up hyper-personalised content.

To make this happen, developers need to:

  • Be “all-in” with the use of Big Data technologies and the latest streaming big data protocols
  • Have large enough data sets for the machine-learning algorithm to recognize patterns
  • Create segment-specific datasets using machine-learning algorithms
  • Ensure that their mobile apps have properly built APIs to draw upon those datasets and provide the end user with whatever information they are looking for in the correct context
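As a rough illustration of the third point, the sketch below groups users into segments with a tiny one-dimensional k-means written in plain Python. The choice of k-means, the function names and the sample usage figures are all assumptions made for the example; the approach described here does not prescribe a particular algorithm, and a production system would use a proper machine-learning library and far larger data sets.

```python
def kmeans_1d(values, k=2, iterations=20):
    """Tiny 1-D k-means; returns one cluster label per input value."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids across the sorted values.
    centroids = [ordered[i * (len(ordered) - 1) // max(k - 1, 1)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assign each value to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(values, labels) if label == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels

def build_segments(usage_by_user, k=2):
    """Group users into k segments by a single usage metric (hypothetical)."""
    users = list(usage_by_user)
    labels = kmeans_1d([usage_by_user[u] for u in users], k)
    segments = {}
    for user, label in zip(users, labels):
        segments.setdefault(label, []).append(user)
    return segments

if __name__ == "__main__":
    weekly_sessions = {"ana": 1, "bo": 2, "cy": 40, "di": 42}
    print(build_segments(weekly_sessions))
```

Each resulting segment is a small dataset in its own right, which an app's API could then query to tailor content to the user's segment.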

Making it Happen with iPaaS

In order for companies to reach this level of ‘application nirvana’, they will need to have first achieved or implemented each of the four previous phases of hybrid application integration.

That’s where we see a key role for integration platform-as-a-service (iPaaS), which is defined by analysts at Gartner as ‘a suite of cloud services enabling development, execution and governance of integration flows connecting any combination of on premises and cloud-based processes, services, applications and data within individual or across multiple organisations.’

The right iPaaS solution can help businesses achieve the necessary integration, and even bring in native Spark processing capabilities to drive real-time analytics, enabling them to move through the phases outlined above and ultimately complete phase five.

Written by Ashwin Viswanath, Head of Product Marketing at Talend