Six key data strategy considerations for your cloud-native transformation

Many organizations are making the move to cloud-native platforms as their strategy for digital transformation. Cloud native allows companies to deliver fast-responding, user-friendly applications with greater agility. However, the architecture of the data in support of cloud-native transformation is often ignored in the hope that it will take care of itself.

With data becoming the information currency of every organization, how do you avoid the data mistakes commonly made during this cloud transformation journey? What data questions should you ask when building cloud-native applications? How can you gain valuable insight from your data?

The discussion below covers six key considerations for companies making the transition to cloud native.

Farewell, service-oriented architecture (SOA) – welcome, microservices!

While many legacy applications are still SOA based, the architectural mindset has changed and microservices have become popular. Rather than building monolithic applications, developers create independent ‘services’ that work in concert. A microservice architecture delivers greater agility in application development and simpler codebases: services can be updated and scaled in isolation, written in different languages, and connected to the data tiers and platforms of choice. This strategy lets developers and operators work together more harmoniously. Such a componentized architecture demands a database platform that supports different data types, data structures, and programming languages with ease.

12-factor app and cloud-native microservices

The Twelve-Factor App is a set of rules and guidelines for helping organizations build cloud-native applications. It serves as an excellent starting point, but when it comes to data platforms, a couple of factors (#4 and #6) need further examination.

#4 – Treat backing services as attached resources: Backing services here are, for the most part, databases and datastores. In a microservices context, this means each microservice demands dedicated, single ownership of its schema and the underlying datastore.

#6 – Execute the app as one or more stateless processes: The application should run as one or more stateless processes, with any state offloaded onto a backing service. This further implies that the databases and datastores are expected to be stateful services.
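The two factors above can be illustrated with a minimal Python sketch. It assumes the datastore location is supplied through an environment variable named `DATABASE_URL` (a common convention, not something this article specifies), and it uses a plain dictionary as a stand-in for the real backing service. The names `BackingService`, `load_backing_service`, and `handle_request` are hypothetical.

```python
import os
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class BackingService:
    """Connection details for an attached resource, taken from configuration."""
    scheme: str
    host: str
    port: int
    name: str


def load_backing_service(env_var: str = "DATABASE_URL") -> BackingService:
    """Read the datastore location from the environment rather than from code."""
    url = urlparse(os.environ[env_var])
    return BackingService(
        scheme=url.scheme,
        host=url.hostname or "localhost",
        port=url.port or 0,
        name=url.path.lstrip("/"),
    )


def handle_request(store: dict, user_id: str) -> int:
    """Stateless handler: the counter lives in the backing store, not in the process."""
    store[user_id] = store.get(user_id, 0) + 1
    return store[user_id]
```

Because the process keeps no state of its own, any instance can serve any request, and pointing the service at a different datastore is a configuration change rather than a code change.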

Continuous integration/continuous delivery

The proliferation of service processes, where each service is independently deployable, requires automated mechanisms for deployment and rollback – referred to as continuous integration and continuous delivery (CI/CD). In reality, the value of microservices cannot be fully realized without a mature CI/CD capability. Note that such a transient architecture means database instances will also be ephemeral, so they too must be easy to spin up and spin down on demand.
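As a rough illustration of automated deployment with rollback, the sketch below swaps one service to a new version and restores the previous version if a health check fails. The `running` mapping and the `health_check` callback are hypothetical stand-ins for whatever a real CI/CD platform provides.

```python
from typing import Callable, Dict


def deploy_with_rollback(
    running: Dict[str, str],
    service: str,
    new_version: str,
    health_check: Callable[[str, str], bool],
) -> bool:
    """Deploy new_version of one service; restore the previous version if unhealthy."""
    previous = running.get(service)
    running[service] = new_version
    if health_check(service, new_version):
        return True
    # Health check failed: roll back to whatever was running before.
    if previous is None:
        del running[service]
    else:
        running[service] = previous
    return False
```

Each service is deployed and rolled back on its own, which is what lets services ship independently.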

With the right cloud-native platform and supporting data platform, microservices become easy to deploy. The cloud-native platform should manage the services running on it, and your database should handle data scaling and monitoring: adding shards, rebalancing, re-sharding, and failing over when necessary. The combined database and cloud-native solution offloads the operational burden of monitoring the database and the platform, allowing companies to spend more time developing and deploying quality software.

The importance of a multi-cloud deployment model

Enterprises today are adopting a multi-cloud strategy for multiple reasons: to prepare for disaster recovery situations, to take advantage of the cost differences between hosting applications on different cloud infrastructures, for enhanced security, or simply to avoid vendor lock-in. (Who isn’t concerned about powerful behemoth organizations taking over the world?)

Your application code should be independent of the platform it’s expected to run on.

Monoliths versus non-monoliths

Traditional approaches to data access and data movement are prohibitively slow. Legacy approaches involved replicating data from the primary datastore into other operational datastores and data warehouses/data lakes, where it was updated only after many hours or days, typically in batches. As organizations adopt microservices and their design patterns, such delays in moving data across different types of datastores impede agility and prevent organizations from forging ahead with their business plans.

Incrementally migrating a monolithic application to the microservices architecture typically occurs with the adoption of the strangler pattern, gradually replacing specific pieces of functionality with new applications and services. This means that the associated datastores also need to be compartmentalized and componentized, further implying that each microservice can have its own associated datastore/database.
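One way to picture the strangler pattern is as a routing table that grows as functionality is carved out of the monolith. The sketch below is a simplified illustration: the URLs and the `route` function are hypothetical, and a real deployment would typically do this in a gateway or proxy.

```python
from typing import Dict

MONOLITH = "http://monolith.internal"  # hypothetical address of the legacy application


def route(path: str, migrated: Dict[str, str], fallback: str = MONOLITH) -> str:
    """Return the backend for a request path.

    The longest migrated prefix wins; anything not yet migrated
    still goes to the monolith.
    """
    matches = [
        prefix for prefix in migrated
        if path == prefix or path.startswith(prefix.rstrip("/") + "/")
    ]
    if not matches:
        return fallback
    return migrated[max(matches, key=len)]
```

Each time a piece of functionality moves to a new service with its own datastore, an entry is added to `migrated`, and the monolith handles one less path.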

From the data perspective this means:

  • The number of database instances increases with each microservice – again pointing back to spinning up/down on demand
  • For these microservices to communicate with each other, additional HTTP calls, over something like a convenient-to-use REST API, are needed – demanding flexible extensibility across any platform and language. In many cases microservices simply publish events indicating changes, and listeners/subscribers update the associated applications.
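The publish/subscribe style mentioned in the last point can be sketched with a small in-process event bus. This is only an illustration: the `EventBus` class and topic names are hypothetical, and a production system would use a message broker or the database's own pub/sub or streaming features instead.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Event = Dict[str, object]
Handler = Callable[[Event], None]


class EventBus:
    """Minimal in-process publish/subscribe hub."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        """Register a handler to be called for every event on a topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: Event) -> int:
        """Deliver an event to all subscribers; return how many were notified."""
        handlers = list(self._subscribers.get(topic, []))
        for handler in handlers:
            handler(event)
        return len(handlers)
```

For example, an orders service could publish an `order.created` event, and an inventory service subscribed to that topic would update its own datastore without the two services calling each other directly.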

The fundamental requirements of a cloud-native database

High Performance: Back in the day, sub-millisecond response times were reserved for a few specialty applications. But, in today’s world of the microservices architecture, this is a must-have requirement for all applications. This latency requirement necessitates the highest-performance, most scalable database solution available.

Active-Active Data Replication: Data replication in batch mode used to be a popular approach. But for real-time applications, replication based on event stores and event sourcing is gaining traction. In microservices apps that are loosely coupled and need to share data, there is a need for active/active data replication with tunable consistency. Customers employ active/active deployment models for reasons such as:

  • Shared datasets among microservices that are being continually updated
  • Seamless migration of data across datacenters so user experience is not impacted
  • Mitigating failure scenarios and failing over to a second datacenter to minimize downtime
  • Handling a high volume of incoming traffic and distributing load across multiple servers with seamless syncs
  • Geographically distributed applications (like a multiplayer game or a real-time bidding/polling application) where data needs to be in sync across geos
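When several sites accept writes at the same time, their copies have to be merged without losing updates. The article does not name a specific mechanism, so the sketch below assumes one well-known approach, a grow-only counter CRDT, purely as an illustration. The class name and replica IDs are hypothetical.

```python
from typing import Dict


class GCounter:
    """Grow-only counter: each replica increments only its own slot."""

    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self.counts: Dict[str, int] = {}

    def increment(self, amount: int = 1) -> None:
        """Record local increments under this replica's own ID."""
        if amount < 0:
            raise ValueError("a grow-only counter cannot be decremented")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def merge(self, other: "GCounter") -> None:
        """Take the per-replica maximum, so merging is order-independent and repeatable."""
        for replica_id, count in other.counts.items():
            self.counts[replica_id] = max(self.counts.get(replica_id, 0), count)

    @property
    def value(self) -> int:
        """Total across all replicas seen so far."""
        return sum(self.counts.values())
```

Two datacenters can each count votes in a polling application locally and exchange state in either order. After merging, both hold the same total, and merging the same state twice changes nothing.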

High Availability of Data: When you break a big monolithic application into microservices, each with its own lifecycle, how do you ensure data availability? The cloud-native app developer should choose the datastore based on the Recovery Point Objective (how much data will I lose?), the Recovery Time Objective (when a failure occurs, how long will it take for the service to come back?), high availability characteristics, installation topology, and failover strategy. Single-node database instances hurt availability not only in failure scenarios but also during planned client-downtime events, such as version upgrades.
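The two recovery objectives can be turned into a simple check. The sketch below uses a deliberately simplified model in which worst-case data loss equals the replication lag and worst-case downtime equals detection time plus failover time. The class, field names, and any figures plugged in are hypothetical, not measurements of a real database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryProfile:
    """Simplified recovery characteristics of one datastore deployment, in seconds."""
    replication_lag_s: float  # worst-case delay before a write reaches a replica
    detection_s: float        # time to notice that the primary has failed
    failover_s: float         # time to promote a replica and redirect clients

    @property
    def worst_case_data_loss_s(self) -> float:
        """Writes made within the replication lag may be lost (compare against RPO)."""
        return self.replication_lag_s

    @property
    def worst_case_downtime_s(self) -> float:
        """Service is unavailable until failover completes (compare against RTO)."""
        return self.detection_s + self.failover_s


def meets_objectives(profile: RecoveryProfile, rpo_s: float, rto_s: float) -> bool:
    """True when the deployment satisfies both recovery objectives."""
    return (
        profile.worst_case_data_loss_s <= rpo_s
        and profile.worst_case_downtime_s <= rto_s
    )
```

A single-node instance restored from a nightly backup would have a `replication_lag_s` of up to a day under this model, which is why it rarely satisfies a tight RPO.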

High availability requirements are typically dependent on the criticality of applications, but the combination of the right database and cloud native solution supports various HA installation strategies for a range of use cases, from internal to mission-critical applications.