Cloud service providers find themselves struggling to balance responsibility for maintaining data integrity with delivering cost-effective solutions to their customers, all the while protecting their own data assets and bottom line.
Generally, the type of service being delivered limits a provider’s level of responsibility. In the case of infrastructure as a service (IaaS), a provider might just be delivering infrastructure and a means of creating cloud environments, with no provision for customer data backup. As part of a platform as a service (PaaS) offering, backup and other forms of data protection may well be key selling points.
Basic types of data loss include data destruction, data corruption and unauthorised data access. The reasons for these types of loss are varied and include infrastructure malfunctions, software errors and security breaches. Due to the complexities around data centre and cloud security, this article will deal with destruction and corruption of data only.
Definition of data loss domains
There are many types of data within a cloud environment, in fact too many to enumerate. These data types can be classified into general categories, or data domains. The importance of these domains to the constituents of the cloud environment gives rise to the concept of data loss domains: who is affected most, and how great the impact is, if the data is lost. The above diagram represents the three major data domains: provider non-customer effective (PNCE), provider customer effective (PCE) and customer (CUST), and, for the provider domains, gives examples of the types of data. This section will define the domains and the data types.
Provider data non-customer effective (PNCE)
This data loss domain contains information that belongs to the cloud service provider and has no effect on the customer. This information, if lost or damaged, will have a significant impact on the provider and their ability to conduct business.
On the other hand, loss of this data has little to no effect on customers. For example, if billing information were lost and irretrievable, the customer would probably not mind. Responsibility for protecting this data lies clearly with the provider. The following is a short list of examples of PNCE data:
- Business management data
– Billing and metering information
– Service quality data
– IT Benchmarking data
- Environment management data
– Development/DevOps data
– Inventory and configuration management data
– Performance and capacity management data
- Security data
– Security systems management
– ID management, authentication and authorisation data
- Logging data
– System logs
– Network activity logs
– Security logs
Provider data customer-effective (PCE)
This domain represents data that is owned by the provider and is significant both to the provider, for business reasons (for example, the provider needs to know how many VMs a customer has created), and to the customer, as it defines their cloud deployment.
Both provider and customer will be impacted in the case of loss, and responsibility for protecting the data is shared but falls primarily on the provider. For example, virtual machine configurations are the provider’s responsibility to protect, unless they are marked transient (the usual default state), in which case no protection is required. Some of the data types that fall into this domain are:
- Self-service portal data
– Environment default settings
- Virtual infrastructure configuration
– Virtual machine/compute configurations
– Virtual networking (SDN, vRouting, vSwitching, VLAN, VXLAN)
- Orchestration and Automation
– Provisioning and provisioning management
Customer data (CUST)
Customer data can take an almost unlimited number of forms, but it constitutes the universe of data needed to run customer-developed and/or customer-deployed services. The customer owns this data and is responsible for its protection unless otherwise arranged with the provider. A customer may choose to have a cloud service provider replicate, back up or otherwise protect customer-owned data based on an agreement with the provider. These services generally take the form of a financial and service-level agreement between the parties.
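The three domains and their responsibility rules can be expressed as a small data structure. The following Python is a minimal sketch, not a provider's actual tooling: the names `Domain`, `DataSet` and `responsible_party`, and the returned strings, are hypothetical and chosen only to mirror the rules described above.

```python
from dataclasses import dataclass
from enum import Enum


class Domain(Enum):
    PNCE = "provider, non-customer effective"
    PCE = "provider, customer effective"
    CUST = "customer"


@dataclass(frozen=True)
class DataSet:
    name: str
    domain: Domain
    transient: bool = False           # relevant to PCE items such as VM configurations
    provider_agreement: bool = False  # relevant to CUST data covered by an agreement


def responsible_party(ds: DataSet) -> str:
    """Return who protects the data set, following the rules in the text."""
    if ds.domain is Domain.PNCE:
        return "provider"
    if ds.domain is Domain.PCE:
        # Items marked transient require no protection at all.
        if ds.transient:
            return "none required"
        return "shared, primarily provider"
    # Customer data: the customer's job unless arranged with the provider.
    if ds.provider_agreement:
        return "provider, by agreement"
    return "customer"


if __name__ == "__main__":
    examples = [
        DataSet("billing and metering", Domain.PNCE),
        DataSet("VM configuration", Domain.PCE, transient=True),
        DataSet("virtual networking", Domain.PCE),
        DataSet("application database", Domain.CUST),
    ]
    for ds in examples:
        print(f"{ds.name}: {responsible_party(ds)}")
```

Keeping the rules in one function makes the exceptions (transient items, customer agreements) explicit rather than leaving them implied by the domain alone.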
Just because the IT world now lives in, or is moving to, the cloud doesn’t mean that the rules of data protection have changed. We still need to measure Recovery Point Objective (RPO) and Recovery Time Objective (RTO) the same way that we have in the past. We still need to implement data protection solutions based on the balance of RTO/RPO, the criticality of data and the cost of implementation. We still tend to implement multiple solutions, or multiple tiers of solutions, to suit the environment.
The difference is, as we have shown above, who owns the data and who is responsible for the protection of the data.
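As a reminder of how the two measures work, RPO is the maximum tolerable amount of data loss expressed as time, and RTO is the maximum tolerable time to restore service. The sketch below checks a protection schedule against both. It assumes, as a simplification, that worst-case data loss equals the interval between protection points (ignoring how long each backup or snapshot takes to complete). The function name and the example figures are illustrative only.

```python
from datetime import timedelta


def meets_objectives(
    protection_interval: timedelta,
    estimated_restore_time: timedelta,
    rpo: timedelta,
    rto: timedelta,
) -> bool:
    """Return True if a schedule satisfies both RPO and RTO.

    Worst-case data loss is approximated by the interval between
    protection points (an assumption; real loss also depends on how
    long each backup or snapshot takes to complete).
    """
    worst_case_loss = protection_interval
    return worst_case_loss <= rpo and estimated_restore_time <= rto


if __name__ == "__main__":
    rpo = timedelta(hours=4)
    rto = timedelta(hours=8)

    # Nightly backup: up to 24 hours of data could be lost, so RPO is missed.
    print(meets_objectives(timedelta(hours=24), timedelta(hours=6), rpo, rto))

    # Hourly snapshots with a two-hour restore satisfy both objectives.
    print(meets_objectives(timedelta(hours=1), timedelta(hours=2), rpo, rto))
```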
Several general categories of data protection method can be used, and should be considered, when preparing an environment to minimise data loss. They include:
- Disk level data protection – This can be the long-established, and still best-practice, RAID-based protection of disk resources. Another option is scale-out storage (e.g. EMC Isilon, Nutanix), which spreads data across multiple nodes of the data cluster.
- Backup/replicated backup – The periodic backing up of data to a lower tier, lower cost medium. Advances in disk-based backup and replication technologies have lowered the cost, increased the efficiency and raised the level of recoverability of backup and recovery solutions.
- Data replication – Data replication technology has existed in one form or another for a number of years. Data written to one set of storage resources is automatically replicated, via software, to secondary storage. The problem for most of replication technology’s history has been accessibility and granularity. Technologies such as SRDF have always been highly reliable, but accessing the data from both primary and secondary resources, and retrieving anything other than the most recent information, were not possible.
- Journaled/checkpoint based replication – Enter journaled file systems and checkpoint-based replication. This type of technology (e.g. EMC RecoverPoint) allows not only read/write access to either side of a replicated storage set, but also the ability to recover data as it stood at a given point in time.
Now that we understand the data and the means for protecting it, we can move on to the process for doing so. A cloud service provider needs to consider two major steps: classifying data and building a flexible data protection (DP) environment.
The following template can be used to classify the data that needs protecting:
Once the RTO/RPO and criticality characteristics of the data are understood, an intelligent decision about protection method and frequency of execution can be made. For example, if a data set has a low RPO (transactional data), then replication with frequent snapshots might be necessary. If a data set has a high RPO (a low rate of change), then low-frequency backup should suffice.
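The rule of thumb above can be sketched as a simple selection function. The one-hour cut-off between "low" and "high" RPO, the `ClassifiedData` record and its field names are assumptions made for illustration. The article does not prescribe a threshold, and a real decision would also weigh RTO, criticality and cost.

```python
from dataclasses import dataclass
from datetime import timedelta

# Assumed cut-off between a "low" and a "high" RPO; tune per environment.
LOW_RPO_THRESHOLD = timedelta(hours=1)


@dataclass(frozen=True)
class ClassifiedData:
    name: str
    rpo: timedelta
    rto: timedelta
    criticality: str  # free-form label, e.g. "high" or "low" (hypothetical)


def suggest_protection(data: ClassifiedData) -> str:
    """Suggest a protection method from RPO alone, per the rule of thumb."""
    if data.rpo <= LOW_RPO_THRESHOLD:
        return "replication with frequent snapshots"
    return "low-frequency backup"


if __name__ == "__main__":
    orders = ClassifiedData(
        "order transactions", timedelta(minutes=5), timedelta(hours=1), "high"
    )
    archive = ClassifiedData(
        "report archive", timedelta(days=1), timedelta(days=2), "low"
    )
    for item in (orders, archive):
        print(f"{item.name}: {suggest_protection(item)}")
```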
The following diagram shows an example environment including data protection elements:
The combination of clear data classification and a comprehensive data protection solution, including tools, infrastructure, processes and procedures, should be a major goal of the cloud service provider. Planning and execution around these concepts will allow the service provider to remain fiscally responsible while meeting their data protection responsibilities to their customers.