Melting the big data avalanche through copy data virtualisation


The volume of data within companies is growing day by day, and most of that new data comes not from fresh production data but from the uncontrolled proliferation of data copies. This avalanche of data is a major challenge for businesses, which must manage it efficiently and securely. So where does this copy data come from?

Copy data consists of redundant copies of production data generated for purposes such as backup, disaster recovery, test and development, analytics, snapshots or migrations. According to IDC, companies may have up to 120 copies of certain production data in circulation.

In addition, IDC estimated in a 2013 study that companies spend up to $44 billion worldwide managing redundant copies of data. According to IDC, 85% of investments in storage hardware and 65% in storage software are attributable to data copies. Managing these copies now consumes more resources than managing the actual production data. IT departments are therefore faced with the question of how to control the data growth caused by redundant copies through cost-effective data management. This applies both to companies that keep their data in-house and to data centre operators.

Taming the data flood with copy data virtualisation

Virtualising data copies has proven an effective way to take data management to the next level. Combined with global data de-duplication and optimised network utilisation, it enables highly efficient data handling. Because less bandwidth and storage are required, very short recovery times can be achieved.
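To make the de-duplication idea concrete, here is a minimal, purely illustrative Python sketch (the class name, block size and in-memory store are assumptions for this example, not any product's implementation): every block is fingerprinted, and a block that is already known is neither stored nor transferred again.

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative block size; real systems tune this carefully

class DedupStore:
    """Content-addressed block store: identical blocks are kept only once, globally."""

    def __init__(self):
        self.blocks = {}   # fingerprint -> block bytes

    def ingest(self, data: bytes) -> list:
        """Split data into blocks, store each previously unseen block once,
        and return the list of fingerprints that fully describes the data."""
        fingerprints = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            fp = hashlib.sha256(block).hexdigest()
            # Only new blocks consume storage (and, over the wire, bandwidth).
            self.blocks.setdefault(fp, block)
            fingerprints.append(fp)
        return fingerprints

    def reassemble(self, fingerprints: list) -> bytes:
        """Rebuild the original data from its fingerprints."""
        return b"".join(self.blocks[fp] for fp in fingerprints)
```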

One possible approach is a so-called “Virtual Data Pipeline”: a distributed object file system in which the fundamentals of data management (copying, storing, moving and restoring) are virtualised. In this way, point-in-time virtual copies can be assembled from the collection of unique data blocks at any time. If data must be restored, the underlying virtual object file is extracted and presented at a user-defined recovery point, in any application format. Since the recovered data can be mounted directly on a server, no data movement is required at all, which contributes to extremely fast recovery times; the recovered data is immediately available.
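The following sketch illustrates the general idea rather than any particular vendor's internals (all names are invented for the example): a point-in-time virtual copy is little more than a map of block fingerprints, and "mounting" a recovery point means serving reads through that map instead of copying data out of a backup first.

```python
from dataclasses import dataclass

BLOCK_SIZE = 4096  # illustrative block size

@dataclass
class VirtualCopy:
    """A recovery point: an ordered list of block fingerprints, holding no data of its own."""
    timestamp: str
    block_map: list          # fingerprints pointing into the shared, de-duplicated pool

class VirtualMount:
    """Read-only 'mount' of a recovery point: reads are resolved block by block
    against the shared pool, so nothing has to be restored or moved beforehand."""

    def __init__(self, recovery_point: VirtualCopy, pool: dict):
        self.copy = recovery_point
        self.pool = pool      # fingerprint -> block bytes

    def read(self, offset: int, length: int) -> bytes:
        first = offset // BLOCK_SIZE
        last = (offset + length - 1) // BLOCK_SIZE
        data = b"".join(self.pool[fp] for fp in self.copy.block_map[first:last + 1])
        skip = offset % BLOCK_SIZE
        return data[skip:skip + length]
```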

More efficiency in data handling

The Virtual Data Pipeline technology is used to collect, manage and provide data as efficiently and effectively as possible. After a single complete snapshot has been created and stored, only changed blocks of the application data are captured, using Change Block Tracking on an incremental-forever basis. Data is collected at the block level, as this is the most efficient way to track and transfer changes. Since data is always used in its native format, it is also stored in its native format; there is no need to create backup files or restore data from them, so the data can be both managed and accessed more efficiently.
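As an illustration of the incremental-forever principle (a simplified sketch with invented names, assuming a fixed-size volume), the first capture ingests every block, while later captures ingest only the blocks reported as changed, yet each snapshot remains a complete block map that can be used directly:

```python
import hashlib
from typing import Optional

BLOCK_SIZE = 4096  # illustrative block size

def fingerprint(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()

class IncrementalForever:
    """Simplified incremental-forever capture: one initial full snapshot, then only
    changed blocks are transferred, yet every snapshot remains a complete,
    directly usable block map (no chains of backup files to replay)."""

    def __init__(self):
        self.pool = {}        # shared, de-duplicated block pool: fingerprint -> bytes
        self.snapshots = []   # each entry is a full block map (list of fingerprints)

    def capture(self, volume: bytes, changed: Optional[set] = None) -> list:
        blocks = [volume[i:i + BLOCK_SIZE] for i in range(0, len(volume), BLOCK_SIZE)]
        if not self.snapshots or changed is None:
            # First capture: every block is ingested once.
            block_map = []
            for block in blocks:
                fp = fingerprint(block)
                self.pool.setdefault(fp, block)
                block_map.append(fp)
        else:
            # Incremental capture: reuse the previous map and overwrite only the
            # blocks that change block tracking reported as modified.
            block_map = list(self.snapshots[-1])
            for i in changed:
                fp = fingerprint(blocks[i])
                self.pool.setdefault(fp, blocks[i])
                block_map[i] = fp
        self.snapshots.append(block_map)
        return block_map
```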

Another aspect of copy data virtualisation is the ability to capture data on the basis of SLAs set by the administrator. These define the frequency of snapshots, the type of storage in which the data is kept, its location and the retention policy. Replication to a remote location or to a cloud service provider can also be specified. Once an SLA has been created, it can be attached to any application or virtual machine to capture its data accordingly.
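A hypothetical example of what such an SLA might look like expressed in code (the field names are invented to mirror the settings described above, not any vendor's actual schema):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CaptureSLA:
    """Illustrative SLA template an administrator defines once and then
    attaches to any application or virtual machine."""
    name: str
    snapshot_frequency_hours: int        # how often changed blocks are captured
    storage_tier: str                    # e.g. a snapshot pool or de-duplicated pool
    location: str                        # where the copies are kept
    retention_days: int                  # how long recovery points are retained
    replicate_to: Optional[str] = None   # optional remote site or cloud provider

@dataclass
class ProtectedApplication:
    name: str
    sla: CaptureSLA

# Define the policy once ...
tier1_sla = CaptureSLA(
    name="tier-1-databases",
    snapshot_frequency_hours=1,
    storage_tier="dedup-pool",
    location="primary-dc",
    retention_days=30,
    replicate_to="cloud-dr-site",
)

# ... and attach it to any application or VM that needs this protection level.
erp_db = ProtectedApplication(name="erp-production-db", sla=tier1_sla)
```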

The prerequisite for generating virtual copies is the creation of a single physical “golden” image or “master copy” of the production data. From this, an indefinite number of virtual copies can be made available instantly for all day-to-day use cases such as backup, test and development, and analytics, without affecting the production environment. The golden copy can also be mirrored to an offsite location for disaster recovery.
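Why provisioning such a virtual copy is effectively instant can be sketched as follows (again a simplified, hypothetical model): each copy starts as pure metadata pointing at the golden image's blocks and only diverges, copy-on-write style, when the copy itself is written to, so production data is never duplicated or modified.

```python
import copy
import hashlib

class VirtualClone:
    """A virtual copy of the golden image: it starts as pure metadata and only
    diverges (copy-on-write) when the clone itself is written to, so the
    production master copy is never read again or modified."""

    def __init__(self, golden_map: list, pool: dict, purpose: str):
        self.block_map = copy.copy(golden_map)   # references only, no data copied
        self.pool = pool                         # shared fingerprint -> block store
        self.purpose = purpose                   # e.g. "test", "dev", "analytics"

    def write_block(self, index: int, data: bytes) -> None:
        # A modified block is added to the pool and only this clone's map is
        # updated; the golden image and all other clones remain untouched.
        fp = hashlib.sha256(data).hexdigest()
        self.pool.setdefault(fp, data)
        self.block_map[index] = fp

# Provisioning several clones is instant: only metadata is created.
golden_map = ["fp-0001", "fp-0002", "fp-0003"]
pool = {"fp-0001": b"...", "fp-0002": b"...", "fp-0003": b"..."}
clones = [VirtualClone(golden_map, pool, p) for p in ("test", "dev", "analytics")]
```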

Data virtualisation on the rise

An increasing level of virtualisation in the data centre is clearly noticeable. Data virtualisation is the logical next step after server, compute and network virtualisation. Virtualised infrastructures are easier to manage and more energy- and cost-efficient, because resources are provided on demand rather than statically, as in traditional environments. This matches today's reality, in which a growing number of challenges in the data centre must be managed with fewer resources.

The proven efficiency gains from server, client and network virtualisation can now be extended to data protection and management, with lower bandwidth requirements and instant-restore capabilities. With recently expanded VMware and Oracle integration, Actifio is leading this data virtualisation trend. The platform accelerates data management, reduces complexity in data centres and distributed environments, and enables access to the cloud.

By combining virtualisation with smart data management, companies can benefit from greater efficiency, flexibility and performance – and save money.