How to reduce write amplification on NAND chips in a cloud storage system

Solid-state drive (SSD) technology, built from integrated circuit assemblies, is considered a readily available, drop-in replacement for conventional hard disks and is used to increase throughput in large cloud storage networks. Unfortunately, these devices introduce the problem of write amplification (WA) into a cloud storage system.

Reducing write amplification is of paramount concern to system administrators who deploy NAND chips in these environments, because WA shortens the lifespan of the chips and harms the throughput of a cloud storage and retrieval system.

Calculating write amplification values

All forms of NAND memory used in SSD construction must be erased before they can be written to again. This means that moving, rewriting and storing data to pages on an SSD causes the affected portions of the chips to go through an erase-and-rewrite cycle, often more than once per file system operation. Consider a drive that has several logical blocks filled with data and several other contiguous blocks that are empty. An operator using a Web-based document manager adds data to a spreadsheet, and then saves it from his or her client machine. This requires the logical block that holds the spreadsheet to be erased and then rewritten.
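The erase-before-write behaviour described above can be sketched in a few lines of Python. This is an illustrative model only; the block size and page contents are made up, not taken from a real chip:

```python
BLOCK_SIZE = 4  # pages per block (illustrative; real NAND blocks hold far more)

def rewrite_page(block, page_index, new_data):
    """Sketch of why a one-page update rewrites a whole NAND block:
    the block cannot be updated in place."""
    pages = list(block)            # 1. read the entire block into RAM
    pages[page_index] = new_data   # 2. modify the target page in RAM
    erased = [None] * BLOCK_SIZE   # 3. erase the block on the chip
    assert all(p is None for p in erased)
    return pages                   # 4. program every page back, even unchanged ones

print(rewrite_page(["a", "b", "c", "d"], 1, "B"))  # ['a', 'B', 'c', 'd']
```

The point of the sketch is that all four pages are programmed again even though only one of them changed; that extra programming is the amplification.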

If the file now takes up more space than a single block provides, other used blocks will have to be erased and rewritten until the file system reaches a stable state. Added to this wear and tear are the wear-leveling and garbage-collection tasks that NAND devices require, so write amplification can be seriously problematic. Fortunately, there is a simple way to calculate the write amplification incurred when storing data from cloud clients: write amplification = (amount of data written to the NAND chips) / (amount of data the host writes)
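The formula above can be expressed directly in code as a quick sanity check; the byte counts in the example are hypothetical:

```python
def write_amplification(nand_bytes_written, host_bytes_written):
    """WA factor = data physically written to NAND / data the host wrote."""
    return nand_bytes_written / host_bytes_written

# Hypothetical example: the host writes 1 GB, but garbage collection and
# block rewrites cause 2.5 GB of physical NAND writes.
print(write_amplification(2.5e9, 1.0e9))  # 2.5
```

A WA factor of 1.0 is the ideal; anything above it means the drive wears faster than the host workload alone would suggest.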

Overprovisioning and logical capacity

The difference between the actual physical capacity of an SSD and the logical capacity that an operating system sees is referred to as “over-provisioning” (OP). The additional space provided by an OP design helps the controller chip handle wear-leveling and garbage-collection tasks, and provides spare blocks to replace those that go bad.

The most obvious source of this difference comes from system software that uses a binary gigabyte (1,024 MB = 1 GB), while hardware manufacturers utilise the metric gigabyte (1,000 MB = 1 GB). This figure usually isn’t relevant to write amplification, but a second OP source comes from hardware vendors that purposefully add unused NAND capacity somewhere in an SSD. If these values are known, OP can be calculated easily: OP = (physical capacity – user-visible capacity) / user-visible capacity
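The same calculation in code, using hypothetical capacities for a drive that exposes 120 metric GB out of 128 binary GiB of raw NAND:

```python
def overprovisioning(physical_capacity, user_capacity):
    """OP ratio = (physical capacity - user-visible capacity) / user-visible capacity."""
    return (physical_capacity - user_capacity) / user_capacity

# Hypothetical drive: 128 GiB of raw NAND, 120 GB visible to the host.
physical = 128 * 1024**3   # binary gigabytes (GiB)
visible = 120 * 1000**3    # metric gigabytes (GB)
print(round(overprovisioning(physical, visible), 3))  # 0.145, i.e. ~14.5% spare area
```

Note how the binary/metric gigabyte mismatch alone accounts for a meaningful slice of the spare area, even before the vendor adds any hidden NAND.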

Cooperative data management policies

Several new cooperative data management policies can be deployed across a cloud storage network to cut down on WA cycles. Eliminating redundant cloud storage system caches is the first step. File systems often keep additional caches in the hope of speeding the system up, but this only helps with mechanical drives; on SSD devices, it merely multiplies the number of writes.

These policies also create victim blocks, which aren’t written to and are instead skipped because they’re marked as “in use” in the cache. Utilising file systems designed to work with NAND chips changes these policies, which also helps to reuse these victim blocks. Most importantly, the cache system should be flushed out, ensuring that extra data isn’t written to the drive.

Leveraging page metadata to reduce file system overhead

Traditional file systems that rely on an allocation table, journal or master record protect all file metadata to ensure durability (even in the event of system failures). This requires numerous writes to these tables; while it provides an extra layer of safety, new file systems designed for solid-state memory are starting to replace these models.

Metadata can be leveraged to guarantee that indices can be restored if anything unusual happens, as opposed to ensuring that the metadata is always durable. By using a consistent approach instead of looking at durability to solve the problem, far less metadata has to be written to the drive to begin with. This dramatically reduces the influence of WA cycles.
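One way to picture this consistency-based approach: if every flash page stores its logical address and a sequence number as metadata, the index never needs to be journalled, because it can always be rebuilt by scanning the pages. A minimal sketch, in which the page tuple layout is hypothetical:

```python
# Hypothetical sketch: rather than journalling every index update, each flash
# page carries (logical_address, sequence_number) alongside its data, and the
# logical-to-physical index is rebuilt by scanning the pages at mount time.

def rebuild_index(pages):
    """pages: iterable of (logical_addr, seq_no, physical_addr) tuples.
    The write with the highest sequence number wins for each logical address."""
    index, newest = {}, {}
    for logical, seq, physical in pages:
        if logical not in newest or seq > newest[logical]:
            newest[logical] = seq
            index[logical] = physical
    return index

scanned = [(0, 1, 100), (1, 1, 101), (0, 2, 102)]  # logical addr 0 rewritten later
print(rebuild_index(scanned))  # {0: 102, 1: 101}
```

Because the index is merely a cache of what the pages already say, it can be lost without losing data, and no extra metadata writes are spent keeping it durable.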

Write amplification must be reduced

System administrators need to reduce write amplification, as it adds overhead and shortens the life of NAND chips. As file systems designed around NAND technology mature, industry adoption is likely to rise as the advantages become clearer.

Hypervisors in cloud computing: What is out there for you?

Choosing a cloud provider may seem like a trivial thing. You could go with the choice that most of the world has already made and pick the clear leader in the cloud space — AWS. However, as the public cloud market matures, the volume and range of relevant enterprise options expand. Depending on your requirements, you might find one of the other cloud providers more suitable, not least because of the hypervisor your enterprise already uses. For example, for VMware vSphere users, vCloud might be the “natural” public cloud choice.

In this article, we discuss several public cloud vendors and consider their underlying infrastructure and the hypervisors that they use to operate their clouds.

Public cloud providers

There are several key players in today’s cloud market, each offering different advantages.

Amazon Web Services (AWS): Amazon not only runs an online store where you can buy and sell things, it is also the biggest public cloud provider in the world. It is safe to say that Amazon was the first to provide an option to run workloads in the cloud at large scale.

Today, AWS is clearly perceived as the market leader. Of all the cloud providers, AWS offers the richest set of features for use in the cloud. It also has the biggest share of the cloud Infrastructure as a Service (IaaS) market, and has been highlighted as such for the past six years in Gartner’s de facto “State of the Union”.

AWS has built up its proprietary platform over the years, and its business model is based entirely on the expectation that everything can and will run in the public cloud. AWS has no on-premises solution.

Microsoft Azure: Microsoft Azure has been around since 2008 (and was renamed from Windows Azure to Microsoft Azure in 2014). Microsoft is well known for its stronghold in the enterprise market, particularly for desktop software as well as enterprise software such as Exchange and SQL Server. A few years ago, Microsoft plunged head-first into virtualization.

As a result of its sheer size and market share, Azure is perceived as a market leader, along with AWS. The option to mix and match workloads between your data center and the cloud has been a real temptation for many enterprises. Microsoft recently announced Azure Stack, which is Microsoft’s Azure cloud deployed within your organization’s data center. Under the hood, the stack includes Microsoft’s Hyper-V, Windows, and networking and storage capabilities. Ultimately this creates a very compelling hybrid solution for enterprises.

Google Cloud Platform (GCP): We all know that Google runs the biggest search engine in the world. Google also entered the business of cloud computing in June 2012 and has been a contender in this market ever since. GCP does not yet offer the vast range of services available from AWS, although it is continuously adding more services to compete. It does, however, have a number of differentiators that allow it to stand out from the competition. Shorter billing cycles, for example, give customers per-minute pricing instead of the per-hour pricing in AWS.

Unlike AWS, which runs on the Xen hypervisor, Google Cloud runs on KVM. But like AWS, Google Cloud also has a proprietary platform, relies on the expectation that anything and everything will run in the public cloud, and therefore does not offer an on-premises solution.

Rackspace: Rackspace is extremely well known in the hosted infrastructure space. The company takes pride in its fanatical support and first dipped its toes into the public cloud market almost 10 years ago.

The Rackspace cloud has added a significant amount of customization to its platform that will never become part of the community code. These customizations include a networking stack, a load-balancing service and the whole billing and UI layer – none of which is vanilla.

Rackspace offers a private cloud solution as well as a supported and managed service – and this is where they differ from the two cloud providers above. Rackspace will set up a cloud for you, then support and manage it as well.

VMware vCloud: VMware has been the clear visionary and leader in the enterprise virtualization space for many years, although its attempts at becoming a cloud market leader never really took off. There are a number of possible explanations for this, be it licensing, pricing or the fact that VMware was very late getting into the game.

VMware originally did not offer a public cloud service of its own. It sold a private cloud product and offered public cloud services through partners and the ecosystem. That changed a few years ago when VMware tried to break into the public cloud market with vCloud Air, which has never been deemed a success.

Hypervisor of choice

The underlying hypervisor capabilities depend on your choice of cloud provider. These are the main hypervisors of choice in use today:

KVM: In 2008, Red Hat acquired Qumranet (the creators of KVM) and has since put its full support and effort into developing KVM. It is important to note that KVM is an open source project, meaning that there are no licensing fees involved.

KVM runs on most Linux distributions today and is perceived as the default hypervisor to be used in all virtualization and cloud products offered by most Linux vendors. The open source hypervisor is also the default hypervisor used for most clouds today, probably making it one of the most widely used hypervisors in the world.

Xen: As an open source hypervisor, Xen has undergone a journey: it started at the University of Cambridge, moved to XenSource, was acquired by Citrix, and finally arrived at its current place of residence – the Linux Foundation.

AWS is the biggest cloud provider that uses Xen today; there, it is the hypervisor of choice. Xen offers a number of advantages over KVM, such as more efficient paravirtualization thanks to Xen’s closer access to the physical hardware, and the fact that it is a more mature product. Unlike KVM, which is part of the Linux kernel, Xen is not part of the Linux operating system itself.

Hyper-V: Hyper-V is a Microsoft product and, as such, it does not come free. Yes, there are free versions available; however, these have many built-in limitations, and management at any scale becomes impossible without putting out some hard cash.

Microsoft’s Hyper-V has always competed head-on with VMware. Over the past few years, Microsoft has managed to chisel away at VMware’s enterprise market share by providing a product that does most of what vSphere can do at a more attractive price. It is a natural choice if your workloads are Microsoft-based, even though Microsoft is looking to support any and all Linux flavours in the future.

ESXi: The feature-rich hypervisor that many enterprises use is ESXi (vSphere). Of course, this is not a free product – VMware has built its whole company on its hypervisor, and for many years this has been a great strategy.

ESXi supports any operating system, be it Linux or Windows, and covers almost any esoteric flavour you could imagine. But this is first and foremost an enterprise solution – one that might not be cost effective for everyone, especially if you are just starting out.

One of the biggest advantages of running a cloud platform on an enterprise solution is the added benefits that come with it. Two examples (and, for transparency, this is also true for Microsoft’s solutions) are host restart and instance scheduling. VMware has built-in HA: if a host fails, all instances are restarted on another host in the cluster, and your cloud solution does not need to manage or worry about these pieces.

Another example is DRS (Distributed Resource Scheduler), which moves instances between hypervisor nodes in the cluster. Again, this is taken care of in the hypervisor layer. Unlike with other cloud providers, this rebalancing happens on a regular basis, not only when an instance is first placed.

Docker and a final note

It would be remiss not to mention the new kid (or container) on the block – Docker. Docker can run inside an instance on any of the hypervisors above. It is effectively another abstraction layer on top of the hypervisor, which allows you to treat the hypervisor – and, in turn, the cloud where you are running it – as a commodity. Docker enables you to move between clouds far more easily than was previously possible.

Even though your choice of cloud provider will be based on a number of criteria, such as maturity, feature set or geographical location, there are cases where the underlying hypervisor’s capabilities will also be a contributing factor in your decision. Understanding the features each hypervisor provides, its history and its market share will help you make the best decision about which cloud to run your workloads on.