Amazon Cloud Storage Options Enhanced with Glacier

In case you missed it, Amazon Web Services (AWS) has enhanced its cloud compute services (Elastic Compute Cloud, or EC2) along with its storage offerings. These include Relational Database Service (RDS), DynamoDB, Elastic Block Store (EBS), and Simple Storage Service (S3). Enhancements include new functionality along with improved availability and reliability in the wake of recent events (outages or service disruptions). Earlier this year AWS announced its Cloud Storage Gateway solution, which you can read an analysis of here. More recently AWS announced provisioned IOPS among other enhancements (see the AWS what's new page here).


Before announcing Glacier, Amazon's storage service options centered on general purpose S3, or on EBS used with other Amazon services. S3 has provided users the ability to select different availability zones (e.g. geographical regions where data is stored) along with different levels of reliability at different price points for the applications or services they offer.

Note that AWS S3's flexibility lends itself to individuals or organizations using it for various purposes, ranging from storing backup or file sharing data to serving as a target for other cloud services. S3 pricing varies depending on which availability zone you select as well as whether you choose standard or reduced redundancy. As its name implies, reduced redundancy trades a lower level of availability and a longer recovery time objective (RTO) in exchange for a lower cost per given amount of space capacity.

AWS has now announced a new class or tier of storage service called Glacier, which, as its name implies, moves very slowly and is capable of supporting large amounts of data. In other words, it targets inactive or seldom accessed data where the emphasis is on ultra-low cost in exchange for a longer RTO. In exchange for an RTO that AWS states can be measured in hours, your monthly storage cost can be as low as 1 cent per GByte, or about 12 cents per year per GByte, plus any extra fees (see here).

Here is a note that I received from the Amazon Web Services (AWS) team:

Dear Amazon Web Services Customer,
We are excited to announce the immediate availability of Amazon Glacier – a secure, reliable and extremely low cost storage service designed for data archiving and backup. Amazon Glacier is designed for data that is infrequently accessed, yet still important to keep for future reference. Examples include digital media archives, financial and healthcare records, raw genomic sequence data, long-term database backups, and data that must be retained for regulatory compliance. With Amazon Glacier, customers can reliably and durably store large or small amounts of data for as little as $0.01/GB/month. As with all Amazon Web Services, you pay only for what you use, and there are no up-front expenses or long-term commitments.

Amazon Glacier is:

  • Low cost – Amazon Glacier is an extremely low-cost, pay-as-you-go storage service that can cost as little as $0.01 per gigabyte per month, irrespective of how much data you store.
  • Secure – Amazon Glacier supports secure transfer of your data over Secure Sockets Layer (SSL) and automatically stores data encrypted at rest using Advanced Encryption Standard (AES) 256, a secure symmetric-key encryption standard using 256-bit encryption keys.
  • Durable – Amazon Glacier is designed to give average annual durability of 99.999999999% for each item stored.
  • Flexible – Amazon Glacier scales to meet your growing and often unpredictable storage requirements. There is no limit to the amount of data you can store in the service.
  • Simple – Amazon Glacier allows you to offload the administrative burdens of operating and scaling archival storage to AWS, and makes long term data archiving especially simple. You no longer need to worry about capacity planning, hardware provisioning, data replication, hardware failure detection and repair, or time-consuming hardware migrations.
  • Designed for use with other Amazon Web Services – You can use AWS Import/Export to accelerate moving large amounts of data into Amazon Glacier using portable storage devices for transport. In the coming months, Amazon Simple Storage Service (Amazon S3) plans to introduce an option that will allow you to seamlessly move data between Amazon S3 and Amazon Glacier using data lifecycle policies.

Amazon Glacier is currently available in the US-East (N. Virginia), US-West (N. California), US-West (Oregon), EU-West (Ireland), and Asia Pacific (Japan) Regions.

A few clicks in the AWS Management Console are all it takes to set up Amazon Glacier. You can learn more by visiting the Amazon Glacier detail page, reading Jeff Barr's blog post, or joining our September 19th webinar.
Sincerely,
The Amazon Web Services Team


What is AWS Glacier?

Glacier is a low-cost, lower performance (e.g. longer access time) storage service suited to applications and data such as archives and inactive or idle data that you are not in a hurry to retrieve. Pricing is pay as you go and can be as low as $0.01 USD per GByte per month (other optional fees may apply, see here) depending on the availability zone. Availability zones or regions include the US West coast (Oregon or Northern California), the US East coast (Northern Virginia), Europe (Ireland) and Asia (Tokyo).
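
For those who want to kick the tires, here is a minimal sketch of creating a vault and uploading an archive using the Python boto3 SDK. The vault name, region and file name are hypothetical placeholders for illustration, so adjust them to your own environment and verify against the current API documentation before relying on it.

```python
# Minimal sketch: create a Glacier vault and upload one archive.
# Assumes AWS credentials are already configured; names and region are placeholders.
import boto3

glacier = boto3.client("glacier", region_name="us-west-2")

# Create (or reuse) a vault to hold archives.
glacier.create_vault(vaultName="my-archive-vault")

# Upload a single archive; Glacier returns an archiveId you must keep,
# since there is no browsable directory listing without running an inventory job.
with open("backup-2012-08.tar.gz", "rb") as f:
    response = glacier.upload_archive(
        vaultName="my-archive-vault",
        archiveDescription="Monthly backup archive",
        body=f,
    )

print("Archive ID:", response["archiveId"])
```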


Now, what is understood should not have to be discussed; however, just to be safe, pity the fool who complains about signing up for AWS Glacier due to its penny per month per GByte cost and then finds it too slow for their iTunes or videos, as you know it's going to happen. Likewise, you know that some creative vendor or their surrogate is going to try to show a mismatch of AWS Glacier vs. their faster service that caters to a different usage model; it is just a matter of time.


Let's be clear: Glacier is designed for low-cost, high-capacity, slow access to infrequently accessed data such as archives or other items. This means that you will be more than disappointed if you try to stream a video, or access a document or photo, from Glacier as you would from S3 or EBS or any other cloud service. The reason is that Glacier is designed on the premise of low cost, high capacity and high availability at the expense of slow access time or performance. How slow? AWS states that you may have to wait several hours to reach your data when needed; that is the tradeoff. If you need faster access, pay more or find a different class and tier of storage service to meet that need, perhaps, for those with the real need for speed, AWS SSD capabilities ;).
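
To make that retrieval model concrete, here is a rough sketch (again using the Python boto3 SDK, with a hypothetical vault name and archive ID) of the asynchronous dance involved: you ask Glacier to stage an archive, then come back hours later to actually download it.

```python
# Rough sketch of Glacier's asynchronous retrieval model.
# Vault name and archive_id are hypothetical placeholders.
import time
import boto3

glacier = boto3.client("glacier", region_name="us-west-2")
vault = "my-archive-vault"
archive_id = "EXAMPLE-ARCHIVE-ID"  # returned when the archive was uploaded

# Step 1: ask Glacier to stage the archive for download.
job = glacier.initiate_job(
    vaultName=vault,
    jobParameters={"Type": "archive-retrieval", "ArchiveId": archive_id},
)
job_id = job["jobId"]

# Step 2: poll until the job completes; AWS says this can take hours,
# so in practice you would use an SNS notification instead of a tight loop.
while not glacier.describe_job(vaultName=vault, jobId=job_id)["Completed"]:
    time.sleep(900)  # check every 15 minutes

# Step 3: only now can the archive bytes actually be downloaded.
output = glacier.get_job_output(vaultName=vault, jobId=job_id)
with open("restored-backup.tar.gz", "wb") as f:
    f.write(output["body"].read())
```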

Here is a link to a good post over at Planforcloud.com comparing Glacier vs. S3, which is like comparing apples and oranges; however, it helps to put things into context.


In terms of functionality, Glacier security includes Secure Sockets Layer (SSL) for data in transit, Advanced Encryption Standard (AES) 256 (256-bit encryption keys) encryption for data at rest, along with AWS Identity and Access Management (IAM) policies.
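
As an illustration of the IAM angle, here is a hedged sketch of an inline policy that limits a user to uploading to and retrieving from a single vault; the account number, vault name, user name and policy name are all made up for the example.

```python
# Sketch: attach an inline IAM policy scoping a user to one Glacier vault.
# Account ID, vault name, user name and policy name are hypothetical.
import json
import boto3

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "glacier:UploadArchive",
                "glacier:InitiateJob",
                "glacier:DescribeJob",
                "glacier:GetJobOutput",
            ],
            "Resource": "arn:aws:glacier:us-west-2:123456789012:vaults/my-archive-vault",
        }
    ],
}

iam = boto3.client("iam")
iam.put_user_policy(
    UserName="archive-operator",
    PolicyName="glacier-single-vault-access",
    PolicyDocument=json.dumps(policy),
)
```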

Persistent storage is designed for 99.999999999% durability, with data automatically placed in different facilities on multiple devices for redundancy when it is ingested or uploaded. Self-healing is accomplished with automatic background data integrity checks and repair.

Scale and flexibility are bound by the size of your budget or credit card spending limit, along with which availability zones and other options you choose. Glacier integrates with other AWS services, including Import/Export, which lets you ship large amounts of data to Amazon using different media. Note that AWS has also made a statement of direction (SOD) that S3 will be enhanced to seamlessly move data in and out of Glacier using data policies.
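
When that S3-to-Glacier integration materializes, I would expect it to look something like a lifecycle rule on a bucket. The sketch below (Python boto3, with a hypothetical bucket, prefix and 30-day threshold) shows the general shape such a policy could take; it is an assumption based on how S3 lifecycle configuration works today, not a confirmed interface for the promised Glacier tiering.

```python
# Sketch: an S3 lifecycle rule that would transition objects to Glacier
# after 30 days. Bucket name, prefix and threshold are illustrative only.
import boto3

s3 = boto3.client("s3", region_name="us-west-2")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-backup-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-old-backups",
                "Filter": {"Prefix": "backups/"},
                "Status": "Enabled",
                "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
            }
        ]
    },
)
```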

Part of stretching budgets for organizations of all sizes is to avoid treating all data and applications the same (a key theme of data protection modernization). This means classifying applications and data, addressing how and where they are placed on various types of servers and storage, and revisiting and modernizing data protection.

While the low cost of Amazon Glacier is an attention getter, I am looking for more than just the lowest cost, which means I am also looking for reliability and security, among other things, to gain and keep confidence in my cloud storage service providers. As an example, a few years ago I switched from one cloud backup provider to another not based on cost, but rather on functionality and the ability to leverage the service more extensively. In fact, I could switch back to the other provider and save money on the monthly bills; however, I would end up paying more in lost time, productivity and other costs.


What do I see as the barrier to AWS Glacier adoption?

Simple: getting vendors and other service providers to enhance their products or services to leverage the new AWS Glacier storage category. This means backup/restore, BC and DR vendors ranging from Amazon itself (e.g. releasing the S3 to Glacier automated policy based migration) to Commvault, Dell (via its acquisitions of AppAssure and Quest), EMC (Avamar, Networker and other tools), HP, IBM/Tivoli, Jungledisk/Rackspace, NetApp, Symantec and others, not to mention cloud gateway providers, will need to add support for these new capabilities.

As an Amazon EC2 and S3 customer, it is great to see Amazon continue to expand its cloud compute, storage, networking and application service offerings. I look forward to actually trying out Amazon Glacier for storing encrypted archive or inactive data to complement what I am doing. Since I am not using the Amazon Cloud Storage Gateway, I am looking into how I can use Rackspace Jungledisk to manage an Amazon Glacier repository similar to how it manages my S3 stores.

Some more related reading:
Only you can prevent cloud data loss
Data protection modernization, more than swapping out media
Amazon Web Services (AWS) and the NetFlix Fix?
AWS (Amazon) storage gateway, first, second and third impressions

As of now, it looks like I will have to wait either for Jungledisk to add native support for Glacier, as it has today for managing my S3 storage pool, or for the automated policy based movement between S3 and Glacier to be transparently enabled.

Ok, nuff said for now

Cheers Gs

Greg Schulz – Author Cloud and Virtual Data Storage Networking (CRC Press, 2011), The Green and Virtual Data Center (CRC Press, 2009), and Resilient Storage Networks (Elsevier, 2004)

twitter @storageio

All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2012 StorageIO All Rights Reserved

read more

Examining Excellent Eucalyptus

Eucalyptus is an open source Infrastructure as a Service cloud offering. What is unique about Eucalyptus is that it is compatible with Amazon AWS APIs.
Eucalyptus leverages operating system virtualization, such as KVM or Xen, to achieve isolation between applications and stacks. Operating system virtualization dedicates CPU and RAM to systems and applications such that they don't interfere with each other. In cloud parlance, this is called isolation and is essential to achieve multi-tenancy. (For a refresher on basic cloud terminology, see here; for a refresher on Infrastructure as a Service, see here.)
Cloud computing layers on top of operating system virtualization and, when combined with dynamic allocation of IP addresses, storage and firewall rules, creates a service that end users interact with to run instances of images.

read more

Rackspace To Open Australian Data Center

Rackspace is close to launching its first Australian data center in Sydney, a multimillion-dollar investment. First customers are expected to go live in late 2012.

Rackspace will be able to offer local dedicated hosting and managed virtualization solutions for larger IT contracts that want to deploy enterprise-grade private cloud solutions based on VMware while keeping their data onshore.

The facility is also supposed to serve as a launch pad for Rackspace’s own OpenStack-based Open Cloud platform, when it launches into the local market.

read more

Apache Hadoop Just Got Simpler

Hortonworks recently unveiled the Hortonworks Data Platform (HDP), which is 100% open source data management software powered by Apache Hadoop. HDP makes Hadoop easier to install, integrate, manage and use for enterprises and solution providers.
Join us for this webinar as we outline and demo the key features of the Hortonworks Data Platform, including:

  • Rapid Installation thanks to a wizard that makes it easy to install and provision Hadoop across clusters of machines.
  • Data Integration Services including Talend Open Studio for Big Data, a visual development environment that allows you to connect to hundreds of data sources without writing code.
  • Management and Monitoring Services including Hortonworks Management Center, an open source and extensible tool that provides intuitive web-based dashboards for monitoring your clusters and creating alerts.
  • Centralized Metadata Services, including HCatalog, which greatly simplifies data sharing between Hadoop and other enterprise data systems.

Don't miss this opportunity to hear about Hortonworks Data Platform from the team that created it.

read more

Continuing Momentum, Neebula Adds Senior Cloud Leaders to Management Team

Neebula is scaling to serve its growing clientele of global enterprises, government and education customers in North America and Europe with the addition of senior leaders to its management team.
Neebula Systems, a provider of business-level service modeling, management, and automated full-stack discovery and dependency mapping solutions, has announced the addition of two senior executives – Bob Johnson, as chief marketing officer, and Ilan Shmargad, as vice president of business development.

read more

Neebula Promotes Service-Centric IT Management to the Cloud

The process of mapping IT computing resources to business services – commonly known as “business service management” (BSM) or “IT service management” (ITSM) – is time-consuming and becomes even trickier when cloud computing gets added into the mix. Neebula is making available a preview of its SaaS-based discovery and mapping product, ServiceWatch.
Neebula Systems, a provider of business-level service modeling, management, and automated full-stack discovery and dependency mapping solutions, invites customers to preview the Neebula ServiceWatch solution in the cloud. For the first time, IT managers will be able to use a Software-as-a-Service (SaaS)-based product to quickly and effectively discover and map IT resources – hardware and software – that make up a specific business service. This eliminates the long, labor-intensive process of installing on-premise software and then manually discovering and mapping IT resources.

read more

How to lead the way for new data center technology

By Patrick Burke

The networking layer of the data center may be the next segment of IT to undergo some disruption, putting it on par with servers and storage, which have seen major changes with the help of cloud computing, virtualisation and other trends designed to improve efficiency and performance.

Software-defined networking, or SDN, has been around for several years now and is utilized by such big-name players as Rackspace.

But the technology is poised to gain more of a foothold in the data center. SDN offers clients more flexibility and less downtime if they need to expand beyond their current server usage.

For the most part, networking has not evolved at the same pace as servers and storage, and networking has become somewhat of a costly bottleneck. SDN’s goal is to take tasks currently handled by hardware and perform these tasks in the software.

The intelligence of …

The cloud news categorized.