Primary data storage options for the cloud
Cloud-integrated storage appliances allow hybrid storage configurations that seamlessly link data center storage with cost-effective, scalable cloud storage.
The word "hybrid" has a variety of definitions when it's used to refer to cloud storage or computing. For our purposes, we'll define hybrid cloud storage as storage that transparently and effectively integrates on-premises storage and in-the-cloud storage to create greater overall value. That means hybrid cloud storage must deliver increased value in one or more of these dimensions: cost reduction, scalability, manageability, performance, data protection, business continuity, degree of automation and security. And it must do so by integrating transparently with -- and without altering -- the existing on-premises storage infrastructure. Lastly, it must not require any changes to the applications.
While the benefits of cloud storage are well understood and appealing, adoption has been somewhat slower than expected because cloud storage stores and accesses data through Representational State Transfer (REST) interfaces, which existing applications and storage systems built around block and file protocols don't speak. Some early gateways translated iSCSI and CIFS requests into REST calls to make it possible to load data into and extract it from the cloud. It was a step in the right direction, but not enough to significantly boost the cloud storage market.
That changed when cloud-integrated storage (CIS) appliances appeared a few years ago. These appliances enable hybrid cloud storage, allowing true integration of on-premises and in-the-cloud storage with 100% transparency to existing storage environments. There are four use cases that illustrate what's possible with these state-of-the-art systems:
- Backup and restore
- Disaster recovery
- Archiving
- Tier-two primary storage (both NAS and SAN)
Anatomy of a cloud-integrated storage appliance
A CIS appliance presents an iSCSI, CIFS or NFS interface to on-premises hosts, which effectively removes the main issue associated with cloud storage: the use of the REST protocol. A typical appliance has a suite of technologies, including compute, caching, tiering, deduplication, compression, encryption, thin provisioning, WAN optimization, replication, data protection, protocol conversion, snapshots and cloning. Most appliances come with solid-state and/or hard disk drives built in. The back end of the appliance connects to the cloud via the Internet and speaks REST protocol.
When a write request arrives, it's written to cache and acknowledged; the data is then deduped at the block level, or it's combined with other contiguous blocks and converted into a chunk. Some appliances also apply compression algorithms. The chunks are then stored on local drives and optimized for transmission across the WAN to the cloud. The appliance does the protocol conversion and sends only the chunks that are unique and not already stored in the cloud.
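The write path described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any vendor's implementation: the fixed 4 KB chunk size, the `CloudStore` class standing in for a REST object store, and the function names are all hypothetical, and encryption and WAN optimization are left out.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # assumed fixed chunk size; real appliances may chunk differently


class CloudStore:
    """Hypothetical stand-in for a REST object store: fingerprint -> compressed chunk."""

    def __init__(self):
        self.objects = {}
        self.puts = 0  # number of uploads actually sent over the WAN

    def has(self, key):
        return key in self.objects

    def put(self, key, blob):
        self.objects[key] = blob
        self.puts += 1


def write_volume(data, cloud):
    """Chunk the data, upload only chunks the cloud doesn't hold, return the chunk map."""
    fingerprints = []
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        fp = hashlib.sha256(chunk).hexdigest()  # fingerprint identifies the chunk content
        if not cloud.has(fp):                   # dedupe: skip chunks already stored
            cloud.put(fp, zlib.compress(chunk))
        fingerprints.append(fp)
    return fingerprints


def read_volume(fingerprints, cloud):
    """Rebuild the original data from the ordered chunk map."""
    return b"".join(zlib.decompress(cloud.objects[fp]) for fp in fingerprints)


cloud = CloudStore()
data = b"A" * CHUNK_SIZE * 3 + b"B" * CHUNK_SIZE  # four chunks, two unique
chunk_map = write_volume(data, cloud)
print(len(chunk_map), cloud.puts)  # 4 2
```

Four chunks are written but only two uploads occur, because three of the chunks have identical content and therefore share one fingerprint.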
Snapshots can be taken periodically and, depending on the appliance, may be immediately transferred to the cloud, with or without retaining a local copy. Metadata maps containing pointers that describe the makeup of chunks are stored in the cloud (and in the appliance in some cases) along with the chunks. Some appliances only use their local storage for caching while others also use it for tiering. Cloud service support varies, but AT&T Synaptic, Amazon Web Services (AWS), Google, IBM SoftLayer, Microsoft Azure, Nirvanix and Rackspace are broadly supported.
Users don't need to know any details about how the data is stored in the cloud, how it's protected or how it's managed. The interaction with the appliance is exactly as if it were an iSCSI volume or a CIFS or NFS share. CIS products may include a physical or virtual appliance, with both often available.
Use case 1: Backup and restore
A cloud-integrated storage appliance can fundamentally transform backup and restore, and eliminate most headaches associated with traditional data protection. One implementation works in conjunction with the backup software currently in use, including all the major backup apps. Alternatively, a CIS appliance can eliminate the need for traditional backup software completely, which can significantly reduce costs and simplify the overall data protection environment.
CIS appliance with backup software. In this scenario the CIS appliance is seen by the media server as an iSCSI (or CIFS) target. All traditional backup management happens as usual, but the backup streams are deduped, compressed, WAN optimized and encrypted by the appliance before being sent to the cloud. Such an appliance can be very cost effective versus buying a backup appliance such as EMC/Data Domain, HP StoreOnce or IBM ProtecTIER. In addition to getting full data protection in the cloud, customers also get geographically remote storage for their data. That represents substantial savings over buying another set of backup appliances, locating them at a remote site and replicating to them. And since only unique chunks are sent to the cloud, storage costs are further minimized. Depending on the cloud-integrated storage product, a local copy of the latest backup may reside on the appliance, allowing for fast recoveries. Backup management is still performed using the backup software, and data must be recovered as usual before it can be made available on primary storage. Riverbed Whitewater, StorSimple (Microsoft) CiS and TwinStrata CloudArray are examples of appliances built for this use case.
CIS appliance without backup software. This functionality is enabled when the CIS appliance is used as primary storage (more details later). The CIS appliance may be used to improve data protection, with periodically scheduled snapshots integrated with cloud storage. A local copy of the snapshot may be kept in the appliance for faster recoveries. Hundreds of snapshots can be stored in the cloud since only changed data is kept, and the concept of full and incremental backups disappears. In this case, the recovery happens via the CIS user interface since no backup software is involved. Both file-level and volume-level recoveries are possible; and if the application server is running virtual machines (VMs), the recovery can be at the VM, volume or file level. Because the data is never reformatted by backup software, snapshots can be mounted immediately without having to recover first.
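To see why hundreds of snapshots can be kept when only changed data is stored, consider this small Python sketch. It assumes a snapshot is nothing more than a map from block number to content fingerprint; the `SnapshotStore` class and its methods are hypothetical names for illustration, not a product API.

```python
import hashlib


class SnapshotStore:
    """Stores each unique block once; a snapshot is only a map of block number -> fingerprint."""

    def __init__(self):
        self.blocks = {}     # fingerprint -> block bytes, stored once
        self.snapshots = []  # one metadata map per snapshot

    def take_snapshot(self, volume):
        """Record a snapshot and return how many new blocks had to be stored."""
        snap = {}
        new_blocks = 0
        for number, block in volume.items():
            fp = hashlib.sha256(block).hexdigest()
            if fp not in self.blocks:
                self.blocks[fp] = block
                new_blocks += 1
            snap[number] = fp
        self.snapshots.append(snap)
        return new_blocks

    def mount(self, index):
        """Present a snapshot as a volume by following its pointers; no restore step."""
        return {n: self.blocks[fp] for n, fp in self.snapshots[index].items()}


volume = {0: b"boot", 1: b"data-v1", 2: b"logs"}
store = SnapshotStore()
first = store.take_snapshot(volume)   # every block is new
volume[1] = b"data-v2"                # one block changes
second = store.take_snapshot(volume)  # only the changed block is stored
print(first, second)                  # 3 1
print(store.mount(0)[1])              # b'data-v1'
```

Every snapshot is a complete, mountable view of the volume, yet the second one adds just one block of storage. That is why the distinction between full and incremental backups no longer applies.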
Three ways hybrid cloud storage can improve disaster recovery
- Recovery point objective (RPO) and recovery time objective (RTO) matter in disaster recovery (DR). RPO is controlled by the frequency of snapshots taken; most cloud-integrated storage (CIS) appliances support snapshotting. However, not all CIS appliances deliver the same RTO. For the fastest RTO, look for products where applications can be started without requiring all the data to be recovered first.
- The ability to start higher-priority applications first, without having to treat all application recovery alike, is important. Look for offerings that allow this.
- In a DR scenario, if an application is restarted at another site, make sure you start protecting it right away with new cloud snapshots. That way there won't be a lapse in protection.
Use case 2: Disaster recovery
Probably the most novel use case for a CIS appliance is disaster recovery (DR). A CIS appliance has the potential to deliver a cost-effective, testable "on-demand" DR offering for all applications. Because cloud-integrated snapshots are already stored in the cloud, a new CIS appliance can be fired up at another site and connected to the cloud-based snapshot when the local site is disabled. In the most sophisticated appliances, the application can be started without performing a recovery, and no data needs to be downloaded into the appliance. The recovery time objective (RTO) is determined by the amount of time it takes to download the metadata map that describes the contents of the snapshot -- a trivial process compared to downloading the entire volume of data. Some performance may be sacrificed, but the application will be up and running with a very short RTO.
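A back-of-the-envelope calculation shows how large the RTO difference can be. The figures below (a 2 TB volume, a 200 MB metadata map and a 100 Mbit/s WAN link) are assumptions chosen for illustration, not measurements from any product, and the calculation ignores protocol overhead.

```python
def transfer_seconds(size_bytes, link_bits_per_second):
    """Time to move size_bytes over a link, ignoring protocol overhead."""
    return size_bytes * 8 / link_bits_per_second


VOLUME_BYTES = 2 * 10**12     # assumed 2 TB volume
METADATA_BYTES = 200 * 10**6  # assumed 200 MB metadata map
LINK_BPS = 100 * 10**6        # assumed 100 Mbit/s WAN link

full_restore_s = transfer_seconds(VOLUME_BYTES, LINK_BPS)
metadata_only_s = transfer_seconds(METADATA_BYTES, LINK_BPS)

print(f"Full restore:  {full_restore_s / 3600:.1f} hours")  # Full restore:  44.4 hours
print(f"Metadata only: {metadata_only_s:.0f} seconds")      # Metadata only: 16 seconds
```

Under these assumptions the application could start after seconds rather than days, with the data itself fetched from the cloud on demand as it is read.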
A huge benefit is that the company doesn't require a second site for DR purposes. The StorSimple CiS appliance from Microsoft and TwinStrata's CloudArray are probably the best examples for this use case. Both of these appliances are also available as VMs so users can choose to fire up the application(s) in the cloud, assuming compute capability is offered by the cloud vendor, such as Microsoft Azure and AWS Elastic Compute Cloud (EC2).
This use case is also ideal for protecting data at remote offices. A CIS appliance installed at each remote office can transfer snapshots to the cloud; a disaster at a site can be handled by recovering at the data center or at one of the other remote sites. This flexibility enables simple, inexpensive DR for remote offices.
Use case 3: Archiving
The archiving use case is similar to backup/restore, except that cold data is physically separated from active data and moved from primary storage to the cloud. This relieves pressure on primary storage, improves application performance and delays new purchases, but all archived data is still available online when needed. Archive data may be stored for very long periods of time and must not lose its integrity. The burden of keeping the data safe and intact shifts to the cloud provider. The provider performs technology refreshes and migrates data to newer technologies when necessary -- all while maintaining 100% data integrity and without involving the customer.
When a CIS appliance is used as primary storage, the implications for archiving are even greater. Unlike traditional on-premises primary storage where cold data occupies expensive real estate, a cloud-integrated storage appliance only keeps active data in the appliance, with cold data getting moved to the cloud. All data is always available online, and IT doesn't have to worry about periodic housecleaning of primary storage.
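The tiering behavior can be sketched as a simple age-based policy. This is an assumption about how such a policy might look, not a description of any specific appliance: the `Block` record, the day-number timestamps and the 30-day threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Block:
    name: str
    last_access_day: int  # day number of the most recent read or write
    tier: str = "local"   # "local" (on the appliance) or "cloud"


def tier_cold_blocks(blocks, today, cold_after_days):
    """Move blocks untouched for more than cold_after_days to the cloud tier."""
    moved = []
    for block in blocks:
        if block.tier == "local" and today - block.last_access_day > cold_after_days:
            block.tier = "cloud"
            moved.append(block.name)
    return moved


def read_block(block, today):
    """Any read makes the block active again, so it is recalled to the local tier."""
    block.last_access_day = today
    block.tier = "local"
    return block


blocks = [Block("orders", 100), Block("2009-archive", 10), Block("invoices", 95)]
moved = tier_cold_blocks(blocks, today=100, cold_after_days=30)
print(moved)  # ['2009-archive']
```

The cold block remains addressable after it moves; a later `read_block` call simply brings it back, which is what keeps all data online without manual housecleaning.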
Applications can be run against this archived data in the cloud for best performance. But not all the bells and whistles, such as the ability to place legal holds against specific data, perform audits for compliance and so on, are currently available. Still, cloud-based archiving solutions can be very cost effective for most companies not in highly regulated industries.
Use case 4: Primary storage
The use of cloud storage as primary storage is the most challenging implementation because of latency issues. Most latency-sensitive applications can't deal with the tens of milliseconds (or more) of delay typically associated with accessing cloud storage. However, with a well-architected CIS appliance it's possible to enable excellent performance for all but the most latency-sensitive critical applications. For example, Microsoft's StorSimple appliance offers excellent performance for its SQL Server, Exchange and SharePoint applications. These applications require low latency, which is achieved by a judicious use of the appliance's storage for caching. This scenario provides scalable, on-demand storage for applications, effortless provisioning and a greatly reduced on-premises storage footprint. The bulk of the storage is delivered online from the cloud, yet it appears and behaves exactly like local storage. Data is also protected in the cloud by the cloud provider, further relieving IT of some data management tasks.
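The effect of caching on latency can be illustrated with a small least-recently-used (LRU) read cache. The latency figures (1 ms for a local read, 50 ms for a cloud read), the `ReadCache` class and the LRU policy itself are assumptions for the sake of the example; actual appliances may use different caching strategies.

```python
from collections import OrderedDict

LOCAL_MS = 1.0   # assumed latency of a read served from the appliance
CLOUD_MS = 50.0  # assumed latency of a read fetched from the cloud


class ReadCache:
    """LRU cache of block IDs; returns the latency each read would incur."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, block_id):
        if block_id in self.entries:
            self.entries.move_to_end(block_id)  # mark as most recently used
            self.hits += 1
            return LOCAL_MS
        self.misses += 1
        self.entries[block_id] = True           # fetch from cloud and keep locally
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)    # evict the least recently used block
        return CLOUD_MS


cache = ReadCache(capacity=2)
workload = [1, 2] * 5  # a hot working set that fits in the cache
total_ms = sum(cache.read(block_id) for block_id in workload)
average_ms = total_ms / len(workload)
print(cache.hits, cache.misses, average_ms)  # 8 2 10.8
```

Only the first read of each block pays the cloud round trip. As the hit ratio rises, average latency approaches that of local storage, which is why the size and policy of the appliance's cache matter so much for primary storage workloads.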
To be sure, using cloud storage for primary storage can't replace on-site tier-one storage, at least at this stage of hybrid cloud storage development. But tier-two storage generally makes up the largest portion of storage in most companies, and that tier is certainly a candidate to be moved to a hybrid cloud storage architecture. Not all CIS appliances do a good job at delivering reasonable performance for primary storage, however, so you must weigh your options carefully.
Product sampler: Hybrid cloud storage appliances
For iSCSI block storage, StorSimple is worthy of consideration, and for file storage Nasuni has a solid offering. TwinStrata's CloudArray is primarily intended for backup, archiving and DR, but it can also be considered for primary storage use. Amazon offers a gateway product, but it's mostly designed to move on-premises storage to Amazon Simple Storage Service (S3). Nirvanix's CloudNAS is an appliance to move files back and forth between on-premises NAS and the Nirvanix cloud. Nirvanix also offers customers the ability to create a Nirvanix cloud on the premises using Nirvanix technology, and to integrate it with Nirvanix's public cloud for what might be considered a homogeneous hybrid cloud. It uses its hNode appliance as the interface between the two. A number of other variations will become available as the concept of hybrid cloud storage gains traction.
Hybrid cloud is ready for prime time
Hybrid cloud storage has finally matured to a point where organizations of all sizes can consider it for their data storage environments. Security, once a major issue with cloud storage, has been dealt with by most cloud-integrated storage appliances, since all data is transferred to and stored in the cloud in an encrypted fashion. In addition, all the usual authentication methods used with on-premises storage are now available with these appliances. The benefits of hybrid cloud storage are so overwhelming for the majority of organizations that it should definitely be evaluated when upgrading or augmenting a storage infrastructure.
About the author:
Arun Taneja is founder and president at Taneja Group, an analyst and consulting group focused on storage and storage-centric server technologies.