Primary data storage options for the cloud
When IT professionals hear about cloud data storage, most associate it with disaster recovery (DR), archiving and backup. Few would think to place primary data storage, or nearline data storage, in the cloud. The question is, why?
SearchCloudStorage assistant site editor Rachel Kossman sat down with Arun Taneja, founder and consulting analyst at Milford, Mass.-based Taneja Group, to discuss latency, which is the reason behind most of this hesitation. Taneja explains how evolving technologies -- such as inline data deduplication and compression -- have made placing primary data storage in the cloud far more practical. He also outlines the benefits of cloud storage gateways and why they're such a key factor in cloud storage for primary data.
When most storage professionals think about cloud storage, they think primarily of disaster recovery and backup, not primary storage. Why is that?
Arun Taneja: Let's look at what primary data storage is and how it's defined. Primary storage is storage that the application is directly interacting with. That means that whether the application is SharePoint, Exchange or a manufacturing application, that application is directly affecting that data. It's interacting with it, creating new data, modifying data, reading data -- it's a direct relationship. Most customers don't have enough confidence in the cloud at this point in time to put that type of data storage in the cloud. That's reason No. 1.
No. 2 is equally important: Anytime you put data in the cloud, unless the data center you're operating in happens to be right next door to where the cloud is hosted, there's going to be latency. It takes time for instructions to go from the data center to the cloud and back, and that time is far too long for most primary storage applications. That's why most of the applications that have gone into the cloud so far are secondary applications with secondary data, like backup and archiving.
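To put rough numbers on the latency point, the following is a minimal back-of-the-envelope sketch. The round-trip figures (0.5 ms for a local array, 50 ms across a WAN) and the operation count are illustrative assumptions, not measurements from the interview, and the function name is hypothetical.

```python
def serial_io_seconds(num_ops, round_trip_ms):
    """Total wait, in seconds, for num_ops dependent I/Os that each pay one round trip."""
    return num_ops * round_trip_ms / 1000.0


# Assumed, illustrative latencies (not measured values).
LOCAL_RTT_MS = 0.5   # application to an on-site storage array
WAN_RTT_MS = 50.0    # application to a distant cloud across the WAN

ops = 10_000  # e.g. a transaction that issues 10,000 small, sequentially dependent I/Os
local_s = serial_io_seconds(ops, LOCAL_RTT_MS)
cloud_s = serial_io_seconds(ops, WAN_RTT_MS)

print(f"local: {local_s:.1f} s, direct to cloud: {cloud_s:.1f} s")
```

Under these assumptions the same workload goes from seconds to minutes once every I/O has to cross the WAN, which is the gap Taneja describes.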
Of course, cloud storage is a rapidly evolving technology. What technological advancements have been made that offer more options for nearline data to be placed in the cloud?
Taneja: I would say there are certain improved technologies on the storage side, and the way they’ve been tightly integrated with the cloud, and then certain technologies that have improved on the cloud side itself. Let's start with the ones on the storage side first.
Technologies like inline data deduplication, inline compression, taking snapshots in the cloud itself and WAN [wide-area network] optimization technologies that have been mapped over to the cloud -- these are all very relevant on the storage side.
On the cloud side itself, there's a set of technologies that have become available. For example, making a number of cloud instances -- perhaps geographically separated -- look like one cloud from the application's perspective.
When you combine all these technologies I've mentioned together, they've made it possible to start using the cloud for primary data storage.
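As an illustration of why inline deduplication and compression matter here, the sketch below splits incoming data into fixed-size chunks, skips chunks it has already stored, and compresses the rest before they would cross the WAN. The `DedupStore` class, the 4 KB chunk size and the in-memory dictionary standing in for the cloud are all assumptions made for this example; commercial products use their own (often variable-size) chunking and are not described in the interview.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # assumed fixed chunk size, chosen only for illustration


class DedupStore:
    """Toy inline dedup + compression layer; a dict stands in for cloud objects."""

    def __init__(self):
        self.chunks = {}      # SHA-256 digest -> compressed chunk
        self.bytes_in = 0     # bytes the application wrote
        self.bytes_sent = 0   # bytes that would actually cross the WAN

    def write(self, data):
        """Store data and return the list of chunk digests needed to rebuild it."""
        recipe = []
        for start in range(0, len(data), CHUNK_SIZE):
            chunk = data[start:start + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            self.bytes_in += len(chunk)
            if digest not in self.chunks:          # dedup: send only unseen chunks
                compressed = zlib.compress(chunk)  # compress before sending
                self.chunks[digest] = compressed
                self.bytes_sent += len(compressed)
            recipe.append(digest)
        return recipe

    def read(self, recipe):
        """Rebuild the original bytes from a list of chunk digests."""
        return b"".join(zlib.decompress(self.chunks[d]) for d in recipe)


store = DedupStore()
payload = b"A" * (CHUNK_SIZE * 3) + b"B" * CHUNK_SIZE
recipe = store.write(payload)
print(store.bytes_in, "bytes written,", store.bytes_sent, "bytes sent")
```

Less data on the wire means fewer and shorter WAN transfers, which is one way these storage-side techniques reduce the latency penalty.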
Listen to part two of the Taneja podcast
In part two of this podcast series, Arun Taneja discusses document collaboration in the cloud, explains why it's important to look under the hood of your cloud solution, and describes the service levels listeners can expect with nearline storage in the cloud.
Where do cloud gateways come into play in the discussion of placing primary data in the cloud? How do they compare to traditional, on-premises storage for primary data?
Taneja: A cloud storage gateway is absolutely a requirement if the customer wants to use the cloud as primary storage. A few years ago, a number of vendors tried to do that without a gateway. The application would make a direct NFS call, for example, across the WAN to the cloud, using the cloud as primary NFS storage. The latency just killed the application -- regardless of how many methods the vendor tried in that situation, it wouldn't work. Placing primary data in the cloud without any sort of gateway is foolhardy; it just doesn't work.
Now the question is: What does the cloud gateway do to enable cloud storage to look like primary storage to the application?
The best example in the industry right now is StorSimple. StorSimple has a gateway that sits at the edge of the data center. It looks like an iSCSI target to the application side, and it speaks a Web-type language to the cloud side because that's all the cloud can speak. The cloud, in this case, can be Amazon, Nirvanix, Rackspace or any of the major public clouds that are on the market right now. This gateway has inline data deduplication, inline compression and snapshot capability, where the snapshots can be taken and then placed into the cloud. This box also has a variety of caching technologies -- it includes SSDs [solid-state drives] and a number of disk drives. You have one or two of these gateway boxes that effectively pull from the cloud the data the application needs at a given point in time and keep the most current data within these caches. This way, for all practical purposes, the application doesn't see the latency it would otherwise see if it were reading directly from storage in the cloud.
Effectively, cloud storage gateways eliminate the effect of latency and make it look like the cloud storage is completely local. That's the magic of that type of a gateway, and it's a prerequisite for using primary data storage in the cloud.
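The caching behavior described above can be sketched as a small read-through cache with least-recently-used eviction. This is only an illustration of the general idea, not StorSimple's implementation: the `CachingGateway` class, the write-through policy and the dictionary standing in for the cloud are assumptions made for the example.

```python
from collections import OrderedDict


class CachingGateway:
    """Toy gateway: serves hot blocks locally and fetches the rest from the cloud."""

    def __init__(self, backend, capacity):
        self.backend = backend        # dict standing in for slow cloud storage
        self.capacity = capacity      # number of blocks the local cache can hold
        self.cache = OrderedDict()    # block_id -> data, least recently used first
        self.hits = 0
        self.misses = 0

    def read(self, block_id):
        if block_id in self.cache:
            self.cache.move_to_end(block_id)   # mark as most recently used
            self.hits += 1
            return self.cache[block_id]
        self.misses += 1                       # only a miss pays the WAN round trip
        data = self.backend[block_id]
        self._admit(block_id, data)
        return data

    def write(self, block_id, data):
        self.backend[block_id] = data          # write-through (an assumed policy)
        self._admit(block_id, data)

    def _admit(self, block_id, data):
        self.cache[block_id] = data
        self.cache.move_to_end(block_id)
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)     # evict the least recently used block


cloud = {i: bytes([i]) for i in range(5)}
gateway = CachingGateway(cloud, capacity=2)
for block in (0, 1, 0, 2, 1):
    gateway.read(block)
print("hits:", gateway.hits, "misses:", gateway.misses)
```

In this sketch, only cache misses reach the cloud, so the application sees local latency for as long as its working set fits in the gateway's cache.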