Before putting their data into public cloud storage, IT shops need to think about how they're going to get their data out.
Simple data access is typically straightforward, according to Jeff Byrne, senior analyst and consultant at Hopkinton, Mass.-based Taneja Group Inc. Most cloud storage providers support Web architectures based on representational state transfer (REST) application programming interfaces (APIs). Some also support traditional block- and file-based data, and cloud storage gateway providers can help customers access data in major storage clouds.
But, Byrne said, customers are often on their own if they want to transfer data from one cloud provider to another or bring their data back in-house. In this podcast with TechTarget senior writer Carol Sliwa, Byrne talks about the mechanisms of data access of the major cloud storage providers and some of the problems associated with data transfer.
If a business has stored data with a cloud provider and needs to gain access to it, how much work is involved? Generally speaking, how would you characterize the process in terms of time and ease?
Jeff Byrne: These days, it's generally pretty straightforward. The ease of access and time required really depend on the type and format of data that a user has and the service levels for that specific type of storage. For example, developers who are writing new applications for execution in the cloud will often utilize a cloud-based object storage service, such as Amazon S3. To access their data in S3, developers use a Web-services API based on REST principles, and these days, most cloud storage APIs, including those supported in Amazon S3 and in OpenStack's storage cloud, conform to REST. So, it's a fairly universally available API.
Continuing with our example on Amazon S3, developers access and manipulate their object data in their S3 buckets and can use the 'get' command to access a particular data object that's stored in S3. Basically, a developer who's familiar with these REST-conforming APIs can gain rapid access to their data, and the process is just about as easy as if that data were residing on hard disk storage, let's say, on the developer's laptop or in a data center.
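As a rough illustration of the REST-style access Byrne describes, the sketch below builds, but does not send, an HTTP GET request for a single object using only the Python standard library. The bucket name, the object key and the helper name build_get_object_request are hypothetical. A real request for a private object would also need the provider's authentication headers, which are omitted here.

```python
from urllib.parse import quote
from urllib.request import Request


def build_get_object_request(bucket: str, key: str) -> Request:
    """Build (but do not send) a REST GET request for one stored object."""
    # Virtual-hosted-style address: the bucket name is part of the host name.
    # quote() percent-encodes characters such as spaces but keeps "/" intact.
    url = f"https://{bucket}.s3.amazonaws.com/{quote(key)}"
    return Request(url, method="GET")


# Hypothetical bucket and key, used only for illustration.
request = build_get_object_request("example-bucket", "reports/2013 summary.txt")
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would retrieve the object only if it were publicly readable; otherwise the provider would reject the unsigned request.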
That's sort of the quick access example. But if you're talking about data at the other end of the spectrum, such as archived data, the process can be much longer. Just for example, there's a new service that Amazon has -- it came out in the middle of last year -- called 'Glacier.' As the name suggests, access to data on Glacier can be very time-consuming, because it's really designed to be an archival storage service. The assumption is that user data will be accessed only very infrequently. So, in that case, it might take three to five hours to get your data out.
How much does the method of data access differ among the major cloud storage providers?
Byrne: The method of data access does vary among different providers, and it's really based on the APIs they support and the type of data that a user is storing. Again, if you're talking about the case of a developer who's using a cloud object store, such as S3, it's a fairly straightforward approach, because those REST APIs are available on almost all cloud storage sites. But there are other protocols supported out there still. For example, Azure still supports SOAP [Simple Object Access Protocol]. Several sites still support WebDAV.
And it's not just these newer Web-based protocols that are available. There are also a number of providers that support more traditional block- or file-based data. Nirvanix Cloud Storage Network and IBM SmartCloud Enterprise, for example, support file-based APIs, [such as] NFS and CIFS. There are other vendors who support block-based access [methods], such as iSCSI. IBM and Cleversafe are examples of those.
One other category I should mention here is cloud storage gateway vendors. A couple of examples are Ctera and StorSimple; StorSimple is actually now part of Microsoft. What these vendors do is provide on-premises appliances that enable access to file- or block-based data in major storage clouds. I really think that these vendors fill an important need for the majority of businesses that want to take advantage of the cloud, with its low cost, universal access and so on, but can't afford to rearchitect their applications and data for new cloud formats. What these gateways allow them to do, basically, is move their existing data into a device, where it's translated into these REST-based formats in the cloud, and then they can gain access to that data locally from their own on-premises appliance.
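The gateway idea can be sketched in a few lines: a local appliance accepts file-style reads and writes, translates each file path into an object key, and keeps a local copy for fast access. Everything below is a hypothetical, in-memory stand-in (the ObjectStore and Gateway classes and the path-to-key rule are assumptions made for illustration), not how any particular vendor's product works.

```python
from typing import Dict


class ObjectStore:
    """Hypothetical in-memory stand-in for a cloud object store."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data

    def get(self, key: str) -> bytes:
        return self.objects[key]


class Gateway:
    """Hypothetical on-premises gateway: file paths in, object keys out."""

    def __init__(self, backend: ObjectStore) -> None:
        self.backend = backend
        self.cache: Dict[str, bytes] = {}  # local copies for fast reads

    @staticmethod
    def path_to_key(path: str) -> str:
        # Assumed mapping: normalize separators and drop the leading slash.
        return path.replace("\\", "/").lstrip("/")

    def write_file(self, path: str, data: bytes) -> None:
        key = self.path_to_key(path)
        self.cache[key] = data       # keep a local copy
        self.backend.put(key, data)  # store the object in the cloud

    def read_file(self, path: str) -> bytes:
        key = self.path_to_key(path)
        if key not in self.cache:    # cache miss: fetch from the cloud
            self.cache[key] = self.backend.get(key)
        return self.cache[key]


cloud = ObjectStore()
gateway = Gateway(cloud)
gateway.write_file("/shares/finance/q1.xls", b"quarterly figures")
print(sorted(cloud.objects))
```

The application keeps using ordinary file paths, while the cloud side only ever sees keys and objects, which is the translation step Byrne refers to.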
If a business wants to make a change and transfer data from one cloud provider to another cloud provider, or bring the data back in-house, how tough is that?
Byrne: When users are looking to put data into a cloud site, there's a whole set of services out there. Almost every cloud storage provider offers some kind of initial data-seeding or shuttle service to get the data in. But when you're looking at getting the data out, surprisingly, users are still for the most part on their own. There are several things that people need to watch out for. One big issue is that there's still a lack of interoperability standards in the cloud industry, which means that businesses that are looking to get their data out must be sure that their new provider, assuming they're moving to a new provider, supports the existing data formats that they're using.
There are tools out there, traditional file transfer tools [such as] FTP, that users can use to move data, but these are not really geared well to moving cloud data. I think if you try those tools, you'll find the process not only time-consuming, but very cumbersome, because you still have to write scripts with those types of tools; and then if you're doing it on your own, the data transfers themselves are going to be subject to the latency and lack of security that you find on the Internet.
So, if you're going to be moving significant amounts of data and you can't afford to lose access or compromise that data, you might want to look at third-party data migration solutions. There are a number of those solutions available. Some of them are open source. Some are commercial but are not too pricey. One example that is pretty prominent right now is a third party called 'CloudBerry Lab,' which makes available a tool called 'Cloud Migrator,' and this is supported on Amazon, Azure, Rackspace, etc. What a product like this will allow you to do is basically move your data directly between buckets, let's say, within a cloud site [such as] S3 or from one cloud provider to another. And, these services that the tools are based on tend to be both secure and automated, and they also don't require installation of software on a local server, for example. So, [they're] fairly nonintrusive. You don't have to move data initially to your data center before transferring it to a second provider's cloud.
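To make the direct, provider-to-provider transfer concrete, here is a minimal sketch of what such a migration loop might look like. It is not how Cloud Migrator or any other product is implemented. InMemoryStore is a hypothetical stand-in for a bucket at each provider, and the checksum comparison is one assumed way to confirm that nothing was lost or altered in transit.

```python
import hashlib
from typing import Dict, List


class InMemoryStore:
    """Hypothetical stand-in for one provider's bucket."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def keys(self) -> List[str]:
        return sorted(self._objects)


def migrate(source: InMemoryStore, target: InMemoryStore) -> List[str]:
    """Copy every object from source to target and return the keys copied.

    Data flows straight from one store to the other; nothing is staged
    in an intermediate location first.
    """
    copied: List[str] = []
    for key in source.keys():
        data = source.get(key)
        target.put(key, data)
        # Read the object back and compare digests before counting it as moved.
        if hashlib.sha256(target.get(key)).digest() != hashlib.sha256(data).digest():
            raise IOError(f"checksum mismatch for {key!r}")
        copied.append(key)
    return copied


old_provider = InMemoryStore()
old_provider.put("logs/app.log", b"line 1\nline 2\n")
old_provider.put("images/logo.png", b"\x89PNG")
new_provider = InMemoryStore()
print(migrate(old_provider, new_provider))
```

With real providers, the get and put calls would be requests against each provider's API, so the format-compatibility concern Byrne raises would show up in how keys, metadata and permissions map from one side to the other.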
One other alternative here to think about, again, is cloud storage gateway providers, because most gateways support a number of major cloud storage providers, and they have the tools in place to migrate data from one provider to another.