Primary data storage options for the cloud
Cloud storage is cheaper, can be expanded endlessly and needs little attention; but how much data can a company realistically park there?
It finally happened. The CIO just got back from his favorite analyst firm's annual cloud conference and golf tournament and promptly decreed the company was going to move all its data, not just the archives, to public cloud storage.
Apparently, he was sold on an analyst's view that the cloud was the future of computing and anyone storing data in their own data centers in the years to come would be considered a Luddite.
It's a popular belief that moving data to the cloud not only takes advantage of the low prices cloud storage vendors can achieve due to their economies of scale, but it can also free a company from the drudgery of buying, commissioning, provisioning and maintaining storage systems. In addition, if cloud storage is endlessly elastic, it can be used without the kind of careful capacity planning in-house storage requires. And if that's not convincing enough, consider that cloud storage providers say they protect our data by replicating it to multiple data centers.
Maybe data, but what about apps?
Any move to the cloud must first consider the impact such a move would have on your users and their applications. In many cases, it may be feasible to move much of your data to the cloud, but many key applications are likely to require their data to be kept in-house.
Still, if the authoritative copy of the data lives in the cloud, local caches and copies can be retained and used as needed.
But you will need to conduct a careful assessment of each of your data types, and the applications that access them, to see what options are available for moving data to the cloud and how each of those options would affect the user experience, data protection processes, bandwidth required at each location that needs to access the data and, of course, cost.
Grab low-hanging fruit with a move to SaaS
The easiest data to move to the cloud is the data belonging to apps that could be replaced with a Software-as-a-Service (SaaS) application. Despite some widely publicized occasional outages of well-known SaaS apps, such as Gmail and Salesforce.com, moving some in-house apps to a SaaS model can make sense. For example, an internal Exchange infrastructure may have its own reliability issues and users might be spending too much time pruning their email to meet mailbox quotas when they could be doing more constructive things. Moving to a hosted Exchange offering from Rackspace or Intermedia, for example, would be relatively painless; users would get 10 GB or 100 GB mailboxes with additional archive space, and you may be able to avoid a costly and disruptive upgrade from Exchange 2010 to Exchange 2013.
Moving to a hosted Exchange product from Microsoft or any of its partners (such as Intermedia or Rackspace) would ensure the user experience remains the same as it was with the in-house Exchange implementation. You may see a small increase in network traffic as users retrieve their data from the cloud server, but the cost of running Exchange would be limited to the provider's per-user charge of $10 or so for the service.
Similarly, it's often advantageous to move a current customer relationship management (CRM) system and put the whole application in the cloud. Switching to a Web-based CRM app will make life easier for sales teams and other road warriors vs. using a CRM app that may require installing an application on their laptops as well as a VPN connection back to headquarters.
Moving the files
In many organizations, the bulk of installed storage capacity is consumed by file systems of one type or another. These files are stored on dedicated network-attached storage (NAS) appliances and on a few Windows file servers at remote sites. While simply moving all the files to Amazon Simple Storage Service (S3) or another cloud storage service would effectively put that data in the cloud, S3 and similar services present their data through an object storage interface that would essentially make it inaccessible to users.
There are a number of available solutions, including getting file access via a hosted SharePoint, but that approach may require users to change their work habits. File sync-and-share products and cloud storage gateways are more viable alternatives. The best of these offerings maintain basically the same user experience, including performance, as existing NAS systems, and provide users with easier and broader access to shared files from any location.
File sync and share
Consumer-grade file sync-and-share products such as Dropbox or SugarSync provide an obvious solution to users who need to access their files from multiple devices and locations. But some of these consumer services lack the kind of data access controls, security and scalability that enterprises require.
Corporate sync-and-share solutions such as EMC Syncplicity, Egnyte and Soonr, as well as business offerings from Box and Dropbox, address enterprise management issues by integrating with the corporate Active Directory, imposing access control lists and maintaining an audit trail of who has accessed which file and when. While this is a big improvement, and these offerings can be very useful for sharing data with highly mobile users who need their data on the road and with people outside the organization, most organizations are unlikely to give up traditional file services in favor of file sync and share. Also, with many users syncing all the files in the folders they're subscribed to, bandwidth requirements could become an issue.
The best way to get files into the cloud, while still making them accessible to users in other locations, is to install an integrated cloud storage appliance in each location. These cloud storage gateways, from vendors such as Amazon Web Services, Avere Systems, Nasuni, Panzura and TwinStrata, use local storage, which can include solid-state drives, as a cache and present the data via Server Message Block (SMB) and/or NFS so users can access their data just as if they had a local NAS while the authoritative copy of the data is stored in the cloud.
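The caching behavior described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not any vendor's implementation: the GatewayCache class and its method names are hypothetical, a plain dict stands in for the cloud object store, eviction is least-recently-used, and writes go straight through to the cloud (real gateways may instead buffer writes locally and upload them later).

```python
from collections import OrderedDict


class GatewayCache:
    """Toy gateway: a small local LRU cache in front of a cloud store.

    Hypothetical illustration only; the dict passed in as cloud_store
    stands in for the authoritative copy held by the cloud provider.
    """

    def __init__(self, cloud_store, capacity):
        self.cloud = cloud_store      # authoritative copy of every file
        self.capacity = capacity      # max number of files kept locally
        self.local = OrderedDict()    # path -> bytes, least recently used first
        self.hits = 0
        self.misses = 0

    def _remember(self, path, data):
        # Store the file locally and evict the least recently used entries.
        self.local[path] = data
        self.local.move_to_end(path)
        while len(self.local) > self.capacity:
            self.local.popitem(last=False)

    def read(self, path):
        if path in self.local:
            self.hits += 1
            self.local.move_to_end(path)
            return self.local[path]
        self.misses += 1
        data = self.cloud[path]       # slow path: fetch from the cloud
        self._remember(path, data)
        return data

    def write(self, path, data):
        self.cloud[path] = data       # write-through keeps the cloud authoritative
        self._remember(path, data)


cloud = {"/sales/q1.pptx": b"slides", "/hr/policy.doc": b"policy"}
gateway = GatewayCache(cloud, capacity=1)
gateway.read("/sales/q1.pptx")   # miss: fetched from the cloud
gateway.read("/sales/q1.pptx")   # hit: served from local storage
gateway.read("/hr/policy.doc")   # miss: evicts the presentation
```

The point of the sketch is that users only ever talk to the gateway; whether a read is fast or slow depends on whether the file is still in the local cache.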
These gateways map the files into the object storage paradigm used by the cloud providers and, in many cases, also deduplicate the data so, for example, a sales presentation that's in dozens of users' home directories is only stored once. The data is also typically encrypted and then sent to the cloud.
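To make the deduplication idea concrete, here is a minimal sketch that assumes fixed-size chunks identified by their SHA-256 digest. The DedupStore class is hypothetical, and commercial gateways may well use variable-size chunking and encrypt the chunks before upload, neither of which is shown here.

```python
import hashlib


class DedupStore:
    """Hypothetical content-addressed store: each unique chunk is kept once."""

    def __init__(self, chunk_size=1024):
        self.chunk_size = chunk_size
        self.chunks = {}   # SHA-256 hex digest -> chunk bytes
        self.files = {}    # path -> ordered list of chunk digests

    def put(self, path, data):
        digests = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)   # store only if unseen
            digests.append(digest)
        self.files[path] = digests

    def get(self, path):
        # Reassemble the file from its chunk list.
        return b"".join(self.chunks[d] for d in self.files[path])

    def stored_bytes(self):
        return sum(len(chunk) for chunk in self.chunks.values())


deck = b"Q3 sales presentation " * 100
store = DedupStore()
for user in ("alice", "bob", "carol"):
    store.put(f"/home/{user}/q3-deck.pptx", deck)

logical_bytes = 3 * len(deck)          # what three separate copies would occupy
physical_bytes = store.stored_bytes()  # what the store actually keeps
```

Three users hold the same presentation, but the store keeps no more than one copy's worth of chunks, while each user can still read the file back unchanged.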
Most of these solutions also use the cloud to store an essentially unlimited number of snapshots. Between these snapshots and the cloud storage provider replicating the data to multiple locations, traditional backups could become a thing of the past.
The best part about these offerings is that all corporate data can be accessed through any gateway in any location. While gateway products vary as to how well they handle multiple users modifying the same file at the same time, they all provide a better shared file-access model for multiple locations than traditional NAS appliances.
What to do with databases
The biggest challenge is finding a solution that can serve database applications. You could run Microsoft SQL Server or Oracle in an Amazon Elastic Compute Cloud (EC2) instance accessing Amazon's Elastic Block Store (EBS), but adding 20ms to 200ms of latency between the application running on a user's PC and the database server will likely have a negative impact on performance and user experience.
Assuming you don't have the sort of huge Oracle databases that require big RISC servers and all-flash arrays, there's a cloud-based solution for database applications. For databases running on virtual Windows or Linux machines, block storage can be provided by cloud storage gateways. The biggest difference from the file-serving case is that iSCSI, rather than SMB, is used to access the gateway's resources.
Another difference is in how the gateway is used. Pinning the entire volume holding the database to the gateway's local storage, rather than using it as a cache, offers two significant advantages. First, it leads to better performance. A cache miss on a cloud gateway will add 20ms to 200ms of latency, and just a few cache misses will cause a big drop in performance.
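A little arithmetic shows why a handful of cache misses hurts so much. The sketch below assumes a local read costs 1 ms, which is an illustrative figure and not one from this article; the 200 ms miss penalty is the upper end of the range quoted above. The function name is hypothetical.

```python
def average_read_latency_ms(miss_ratio, local_ms, miss_penalty_ms):
    """Average read latency when a fraction of reads must go to the cloud.

    Every read pays the local cost; a miss pays the cloud round trip on top.
    """
    if not 0.0 <= miss_ratio <= 1.0:
        raise ValueError("miss_ratio must be between 0 and 1")
    return local_ms + miss_ratio * miss_penalty_ms


LOCAL_MS = 1.0          # assumed local storage latency (illustrative only)
MISS_PENALTY_MS = 200.0  # upper end of the 20ms to 200ms range

pinned = average_read_latency_ms(0.0, LOCAL_MS, MISS_PENALTY_MS)         # volume pinned: no misses
one_pct_miss = average_read_latency_ms(0.01, LOCAL_MS, MISS_PENALTY_MS)   # 1 read in 100 misses
five_pct_miss = average_read_latency_ms(0.05, LOCAL_MS, MISS_PENALTY_MS)  # 1 read in 20 misses
```

Under these assumptions, a cache that misses on just 1% of reads roughly triples average latency compared with a pinned volume, and a 5% miss rate makes the average read about 11 times slower.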
The other advantage to pinning is that the applications can still run even if the Internet connection between the remote site and the cloud storage provider is down. While most locations have, or could be equipped with, reliable, high-bandwidth connections to the Internet, good connectivity for field offices and remote locations may be too expensive or simply unavailable.
For data protection, the gateways will take a periodic snapshot of the database volume and send the snapshot to the cloud provider. In the event of a disaster, a virtual version of the gateway can be spun up in EC2 or another cloud compute provider, and the database volume can be mounted from the snapshot so the apps can run there.
Maybe not quite 100% cloud
While moving all of a company's data into cloud storage services may be somewhat impractical at this time, there are tools available today to move at least a copy of all your data to the cloud. Some applications, like email, can be shifted to the cloud fairly easily, while other applications can be replaced with cloud-based apps that offer equivalent functionality. Both of these approaches will move data off premises and into the cloud. For other applications, especially those that might suffer from the latency that cloud storage incurs, a hybrid approach where data is stored locally for performance, but eventually ends up in the cloud, may be best.
About the author:
Howard Marks is founder and chief scientist at DeepStorage LLC. He is a regular contributor to SearchStorage.com and other sites, and a frequent speaker at TechTarget seminars and conferences.