Data storage in the cloud: On-premises storage and public clouds

Date: Sep 25, 2012

In part two of our three-part hybrid cloud tip series, Dragon Slayer Consulting founder and senior analyst Marc Staimer examines the pros and cons associated with integrating on-premises storage with a public cloud. In tip one he explains an implementation approach typically known as a federated private-public cloud. In tip three he addresses another hybrid cloud approach: bringing public cloud services into a private data center without shipping data over a WAN.

Read the rest of Staimer's hybrid cloud technology tip series

Private-public cloud integration reduces passive data storage costs

On-site public cloud data storage yields performance advantages

For starters, let's differentiate between interconnected and integrated: interconnected means two distinct systems can pass data back and forth -- but they don't interoperate. Integrated means they function and are managed as one system. When considering integrating on-premises storage with a public cloud, there are three approaches to consider: primary storage integration, gateways for backup, and volume manager integration. Each one involves integrating your private data center storage, whether NAS or SAN, with a public storage cloud, and converting NFS, CIFS or iSCSI to Representational State Transfer (REST) and/or Simple Object Access Protocol (SOAP). Whether you want to integrate primary storage or just back up is something to consider at the outset. Here are the benefits and challenges associated with each approach to data storage in the cloud.
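To make that protocol conversion concrete, here's a minimal sketch of what a REST object write looks like at the wire level. The endpoint, bucket name and bearer token are hypothetical placeholders; real public storage clouds each have their own authentication scheme.

```python
import requests  # pip install requests

# Hypothetical object-storage endpoint, bucket and token -- substitute your
# provider's real values. Object stores address data by bucket/key rather
# than by file path (NFS/CIFS) or block address (iSCSI).
ENDPOINT = "https://objects.example-cloud.com"
BUCKET = "archive-bucket"
TOKEN = "YOUR_API_TOKEN"

def put_object(key, payload):
    """Upload one object with a single REST PUT."""
    resp = requests.put(
        f"{ENDPOINT}/{BUCKET}/{key}",
        data=payload,
        headers={"Authorization": f"Bearer {TOKEN}"},
    )
    resp.raise_for_status()

# A file a client would see over NFS or CIFS becomes an object PUT:
with open("/mnt/nas/reports/q3.pdf", "rb") as f:
    put_object("reports/q3.pdf", f.read())
```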

Primary storage integration

In this first variation, the storage system is primary local storage. It can be SAN or NAS storage, based on hard-disk drives, solid-state drives or a hybrid of the two. Snapshots and/or older passive data can be replicated to the public cloud based on user policies. Data is typically deduplicated and compressed before it's sent to the public storage cloud.

Data is moved via automated storage tiering based on policies. Typically, the data moved consists of snapshots, incremental snapshots and older passive data that isn't often accessed. The tiering software replicates the data over the REST/SOAP application programming interface (API), sending it directly to the cloud. It then deletes the local copy on the storage system, leaving behind pointers to the data's new location in the cloud. When an application calls up data that now resides in the cloud, the storage system looks it up and presents it as if it were residing on local disk; it just takes a bit longer to read.
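As an illustration of the policy logic only, here's a simplified tiering pass in Python. Real tiering engines work inside the array at the block or snapshot layer; this file-level loop, with its hypothetical upload_to_cloud call and 90-day age policy, just shows the move-then-stub pattern described above.

```python
import json
import time
from pathlib import Path

AGE_THRESHOLD = 90 * 24 * 3600  # illustrative policy: tier files idle 90+ days

def upload_to_cloud(key, payload):
    """Hypothetical stand-in for the tiering software's REST/SOAP PUT."""
    ...  # e.g., an HTTP PUT to the provider's object API

def tier_out(root):
    """Move cold files to the cloud, leaving a small stub (pointer) behind."""
    now = time.time()
    for path in Path(root).rglob("*"):
        if not path.is_file() or path.name.endswith(".stub"):
            continue
        if now - path.stat().st_atime < AGE_THRESHOLD:
            continue  # recently accessed; keep it on local disk
        key = str(path.relative_to(root))
        upload_to_cloud(key, path.read_bytes())
        # Replace the local copy with a pointer so a later read can be
        # satisfied transparently -- it just takes longer over the WAN.
        Path(str(path) + ".stub").write_text(
            json.dumps({"cloud_key": key, "size": path.stat().st_size})
        )
        path.unlink()
```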

Primary storage integration pros:

  • Provides an essentially unlimited, low-cost storage tier for infrequently accessed passive data.
  • Offers low-cost, off-site data disaster recovery (DR) for snapshots.
  • Involves lower public storage cloud costs due to deduplication and compression before storing in public cloud storage.
  • Provides simple multisite collaboration and workflow sharing.
  • Delivers a simpler tech refresh with a lot less data required to be migrated.

Primary storage integration cons:

  • Requires the same storage systems or system architecture at every site where data will be recalled from or written to the public storage cloud. That means if you have five sites sending data to the same cloud, all five sites need to have the same storage systems.
  • Data can't be read directly from the public storage cloud; it must first be recalled through the same primary storage system that placed it there.
  • Many current systems lack all the features and functions of traditional primary storage, such as VMware integration, Hyper-V integration, backup software snapshot integration and more.
  • It takes a while to move a lot of data from private storage to the public storage cloud because of bandwidth and latency issues (speed of light and TCP/IP).
  • Doesn't completely avoid the issues of a tech refresh.

Gateways for backup

The next option is more appropriately termed a gateway or targeted secondary storage. Whether deployed as a physical or virtual appliance, the storage system is architected as the target for backup data, archive data, and second- or third-tier data. Data is deduplicated, compressed and encrypted before being moved to the public storage cloud.
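A rough sketch of the data-reduction pipeline such a gateway applies before anything leaves the site might look like the following. The chunk size, in-memory dedupe index and Fernet key handling are all illustrative simplifications, not any vendor's actual implementation.

```python
import hashlib
import zlib
from typing import Optional, Tuple
from cryptography.fernet import Fernet  # pip install cryptography

key = Fernet.generate_key()  # in practice a managed, persistent key
fernet = Fernet(key)
seen_chunks = set()          # toy dedupe index; real gateways persist this

def prepare_chunk(chunk: bytes) -> Tuple[str, Optional[bytes]]:
    """Dedupe, compress, then encrypt one chunk before it leaves the site."""
    fingerprint = hashlib.sha256(chunk).hexdigest()
    if fingerprint in seen_chunks:
        return fingerprint, None  # duplicate chunk: store only a reference
    seen_chunks.add(fingerprint)
    return fingerprint, fernet.encrypt(zlib.compress(chunk))

# Feed backup data through the pipeline in fixed-size chunks:
with open("backup.img", "rb") as f:
    while chunk := f.read(4 * 1024 * 1024):  # 4 MB chunks (arbitrary)
        fingerprint, payload = prepare_chunk(chunk)
        if payload is not None:
            pass  # PUT payload to the public storage cloud under fingerprint
```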

The pros associated with gateways for backup are:

  • Just like the primary storage integration, it delivers an unlimited, low-cost storage tier for infrequently accessed passive data.
  • Offers low-cost, off-site data DR for secondary data.
  • Involves lower public storage cloud costs because of deduplication and compression before storing in the public storage cloud.
  • Provides simple multisite collaboration and workflow sharing of secondary data.
  • Delivers a simpler tech refresh with a lot less data required to be migrated, at least for the secondary storage systems.

The cons associated with gateways for backup are:

  • Data must be migrated or moved from primary data storage systems to these secondary storage systems. Data movers are typically third-party products that add expense and complexity. I’m talking about things like archiving software, backup software, data migration software, and multisystem storage tiering software.
  • Requires the same secondary storage systems or system architecture at every site where data will be recalled from or written to the public storage cloud.
  • Takes a long time to move data from primary storage to the gateway. That’s because backup software doesn't move data from primary storage to the gateway (with a few exceptions). Instead, backup software moves data from the servers to the media server, which stores it on the gateway or, in most cases, directly on the storage cloud.

Volume manager integration

The third option for integrating private data center storage with a public cloud focuses on direct application, file system or volume manager integration with public storage clouds. The applications, server file system or volume manager must be modified to write data directly to the REST or SOAP API of the public storage cloud.
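For example, an application modified to persist records straight to an S3-style object API needs nothing more than the provider's client library -- no intermediary storage system. The bucket name and record layout below are hypothetical, and the client assumes credentials are already configured.

```python
import boto3  # pip install boto3; assumes credentials are configured

# Hypothetical bucket and key layout for illustration only.
s3 = boto3.client("s3")

def save_record(record_id, payload):
    """The application persists data as objects, not files or blocks."""
    s3.put_object(Bucket="app-data", Key=f"records/{record_id}", Body=payload)

def load_record(record_id):
    obj = s3.get_object(Bucket="app-data", Key=f"records/{record_id}")
    return obj["Body"].read()

save_record("invoice-1042", b'{"total": 199.00}')
print(load_record("invoice-1042"))
```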

Pros of application, file system or volume manager integration with public storage clouds are:

  • Applications, as well as physical or virtual servers, can place data directly in the public storage cloud with no intermediary systems.
  • Lower overall total cost of ownership because there's no requirement for an intermediary storage system and supporting infrastructure (racks, floor space, cables, cable runs, power, cooling and so on) in between.
  • Simple multisite collaboration and workflow sharing of data; i.e., the data can be read from or written to the public storage cloud natively from anywhere.
  • Many popular backup and archive software products already write natively to public cloud storage, with more coming.
  • Tech refreshes are a non-issue because there are no local storage systems to refresh.

Cons of application, file system or volume manager integration with public storage clouds are:

  • Most applications, file systems or volume managers must be modified (code written) to write to or from public cloud storage -- and that code must be documented, quality assured, patched and fixed on an ongoing basis. As previously noted, many software vendors have added or will be adding this capability natively to their software.
  • Each public storage cloud has its own proprietary API. Because public storage cloud adoption of the Cloud Data Management Interface (CDMI) standard has been slow, the less-capable Amazon Simple Storage Service (Amazon S3) API is the best available de facto standard (see the sketch after this list).
  • Higher public storage cloud costs than variation one or two; because there's no inherent deduplication or compression, more data is stored in the cloud.
  • It still takes a while to move a lot of data from private storage to the public storage cloud because of bandwidth and latency issues (speed of light and TCP/IP).
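To illustrate the point about proprietary APIs, the sketch below shows why the S3 API works as a de facto standard: the same client code can target Amazon S3 or any S3-compatible cloud just by switching the endpoint. The second endpoint URL and credentials are placeholders, not a real provider's values.

```python
import boto3

# Amazon S3 itself: the client uses the default endpoint.
aws = boto3.client("s3")

# An S3-compatible cloud: same API, different endpoint. The URL and
# credentials here are placeholders for a real provider's values.
compat = boto3.client(
    "s3",
    endpoint_url="https://objects.other-provider.example",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

for client in (aws, compat):
    client.put_object(Bucket="demo", Key="hello.txt", Body=b"same API calls")
```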

All three variations of private storage to public storage cloud integration noted above are effective. Tip three of this series on hybrid cloud technology will scrutinize hybrid storage clouds that extend the public storage cloud into local premises.

BIO: Marc Staimer is the founder and senior analyst at Dragon Slayer Consulting in Beaverton, Ore. His 14-year-old consulting practice focuses on strategic planning, product development and market development. With more than 32 years of marketing, sales and business experience in infrastructure, storage, server, software and virtualization, Marc is considered one of the industry's leading experts.
