What you will learn in this tip: A cloud service-level agreement (SLA) provides added protection for businesses looking to shield their data from the spate of recent cloud outages. Find out how to better understand your cloud SLA and what other steps you can take to protect yourself in the cloud.
Cloud service outages have been in the news a lot lately. Amazon EC2 EBS, Google's Blogger service and Microsoft Business Productivity Online Standard Suite (BPOS) cloud email are the latest big-name cloud service providers to suffer service disruptions. If you do a quick Google search for “Microsoft Cloud Service Outage,” you’ll get pages of results going back over a year with headlines about Microsoft's apologies for service interruptions. There were at least three BPOS incidents in May 2011, but to be fair, not all incidents impacted all users; the May 19 service interruption is said to have impacted less than 1% of its users and only lasted a few hours.
Microsoft wasn’t the only cloud service provider with outages. The Amazon EC2 outage had possibly the furthest-reaching effect. Thousands of EC2 users and a number of well-known sites were taken down, which created a ton of online buzz and visibility. Amazon didn't help its case with its poor communication regarding the extent of the problem and the expected resolution timeline, but it promised to do better in future incidents.
These outages aren't isolated to the omnipresent cloud computing industry. Enterprise IT systems also have outages, but those instances are kept quiet and rarely get the same level of public visibility as those in the cloud. Cloud service outages get a high level of visibility because they have ripple effects -- an enterprise IT outage may affect a number of internal applications, but a cloud outage may affect a number of applications at hundreds or thousands of businesses. The resolution to these problems goes beyond the control of any of them. All that leads to one rule: If you plan on using a cloud service, you need to fully understand what it can do for you and how to protect yourself in the event of a problem.
Protect yourself with a cloud service-level agreement
If you're thinking about using a cloud service, or are already using one, here's how to get the most from your cloud service-level agreement.
Understand your cloud service-level agreement. Read the fine print and understand exactly what the cloud SLA guarantees. Many services guarantee 99.999% uptime, which means you can access the service for roughly 525,594 ¾ minutes out of the 525,600 minutes in a year. In other words, you would only suffer about 5 ¼ minutes of downtime per year. This would mean that out of every 1,440 minutes in a day, you would have only 0.0144 minutes (less than a second) of downtime. Who can’t tolerate that? But a cloud SLA of 99.999% uptime doesn't mean you'll never experience downtime that exceeds the guarantee. You might, but you would have a right to receive redress if the vendor misses its guarantee.
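To check the arithmetic behind an uptime guarantee yourself, the short Python sketch below converts an uptime percentage into allowed downtime. It assumes a 365-day year, as the figures above do. The function name and the list of percentages are illustrative choices, not part of any provider's SLA.

```python
# Illustrative sketch: convert an SLA uptime percentage into allowed downtime.
# Assumes a 365-day year; the names here are hypothetical.

MINUTES_PER_DAY = 24 * 60                  # 1,440
MINUTES_PER_YEAR = 365 * MINUTES_PER_DAY   # 525,600


def allowed_downtime_minutes(uptime_percent, period_minutes):
    """Return the minutes of downtime the guarantee permits in the period."""
    return period_minutes * (100.0 - uptime_percent) / 100.0


for pct in (99.9, 99.99, 99.999):
    per_year = allowed_downtime_minutes(pct, MINUTES_PER_YEAR)
    per_day_seconds = allowed_downtime_minutes(pct, MINUTES_PER_DAY) * 60
    print(f"{pct}% uptime: {per_year:.3f} min/year, {per_day_seconds:.3f} sec/day")
```

Running the same calculation with the percentage in your own agreement shows how much each additional "nine" actually buys you.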
More importantly, you should understand that service availability doesn't guarantee data availability. Most services have data protection methods in place to protect you from physical failures such as a failed disk drive or even a site outage. But you wouldn’t be protected from logical failures like accidental deletion or a software bug. You still need to deploy some level of protection from logical failures by implementing some form of data protection technology, whether that's data backup, point-in-time copy or continuous data protection (CDP). Data protection still matters in the cloud; you can’t just leave it up to a cloud service provider.
Design a solution that meets your needs. You still need to dedicate resources to architect and design a solution that meets your particular business needs. Many people are drawn to the cloud because it sounds easy. You just subscribe to a cloud service and it takes care of all the day-to-day data management, capacity planning, load balancing and tuning, right? However, you're the only one that understands what data is critical, how sensitive your data is, how long it needs to be retained, how often it needs to be backed up, as well as its performance parameters and recovery point objective (RPO)/recovery time objective (RTO) needs.
Consider a hybrid strategy. Some applications can withstand downtime without a significant impact to your business and are entirely suitable for the cloud. Do the risk/reward analysis. Most data is inactive and rarely used, making it perfectly suitable for archiving to the cloud. Other data is extremely performance-sensitive and performance-intensive, but it doesn’t necessarily stay that way. Keeping newer data local and aging it to the cloud over time is a strategy that provides the best of both worlds. But make sure the policies suit the application; put the resources you dedicated above to work to understand RPO/RTO, data sensitivity and protection requirements, as well as usage patterns to ensure you're dealing with cloud-suitable data.
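As a rough illustration of aging data to the cloud, the Python sketch below decides where a piece of data should live based on how recently it was used. The 90-day threshold, the cloud_suitable flag and the function name are all hypothetical. Real policies should come from your own RPO/RTO, sensitivity and usage analysis.

```python
# Illustrative sketch of an age-based tiering rule for a hybrid strategy.
# The threshold and the names below are assumptions, not a vendor feature.
from datetime import date


def choose_tier(last_accessed, today, cloud_suitable=True, max_local_age_days=90):
    """Return "local" or "cloud" for data last used on the given date."""
    if not cloud_suitable:
        # Data judged too sensitive or performance-critical stays local.
        return "local"
    age_days = (today - last_accessed).days
    return "local" if age_days <= max_local_age_days else "cloud"


today = date(2011, 6, 15)
print(choose_tier(date(2011, 6, 1), today))                        # recently used
print(choose_tier(date(2010, 1, 1), today))                        # long inactive
print(choose_tier(date(2010, 1, 1), today, cloud_suitable=False))  # must stay local
```

The point of the flag is that age alone should not drive the decision; an application's protection and performance requirements can override it.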
It's still relatively early in the creation and understanding of hyperscale systems like the ones being used to support cloud computing services. Cloud service providers are building systems at a scale never before attempted and learning as they go, but that doesn’t mean you should stay away from them. The potential to save precious Capex and Opex dollars on buying, managing and maintaining your own systems, and to invest those dollars in other ways to grow your business, is compelling. However, users need to understand exactly what they're signing up for and, just like with internally housed IT applications, they need to architect solutions that support business requirements and plan for the inevitable unplanned outage.
BIO: Terri McClure is a senior storage analyst at Enterprise Strategy Group, Milford, Mass.
This was first published in June 2011