Q

Does it matter if cloud providers use erasure codes or MCM?

Many admins may not care whether their cloud provider uses erasure coding or multicopy mirroring, but the choice has a significant impact on how well their data is protected.

Does it matter whether multi-copy mirroring or erasure codes are used by my cloud provider?

In my experience, most users don't care whether their public cloud provider uses multi-copy mirroring (MCM) or erasure codes, but they should. The short answer is that erasure codes offer a better guarantee that your data will be protected. But because the price per gigabyte is approximately the same for providers that use MCM (Google and Amazon, for example) and those that use erasure coding (such as Microsoft Azure), many customers ignore this distinction when choosing a provider. However, MCM generally consumes more storage, and is less resilient, than erasure coding.

With MCM, which typically keeps three copies of each object, you are protected against only two concurrent losses. MCM is most effective when the mirrored object has some form of autonomic healing that periodically checks the health of data objects and corrects any issues it finds. If a disk drive or node fails, goes dark or comes back corrupted, the storage system makes another copy from a healthy version of the data objects and writes it somewhere else. This works reasonably well, but at a high cost: protecting against a single data object failure requires 100% additional storage capacity and nodes. Protecting against two concurrent data object failures requires 200% additional capacity and nodes, three concurrent failures require 300%, and so on. That is extremely high overhead.
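To make that arithmetic concrete, here is a minimal sketch in Python (the function name is illustrative, not from any provider's API) of how MCM overhead grows with the number of failures you want to survive:

```python
def mcm_overhead(failures_tolerated: int) -> float:
    """Extra storage, as a fraction of the original data, needed by
    multi-copy mirroring: one full additional copy per tolerated failure."""
    return failures_tolerated * 1.0  # each copy costs 100% of the object size

# Tolerating 1, 2 and 3 concurrent losses costs 100%, 200% and 300% extra.
for f in (1, 2, 3):
    print(f"{f} concurrent failure(s): {mcm_overhead(f):.0%} extra capacity")
```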

Erasure codes, combined with autonomic healing, change that equation by protecting against six or more concurrent drive, node, system, site or other failures. An erasure-coded system divides a data object into a number of chunks, the total being referred to as the width. A common width is 16 chunks. The object storage system has to read only a subset of those chunks (referred to as the breadth) to reconstitute the data object. A common breadth is 10 chunks. In this 16/10 example, the system can tolerate six concurrent failures of drives, nodes, sites and so on, and still read the data object; autonomic healing then recreates the lost chunks and writes them elsewhere. The extra capacity required is only the six chunks beyond the breadth, or 60% overhead. That is three times the resilience of two-copy-protected MCM at less than a third of its 200% overhead. You should therefore care which data protection method your cloud provider implements, because with erasure codes it is more likely that data objects will be there when they're needed.
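The width/breadth arithmetic above can be sketched the same way (Python; the function name is illustrative, and this computes the capacity math only, not the actual encoding):

```python
def ec_profile(width: int, breadth: int) -> tuple[int, float]:
    """For an erasure code that splits an object into `width` chunks and can
    rebuild it from any `breadth` of them, return the number of concurrent
    failures tolerated and the storage overhead as a fraction of the data."""
    failures_tolerated = width - breadth          # chunks you can lose
    overhead = (width - breadth) / breadth        # extra chunks vs. data chunks
    return failures_tolerated, overhead

# The 16/10 example from the text: six concurrent failures survived at 60%
# extra capacity, where MCM would need 600% extra to survive the same six.
failures, overhead = ec_profile(16, 10)
print(f"Tolerates {failures} failures at {overhead:.0%} extra capacity")
```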

However, nothing is perfect. The downside of erasure codes is that splitting, encoding and reassembling objects adds latency, which can reduce performance. Depending on how many other sources of latency your workload already has, this can be an important factor to consider.

This was first published in May 2014
