The Where, Why and How of Hybrid Storage

Arthur Cole

Flash technology has made a substantial contribution to the enterprise storage array, but most organizations see it as an adjunct to traditional spinning media, not a replacement. Implementation of these hybrid storage architectures is not without its challenges, however. Exactly when and how flash is to be activated in working data environments remains a key issue. But as Nimble Storage’s Head of Product Marketing, Radhika Krishnan, notes in a conversation with IT Business Edge’s Arthur Cole, the key goal is to maximize flash’s utility without wearing it down.

Cole: There is a lot of activity surrounding flash cache support for traditional disk storage, but questions remain as to how these hybrid storage systems should be designed. What does the enterprise need to consider in order to strike the right balance between flash and hard disk?

Krishnan: Flash offers distinct performance advantages; however, there are some key factors to keep in mind when designing flash into a storage system. One is cost economics. With most mainstream workloads, only a small subset of data needs to reside on flash. The key to a successful hybrid design is ensuring that the right data is located in flash at the right time. It is equally critical to use capacity-saving techniques such as compression to maximize the amount of data on flash.
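To make the compression point concrete, here is a minimal sketch of the arithmetic involved. The capacity figure and sample data are illustrative assumptions, not figures from any vendor; real ratios depend heavily on the workload's data type.

```python
import zlib

# Assumed raw flash capacity, purely for illustration.
flash_capacity_bytes = 1_000_000

# A repetitive workload compresses well; real-world ratios vary widely.
block = b"customer_record,region=EMEA,status=active;" * 64
compressed = zlib.compress(block, level=6)

# Compressing blocks before they land in flash stretches the same
# physical capacity over more logical data.
ratio = len(block) / len(compressed)
effective_capacity = int(flash_capacity_bytes * ratio)
print(f"compression ratio: {ratio:.1f}x")
print(f"effective flash capacity: {effective_capacity} bytes")
```

The same principle is why hybrid systems often pair compression with deduplication: every byte saved on flash is a byte of the expensive tier reclaimed.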

I/O optimization is another. On a per-dollar basis, flash performs very well with random I/O, while hard disk can be more effective for sequential I/O. A system that leverages flash and disk accordingly delivers the best levels of storage efficiency.
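One way to picture that division of labor is a placement policy that detects sequential runs of block addresses and steers them to disk, leaving scattered accesses on flash. This is a hypothetical sketch of the idea, not any shipping system's algorithm; the run-length threshold is an assumption.

```python
# Assumed threshold: consecutive blocks before a stream counts as sequential.
SEQ_THRESHOLD = 4

def route_requests(block_addresses):
    """Return (address, tier) placements: random I/O to flash, long runs to disk."""
    placements = []
    run = 1
    for i, addr in enumerate(block_addresses):
        # Extend the run if this address directly follows the previous one.
        if i > 0 and addr == block_addresses[i - 1] + 1:
            run += 1
        else:
            run = 1
        tier = "disk" if run >= SEQ_THRESHOLD else "flash"
        placements.append((addr, tier))
    return placements

# A sequential scan drifts to disk; scattered accesses stay on flash.
print(route_requests([10, 11, 12, 13, 14, 500, 7, 900]))
```

A real controller would work on streams per volume and fold in access frequency, but the economics it encodes are the same ones described above.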

It will also be necessary to work around flash's write endurance limitations. Every write performed to flash degrades the life of the device, so a key consideration is to minimize the number of writes to flash, or to sufficiently over-provision it to maximize its lifespan.
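The endurance trade-off can be estimated with back-of-the-envelope arithmetic. Every figure below is an assumption chosen for illustration, not a vendor specification:

```python
# Hypothetical lifespan estimate for a flash device under a write-heavy workload.
capacity_gb = 400             # usable capacity (assumed)
rated_pe_cycles = 3000        # program/erase cycles per cell (MLC-class assumption)
write_amplification = 2.0     # internal writes per host write (assumed)
host_writes_gb_per_day = 500  # workload's daily write volume (assumed)

# Total data the device can absorb before wearing out, spread over the
# effective daily write rate.
total_writable_gb = capacity_gb * rated_pe_cycles
lifespan_days = total_writable_gb / (host_writes_gb_per_day * write_amplification)
print(f"estimated lifespan: {lifespan_days:.0f} days (~{lifespan_days / 365:.1f} years)")
```

Over-provisioning attacks the `write_amplification` term: reserving spare cells gives the controller room to consolidate writes, extending lifespan at the cost of usable capacity.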

In addition, different applications need different flash-to-disk ratios. An efficient architecture provides visibility into flash usage and allows the ratio to be varied flexibly and non-disruptively.

Data integrity is another issue. Using flash as a storage tier inherently poses some disadvantages. First, data in a hybrid system needs to be migrated off flash to disk, introducing bottlenecks. Also, using flash as a tier requires that it be RAID protected, which adds overhead and cost. The reliability and error characteristics of hard disk drives are much better understood than those of flash, so always keeping a copy of all data on disk eliminates the risk of data integrity being compromised.
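The "always a copy on disk" property comes from using flash purely as a read cache rather than a tier. Below is a minimal sketch of that write-through pattern; it is an assumption-laden toy (dictionaries standing in for devices, a simple LRU policy), not any vendor's implementation:

```python
from collections import OrderedDict

class HybridStore:
    """Write-through design: disk is always authoritative, flash only caches reads."""

    def __init__(self, cache_blocks):
        self.disk = {}              # authoritative copy of every block
        self.flash = OrderedDict()  # LRU read cache; never the sole copy
        self.cache_blocks = cache_blocks

    def write(self, addr, data):
        self.disk[addr] = data      # lands on disk immediately; no later drain step
        if addr in self.flash:
            self.flash[addr] = data # keep any cached copy coherent

    def read(self, addr):
        if addr in self.flash:      # fast path: flash hit
            self.flash.move_to_end(addr)
            return self.flash[addr]
        data = self.disk[addr]      # slow path: read from disk, then cache
        self.flash[addr] = data
        if len(self.flash) > self.cache_blocks:
            self.flash.popitem(last=False)  # evict least recently used block
        return data

store = HybridStore(cache_blocks=2)
for addr, data in [(1, b"a"), (2, b"b"), (3, b"c")]:
    store.write(addr, data)
for addr in (1, 2, 3):
    store.read(addr)                # block 1 is evicted when 3 is cached
print(sorted(store.flash))          # prints [2, 3]
```

Because flash never holds the only copy, losing the cache costs performance but never data, which is the integrity argument made above.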

Finally, the data layout needs to be optimized to deliver data management features such as snapshots and clones, which deliver data protection and performance/capacity benefits.

Cole: Does hybrid technology drive a need to alter existing block, file or even object-oriented architectures?

Krishnan: Data layout is to storage systems what the foundation is to a house. The layout of data on flash and disk determines the performance, capacity and data management benefits that a system delivers. Irrespective of whether the underlying architecture is block, object or file based, retrofitting flash on top of an existing architecture optimized for spinning disk delivers sub-optimal results.

For instance, keeping frequently accessed metadata in flash is crucial to delivering sustained performance, but the nature of that metadata varies depending on whether the system is block, file or object based. This capability is not something that can be cleanly retrofitted into legacy architectures.

Cole: In the end, though, doesn't hybrid technology simply push the data bottleneck from the storage system to the CPU?

Krishnan: Most hybrid systems in the market today solve for read bottlenecks. With tiered systems, or those that use flash as a write-back cache, bottlenecks get introduced in the write path when data is drained from flash to disk. To truly eliminate all I/O bottlenecks, hybrid systems need to be carefully architected, optimizing for both reads and writes and leveraging the best of flash and disk characteristics.

Furthermore, different workloads have varying performance needs. A truly flexible hybrid architecture, in which performance can be scaled easily and non-disruptively, is more likely to eliminate storage bottlenecks in today's consolidated data center.

To avoid the CPU itself becoming a bottleneck, the environment must be architected so that compute, networking and storage are balanced and scalable.

Jul 19, 2013 8:25 AM — StorageOlogist says:

Good comments here from Radhika. All the points resonate. A couple of clarifications, though. The layout of data is extremely important, as mentioned. Nimble uses a write-through cache and therefore lays data out on disk in a write-optimized fashion. Starboard Storage uses a write-back cache and can therefore lay data out on disk in a read-optimized, sequential layout that requires no restriping of data. Using a write-through cache as Nimble does introduces a problem, because data written to disk in a write-optimized fashion is not optimal for reads. Nimble uses the read cache to mask this, but ultimately the architecture requires that you clean up data on the disk. It is misleading to say that a write-back cache introduces bottlenecks into the write path, because the real issue is the size of the cache versus the workload, and because a write-back cache can be based on SSD it can be a factor or more larger than the NVRAM used for a write-through cache. All architectures have bottlenecks, including Nimble's. The skill is in making sure you provide the customer adequate capacity and performance for his workload, and can simply upgrade both in plug-and-play fashion.
