Everyone in the storage market is excited about flash. As well they should be. Flash can deliver much better performance than spinning disks. In addition, the cost of flash media is starting to come down to a level where all enterprises should consider adding flash-based storage to their enterprise storage infrastructure.
Yet amidst all this excitement there is also confusion. In particular, as vendors introduce a variety of all-flash and hybrid-flash storage systems, many storage administrators are confused as to the differences between the systems, as well as the benefits and drawbacks of each of them. Almost all current products in the flash storage landscape can be broken down into four different types of system architectures – all-flash, post-process tiering hybrid, caching hybrid, and continuous tiering hybrid – each with its own unique characteristics.
By comparing flash storage architectures side by side, storage administrators can better understand which flash architectures – and by extension which flash products – make the most sense for their particular set of applications. In doing so, they are likely to find that while flash has improved enterprise storage system performance, until recently storage system architectures have failed to keep up with advances in media and other technologies, and have failed to fully leverage those technologies to deliver better performance while minimizing data storage costs.
In this slideshow, Jacob Cherian, vice president of product strategy at Reduxio, takes a closer look at the four flash storage architectures currently on the market.
A Closer Look at Flash Storage
Storage systems that employ all-flash architectures use only a single type of media: flash. Because of this, they are very fast. But they are also expensive, making them suitable only for those applications where speed at any cost is the driving motivation.
Enterprise storage systems serve two functions – providing read/write access to the data (where speed is important) and storing data (where speed is less important). At any given time only a portion of a storage system’s data (the working set) is being read or written by applications, while the remaining data is cold.
Since all-flash systems use a single type of media, all data – even colder data (data that is inactive) – is kept on flash media. This colder data is either stored on expensive flash – which has a higher write endurance – or on less expensive flash – which requires sufficient reserve (larger amounts) to deal with its lower write endurance. Therefore, even though flash does not offer much benefit for storing cold data, all-flash systems still use it to do so. All-flash systems try to compensate for this by supporting in-line dedupe and compression to reduce the systems’ effective cost. But even with in-line dedupe and compression, all-flash systems still cost too much for broad deployment. Think of all-flash systems as the private jets of the storage industry. They might be suitable for some applications, such as flying your CEO to important meetings around the world, but for commuting to work on a daily basis, they don’t make economic sense.
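The effect of in-line dedupe and compression on cost can be sketched with simple arithmetic. The prices and the 4:1 reduction ratio below are hypothetical values chosen only for illustration, not figures from any vendor, and the function name is an assumption of this sketch.

```python
def effective_cost_per_gb(raw_cost_per_gb, reduction_ratio):
    """Cost per GB of stored data after dedupe and compression.

    reduction_ratio is logical data size divided by physical size (4.0 means 4:1).
    """
    if reduction_ratio <= 0:
        raise ValueError("reduction_ratio must be positive")
    return raw_cost_per_gb / reduction_ratio


# Hypothetical prices in dollars per raw GB, for illustration only.
flash_raw = 0.50
disk_raw = 0.05

# Assumed 4:1 data reduction on the all-flash system.
flash_effective = effective_cost_per_gb(flash_raw, 4.0)

print(f"flash after reduction: ${flash_effective:.3f}/GB")
print(f"disk, no reduction:    ${disk_raw:.3f}/GB")
```

Under these assumed numbers, data reduction narrows the gap but does not close it; with different prices or ratios the outcome would differ.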
As opposed to systems that use all-flash architectures, systems that use hybrid architectures try to optimize price-performance by combining a media type that provides fast access to data with a media type that can store data inexpensively. Today, hybrid systems use a combination of flash and spinning media in order to provide higher levels of performance compared to solely disk-based systems – all at prices significantly lower than all-flash systems. The hybrid architectures discussed on the following slides attempt to provide such high-level performance, with varying degrees of success.
Hybrid systems offered by legacy storage vendors such as EMC, NetApp, IBM, HP and Dell Systems generally use the post-process-tiering-hybrid architecture. This architecture retrofits flash into storage systems that were originally designed and optimized for use with disk drives. Post-process-tiering-hybrid systems examine IO patterns over a long period of time to determine what data is hot (the working set) and what is cold. Then, based on a set schedule (every 12 hours, for example), they move data that they think will be active (hot data) to flash and data they expect to be inactive (cold data) to disk – essentially guessing what data will be hot during the next period based on historical data. However, because data movement happens only periodically, placement lags behind changes in the workload, and very often the hot data ends up on disk and the cold data on flash. The result is that post-process-tiering-hybrid systems deliver only marginal performance improvements over disk-based systems. In addition, because they lack dedupe and compression, they also make less efficient use of the available media.
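The scheduled guess described above can be illustrated with a minimal sketch. It is not any vendor's algorithm: the function names, the block labels and the two access traces are hypothetical, and real systems track far more than raw access counts.

```python
from collections import Counter


def plan_flash_placement(history, flash_slots):
    """Choose which blocks go to flash, using only the previous period's accesses."""
    counts = Counter(history)
    # The most frequently accessed blocks of the last period are assumed to stay hot.
    return {block for block, _ in counts.most_common(flash_slots)}


def flash_hit_rate(placement, accesses):
    """Fraction of accesses served from flash under a fixed placement."""
    if not accesses:
        return 0.0
    hits = sum(1 for block in accesses if block in placement)
    return hits / len(accesses)


# Hypothetical traces: the workload shifts from blocks a/b to blocks c/d.
previous_period = ["a", "a", "a", "b", "b", "c"]
next_period = ["c", "c", "d", "d", "d", "a"]

placement = plan_flash_placement(previous_period, flash_slots=2)

print(sorted(placement))
print(flash_hit_rate(placement, previous_period))
print(flash_hit_rate(placement, next_period))
```

In this toy trace the placement fits the period it was derived from, but once the workload shifts most accesses fall through to disk until the next scheduled run.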
Caching-hybrid systems have an architecture in which flash is used as a cache for a pool of disks, either as a read and write cache or solely as a read cache. While caching-hybrid systems do deliver better performance than post-process-tiering systems, they come with significant drawbacks. Because flash is being used as a cache, rather than as storage media, the cache does not contribute to the overall capacity of the system, making these caching-hybrid systems more expensive than tiering systems in terms of cost per gigabyte of raw capacity. In addition, systems that use flash only to cache reads have to employ DRAM to serialize writes that are made directly to spinning media. This strategy works only with very small working sets. If the working set is large, write performance slows to the pace of disk drives, lowering overall system performance.
Moreover, as innovations in both flash and other solid-state storage media continue, hybrid systems that use multiple (two or more) solid-state media types will soon become possible. Systems that employ a caching-hybrid architecture can’t benefit from these innovations, since when two types of flash are used there is very little performance difference between the cache and the pooled media. Therefore, caching-hybrid systems using multiple solid-state media types secure only marginal performance benefits from caching, despite a significant increase in cost. Also, many caching hybrids do not support dedupe, and when dedupe is supported it must be disabled for volumes that are performance sensitive.
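The capacity argument against caching hybrids can be checked with a short calculation. The media sizes and prices below are hypothetical, and the function name is an assumption of this sketch; the only point is that flash used as a cache adds cost without adding capacity.

```python
def cost_per_gb_of_capacity(flash_gb, disk_gb, flash_price, disk_price,
                            flash_counts_as_capacity):
    """Total media cost divided by the capacity the system can actually offer."""
    total_cost = flash_gb * flash_price + disk_gb * disk_price
    capacity = disk_gb + (flash_gb if flash_counts_as_capacity else 0)
    return total_cost / capacity


# Hypothetical configuration: 10 TB of flash and 90 TB of disk.
config = dict(flash_gb=10_000, disk_gb=90_000, flash_price=0.50, disk_price=0.05)

# Tiering: flash is a storage tier, so it adds to capacity.
tiering = cost_per_gb_of_capacity(**config, flash_counts_as_capacity=True)
# Caching: flash holds copies of data that also lives on disk.
caching = cost_per_gb_of_capacity(**config, flash_counts_as_capacity=False)

print(f"tiering: ${tiering:.4f}/GB")
print(f"caching: ${caching:.4f}/GB")
```

With identical media, the caching configuration spreads the same spend over less capacity, so its cost per gigabyte is higher.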
Continuous Tiering Hybrid
The last architecture – and the newest – is the continuous-tiering-hybrid architecture. This architecture differs in several fundamental ways from both post-process-tiering systems and caching-hybrid systems.
- Data in these systems is deduped and compressed in real-time, before it gets placed on the first tier of media. This greatly expands the usable capacity of the system, while maximizing utilization of all tiers.
- All data in the normal case is initially written to the primary media tier. Because of this, continuous-tiering-hybrid systems that use flash and disk can deliver write IOPS and write latency performance similar to that of all-flash systems at a significantly lower cost.
- Rather than tiering data on a schedule, continuous-tiering-hybrid systems automatically tier data continuously based on workload changes. Therefore, hot data is immediately moved to the fastest media available. To enable this instantaneous reaction to changes in workload, the tiering is fine-grained – with “heat” being maintained for very small blocks of data (as small as 8 KB).
- The tiering is media aware. To ensure that write and read-back performance of the slower tiers is not affected, data movement is optimized to the properties of the specific source and target media. This allows continuous-tiering-hybrid systems to support multiple media tiers and allows for easy integration of new types of media, future proofing the system against changes in the media landscape.
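The per-block heat tracking described in the list above can be sketched as follows. This is a simplified illustration rather than a description of any shipping product: the class name, the exponential decay factor and the promotion rule are assumptions, and dedupe, compression and media-aware data movement are left out.

```python
class ContinuousTierer:
    """Toy model: keep the hottest blocks on flash, re-evaluated on every access."""

    def __init__(self, flash_slots, decay=0.5):
        self.flash_slots = flash_slots  # number of small blocks that fit on flash
        self.decay = decay              # assumed aging factor applied per access
        self.heat = {}                  # block id -> current heat
        self.flash = set()              # block ids currently on flash

    def access(self, block):
        # Age all known blocks, then credit the block that was just touched.
        for known in self.heat:
            self.heat[known] *= self.decay
        self.heat[block] = self.heat.get(block, 0.0) + 1.0
        self._rebalance(block)

    def _rebalance(self, block):
        if block in self.flash:
            return
        if len(self.flash) < self.flash_slots:
            self.flash.add(block)
            return
        # Flash is full: demote the coldest resident block if the new one is hotter.
        coldest = min(self.flash, key=lambda b: self.heat[b])
        if self.heat[block] > self.heat[coldest]:
            self.flash.remove(coldest)
            self.flash.add(block)


tierer = ContinuousTierer(flash_slots=2)
for block in ["a", "a", "b", "c", "c"]:
    tierer.access(block)

print(sorted(tierer.flash))
```

Because placement is reconsidered on each access instead of on a schedule, block "c" displaces the cooling block "a" as soon as it becomes hotter.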
Of course, outside of a flash storage system’s architecture there are other factors to consider when purchasing a system, including how users recover data on the system, how they clone data from the system, how well the system integrates with virtualization environments and applications, and how easy the system is to deploy and manage. Nonetheless, a flash storage system’s basic architecture is a key factor in determining whether it can deliver the storage performance users need at a price they can afford. In shopping for flash storage systems, customers need to ask vendors whether their system’s architecture has been designed to fully leverage the latest data media, processing and networking technologies to maximize price/performance, or whether it relies on a simplistic or antiquated architecture that fails to take full advantage of these technologies.