Modern analytics applications are only as good as the data they are given. Because that data resides in storage pools, making the most of it will require the enterprise to adapt legacy storage systems to the particular needs of analytics-driven workflows.
But this is likely to be a tall order. For one thing, modernizing storage for an entirely new operational model is difficult to do without disrupting legacy data models. For another, even the experts are not entirely sure what is needed to produce an optimal storage environment for analytics, particularly since most of these environments will be built around artificial intelligence and machine learning technologies that are likely to evolve in ways unique to each enterprise.
Probably the one common element in analytics-facing storage will be a more active role in the application process. In the past, notes Enterprise Storage Forum’s Drew Robb, storage was mostly inert, waiting for something to come along and tell it what to do. Going forward, organizations will have to build storage that proactively meets the specific needs of Hadoop, SAS, Watson and other platforms. This will require building the storage system’s intended role into its core, with all other functions layered on top.
In terms of raw performance, however, it is hard to see how speed would not be a major factor in emerging analytics infrastructure, says Datamation’s Pedro Hernandez. A recent survey conducted by DataDirect Networks showed that more than three-quarters of IT executives who specialize in HPC infrastructure worry that their current I/O capabilities are not good enough for complex analytics. This may seem surprising, considering that most HPC platforms rely on flash storage, but in many cases simply moving to solid state is not enough without a corresponding upgrade to I/O performance.
In some cases, upgrades to storage infrastructure should be paired with entirely new architectural approaches optimized for both the scale and complexity of the analytics challenge. HPE, for example, is out with a new hierarchical management solution called the Data Management Framework (DMF) that improves on traditional tiered architectures through more targeted management of hot and cold data. Using automated workflow migration, resource scaling and other techniques, the platform aims to introduce greater efficiency to the storage and retrieval process and reduce resource consumption to help keep costs under control. (Disclosure: I provide content services for HPE.)
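To make the idea of managing hot and cold data a little more concrete, here is a minimal Python sketch of a generic access-based tiering policy. It does not describe how DMF actually works. The `DataObject` class, the `assign_tier` and `plan_migrations` functions, and the thresholds (a 7-day recency window, 10 reads in 30 days) are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class DataObject:
    """Hypothetical record of one stored dataset and its recent usage."""
    name: str
    last_access: datetime
    accesses_last_30d: int


def assign_tier(obj: DataObject, now: datetime,
                hot_window: timedelta = timedelta(days=7),
                hot_min_accesses: int = 10) -> str:
    """Classify an object as 'hot' if it was read recently or often, else 'cold'.

    The window and access threshold are illustrative assumptions, not vendor defaults.
    """
    recently_used = now - obj.last_access <= hot_window
    frequently_used = obj.accesses_last_30d >= hot_min_accesses
    return "hot" if recently_used or frequently_used else "cold"


def plan_migrations(objects: list, current_tiers: dict,
                    now: datetime) -> list:
    """Return (name, current_tier, target_tier) for every object that should move.

    current_tier is None when an object has no recorded tier yet.
    """
    moves: list = []
    for obj in objects:
        target = assign_tier(obj, now)
        current: Optional[str] = current_tiers.get(obj.name)
        if current != target:
            moves.append((obj.name, current, target))
    return moves


if __name__ == "__main__":
    now = datetime(2024, 1, 31)
    objects = [
        DataObject("sensor_log", datetime(2024, 1, 30), 3),        # read yesterday
        DataObject("q3_archive", datetime(2023, 10, 1), 0),        # untouched for months
        DataObject("model_features", datetime(2024, 1, 15), 25),   # not read this week, but heavily this month
    ]
    tiers = {"sensor_log": "cold", "q3_archive": "hot", "model_features": "hot"}
    for name, src, dst in plan_migrations(objects, tiers, now):
        print(f"move {name}: {src} -> {dst}")
```

In this toy run, `sensor_log` is promoted to the hot tier and `q3_archive` is demoted to cold, while `model_features` stays hot on access frequency alone. A production system would weigh many more signals (cost, capacity, service levels), but the core loop of classifying data and then planning moves is the same basic shape.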
Another key architectural shift in analytics-facing storage is the increased prevalence of object storage as opposed to traditional block and file approaches. As eWeek’s Chris Preimesberger points out, object storage not only meets the scalability and efficiency demands of high-volume analytics workloads but also conforms to the hybrid, software-defined architectures now under development and to the parallel-access models needed for workflows that require continuously expanding data throughput. On top of that, object storage is more durable due to its built-in redundancy, and it is the storage model of choice for most public clouds, which is where the bulk of the data accessed by analytics applications resides.
While storage is crucial for analytics, the enterprise should take care not to develop storage infrastructure in isolation from the rest of the data environment. The only way to arrive at a truly optimized system is through a strategic vision that stresses outcomes first and technologies second.
When applications are highly dependent on data, any information left in isolated pockets will skew the results, perhaps leading to critical errors in the pursuit of key objectives.
Arthur Cole writes about infrastructure for IT Business Edge. Cole has been covering the high-tech media and computing industries for more than 20 years, having served as editor of TV Technology, Video Technology News, Internet News and Multimedia Weekly. His contributions have appeared in Communications Today and Enterprise Networking Planet and as web content for numerous high-tech clients like TwinStrata and Carpathia. Follow Art on Twitter @acole602.