It isn’t called Big Data for nothing.
But while most of the attention surrounding Big Data focuses on analysis and management, there is a more fundamental issue at hand: Where are you going to keep it? Big Data, after all, will require Big Storage, and while many enterprises are no doubt looking to the cloud to provide the needed capacity, the fact is that simply housing massive volumes is only one small part of the problem.
According to Red Hat’s George DeBono, there are five key elements when it comes to storing and retrieving Big Data. First, of course, is scale and capacity, which must be not only adequate but also cost-effective. There is also the need to perform periodic data migration as new infrastructure is provisioned and to ensure that data is accessible from both legacy systems and increasingly distributed, even global, operations. And while we’re at it, let’s not forget about security and data protection. If Big Data is valuable to you, it is probably valuable to someone else, as well.
This isn’t to say that the cloud is of little use when it comes to Big Data. In fact, argues Forbes’ Joe McKendrick, the two can actually be viewed as two sides of the same coin. His take from the recent South-by-Southwest (SXSW) conference is that the trends form a symbiotic relationship, with Big Data feeding a need for the cloud and the cloud enhancing the capability to store and manage Big Data projects. In the end, enterprises of all sizes will soon hit new levels of innovation now that both the means and the desire to capitalize on massive data stores have arrived.
At some point, however, data has to have a physical home, even in the cloud. And since the high-performance computing (HPC) industry has had lengthy experience dealing with some of the largest volumes ever created, it’s no surprise that some of its hardware would trickle down to the enterprise. SGI, for example, recently launched the InfiniteStorage 5600 platform that uses a modular design and an advanced controller architecture that can support up to 60 4TB disk drives or high-speed SSDs. The company is claiming a 2.5-fold increase in throughput per spindle under the SPC-2 benchmark.
Infrastructure alone is not the answer to Big Data, however. Enterprises will have to adopt new data management and prioritization policies, as well. As Data Center Journal’s Jeff Clark asked recently, are you saving Big Data or merely hoarding it? The difference lies in the fact that much of the structured and unstructured data that comprises Big Data is simply junk. The trick, then, will be to mine those volumes for the few nuggets of gold that can lead to new business opportunities or drive higher levels of efficiency and productivity. The challenge going forward will be to perfect the algorithms that weigh things like age, frequency of access, source, location and myriad other factors without getting bogged down in tedious file-by-file analysis.
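To make that idea concrete, here is a minimal sketch in Python of how such a prioritization policy might combine age, access frequency, source and location into a single retention score. The field names, weights and thresholds are purely illustrative assumptions for this article, not any vendor's or researcher's actual algorithm.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class FileRecord:
    path: str
    last_modified: datetime       # assumed to be timezone-aware (UTC)
    access_count_90d: int         # accesses in the trailing 90 days
    source: str                   # originating system, e.g. "crm", "web_logs"
    location: str                 # storage tier or site, e.g. "primary_dc"

# Hypothetical weights -- a real policy would tune these per business.
SOURCE_WEIGHTS = {"crm": 1.0, "erp": 0.9, "web_logs": 0.4, "tmp": 0.1}
LOCATION_WEIGHTS = {"primary_dc": 1.0, "branch": 0.7, "archive": 0.3}

def retention_score(rec: FileRecord, now: datetime | None = None) -> float:
    """Blend age, access frequency, source and location into one score."""
    now = now or datetime.now(timezone.utc)
    age_days = max((now - rec.last_modified).days, 0)
    # Newer data scores higher; value decays to zero after about two years.
    recency = max(0.0, 1.0 - age_days / 730)
    # Frequently accessed data scores higher, capped at 1.0.
    frequency = min(rec.access_count_90d / 50, 1.0)
    source_w = SOURCE_WEIGHTS.get(rec.source, 0.5)
    location_w = LOCATION_WEIGHTS.get(rec.location, 0.5)
    return 0.4 * recency + 0.3 * frequency + 0.2 * source_w + 0.1 * location_w

def classify(rec: FileRecord) -> str:
    """Map a score to a keep / archive / review-for-deletion decision."""
    score = retention_score(rec)
    if score >= 0.6:
        return "keep on fast storage"
    if score >= 0.3:
        return "move to archive tier"
    return "candidate for deletion"
```

In practice, a policy like this would be run in bulk against metadata catalogs rather than against individual files, which is precisely how it avoids the tedious file-by-file analysis Clark warns about.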
Big Data is a perfect example of the age-old challenge/opportunity paradigm. There probably isn’t an enterprise in existence today that doesn’t have something of high value buried away on a disk drive somewhere. But with capacity already overstretched and new data coming in daily, the immediate goal is to shore up storage capabilities and ensure there is room for expansion in the future.
Once that is done, the real work of analyzing and leveraging Big Data can begin.