The enterprise is doing its best to build the infrastructure needed to support Big Data, but not surprisingly, most organizations are already starting to feel the strain. Data, after all, has a way of accumulating faster than either hardware or software can handle it, even in the age of rapid scalability.
And as many IT executives are finding out, there is more to Big Data than simply finding a place to store and analyze it.
According to a recent Researchscape International survey of U.S. and UK executives, nearly half say their current data warehousing platforms are starting to break under rising analytic workloads. Perhaps coincidentally, about half say they are employing new platforms like Hadoop and Spark for Big Data while the other half are trying to leverage legacy platforms, although there was no indication as to whether the latter group was experiencing the most severe growing pains. A key complaint from about a third of respondents, however, was that volume growth is pushing the warehousing budget to unsustainable levels.
Naturally, says IT World’s Martyn Jones, a warehousing platform that was never designed for Big Data will have trouble fulfilling its demanding requirements. This is why the new, fourth-generation warehouse represents such a significant upgrade over previous generations: it essentially turns warehousing from a storage and analysis platform into a full-blown data logistics solution. Traditional warehousing allowed organizations to customize web experiences and other interactions based on immediate needs, such as past mouse-clicks and page-view duration. Emerging platforms will provide much more extensive analytics that cover everything from the past few seconds to the past few years, and will incorporate not just individual requirements but those of groups and entire markets.
The central mistake that many organizations make when upgrading to Big Data analytics is in assuming that it is simply a matter of more scale and faster throughput, according to Database Journal’s Lockwood Lyon. More CPUs, more memory, and more resources may address the volume question of Big Data, but the complexity of data types like large objects (LOBs), XML and multi-structured rich media will require a diversity of architectures that must be integrated into the overall analytics environment. This will require the enterprise to take a hard look at how both source and operational data is ingested, modeled and documented within overarching contextual analysis workflows.
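The point about diverse data types can be illustrated with a minimal sketch in Python: heterogeneous records (JSON events, XML documents, binary large objects) are routed through format-specific parsers into one common envelope that downstream analytics can consume uniformly. The parser names and envelope shape here are purely hypothetical, not any particular vendor's API.

```python
import json
import xml.etree.ElementTree as ET

# Format-specific parsers, each returning a plain dict of fields.
def parse_json(payload: bytes) -> dict:
    return json.loads(payload)

def parse_xml(payload: bytes) -> dict:
    root = ET.fromstring(payload)
    return {child.tag: child.text for child in root}

def parse_lob(payload: bytes) -> dict:
    # LOBs are not parsed inline; record the size and defer the bytes.
    return {"lob_bytes": len(payload)}

PARSERS = {"json": parse_json, "xml": parse_xml, "lob": parse_lob}

def ingest(record_type: str, payload: bytes) -> dict:
    """Normalize one record into a common envelope for analytics."""
    parser = PARSERS.get(record_type)
    if parser is None:
        raise ValueError(f"no parser registered for {record_type!r}")
    return {"source_type": record_type, "fields": parser(payload)}

# Three very different inputs land in the same envelope shape.
rows = [
    ingest("json", b'{"user": "a", "clicks": 3}'),
    ingest("xml", b"<event><user>b</user><clicks>5</clicks></event>"),
    ingest("lob", b"\x00" * 1024),
]
```

The design choice worth noting is that adding a new data type means registering one more parser, not reworking the analytics layer, which is the kind of architectural integration the paragraph above describes.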
It is for these and other reasons that the enterprise should not be overly concerned with Big Data ROI at this point, says Pentaho data scientist Wael Elrifai. As he explained to Computing.com, the entire purpose of data science is to explore the digital ecosystem to see if there are any relationships that might lead to opportunities, but the nature of exploration is that you never know what you are going to find at the outset. Whether you are building a data warehouse or its emerging cousin, the data lake, the focus should be on refining the analytics process, not pushing for an immediate financial return.
As I and others have pointed out in the past, Big Data is not just big; it's big, fast, complicated and, hopefully, valuable. As such, it will require more than scale and power: it demands intelligence and sophisticated architectural design to really see it through.
And the best time to plan for all these attributes is at the beginning, before the first server or storage module has been provisioned.
Arthur Cole writes about infrastructure for IT Business Edge. Cole has been covering the high-tech media and computing industries for more than 20 years, having served as editor of TV Technology, Video Technology News, Internet News and Multimedia Weekly. His contributions have appeared in Communications Today and Enterprise Networking Planet and as web content for numerous high-tech clients like TwinStrata and Carpathia. Follow Art on Twitter @acole602.