Big Data is not just the latest fad to hit the enterprise; it's an obsession. On the one hand is the fear of building infrastructure capable of handling massive data volumes; on the other, the anticipation of all the advantages to be gained by mining and analyzing that data.
New research from QuinStreet Enterprise reveals that more than three-quarters of all organizations consider Big Data a top priority in the coming year, citing the need to foster speed and accuracy in decision-making as a key driver. Interestingly, Big Data is not just the province of the large enterprise, either: more than 70 percent of mid-sized companies are also planning Big Data initiatives.
For the most part, it does not appear that the enterprise is looking to retrofit existing infrastructure for Big Data analysis. Legacy architectures are too well-tuned for standard business application support to provide a useful jumping-off point for scale-out, large-volume data environments. In all likelihood, Big Data will be the province of greenfield development or the cloud, most likely using new generations of purpose-built systems like the CloudOOP 12000 RD server from PSSC Labs. The device crams twelve 3.5-inch hard disk drives and a dual-processor eATX motherboard into a 1RU footprint, offering upwards of 48TB per unit. It also comes with built-in support for leading Hadoop distributions like Cloudera, MapR and Hortonworks.
Big Data will also need support from higher-level automation and systems orchestration tools. In yet another new initiative from a revitalized HP, the company has launched the Orchestrated Datacenter platform, a collection of automation and management services designed to coordinate an entire IT stack for Big Data and other enterprise needs. The package includes the HP Enterprise Maps, HP Cloud Service Automation, Operations Orchestration and Server Automation modules, all of which can be leveraged to reduce IT provisioning from days to a matter of minutes.
So much for the infrastructure side of the equation. When it comes to the analysis and actual utility of all that data, however, things are a little murkier. It's not that the tools to carry out those functions don't exist, but our understanding of the results is still very rudimentary. Take, for example, Google's boast a few years ago that it could more accurately predict the spread of the flu virus by mining geographical data linked to search terms like "flu remedies" and "pharmacist." It turned out the conclusions drawn from the data were not that accurate. The algorithms used to calculate the results worked just fine; it was the interpretation that went astray.
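One reason interpretation goes astray at Big Data scale is simple multiple comparisons: screen enough candidate signals and some will track your target by chance alone. A minimal sketch (all term names and numbers here are hypothetical, not Google's actual method) showing how purely random "search terms" can appear correlated with flu cases when thousands are tested:

```python
import random

random.seed(0)

def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# 50 weeks of simulated flu case counts -- pure noise, no real signal anywhere.
flu_cases = [random.gauss(500, 50) for _ in range(50)]

# Screen 1,000 unrelated "search terms", each also pure noise.
terms = {f"term_{i}": [random.gauss(100, 10) for _ in range(50)]
         for i in range(1000)}

# Pick the term that best "predicts" flu cases. With this many candidates,
# a nontrivial correlation will emerge by chance -- the classic Big Data trap.
best_term, best_r = max(
    ((name, pearson(vals, flu_cases)) for name, vals in terms.items()),
    key=lambda item: abs(item[1]))

print(f"best spurious correlate: {best_term}, |r| = {abs(best_r):.2f}")
```

The point is not that correlation math is wrong; the `pearson` calculation is exact. The failure is interpretive: treating the best chance correlate of thousands as a genuine predictor, which is one plausible reading of what happened with the flu forecasts.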
This, of course, leads to the larger question surrounding Big Data: Will more information simply lead to more confusion? No one doubts the IT industry’s ability to capture and manipulate all that data blossoming from thousands of sources on the Internet of Things, but the human mind is still fallible and can be easily tricked into seeing what it wants to see rather than how things really are, particularly when that reality is hidden beneath layers and layers of algorithmic calculations.
To enterprises contemplating the introduction of Big Data into their business processes, I offer a word of caution: capturing and harnessing Big Data is one thing; turning it into actual knowledge is quite another.