In physics, the nightmare scenario is when an unstoppable force encounters an immovable object. In the enterprise, that would be like Big Data volumes becoming so large that even your expensive new data warehousing solution can’t handle it.
Warehousing vendors have always prided themselves on their ability to scale. But with Big Data about to make the jump from generalized shopping patterns and mobile app usage to highly granular details, like how hot an individual car engine is running or whether the fridge needs a new water filter, it is starting to look as if yesterday's version of big wasn't as future-proof as it appeared.
The cloud, of course, is a lifesaver when it comes to scalability. As long as the resources are available somewhere at a reasonable cost, warehousing platforms should be able to cope. Informatica, for instance, recently teamed up with Cloudera to devise a new warehousing reference architecture that they say is optimized for rapidly scaling environments. The issue, they argue, isn't so much the volume of data hitting current warehousing solutions as the speed at which it is increasing, which causes performance bottlenecks and other problems that lead to costly upgrades. The new design can be used to create Enterprise Data Hubs built around the Apache Hadoop-based Cloudera Enterprise platform, with the Informatica Vibe virtual data machine (VDM) handling complex data integration tasks.
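To make the hub idea a little more concrete, here is a rough, generic sketch of the pattern such an architecture points toward: land high-velocity raw data in cheap, append-only storage and push only the integrated aggregates on to the warehouse. The landing-zone path, event fields, and aggregation below are purely illustrative assumptions, not the actual Informatica/Cloudera design.

```python
# Generic sketch of a "data hub" staging layer (illustrative only).
# Raw, high-velocity events are appended to date-partitioned files
# (standing in for a Hadoop cluster); only per-device aggregates
# would move on to the warehouse.
import json
import os
from collections import defaultdict
from datetime import datetime, timezone

HUB_DIR = "hub/raw/engine_telemetry"   # hypothetical landing zone

def land_event(event: dict) -> None:
    """Append one raw event to today's partition in the hub."""
    day = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    partition = os.path.join(HUB_DIR, f"dt={day}")
    os.makedirs(partition, exist_ok=True)
    with open(os.path.join(partition, "events.jsonl"), "a") as f:
        f.write(json.dumps(event) + "\n")

def summarize_partition(partition: str) -> dict:
    """Reduce raw events to average temperature per device."""
    totals, counts = defaultdict(float), defaultdict(int)
    with open(os.path.join(partition, "events.jsonl")) as f:
        for line in f:
            event = json.loads(line)
            totals[event["device_id"]] += event["engine_temp_c"]
            counts[event["device_id"]] += 1
    return {device: totals[device] / counts[device] for device in totals}

# A few raw readings land in the hub; only the summary would be warehoused.
for reading in [
    {"device_id": "car-42", "engine_temp_c": 96.5},
    {"device_id": "car-42", "engine_temp_c": 101.2},
    {"device_id": "car-17", "engine_temp_c": 88.0},
]:
    land_event(reading)

today = datetime.now(timezone.utc).strftime("%Y-%m-%d")
print(summarize_partition(os.path.join(HUB_DIR, f"dt={today}")))
```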
At the same time, Teradata has ported pretty much its entire warehousing portfolio to the cloud, where it can be had on a subscription model. The Teradata Cloud offers as-a-service versions of leading tools like the Teradata Database and the Teradata Aster Discovery system, allowing organizations to build stacks of warehousing solutions in a way that the company says is "TCO-neutral" with on-premises models over a three-year lifespan. It can also serve as the external side of a hybrid deployment should the enterprise need a ready means of bursting data into the cloud.
Regardless of your system’s footprint, however, the fundamentals of building and managing a warehousing environment remain the same, according to Slalom Consulting’s Steve Chang. These include having a basic understanding of the type of data you need to analyze (unstructured email, web data, embedded-systems output, etc.), as well as the kind of resources you have at your disposal, including the human ones. And since warehousing is already complicated enough, the KISS principle (Keep It Simple, Stupid) should be applied throughout the design and implementation process.
Fortunately, even simple designs can be put to work in new and innovative ways, says slashdot.org’s David Strom. These include linking databases together or tracking persistent data across multiple websites to gain insight from previously disparate sets of information, as well as taking a second look at seemingly useless data that is scheduled to be dumped. As always, the goal is not simply to generate more data to analyze, but to find overlooked ways in which data environments and system resources can be optimized for the highest ROI.
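As a rough illustration of the kind of linking Strom describes, the sketch below joins two hypothetical, previously separate datasets, web-activity records and CRM records, on a shared customer ID. The table names, columns, and figures are invented for the example.

```python
# Minimal sketch: joining two disparate datasets on a shared key
# to surface insight neither source yields on its own (illustrative data).
import pandas as pd

# Hypothetical web-analytics data: page views per customer per site
web_visits = pd.DataFrame({
    "customer_id": [101, 102, 103, 101],
    "site":        ["store", "support", "store", "blog"],
    "page_views":  [12, 3, 7, 5],
})

# Hypothetical CRM data: annual revenue per customer
crm = pd.DataFrame({
    "customer_id":    [101, 102, 104],
    "annual_revenue": [25000, 8000, 12000],
})

# Inner join on the shared key; only customers present in both sets remain
combined = web_visits.merge(crm, on="customer_id", how="inner")

# Aggregate: total page views alongside revenue, per customer
summary = (combined.groupby(["customer_id", "annual_revenue"], as_index=False)
                   ["page_views"].sum())
print(summary)
```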
It will be interesting to see which gives way first: the unstoppable force (Big Data) or the immovable object (warehouse platforms and enterprise resources in general). Will data volumes keep increasing exponentially, forever? Will the enterprise continue to amass the physical, virtual, or cloud resources needed to handle them at all costs? Or is there a happy medium in which volumes, resources, and budgets can coexist in an uneasy peace?
These are probably some of the most fundamental questions confronting data executives, and I don’t think we’re even close to the answers.