While most of the conversation surrounding Big Data these days is squarely focused on Hadoop, Capgemini is betting that Big Data will require a small pantheon of technologies to be successfully deployed and managed.
As a result, Capgemini has inked an extensive alliance with Pivotal, a unit of EMC, under which it will co-develop a suite of Big Data applications.
According to Steve Jones, Capgemini’s director of strategy for Big Data and analytics, the company has a long history of working with EMC that is now being extended to include the suite of Big Data products offered by Pivotal. That suite, in addition to a distribution of Hadoop, includes the Greenplum massively parallel database and the GemFire in-memory database.
Jones says Capgemini is working with Pivotal to develop these applications because the Big Data lakes that IT organizations eventually build will span multiple types of databases, all of which will need to be tightly integrated with data management and governance tools. In effect, these “data lakes” will become the central repository through which all other applications access data. Rather than standardizing on Hadoop alone, however, Capgemini envisions a world where Hadoop provides a common data substrate that is accessed by a variety of database engines running different classes of enterprise applications.
In that construct, Jones says Capgemini expects SQL to remain the dominant language through which business applications access all that data. But the back-end data warehouse where all the structured and unstructured data is maintained and governed will be much more of a virtual entity.
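The article names no specific tooling, but the idea that applications keep speaking plain SQL while the engine behind the connection varies can be sketched with Python’s standard DB-API. SQLite stands in here purely so the sketch is runnable; in the architecture described, the connection would instead point at Greenplum, a SQL-on-Hadoop layer, or another warehouse engine, and the application-side query would be unchanged. The table and query are hypothetical illustrations, not anything from the Capgemini/Pivotal stack.

```python
import sqlite3

def top_customers(conn, limit=3):
    # The application-side query is ordinary SQL, independent of which
    # engine sits behind the connection.
    return conn.execute(
        "SELECT customer, SUM(amount) AS total "
        "FROM orders GROUP BY customer "
        "ORDER BY total DESC LIMIT ?",
        (limit,),
    ).fetchall()

# SQLite as a stand-in backend so the example runs self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("acme", 120.0), ("acme", 80.0), ("globex", 150.0)],
)
print(top_customers(conn))  # → [('acme', 200.0), ('globex', 150.0)]
```

Swapping the `connect()` call for one aimed at a different SQL engine is the whole point: the warehouse becomes a virtual entity behind a stable SQL interface.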
As IT organizations move deeper into Big Data in the coming year, it’s pretty clear that deploying it will involve a lot more than setting up a Hadoop cluster. While the degree to which any organization chooses to wade into a proverbial Big Data lake will vary, the thing to keep in mind is not so much how wide the lake is, but rather how deep.