As the sheer volume of Big Data continues to scale upward, the complexity of managing the overall IT environment increases exponentially. The challenge is not only figuring out how to make Big Data accessible, but also how to manage it efficiently and continually.
Coinciding with the release of the IBM PureData System for Hadoop, IBM moved to address this issue this week at a Building Confidence in Big Data event in New York with the release of its new Big Data governance tools. These include Data Click, which allows end users to self-provision data sources, and a dashboard that informs those users how much that information is actually trusted by the organization.
In addition, IBM announced Big Stampede, a new IBM consulting service specifically aimed at helping organizations launch Big Data projects at a time when the skills needed to actually launch those projects are in short supply. IBM says the Big Stampede service captures the best practices that IBM has developed from over 3,000 Big Data engagements, much of which is focused on setting up the right zones of Big Data usage across the enterprise based on the context in which the data is actually being used.
According to Bob Picciano, IBM general manager for its Information Management division, the two biggest challenges with Big Data come down to the veracity of the data and the rate at which it is being consumed.
Not all business data is of equal value, so it's incumbent on IT organizations to come up with a way to identify the level of confidence the organization has in that data, in order to be sure decisions are being made on facts that have been verified based on the lineage of the data. Almost by definition, says Picciano, all Big Data is to one degree or another uncertain.
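As a rough illustration of that idea (not IBM's implementation), the sketch below assigns each record a hypothetical confidence score derived from its lineage metadata, so downstream consumers can restrict decisions to data above a chosen threshold. The field names, source weights, and thresholds are all assumptions made up for the example.

```python
# Illustrative sketch only: a made-up confidence score derived from lineage
# metadata. Field names and weights are hypothetical, not an IBM API.
from dataclasses import dataclass

@dataclass
class Record:
    value: float
    source: str        # originating system
    verified: bool     # passed a data-quality check
    hops: int          # number of transformations since the source

# Hypothetical trust weights per source system.
SOURCE_TRUST = {"erp": 0.9, "crm": 0.7, "web_clickstream": 0.4}

def confidence(rec: Record) -> float:
    """Score in [0, 1]: trusted sources, verified data, and short lineage score higher."""
    score = SOURCE_TRUST.get(rec.source, 0.2)
    if rec.verified:
        score += 0.1
    score -= 0.05 * rec.hops          # each transformation erodes confidence
    return max(0.0, min(1.0, score))

records = [Record(120.0, "erp", True, 1), Record(87.5, "web_clickstream", False, 4)]
trusted = [r for r in records if confidence(r) >= 0.7]
print(trusted)  # only the ERP record clears the threshold
```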
Picciano also notes that the speed at which organizations can provide insight into Big Data is what will ultimately distinguish one project from another. Historical insights into Big Data are rapidly becoming a commodity capability that every organization will have. The real opportunity, says Picciano, is to be able to leverage technologies such as streaming analytics to generate actionable intelligence in real time.
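To make the distinction concrete, here is a minimal, generic sketch of streaming analytics (not tied to any IBM product): a sliding-window average over an event stream that flags a reading the moment it drifts out of range, rather than in a batch report after the fact. The window size and threshold are illustrative assumptions.

```python
# Generic streaming-analytics sketch: alert on readings as they arrive,
# using a fixed-size sliding window. Thresholds and window size are
# illustrative assumptions only.
from collections import deque

def stream_alerts(readings, window=5, threshold=2.0):
    """Yield (reading, window_avg) whenever a reading deviates from the
    rolling average by more than `threshold`."""
    window_buf = deque(maxlen=window)
    for value in readings:
        if len(window_buf) == window:
            avg = sum(window_buf) / window
            if abs(value - avg) > threshold:
                yield value, avg          # actionable in real time
        window_buf.append(value)

sensor_feed = [10.1, 10.3, 9.9, 10.0, 10.2, 14.8, 10.1]
for value, avg in stream_alerts(sensor_feed):
    print(f"alert: {value} deviates from rolling average {avg:.2f}")
```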
That capability, says Picciano, requires the ability to leverage Hadoop and SQL database platforms in a way that dynamically scales, using both Flash and magnetic storage in the form of the new IBM BLU Acceleration, an instance of DB2 that has been optimized for analytics applications running on top of a data warehouse. According to Picciano, BLU Acceleration is essentially a superset of in-memory columnar databases such as SAP HANA, one that combines the best attributes of solid-state drives (SSDs) and magnetic storage to effectively manage Big Data at scale.
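BLU Acceleration's storage format is proprietary, but the general advantage of a columnar layout for analytics is easy to show: an aggregate over one column touches only that column's values instead of every field of every row. The toy comparison below assumes nothing about IBM's format; it simply contrasts the two layouts.

```python
# Toy contrast of row-oriented vs. column-oriented layouts for an analytic
# query (sum of one column). Purely illustrative; not IBM's storage format.
rows = [
    {"order_id": 1, "customer": "acme", "region": "east", "revenue": 1200.0},
    {"order_id": 2, "customer": "globex", "region": "west", "revenue": 450.0},
    {"order_id": 3, "customer": "initech", "region": "east", "revenue": 980.0},
]

# Row store: every field of every row is read to answer the query.
total_row_store = sum(r["revenue"] for r in rows)

# Column store: each column is stored contiguously, so an aggregate scans
# only the one column it needs -- the property in-memory columnar engines exploit.
columns = {key: [r[key] for r in rows] for key in rows[0]}
total_column_store = sum(columns["revenue"])

assert total_row_store == total_column_store == 2630.0
```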
Ultimately, IBM is making a push to accelerate the rate at which Big Data projects move from being mere tactical experiments to becoming strategic business initiatives. That can't really happen, however, until organizations not only trust the data they are being presented with, but can actually manage it effectively in the moment it's truly needed.