The Business Impact of Big Data
Many business executives want more information than ever, even though they're already drowning in it.
On Tuesday, I suggested organizations consider two questions about Big Data:
- What use is Big Data to me?
- Assuming it's worth pursuing, how do I go about doing that?
I discussed the first one, which is, alas, not as cut and dried as you would hope, particularly if you're going to have to build a business case for it. It seems companies either have a Big Data problem or they don't, and if they do, they'll pursue it.
Then the question becomes how you go about that, particularly if you're not a super-large company with tons of resources. After all, Hadoop is supposed to make Big Data cheaper, but "cheaper" doesn't necessarily translate into "affordable." For instance, Hadoop is open source and can use servers you may have on hand, but good luck finding an unemployed Hadoop expert or R programmer. As IT Business Edge's Susan Hall shares, there's a shortage of Hadoop and NoSQL experts.
Vendors are well aware of this problem, and it's a hole they'd very much like to plug. That's why they're talking more about "the democratizing of Big Data." What amuses me about that term is it was also used when Hadoop hit the mainstream press to mean that Hadoop would make Big Data storage and processing available to more companies. Now, vendors are using the term to mean making Hadoop itself more accessible to companies of all sizes.
I've been watching this emerging space and talking with vendors, and thus far, I'd categorize them as:
- Solutions built on Hadoop, designed to make it more user-friendly. Into this group, I'd place IBM's Big Insights. Syncsort's proposed Hadoop edition would arguably fit into both this category and the next, because it's designed to make Hadoop easier but would require a plugin modification of the existing framework.
- Solutions that extend an existing product into Hadoop. These are not built on Hadoop, but rather function as a sort of "Hadoop add-on." Informatica is a good example of this type of solution, because it allows companies to use Informatica's user interface to read and write to Hadoop stores, as well as apply many of Informatica's core capabilities to Hadoop. "... you've taken the hundred-thousand-plus Informatica developers in the world and effectively turned them into Hadoop programmers ..." explained Informatica's CTO in a remark that also sums up the difference in this approach. I would also place Pentaho here, since its ETL tool includes native support for Hadoop data integration and movement (more on that in a future interview). By far, this seems to be the most common type of Big Data option.
- Solutions that handle large amounts of data, usually using virtualization or data federation, but are not focused on Hadoop or NoSQL. I added this category after interviewing Stone Bond last week. Stone Bond has been around since 2002 and says its solution could handle Big Data before all the hoopla about Big Data. "According to Forrester Research Inc., proprietary data management software vendors will increasingly add 'big data' processing capabilities to their portfolios in the near future, something that Stone Bond has been doing quietly for years," the company says in its press release on Enterprise Enabler Virtuoso, a data integration tool with a focus on Big Data support. "While 'big data' might be the hot topic of the moment and in turn is the recipient of billions of investor dollars, Stone Bond is the only company delivering on the real industry pain point of time-to-value and has evolved their integration technology to further prove that." Watch for my interview with Stone Bond, which will run in September.
Regardless of how they approach Big Data, the goal is the same: to make it more widely available to the average organization.
And it seems to be working. As a recent GigaOM article pointed out, a really cool film forecasting tool developed by the University of Southern California's Annenberg Innovation Lab pulled in tweets and analyzed them to predict which films would likely be hits. You'd think it would be a huge technical challenge, but no. "... it really came down to one communications masters student who learned Big Sheets in a day, then pulled in the tweets and analyzed them." Big Sheets, by the way, is an application that's part of IBM's Big Insights.
IBM isn't the only one offering such impressive solutions, GigaOM adds, writing:
What this shows is that with the rise of big data, we're also seeing the emergence of really powerful but simple tools that can democratize data analytics and business intelligence. Big data won't necessarily be handled by just data scientists; it can be wielded by non-technical people. That's a powerful idea, because it suggests a world in which we can all be data jockeys.
There are other forces at play to make Big Data more widely available, too. Frank Moss writes that "affordable cloud computing storage, open source software for processing large volumes of data, and Big Data sets being made available in the public domain" are all contributing to "democratizing Big Data."
Thanks to these types of efforts and trends, Big Data - or even very large sets of small data - may be more within reach than you might think.