Business leaders often treat data as fact, using it as a foundation (or sometimes a rationale) for decisions.
Big Data is supposed to empower us to make informed decisions, right? That’s part of the whole “data-driven enterprise” movement: making business decisions based on the data.
But here’s the thing: Big Data is often inherently messy. As a result, the answers you get back aren’t necessarily a clear picture of the situation.
So how do you put messy data to use? I’ve found three success stories that help show how even messy data can be useful, as well as a few reality checks on what leaders should expect.
UPS Improves Truck Maintenance
One of the issues you have to accept with Big Data is that it may not reveal causation. So if you want to know why a radiator hose keeps breaking, Big Data may not be the tool to use.
But that doesn’t mean the messy data from a sensor can’t be extremely useful.
Take, for example, the delivery company UPS.
UPS uses “messy” sensor data to detect heat or vibration patterns that correlate with breakdowns. Mind you, the data didn’t establish that the vibration or heat caused a part to fail. It only showed that certain patterns tended to appear before a breakdown.
UPS didn’t need to know whether the vibration or heat caused the problem. Just identifying these patterns as early breakdown indicators saved the company money.
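To make the correlation-versus-causation point concrete, here is a minimal sketch of the idea. Everything in it is an assumption for illustration: the readings are invented (they are not UPS data), and the names `history`, `pearson` and `needs_inspection` are hypothetical. It only shows that a pattern can work as an early warning without anyone knowing why the part fails.

```python
from statistics import mean

# Invented example data, NOT real UPS readings:
# (peak vibration reading, did the truck break down within 30 days?)
history = [
    (0.8, False), (1.1, False), (0.9, False), (1.0, False),
    (2.4, True), (2.9, True), (1.2, False), (2.6, True),
    (0.7, False), (2.2, True),
]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

vibration = [v for v, _ in history]
broke = [1.0 if b else 0.0 for _, b in history]

# A strong correlation says the pattern and the breakdown travel together.
# It says nothing about which one causes the other.
r = pearson(vibration, broke)

# Hypothetical rule: flag any truck whose reading is above the midpoint
# between the average "broke down" and average "kept running" readings.
avg_broke = mean(v for v, b in history if b)
avg_ok = mean(v for v, b in history if not b)
threshold = (avg_broke + avg_ok) / 2

def needs_inspection(reading):
    """Return True if a reading looks like the pre-breakdown pattern."""
    return reading > threshold

print(f"correlation: {r:.2f}, inspect above: {threshold:.2f}")
```

A real system would need far more data and a better model than a single threshold, but the principle is the same: the indicator earns its keep by predicting, not by explaining.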
Google Flu Trends’ Successes and First Failure
It’s hard for me to imagine a data type messier than search data. People routinely misspell words — assuming they’re using the same language in the first place — and anything can trigger a search, from the vagaries of the human mind to a news report about a particular disease.
How do you clean that up? In general, you don’t. But if you work with really large data sets, it turns out the data can still be useful.
Google has demonstrated this in a number of ways, from its success with Google Translate to Google Flu Trends.
Dr. Nicholas Diakopoulos, a Tow Fellow at Columbia University’s Graduate School of Journalism, recently wrote about how Google Flu Trends uses messy search data to show the spread of influenza-like illnesses two weeks before the CDC does.
He also points out that last season, Google Flu Trends had its first failure and overestimated the flu in the U.S.
Its successes continue to impress experts, but Diakopoulos says the failure should serve as a warning: When you’re dealing with messy, big data sets, there’s no guarantee you’ll get the insights you expect.
“The Google approach suggests a certain data vigilantism comprised of smart people wielding smart algorithms to act as sentinels against faulty inference,” he writes. “Big data vigilantism can help your company cope with two of big data’s main issues: messiness and sampling bias, and ultimately help contribute to growing your confidence in wielding big data in your decision process.”
Ford’s Faster Horse
Everyone’s familiar with Google’s use of Big Data, and the UPS example is very focused.
Is messy data limited to delivery trucks and search engines?
If you want to read about how broadly messy Big Data can be applied for business use, check out this piece on Ford’s use of Big Data to address a range of business problems.
The article doesn’t specifically mention “messy,” but I think it’s safe to assume that the 25 gigabytes of sensor data generated every hour by Ford’s Energi line of hybrids is messy. And it’s certain that the social network and blog data count as messy, unstructured data.
Ford puts the data to use to improve its cars and design cars that better reflect what customers want.
“The car manufacturing cannot operate anymore without understanding every aspect of its production as well as how its drivers are using the cars,” blog writer Mark van Rijmenam states. “Competition is fierce and those companies that obtain valuable insights from big data will outperform their peers. Ford is driving in the right direction with its big data strategy to be ahead of it competitors.”
Of course, the flip side is that Ford has also given us a perfect example of how sophisticated market research can fail: the Edsel.
It’ll be interesting to see whether messy, big data sets lead to more success stories or more Edsels.