There’s been a lot of talk about the data scientist since Big Data came on the scene. But when it comes to gaining truly valuable information from Big Data, you may be better off focusing less on the science and more on exploration.
Hypotheses are a central part of science, and thus far, that’s been a major part of how we use data, according to Jill Dyché, the vice president of Thought Leadership at SAS and a veteran data management expert. Business reports, for instance, are usually strongly driven by a hypothesis of one sort or another, she points out in a recent Harvard Business Review column. It’s available with a free HBR blog registration.
But when it comes to Big Data, the best and most profitable findings are the result of what she calls “low-hypothesis exploration.”
Explorers usually have only a vague goal in mind: Find a new route, be the first to visit the North Pole, discover the Fountain of Youth. In many ways, this gives them the upper hand when the aim is new knowledge and discoveries.
This isn’t the first time I’ve heard that it’s better to approach Big Data as an exploration. And it makes sense when you consider that much of the data is text or weblogs, and there is so much of it that it’s very hard to predict what it’s going to tell you.
Dyché gives several examples of real Big Data discoveries that were found through exploration rather than more traditional reports. One company managed to increase its per-shopping-cart revenue 16 percent in one month, thanks to this “knowledge discovery” approach, she writes.
But what impressed me is the example of Stanford University researchers, who used this approach on breast cancer research and learned that non-cancerous cells also contribute to cancer cell growth.
Of course, this approach also yielded what I see as one of Big Data’s more questionable (thus far) findings: A commercial lines insurer “team found that ‘loose affiliations’ with low-income friends was an indicator [of] a higher propensity to file fraudulent claims.”
That, too, was found using an informal approach to data exploration, and it may show the weakness in this approach. The finding may be legit, and far be it from me to question these people, but it does seem to open up questions of observation bias, as well as issues about what’s actionable. It strikes me as a somewhat murky area, legally and ethically, when you think about using it as actionable data. (Should I automatically be subject to more audits because I have more low-income friends than you? Would that open the company up to legal action?)
For the most part, though, taking an open-minded approach to exploring Big Data has led to some very concrete, positive findings.
Here’s the interesting part, though: This may actually be the best argument I’ve seen for keeping Big Data siloed from existing systems.
“Running discovery trials on big data should be a continuous process, where the results may feed more traditional business intelligence or drive additional discovery tests,” Dyché writes. “Sometimes this means isolating big data efforts from traditional analytics programs where delivery processes and organizational roles are already entrenched.”
Check out the full piece, which explains why it’s hard for companies to take an “exploring” approach to any data and how you can change that.
For more on this topic, you might also want to download Nov. 27’s recording of “The Briefing Room.” I personally haven’t had a chance to listen to it yet, but it was promoted as a discussion on how to explore Big Data and featured veteran EMA Analyst John Myers, as well as a briefing on Big Data analytics vendor Alteryx.