In my previous post, I shared the ongoing debate about the most effective way to approach Big Data so that it will yield meaningful, useful and, hopefully, profitable findings.
The top two contenders are approaching the data as an explorer versus Tom Davenport’s contention that you need to start with a hypothesis, which I translate as a more scientific-method-based approach.
Explorer advocates say Big Data is too big for the typical reports-driven approach, and that what’s worked for early adopters has been tinkering with the data to see what it reveals. Davenport and others contend that tinkering is a great way to waste time, spend money and create unhappy business leaders.
Jim Harris’ recent post on Big Data Theory adds further insight into the discussion on best approaches. For Harris, the first step is to ensure that you’re dealing with high quality data so that you can separate the noise from new insight.
Now, data quality is Harris’ “thing,” and who’s going to argue that it’s not important? Data quality ensures that you don’t make errors, right? Sure. But what he’s saying here goes beyond this obvious point.
Harris points to radio astronomers Arno Penzias and Robert Wilson, whose Big Data came from the sky. They expected silence from the dead of space, so they tried to use that silence to calibrate their equipment.
Instead, they heard noise. “Knowing” that nothing existed in space, they assumed it was an equipment problem. It wasn’t. A year later, having addressed every potential problem with the equipment (and, by extension, the data), they found the noise was still there.
That noise was cosmic microwave background radiation, and its existence lent support to the Big Bang Theory. The discovery earned Penzias and Wilson the 1978 Nobel Prize in Physics.
Harris says a sort of Big Data Theory has emerged today, one holding that Big Data will challenge “steady state theories that have been the bedrock of the status quo within the data management industry for decades.” The question is, when Big Data challenges what we “know,” will we dismiss it as mere “noise”?
“Even though big data analytics will reveal wonders, I can’t help but wonder how often the tepid response to it will be: ‘Yeah, well that might be what Big Data shows. But it’s just a theory,’” he writes.
So where does that leave us on exploring Big Data versus using a more methodical, science-based approach?
Another option might be: Don’t choose. Instead, take a lesson from Charles Darwin as he set sail on the Beagle and start with informed observation and an existing hypothesis to prove or disprove.
Darwin was well versed in the then-popular theory that species were stable because of “designed creation.” But what he observed during the Beagle’s five-year voyage, particularly among the Galápagos Islands, conflicted with the accepted beliefs of the time and led him to develop his theory of evolution by natural selection.
Likewise, when you explore your set of Big Data, you will be informed by what you know about your business.
Rather than set out to prove that this or that is working, look for evidence that disproves what you’re doing (your hypothesis or theory). The strength of a theory or hypothesis isn’t revealed when you try to confirm it; it shows only when attempts to disprove it fail.
The reason disproving is the better approach is simple: the evidence that overturns your hypothesis may also point to a new, potentially paradigm-shifting theory that’s better than any you could have predicted.
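To make the disproof mindset concrete, here’s a minimal sketch in Python. It assumes a hypothetical scenario of my own invention: your working theory is that average order value still sits at its historical baseline, and you ask whether the data gives you grounds to reject that theory. The baseline figure, the sample data and the variable names are all illustrative, not drawn from any real dataset.

```python
# A falsification-style check: assume our working theory (the baseline holds),
# then ask whether the observed data is improbable enough to reject it.
# The baseline value and sample data below are hypothetical illustrations.

import numpy as np
from scipy import stats

BASELINE_AOV = 52.50  # hypothesized average order value (our "status quo" theory)

rng = np.random.default_rng(seed=42)
# Stand-in for real observations pulled from your Big Data store.
observed_orders = rng.normal(loc=55.0, scale=12.0, size=500)

# Null hypothesis: the true mean still equals the baseline.
# A small p-value is evidence AGAINST our theory, not proof of a new one.
t_stat, p_value = stats.ttest_1samp(observed_orders, popmean=BASELINE_AOV)

if p_value < 0.05:
    print(f"p = {p_value:.4f}: the data disagrees with the baseline theory.")
    print("Time to explore what the data is actually telling us.")
else:
    print(f"p = {p_value:.4f}: no grounds to reject the baseline (yet).")
```

Note that rejecting the baseline isn’t the end of the analysis; it’s the invitation to explore, Darwin-style, what the data is actually saying.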