I’ve been sifting through a lot of content about Big Data recently, and most of it can be summed up as “A cautionary tale about the perils of Big Data,” built around the following motifs:
- Big Data can be biased
- Big Data will be misused
- People will get Big Data wrong, and it will cause disaster. Result: The earth will be consumed in a fiery ball. (I may be exaggerating about that last part.)
Some are certainly worth reading, particularly if you’ve never read anything on the topic. A recent piece on ReadWrite.com nicely sums up the major talking points, although I do have serious questions about what’s going on with that FBI/Excel example.
Other articles border on the hysterical and clueless. Andrew McAfee nicely sums those up, while enthusiastically cutting them down to size.
I’m not saying there aren’t major ethical and legal questions created by Big Data technologies and what they can do. I’ve written about these concerns myself, and it still freaks me out when I realize how much Google and Facebook can ascertain about me — all while I realize much of that is my own fault.
We get it: With Big Power comes Big Responsibility or whatever.
But let’s face it: Once a tech breakthrough happens, there’s just no getting the horse back in the barn. So why flail our arms about it while yelling, “The sky is falling”? After all, these things aren’t happening in a vacuum.
What this conversation needs is a bit more focus. Here’s what I suggest:
Go back to Data Management 101. For some reason, you put “Big” in front of data and everybody starts reinventing the wheel. It may be Big, but it’s still data — and data management professionals know a thing or two about how to manage it.
In your rush to find someone with “Hadoop” on their resume, don’t overlook the classical data management skills and expertise. A smart data professional can learn the technology, but data quality, metadata, business rules, data governance, data cleansing — these are skills that deepen with experience.
They’re also like black dress pumps: They style well with any and every type of data.
“Those of us in the BI and data warehousing community see it as commodity skills, but in the Big Data community, this stuff is new to people,” Jill Dyché, vice president of Thought Leadership at SAS, said. “They know how to get data into a cluster and use MapReduce and parallelize stuff and acquire and customize certain open source code and list service providers for that. But they don’t necessarily know how to manage data.”
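For anyone who hasn’t seen what that classical data management work looks like in practice, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `clean_records` function, the `email` and `age` fields, and the 0-to-120 age rule are hypothetical and not taken from any particular tool or from the people quoted here. It shows cleansing, a data quality check, a business rule and de-duplication applied to a handful of records.

```python
def clean_records(records):
    """Apply basic cleansing and one business rule; return (kept, rejected)."""
    kept, rejected, seen = [], [], set()
    for raw in records:
        # Data cleansing: trim whitespace and normalize case on the key field.
        email = str(raw.get("email", "")).strip().lower()
        record = {**raw, "email": email}

        # Data quality: a record needs a plausible email to be usable.
        if "@" not in email:
            rejected.append((record, "missing or malformed email"))
            continue

        # Business rule (hypothetical): age must be an integer from 0 to 120.
        age = record.get("age")
        if not isinstance(age, int) or not 0 <= age <= 120:
            rejected.append((record, "age out of range"))
            continue

        # De-duplication: keep only the first record seen for each email.
        if email in seen:
            rejected.append((record, "duplicate email"))
            continue

        seen.add(email)
        kept.append(record)
    return kept, rejected
```

Note that the rejected records are kept alongside a reason rather than silently dropped. That is the governance habit in miniature: someone can review what was thrown out and why, whether the data set is big or small.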
Realize Big Data is only part of the picture. Here’s a lesson from a journalist: Statistics make for a dull story. Truth: It’s darn hard to make people care about numbers or graphs or percentages or any of that. You always want to go for the human element — that’s where the real story is.
That’s also why some experts argue a data scientist’s real job is to humanize Big Data through storytelling.
The flip side: Don’t ignore the human impact of how you’re using Big Data. Ask yourself, “If it’s leaked that we’re using data this way, will it cause a Twitter storm? Will it make the evening news? Will it wind up in court?” If the answer to the first question is yes, go have a long discussion with your boss. If the answer to either of the last two is yes, go have a long discussion with your PR and legal departments.
Let’s get real about regulations and legislation. Before you put Big Data to use, sit down with legal and have a long conversation about how compliance and privacy issues might factor into your work.
Likewise, push for real political discussions and education about the use of Big Data.
Yes, we’ve all heard the classic arguments about how the industry will regulate itself, because in the long run, that creates a competitive advantage.
Two problems with that: Big Data doesn’t belong to only one industry, and we all know that by the time business gets its act together on this, the damage will be done. In fact, the damage is already being done, according to the ReadWrite Enterprise article and others like it.
So let’s just go ahead and admit that, yes, technology sometimes needs to be coupled with responsible regulations, and this is one of those times. You might even want to write your own letter to Congress.
Big Data is exciting, there’s no doubt about that. I hope it does cure cancer. I hope it does make new robotics possible. I hope technologists and scientists haven’t figured out even a third of the ways we’ll eventually use Big Data.
Still. You obviously can’t get there by being careless with Big Data.
“… the more data has the potential to impact our organizations, the more humble and circumspect we should become in using it,” warns Matt Asay of ReadWrite Enterprise.
Agreed. Let’s just avoid being so circumspect that we’re paralyzed into helplessness.