I remember the first time I heard the terms “business intelligence” and “analytics.” Business. Intelligence. Yep, that was something I could get behind.
Then I figured out that it really amounted to business statistics, automated to a certain extent by a computer. It was a bit of a bummer, really.
It seems the term “data science” is likewise overrated.
IT consultant Robin Bloor, in a fabulous piece, points out that there’s really no such thing as “data science.” In fact, what we’re calling data science has very little to do with science and everything to do with mathematics — specifically, statistics.
“If you are already tired of the term ‘big data,’ but not yet tired of the term ‘data science,’ let me help you get there as swiftly as possible,” Bloor writes. “If there were a particular activity devoted to studying data, then there might be some virtue in the term ‘data science.’ And indeed there is such an activity, and it already has a name: it is a branch of mathematics called statistics.”
It’s an excellent point from a devoted word pedant. Bloor loves to explore language, as I’m sure you know if you follow his Twitter feed.
Bloor says he’s writing a philosophy rant, but it has very real, practical implications for those of you pursuing Big Data or really any kind of data analysis:
- You won’t hire a “data scientist” to perform “data science,” because neither exists in any practical way. So quit worrying so much about data scientists. Bloor acknowledges that someone could have the actual credentials of a data scientist (one who manages data and data flows, understands a specific business industry, and knows how to use statistics), but that person would be “almost as rare as hen’s teeth.”
- What you will do is create a team to handle this kind of business statistics.
- The team will probably fall under “research & development,” rather than IT.
“If you wish to develop such a capability, the sensible way to proceed is to put together a multi-disciplinary team of individuals with a set of well-defined goals who collectively possess the required skills,” Bloor writes. “The one in charge should have a title like Project Director or Research Director. He is not obliged to wear a white coat.”
The term “data scientist” isn’t the only data-related term that suffers from a philosophical definition problem. In a recent Information Management column, Forrester analyst Michele Goetz contends that “data,” “data quality,” and “data rules” are terms that mean different things to IT and to marketing.
“When IT talks about data, it is talking of the physical elements stored in systems,” Goetz states. “When marketing talks about data, it is referring to the totals and calculation outputs from analysis.”
To marketing, data quality refers to completeness, not to whether the rows and columns translated correctly. And data rules mean algorithms, not transformations.
Once again, the terms we use are creating a disconnect between IT and the business.
So where does this leave IT?
“IT still needs to support the heavy data integration requirements across marketing channels and 3rd party data sources,” Goetz writes. “Creating consistency through data integration and quality best practices are critical to ensure integrity in behavioral and attribute linkage.”
IT is still responsible for data quality in terms of data integration, she writes. But that’s different from what marketing means when it talks about data quality as completeness, which is one reason that data quality matters less when it comes to Big Data.
If you’d like to learn more about taking a team approach to “data science,” check out Information Management’s On-Demand viewing of “Building the Data Science Team,” which features analyst Evan Levy, vice president, Business Consulting, at SAS Institute.