Jill Dyché (@jilldyche), vice president of Thought Leadership at SAS, explains to IT Business Edge’s Loraine Lawson how Big Data is changing what’s possible with BI and advanced analytics — and how the tools are evolving to make Big Data usage easier.
Lawson: When we talk about Big Data and enterprise systems, even CRM (customer relationship management), it's primarily about bringing data out of those systems rather than changing how they function. Is that correct?
Dyché: Yes, exactly. There are two applications of Big Data. One is to do something brand new that just hasn't been cost effective before and the other is to make something that we're currently doing faster.
Just in the world of SAS, for instance, SAS customers have been doing credit risk scoring for years, but now SAS also has a Big Data platform called HPA (High-Performance Analytics). Now with HPA, these guys can do it in seconds. They can predict a risk score and give somebody a credit rating in seconds, as opposed to slogging through this long business process.
So you're right in the sense that we are leveraging existing data, in that BI is still in a lot of ways isolated. But if you look at BI, there's the reporting and dashboard sort of environment and then there’s the advanced analytics environment. From the advanced analytics standpoint, the Big Data message is a lot more compelling than just putting data into Hadoop so that I can actually access it from my dashboard.
Lawson: Can you talk about how that advanced analytics piece of Big Data works and what you see people doing with that?
Dyché: Let's just presume that when we're talking about Big Data, we're talking about the new Big Data infrastructure landscape, like Hadoop, open source and all that stuff, because there are still people in the data warehousing community who insist that Big Data is no more than 1 terabyte on a data warehouse. So, just level setting on that one.
If I'm saying based on yesterday's transaction, who has the potential to defraud me, Big Data’s low-cost commodity hardware, open-source software environment can pay for itself in weeks or months. Depending on what the business is trying to do with products, what we’ve seen in the automobile industry is it can pay for itself in terms of getting out in front of product defects and avoiding recalls.
I think the general rule of thumb -- and I don't want to generalize too much with Big Data because there's all sorts of new stuff going on -- is that the more computationally intensive the processing is, the more compelling the new technologies are.
Lawson: We talk a lot about a talent or skills gap with Big Data. I've seen offerings for templates. Are BI tools changing the way they work to handle some of that? I know there are connectors, but what else is being done to make Hadoop user-friendly?
Dyché: Two things there. The first one is that most of the BI vendors are supporting connectors into Hadoop now, so essentially the Cognos and the Business Objects of the world can use Hadoop as a data source. If I've got specialized data on Hadoop that may or may not be on my data warehouse platform, which is pretty normal in Hadoop environments, then my BI tools can go against Hadoop as well.
The other scenario is that the BI vendors also understand that with the Big Data infrastructure, there's a consumability issue, hence the rise of the data visualization tools. SAS has Visual Analytics, there's QlikView, there's Tableau -- all are making their user interfaces a lot friendlier because the average user needs to be able to consume the data.
There’s this term, the “democratization of data analytics,” because we can't rely on specialty "data scientists" to help us translate the data in our Hadoop environments to make it meaningful. The scale is just too massive to rely on individual talent sets for that.
I'm writing a research report right now with Tom Davenport and we've talked about the data scientist role. What we're finding on the ground with our customers is that the expectations for that individual role are so lofty, it's just become completely impractical to expect any one person to understand the data, understand the data sources, understand the data integration rules, understand the business rules, understand the metadata, understand the data access, understand data privacy, understand data security, understand how to cleanse the data, et cetera, et cetera.
It's a fun thing to talk about, but on the ground, the specifics of the role are very unclear. In the worst-case scenario, we're setting people up for failure.
Lawson: It sounds like distributed computing needs distributed human resources.
Dyché: One of my messages at Gartner (conference) was the whole data governance, master data management, data quality era is in the early adopter stage. Those of us in the BI and data warehousing community see it as commodity skills, but in the Big Data community, this stuff is new to people. You go to a Big Data conference and people are talking about either platform or applications. They're not talking about data, ironically.
It’s fascinating -- they don't have these skills. They know how to get data into a cluster, use MapReduce, parallelize stuff, acquire and customize certain open source code, and enlist service providers for that. But they don't necessarily know how to manage data. So it's a huge career growth opportunity for people with classic data management training.