John Lucker, principal for Deloitte’s Advanced Analytics and Modeling Sector, says there are best practices companies can follow to ensure the best outcomes with business intelligence. He outlines those practices to IT Business Edge’s Loraine Lawson in the second part of her interview with him. In the first half of this interview, Lucker explained how data virtualization helps with BI.
Lawson: Many organizations struggle with integration. Are there best practices? Is there a way to think through this issue before you approach it so that you can avoid these problems?
Lucker: There are a variety of best practice areas around how to prepare an organization for that type of integration and BI success.
A lot of organizations really lose sight of the whole 80/20 rule around data hygiene and what data needs to be available to them. What ends up happening is that organizations never quite finish what they want to achieve with BI and with making data available to an end user. So focusing on the 80/20 rule around data availability and integration is important.
The other thing, as far as best practice goes, is that a lot of organizations don’t look broadly across the organization to bring together the rich array of internal and external data that they have. They don’t often look for ways to create synthetic information from that internal and external data to paint a picture for the business user or the BI user, one that gets at the key performance indicators or the management metrics in a more accurate or streamlined way. What ends up happening is that a lot of these tools and systems are more complicated than end users are able to work with, because the data is integrated and then just presented to the end user as if they were a very deep BI user.
In my experience, from a best-practice perspective, it’s best not to presume that you're going to end up with a bunch of BI techies all over your company. People have a somewhat limited aptitude for this stuff, and while each division may have one or two or three people who become their go-to people, if you expect any level of broad adoption of BI and information management tools and precepts inside an organization, there has to be a lot of thought around how this data is integrated and how it’s, to some degree, spoon-fed for consumption.
I would say a lot more time is spent on the technical nuances, delivery mechanisms and tool availability, and not enough on, “How are we going to help our business users ask the really tough, vexing questions?” And, “Is the tool really intuitively usable by people who don’t have degrees in computer science?”
Lawson: You mentioned “synthetic information.” Can you just clarify for me what you meant? I’ve never heard that term.
Lucker: Sure. Synthetic information means creating new data variables or observations that don’t exist anywhere in raw form. It’s taking internal and external data, performing often very elaborate calculations with it, and coming up with a new way of looking at the data that is secondary to the primary data itself.
I’ll give you an example: In some of the advanced analytics work we do, we end up getting into geospatial observations about the way people, or customers, behave. An example might be consumption of a product based on the distance customers live from the nearest outlet for that product. The distance is typically not stored anywhere, but you do have the address where the product is available and you have the address of the person.
So you can create a synthetic variable that calculates either their travel distance or the straight-line, as-the-crow-flies distance. Then maybe you combine that with other information to create the ratio of sales relative to distance traveled. That might be a very important metric for certain types of retailers. I’m just making this up as an example. You might see that sales decline as distance increases.
That’s a synthetic data field, which doesn’t exist anywhere, but it uses potentially three or four different types of information from different data sources and then it ends up being one piece of information.
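The synthetic variable Lucker describes can be sketched in a few lines. This is a minimal illustration, not anything from the interview: the coordinates, column names, and sales figure are all hypothetical, and it assumes the customer and outlet addresses have already been geocoded to latitude/longitude.

```python
import math

def haversine_miles(lat1, lon1, lat2, lon2):
    """Straight-line ("as the crow flies") distance between two points, in miles."""
    r = 3958.8  # Earth's mean radius in miles
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def sales_per_mile(customer, outlet, sales_total):
    """Synthetic variable: ratio of sales to distance from the nearest outlet.

    Combines three raw inputs (two addresses and a sales figure) into one
    derived observation that is stored nowhere in the source systems."""
    distance = haversine_miles(customer["lat"], customer["lon"],
                               outlet["lat"], outlet["lon"])
    return sales_total / distance if distance else None

# Hypothetical records: geocoded customer and outlet locations plus a sales total.
customer = {"lat": 41.88, "lon": -87.63}
outlet = {"lat": 41.79, "lon": -87.60}
ratio = sales_per_mile(customer, outlet, 1200.0)
print(round(ratio, 1))
```

The key point is the last step: the ratio exists in no source table; it is computed on the fly from pieces of data that live in different systems.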
Lawson: What are the lines in terms of giving users access to BI tools? My thought is, as we start to open up the data, we are also creating a situation where the data can be misused to prove someone’s pet project. Are there ways to ensure people aren’t misusing data, by creating the wrong kind of synthetic information or adding up two and two and getting five?
Lucker: That is a huge issue, and some type of quality control over reports should be part of any governance process. Every business needs not just technical data governance and IT governance; it also needs a business information or analytics governance process.
You’ve got 10 different people banging away on the data, and you don’t necessarily have any way of knowing how a given figure was calculated. If you're making critical business decisions from information that’s presented to someone, then there needs to be accountability and an audit process back to where the data came from and whether it was calculated correctly. Creating analytic governance is an important question and an important management topic.
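One way to make the audit trail Lucker calls for concrete is to have every derived figure carry its own lineage: which sources fed it, what the calculation was, and who reviewed it. The sketch below is purely illustrative; the class and field names are hypothetical, not a standard governance schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional

@dataclass
class AuditedMetric:
    """A derived figure that records its own lineage for later review.

    Hypothetical structure: one way an analytic-governance process could
    track where an ad hoc number came from and whether it was checked."""
    name: str
    value: float
    formula: str                                   # human-readable calculation
    sources: List[str] = field(default_factory=list)  # datasets the inputs came from
    computed_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())
    reviewed_by: Optional[str] = None              # unset until peer review signs off

    def sign_off(self, reviewer: str) -> None:
        """Record that a QC/peer review validated this figure."""
        self.reviewed_by = reviewer

# Usage: an analyst's ad hoc metric, then a governance sign-off before
# it can inform a decision.
m = AuditedMetric(
    name="sales_per_mile",
    value=187.2,
    formula="total_sales / distance_to_nearest_outlet",
    sources=["crm.customers", "ops.outlet_locations", "finance.sales"],
)
m.sign_off("j.doe")
print(m.reviewed_by)  # → j.doe
```

The design choice here is that accountability travels with the number itself, so a decision maker can always trace a figure back to its inputs and its reviewer.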
Lawson: Is that something businesses are aware of and do?
Lucker: I think generally they are for production-level reports. Usually, if I’m creating a report that’s going to run routinely, and people are going to look at it and make decisions from it, or at least monitor the progress of the company with it, most companies understand the importance of having a peer-review, QC process for that.
But if you’ve got a bunch of people who are acting like cowboys all over a company — probably not a good choice of words — but people who are business analysts, data analysts, analytic analysts who are out there querying and summarizing and reporting, I think there needs to be a governance process to make sure that, before a decision is made using some of those ad hoc pieces of information, the data has been checked and validated in some way.
That becomes very challenging when things are done in real time, and executives are expecting questions to be asked and decisions to be made from the answers quicker than ever before.
We see examples of this in real life. If you think about the last election cycle, there was a lot of criticism around some of the information gathering mechanisms for exit polls and about how wrong they ended up being. When they traced that back, it had to do with a combination of things ranging from sampling methods of the data all the way to how some of the calculations were being made.
It illustrates something that I think we’re all familiar with. We’ve seen a lot of cases where the media or business itself is making instantaneous decisions, and sometimes those decisions are based on thinking that isn't as mature as it should be — like some of the program trading, as an example. Every once in a while you read about giant movements in the stock market that were triggered by some kind of anomaly in the data, which triggered a program to sell or buy a bunch of stock.