My friend and I have a running joke. We’ve decided we liked 35 so much, we’re going to stick with it for a decade or so.
Okay, it’s not particularly clever, but to us, it was worth a quick laugh. It probably wouldn’t be so funny if I handled data quality at Paytronix, a company that manages customer loyalty programs for restaurant chains.
When Paytronix analyzed its data quality, it found that approximately 10 percent of customers lie about their age. Another 18 percent leave it blank. Couple that with about 25 percent of restaurants that don’t even ask, and you’ve got a real problem with a significant demographic identifier.
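To get a feel for how those three figures stack up, here is a rough back-of-the-envelope sketch. It rests on assumptions that are mine, not Paytronix’s: that the 10 percent and 18 percent figures describe customers at restaurants that do ask for age, and that customers are spread evenly across restaurants. The function name `usable_age_share` is hypothetical and purely illustrative.

```python
def usable_age_share(lie_rate, blank_rate, not_asked_rate):
    """Estimate the fraction of customer records with a trustworthy age.

    Assumption (not from the source): lie_rate and blank_rate apply only
    to customers who were actually asked, and customers are distributed
    evenly across restaurants.
    """
    asked = 1.0 - not_asked_rate                     # share of customers who were asked
    honest_given_asked = 1.0 - lie_rate - blank_rate  # asked, answered, and truthful
    return asked * honest_given_asked


# Figures quoted above: 10% lie, 18% leave it blank, 25% are never asked.
share = usable_age_share(lie_rate=0.10, blank_rate=0.18, not_asked_rate=0.25)
print(f"Usable age share: {share:.0%}")
```

Under those assumptions, only a little more than half of the age records would be trustworthy. If the percentages overlap differently in the real data, the number would shift, but the point stands: the gaps compound.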
That’s not an unusual issue with Big Data, according to Gartner Research Director Svetlana Sicular, who shared Paytronix’s approach to solving the problem in a recent post.
Like Paytronix, many individual companies are struggling to find their own way with Big Data and quality issues, she writes. These problems can introduce unintended bias into the data, she adds.
While that’s not unexpected, it is troubling, especially when you couple it with Tom Davenport’s recent remarks about how much time data scientists already must spend just trying to analyze Big Data sets.
While writing “Big Data@Work: Dispelling the Myths, Uncovering the Opportunities,” Davenport observed data scientists at work. During his talk at VentureBeat’s DataBeat conference, Davenport said data scientists would need better data integration and data cleansing tools before they’d be able to keep up with the demand within organizations.
I’m sure Davenport will receive a number of vendor pitches after a statement like that. Feel free to send them straight to him.
Bottom line: There’s still a lot to work out with Big Data, from tooling to best practices. What can you do in the meantime?
Here’s a tip: Check out Rick Sherman’s Big Data implementation checklist, which appeared on TechTarget’s SearchBusinessAnalytics. Sherman is a data management expert, and while he doesn’t specifically address data quality, he does offer some valuable and applicable advice: Set realistic expectations and manage them proactively.
If data quality is a problem, be forthcoming about that. If your data scientist is too busy with the basics to handle multiple projects, let that be known, too.
“If you let expectations get out of hand and then can't meet them, your big data implementation could be viewed as a failure regardless of the business value it does produce,” Sherman warns. “Constrain expectations to realistic levels at the outset — and continue to do so throughout the project.”