Ian J. Kalin, the new chief data officer for the Commerce Department, certainly seems to understand that data is the new oil. That makes sense, given his roots at a small energy startup company and his work with the U.S. Energy Data Initiative. Surely, if anyone can understand the value of data as fuel, it would be Kalin.
So when TechRepublic contributor Alex Howard asked him to compare an “infinitely replicable digital commodity” to a natural resource like oil, you’d expect nuance. His answer doesn’t disappoint:
If I could argue the analogy, just for fun, data may be infinitely usable, in terms of its accessibility on the internet, so to speak, but in the technology tools to perform the ETL [Extract, Transform, and Load], the insights that will allow good data cleansing to be performed, the stewardship and obligation to ensure the data set that is part of a certain standard maintains the standard, when the standard revises, there's still labor, there's still a scarcity that prevents data quality from being instantaneous. That principle, that reality that you don't have enough people who know how to do this stuff and make it better, I think drives data to be a type of scarce resource, or at least quality data, and brings it closer to the analogy of fuel.
I know what you’re thinking: Finally! Someone who understands how much integration and data quality matter! You can imagine how thrilled I was, after writing about this for nearly 10 years.
Still, treating quality data as a scarce resource, something closer to fuel than to a free public good, may be problematic for a government CDO, and possibly for any CDO. Nor is it a passing remark: The theme comes up two more times in the interview.
The second instance comes when he recounts an exchange on an international panel about open data. He noted that the French pay for government data that other countries might consider open data (i.e., data sets freely available to all citizens). When he raised this point, the French participants became angry and told him that the quality of those data sets would go down if they were made open.
“I have to confess that was a lesson for me. It was an assumption that I had wrong,” Kalin says. “I learned a lot in that one meeting about some of the international differences for such a product.”
Finally, Howard presses him directly on a quality-based service level agreement for government APIs, under which raw, dirty data would be free while clean data would cost money. Kalin doesn’t dismiss the idea; instead, he points to a NOAA experiment that he says is engaging people in a “very fair process to determine how they should start to think about it.”
Kalin worked at an organization that essentially repackaged and sold open data, so it makes sense for him to think this way. But it’s an idea that merits deep debate for CDOs in other organizations — and especially government CDOs.
On one hand, he’s right: Data quality does cost money, and it adds value. Why shouldn’t the Commerce Department, or, for that matter, any organization, be able to pass those costs on to the people who use the data, both internally and externally? On the other hand, shouldn’t data quality be a priority for the Commerce Department, or any organization, from the start? You have to wonder what kinds of wayward decisions the department might make if data quality isn’t a priority internally. And, to borrow a line from ZDNet Editor-in-Chief Larry Dignan’s sermon on government data, how much money might the government save if it invested in data quality? After all, that’s the primary reason organizations take up data quality in the first place: Bad data costs organizations real money in concrete ways.
There are also broader organizational goals to consider. In the Commerce Department’s case, you have to ask how charging for data quality will square with the administration’s stated strategic objective, which is to deliver not just open data, but “quality, timely and well-described open data.”
Hopefully the question will remain more a philosophical debate than a practical one. While some countries (ahem, the UK) have experienced major data quality issues in their published open data sets, that hasn’t been a major problem in the U.S. so far. One data startup even told me privately that U.S. open data sets aren’t as “unclean” as enterprise data sets.
Loraine Lawson is a veteran technology reporter and blogger. She currently writes the Integration blog for IT Business Edge, which covers all aspects of integration technology, including data governance and best practices. She has also covered IT/Business Alignment and IT Security for IT Business Edge. Before becoming a freelance writer, Lawson worked at TechRepublic as a site editor and writer, covering mobile, IT management, IT security and other technology trends. Previously, she was a webmaster at the Kentucky Transportation Cabinet and a newspaper journalist. Follow Lawson at Google+ and on Twitter.