The government released 4.4 million medical payment records this week as part of the Open Payments database, and the release is already drawing national headlines and criticism for being incomplete and slow.
It’s a major reminder that while open data may be free, it isn’t necessarily clean.
NPR, the Wall Street Journal and Forbes have all reported on the controversial data release, which is required under a provision of the Affordable Care Act. The records show $3.5 billion in payments made by pharmaceutical and device companies to doctors.
The release is being criticized for a number of reasons, but the actual data problems boil down to two major issues:
- There’s no context. The records don’t show whether these payments represent potential conflicts of interest or legitimate financial relationships, as this Fierce Health IT article explains.
- The data has major quality problems. Records are missing, incomplete and, in some cases, potentially wrong. CMS has already omitted one-third of the payment records submitted last year due to data problems that could lead to mistaken identification, NPR reports. On top of that, a chunk of the current data is incomplete: about 64 percent of the total spending listed doesn’t specify which doctor or hospital received the money, according to ProPublica.
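For readers who want to check this kind of gap themselves, here is a minimal sketch of how a figure like "share of spending with no named recipient" could be computed. The field names `recipient` and `amount` and the sample rows are hypothetical, invented for illustration; they are not the actual Open Payments column names or values.

```python
from decimal import Decimal


def unattributed_share(records):
    """Return the fraction of total spending whose recipient is missing.

    Each record is a dict with hypothetical keys "recipient" and "amount".
    A recipient counts as missing if it is absent, None, or blank.
    """
    total = Decimal("0")
    missing = Decimal("0")
    for rec in records:
        amount = Decimal(str(rec["amount"]))
        total += amount
        recipient = rec.get("recipient")
        if recipient is None or not str(recipient).strip():
            missing += amount
    if total == 0:
        return Decimal("0")  # avoid dividing by zero on an empty data set
    return missing / total


# Made-up sample rows, for illustration only.
sample = [
    {"recipient": "Dr. A", "amount": "100.00"},
    {"recipient": "", "amount": "250.00"},
    {"recipient": None, "amount": "50.00"},
    {"recipient": "Hospital B", "amount": "100.00"},
]

share = unattributed_share(sample)
print(f"{float(share):.0%} of spending has no named recipient")
```

Weighting by dollar amount rather than counting rows matters here: a handful of large unattributed payments can dominate the total even when most individual records are complete.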
“The release could be viewed two ways: as a detailed view of the underbelly of U.S. medicine, or a flawed, sloppy release of partial information that will confuse rather than elevate understanding,” Politico writes.
I don’t know about that. Why can’t it be both? Even with these significant data quality problems, ProPublica’s first deep dive into the records gives the public some insight into noteworthy spending trends.
Also, this isn’t the only open data release, or even the first, to face major criticism over data quality and usability issues. Experts have long warned that open data faces two major problems:
- Data quality. Open data is often raw data, and as a result, it’s often unusable and inaccessible.
- The potential for misuse. The Wall Street Journal covered how this is a major point of contention over the Open Payments data, but it’s an issue with other open data sets too, experts say.
“Big data is our generation’s civil rights issue, and we don’t know it,” Alistair Croll, an industry watcher and analyst, wrote back in 2012.
It seems we’re finally learning about it the hard way.
Loraine Lawson is a veteran technology reporter and blogger. She currently writes the Integration blog for IT Business Edge, which covers all aspects of integration technology, including data governance and best practices. She has also covered IT/Business Alignment and IT Security for IT Business Edge. Before becoming a freelance writer, Lawson worked at TechRepublic as a site editor and writer, covering mobile, IT management, IT security and other technology trends. Previously, she was a webmaster at the Kentucky Transportation Cabinet and a newspaper journalist.