Are Big Data Solutions the Key to Solving Health Care's Data Problems?

Loraine Lawson

Oracle recently released a report noting, among other things, that health care isn’t prepared to manage Big Data. That’s hardly shocking, since health care seems largely inept at managing any data, much less Big Data, which is generally defined as having one or more of these characteristics:


  • Variety, meaning structured, semi-structured and unstructured data
  • Velocity, meaning you want it moved at high speeds
  • Volume, think petabytes and terabytes


I was recently discussing this definition with a friend who writes about health care IT. And the more I thought about it, the more I realized that maybe health care IT doesn’t have a data problem so much as it has a Big Data problem.

What do I mean? Well, most health care records actually fall into the domain of Big Data more than your typical, relational database kind of data. Specifically:


  • Most health care records are actually unstructured data, e.g., text documents or images. Doctors’ notes on patients, nurses’ care plans, lab results, x-rays and MRI results all fall well outside the domain of structured data. In fact, except for billing data, most of what we consider health care records would seem to fall into data types other than structured data. So, clearly, health care IT is dealing with a variety of data types.
  • Health care data is often high volume, particularly when you’re talking about a state or national electronic health records system. What’s more, when you deal with images, like x-rays or other scans, you’re increasing the data’s volume in terms of storage requirements.
  • Finally, most health care records need to be moved relatively quickly, and as individual records. So, if I’m having a consult tomorrow with a surgeon, then the x-rays need to be at the office by morning. Right now (I kid you not) this is handled by me, driving between the two locations. But with the right technology, there’s no reason those files couldn’t be sent electronically. Beyond individual records, being able to process medical records at high speeds across a geographical area would help doctors identify health trends and possible disease outbreaks sooner. So, velocity will need to play a role in any effective health records system.


I’m in no way an expert, but after a few years of writing and reading about Big Data and health care, I see a clear use case for Hadoop and other Big Data technologies in health care.

In fact, if I may be so bold, maybe health care’s data problems are not entirely caused by niche vendors, data silos and a lack of investment. Maybe the reason health care IT is such a mess is that the existing tools couldn’t handle Big Data needs in an affordable way.

If that’s true, then emerging technologies such as the Apache Hadoop stack could be just what the doctor ordered.

Aug 21, 2012 11:31 AM Dr George Margelis says:
There is a lot of data in healthcare and, as you rightly state, in various formats. The greatest challenge is to develop systems that separate the clinicians from the technology, so that their normal record system generates the data and puts it into the relevant data system, and at the same time takes the relevant data from the big data analytic system and delivers it to the clinician at the point of care to improve their quality of care. The other challenge is to ensure that the data that goes in is accurate and reliable, which is a skill we must teach our clinicians.
Sep 18, 2012 10:08 AM Ruan O'Tiarnaigh says:
I agree with all the comments above but would suggest that having the data is not necessarily the solution. Having a mechanism to create reliable knowledge from this data is. There then remains another question: who is best positioned to do this task? Clinicians may not be statisticians or conversant with the tools of data science required to glean that knowledge from the data. Perhaps what is required, along with better tools, is better communication channels to allow clinicians to work more closely with the data scientists. A suitable forum, with access to the right data and a mechanism for a body of clinicians to cooperate to a) select the most relevant answers to be looked for and b) review the results, would be marvelous. However, this relies on a level of altruism which may not be prevalent.
