The Pros and Cons of Combining Master Data Management with Big Data

    Confusion surrounds the topic of how to bring some sense of order to Big Data. Depending on the day, the discussion might come down to data quality, data governance or master data management.

    Here’s a hint: One of these is much less necessary than the others. You should always understand the quality of your data, big or otherwise. And it’s just basic legal smarts to create governance rules about data lest you fall afoul of regulatory requirements.

    But when it comes to master data management and Big Data, you may be better off leaving each to its own. If you’re not clear on why, I recommend this post by veteran integration technologist Kumar Gauraw, who takes you through his thought process on why MDM and Hadoop don’t match.

    “Master data is, usually, relatively small in volume when compared with transactional data and definitely much smaller than the big data when it comes to volume,” he writes. “Big data, on the other hand, is about much larger volumes. In fact, big data is about processing a data volume so large that the current RDBMS databases struggle to handle it.”

    So, Big Data might fill in MDM gaps or MDM might serve as a reference source for customer or product data when running Big Data analytics. But for the most part, Gauraw concludes that they’re two separate systems with different goals.

    Until recently, that seemed to be the conclusion most people reached about MDM and Big Data. Then last fall, Capgemini’s Christophe Leroquais penned a piece demonstrating three possible points of intersection between these two systems:

    1. Big Data can feed MDM, he points out, by enriching the data in the MDM hub.
    2. MDM can feed Big Data by “providing the data model backbone to bind the Big Data facts.”
    3. MDM can help navigate within Big Raw Data. This fits in with a relatively new use case for Big Data called exploratory analysis, he writes.

    “The user does not always know exactly what he is looking for; he may discover insights by navigating through the raw data,” writes Leroquais. “In order to avoid getting lost within the data ocean, it is necessary to navigate while keeping sight of reference points. This is where the exploration must call on the MDM database.”
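    To make the second and third intersections a little more concrete, here is a minimal Python sketch of master records acting as reference points for raw event data. It is only an illustration: the record layout, the IDs and the function name bind_events_to_master are hypothetical and do not come from Leroquais's piece or from any MDM product.

```python
from collections import defaultdict

# Hypothetical master records, keyed by a master customer ID.
MASTER_CUSTOMERS = {
    "C001": {"name": "Acme Corp", "segment": "enterprise"},
    "C002": {"name": "Bolt Ltd", "segment": "smb"},
}

# Hypothetical raw events, as they might arrive from a Big Data store.
RAW_EVENTS = [
    {"customer_id": "C001", "event": "page_view"},
    {"customer_id": "C002", "event": "purchase"},
    {"customer_id": "C001", "event": "purchase"},
    {"customer_id": "C999", "event": "page_view"},  # no master record
]


def bind_events_to_master(events, master):
    """Group raw events under their master record and collect the rest."""
    bound = defaultdict(list)
    unmatched = []
    for ev in events:
        customer_id = ev["customer_id"]
        if customer_id in master:
            bound[customer_id].append(ev["event"])
        else:
            # Events with no reference point are kept aside for review.
            unmatched.append(ev)
    return dict(bound), unmatched


bound, unmatched = bind_events_to_master(RAW_EVENTS, MASTER_CUSTOMERS)
for customer_id, event_names in bound.items():
    print(MASTER_CUSTOMERS[customer_id]["name"], event_names)
print("unmatched events:", len(unmatched))
```

    The unmatched list is the interesting part for the first intersection: events that point at no known master record are candidates for enriching the MDM hub.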

    He adds that this last use case presents a major challenge: MDM tools would need to evolve.

    Those three points of intersection certainly help in the short term, but I think they leave the real question unanswered. What everybody really wants to know is whether there might one day be an MDM tool designed for Big Data sets.

    The real barrier here, as Justin Risch points out in a recent blog post, is that MDM is designed to work with structured, relational data, and Big Data sets are often unstructured data. Risch focuses specifically on how MDM might work with NoSQL databases, such as Cassandra.

    “Unfortunately, NoSQL and MDM do not go well together conceptually – Master Databases are relational by definition,” Risch writes. “So how, then, do we get the two to play nicely without feeling like we’re shoehorning concepts together?”

    His reason for wanting to merge MDM with NoSQL is simple: Master data is found in all web applications of sufficient size and complexity, he writes. For instance, an order will have at least customer and product data. There’s also the coming data deluge from the Internet of Things, which is getting attention from MDM vendors, as Software AG’s Charlie Greenberg points out.

    The problem is, how do you capture master data without giving up the scalability and flexibility of NoSQL? Risch considers several options, including a key-value schema and a document database, but finally settles on a graph database as a potential means of merging both systems.
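    For readers who have not seen master data expressed as a graph, a rough sketch in plain Python follows. It is not Risch's design and it uses no real graph database API; the node IDs, the relationship names PLACED_BY and CONTAINS, and the helper products_for_customer are all assumptions made for illustration. It simply shows the order, customer and product example modeled as nodes and labeled edges.

```python
# Nodes: ID -> properties. Customers and products are the master entities;
# orders are the transactional records that reference them.
nodes = {
    "cust:1": {"type": "Customer", "name": "Acme Corp"},
    "prod:1": {"type": "Product", "name": "Widget"},
    "prod:2": {"type": "Product", "name": "Gadget"},
    "order:1": {"type": "Order"},
    "order:2": {"type": "Order"},
}

# Edges: (source, relationship, target). The relationship names are hypothetical.
edges = [
    ("order:1", "PLACED_BY", "cust:1"),
    ("order:1", "CONTAINS", "prod:1"),
    ("order:2", "PLACED_BY", "cust:1"),
    ("order:2", "CONTAINS", "prod:2"),
    ("order:2", "CONTAINS", "prod:1"),
]


def products_for_customer(customer_id):
    """Walk customer -> orders -> products and return distinct product names."""
    orders = {
        src for src, rel, dst in edges
        if rel == "PLACED_BY" and dst == customer_id
    }
    product_ids = {
        dst for src, rel, dst in edges
        if rel == "CONTAINS" and src in orders
    }
    return sorted(nodes[pid]["name"] for pid in product_ids)


print(products_for_customer("cust:1"))
```

    Each customer and product exists exactly once as a node, which is the master data idea, while new relationship types can be added without a schema change, which is the flexibility NoSQL users want to keep.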

    Risch found one example of a NoSQL graph-based solution, the Spectrum MDM Hub by Pitney Bowes. Alas, he wasn’t able to vet it, and the documentation he found was a year old. But Risch, at least, is hopeful.

    “In the end, Graph Databases may be the key to merging the best of both of these systems,” he writes. “There may not be a well-known, easily implemented, out-of-the-box-solution for integrating the two concepts at the moment – but the future is bright. I look forward to seeing more from the best in the MDM industry.”

    Loraine Lawson is a veteran technology reporter and blogger. She currently writes the Integration blog for IT Business Edge, which covers all aspects of integration technology, including data governance and best practices. She has also covered IT/Business Alignment and IT Security for IT Business Edge. Before becoming a freelance writer, Lawson worked at TechRepublic as a site editor and writer, covering mobile, IT management, IT security and other technology trends. Previously, she was a webmaster at the Kentucky Transportation Cabinet and a newspaper journalist. Follow Lawson at Google+ and on Twitter.

    Loraine Lawson is a freelance writer specializing in technology and business issues, including integration, health care IT, cloud and Big Data.