The Pros and Cons of Combining Master Data Management with Big Data

Loraine Lawson

Confusion surrounds the topic of how to bring some sense of order to Big Data. Depending on the day, the discussion might come down to data quality, data governance or master data management.

Here’s a hint: One of these is much less necessary than the others. You should always understand the quality of your data, big or otherwise. And it’s just basic legal smarts to create governance rules for data lest you run afoul of regulatory requirements.

But when it comes to master data management and Big Data, you may be better off leaving each to its own. If you’re not clear on why, I recommend this post by veteran integration technologist Kumar Gauraw, who takes you through his thought process on why MDM and Hadoop don’t match.

“Master data is, usually, relatively small in volume when compared with transactional data and definitely much smaller than the big data when it comes to volume,” he writes. “Big data, on the other hand, is about much larger volumes. In fact, big data is about processing a data volume so large that the current RDBMS databases struggle to handle it.”

So, Big Data might fill in MDM gaps or MDM might serve as a reference source for customer or product data when running Big Data analytics. But for the most part, Gauraw concludes that they’re two separate systems with different goals.

Until recently, that seemed to be the conclusion most people reached about MDM and Big Data. Then last fall, Capgemini’s Christophe Leroquais penned a piece demonstrating three possible points of intersection between these two systems:

  1. Big Data can feed MDM, he points out, by enriching the data in the MDM hub.
  2. MDM can feed Big Data by “providing the data model backbone to bind the Big Data facts.”
  3. MDM can help navigate within Big Raw Data. This fits in with a relatively new use case for Big Data called exploratory analysis, he writes.

“The user does not always know exactly what he is looking for; he may discover insights by navigating through the raw data,” writes Leroquais. “In order to avoid getting lost within the data ocean, it is necessary to navigate while keeping sight of reference points. This is where the exploration must call on the MDM database.”

He adds that this last use case presents a major challenge: MDM tools would need to evolve.
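To make the second and third intersection points more concrete, here is a minimal Python sketch of master data acting as the "backbone" for raw facts. It is not taken from Leroquais's piece: the record layouts, the `SOURCE_TO_MASTER` cross-reference and the function names are all hypothetical, and a real MDM hub would expose this through its own interfaces.

```python
from collections import defaultdict

# Hypothetical master records, keyed by a master customer ID (assumption).
MASTER_CUSTOMERS = {
    "C001": {"name": "Acme Corp", "segment": "Enterprise"},
    "C002": {"name": "Bolt Ltd", "segment": "SMB"},
}

# Hypothetical cross-reference from (source system, local ID) to master ID.
SOURCE_TO_MASTER = {
    ("web", "u-17"): "C001",
    ("crm", "9981"): "C001",
    ("web", "u-42"): "C002",
}


def bind_facts(events):
    """Attach master attributes to raw events; collect events with no match."""
    bound, unmatched = [], []
    for event in events:
        master_id = SOURCE_TO_MASTER.get((event["source"], event["source_id"]))
        if master_id is None:
            unmatched.append(event)
            continue
        bound.append({**event, "master_id": master_id, **MASTER_CUSTOMERS[master_id]})
    return bound, unmatched


def count_by_segment(bound_events):
    """Use a master attribute as a reference point for exploring the facts."""
    counts = defaultdict(int)
    for event in bound_events:
        counts[event["segment"]] += 1
    return dict(counts)


# Illustrative raw events from two source systems.
events = [
    {"source": "web", "source_id": "u-17", "action": "click"},
    {"source": "crm", "source_id": "9981", "action": "call"},
    {"source": "web", "source_id": "u-42", "action": "click"},
    {"source": "web", "source_id": "u-99", "action": "click"},
]
bound, unmatched = bind_facts(events)
```

The events that land in `unmatched` are the ones an analyst would have no reference point for, which is roughly the "getting lost within the data ocean" problem the quote describes.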

That certainly helps in the short term, but I think it actually leaves the real question unanswered. What everybody really wants to know is whether there might one day be an MDM tool designed for Big Data sets.

The real barrier here, as Justin Risch points out in a recent blog post, is that MDM is designed to work with structured, relational data, and Big Data sets are often unstructured data. Risch focuses specifically on how MDM might work with NoSQL databases, such as Cassandra.

“Unfortunately, NoSQL and MDM do not go well together conceptually – Master Databases are relational by definition,” Risch writes. “So how, then, do we get the two to play nicely without feeling like we’re shoehorning concepts together?”

His reason for wanting to merge MDM with NoSQL is simple: Master data is found in all web applications of sufficient size and complexity, he writes. For instance, an order will have at least customer and product data. There’s also the coming data deluge from the Internet of Things, which is getting attention from MDM vendors, as Software AG's Charlie Greenberg points out.

The problem is, how do you capture master data without giving up the scalability and flexibility of NoSQL? Risch envisions several options, including a key-value schema and a document database, but finally settles on a graph database as a potential means of merging both systems.
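As a rough illustration of why a graph model is attractive here, the sketch below stores customers, products and orders as nodes and links them with named relationships. It is an in-memory toy written for this example, not Risch's design and not the API of any graph database; the class, the relationship names (`PLACED_BY`, `CONTAINS`) and the sample data are all assumptions.

```python
from collections import defaultdict


class MasterGraph:
    """Toy in-memory graph: nodes carry a label and properties, edges a relation name."""

    def __init__(self):
        self.nodes = {}                 # node_id -> {"label": ..., other properties}
        self.edges = defaultdict(list)  # node_id -> [(relation, target_id), ...]

    def add_node(self, node_id, label, **props):
        self.nodes[node_id] = {"label": label, **props}

    def add_edge(self, source_id, relation, target_id):
        if source_id not in self.nodes or target_id not in self.nodes:
            raise KeyError("both endpoints must exist before linking them")
        self.edges[source_id].append((relation, target_id))

    def neighbors(self, node_id, relation):
        return [target for rel, target in self.edges.get(node_id, []) if rel == relation]


def products_for_customer(graph, customer_id):
    """Follow order -> customer and order -> product links to list product names."""
    names = set()
    for node_id, node in graph.nodes.items():
        if node["label"] != "Order":
            continue
        if customer_id in graph.neighbors(node_id, "PLACED_BY"):
            for product_id in graph.neighbors(node_id, "CONTAINS"):
                names.add(graph.nodes[product_id]["name"])
    return sorted(names)


# Master entities are single nodes; each order only points at them.
graph = MasterGraph()
graph.add_node("c1", "Customer", name="Acme Corp")
graph.add_node("c2", "Customer", name="Bolt Ltd")
graph.add_node("p1", "Product", name="Widget")
graph.add_node("p2", "Product", name="Gadget")
for order_id, customer_id, product_ids in [
    ("o1", "c1", ["p1", "p2"]),
    ("o2", "c1", ["p1"]),
    ("o3", "c2", ["p2"]),
]:
    graph.add_node(order_id, "Order")
    graph.add_edge(order_id, "PLACED_BY", customer_id)
    for product_id in product_ids:
        graph.add_edge(order_id, "CONTAINS", product_id)
```

Because each customer and product exists once and orders only reference them, the master record stays a single point of truth while the number of orders can grow freely. Nodes can also carry whatever properties they need, which keeps the schema flexible.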

Risch found one example of a NoSQL graph-based solution, the Spectrum MDM Hub by Pitney Bowes. Alas, he wasn’t able to vet it, and the documentation he found was a year old. But Risch, at least, is hopeful.

“In the end, Graph Databases may be the key to merging the best of both of these systems,” he writes. “There may not be a well-known, easily implemented, out-of-the-box-solution for integrating the two concepts at the moment– but the future is bright. I look forward to seeing more from the best in the MDM industry.”

Loraine Lawson is a veteran technology reporter and blogger. She currently writes the Integration blog for IT Business Edge, which covers all aspects of integration technology, including data governance and best practices. She has also covered IT/Business Alignment and IT Security for IT Business Edge. Before becoming a freelance writer, Lawson worked at TechRepublic as a site editor and writer, covering mobile, IT management, IT security and other technology trends. Previously, she was a webmaster at the Kentucky Transportation Cabinet and a newspaper journalist. Follow Lawson at Google+ and on Twitter.

Mar 31, 2015 1:59 PM Ilya Geller says:
Language has its own internal parsing and statistics. For instance, take two sentences: a) ‘Fire!’ b) ‘In this amazing city of Rome some people sometimes may cry in agony: ‘Fire!’’ Evidently, the phrase ‘Fire!’ carries different importance in the two sentences because of the extra information in the second. This distinction is reflected in the phrase weights: the first has 1, the second 0.12; the greater weight signifies stronger emotional ‘acuteness’. First you need to parse, obtaining phrases from clauses, sentences and paragraphs. Next, you calculate the internal statistics, the weights, where a weight refers to the frequency with which a phrase occurs in relation to other phrases. After that, the data is indexed by a common dictionary, like Merriam, and annotated by subtexts. There is no Big Data: there is only data that is not yet structured.
Apr 7, 2015 11:22 AM @Jmichel_franco says:
This is a very insightful article, thanks Loraine. In my opinion, Master Data Management should take care of the system of record. And a system of record should be relatively stable over time and focus on its core capability (which is to be a point of reference in the case of MDM). But the business goal behind MDM also mandates assembling a 360° view, which is wider in scope than the system of record. For example, a 360° view of the customer should include the system of interactions/transactions (history of customer interactions, purchases...), the system of insights (scoring, segmentation, next best actions), etc. MDM is a prerequisite for creating this 360° view. But this view is a moving picture that needs to be augmented over time, and for that you need Big Data as well. This 360° view may typically contain the history of all customer interactions, such as click-stream, in order to reconcile the customer journey. It would include analytical capabilities too. And this information has to be highly shared in real time, since this history is key to defining the next best action across customer touchpoints. So combining them is a way forward, and you need technology enablers for that.
