I’m starting to see more pieces about using master data management (MDM) with Big Data.
If the very idea gives you a headache, you’re in good company — but stick with me. I’ve been juicing vegetables, and feel energetic enough to tackle some questions about the topic.
Do people really combine MDM and Big Data, or are vendors just piling hype on top of hype?
Let’s not limit ourselves: Why can’t it be both?
Last December, analyst firm The Information Difference tackled this very question, surveying 209 companies about Big Data and its link to MDM.
Fifty-nine percent of the responding organizations asserted a link between MDM and Big Data, with only 7 percent reporting no link. Indeed, Information Difference CEO Andy Hayler noted there were “plenty of companies with both MDM and active Big Data projects.” He didn’t offer a specific statistic on that, but pointed to statistical overlap in a Computer Weekly column.
So, clearly, real companies are combining or thinking about combining the two.
And yet, most of the recent pieces on this trend are written by vendors offering MDM solutions, so it’s hard to argue there’s no hype about it.
I’ll leave it to a research firm to determine whether that’s driving the growth in MDM and Big Data investments, but it’s worth noting that 67 percent of survey respondents said MDM was driving Big Data, while 17 percent saw Big Data producing new master data.
Why is this happening? Do CIOs just love torturing IT?
Once again, I say: Why can’t it be both?
Seriously, as far as I know, this has nothing to do with torturing IT.
Interestingly enough, a 2012 survey identified a few reasons behind this combination:
- Existing MDM data can be used to help drive Big Data searches.
- A desire to automatically identify master data within Big Data sets, such as spotting customer accounts. It’s easy to see how that would be a useful capability when it comes to social media data.
That makes sense for customer-centric organizations, which were the first to embrace combining Big Data with MDM. But it turns out the idea has spread to areas such as manufacturing and logistics, according to Yves de Montcheuil, vice president of Marketing at Talend, an open source integration company.
“Big data is now driving an essential part of the requirements for MDM programs as incorporating new types of data becomes a strong requirement,” de Montcheuil wrote in a recent Wired column. “Big data augments conventional MDM sources to provide a complete view of the required domain.”
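To make the "spotting customer accounts" idea a bit more concrete, here is a minimal Python sketch of matching social media posts against existing master records. Everything in it is an assumption for illustration: the record fields, the sample customers and the `tag_posts` helper are hypothetical, and a real MDM tool would use far more sophisticated matching than an exact lookup on a normalized handle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MasterCustomer:
    # Hypothetical master record; real hubs carry many more attributes.
    customer_id: str
    name: str
    email: str
    twitter_handle: str


def normalize(value: str) -> str:
    """Trim, lowercase, and drop a leading '@' so keys compare consistently."""
    return value.strip().lower().lstrip("@")


def build_index(customers):
    """Map each normalized email and handle to its master customer record."""
    index = {}
    for customer in customers:
        for key in (customer.email, customer.twitter_handle):
            if key:
                index[normalize(key)] = customer
    return index


def tag_posts(posts, index):
    """Attach a master customer_id to each post whose author is a known customer."""
    tagged = []
    for post in posts:
        match = index.get(normalize(post["author"]))
        tagged.append({**post, "customer_id": match.customer_id if match else None})
    return tagged


# Made-up sample data, purely for illustration.
customers = [
    MasterCustomer("C001", "Ada Park", "ada.park@example.com", "@adapark"),
    MasterCustomer("C002", "Luis Ortega", "luis@example.com", "@l_ortega"),
]
posts = [
    {"author": "@AdaPark", "text": "Order arrived late again."},
    {"author": "@someone_else", "text": "Great service!"},
]
tagged = tag_posts(posts, build_index(customers))
```

The first post picks up customer ID `C001`, while the second stays unmatched. The point is only that the master data supplies the keys, and the Big Data set gets tagged with them.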
How does combining MDM and Big Data work from a technology standpoint?
Yves de Montcheuil does a great job of explaining how this can work:
Adding ‘big’ to MDM does not mean that the master data hub will be stored in Hadoop (although some organizations are exploring the use of NoSQL databases), nor does it mean that its size will grow exponentially in a short timeframe. Rather, it means that some of the Big Data (or new data) will be managed in the MDM hub itself, linked from the MDM hub in a federated approach, or will simply benefit from the consistency, resolution and enrichment services that MDM provides.
To do this, you’ll need to shift how you think about MDM. Instead of focusing on the master data hub, you’ll need to see it as a key component of the larger application architecture, he says. And to do that, he argues, the MDM system needs to be services-based (he even uses the nearly taboo “SOA”).
In this scenario, master data is published to other applications, including Big Data tools, through services rather than simply stored in a hub.
Obviously, for some, this will mean rethinking how you approach MDM tools in the first place; de Montcheuil’s piece includes a lot of great information about this.
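For the curious, the services-based arrangement described above might look something like the following Python sketch. It is not de Montcheuil's design or any vendor's API. The `MasterDataService` contract, the `resolve` and `enrich` method names, and the in-memory hub are all hypothetical stand-ins, assuming only that the hub exposes resolution and enrichment as callable services.

```python
from typing import Optional, Protocol


class MasterDataService(Protocol):
    """Hypothetical service contract; a real one might sit behind a web service."""

    def resolve(self, source: str, source_key: str) -> Optional[str]: ...

    def enrich(self, master_id: str) -> dict: ...


class InMemoryMasterData:
    """In-process stand-in for the hub, used only to make the sketch runnable."""

    def __init__(self, cross_refs, golden_records):
        self._cross_refs = cross_refs    # (source, source_key) -> master_id
        self._golden = golden_records    # master_id -> master attributes

    def resolve(self, source, source_key):
        # Resolution: map a source-specific key to the shared master ID.
        return self._cross_refs.get((source, source_key))

    def enrich(self, master_id):
        # Enrichment: return a copy so callers cannot mutate the hub's record.
        return dict(self._golden[master_id])


def count_events_by_segment(events, mdm: MasterDataService):
    """A Big Data style job that calls the service instead of copying the hub."""
    counts = {}
    for event in events:
        master_id = mdm.resolve(event["source"], event["key"])
        segment = mdm.enrich(master_id)["segment"] if master_id else "unknown"
        counts[segment] = counts.get(segment, 0) + 1
    return counts


# Made-up sample data, purely for illustration.
mdm = InMemoryMasterData(
    cross_refs={("web", "u-17"): "C001", ("crm", "8841"): "C001", ("web", "u-42"): "C002"},
    golden_records={
        "C001": {"name": "Ada Park", "segment": "enterprise"},
        "C002": {"name": "Luis Ortega", "segment": "smb"},
    },
)
events = [
    {"source": "web", "key": "u-17"},
    {"source": "crm", "key": "8841"},
    {"source": "web", "key": "u-42"},
    {"source": "web", "key": "u-99"},
]
segment_counts = count_events_by_segment(events, mdm)
```

Notice that the job never stores master data itself. It asks the service for an ID and attributes as needed, which is the shift from "hub as a database" to "hub as a component of the architecture."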
You might also read “Find MDM Success in a Big Data World.” It’s written by the CEO of MDM provider Stibo Systems, but like de Montcheuil’s piece, contains some helpful, vendor-neutral information.
As with all projects, the best approach is to start small and build up, but keep the end goal of handling Big Data in mind from the beginning.