
Where Data Warehouses End and Master Data Management Begins

by Loraine Lawson, IT Business Edge
Aug 25, 2009 8:22:37 AM

 

Last week, Evan Levy, a partner at Baseline Consulting and an instructor at The Data Warehousing Institute, told our Loraine Lawson why master data management transcends the data warehouse and its staff. This week, Levy explains the architecture of data integration and more about master data management (MDM).

 

Lawson: There seems to be some controversy in the enterprise architecture space about whether there should be a separate data integration architecture practice. Is that correct? Why is it important to focus on architecture with integration?
Levy: Well, I’m not naive or arrogant enough to believe there is only one good answer for everything. I don’t have a whole lot of patience for religious battles, but to your point about, “Gee, do we really need an integration system?” I believe the answer is yes.

 

“Listen, reporting off of a data warehouse will always make sense, but if you take a look at the volatility of data sources, that’s what’s changing. What a lot of people don’t realize is the need to integrate new sources or remove sources is dramatically accelerating.”


Evan Levy
Baseline Consulting partner and TDWI instructor

The challenge is if you could have the rules for integration socialized across all your different systems and standardize the data itself across all the different systems, then you wouldn’t need a centralized integration server or management method. Unfortunately, as long as people buy packaged apps that have defined data criteria within each app, then it’s impossible to have consistent data across all systems.

 

A good example is that one system may call something “Booked revenue,” another system may call it “Received revenue,” a third system could call it “Revenue,” and another system may call it “Billed revenue.” Now, the fact is, from a business vernacular perspective, two or three of those may be identical.
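The naming problem above can be made concrete with a small sketch. Everything in it is an assumption for illustration: the system names, the canonical term names, and in particular the choice to treat "Revenue" and "Booked revenue" as the same business term (the interview only says that two or three of the labels may be identical, not which ones).

```python
# Hypothetical registry mapping (source system, local field name) to a shared
# business term. Which labels are truly identical is a business decision;
# the pairing of "Revenue" with "Booked revenue" here is only an assumption.
CANONICAL_TERMS = {
    ("orders", "Booked revenue"): "booked_revenue",
    ("finance", "Revenue"): "booked_revenue",
    ("payments", "Received revenue"): "received_revenue",
    ("billing", "Billed revenue"): "billed_revenue",
}


def to_canonical(system, field):
    """Look up the shared business term for one system's local field name."""
    try:
        return CANONICAL_TERMS[(system, field)]
    except KeyError:
        raise KeyError(f"no canonical term registered for {system!r}/{field!r}")


def standardize(system, record):
    """Rename every field of a source record to its canonical term."""
    return {to_canonical(system, name): value for name, value in record.items()}


finance_row = standardize("finance", {"Revenue": 100})
orders_row = standardize("orders", {"Booked revenue": 250})
```

Once both rows use the same key, they can be compared or summed without each consuming application having to know the local vocabulary of every source.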

 

The premise of MDM or an integration server isn’t dissimilar from the way that the stock market works. The closing price of a stock at the end of the day is really no big deal -- knowing where the data resides, no big deal. That’s not the challenge. The challenge is, when it changes, how do you know if the value you have is the most recent value? And the whole premise of a lot of these integration servers is ensuring that the rules associated with the data itself exist. Otherwise, you run into the problem of those rules being coupled with the application and not with the data itself. The thing that strikes me is that those who argue against it haven’t had to deal with pulling data out of 60 different systems with 10 years of history online.

 

Lawson: What are you talking about when you talk about architectural options?
Levy: The typical view, if I’m a data warehousing guy, is I get a copy from this system at the end of the day or at the end of a time window, I bring the data together, and I merge it. The rules are fairly static. The challenge that you run into in that type of environment is that it’s fairly painful to add a third system, a fourth system and a fifth system.

 

I’ll give you a perfect example: I have customer information and I have it on my online system where orders are placed, I have it from a customer support environment where people call in to complain, and I have it from another area where orders are placed (by phone). So I’ve got online orders, phone orders and phone support. If someone receives an NCOA (National Change of Address), now you wonder should we accept that or not? It could be anything from a fraud to a mistake to, hey, it may actually be real.

 

That particular use case illustrates that, depending upon the source, there may actually be different rules to determine what the resultant or survivor record is. The rules by which you determine, do we change the value or not, should be consistent across the company. The problem, however, is that in the way that things are typically built in this day and age, those rules exist on a system-by-system basis based upon how the developer decided to implement the rules.
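One way to picture a single, company-wide survivorship rule is the sketch below. It is not Levy's method or any MDM product's API: the source names, the trust ranking, and the "verified" flag are all hypothetical, chosen only to show the rule living in one place rather than in each application.

```python
from dataclasses import dataclass

# Hypothetical trust ranking per source; a higher number wins.
# The actual ranking would be a business decision, not something stated in the interview.
SOURCE_TRUST = {
    "online_orders": 3,
    "phone_orders": 2,
    "ncoa": 2,
    "phone_support": 1,
}


@dataclass(frozen=True)
class AddressUpdate:
    source: str
    address: str
    verified: bool = False  # assumed flag, e.g. set after a fraud check


def survivor(current, proposed):
    """Return the record that should survive under the shared rule."""
    if proposed.verified:
        return proposed
    if SOURCE_TRUST[proposed.source] > SOURCE_TRUST[current.source]:
        return proposed
    return current


on_file = AddressUpdate("online_orders", "1 Main St")
change_notice = AddressUpdate("ncoa", "9 Elm Ave")
kept = survivor(on_file, change_notice)
```

Here the unverified change-of-address notice does not overwrite the address captured by the more trusted online order system, and every application that calls `survivor` reaches the same answer.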

 

Now that works just fine if I’m doing things like the end of the day and I don’t have highly volatile data, but now what happens if values change multiple times during a day? What happens if I’m a dispatcher and someone has just called in and their phone is on the fritz and I want to dispatch a phone service technician? I’ve got to figure out the status of the technicians or who is the closest, so I’m going to pull data from several different places. That’s a level of integration that a data warehouse couldn’t handle.

 

One could argue, whatever, that should just be its own operational system. Well, you know, that’s fine, but for every business-use case that you can identify that’s a single system, I’m ultimately going to have to share that type of operational content across multiple systems. So how do I solve that if I don’t have one big system? And that’s where the alternatives become important, because not every need for data integration is a reporting need. It may be time-based business action or business decision making.

 

There are three or four basic ways of moving data and the whole idea in the Architectural Options for Data Integration class (taught at TDWI) is to get people to understand that I may have general-purpose data integration needs that aren’t time-centric, and I may have operational needs that are very time-centric. So I need to get the data quickly, I need to make sure it’s cleansed quickly, and I need to get it to its destination very quickly.

 

Then the (next) aspect is what about the rules, so that I’m not having to bury the cleansing or transformation or security types of things at each point that receives the data. I want that done centrally. So you start considering all this complexity of moving data around.

 

What happens today is they take a bucket of raw data, they throw it over to the guy that wants it, and then he has to basically replicate cleansing, standardization, security and everything else to then determine, can he use the data or not? So what ends up happening is you have a lot of replicated and, in fact, ineffective methods.
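By contrast, the centralized approach Levy describes, where cleansing happens once before the data is handed out, might look something like the following sketch. The `IntegrationHub` class, the `cleanse` rule, and the record fields are all hypothetical; this is an illustration of the idea, not a description of any particular integration server.

```python
def cleanse(record):
    """Hypothetical shared rule set: trim whitespace and normalize the state code."""
    return {
        "name": record["name"].strip(),
        "state": record["state"].strip().upper(),
    }


class IntegrationHub:
    """Applies one set of rules centrally, then delivers the result to every consumer."""

    def __init__(self, rules):
        self._rules = rules
        self._subscribers = []

    def subscribe(self, callback):
        """Register a consumer; it will receive already-cleansed records."""
        self._subscribers.append(callback)

    def publish(self, raw_record):
        """Cleanse once, then hand each consumer its own copy of the clean record."""
        clean = self._rules(raw_record)
        for deliver in self._subscribers:
            deliver(dict(clean))
        return clean


hub = IntegrationHub(cleanse)
reporting_inbox, dispatch_inbox = [], []
hub.subscribe(reporting_inbox.append)
hub.subscribe(dispatch_inbox.append)
hub.publish({"name": "  Ann Lee ", "state": "wa "})
```

Neither consumer has to repeat the cleansing step, so a change to the rule is made in one place and reaches both.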

