Evan Levy explains the role of metadata in integration and why data management, which entails defining your metadata, is key to master data management. Levy is a partner at Baseline Consulting and an instructor at The Data Warehousing Institute. He will be a presenter at this year's TDWI World Conference, May 9-14 in Chicago. Last week, Levy and IT Business Edge's Loraine Lawson discussed the business problems created by poor metadata practices.
Lawson: How does metadata relate to integration, either in terms of problems it creates or ways it helps?
Levy: Metadata is important because, if I have two files, I need to know how to link them. Metadata is going to tell you which columns mean which things.
Customer ID, first name, last name - some metadata is pretty obvious. If I take a look at an address, it's a fairly common piece of information. You can pretty much pick out a street address in a list of characters without any question.
It's only when you see things like product ID, product description and product code that it gets less obvious; those could be three different pieces of information. The metadata will tell you what they are called and how they are represented, and describe them to you. Once I know that both files carry the same product ID, I can use it to link the two files and integrate the data.
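A minimal sketch of that idea in Python, assuming two hypothetical files whose columns are named differently. The file contents, the column names, and the `METADATA` mapping are all invented for illustration; they only show how a written-down description of each column lets the product ID be used as the link instead of guesswork.

```python
import csv
import io

# Hypothetical file contents: both carry a product identifier,
# but each file names the column differently.
ORDERS_CSV = "ord_no,prod_id,qty\n1001,P-10,2\n1002,P-20,1\n"
CATALOG_CSV = "ProductID,ProductDescription\nP-10,Widget\nP-20,Gadget\n"

# Hypothetical metadata: for each file, what each column actually means,
# expressed as a shared business term.
METADATA = {
    "orders": {"ord_no": "order_id", "prod_id": "product_id", "qty": "quantity"},
    "catalog": {
        "ProductID": "product_id",
        "ProductDescription": "product_description",
    },
}


def load(name, text):
    """Read CSV text and rename each column to its term in METADATA."""
    mapping = METADATA[name]
    reader = csv.DictReader(io.StringIO(text))
    return [{mapping[col]: value for col, value in row.items()} for row in reader]


def link(left, right, key):
    """Join two lists of rows on a shared key (a simple inner join)."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


orders = load("orders", ORDERS_CSV)
catalog = load("catalog", CATALOG_CSV)
linked = link(orders, catalog, "product_id")

for row in linked:
    print(row)
```

Without the mapping, nothing in the files themselves says that `prod_id` and `ProductID` refer to the same thing; the join only works because someone recorded that fact.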
Unfortunately, what a lot of people do is they actually eyeball two files. They have an idea of which columns are different because of tabs or whatever, and they guess, "Oh, I think I can link this to this." And they have to do a little bit of guesswork because there's a lot of data in Excel spreadsheets, there's a lot of data in Access databases, where no one goes to the effort of identifying the column names and inputting the metadata.
Think about what happens if you go get a prescription filled. You have these white tablets that go into a bottle and no label is put on. Yeah, OK, maybe you'll remember, but if you were to hand that bottle to someone else, there's no chance in hell they'll know what it's used for, how to use it, or anything like that. And, in many instances, the drugs could be misused. So, the metadata is like the label on the prescription. It tells you what it's called, what it's for, how to use it, and when to use it.
Without metadata, it's like looking in a medicine cabinet with a bunch of unlabeled bottles. Unfortunately, with databases we feel that somehow people are supposed to be able to figure that out. That's what's so ridiculous.
Lawson: Isn't metadata used in semantic technologies?
Levy: Metadata is used in semantic technology. What a lot of people don't realize is that the discipline of describing data and giving it rigor is called data management. Part of data management is creating metadata.