Normally, I try not to delve too deeply into the technical. Most business people don't care and "Lord of the Rings" fans know where that got the dwarves.
I've been thinking about metadata since Friday, when I interviewed Evan Levy about master data management. He explained that people talk about master data as though it's key business data, but in fact, what's being mastered with MDM is metadata.
When you integrate data, he said, you're not actually integrating the data; you're matching up and integrating the metadata.
Hold up. What?
To understand why integration is such a constant challenge, and why new initiatives and technologies to deal with it are always showing up in your budget requests, you have to understand that much of integration, and many of the problems we have with data, come down to the metadata, and to the fact that it's been neglected, omitted and created without standardization, lo these many, many years of storing data.
And, increasingly, metadata is being talked about because it's essential to discussions of semantics and semantic technology, which focuses on standardizing metadata (and other information) via ontologies. There's a bit more to it than that, of course, but for our purposes today, that's the important part.
And, it turns out, metadata is also key to understanding some of the discussions around the cloud and standards.
Metadata is commonly defined as the data about data. If that makes your head spin, think of it this way: You know when you set up an Excel spreadsheet, and you label one column "Date" and another column "Last name" and another column "First Name" and so on? Well, those labels are metadata: they describe the data in the columns.
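For the code-minded, here's a minimal Python sketch of the spreadsheet idea. The column labels come from the example above; the rows and names in the sheet are made up for illustration. It simply reads the header row separately from the rows beneath it, which is one way to see the labels (metadata) and the values (data) as two different things.

```python
import csv
import io

# A tiny made-up sheet: the first line holds the column labels,
# the remaining lines hold the actual data.
SHEET = """Date,Last name,First Name
2010-09-20,Smith,Ann
2010-09-21,Jones,Bob
"""

reader = csv.DictReader(io.StringIO(SHEET))

# The labels describe the columns: this is the metadata.
column_labels = reader.fieldnames

# Everything below the header row is the data itself.
rows = list(reader)

print("Metadata:", column_labels)
for row in rows:
    print("Data:", row["Date"], row["Last name"], row["First Name"])
```

Strip off that first line and the rows are still there, but nothing tells you which column is the last name and which is the first.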
Another example you may be familiar with: Web sites. Web sites use metadata to tell browsers what's on the page. It's hidden in the code for the browsers and Web crawlers to find.
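Here's a rough sketch of what a crawler does with that hidden metadata, using only Python's standard library. The sample page and the helper names (`MetaTagCollector`, `extract_meta`) are invented for this example, and a real crawler does far more than this; the point is only that the description and keywords live in the code, not in the visible text.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collects name/content pairs from <meta> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # Only <meta> tags are of interest; everything else is skipped.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.meta[name] = content


def extract_meta(html):
    """Return a dict of the named <meta> tags found in an HTML string."""
    collector = MetaTagCollector()
    collector.feed(html)
    return collector.meta


# A made-up page: the visitor sees only the paragraph in <body>.
SAMPLE_PAGE = """<html><head>
<title>Quarterly Report</title>
<meta name="description" content="Sales figures for the third quarter">
<meta name="keywords" content="sales, quarterly, report">
</head><body><p>Visible text goes here.</p></body></html>"""

page_meta = extract_meta(SAMPLE_PAGE)
for name, content in page_meta.items():
    print(f"{name}: {content}")
```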
I guess I'd always thought of metadata as sort of optional, depending on how well you want your Web page to turn up in search results. So, when I designed Web pages, I would put the metadata in on home pages and the major subpages, but to be honest, I wasn't meticulous about it with other pages.
It turns out, the same is true of a lot of other situations where metadata comes into play. Programmers weren't always meticulous about putting in the metadata. And there was no standard approach to organizing data, so metadata, which wasn't even visible to users, was even more esoteric.
You know how they're always comparing IT to utilities or the phone company? Well, it seems to me that metadata is a lot like money. Money isn't standardized across countries. In fact, money is pretty esoteric: Each country defines the look, the feel and value of its own currency. That's how you get situations like the dime, which is roughly the size of a penny but worth more than the larger nickel, a situation that befuddles some foreigners.
These separate systems of money work pretty well until you want to trade with another country. Obviously, you're not going to just adopt their currency. I mean, you've got all these coins made already.
Same thing with metadata. Your programmers did their own thing with the metadata, and it worked pretty well -- until you wanted to trade between systems or with business partners.
And, just as with money, no one is going to toss out all that metadata that's been created over the years. The expense would be enormous, and everybody has better things to do. So, we're mostly stuck with the metadata we have.
And yet ... we need some way to translate between these two currencies or we can't trade information. That's where integration comes in.
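To make the currency-conversion idea concrete, here's a small Python sketch of the simplest kind of translation: a lookup table between two systems' field names. Everything in it is hypothetical (the two systems, the field names, the sample record and the `translate_record` helper), and real integration tools also have to reconcile formats, units and meanings, not just labels.

```python
# Hypothetical mapping: one system's column labels on the left,
# another system's names for the same facts on the right.
CRM_TO_BILLING = {
    "Last name": "surname",
    "First Name": "given_name",
    "Date": "signup_date",
}


def translate_record(record, field_map):
    """Rename a record's fields using field_map.

    Returns the translated record plus a list of fields that had no
    agreed-upon equivalent, so nothing is dropped silently.
    """
    translated = {}
    unmapped = []
    for field, value in record.items():
        if field in field_map:
            translated[field_map[field]] = value
        else:
            unmapped.append(field)
    return translated, unmapped


# A made-up record from the first system.
crm_record = {
    "Last name": "Baggins",
    "First Name": "Bilbo",
    "Date": "2010-09-22",
    "Region": "Shire",
}

billing_record, leftovers = translate_record(crm_record, CRM_TO_BILLING)
print(billing_record)
print("No equivalent found for:", leftovers)
```

The values pass through untouched; only the labels change. The "Region" field is the telling part: nobody agreed on what to call it in the second system, so it has nowhere to go.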
Essentially, semantics and some standards are about agreeing upon metadata or metalanguage. If all money looked the same, there would be no need for currency conversion. Likewise, if metadata were standardized, interoperability wouldn't be an issue, or at least it'd be much less of an issue than it is now.
While it might be possible to do that within a single company, it becomes harder when you're talking about standardization between businesses, governments or fields, such as medicine, engineering, and so on. As John F. Sowa, a former IBM researcher and developer and cofounder of VivoMind Intelligence, observed in a recent paper on ontologies (which, again, essentially define, classify and standardize information, including metadata):
Defining formal ontologies for all the kinds of information used by a single business enterprise is an enormous undertaking. Generalizing those ontologies for an entire industry with all the competing and supporting companies, suppliers, and customers is far more difficult. Defining universal ontologies to support all science, engineering, business, medicine, politics, law, and the arts will not be achieved for a long time, if ever.
I'm no expert, nor am I trying to play one on the Internet, but as I read it, that means integration is likely to remain an expensive and time-consuming challenge for a long, long time.
So, metadata matters, because it helps you truly see why integration is an ongoing challenge. Metadata matters when you're talking about integrating databases. It also matters when you're talking about cloud standards and interoperability, as Lori MacVittie recently explained in an excellent post on the cloud standards battle. And it will matter even more when organizations start to dig into semantic technologies.