Why the Push for Do-It-Yourself Metadata?

It's interesting: Writing custom code for data integration is nearly universally frowned upon by experts, and yet I'm seeing a lot of discussion about creating your own metadata solutions to support integration.


My question is: If you're getting away from hand-coding, why would you want to delve into customized metadata solutions?


As usual for me, it's a question that occurred to me while I was reading random items about integration. The first was a recent SOA Data Integration Architecture Group LinkedIn discussion started by integration/SOA expert David Linthicum. "Does Data Integration Require Too Much Coding?" he asked the group, then offered his own answer. "My take: You should avoid coding, at all cost. That is what data integration technology is for." (You have to be a member of LinkedIn and join the group to see these discussions, but both processes are free.)


His post included a link to this short item on the topic by James Taylor (no, not that James Taylor). Taylor is an independent consultant and IT veteran, and he suggests two ways to reduce the complexity that can drive custom code.


Right after reading these items, I found "Blueprint for Do-it-Yourself Metadata," an Information Management article that goes into painful detail about using SQL Server to build a custom metadata database. This is not the first time I've seen such an article, though it is by far the most detailed. Earlier this month, I shared how building your own metadata management solution was a popular topic on LinkedIn. And Information Management has already offered a piece in July called "Roll your own Metadata with ETL Tools."


But this time, as I was shuffling through this very technical article, I started to think about how painful it looked and to wonder whether this was a wise course of action, particularly in an age of moving away from custom code.


Of course, the article gives a few reasons why you might want to do this. First and foremost, the author, CapTech Ventures consultant Bob Lambert, points out that this is meant to fill a gap left by major integration tools:

Major ETL vendors like Informatica, IBM WebSphere DataStage and others include metadata solutions that document mappings and transformations, enabling impact analysis in the event of interface changes, database design changes and data quality problems. However, metadata associated with development tools only kicks in when development starts. A significant part of data integration effort happens before the virtual pen meets paper to build ETL maps.

That seems reasonable enough.


He also offers a list of benefits to building an integration metadatabase, including the fact that you're putting data integration requirements, data modeling and interfaces together in one place, which should pave the way for smoother data-integration projects.


A metadatabase also lets you run basic queries against it rather than conduct a spreadsheet-to-spreadsheet comparison. Other benefits include better change management, improved communication between IT and the business, and consistency between design artifacts. You can read Lambert's complete listing of benefits on the third page of the article. He also lists some of the things you should consider first; not cons, per se, but rather issues such as requiring resources and rigorous management.
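
To make the querying point concrete, here is a minimal sketch of what "basic queries" against a metadatabase might look like. It is not Lambert's design: his article uses SQL Server, and this stand-in uses Python's built-in sqlite3 module so it runs anywhere. The table names, column names and sample rows are all hypothetical, invented purely for illustration.

```python
import sqlite3


def build_metadatabase():
    """Create a tiny in-memory metadatabase (hypothetical schema and sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE source_field (
            field_id    INTEGER PRIMARY KEY,
            system_name TEXT NOT NULL,
            field_name  TEXT NOT NULL
        );
        CREATE TABLE target_field (
            field_id    INTEGER PRIMARY KEY,
            table_name  TEXT NOT NULL,
            column_name TEXT NOT NULL
        );
        CREATE TABLE field_mapping (
            source_id INTEGER REFERENCES source_field(field_id),
            target_id INTEGER REFERENCES target_field(field_id),
            rule      TEXT
        );
    """)
    conn.executemany("INSERT INTO source_field VALUES (?, ?, ?)", [
        (1, "crm", "cust_nm"),
        (2, "crm", "cust_phone"),
        (3, "billing", "acct_no"),
    ])
    conn.executemany("INSERT INTO target_field VALUES (?, ?, ?)", [
        (10, "dim_customer", "customer_name"),
        (11, "dim_customer", "account_number"),
    ])
    conn.executemany("INSERT INTO field_mapping VALUES (?, ?, ?)", [
        (1, 10, "trim and title-case"),
        (3, 11, "copy as-is"),
    ])
    return conn


def unmapped_source_fields(conn):
    """List source fields that no mapping references yet."""
    rows = conn.execute("""
        SELECT s.system_name, s.field_name
        FROM source_field AS s
        LEFT JOIN field_mapping AS m ON m.source_id = s.field_id
        WHERE m.source_id IS NULL
        ORDER BY s.system_name, s.field_name
    """)
    return rows.fetchall()


def sources_feeding(conn, table_name, column_name):
    """List the source fields (and rules) that feed one target column."""
    rows = conn.execute("""
        SELECT s.system_name, s.field_name, m.rule
        FROM target_field AS t
        JOIN field_mapping AS m ON m.target_id = t.field_id
        JOIN source_field AS s ON s.field_id = m.source_id
        WHERE t.table_name = ? AND t.column_name = ?
        ORDER BY s.system_name, s.field_name
    """, (table_name, column_name))
    return rows.fetchall()
```

With the sample rows above, unmapped_source_fields flags the CRM phone field as having no mapping, and sources_feeding shows which source field a change to dim_customer.customer_name would touch. Those are the kinds of questions that would otherwise mean eyeballing two spreadsheets side by side.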


It is completely possible that I'm missing something. After all, I am a tech journalist, not a data-integration expert. But I do wonder if part of the explanation for the build-your-own-metadata-management trend may lie in something Paul Fingerman, a San Francisco-based enterprise, technical, and solutions architect, posted in response to Linthicum:

I agree! Avoid code at all costs. However, the availability of OTS data integration technology to handle difficult semantic integration is just about nil. Even in the custom world, frameworks for working with implicit semantics are weak and extremely complex. Things have improved in the last five years, but this is still an unsolved problem.

It will take wiser heads than I to sort this out. Consider this my public query for more information (no metadata required).