Why the Push for Do-It-Yourself Metadata?

Loraine Lawson

It's interesting: Writing custom code for data integration is nearly universally frowned upon by experts, and yet I'm seeing a lot of discussion about creating your own metadata solutions to support integration.

 

My question is: If you're getting away from hand-coding, why would you want to delve into customized metadata solutions?

 

As usual, it's a question that occurred to me while I was reading random items about integration. The first was a recent SOA Data Integration Architecture Group LinkedIn discussion started by integration/SOA expert David Linthicum. "Does Data Integration Require Too Much Coding?" he asked the group, then offered his own answer. "My take: You should avoid coding, at all cost. That is what data integration technology is for." (You have to be a member of LinkedIn and join the group to see these discussions, but both processes are free.)

 

His post included a link to this short item on the topic by James Taylor (no, not that James Taylor). Taylor is an independent consultant and IT veteran, and he suggests two ways to reduce the complexity that can drive custom code.

 

Right after reading these items, I found "Blueprint for Do-it-Yourself Metadata," an Information Management article that goes into painful detail about using SQL Server to build a custom metadata database. This is not the first time I've seen such an article, though it is by far the most detailed. Earlier this month, I shared how building your own metadata management solution was a popular topic on LinkedIn, and back in July, Information Management ran a piece called "Roll your own Metadata with ETL Tools."


 

But this time, as I shuffled through this very technical article, I started to think about how painful it looked and to wonder whether this is a wise course of action, particularly in an age of moving away from custom code.

 

Of course, the article gives a few reasons why you might want to do this. First and foremost, the author, CapTech Ventures consultant Bob Lambert, points out that it's meant to fill a gap the major integration tools don't currently cover:

Major ETL vendors like Informatica, IBM WebSphere DataStage and others include metadata solutions that document mappings and transformations, enabling impact analysis in the event of interface changes, database design changes and data quality problems. However, metadata associated with development tools only kicks in when development starts. A significant part of data integration effort happens before the virtual pen meets paper to build ETL maps.

That seems reasonable enough.

 

He also offers a list of benefits of building an integration metadatabase, including the fact that you're putting data integration requirements, data modeling and interfaces together in one place, which should pave the way for smoother data-integration projects.

 

A metadatabase also lets you run basic queries against it rather than conduct spreadsheet-to-spreadsheet comparisons; it supports better change management, improves communication between IT and the business, and ensures consistency between design artifacts. You can read Lambert's complete list of benefits on the third page of the article. He also lists some things you should consider first; not cons, per se, but issues such as the need for resources and rigorous management.
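To make that "basic queries" point concrete, here's a minimal sketch of what such a metadatabase might look like. To be clear, this is not Lambert's actual design; the table, columns and sample rows are invented, and it uses Python's built-in sqlite3 module instead of SQL Server just to keep the example self-contained.

    # Hypothetical, simplified source-to-target metadata store -- an
    # illustration of "query it instead of comparing spreadsheets,"
    # not Lambert's schema. Uses Python's built-in sqlite3 module.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE source_to_target_map (
            source_system  TEXT,
            source_table   TEXT,
            source_column  TEXT,
            target_table   TEXT,
            target_column  TEXT,
            transform_rule TEXT
        )
    """)
    conn.executemany(
        "INSERT INTO source_to_target_map VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("CRM", "customer", "cust_nm", "dim_customer", "customer_name", "trim, title case"),
            ("CRM", "customer", "cust_dob", "dim_customer", "birth_date", "cast to DATE"),
            ("Billing", "invoice", "cust_ref", "fact_invoice", "customer_key", "lookup on dim_customer"),
        ],
    )

    # Impact analysis: what is affected if the CRM customer table changes?
    for target_table, target_column, rule in conn.execute(
        "SELECT target_table, target_column, transform_rule "
        "FROM source_to_target_map WHERE source_system = ? AND source_table = ?",
        ("CRM", "customer"),
    ):
        print(f"{target_table}.{target_column}  ({rule})")

Even a toy version like this hints at the resources and rigorous management Lambert says you should plan for.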

 

It is completely possible that I'm missing something. After all, I am a tech journalist, not a data-integration expert. But I do wonder if part of the explanation for the build-your-own-metadata-management trend may lie in something Paul Fingerman, a San Francisco-based enterprise, technical, and solutions architect, posted in response to Linthicum:

I agree! Avoid code at all costs. However, the availability of OTS data integration technology to handle difficult semantic integration is just about nil. Even in the custom world, frameworks for working with implicit semantics are weak and extremely complex. Things have improved in the last five years, but this is still an unsolved problem.

It will take wiser heads than I to sort this out. Consider this my public query for more information (no metadata required).



Comments
Aug 30, 2010 10:28 AM Mark Montgomery says:

Hello Loraine,

It's easy to become confused when fishing in LinkedIn discussion groups, in part because their data structure is still fairly primitive for sorting out who understands what. For example, a lot of turf wars between camps are far from transparent.

I run into similar confusion fairly often, particularly with analysts who are accustomed to studying mature categories rather than emerging technologies, and by then they are often doing more harm than good, in my experience.

Let me try to simplify: semantic languages provide structure to data, as in universal translation of meaning, but the translation is for machine-to-machine automation and understanding, to prevent just this sort of confusion.

Programming code can be used to develop applications that manipulate structured data, but I think in this context folks are referring to the work of integrating proprietary code that isn't interoperable.

So, for example, the statement "use the least code possible" can be substituted with "use as many standard data languages as possible," which will "require less integration work" (expensive and wasteful for customers, but necessary when data has to be shared between data silos locked in by proprietary code).

We support independent standards for a variety of reasons, not least of which include ownership and control of data, prevention of extortion tactics with proprietary code (for public networks) that leads to lock-in, macro/sustainable economics, and efficiency/productivity of organizations and their knowledge workers.

I write about the topic on my blog with some frequency; another extensive article on the topic will be posted within a day or two (it's the evening of 8-30-2010 as I write this).

Hope this helps a bit.

Mark Montgomery

Founder & President

Kyield

Aug 31, 2010 11:19 AM Juergen Brendel says:

It's easy to say "avoid coding". In reality, however, most serious data integration work requires some customizations, which are most expediently expressed with a few lines of code: Why try to express in XML (or a clumsy UI tool) what can be expressed very succinctly in 3 lines of code?
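For example (purely illustrative, not from any particular product): a small cleanup rule like "normalize phone numbers to their last ten digits" is a few lines of Python, but a mouthful in XML or a mapping UI.

    # Illustrative only: a tiny custom transformation that is trivial in code
    # but awkward to express declaratively in XML or a drag-and-drop mapper.
    import re

    def normalize_phone(raw):
        """Keep digits only and return the last ten, or None if too short."""
        digits = re.sub(r"\D", "", raw or "")
        return digits[-10:] if len(digits) >= 10 else None

    print(normalize_phone("(804) 555-0123"))   # 8045550123
    print(normalize_phone("ext. 42"))          # None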

Of course, a lot of code in that field should be reusable, and that is exactly what data integration solutions are trying to provide. However, the difficulty arises when we reach the end of what is possible with those provided components. How easy is it to add custom integration code if we really have to? Many solutions fall short in that space: nice component libraries, but it's too complex to add the occasional snippet of custom logic when you really have to.

RESTx ( http://restx.org ) is a fully open source data publishing and integration project I am working on. Its stated goal is to allow for reuse of code and integration logic while making the writing of custom components as simple and easy as possible. To that end, it gives you a choice to write your integration logic in Java, Python or server-side JavaScript. It's all entirely RESTful, and new RESTful resources can be created even by non-developers, merely by filling out a form in the browser.

Custom code can be made to work if a conscious effort is made to provide simple and intuitive APIs. And the resulting system is probably more concise and easier to maintain than purely 'configuration'-driven solutions.

Sep 2, 2010 12:49 PM Paige Roberts says, in response to Juergen Brendel:

I agree that integration projects inevitably require a certain amount of customization, and I believe that's the answer to Loraine's main question, "Why the big push for Do-It-Yourself Metadata?"

 

The metadata management needs of individual enterprises are so widely varied that trying to create a one-size-fits-all metadata solution borders on the impossible. But writing the whole thing from scratch isn't the answer either. Custom coding huge chunks of technology that have already been built by a dozen companies seems like picking up a hammer and chisel to invent a wheel because you want a custom car.

Tom Spitzer at EC Wise had the right idea in the IM article, "Roll your own Metadata with ETL Tools." Start with ETL tools that have already done a fair amount of the work for you, then add the additional functionality that you need. He gives a good set of reasons why starting with an ETL-based metadata strategy makes sense:

"ETL products were built to address the various dimensions of the metadata repository population problem. They come equipped with connections to myriad systems and typically are already programmed to interrogate whatever internal metadata those systems contain. They tend to support a wide range of communications and connectivity protocols so that they can track down information wherever it happens to be located ... Finally, these products can be automated to periodically visit the systems they have cataloged and look for new or changed information."

Another good reason is simply that a fair amount of the metadata you need is embedded in the ETL processes themselves.  One source of metadata you have to get to is the source, target, and transformation information from those processes, and nothing is better suited to capturing that data than the ETL tool itself.
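As a rough sketch (the job format below is invented for illustration, not any vendor's actual export), harvesting that embedded mapping metadata can be as simple as walking the job definition and loading rows into a metadata table instead of retyping them into a spreadsheet:

    # Invented, simplified ETL job definition -- not a real tool's format.
    # The point is that source, target and transformation metadata is
    # already in the job, ready to be harvested rather than re-documented.
    job = {
        "name": "load_dim_customer",
        "mappings": [
            {"source": "CRM.customer.cust_nm",
             "target": "dw.dim_customer.customer_name",
             "transform": "trim, title case"},
            {"source": "CRM.customer.cust_dob",
             "target": "dw.dim_customer.birth_date",
             "transform": "cast to DATE"},
        ],
    }

    def harvest(job):
        """Flatten one job definition into source-to-target metadata rows."""
        for m in job["mappings"]:
            src_system, src_table, src_column = m["source"].split(".")
            _schema, tgt_table, tgt_column = m["target"].split(".")
            yield (job["name"], src_system, src_table, src_column,
                   tgt_table, tgt_column, m["transform"])

    for row in harvest(job):
        print(row)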

So, starting with an ETL tool gives you a huge head start over coding everything from scratch, but it doesn't get you to the finish line, because there are guaranteed to be individual semantic-matching or business-rule requirements that are unique to the business.

The key, then, is extensibility. As Juergen Brendel pointed out, if you reach the end of the capabilities of the ETL tool and you need more, then what? If the tool is not extensible, then the answer is: give up, start over, and build it all from scratch. That's the problem with a lot of ETL metadata "solutions." They're proprietary and not extensible. That makes them just another freakin' data silo. You would think that ETL companies would know better.

On the other hand, if the tool allows you to modify, add to, and utilize its metadata storage for additional metadata, and lets you customize business rules, add code modules, etc., then all you need to code is the bit that's special, the aspects that are unique to your needs.

If West Coast Customs had to build a new car from the ground up every time someone wanted a unique paint job or body style, they'd go out of business. Start with the stock code, soup it up, modify to your heart's content, and get your dream solution at the end.

Paige

Sep 21, 2010 8:58 AM Bob Lambert says, in response to Paige Roberts:

Painful detail?  Ouch!  Actually I found it rather fun, and was encouraged by another database geek like myself.

Loraine, I like your post and agree with most of the points you make (except the "painful" part). Essentially the problem is that, at least in the circles in which I operate, commonly used metadata tools don't currently support ETL analysis, leaving analysts to document their sources and targets in Excel, which is actually the practice recommended by at least one ETL tool vendor in its public training!

Sure, the custom solution I describe requires someone with some SQL Server chops to build and maintain, but if you don't have a solid database developer on your ETL team, then you have bigger problems than poor source-to-target definition.

The best outcome in this case will be for ETL vendors to add analysis support to their metadata tools, making this kind of custom source-to-target metadata unnecessary.

Thanks for your comments!  
