Tough Choice: Move the Data or Use Virtual Integration


Virtualization is so pervasive that fellow IT Business Edge blogger Arthur Cole recently wondered whether it has jumped the shark as a cutting-edge approach.


And yet, virtual data integration is still something of a sideshow in integration circles, or at least that's how it seems to me. Once every fat quarter, I run across an article on the topic.


The latest piece, "Virtual Versus Physical Data Integration: How to Decide," was published Friday on DM Review. It gets into the nitty-gritty of deciding when you'd be better off using ETL to physically move the data and when you should leave the data where it is and opt for virtual integration. As it turns out, there's also a third option, a hybrid that combines both approaches. (More on that later.)
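To make the distinction concrete, here is a minimal sketch of my own, not anything taken from the DM Review article. It assumes two hypothetical source systems (a CRM and a billing database, both faked with in-memory SQLite) and invented table and function names. The physical path copies combined data into a separate warehouse; the virtual path leaves the data in place and combines it only when someone asks.

```python
import sqlite3


def make_source(ddl, insert_sql, rows):
    """Create a throwaway in-memory database standing in for a source system."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    conn.commit()
    return conn


# Two hypothetical source systems (names and schemas are assumptions).
crm = make_source(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)",
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Acme"), (2, "Globex")],
)
billing = make_source(
    "CREATE TABLE invoices (customer_id INTEGER, amount REAL)",
    "INSERT INTO invoices VALUES (?, ?)",
    [(1, 120.0), (1, 80.0), (2, 50.0)],
)


def combined_rows(crm_conn, billing_conn):
    """Read both sources and join them in application code."""
    totals = dict(
        billing_conn.execute(
            "SELECT customer_id, SUM(amount) FROM invoices GROUP BY customer_id"
        )
    )
    customers = crm_conn.execute("SELECT id, name FROM customers ORDER BY id")
    return [(name, totals.get(cid, 0.0)) for cid, name in customers]


def etl_load(crm_conn, billing_conn):
    """Physical integration: extract, transform, and load a copy into a warehouse."""
    warehouse = sqlite3.connect(":memory:")
    warehouse.execute("CREATE TABLE customer_totals (name TEXT, total REAL)")
    warehouse.executemany(
        "INSERT INTO customer_totals VALUES (?, ?)",
        combined_rows(crm_conn, billing_conn),
    )
    warehouse.commit()
    return warehouse


def virtual_totals(crm_conn, billing_conn):
    """Virtual integration: nothing is stored; sources are queried on demand."""
    return combined_rows(crm_conn, billing_conn)


# Load the warehouse once, then let a source change afterwards.
warehouse = etl_load(crm, billing)
billing.execute("INSERT INTO invoices VALUES (?, ?)", (2, 25.0))
billing.commit()

physical_view = warehouse.execute(
    "SELECT name, total FROM customer_totals ORDER BY name"
).fetchall()
virtual_view = virtual_totals(crm, billing)

print("physical:", physical_view)
print("virtual: ", virtual_view)
```

In this toy setup the warehouse copy goes stale until the next load, while the virtual view reflects the new invoice at the cost of hitting both sources on every request. Real middleware does far more than this, but that trade-off is the heart of the decision.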


The article explains two decision-making processes for determining which option is the best for your project.


The first process is integration pattern matching, which compares a specific use case with "typical deployed" data integration patterns. The second is integration factor analysis, which is described as a more bottom-up approach that evaluates a number of project-specific factors. There's even a detailed PDF to help you conduct an integration factor analysis.


The article also provides some insight into why there are so few articles about virtual data integration. It's not that companies aren't doing it. In fact, the piece notes that a 2008 Gartner survey found more than 50 percent of organizations are creating "virtual integrated views of data from disparate databases via data federation techniques."


No, the problem isn't usage. Apparently, the problem is terminology. Data virtualization projects, which use middleware, are also called virtual data federation, high-performance query or EII (enterprise information integration).


It made me wonder whether virtualized data was a vendor-specific term. As it turned out, Cole looked at data virtualization's history last September, questioning whether it was just a repackaging of older IT concepts. As with most things nowadays, he found that while it resembles those older concepts, it's more of an evolution:

"A lot of this may sound like reinventing the wheel to some of you, but according to software analyst Wayne Kernochan, data virtualization is both a logical extension of virtualization in general and offers clear benefits to the enterprise, although it may not be as easy to implement as some vendors would have you believe."

I did a little Google digging and found other references beyond Composite. Many of the articles are older (say, earlier than 2003), so perhaps the term just never caught on. More recently, it has been applied to semantic technologies, such as this paper submitted for the 2008 European Semantic Web Conference.


But by and large, most of the pieces on virtual data integration can be traced back to the vendor Composite Software, a company that's not shy about introducing new concepts and terms. Just last fall, I did a Q&A with Robert Eve, the company's vice president of marketing, about a new "next step" in data integration he called "data discovery."


In fact, the piece published on DM Review Friday was written by Eve.


Whatever you want to call it (virtual data integration, EII, on-demand integration or the term du jour), it doesn't change the very real question of whether you should physically move data via ETL or leave the data where it is and perform a middleware-based integration. The DM Review article offers two good tools for reaching a decision.


As for the hybrid approach, if you'd like to learn more about that, you might want to check out this BeyeNETWORK audio interview, which features, you guessed it, Composite Software's VP of Marketing, Robert Eve.


You might also want to check out this older, but still useful, TechTarget article, "Data virtualization: The answer to the integration problem?"