Can Data Virtualization Solve the BI Integration Problem?


Composite Software just released a new version of its data virtualization platform. The news is still fresh, but I'm sure someone will soon test drive the new functions and report back.


It's called the Composite Information Server Data Virtualization Platform, and it includes some enticing new goodies, including:

  • A relationship-discovery tool used to prepare data models.
  • Tighter integration with Composite Information Server to support intelligent queries.
  • A new adapter that lets SQL developers simplify access to Oracle Essbase source data.
  • A new GUI management console.


You can read all about it in the press release.


To be honest, what caught my eye wasn't so much the new release, but this little tidbit of information buried in the second half of the press release quoting from a May 10, 2010 Forrester Research report, "The State of Business Intelligence Software and Emerging Trends, 2010":

Anecdotal evidence reveals that close to 80 percent of any BI effort lies in data integration. ... And 80 percent of that effort (or more than 60 percent of the overall BI effort) is about finding, identifying, and profiling source data that will ultimately feed the BI application. Yet that pre-extract, transform, and load (ETL) step today is largely manual.

Forrester clients can purchase the report for $499, which may explain why I couldn't find the information mentioned anywhere but the press release.


It isn't exactly a news flash that integration is a huge obstacle for BI. In April, the father of BI, Howard Dresner, warned that data integration was the chief complaint about BI implementations, but he didn't specify how big a problem it was.


Obviously, when you're talking 80 percent, even an anecdotal 80 percent, you've got a big problem. The question is: How do you solve it?


The Composite press release suggests virtualization can solve the problem. "Composite Discovery 2.0 features faster algorithms to automate the business-critical steps of finding and understanding source data, thus accelerating the data modeling step of data virtualization," it says.


Composite is one of the chief advocates for using data virtualization to solve integration challenges in general. Google "data virtualization integration" and you'll find Composite Software at the top, along with three other articles (including one of my posts-go me!) that mention either Composite Software or Robert Eve, executive vice president of marketing at Composite (check out the piece he wrote for our sister site Data Center Edge on "Seven Secrets to Data Virtualization Success"). Two of the remaining 10 results are links to other data virtualization vendors-Denodo and Ipedo. (Progress Software also offers a virtualization solution, and John Goodson, executive leader of the Enterprise Data Solutions group at Progress, has recently written about the issue.)


In fact, almost all of the discussion about virtualization comes from vendors. I've found two exceptions. The first is a 2007 TechTarget article on data virtualization, which quoted a Forrester analyst (it doesn't provide his first name, but the last name is Yuhanna, so I'm thinking it's Noel Yuhanna). The second is this April blog post by David Linthicum, promoting an Informatica-sponsored webinar on the topic, in which Linthicum writes that data virtualization is "the single most beneficial concept of architecture, including SOA, and it's often overlooked by the rank-and-file developers and architects out there."


I'm 95 percent positive that's because what vendors call data virtualization is the same thing that analysts call "data services." Data federation is another term you're likely to hear. And, as I've shared before, it's a mix of something old, something new. I think it's also worth noting that data virtualization's benefits can extend beyond BI to SOA, the cloud and even MDM.
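For readers unfamiliar with the term, here's a minimal sketch of the federation idea at the heart of data virtualization: query two independent sources in place and join the results in a virtual layer, rather than copying everything into a warehouse first. This is purely illustrative; the sources, table names, and columns below are all invented, and it doesn't represent Composite's (or any vendor's) actual product.

```python
import sqlite3

def make_source(rows, schema, table):
    """Stand up an in-memory database to play the role of one source system."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE {table} ({schema})")
    placeholders = ",".join("?" * len(rows[0]))
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    return conn

# Two independent "sources": a CRM holding customers, an ERP holding orders.
crm = make_source([(1, "Acme"), (2, "Globex")],
                  "id INTEGER, name TEXT", "customers")
erp = make_source([(1, 250.0), (1, 75.0), (2, 40.0)],
                  "cust_id INTEGER, amount REAL", "orders")

def federated_revenue():
    """The 'virtual layer': query each source separately, join in memory."""
    names = dict(crm.execute("SELECT id, name FROM customers"))
    totals = {}
    for cust_id, amount in erp.execute("SELECT cust_id, amount FROM orders"):
        totals[names[cust_id]] = totals.get(names[cust_id], 0.0) + amount
    return totals

print(federated_revenue())  # {'Acme': 325.0, 'Globex': 40.0}
```

The point of the sketch is the absence of an ETL step: no data is staged or copied; the join happens at query time against live sources, which is the pitch behind federation and "data services" alike.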


But I'm just not clear on the extent to which data virtualization, data services, whatever you want to call it, solves the BI challenge. I know it's also been suggested that mashups can help solve the BI data problems, but do you need data services to support mashups?


Certainly, data virtualization is worth covering if you're looking to invest in a BI solution. Then again, as IT Business Edge's Ann All points out, integration in general should top the list of issues you discuss with BI vendors.


Who knows, it might even pay off in a new revenue stream: Gartner predicts that by 2014, 20 percent of global organizations will create a product or service based on some portion of data derived from their BI systems. At the very least, the same TechTarget article notes, customers will probably expect you to provide some data or additional services based on BI.

I still have a lot of questions about this issue. Am I the only one?