The Pros and Cons of Data Virtualization

Loraine Lawson

Listening to Informatica's virtual forum and a corresponding TweetJam yesterday, you couldn't help but be impressed with data virtualization. I only managed to hear one of the forum's three hours, but I caught Forrester Research analyst Noel Yuhanna's discussion of the business benefits of data virtualization.


Data virtualization is key to integrating the "many houses of data" that plague enterprises, Yuhanna said, whether that silo is a legacy app or a mobile device. There are a number of use cases, but 15-20 percent involve that ever-elusive "single version of the truth," he said. Data virtualization can support it because it integrates with MDM. Data quality and governance are also given a boost by virtualization technology, he said, adding that security is a key driver for data virtualization, although it seems security experts aren't sold on that.


Not only that, but data virtualization can solve your cloud integration troubles, support integration of structured and semi-structured data, speed up your data delivery thanks to caching ("Ever wonder why Google is faster than your internal ERP?" challenged Yuhanna) and, of course, it integrates with Big Data solutions like Hadoop and MapReduce.


And it's not just for enterprises. Yuhanna said all sizes of organizations can benefit from data virtualization.


It's also fast to implement - we're talking weeks to months instead of months to years - and provides a quick ROI. In some cases, Yuhanna shared, his clients have reported ROIs in under six months. And - here's the best part - it's doing all that in real- or near-real-time. That's actually a key component of Forrester's definition of data virtualization, according to his slide presentation:

Data virtualization is process of integrating data from many disparate sources in real-time or near real-time to support various business requirements. It involves integrating, transforming and delivering data as a service to support applications and processes.
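To make that definition a little more concrete, here is a minimal sketch in Python of the general idea: two disparate sources are read at request time, lightly transformed, and delivered as one combined result, with nothing copied into a warehouse first. Everything named here (`crm`, `orders_feed`, `customer_totals`) is hypothetical and invented for illustration; it is not how Informatica, Composite or any other product implements data virtualization.

```python
import sqlite3

# Source 1: a relational store. An in-memory SQLite database stands in
# for a legacy application database (hypothetical example data).
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
crm.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Acme"), (2, "Globex")],
)

# Source 2: semi-structured records. A list of dicts stands in for a
# feed from a web service or cloud application (hypothetical example data).
orders_feed = [
    {"customer_id": 1, "total": "120.50"},
    {"customer_id": 1, "total": "30.00"},
    {"customer_id": 2, "total": "75.25"},
]


def customer_totals():
    """Virtual view: read both sources at call time and return joined rows.

    No data is staged or persisted; each call reflects the sources as
    they are at that moment.
    """
    totals = {}
    for order in orders_feed:
        # Transform step: the feed stores totals as strings, so convert them.
        cid = order["customer_id"]
        totals[cid] = totals.get(cid, 0.0) + float(order["total"])

    rows = crm.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
    return [
        {"id": cid, "name": name, "order_total": totals.get(cid, 0.0)}
        for cid, name in rows
    ]
```

Calling `customer_totals()` after a new record is appended to `orders_feed` returns the updated total right away, which is the real-time part of the definition. The price is that every call goes back to the underlying sources, which is where caching and the trade-offs discussed below come in.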

Great at integration, quick ROI, speeds up delivery of data, improves data quality AND real-time. How could I not be impressed?


Hearing his litany of data virtualization benefits, I couldn't help but think, "What are the drawbacks of data virtualization? When should you NOT use it?" Because we know there's always a trade-off, right? There are use cases, and then there are the much less-discussed "Do Not Use" cases. So, what's the deal with data virtualization, I wondered. Where does it fall from grace?


One thing I've read on Twitter is that it can cause problems in update scenarios, which one would think could undermine real-time use if you need the information to flow both ways.


I guess it depends on what your real-time demands are. Back in 2007, I asked Composite Software Vice President of Product Marketing Peter Tran about the use and non-use cases for data virtualization. He pointed out Pfizer has used it successfully for real-time data, but added there are times when the old tools work best:

Data warehouses have been around for 20 years and they'll be around for another 20 years at least. And in those cases where you need time series analysis, historical analysis, where you actually have to capture information that's historical - what happened yesterday, what happened two weeks ago - and analyze it, data warehouses are best.

Dick Weisinger wrote about the challenges of data virtualization on Formtek's blog not long ago. He quoted Ian Watts, senior technical manager of BT Americas, on the drawbacks:

Virtualization is a bean counter's dream, but it can be an operational nightmare. Change management is a huge overhead, as any changes need to be accepted by all applications and users sharing the same virtualization kit. While many organizations are seeing benefits from virtualization, such as reduced hardware spending and improved server utilization, these benefits often get overshadowed by the lack of productivity improvements in data center staffing and operations.

There are also what I would call "expectation challenges" that arise from misunderstandings about the technology. Blue Mountain Labs CTO and data/SOA/cloud expert David Linthicum, who actually participated in yesterday's forum, pointed out recently that companies wrongly believe data virtualization handles integration. I can certainly see where they'd get that idea, but Linthicum says that's a myth:

For some reason there are those who sell virtualization software and cloud computing enablement platforms who imply that data integration is something that comes along for the ride. However, nothing gets less complex and data integration still needs to occur between the virtualized data stores as if they existed on their own machines. They are still storing data in different physical data structures, and the data must be moved or copied, and the difference with the physical data structures dealt with, as well as data quality, data integrity, data validation, data cleaning, etc.

Again, I point out that I didn't participate in the entire three hours, so maybe the panel did discuss the shortcomings and "myth-conceptions" surrounding data virtualization. And from what I've read and heard, data virtualization seems like a smart approach for many use cases - particularly for handling data in a service-oriented space like the cloud or SOA. But it's worth keeping in mind that while data virtualization (aka, data services) may be the future of managing data, it's not without its challenges.


Oct 10, 2011 10:51 AM Robert Eve says:

Loraine -

You are correct to point out that data virtualization as a data integration approach has both pros and cons.

In our nearly ten years of enterprise deployments, the approach and enabling technology has proven most effective for BI data federation, data warehouse extension and data virtualization / data abstraction layered architectures, with big and cloud data integration patterns now emerging. 

We do see some "update" scenarios, but far fewer than "query" scenarios. For example, Comcast uses the Composite Data Virtualization Platform to both query and update subscriber addresses. If your readers want to learn more about this deployment, we included it on pages 71-80 in our upcoming (to be announced tomorrow) data virtualization book.

Regards, Bob Eve

EVP Marketing, Composite Software.

Oct 11, 2011 6:31 AM Ash Parikh says:


Thanks for attending Informatica's Data Virtualization Expert's Forum on Oct 6.

A key point that stood out throughout the live online event was that agility and time to delivery were key drivers. What also stood out was that this needed a way to involve the business user early and often in the data integration process, with role-based tools and common metadata. It's about cutting the wait and waste, as described in great detail in the Lean Integration book:

According to customers, Informatica's data virtualization solution is the only technology that can truly cut the wait and waste in the process. This is because it supports the following with a single environment for physical data integration and virtual data integration:

Access ALL data

Federate data without data movement

Preview federated data anytime during the life cycle

Profile federated data without staging or additional processing

Apply ETL-like rich data transformations in real time, not limiting users to only what SQL or XQuery do

Apply data quality & data masking rules in real-time without staging or additional processing

Deliver data services that can be instantly reused for SQL, Web services AND batch

More information on our solution can be found at:


Ash Parikh

