For some time now, I've followed virtualization, particularly as it pertains to data. I could see how it would create a need for data integration, but I wasn't sure how it could be used as a data integration tool.
Apparently, I'm not the only one who needed help connecting the dots. This month, I've found two resources that help explain how data virtualization helps with integration.
The first is a TDWI-published paper that explains exactly how data federation (aka data virtualization) is an important tool for data integration teams. It's written by the director of TDWI Research, Wayne Eckerson, and sponsored by Composite Software.
I know some of you have worked in integration since Noah came over on the ark; honestly, this paper isn't for you. But for those who hail from the business side or who are new to data integration, this paper is an excellent resource for understanding what data virtualization is and is not, as well as its technology heritage and the potential business use cases.
I found the history particularly enlightening. Data virtualization is positioned as a relatively new development, but data federation reaches back to the early 1990s, with the virtual data warehouse, according to Eckerson. By the early part of this decade, the technology was coupled with more robust computing resources, marketed for general-purpose data integration and labeled "Enterprise Information Integration." My fellow ITBE blogger, Arthur Cole, pointed out this fact last year when he posted an overview of data virtualization vendors.
These days, you'll find this technology marketed as data virtualization, data services or distributed query solutions, Eckerson writes. Of course, it's not just the name that's changed. The tools "have broadened their capabilities," Eckerson continues:
"They are used in a variety of situations that require unified access to data in multiple systems via high-performance distributed queries, such as data warehousing, reporting, dashboards, mashups, portals, master data management, data services in a service-oriented architecture (SOA), post-acquisition systems integration, and cloud computing."
The first checklist item in the paper explains what data federation is and why vendors sometimes call it "data virtualization" instead:
"When users submit a query, data federation software calculates behind the scenes the optimal way to fetch and join the remote data and return the result. Its ability to shield users and application developers from the complexities of distributed SQL query calls and back-end data sources is why some vendors call this technology 'data virtualization' software."
The second resource is an ebizQ webcast, available for replay, featuring blogger and consultant David Linthicum and Bradley Wright, senior marketing manager for data services at Progress DataDirect. The discussion focuses on the business value of data virtualization, but as part of that, both Linthicum and Wright explain how virtualization supports integration.
I particularly liked Wright's definition of data virtualization as a "data consumption approach that integrates and transforms data from multiple data sources into a logical or virtual business-friendly data model that really hides the details of the physical sources and the data in those physical sources from the consumers and are also accessed on demand through some particular API by those data consumers."
Wright also provides a concrete example of how several client companies have used virtualization, including a health care insurer whose call center representatives couldn't access information on a member without switching between multiple applications. The insurer didn't want to migrate the data out of its existing systems, but thanks to data virtualization, it was able to present an integrated, single view of the member without physically moving the data.
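A hypothetical sketch of that insurer scenario, in the spirit of Wright's definition: a single business-friendly "member view" that call center reps could use, assembled on demand from the systems of record rather than from migrated data. Every function, field and system name below is invented for illustration.

```python
# Hypothetical sketch of the insurer scenario: one call-center view that
# presents a single picture of a member, fetching on demand from the
# systems of record instead of migrating their data. Names are illustrative.

def fetch_from_claims_system(member_id):
    # Stand-in for a live query against the claims application
    return {"open_claims": 2, "last_claim_date": "2010-03-14"}

def fetch_from_policy_system(member_id):
    # Stand-in for a live query against the policy administration app
    return {"plan": "PPO Gold", "status": "active"}

class MemberView:
    """A virtual, business-friendly member record assembled at call time."""

    def __init__(self, member_id):
        self.member_id = member_id

    def as_dict(self):
        # The data stays in the source systems; we only read it on demand,
        # so no migration is needed and reps stop swivel-chairing between apps
        record = {"member_id": self.member_id}
        record.update(fetch_from_policy_system(self.member_id))
        record.update(fetch_from_claims_system(self.member_id))
        return record

rep_screen = MemberView("M-0042").as_dict()
print(rep_screen["plan"], rep_screen["open_claims"])
# PPO Gold 2
```

The design choice worth noticing is that nothing is copied: each call reads the current state of the source applications, which is why the reps' single view stays in sync without an ETL pipeline.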
If you'd like to learn even more about virtualization, there are a lot of great resources here on IT Business Edge, including these free book excerpts: