Real-Time Data Integration Techniques

    In my previous post, I looked at federation, services and events as ways of achieving real-time data integration.

    TDWI’s Phillip Russom identified these as the three “most aggressively adopted” integration techniques for achieving real-time data in a recent article, “10 Rules for Real-Time Data Integration.” That’s why he made Rule # 4, “Expect to do RT DI with more federation, services, and events.”

    Real-time data integration is a new approach for many organizations, but it turns out that the next three techniques are pretty old hat. Let’s take a look at these approaches.

    Replication. This is pretty much what it sounds like: Changes to the source of the data are sent out to all the other systems, which are then updated. Replication is second only to ETL when it comes to performing integration, Russom notes. This makes it a popular starting point for real-time data integration because, most likely, someone in the organization already knows how to do it. He actually wrote a full article on how data replication methods support real-time data integration. There’s also a whitepaper available that explains it in greater depth.
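    For the code-minded, here is a minimal sketch of the idea in Python. It is only an illustration, not how any particular replication product works: the `Source` and `Replica` classes, the `write` and `apply` methods, and the convention that a value of `None` means "delete" are all assumptions made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """A hypothetical downstream system that keeps its own copy of the data."""
    name: str
    rows: dict = field(default_factory=dict)

    def apply(self, key, value):
        # Assumption for this sketch: a value of None means "delete this key".
        if value is None:
            self.rows.pop(key, None)
        else:
            self.rows[key] = value


@dataclass
class Source:
    """A hypothetical source system that forwards every change to its replicas."""
    rows: dict = field(default_factory=dict)
    replicas: list = field(default_factory=list)

    def write(self, key, value):
        # Apply the change locally first...
        if value is None:
            self.rows.pop(key, None)
        else:
            self.rows[key] = value
        # ...then send the same change out to every other system.
        for replica in self.replicas:
            replica.apply(key, value)


crm = Replica("crm")
warehouse = Replica("warehouse")
source = Source(replicas=[crm, warehouse])

source.write("customer:1", {"name": "Ada"})
source.write("customer:2", {"name": "Grace"})
source.write("customer:1", None)  # delete customer:1 everywhere
```

    After these three writes, the source and both replicas hold the same single row. Real replication also has to deal with network failures, ordering and conflicts, which this sketch ignores.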

    Change data capture. This is when changes recorded in the database transaction logs are captured (hence the name). The difference between CDC and replication confuses even IT database pros, and they’re often talked about as one and the same. So, to simplify it for us non-DBAs, what you need to know is that it’s a way of accessing changes without running a complete ETL batch process. I found an older article that explains some of the use cases for CDC.
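    To make "accessing changes without running a complete ETL batch" a little more concrete, here is a rough Python sketch. It assumes the transaction log can be modeled as a list of entries with an increasing sequence number; `LogEntry`, `capture_changes`, `apply_changes` and the `lsn` field are hypothetical names for this example, not the API of any real database or CDC tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    lsn: int             # log sequence number, assumed to increase with each change
    op: str              # "insert", "update" or "delete"
    key: str
    value: object = None


def capture_changes(log, last_lsn):
    """Return only the entries newer than last_lsn, plus the new checkpoint."""
    new_entries = sorted((e for e in log if e.lsn > last_lsn), key=lambda e: e.lsn)
    checkpoint = max((e.lsn for e in new_entries), default=last_lsn)
    return new_entries, checkpoint


def apply_changes(target, entries):
    """Replay the captured changes, in order, against a target table (a dict here)."""
    for entry in entries:
        if entry.op == "delete":
            target.pop(entry.key, None)
        else:
            target[entry.key] = entry.value


log = [
    LogEntry(1, "insert", "order:1", {"total": 10}),
    LogEntry(2, "update", "order:1", {"total": 12}),
    LogEntry(3, "insert", "order:2", {"total": 5}),
]
target = {}

# First pass: everything in the log is new.
first_changes, checkpoint = capture_changes(log, last_lsn=0)
apply_changes(target, first_changes)

# Later, one more change lands in the log; only that entry is picked up.
log.append(LogEntry(4, "delete", "order:2"))
second_changes, checkpoint = capture_changes(log, last_lsn=checkpoint)
apply_changes(target, second_changes)
```

    The point of the checkpoint is that the second pass touches one log entry rather than re-reading the whole source table, which is the contrast with a full ETL batch.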

    Microbatch ETL. This is like traditional ETL, except it runs in smaller, more frequent batches and stores the results in a real-time partition. This partition is copied once a day to the static data marts. It sounds great, but it has some pretty demanding requirements in terms of job control, scheduling, error mitigation and so on.

    That said, Ralph Kimball and Joe Caserta called it “by far the simplest approach for delivering near real-time data warehousing reporting” in their 2004 book, “The Data Warehouse ETL Toolkit,” which is available on Google Books. You can also read the full chapter that explains it on this site.
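    The microbatch flow described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the partition and the data mart are plain lists, `run_microbatch` and `daily_rollover` are hypothetical function names, and emptying the real-time partition after the daily copy is my assumption rather than something the sources spell out. The job control, scheduling and error handling that make the real thing demanding are left out entirely.

```python
def run_microbatch(source_rows, realtime_partition, transform):
    """Extract, transform and load one small batch into the real-time partition."""
    batch = [transform(row) for row in source_rows]
    realtime_partition.extend(batch)
    return len(batch)


def daily_rollover(realtime_partition, static_mart):
    """Copy the real-time partition into the static data mart once a day."""
    static_mart.extend(realtime_partition)
    # Assumption: the partition starts empty again after the copy.
    realtime_partition.clear()


def to_revenue(row):
    """Hypothetical transform step: turn a raw sale into a revenue fact."""
    return {"sku": row["sku"], "revenue": row["qty"] * row["price"]}


realtime_partition = []
static_mart = []

# Two small batches arrive during the day (in practice, a scheduler would
# trigger these every few minutes).
loaded_first = run_microbatch(
    [{"sku": "A", "qty": 2, "price": 5.0}],
    realtime_partition,
    to_revenue,
)
loaded_second = run_microbatch(
    [{"sku": "B", "qty": 1, "price": 3.0}, {"sku": "A", "qty": 1, "price": 5.0}],
    realtime_partition,
    to_revenue,
)
rows_before_rollover = len(realtime_partition)

# Once a day, the accumulated rows move to the static mart.
daily_rollover(realtime_partition, static_mart)
```

    Reports that need fresh data would query the real-time partition during the day, while everything older lives in the static mart.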

    Russom’s article comes on the heels of a recent TDWI webinar, “Real-Time Data Integration: An Infrastructure That Enables Fast-Paced Business,” which goes into more depth about the technology issues. That webinar is available as a free .wmv download, by the way, and is linked from within the article.

    Loraine Lawson
    Loraine Lawson is a freelance writer specializing in technology and business issues, including integration, health care IT, cloud and Big Data.
