Integration isn't one-size-fits-all. There are actually a lot of options for how you can handle integration, depending on your needs, but it can be a bit confusing if you don't deal with it day in and day out.
Heck, if you're not an engineer, it can be confusing even when you do.
A recent B-eye Network article breaks down your integration options. It's written by Chris Bradley, a UK business and data management consultant, who does an excellent job of simplifying what you can expect from each integration approach.
First, there's physical movement and consolidation, which is where you want to replicate data from one database to another. Basically, there are two "genres" of physical data movement, he writes: ETL and CDC.
Most people know about ETL. It's been around since Noah had to run a batch process to create the ark's grocery list. ETL - extract, transform, load - is used to move a lot of data in bulk, and it's usually done on a schedule, often only once a day.
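To make the "bulk, on a schedule" idea concrete, here's a minimal ETL sketch. The in-memory source, warehouse, and field names are hypothetical stand-ins for real databases, not anyone's actual pipeline:

```python
# A toy ETL run: extract everything in bulk, transform it, load it.

def extract(source):
    """Pull every row from the source in one bulk read."""
    return list(source)

def transform(rows):
    """Clean up names and drop inactive records."""
    return [
        {"id": r["id"], "name": r["name"].strip().title()}
        for r in rows
        if r.get("active")
    ]

def load(rows, target):
    """Write the transformed rows into the target store."""
    for r in rows:
        target[r["id"]] = r
    return target

# Typically kicked off by a scheduler, e.g. as a nightly batch job.
source_db = [
    {"id": 1, "name": "  ada lovelace ", "active": True},
    {"id": 2, "name": "grace hopper", "active": False},
]
warehouse = {}
load(transform(extract(source_db)), warehouse)
print(warehouse)  # only the active, cleaned-up record survives
```

The point is the shape of the job: one big read, one big write, everything processed at once.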
CDC stands for "change data capture." Honestly, I feel guilty about change data capture - nobody talks about it much, and I feel like I've neglected it in this blog. Basically, it accomplishes the same thing as ETL, but using a completely different approach. While ETL moves data in big, infrequent batches, change data capture works in real time, applying small incremental changes as they happen. It's also event-driven.
Its name makes more sense when you think about it as capturing the changes to the data as they happen and then applying those changes in real time. Kin Cheung, a product marketing manager at Informatica, recently wrote a short post about change data capture and the reasons why you might use it over ETL, if you'd like to read more.
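The contrast with the batch approach is easiest to see in code. Here's a toy change-data-capture sketch: rather than re-copying the whole table, each change event is applied to the replica as it arrives. The event format is a simplified assumption, not any particular CDC product's log format:

```python
# Capture changes as events and apply them incrementally to a replica.

replica = {}

def apply_change(event, target):
    """Apply one captured change (insert/update/delete) in real time."""
    op, key, value = event
    if op in ("insert", "update"):
        target[key] = value
    elif op == "delete":
        target.pop(key, None)

# Events arrive one at a time, in order, as the source data changes.
change_stream = [
    ("insert", 1, {"name": "Ada"}),
    ("insert", 2, {"name": "Grace"}),
    ("update", 1, {"name": "Ada Lovelace"}),
    ("delete", 2, None),
]
for event in change_stream:
    apply_change(event, replica)

print(replica)  # the replica ends up mirroring the source
```

Each event is tiny and cheap, which is why CDC can keep a replica current continuously instead of once a day.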
If you need to "physically" migrate, change and consolidate data, ETL and change data capture are your main options.
Sometimes, you don't want to move the data. Sometimes, you want integration to target the applications. For this, Bradley writes, you'll need integration style two: message-based synchronization and propagation.
Message-based synchronization and propagation comes in two varieties: ESBs (enterprise service buses) and EAI (enterprise application integration) solutions. These tools are also event-driven and are used for business process automation, explains Bradley. ESBs, for instance, are a popular way to deliver services in service-oriented architecture.
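The core idea behind both ESBs and EAI can be sketched as a bare-bones event bus: applications publish messages and subscribers react, so the applications never talk to each other directly. Real ESB products layer routing, transformation, and delivery guarantees on top of this; the topic name and handlers below are invented for the demo:

```python
# A minimal publish/subscribe bus: decoupled, event-driven integration.
from collections import defaultdict

class Bus:
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        """Register a handler to be called for every message on a topic."""
        self.subscribers[topic].append(handler)

    def publish(self, topic, message):
        """Deliver a message to every subscriber of the topic."""
        for handler in self.subscribers[topic]:
            handler(message)

bus = Bus()
shipped, billed = [], []

# Two independent applications react to the same business event.
bus.subscribe("order.placed", lambda m: shipped.append(m["order_id"]))
bus.subscribe("order.placed", lambda m: billed.append(m["order_id"]))

bus.publish("order.placed", {"order_id": 42})
print(shipped, billed)
```

Note that the publisher doesn't know (or care) who is listening - that decoupling is what makes the pattern useful for business process automation.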
Finally, there's the abstract approach to integration, aka data virtualization or federation.
This is a very trendy approach right now, with most major vendors offering some form of data virtualization.
Bradley says "the key point with data virtualization is that the form of the underlying source data is isolated from the consuming application." Theoretically, this means you can pull data from a variety of sources without worrying about all those tedious details that bog down traditional integration - like whether schemas and formats match across sources.
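That isolation can be sketched as a thin query layer that maps each source's native shape onto one common schema on demand, without ever copying the data into a central store. The two source formats below are made up purely for illustration:

```python
# A data-virtualization sketch: one unified view over differently
# shaped sources, computed at query time instead of materialized.

# Source A exposes dicts; source B looks like CSV rows with a header.
source_a = [{"cust_id": 1, "full_name": "Ada"}]
source_b = [("id", "name"), (2, "Grace")]

def view_customers():
    """Present both sources through a single common schema."""
    unified = []
    for row in source_a:
        unified.append({"id": row["cust_id"], "name": row["full_name"]})
    header, *rows = source_b
    for values in rows:
        unified.append(dict(zip(header, values)))
    return unified

print(view_customers())
```

The consuming application only ever sees `id` and `name`; if a source changes shape, only the mapping inside the view needs to change.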
It's fast, agile and simple - and to hear virtualization fans tell it, it will solve any and all integration problems. But, as you might expect, that's an overzealous view: Data virtualization, like all the other approaches, has its time and place, its pros and cons. One con, according to data integration expert David Linthicum, is that it's not actually integration.
Bradley goes on to talk about the points you should consider when choosing between your integration options. Although he doesn't actually tie those considerations to recommendations, you get the gist after reading his explanations. It's definitely worth a quick read if you're unsure which approach would work best for your integration needs.