    Top 10 Best Practices for Data Integration

    The data management discipline known as data integration (DI) has undergone an impressive expansion over the last decade. Today it has reached a critical mass of multiple techniques used in diverse applications and business contexts. Vendor products have achieved maturity; users have grown their DI teams to epic proportions; competency centers regularly staff DI work; and DI as a discipline has earned its autonomy from related practices like data warehousing and database administration.

    Given all this change, it’s not surprising that people in the field might not be up to speed on the current incarnation of DI. Even DI specialists and the colleagues who depend on them sometimes forget the new techniques, diversity, independence, collaboration, and governance typical of modern DI practices. Many suffer misconceptions and out-of-date mindsets that need adjustment.

    The 10 practices in this slideshow, from a TDWI report sponsored by SAS, paint a modern landscape of current DI practices. They also bust a few DI myths that are still too common. Moreover, they raise the bar on DI, showing how sophisticated and powerful a DI solution can be—at least when DI is driven by modern best practices using up-to-date tools.

    If you let it all soak in, this checklist will redefine DI for you and your peers. It will also help you set higher goals and aspirations for DI work and its outcomes. The practices listed here can be the guidelines that help you achieve more modern, high-value, diverse, independent, well-designed, far-reaching, green, collaborative, and well-governed uses of DI tools and techniques.

    Top 10 Best Practices for Data Integration - slide 2

    Data integration is a family of techniques, most commonly including ETL (extract, transform, and load), data federation, database replication, data synchronization, sorting, and changed data capture. All these techniques require support for a wide range of interfaces, so the resulting DI solution can access databases, applications, and files to extract or load data. Solutions based on these techniques may be hand-coded, based on a vendor’s tool, or a mix of both.
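    As an illustration of the first technique in that list, here is a minimal hand-coded ETL sketch. It is not taken from the TDWI report: the CSV layout, the `sales` table, and all function names are assumptions made for the example, and an in-memory SQLite database stands in for a real target.

```python
import csv
import io
import sqlite3

# Hypothetical source extract: a CSV export from an operational system.
SOURCE_CSV = """customer_id,name,amount_cents
1, alice SMITH ,1250
2,Bob Jones,300
"""


def extract(text):
    """Extract: read raw rows from the source file."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: tidy up names and convert cents to a decimal amount."""
    return [
        (
            int(row["customer_id"]),
            row["name"].strip().title(),
            int(row["amount_cents"]) / 100,
        )
        for row in rows
    ]


def load(conn, records):
    """Load: write the transformed records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(customer_id INTEGER, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()


def run_etl(text):
    """Run the three steps in order against an in-memory target database."""
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract(text)))
    return conn
```

    A vendor tool would typically replace each of these functions with a configured component, but the extract, transform, and load steps stay recognizable.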

    Top 10 Best Practices for Data Integration - slide 3

    Data integration techniques are practiced in support of a variety of business initiatives and technology implementations. Hence, in many ways, DI best practices are defined by their associated initiatives and implementations.

    • Analytic data integration (AnDI) is where one or more DI techniques are applied in the context of business intelligence (BI) or data warehousing.
    • Operational data integration (OpDI) involves the access and integration of data among operational applications and databases, whether within one organization or across multiple ones.
    • Hybrid data integration (HyDI) practices fall in the middle ground between AnDI and OpDI. Hybrid DI includes master data management (MDM) and similar practices like customer data integration and product information management.

    The three practice areas of DI span analytics and operations, plus the overlap between them. Across these practices, any DI technique and tool type may be used, and all practices assume core skills in databases, data models, interfaces, and transformations.

    Top 10 Best Practices for Data Integration - slide 4

    Data integration’s autonomy is a relatively new, and still evolving, development. After all, DI has a long history of being staffed and managed by larger, related data management teams. For example, in some old-fashioned organizations, DI (especially the ETL technique) is still considered a subset of data warehousing or database administration. Luckily, DI can still be practiced successfully when subsumed by a larger team. But some organizations are moving toward independent teams of DI specialists who perform a wide range of DI work, whether analytic, operational, or hybrid.

    Given the growing amount and breadth of DI work, DI specialists and the people who depend on them need to rethink how they organize, staff, train, tool, and coordinate DI work. This is a time of great change for DI, and now’s the time to plan for DI’s future.

    Top 10 Best Practices for Data Integration - slide 5

    True DI is about transforming data, as the T in ETL reminds us. The transformation can be simple, as when a federated table join maps differing source schemas to a common data model so the tables can be merged. A transformation may also be complex, as when a legacy data set is completely remodeled during its migration to a modern database platform. Hence, DI is defined primarily by how it transforms data. The access, copy, and transfer of data are secondary, as are the details of an individual DI solution, such as interface types and their speed or frequency of operation (based on data latency requirements).

    Transforming data is a technical task that supports a business goal—namely, repurposing data for a business use that differs from the one for which the data originated. When defining DI, stay focused on the value proposition seen in transforming and repurposing data; avoid definitions that stress the secondary access, copy, and transfer of data.
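    The simple case mentioned above, where differing source schemas are mapped to a common data model before merging, might look like the following sketch. The two source layouts, the field names, and the country-code mapping are all hypothetical and chosen only for illustration.

```python
# Two hypothetical sources that describe customers with different schemas.
CRM_ROWS = [{"cust_no": "17", "full_name": "Ada Lovelace", "country": "uk"}]
BILLING_ROWS = [
    {"id": 42, "first": "Alan", "last": "Turing", "country_code": "GB"}
]

# Assumed mapping from non-standard country labels to a common code.
COUNTRY_ALIASES = {"UK": "GB"}


def normalize_country(value):
    """Upper-case the value and replace known aliases with the common code."""
    code = value.strip().upper()
    return COUNTRY_ALIASES.get(code, code)


def from_crm(row):
    """Map a CRM row to the common model: customer_id, name, country."""
    return {
        "customer_id": f"crm-{row['cust_no']}",
        "name": row["full_name"],
        "country": normalize_country(row["country"]),
    }


def from_billing(row):
    """Map a billing row to the same common model."""
    return {
        "customer_id": f"billing-{row['id']}",
        "name": f"{row['first']} {row['last']}",
        "country": normalize_country(row["country_code"]),
    }


def merge_sources(crm_rows, billing_rows):
    """Merge both sources once every row shares the common schema."""
    return [from_crm(r) for r in crm_rows] + [from_billing(r) for r in billing_rows]
```

    The point of the sketch is that the merge itself is trivial; the value lies in the mapping functions that repurpose each source for the shared model.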

    Top 10 Best Practices for Data Integration - slide 6

    Think about how a manufacturing process consumes material in various states of rawness or completeness, processes the material to make it suited to a new purpose, and combines processed material into a product that’s more valuable than the original material. The data transformation and repurposing mentioned previously have an effect similar to manufacturing, in that something truly new (and usually more valuable) results.

    Data integration specialists should always raise the bar by looking for ways to add further value to data as they integrate and repurpose it, whether the data integration solution is analytic, operational, or a hybrid.

    Top 10 Best Practices for Data Integration - slide 7

    Concerns about climate change and the rising cost of electricity have led many people to revisit the sustainability of data centers. In response, corporations are reducing power consumption and the physical footprint of data centers and server rooms by consolidating redundant data and virtualizing hardware servers. Data integration tools and techniques are instrumental in the consolidations that make IT more sustainable.

    The data migration techniques of OpDI can consolidate and collocate redundant databases, thereby reducing the number of servers, plus the budgets and resources they consume. Furthermore, DI techniques such as data federation and data services can assemble data sets on the fly, as they are needed, without spawning new, permanent databases that burn up server resources.
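    To make the federation idea concrete, the sketch below joins two separate databases at query time and returns a result set without creating a new permanent table or database. It is only an approximation: SQLite's `ATTACH DATABASE` stands in for a real federation engine, and the `customers` and `invoices` tables and their contents are invented for the example.

```python
import sqlite3


def build_federated_connection():
    """Open one connection that can see two separate in-memory databases."""
    conn = sqlite3.connect(":memory:")  # "main" plays the role of source A
    conn.execute("ATTACH DATABASE ':memory:' AS billing")  # source B
    conn.execute("CREATE TABLE main.customers (id INTEGER, name TEXT)")
    conn.execute(
        "CREATE TABLE billing.invoices (customer_id INTEGER, total REAL)"
    )
    conn.executemany(
        "INSERT INTO main.customers VALUES (?, ?)", [(1, "Ada"), (2, "Alan")]
    )
    conn.executemany(
        "INSERT INTO billing.invoices VALUES (?, ?)",
        [(1, 10.0), (1, 5.5), (2, 7.0)],
    )
    return conn


def customer_totals(conn):
    """Assemble the combined data set on demand; nothing is persisted."""
    query = """
        SELECT c.name, SUM(i.total)
        FROM main.customers AS c
        JOIN billing.invoices AS i ON i.customer_id = c.id
        GROUP BY c.id, c.name
        ORDER BY c.id
    """
    return conn.execute(query).fetchall()
```

    Because the joined result exists only for the duration of the query, no additional database has to be provisioned, stored, or powered.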

    Top 10 Best Practices for Data Integration - slide 8

    If you don’t fully embrace the existence of DI architecture, you can’t address how architecture affects DI’s scalability, staffing, cost, and ability to support real-time capability, MDM, services, and interoperability with other tools.

    Recognize that DI architecture exists. Although it overlaps with data warehousing architecture and interacts with the entire BI technology stack, DI architecture is an autonomous structure required for an autonomous practice. After all, other types of IT solutions have architecture.

    Top 10 Best Practices for Data Integration - slide 9

    TDWI Research defines collaborative data integration as a collection of user best practices and software tool functions that foster collaboration among the growing number of technical and business people involved in DI projects and initiatives. In a recent TDWI survey, two-thirds of organizations surveyed reported that collaboration is required for DI.

    Recognize that DI has collaborative requirements. The greater the number of DI specialists and people who work closely with them, the greater the need for collaboration around DI. Head count aside, the need is also intensified by the geographic dispersion of team members, as well as new requirements for regulatory compliance and data governance.

    Top 10 Best Practices for Data Integration - slide 10

    In most organizations today, data and other information are managed in isolated silos by independent teams using various data management tools for DI, data quality, data governance and stewardship, metadata and MDM, database administration, data architecture, and so on. In response to this situation, some organizations are adopting enterprise data management (EDM), a best practice for coordinating diverse data management disciplines, so that data is managed according to enterprise-wide goals that promote technical efficiencies and support strategic, data-oriented business goals.

    In many ways, EDM is similar to collaborative DI, except that EDM involves several data management disciplines—not just DI. Furthermore, EDM demands far greater guidance from business management, so that all data management work is aligned to support strategic, data-driven business objectives, including fully informed operational excellence and BI, plus related goals in governance and compliance. The challenge of EDM is to balance its two important goals—uniting multiple data management practices and aligning them with business goals that depend on data for success.

    Top 10 Best Practices for Data Integration - slide 11

    When executed broadly, data governance (DG) influences almost all data management practices, including DI, quality, warehousing, standards, administration, architecture, and so on. Data governance typically requires that adjustments be made in these practices, in support of the data usage policies developed by the DG board.
