The Challenges of Integrating and Mastering Big Data

    There are a lot of predictions about how Big Data will change things in the coming year. But the most unexpected trend prediction I’ve seen is that Big Data will trigger an increase in hand-coded integration.

    I guess we should’ve seen it coming — I mean, MapReduce jobs are largely hand-coded. I’ve just never seen someone put it in quite those terms.

    It’s already happening to some extent, according to Jorge Lopez, senior manager, data integration, Syncsort, in a recent column for TDWI.

    In 2012, more companies needed high-performance data integration, since conventional, batch-processing tools couldn’t keep up with Big Data’s scale and performance requirements, he writes. So, more companies started to “rely on constant manual tuning” to get the results.

    This remark caught my eye, because it was only in late 2011 that TDWI reported that the scales had shifted, with more organizations abandoning hand-coding for data integration solutions. That certainly didn’t last long.

    Anyway, based on this shift back to hand-coding, Lopez is predicting more hand-coding in 2013, even a veritable “spike” in hand-coding “as organizations seek alternative approaches to meet performance and scalability requirements.”

    What’s interesting is that he doesn’t see the whole process being hand-coded, but in some cases, the hand-coding will be necessary to complete the integration. Hadoop in particular creates a demand for manual coding because MapReduce relies on Java, Pig or Hive — not the SQL that’s foundational for so many enterprise solutions.

    As you might expect, there will be consequences to all this new hand-coding.

    “As organizations retreat to manual coding, they will face development and maintenance challenges, especially as big data continues to raise the requirements bar,” Lopez writes.

    Lopez suggests that the solution will be “using friendly graphical user interfaces that leverage existing IT skills and highly scalable, self-tuning engines to help reduce the complexities of designing for performance.”

    That’s a not-so-subtle push for Syncsort’s own solution, which accelerates integration and is built for large datasets. But I’m going to assume it’s not the only solution.

    So, Big Data’s not going away, but it may change the data integration conversation even more than it already has.

    Talend’s vice president of marketing, Yves de Montcheuil, recently explained how Big Data will intersect with another major integration technology, Master Data Management:

    “Adding ‘Big’ to MDM does not mean that the master data hub will be stored in Hadoop (although NoSQL could enable this sooner than one thinks), nor does it mean that its size will grow exponentially in a short timeframe. Rather, it means that some of the big data (or new data) will be managed in the MDM hub itself, linked from the MDM hub in a federated approach, or will simply benefit from the consistency and resolution services that MDM brings to the table.”

    Informatica’s Vice President of Cloud Data Integration Darren Cunningham agrees that MDM and Big Data will intersect this year. In general, he predicts a shift toward making Big Data more relevant by ensuring it’s available in a timely manner and that the data is high quality.

    “Data scientists will continue to play an important role in the enterprise, but the bulk of the work to make Big Data useful is handled by the right approach to tackling the strategic requirement for better data integration and data quality,” Cunningham said during a recent Q&A with IT Business Edge.

    Loraine Lawson
    Loraine Lawson is a freelance writer specializing in technology and business issues, including integration, health care IT, cloud and Big Data.
