The Challenges of Integrating and Mastering Big Data

Loraine Lawson

There are a lot of predictions about how Big Data will change things in the coming year. But the most unexpected trend prediction I’ve seen is that Big Data will trigger an increase in hand-coded integration.

I guess we should’ve seen it coming — I mean, MapReduce jobs are largely hand-coded. I’ve just never seen someone put it in quite those terms.

It’s already happening to some extent, according to Jorge Lopez, senior manager of data integration at Syncsort, in a recent column for TDWI.

In 2012, more companies needed high-performance data integration because conventional batch-processing tools couldn’t keep up with Big Data’s scale and performance requirements, he writes. So more companies started to “rely on constant manual tuning” to get the results they needed.
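That phrase, “constant manual tuning,” is worth making concrete. Below is a minimal sketch of the kind of knobs a hand-coded Hadoop job carries around, assuming Hadoop 2-style property names (the class name and the specific values are my own hypothetical illustration, not Lopez’s):

```java
import org.apache.hadoop.conf.Configuration;

// Hypothetical sketch: the performance knobs hand-coded jobs accumulate.
// Each value is a manual guess that has to be revisited whenever data
// volumes or cluster capacity change. Property names vary by Hadoop version.
public class HandTunedConfig {

    public static Configuration tunedConfiguration() {
        Configuration conf = new Configuration();
        // Map-side sort buffer (MB), sized by trial and error for current inputs.
        conf.setInt("mapreduce.task.io.sort.mb", 256);
        // Compress intermediate map output to relieve shuffle pressure.
        conf.setBoolean("mapreduce.map.output.compress", true);
        // Reducer parallelism guessed from today's cluster size; guess wrong
        // and you get idle slots or a handful of overloaded reducers.
        conf.setInt("mapreduce.job.reduces", 32);
        return conf;
    }

    public static void main(String[] args) {
        Configuration conf = tunedConfiguration();
        System.out.println("io.sort.mb = " + conf.get("mapreduce.task.io.sort.mb"));
    }
}
```

None of those settings adjusts itself; each one is a guess that has to be revisited as data volumes or cluster capacity change, which is exactly the maintenance burden Lopez describes.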

This remark caught my eye, because it was only in late 2011 that TDWI reported the scales had tipped, with more organizations abandoning hand-coding in favor of data integration tools. That shift certainly didn’t last long.

Anyway, based on this shift back to hand-coding, Lopez predicts a veritable “spike” in hand-coding in 2013, “as organizations seek alternative approaches to meet performance and scalability requirements.”

What’s interesting is that he doesn’t expect the whole process to be hand-coded; rather, in some cases hand-coding will be necessary to complete the integration. Hadoop in particular creates demand for manual coding because MapReduce relies on Java, Pig or Hive — not the SQL that’s foundational for so many enterprise solutions.
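To see why that’s a sticking point, consider an aggregation any SQL shop writes in one line, something like SELECT department, COUNT(*) FROM employees GROUP BY department. Here’s a sketch of the hand-coded MapReduce equivalent in Java against the Hadoop 2-style API (the GroupCount class and the comma-delimited input layout are my own hypothetical illustration):

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical hand-coded equivalent of:
//   SELECT department, COUNT(*) FROM employees GROUP BY department;
public class GroupCount {

    // Map: read comma-delimited lines, emit (first column, 1).
    public static class GroupMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text group = new Text();

        @Override
        protected void map(LongWritable offset, Text line, Context ctx)
                throws IOException, InterruptedException {
            String row = line.toString().trim();
            if (row.isEmpty()) return;          // skip blank lines
            group.set(row.split(",")[0]);       // assumes department is column 1
            ctx.write(group, ONE);
        }
    }

    // Reduce (and combine): sum the ones for each group key.
    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> counts, Context ctx)
                throws IOException, InterruptedException {
            int total = 0;
            for (IntWritable c : counts) total += c.get();
            ctx.write(key, new IntWritable(total));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "group count");
        job.setJarByClass(GroupCount.class);
        job.setMapperClass(GroupMapper.class);
        job.setCombinerClass(SumReducer.class);  // pre-aggregate on the map side
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Roughly fifty lines of Java, plus a build and jar deployment, for one GROUP BY. Pig and Hive narrow that gap considerably, but they’re still separate languages and toolchains for a SQL-centric team to absorb.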

As you might expect, there will be consequences to all this new hand-coding.

“As organizations retreat to manual coding, they will face development and maintenance challenges, especially as big data continues to raise the requirements bar,” Lopez writes.


Lopez suggests that the solution will be “using friendly graphical user interfaces that leverage existing IT skills and highly scalable, self-tuning engines to help reduce the complexities of designing for performance.”

That’s a not-so-subtle push for Syncsort’s own solution, which accelerates integration and is built for large datasets. But I’m going to assume it’s not the only option.

So, Big Data’s not going away, but it may change the data integration conversation even more than it already has.

Talend’s vice president of marketing, Yves de Montcheuil, recently explained how Big Data will intersect with another major integration technology, Master Data Management:

“Adding ‘Big’ to MDM does not mean that the master data hub will be stored in Hadoop (although NoSQL could enable this sooner than one thinks), nor does it mean that its size will grow exponentially in a short timeframe. Rather, it means that some of the big data (or new data) will be managed in the MDM hub itself, linked from the MDM hub in a federated approach, or will simply benefit from the consistency and resolution services that MDM brings to the table.”
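To unpack that “federated approach”: the hub stores only cleansed, resolved master attributes, plus references to detail data that stays put in external big data stores. A minimal sketch of such a record, with all names invented for illustration:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a federated master data record: golden attributes
// live in the MDM hub, while the big data detail is only linked, not stored.
public class CustomerMasterRecord {
    final String masterId;          // resolved, deduplicated identity
    final String canonicalName;     // cleansed "golden" attribute
    final List<String> detailRefs;  // links into external stores (HDFS, NoSQL)

    CustomerMasterRecord(String masterId, String canonicalName, List<String> detailRefs) {
        this.masterId = masterId;
        this.canonicalName = canonicalName;
        this.detailRefs = detailRefs;
    }

    public static void main(String[] args) {
        CustomerMasterRecord rec = new CustomerMasterRecord(
            "CUST-0042",
            "Acme Corporation",
            Arrays.asList(
                "hdfs://cluster/clickstream/2013/01/cust-0042", // raw events stay in Hadoop
                "hbase://crm_events/row/CUST-0042"));           // invented URI scheme
        System.out.println(rec.canonicalName + " -> " + rec.detailRefs);
    }
}
```

Consumers resolve identity and golden attributes through the hub, then follow the references on demand, so the underlying big data benefits from MDM’s consistency and resolution services without ever being copied into the hub.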

Informatica’s vice president of cloud data integration, Darren Cunningham, agrees that MDM and Big Data will intersect this year. In general, he predicts a shift toward making Big Data more relevant by ensuring it’s available in a timely manner and that the data is of high quality.

“Data scientists will continue to play an important role in the enterprise, but the bulk of the work to make Big Data useful is handled by the right approach to tackling the strategic requirement for better data integration and data quality,” Cunningham said during a recent Q&A with IT Business Edge.



Comments
Jan 15, 2013 4:32 AM, Jay says:
Good article and advice. Companies integrating various systems and implementing new technology need to perform comprehensive due diligence prior to adoption. Read a whitepaper about this very topic, "Y2k 12 integrating next generation technology to transform business"; it offers good information on the benefits of migrating from legacy systems and integrating new technology at bit.ly/S8p1W4
Jan 16, 2013 9:02 AM, Yves de Montcheuil says:
@Loraine, thanks for sharing your thoughts and quoting from my MDM predictions. On the topic of hand-coding, I would retort that there is no more need to hand-code big data integration jobs than there is to hand-code "regular" data integration jobs, assuming you have the right tools, of course. Look for a tool that generates not only MapReduce but also Pig, Hive, Sqoop, and HBase. Look for metadata integration (HCatalog). Look for scheduling integration (Oozie). Look for data cleansing & deduplication. And, probably as important, look for integration between Hadoop and conventional technologies. Too many vendors pay only lip service to big data integration, giving you only an HDFS connector and a framework to write your own MapReduce code. These vendors are doing a disservice to big data by making it harder to get results. Disclaimer: Talend offers all of the above... and I believe is the only one to do so. But I do welcome different opinions! http://www.talend.com/products/big-data
