On the face of it, MarkLogic seems like exactly the kind of company that would be worried about Hadoop.
It's a proprietary, enterprise-class database that uses XML and XQuery (as opposed to SQL), and it specializes in large amounts of unstructured data. One of its implementations runs to a couple of hundred terabytes, and that's not the biggest by any stretch, Bill Veiga, MarkLogic's vice president of solutions marketing, added during a recent interview. In other words, it's well established as a Big Data solution. The company even recently won an award for its innovative approach to Big Data.
MarkLogic has 275 unique customers with 500 live implementations, most of them in publishing and the federal government, particularly the intelligence community. LexisNexis, for one, runs all of its electronic business on MarkLogic. The company ended last year at $50 million and expects to wrap up 2010 with between $85 million and $90 million, according to Veiga.
And now along comes the open source elephant, Hadoop, with its buddy MapReduce. Together, they can store and process massive amounts of unstructured data. Hadoop is the "king" of the NoSQL movement, according to The Register, and a force to be reckoned with, even for mega-vendor Oracle. Information is now doubling every 18 months, and Gartner predicts unstructured data will account for 80 percent of the total available capacity by 2015. GigaOM reports that anywhere from half to all of the world's data is expected to be stored in Hadoop in the years to come. Hadoop may be at the height of its hype cycle, but all signs point to it being an enduring tech heavyweight.
You would think MarkLogic, along with other database/data warehouse vendors, would be sweating Hadoop.
If they are, they're not admitting it. Instead, they're rushing to integrate with Hadoop, embracing it as a companion technology. As Derrick Harris of GigaOM recently observed, "everyone with a data-driven business - Informatica, Microstrategy, HP, EMC, Oracle, ParAccel, IBM, Dell, Pentaho, Jaspersoft, you name it - has a Hadoop story to tell customers."
How and where does Hadoop fit in with these veteran solutions? Experts told ZDNet that Hadoop adds:
But there are areas where Hadoop doesn't compare favorably to traditional databases. It has high latency (a long delay before results come back), which makes it a poor fit for traditional business intelligence functions that require low latency. It also comes with a steep learning curve and a workload that can "vary a lot," according to Hari Vasudev, vice president of Yahoo's cloud platform group.
MarkLogic's recent release, MarkLogic 5.0, includes a Hadoop connector as well as enterprise hardening that should broaden the product's appeal. To MarkLogic, the combination makes perfect sense because each technology compensates for the other's weaknesses.
"MarkLogic spends its time trying to NOT touch every document so it can be fast at giving you answers, and then Hadoop spends its time touching every document and doing something to them, and so together, it's a good match," explained Deputy CTO Jason Hunter.
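Hunter's contrast between avoiding documents and touching all of them can be illustrated with a toy Python sketch. This is not MarkLogic's or Hadoop's code or API: the corpus and every function name below are hypothetical, an in-memory inverted index stands in for a database's indexes, and a plain map/reduce pair of functions stands in for a Hadoop job.

```python
from collections import defaultdict

# Hypothetical toy corpus standing in for a document store.
DOCS = {
    1: "hadoop batch processing",
    2: "marklogic xml database",
    3: "xquery over xml documents",
}

def build_index(docs):
    """Inverted index (word -> set of doc IDs), built once when data is loaded."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[word].add(doc_id)
    return index

def query(index, word):
    """Index-style lookup: returns matching IDs without reading any document."""
    return sorted(index.get(word, set()))

def map_phase(doc_id, text):
    """Map step of a toy MapReduce job: emit (word, 1) for every word."""
    for word in text.split():
        yield word, 1

def reduce_phase(pairs):
    """Reduce step: sum the emitted counts per word."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

def batch_word_count(docs):
    """Batch-style job: every document is visited, like a full scan."""
    pairs = (p for doc_id, text in docs.items() for p in map_phase(doc_id, text))
    return reduce_phase(pairs)

if __name__ == "__main__":
    idx = build_index(DOCS)
    print(query(idx, "xml"))               # [2, 3]
    print(batch_word_count(DOCS)["xml"])   # 2
```

The query answers from the index alone, while the batch job reads every document; that split is the division of labor Hunter describes, shown here only in miniature.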
While the integration could affect data that flows into or is stored in MarkLogic, Veiga is most excited about using Hadoop to run batch transformations on data moving between the systems of two companies or business partners. In effect, he sees Hadoop acting as a sort of middleware or ETL layer (my words, not his) between two MarkLogic servers.
"Remember, this (publishing) is a market where syndication is king, where moving data back and forth and adding your own personal twist to it is where you make all the money," Veiga said. "So you can imagine that a Hadoop system sitting on either the departure side of the data or on the arrival side for the data and being able to do batch jobs against that data very quickly and then being able to move it into production in MarkLogic just opens up some possibilities."