Want More Openness in Enterprise Search? Open Source May Fill Bill


One of the fastest-growing areas in enterprise software is search. It's no surprise really, given the huge piles of data being accumulated by companies of all sizes and their sense that somewhere in all of that information are the kinds of insights that will lead to competitive advantage.


When I wrote about the market last January, Microsoft had just purchased search specialist Fast Search and Transfer, which it hoped would give it an advantage in the upper end of the search market. According to Lucid Imagination CEO Eric Gries, who spoke with IT Business Edge's Lora Bentley last week, the enterprise search category grew about 28 percent in 2008. (Gries cited figures from IDC.) Lucid Imagination is the first commercial company offering support and services to the growing numbers of users of the open source search projects Apache Lucene and Solr.


As Gries told Lora, corporate users of Lucene include Ticketmaster, FedEx.com, Netflix and MySpace. The free software appears to be something of a phenomenon, now being downloaded nearly 7,000 times a day. In essence, Lucid Imagination sounds like it's trying to become the Red Hat of search. Its value proposition is helping users optimize the free software. As Gries said:

The nature of enterprise search is that performance goes down and relevancy goes down if you don't take care of it. You have to tune it up. It's like a car. You have to take it into the garage every so often. If you don't, the results are bad. And one of the things we are offering is what we call health checks. Once a year, or twice a year, or on a quarterly basis, we will do a complete health check on a customer's system. We'll provide them with a detailed report of the 60 points we're checking -- index, relevancy, where performance is lagging, and so on -- and give them suggestions on what to do about it.

The idea of open source search may strike a chord with enterprise users. As CMS Watch principal Theresa Regli told me in an interview earlier this month, companies are insisting upon more openness from traditional search vendors so they can ensure they are buying solutions that can effectively search multiple types of content. Noting that search vendors once were "notoriously secretive," Regli said:

They felt like they could say, "We're going to be the best search engine for Microsoft-based environments." They didn't have to reveal the secret of how it worked. But now it's not good enough to be a specialized search vendor. You need to be able to adapt to different situations, and let your customers adapt to them.

In addition, said Regli, the success of Google in consumer search inspired companies to ask more questions about enterprise products. And as more vendors crowd into the space, users are more curious about what differentiates one solution from another.


So now many vendors are not only opening their application programming interfaces (APIs) but also seeking more direct interaction with their clients to help them tweak the software to suit their needs. Knowing how a search algorithm works is "a pretty big deal," said Regli, because clients can ask vendors specific questions about results and then modify a tool if necessary. The example she offered: Certain documents may appear near the top of search results because the algorithm looks for words in document titles. However, a company that uses dates or other numbers in its document titles may want to tweak the tool so it focuses instead on words found in abstracts.
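To make Regli's example concrete, here is a minimal sketch of field-weighted ranking in Python. It is not how Lucene or any vendor's product actually scores documents; the function names, the weights and the two sample documents are all hypothetical, chosen only to show how shifting weight from titles to abstracts can reorder results.

```python
from collections import Counter

# Hypothetical field weights: one set favors titles, the other abstracts.
TITLE_HEAVY = {"title": 3.0, "abstract": 1.0}
ABSTRACT_HEAVY = {"title": 1.0, "abstract": 3.0}


def score(doc, query, weights):
    """Sum query-term counts in each field, scaled by that field's weight."""
    terms = query.lower().split()
    total = 0.0
    for field, weight in weights.items():
        counts = Counter(doc.get(field, "").lower().split())
        total += weight * sum(counts[term] for term in terms)
    return total


def rank(docs, query, weights):
    """Return docs ordered from highest to lowest score."""
    return sorted(docs, key=lambda d: score(d, query, weights), reverse=True)


# Illustrative documents: "b" has a date-and-number title, as in Regli's example.
docs = [
    {"id": "a", "title": "Budget 2008", "abstract": "Quarterly travel expenses"},
    {"id": "b", "title": "2008-11-03 0042", "abstract": "Budget forecast and budget risks"},
]

title_first = [d["id"] for d in rank(docs, "budget", TITLE_HEAVY)]
abstract_first = [d["id"] for d in rank(docs, "budget", ABSTRACT_HEAVY)]
```

With title-heavy weights, document "a" ranks first because its title contains the query word; with abstract-heavy weights, document "b" moves to the top even though its title is only a date and a number. Real engines use more elaborate relevance formulas, but the tuning knob Regli describes works on the same principle.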


Though Regli only briefly mentioned open source search in our interview, I discovered an Intelligent Enterprise article written by CMS Watch's Kas Thomas, in which he discusses Apache Lucene. Confirming what Gries told Lora, Thomas writes that its "traction in the enterprise world is accelerating by the day." In addition to improved stability and new features, Thomas notes that budget constraints are prompting more companies to consider an open source search tool.


Among Lucene's strengths: strong oversight provided by the Apache Foundation, mature code, an active development community and (again echoing Lora's interview with Gries) a number of high-profile users with industrial-strength search needs, such as Wikipedia. While a lack of commercially available support and training "have been a bit of a hindrance," writes Thomas, the entry of Lucid Imagination could help change that.


Thomas notes that Lucid Imagination intends to provide certified versions of Lucene, which it has tested and debugged. Eventually, it also plans to offer a developer certification program. It will sell three different levels of support subscriptions, the most expensive of which will cost a reasonable $18,000 a year. (Consider that the Google Search Appliance costs $30,000.)