In Some Search Software, It All Depends on the Meaning of 'It'


From Matthew Broderick's WarGames to the Tom Cruise movie Minority Report, films like to make it seem as if IT has human qualities. But you know different.


IT doesn't infer anything from the number, character and color coding it pushes around inside the box. Or relate one set of bits to the other. Or conceptualize. It just moves zeros and ones around the way IT guys tell it to. In other words, forget the movies: IT can't "think."


But I talked with a company CEO on Sept. 26 who is working on the character-code part of the getting-computers-to-think challenge. Or at least getting them to infer, relate and conceptualize.


Jeff Catlin is CEO of Massachusetts-based Lexalytics, which is about to become part of London-based Lexalytics Limited, a subsidiary of Infonic (LSE: IFNC). The two companies have been working separately to develop similar, and they hope symbiotic, technology for semantic search and text analytics.


In the near term, the applications will be working with character codes that are strung into words. And they have to be English words. And Catlin's software is still having a bit of a problem with the word "it." Lexalytics' ability to work with little words might sound trivial but it is the real challenge in text analytics. Take for example my use of the pronoun "it" halfway through the previous sentence. I used the word rather than repeating "ability." But some text analytics software might have inferred I meant Lexalytics, the company, or its software instead.
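To see why "it" is slippery, here is a minimal Python sketch of a naive rule that picks the nearest preceding noun as the antecedent. This is not Lexalytics' method. The noun list is a hand-built assumption standing in for what a part-of-speech tagger would supply, and the function names are hypothetical.

```python
import re

# Assumption: a hand-picked noun list stands in for a real part-of-speech tagger.
CANDIDATE_NOUNS = {"lexalytics", "ability", "words", "software"}

SENTENCE = ("Lexalytics' ability to work with little words might sound "
            "trivial but it is the real challenge in text analytics.")


def tokens_before_pronoun(sentence, pronoun="it"):
    """Lowercase the sentence and return the words before the first pronoun."""
    tokens = re.findall(r"[A-Za-z]+", sentence.lower())
    if pronoun not in tokens:
        return []
    return tokens[:tokens.index(pronoun)]


def candidate_antecedents(sentence, pronoun="it"):
    """Every known noun that appears before the pronoun, in reading order."""
    return [t for t in tokens_before_pronoun(sentence, pronoun)
            if t in CANDIDATE_NOUNS]


def nearest_antecedent(sentence, pronoun="it"):
    """Naive heuristic: choose the noun closest to the pronoun."""
    candidates = candidate_antecedents(sentence, pronoun)
    return candidates[-1] if candidates else None


print(candidate_antecedents(SENTENCE))  # three plausible antecedents
print(nearest_antecedent(SENTENCE))     # the heuristic picks "words"
```

The rule finds three candidates and settles on "words," when the writer meant "ability." Closing that gap takes an understanding of sentence structure, not just word distance.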


Despite the "it" issue, Lexalytics' latest software, due out in October, has a real good handle on "he" and "she." Which means semantic search and text analytics software is coming to an IT application near you sooner rather than later. Both IT managers and marketers want to integrate it into enterprise applications for the same reasons they have been integrating numbers-oriented analytical applications for the last decade: to make data cleaner, extract metadata and spot trends and patterns in unstructured data.


Text analytics goes further than traditional analytics because it can enhance the power of now-ubiquitous search engines, whereas traditional analytics works primarily with ERP and similar applications that depend on structured data. Lexalytics' technology does not stand alone but works with partners such as Microsoft FAST and Endeca. The Lexalytics engine works fundamentally like a human; that is, inferentially. It is not scoring-based like, for example, a Fair Isaac Corp. (FICO)-based credit application. Instead, Lexalytics' software is based on sentence structure and has a part-of-speech tagger, draws pertinent metadata from content (people, dates, etc.) and mines relationships (e.g., relating a person's job title to the company he or she works for).
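For a rough idea of what mining a relationship looks like, here is a small Python sketch that pulls a person, job title and company out of one sentence with a regular expression. It is a pattern-matching stand-in, not the sentence-structure approach described above. The pattern and the function name extract_role are assumptions made for illustration, and the pattern only handles sentences shaped like "First Last is TITLE of Company."

```python
import re

# Assumption: a single hand-written pattern for "<Person> is <TITLE> of <Company>".
# A real engine would rely on part-of-speech tags rather than a fixed template.
ROLE_PATTERN = re.compile(
    r"(?P<person>[A-Z][a-z]+ [A-Z][a-z]+) is "      # two capitalized words
    r"(?P<title>[A-Z]{2,}) of "                      # an all-caps title such as CEO
    r"(?:[A-Z][a-z]+-based )?"                       # optional "Massachusetts-based "
    r"(?P<company>[A-Z][A-Za-z]+)"                   # capitalized company name
)


def extract_role(text):
    """Return a dict with person, title and company, or None if nothing matches."""
    match = ROLE_PATTERN.search(text)
    return match.groupdict() if match else None


sample = "Jeff Catlin is CEO of Massachusetts-based Lexalytics."
print(extract_role(sample))
```

A template like this breaks as soon as the wording changes, which is one reason inference from sentence structure matters.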


One final factoid to consider: In handling the he-she-it thing mentioned above, so far the Lexalytics software does best understanding "she" in context. That's because "she" is mentioned less often in unstructured data. Another glass ceiling to crack, ladies.