It’s been two years since IBM’s supercomputer Watson beat Jeopardy’s two best champions. We’ve long known that IBM intended to repurpose Watson for the business world, but it turns out, IBM’s baby has done a lot of growing up in those two years, as writer Jo Best explains in a recent TechRepublic article.
You may not think of Watson as a Big Data solution, but it most certainly is. Actually, I suspect it’s one reason why Big Data has so captured the imaginations of business leaders and users. By design and from the start, Watson managed whopping amounts of unstructured data (except for images) better than structured data. And it managed it quickly, thanks to its in-memory data stores and processing. Watson could analyze Jeopardy’s clues in the required sub-three-second response times, according to eWeek.
That’s an important take-away for organizations that are embracing Big Data and advanced analytics. In a recent whitepaper, “Why In-Memory Technology Will Dominate Big Data,” veteran IT analyst Robin Bloor notes that the “rule of thumb” is that in-memory is more than 3,300 times faster than reading from a disk.
“A simple calculation would suggest that if it takes an hour to read a set of information from disk, it would take just over a second to read it from memory,” Bloor adds.
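Bloor’s “simple calculation” is easy to reproduce. The sketch below just applies his rule-of-thumb speedup figure to a one-hour disk read; the numbers come from the whitepaper, not from any benchmark of mine:

```python
# Bloor's rule of thumb: in-memory reads are roughly 3,300x faster than disk.
SPEEDUP = 3300

disk_seconds = 60 * 60  # one hour to read the data set from disk
memory_seconds = disk_seconds / SPEEDUP

print(f"{memory_seconds:.2f} s")  # ~1.09 s, i.e. "just over a second"
```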
When you think about Big Data, you’re often presented with two technology options: Hadoop clusters or in-memory. That’s an oversimplification, of course, but if I were a CEO, I’d be puzzled about the two.
Obviously, in-memory comes at a premium. Bloor writes that memory costs roughly 100 times as much as disk: a terabyte of disk runs about $50, versus about $4,500 for a terabyte of memory.
A Hadoop cluster—depending on the nodes—would be cheaper at the base price, but there’s more to it than cost and it’s not necessarily an either/or proposition.
“In simple architectural terms, you can either move the processing to where the data is or you can move the data to where the processing is,” Bloor writes. “When you have very large amounts of data, it will obviously be better to move the processing to the data. But Hadoop is not lightning fast, nor is it easy to program. This creates a definite challenge, especially if one needs the muscle that in-memory processing provides, because neither Hadoop nor any of its components have in-memory capabilities.”
He goes on to explain that one way to solve this problem is to simply use both Hadoop and in-memory for advanced analytics processing.
“A tight integration with Hadoop essentially creates a separation of concerns,” he writes. “In other words, Hadoop is left to store and filter data, and the analytical engine can focus on complex analytics.”
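To make the “separation of concerns” concrete, here is a purely illustrative sketch of the division of labor Bloor describes. The function names are hypothetical, not any real Hadoop or vendor API: the storage layer filters the raw data down to a working set, and the in-memory engine runs the actual analytics on it:

```python
# Illustrative only: hypothetical helpers, not a real Hadoop or vendor API.

def storage_layer_filter(raw_records):
    """The Hadoop side of the split: store raw data and filter it to a working set."""
    return [r for r in raw_records if r["relevant"]]

def in_memory_analytics(working_set):
    """The in-memory engine's side: complex analytics on the filtered subset."""
    return sum(r["value"] for r in working_set) / len(working_set)

raw = [
    {"relevant": True, "value": 10},
    {"relevant": False, "value": 99},  # filtered out before analytics
    {"relevant": True, "value": 20},
]
result = in_memory_analytics(storage_layer_filter(raw))
print(result)  # 15.0
```

The design point is simply that the expensive, fast tier only ever sees the filtered subset, which is what keeps the in-memory footprint (and its 100:1 cost premium) manageable.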
For its part, Watson still uses that in-memory data and processing power, but now, it’s capable of even more.
Under the tutelage of IBM’s Software group, the supercomputer’s systems had a major makeover, according to Best. What was once 41 separate subsystems filling a space the size of a master bedroom is now the slim equivalent of the vegetable drawer in a double-drawer refrigerator, running more streamlined software. It’s also now approximately 240 percent faster.
It’s even working on a medical degree, according to Best. It’s already acquired the medical knowledge of a first-year medical student, and IBM plans for Watson to eventually pass the general medical licensing board exams.
There’ve been other changes, too, which Best details. For instance, Jeopardy allowed Watson to use only stored data to find the answers. Watson has retained all of that data, as well as the in-memory capabilities, but it can now connect to the Internet, giving it access to even more unstructured data (IBM limits it to an approved site list, blocking sites like UrbanDictionary.com, for instance).
Watson is learning how to help oncologists with treatment plans at Memorial Sloan-Kettering Cancer Center. Citigroup is also exploring how to use Watson in financial services.
And, of course, IBM is exploring new uses, such as support for sales or a beefier alternative to Siri. The company recently began embedding Watson into its Smarter Planet product line.
Definitely check out the TechRepublic article—it’s an interesting, if long, read. If you’re curious about in-memory, Bloor’s whitepaper is available free through Kognitio. A warning, though: Several pages are devoted to Kognitio’s in-memory analytical platform. It’s interesting from an integration standpoint, since Kognitio runs on each node rather than connecting to Hadoop through an adapter. But just know that it focuses on only one solution by design.