Can Semantics Tech Eliminate the Need for Data Integration?


For much of my adult life, I've walked around in denial about what a complete nerd I am. I make excuses: I write about technology, I don't technically work in it. Yeah, I'm on Twitter, Facebook, LinkedIn, and LaLa, but I don't have a Kindle, Treo or Blackberry. And sure, the Origins gaming con was the best time I've had in a long time, and I kicked it old school with D&D -- but I did not dress up for live action role playing (LARP).


Ah, the splitting of semantic hairs. The truth is, I'm four levels in on the Geek Hierarchy, just above Trekkies who speak Klingon and LARPers.


I'm a lost cause, it seems, so I might as well fess up to the fact that semantic technology fascinates me, particularly when you move beyond the over-hyped "Web semantics" stuff and look at real applications -- like, for instance, this recent article on Bio-IT World.


It's an interview with Ted Slater, who heads a small group of informatics scientists at Pfizer's Indications and Pathways Center of Emphasis (IPCoE). Basically, this group supports Pfizer's pharmaceutical research. Using semantic technologies, the team developed a new system called "Pfizer Environment for Knowledge Engineering," or PEKE.


While we may be a decade or more out from a semantic Web, Gartner believes semantic technology will emerge as one of the 10 most disruptive technologies in the next four years. PEKE makes me think they could be right.


Interestingly, what drove the team to create the PEKE architecture was the simple realization that data integration is like trying to drink from a fire hose -- there's too much data, silos are created too quickly, and integration work can't keep up. Or, as Slater explained it:

"We constantly hear that the Holy Grail is complete data integration ... I have bad news - it will never happen! Users are able to set up and start building new, independent repositories of data faster than we can integrate existing data. You will never be able to get it all in one place where it is integrated and usable. The goal instead should be data that are interoperable, even if they are not integrated."

PEKE addresses this by letting Pfizer create new knowledge bases simply and quickly, according to Slater.


The article offers a bird's-eye view of how PEKE works. Slater adapted the semantic RDF format to represent the data as a mathematical graph. The team used open source ontology development tools, along with Cytoscape, which is also open source, to view the data graphically. They also employed Oracle's RDF data model.
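
For fellow nerds, here is a minimal Python sketch of the general idea as I read it: RDF stores facts as subject-predicate-object triples, and a pile of triples is really just a graph. To be clear, this is not PEKE's code. The example.org identifiers, the two tiny "knowledge bases," and the helper functions are all made up for illustration. It only shows how two separately built data sets that happen to share identifiers can be queried together without anyone integrating them first.

```python
from collections import defaultdict

# Hypothetical namespace and identifiers, invented for this sketch.
EX = "http://example.org/"

# Two independently built "knowledge bases," each a set of
# (subject, predicate, object) triples in the spirit of RDF.
gene_kb = {
    (EX + "geneA", EX + "encodes", EX + "proteinA"),
}
pathway_kb = {
    (EX + "proteinA", EX + "participatesIn", EX + "pathwayX"),
}


def to_graph(triples):
    """Index triples as a directed graph: subject -> [(predicate, object)]."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return graph


def reachable(graph, start):
    """Return every node that can be reached from start by following edges."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for _predicate, nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


# Because both sets use the same identifier for proteinA, a plain set
# union is enough to connect them. No schema mapping is needed.
combined = to_graph(gene_kb | pathway_kb)
print(sorted(reachable(combined, EX + "geneA")))
```

On its own, the gene data only gets you from the gene to the protein. Once the two sets are unioned, the same query also reaches the pathway, which is roughly what I take "interoperable, even if they are not integrated" to mean.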


I admit I don't really understand it. I also still don't completely understand how a computer translates 0s and 1s to generate, say, this blog post or an Excel spreadsheet. But I can still appreciate how cool both are, in a completely geeky way.