The Most Interesting Data-Integration Challenge Right Now? The Web

Loraine Lawson

What's the most interesting data-integration problem imaginable?


Alon Halevy, a former computer science professor at the University of Washington, believes it's integrating the Web. And by Web he doesn't just mean the crawl-able information we all know and love. He means the down-deep data hidden in databases connected to the Web, but not accessible by a search engine. This is the information used to create dynamic pages, schedule flights and so on.


It's called the Deep Web, or sometimes the Invisible Web. Of course, this isn't news to technologists, but this week, The New York Times gives us a peek at how some in the computer science field are approaching this mother-of-all integration challenges.


Halevy is now leading a team at -- where else? -- Google that's attempting to solve this. Google's tactic is to use a program to analyze the contents of every database it encounters on the Web. Every. Database. The thinking is that you need to know what's in a database before you can decide whether to search it for information. The program works by finding a form on a Web page and then guessing at likely query terms, based on the Web site's content. Once it gets a match, "the search engine then analyzes the results and develops a predictive model of what the database contains," explains the Times.
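
To make that form-probing idea a little more concrete, here is a minimal sketch in Python. It is not Google's code, and the Times article gives no implementation details. Everything in it is an assumption made up for illustration: the helper names `candidate_terms` and `probe`, the in-memory `RECORDS` list standing in for a hidden database, and the word-count tally standing in for a "predictive model."

```python
import re
from collections import Counter


def candidate_terms(page_text, limit=5):
    """Guess likely query terms: the most frequent words of 4+ letters on the page."""
    words = re.findall(r"[a-z]{4,}", page_text.lower())
    return [word for word, _ in Counter(words).most_common(limit)]


def probe(search_form, terms):
    """Submit each guessed term and tally the words found in the returned records."""
    model = Counter()
    for term in terms:
        for record in search_form(term):
            model.update(re.findall(r"[a-z]{4,}", record.lower()))
    return model


# Hypothetical stand-in for a Web form backed by a database a crawler cannot see.
RECORDS = [
    "Flight from Seattle to Boston",
    "Flight from Seattle to Denver",
    "Hotel in Boston near the harbor",
]


def fake_form(term):
    """Return the records that contain the submitted term (illustrative only)."""
    return [record for record in RECORDS if term in record.lower()]


# Visible page content surrounding the form, used to guess query terms.
page = "Search flights. Book a flight from Seattle today. Seattle flight deals."
terms = candidate_terms(page)
model = probe(fake_form, terms)
print(terms)
print(model.most_common())
```

In this toy run, the guessed terms surface the two flight records but never the hotel record, which hints at why the quality of the guesses matters: the model only reflects whatever the probes happen to pull back.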


Of course, Halevy and Google aren't the only ones trying to solve the Deep Web integration problem. But as the reigning champ of search engines, Google may have the most at stake, because whoever solves this problem would become the next King of Search Engine Hill.


But there's more at stake here. As the article points out, this technology could be a major boon for businesses and the ever-elusive Semantic Web:

This level of data integration could eventually point the way toward something like the Semantic Web, the much-promoted - but so far unrealized - vision of a Web of interconnected data. Deep Web technologies hold the promise of achieving similar benefits at a much lower cost, by automating the process of analyzing database structures and cross-referencing the results.

Feb 26, 2009 12:46 PM James says:

I agree, Loraine, but the Semantic Web's final vision still looks far away to me, as it's still in its infancy. Only a few big companies have started implementing it on a trial basis, and they are not relying on it yet.

Feb 27, 2009 7:06 AM Darcy says:

Great post!  I work for a company called "Deep Web Technologies" and we've been seeing a flurry of activity on the web because of this article.  We've been doing deep web searching for years!  But we don't index, which can lead to stale results.  We actually search in real-time, so while it slows the process down a bit, it ensures the newest information is returned.

We have a free business site we'd love your feedback on!

