Could HTTP Be the Key to Deep Web Data Integration?


Could HTTP and URIs be the key to integrating that last, hidden piece of the Web: all the data stored in databases, connected but currently inaccessible?


Yes, if we all do our part to make it happen, according to Sir Tim Berners-Lee, director of the World Wide Web Consortium and the man widely recognized as the inventor of the World Wide Web.


ReadWriteWeb recently took a look at how Linked Data, a W3C project for adding all this information to the Web, would work and why you should care. The post includes an embedded, 15-minute presentation on the topic, which was given by Berners-Lee at the 2009 TED Conference.


It's a concept that would go a long way toward solving the Deep Web, the so-called most challenging integration problem. It's also an important step toward the Semantic Web, according to Greg Boutin, founder of Growthroute Ventures and a regular blogger at Semantics Incorporated:


"The key idea of this post is that Linked Data offers a new medium to link structured data that is then more machine-readable. It does not by itself add any semantic meaning to the information, but it better carries that semantic information once you have it."


The idea is pretty simple, really, and that's part of why it's so fascinating. Basically, there are four rules for making Linked Data work:


  1. Use URIs as names for things.
  2. Use HTTP URIs so that people can look up those names. As Berners-Lee explains in the presentation, HTTP names are no longer just for documents; they are also given to the people, places and things in those documents.
  3. When someone looks up a URI, provide useful information.
  4. Include links to other URIs, so that they can discover more things.
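
For readers who think in code, here is a minimal sketch of how the four rules fit together. It is only an illustration, not how the W3C or any real Linked Data service implements them: the example.org URIs, the property names (livesIn, locatedIn) and the helper functions are all made up for this example, and a plain Python dictionary stands in for real HTTP lookups.

```python
# Hypothetical in-memory "web" of data. Every thing is named by an HTTP URI
# (rules 1 and 2). The example.org URIs and property names are illustrative
# assumptions, not real datasets or a real vocabulary.
DATA = {
    "http://example.org/people/alice": {
        "name": "Alice",
        "livesIn": "http://example.org/places/berlin",
    },
    "http://example.org/places/berlin": {
        "name": "Berlin",
        "locatedIn": "http://example.org/places/germany",
    },
    "http://example.org/places/germany": {
        "name": "Germany",
    },
}


def look_up(uri):
    """Rule 3: looking up a URI returns useful information about the thing.

    A real client would issue an HTTP GET; here a dictionary lookup stands in.
    """
    return DATA.get(uri, {})


def is_http_uri(value):
    """Return True if a property value looks like an HTTP(S) URI."""
    return isinstance(value, str) and value.startswith(("http://", "https://"))


def links_from(uri):
    """Rule 4: a description includes links to other URIs."""
    return [value for value in look_up(uri).values() if is_http_uri(value)]


def discover(start_uri):
    """Follow links breadth-first to find every thing reachable from start_uri."""
    seen, queue = [], [start_uri]
    while queue:
        uri = queue.pop(0)
        if uri in seen:
            continue
        seen.append(uri)
        queue.extend(links_from(uri))
    return seen


if __name__ == "__main__":
    for uri in discover("http://example.org/people/alice"):
        print(uri, "->", look_up(uri).get("name"))
```

Starting from the single URI for "Alice", the sketch follows links to the city and then the country without knowing about either in advance, which is the "discover more things" part of rule 4.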


The W3C has more details on its Linked Data page, which was also written by Berners-Lee. But for those of you who simply want an overview, the ReadWriteWeb article does a great job of explaining Linked Data's goals and how it would work. There's also a nice graphic showing how quickly the concept has attracted new participants.


It's pretty easy to see why making this information accessible would benefit lots of people once it's online. The big question, of course, is why anyone would go to all the trouble to make it available, and that's the very question Berners-Lee tackles in his presentation. I thought he made two really good points here:


  1. Governments should do it, because the public paid for that data and deserves to have access to it. To that end, Berners-Lee urges the audience at TED to embrace the slogan, "Raw data now!"
  2. People will do it because there are enough people who like to participate in this sort of project.


Berners-Lee believes it can be built bit by bit, but at some point, the effort will reach a critical mass that will draw the rest of us in:


"That is what Linked Data is all about: it's about people doing their bit to produce a little bit and it all connecting. That's how Linked Data works. You do your bit and everybody else does theirs."