The idea of tagging web information has been around for a while. Given that many popular web applications (e.g., Gmail, Yahoo! My Web) now support the use of tags, I believe that in the future more information on the Web will be tagged with some kind of text label, one way or another.
So what does that mean in the context of the Semantic Web? Will the emergence of information tagging help to speed up the Semantic Web development?
I believe information tagging will bring us one step closer to the Semantic Web. For a long time, Semantic Web researchers have been asking: given the vast amount of web information that already exists, how are we supposed to annotate each and every piece of it?
I think tags can help to solve this problem.
Let’s assume that we can build automatic or semi-automatic computer programs, called pre-processors, to tag non-semantic web information. Once we have created such programs and run them against different non-semantic web information, we will be able to collect summaries of this information in the form of tags.
Now, let’s assume that we can control the pool of possible tags produced by our pre-processors, and that we can define ontologies (using RDF and OWL) to express the explicit semantics among these tags. For example, an ontology may define that “blogging” is a subclass of “personal journal”, and that “personal journal” is a subclass of “historic document”. Every “personal journal” instance must have one and only one property called “author”, whose value must be of type “person”.
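To make the example ontology a bit more concrete, here is a minimal sketch in plain Python rather than real RDF/OWL. The names `SUBCLASS_OF`, `superclasses`, and `valid_personal_journal` are my own illustrative assumptions, not part of any Semantic Web toolkit.

```python
# Hypothetical tag ontology: each class maps to its direct superclass.
SUBCLASS_OF = {
    "blogging": "personal journal",
    "personal journal": "historic document",
}


def superclasses(tag):
    """Return every class that `tag` is transitively a subclass of."""
    found = []
    while tag in SUBCLASS_OF:
        tag = SUBCLASS_OF[tag]
        found.append(tag)
    return found


def valid_personal_journal(authors, type_of):
    """Check the constraint: exactly one author, and that author is a person.

    `authors` is a list of author tags; `type_of` maps a tag to its type.
    """
    return len(authors) == 1 and type_of.get(authors[0]) == "person"
```

In a real system the same facts would presumably be written as subclass and cardinality axioms in RDF Schema and OWL; the dictionary above only mimics the subclass chain.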
Let’s say we run our pre-processor programs against my personal blog posts, and the set of tags they produce includes “blogging”, “harry chen”, “semantic web”, etc.
Let’s also assume that people on the Web can make explicit statements about various tags. For example, as a part of my personal profile on blogger.com, there is an RDF statement saying that the tag “harry chen” is of type “person”.
Say some person on the Web, Bob, is interested in finding all historic documents that are written by some person and that are about the subject “the Semantic Web”.
To answer his question, Bob would define a query similar to this: “find all URIs X that are instances of the historic document class, where X has one or more authors and the document that X references carries the ‘semantic web’ tag.”
Because my blog posts have been tagged with “blogging”, they are instances of the “blogging” class. Since “blogging” is a subclass of “personal journal”, which in turn is a subclass of “historic document”, all my blog posts are also instances of these two classes. Because I’m the author of these posts, each of them has at least one author. And since the posts carry the “semantic web” tag, they satisfy the last condition of the query as well.
Given the above reasoning, my personal blog posts will be one of the many answers that match Bob’s query.
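The chain of reasoning above can be sketched as a small Python program. This is only an illustration under stated assumptions: the URIs, the `DOCUMENTS` and `TYPE_OF` tables, and the function names are hypothetical, and a real implementation would more likely use an RDF store with a reasoner and a query language such as SPARQL.

```python
# Hypothetical ontology: each class maps to its direct superclass.
SUBCLASS_OF = {
    "blogging": "personal journal",
    "personal journal": "historic document",
}

# Hypothetical statements people have made about tags.
TYPE_OF = {"harry chen": "person"}

# Hypothetical output of the pre-processors: URI -> tags and authors.
DOCUMENTS = {
    "http://example.org/blog/post-1": {
        "tags": {"blogging", "harry chen", "semantic web"},
        "authors": {"harry chen"},
    },
    "http://example.org/news/item-7": {
        "tags": {"news", "semantic web"},
        "authors": set(),
    },
}


def classes_of(tags):
    """Infer all classes a document belongs to from its tags."""
    known = set(SUBCLASS_OF) | set(SUBCLASS_OF.values())
    classes = set()
    for tag in tags & known:
        # Walk up the subclass chain, collecting every ancestor.
        while tag is not None:
            classes.add(tag)
            tag = SUBCLASS_OF.get(tag)
    return classes


def bobs_query(documents):
    """Historic documents with a person as author, tagged 'semantic web'."""
    matches = []
    for uri, doc in documents.items():
        has_person_author = any(
            TYPE_OF.get(author) == "person" for author in doc["authors"]
        )
        if (
            "historic document" in classes_of(doc["tags"])
            and has_person_author
            and "semantic web" in doc["tags"]
        ):
            matches.append(uri)
    return matches
```

Calling `bobs_query(DOCUMENTS)` returns only the blog post URI: the news item is tagged “semantic web” but is not inferred to be a historic document and has no author.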
What do you think?
–
Posted by Harry Chen to semweb at 10/27/2005 12:01:09 AM