There is a fundamental disconnect between the humans who use the Internet and the computers that enable it. The problem is that we don't speak the same language. While computers are very structured and specific, human communication is chaotic and nuanced. This goes beyond the limitations electronic communication imposes on us by denying us most of the non-informational components of communication, like tone of voice or facial expression. Even within these limits, humans have the ability to comprehend multiple levels of relevance beyond the 'raw data' (the specific words) in a statement.
This is not a huge problem when you are communicating directly with someone; conventions and substitutes allow us to overcome some of the electronic limits and communicate some of the subtlety. As the communication is targeted, it is also reasonable to assume that the person we are communicating with will receive and read the information (the key word being person). Our online culture has developed electronically specific conventions to show when we are being sarcastic, if we are SHOUTING, or if we are amused (LOL). We even add levels of nuance: ROFL does not mean I am literally rolling on the floor, but that I am slightly more amused than if I am only LOL. If I am amused enough to actually chuckle I might ROFLMAO.
The imprecise nature of language becomes more of a problem when we are searching for information from an electronic source. While it is possible for the information we store to include the level of specificity required for simple search, the extra effort this would require would be extreme. There is also the problem that the data we want may have been stored informally, the writer not expecting that it would have relevance. This creates a market for search engines and tools. These tools collate large amounts of information so that we can sort through it with a short search phrase and find only those bits that are relevant to us. Sometimes this works well, while at other times it is an exercise in frustration. The problem is that the search engines are looking for specifics while we are looking for a context.
As an example, below is an excerpt from the Wikipedia entry on Ray Charles:
"When he was six, Charles began to go blind, becoming totally blind by the age of seven. Charles never knew exactly why he lost his sight, though there are sources which suggest Ray's blindness was due to glaucoma. He attended school at the St. Augustine School for the Deaf and the Blind in St. Augustine, Florida. He also learned how to write music and play various musical instruments. While he was there, his mother died. His father died two years later."
While it includes the word "Ray" once and "Charles" twice, it never directly combines the two. Ray is referred to eight more times in the paragraph, but only in the form of a pronoun (he or his). A search ranking based on raw word count will rank this higher as an entry on St. Augustine or blind/blindness, due to the greater prevalence of these words. To a human this paragraph is obviously all about Ray Charles's youth, but a computer struggles to see this without extra information.
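The word-count ranking described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how any real search engine works; the function names `term_counts` and `naive_score` are made up for this example, and the scoring rule (add up how often each query word appears) is an assumption chosen for simplicity.

```python
import re
from collections import Counter

# The Wikipedia excerpt quoted above.
EXCERPT = (
    "When he was six, Charles began to go blind, becoming totally blind "
    "by the age of seven. Charles never knew exactly why he lost his sight, "
    "though there are sources which suggest Ray's blindness was due to "
    "glaucoma. He attended school at the St. Augustine School for the Deaf "
    "and the Blind in St. Augustine, Florida. He also learned how to write "
    "music and play various musical instruments. While he was there, his "
    "mother died. His father died two years later."
)


def tokenize(text):
    """Lower-case the text and split it into runs of letters."""
    return re.findall(r"[a-z]+", text.lower())


def term_counts(text):
    """Count how often each word appears in the text."""
    return Counter(tokenize(text))


def naive_score(query, text):
    """Hypothetical scoring rule: sum the counts of every query word."""
    counts = term_counts(text)
    return sum(counts[word] for word in tokenize(query))


if __name__ == "__main__":
    for query in ("Ray Charles", "St. Augustine", "blind"):
        print(query, naive_score(query, EXCERPT))
    # The pronouns a human reader resolves to Ray Charles without thinking:
    counts = term_counts(EXCERPT)
    print("he/his", counts["he"] + counts["his"])
```

Under this rule the query "St. Augustine" scores at least as well as "Ray Charles", even though the pronouns that actually carry the subject of the paragraph outnumber both. Nothing in a plain word count connects "he" to Ray.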
The goal of the semantic web is to have information posted to the web available in a form that is understandable by both humans and machines. For the machines to understand it, though, there needs to be a structure they can use for that understanding. Essentially this means embedding that structure into the document or posting itself. While languages like XML provide ways to do this, it still depends on the old GIGO axiom: garbage in, garbage out. If the information is not populated or is incorrect, then the context is not communicated or, worse, a wrong context is given. For this to work, adding this structure needs to be trivial or automated.
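To make the idea of embedded structure a little more concrete, here is a minimal sketch in Python. The field names (`about`, `mentions`, `type`, `name`) and the helper functions are invented for illustration and are not taken from any particular semantic web standard; the point is only that once the subject is stated explicitly, a machine no longer has to guess it from word counts, and that a missing or empty field is exactly the GIGO problem.

```python
# A post carrying its own context alongside the human-readable text.
# All field names here are hypothetical.
annotated_post = {
    "text": "When he was six, Charles began to go blind...",
    "about": {"type": "Person", "name": "Ray Charles"},
    "mentions": [
        {"type": "School",
         "name": "St. Augustine School for the Deaf and the Blind"},
        {"type": "Condition", "name": "glaucoma"},
    ],
}


def is_about(doc, name):
    """True if the document explicitly declares `name` as its subject."""
    subject = doc.get("about") or {}
    return (subject.get("name") or "").lower() == name.lower()


def validate(doc):
    """Return a list of problems; an empty list means the context is usable."""
    problems = []
    subject = doc.get("about")
    if not subject:
        problems.append("no subject declared")
    elif not subject.get("name"):
        problems.append("subject has no name")
    return problems


if __name__ == "__main__":
    print(is_about(annotated_post, "Ray Charles"))
    print(is_about(annotated_post, "St. Augustine"))
    print(validate({"text": "a post with no structure at all"}))
```

Note that `validate` can only catch structure that is missing. A post that confidently declares the wrong subject passes every check, which is the worse half of garbage in, garbage out.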
The automation of contextualisation is a huge focus for a number of companies. CRM (customer relationship management) and KM (knowledge management) software and service vendors are particularly interested in this area, and base an element of their value proposition on how well their products cope with context. Recently, though, Reuters announced the release of their semantic engine to the world. In providing a semantic engine that developers can include in their own products and services, they have achieved two noble things. It will increase the speed of development of new methods and algorithms. It will also increase the range of products and services that include semantics.
Tim Berners-Lee believes that the semantic web will be the next generation of the Internet. What I am waiting to see is some implementations. The proof will not be in whether it is possible. Instead, the success of this new information categorisation engine will be in how hard it is to fool. It would be a definite benefit to have information correctly categorised and searchable by its true context. If it is gameable, though, there are people who will set tags or labels that falsely direct you to their dodgy sites, just like they do with AdWords and search engine games today. Even in that situation, semantic engines will still be extremely useful for internal business tools.