Probability of semantic markup being correct

In a comment on my post on RDFa, Daniel Lemire says:

The basic problem is that RDF is metadata. … The problem with metadata is that it is wrong, misleading, too general, too specific… you name it… there is never a good match between the metadata and the user query.

I want to focus on the probability of metadata being correct, whether or not it’s useful. I think that’s an interesting angle, one I’ve not heard much about.

Think about the analogy to comments in source code. Well-written comments can be a blessing to the person who inherits the code. But comments are so often wrong that it’s best to read them skeptically. They may start out as true and helpful statements but they often get out of sync with the code they document. If you want to be sure what the code is doing, you’ve got to read the code.
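Here is a small made-up Python sketch of that kind of drift. The function name and the rates are invented for illustration and are not taken from any real codebase. The comment was presumably true when it was written; the code was edited later and the comment was not.

```python
def shipping_cost(weight_kg: float) -> float:
    """Return the shipping cost for a package (hypothetical example)."""
    # Flat rate of 5.00 for any package under 10 kg.
    # ^ Stale: the threshold and the rate below were changed later,
    #   but nobody updated the comment above.
    if weight_kg < 20:
        return 7.50
    return 7.50 + 0.40 * (weight_kg - 20)


# The comment suggests 5.00 for an 8 kg package; the code says otherwise.
print(shipping_cost(8))
```

A reader who trusts the comment expects 5.00 for an 8 kg package. Only reading or running the code shows what it actually charges.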

I could imagine a similar state of affairs for HTML pages marked up with RDF(a). Particularly if the metadata is manually added, it could easily get out of sync with the content. Metadata generated by web authoring tools would be more trustworthy, or at least would start out more trustworthy. It’s easy to imagine a page initially created by a tool then subsequently edited by hand.

Returning to software comments, the probability of error increases with the distance between the comment and the code it documents. Comments in external documents are almost certainly wrong. Comments in headers are likely wrong. But inline comments are more likely to start out correct and remain correct over time. I imagine RDFa might be more reliable than other ways of adding metadata just because it gives you a way to embed the metadata closer to the content it describes.

One thought on “Probability of semantic markup being correct”

  1. In the early 80s I was a COBOL programmer, one of 72 programmers and analysts supporting a large, bespoke company payroll program that paid 500,000 people. One of our senior programmers once gave a talk on the subject of ‘structured programming’. In it, he mentioned that in one of our programs he had seen a variable called THREE that was set to equal 2. (Presumably, it had been set to 3 sometime in the past.)
