You can’t force people to provide metadata

I ran across a long rant from Steve Yegge this evening about junior programmers. In a nutshell, Yegge says they like to play around with metadata rather than getting real work done.

Here’s an insightful observation Yegge makes along the way.

And Haskell, OCaml and their ilk … try to force people to model everything. Programmers hate that. These languages will never, ever enjoy any substantial commercial success, for the exact same reason the Semantic Web is a failure. You can’t force people to provide metadata for everything they do. They’ll hate you.

Related post: Probability of semantic markup being correct

9 thoughts on “You can’t force people to provide metadata”

  1. Good point… some programmers tend to get greedy regarding metadata. Forcing users to consciously annotate and reveal isn’t realistic. There are some privacy concerns here, of course.

  2. The funny thing is, this is exactly what OO languages like C++, C# and Java force the programmer to do. Type definitions everywhere.

  3. Rick, Yegge mentioned C++ and Java as languages that invite the programmer to get lost in the type system. His complaint is not the familiar one that it takes more keystrokes to declare variable types. His complaint is that programmers are tempted to spend a tremendous amount of time creating custom types, hoping to get the type system to catch logic errors. But I think he likes Python’s light-weight OO.

  4. Ahh I see. I haven’t spent much time with Python. Although, MIT adopting it as their main language must mean good things :).

    In the past I’ve used Ruby, Tcl, PHP and JavaScript and found repeatedly that the lack of type safety let otherwise obvious bugs go undiscovered. Some of these statically typed FP languages are somewhere in the middle, I think. The type inference lets the code feel more dynamic.

    Perhaps the issue was I had never even heard of TDD at that point. However, don’t tests cause the same kinds of structural rigidity that you are trying to avoid by using a dynamic language?

  5. Some people argue that static type checking is just a weak form of unit testing, and that dynamically typed code with unit tests is safer than statically typed code without. That has a ring of truth to it, but some people will write tests no matter what language they’re using, and some will not.

    That’s an interesting point about tests forcing you into structural rigidity. It would be hard to know, since it’s a behavioral issue. In theory, you should continually refactor your tests as you refactor your code. But I imagine you’re right: the more tests you have, the less willing you’re going to be to make a change that makes you revise your tests.

    The strongest criticism I’ve seen of unit tests is that they cannot catch three out of four bugs. They’re valuable for the one in four bugs they can catch, but QA has got to be a lot more than unit testing.

  6. When you say “today,” do you mean sometime in February of 2008? Or am I reading the wrong blog entry?

  7. James, you’re absolutely right. I discovered the link today and thought it was a new post. I saw “Sunday, February …” at the top of the page and didn’t look closely enough to see that it said “Sunday, February 10, 2008.” I’ll edit the post to correct my mistake. Thanks for pointing out the error.

  8. It is too early to write off the Semantic Web as a failure; it is just passing the bottom of the Gartner hype curve. Also, the reason why we’re not seeing too much of it isn’t what you think. It isn’t that there isn’t enough useful data, and it isn’t that there aren’t enough incentives to provide useful data. There is more than enough data to do truly useful things. There are many reasons why we’re not seeing more of it. One important reason is that customers do not demand solutions to the problems they will have tomorrow. But the big reason, I would say, is that there is a huge gap between what businesses label research and what scientists label research. In this gap, we find dynamic OWL-class to OO-class mapping tools, nice templating languages, software stacks for many environments that programmers actually like, and so on. These are not really research problems, but they are also problems that commercial projects tend to tackle only with ad hoc solutions. Thus there has been very little progress.

  9. The point is that most meta-data is in fact useless, because even if it is there, there are no tools to make use of it, such as automatic verification or code generation.

    Here’s an example. Assume that you have to write a simple text calculator that allows definitions of functions, stores variables, has a reasonable number of built-ins and provides meaningful error messages. Furthermore, you have to implement it in C.

    So you are probably thinking flex & bison (or lex & yacc if you prefer). Of course this is the best option given current tools. The only excuse to do it by hand is to learn how the algorithms in those tools actually work.

    But the flex and bison sources are meta-data for the resulting C program, and this meta-data is very useful because it lets you get the work done. So the issue is not meta-data itself. The question that should be answered is ‘will providing meta-data give me any benefit?’ If the answer is ‘yes’ then provide meta-data. If the answer is ‘no’ then why bother with doing something that won’t give any benefit?
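
To make the last comment's point concrete, here is a minimal sketch of meta-data that pays for itself because a tool consumes it. It is written in Python rather than the C the commenter specifies, and it is not a flex or bison grammar. The table `BUILTINS` and the functions `call_builtin` and `describe_builtins` are hypothetical names invented for this illustration. The idea is only that a single declarative table of calculator built-ins can drive evaluation, argument checking and help text at once.

```python
import operator

# Hypothetical declarative table: the "meta-data" for each calculator built-in.
# Each entry maps a name to (number of arguments, implementation).
BUILTINS = {
    "add": (2, operator.add),
    "sub": (2, operator.sub),
    "mul": (2, operator.mul),
    "neg": (1, operator.neg),
}


def call_builtin(name, *args):
    """Look up a built-in, check its arity against the table, then call it."""
    if name not in BUILTINS:
        known = ", ".join(sorted(BUILTINS))
        raise ValueError(f"unknown function '{name}'; known functions: {known}")
    arity, func = BUILTINS[name]
    if len(args) != arity:
        raise ValueError(f"{name} expects {arity} argument(s), got {len(args)}")
    return func(*args)


def describe_builtins():
    """Generate help text from the same table that drives evaluation."""
    return [f"{name}/{arity}" for name, (arity, _) in sorted(BUILTINS.items())]
```

Adding a built-in means adding one row to the table; the error messages and the help listing pick it up without further code. That is the kind of benefit the comment asks about. A table that nothing reads would be the useless kind of meta-data.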
