Eight fallacies of declarative computing

Erik Meijer listed eight fallacies of declarative programming in his keynote address at YOW in Melbourne this morning:

  1. Exceptions do not exist.
  2. Statistics are precise.
  3. Memory is infinite.
  4. There are no side-effects.
  5. Schema don’t change.
  6. There is one developer.
  7. Compilation time is free.
  8. The language is homogeneous.

To put these in some context, Erik made several points about declarative programming in his talk. First, “declarative” is relative. For example, if you’re an assembly programmer, C looks declarative, but if you program in some higher level language, C looks procedural. Then he argued that SQL is not as declarative as people say and that in some ways SQL is quite procedural. Finally, the fallacies listed above correspond to things that can cause a declarative abstraction to leak.

(The videos of the YOW presentations should be available in January. I haven’t heard anyone say, but I imagine the slides from the presentations will be available sooner, maybe in a few days.)

Winston Churchill, Bessie Braddock, and Python

Last night I was talking with someone about the pros and cons of various programming languages and frameworks for data analysis. One of the pros of Python is its elegance. The primary con is that it can be slow.

The conversation reminded me of an apocryphal exchange between Winston Churchill and Bessie Braddock.

Braddock: Winston, you are drunk.

Churchill: Yes I am. And you, Bessie, are ugly. But I shall be sober in the morning, and you will still be ugly.

Python can be slow, though there are ways to improve its performance. But ugly code is just ugly, and there’s nothing you can do about it.

Quantum superposition of malice and stupidity

Last night, several of us at YOW were discussing professional secrets, inaccuracies and omissions that are corrected via apprenticeship but rarely in writing. We were arguing over whether these secrets were the result of conspiracy or laziness. Do people deliberately conceal information to keep the uninitiated from really knowing what’s going on, or do they wave their hands because being precise takes too much energy?

I argued for the latter, a sort of variation on Hanlon’s razor: Never attribute to malice that which is adequately explained by stupidity. In this case, I didn’t want to attribute to conspiracy what could adequately be explained by laziness. Sins of omission are more common than sins of commission.

Brian Beckman’s comment on Hanlon’s razor was that there is a sort of quantum superposition of malice and stupidity. That is, you have some indeterminate mixture of malice and stupidity (or in the context of our conversation, conspiracy and laziness) that leads to the same results. This closely resembles Grey’s law that any sufficiently advanced incompetence is indistinguishable from malice. Being a physicist, Brian used a physical metaphor. He commented later that it may be possible in retrospect to determine whether some action was malicious or stupid, collapsing a sort of wave function.

Related post: Hanlon’s razor and corporations

Water signs

There are strange signs about water usage all over Melbourne. For example:

Should I be worried? The typography implies I should. But unless you’re combining your own hydrogen and oxygen atoms, isn’t all water recycled?

Here’s another one.

Again, the typography implies this is a dire warning. Rainwater in use! Beware! But rainwater is usually in use. It waters plants, cleans streets, etc. It’s very useful.

From what I gather, the intention of the signs is to convey something like this:

Don’t be upset with us during a drought because you see we have thriving plants or a beautiful lawn. We’re not using municipally treated water. We’re using rainwater we’ve captured, or gray water, etc.

Approximation relating lg, ln, and log10

My previous post about logarithms has generated far more discussion than I expected. One valuable comment cites Donald Knuth’s TAOCP. While looking up the reference, I stumbled on this curiosity:

lg x ≈ ln x + log10 x.

In words, log base 2 is approximately natural log plus log base 10. It’s a pleasant coincidence that there’s a simple relationship between the three most commonly used logarithms.

Knuth credits the approximation to R. W. Hamming and notes that the relative error is less than 1%. In fact, it’s easy to show that the relative error is exactly equal to

1 – (1 + 1/ln 10) ln 2 ≈ 0.0058

for all x.
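
The constant falls out of the change-of-base rule: lg x = ln x / ln 2 and log10 x = ln x / ln 10, so the ratio of the approximation to the exact value does not depend on x. Here is a small numerical check, offered as a sketch. The function name relative_error and the sample values of x are my own choices for illustration, not anything from Knuth.

```python
import math

def relative_error(x):
    """Relative error of approximating lg x by ln x + log10 x (x > 0, x != 1)."""
    exact = math.log2(x)
    approx = math.log(x) + math.log10(x)
    return (exact - approx) / exact

# Closed form from the text: 1 - (1 + 1/ln 10) ln 2, independent of x.
closed_form = 1 - (1 + 1 / math.log(10)) * math.log(2)

# Sample values chosen arbitrarily; x = 1 is excluded because lg 1 = 0.
for x in [0.5, 3.0, 10.0, 1e6]:
    print(x, relative_error(x), closed_form)
```

Each row should show the same error up to floating point rounding, which is why the bound holds for every positive x other than 1.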

Related post: The most interesting logs in the world

Digits in powers of 2

Does the base 10 expansion of 2^n always contain the digit 7 if n is large enough?

As of 1994, this was an open question (page 196 here). I don’t know whether this has since been resolved.

The following Python code suggests that the conjecture may be true for n ≥ 72.

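def digits(n):
    """Return the set of decimal digits of a positive integer n."""
    s = set()
    while n > 0:
        s.add(n % 10)
        n //= 10  # integer division; n /= 10 yields a float in Python 3
    return s

# Print every exponent in the range whose power of 2 lacks the digit 7.
for i in range(71, 10000):
    p = 2**i
    if 7 not in digits(p):
        print(i, p)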

Update: It appears that 2^n contains every digit for n > 168. See this comment.
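
The claim in the update can be checked in the same spirit. The sketch below is not the code from the comment; the helper names and the search bound of 2000 are assumptions made for illustration. It uses string conversion to collect the digits, and a finite search like this can only suggest the pattern, not prove it.

```python
ALL_DIGITS = set("0123456789")

def missing_digits(n):
    """Return the set of decimal digits (as characters) absent from n."""
    return ALL_DIGITS - set(str(n))

def exponents_missing_a_digit(limit):
    """Exponents n < limit for which 2**n lacks at least one decimal digit."""
    return [n for n in range(limit) if missing_digits(2**n)]

# The bound 2000 is arbitrary; raise it to search further.
exceptions = exponents_missing_a_digit(2000)
print(max(exceptions))  # largest exponent below the bound that misses a digit
```

If the update is right, the largest exception printed should be 168.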

Related post: Open question turned into exercise

Rise and Fall of the Third Normal Form

The ideas for relational databases were worked out in the 1970s and the first commercial implementations appeared around 1980. By the 1990s relational databases were the dominant way to store data. There were some non-relational databases in use, but these were not popular. Hierarchical databases seemed quaint, clinging to pre-relational approaches that had been deemed inferior by the march of progress. Object databases just seemed weird.

Now the tables have turned. Relational databases are still dominant, but all the cool kids are using NoSQL databases. A few years ago the implicit assumption was that nearly all data lived in a relational database. Now you hear statements such as “Relational databases are still the best approach for some kinds of projects,” implying that such projects are a small minority. Some of the praise for NoSQL databases is hype, but some is deserved: there are numerous successful applications using NoSQL databases because they need to, not because they want to be cool.

So why the shift to NoSQL, and why now? I’m not an expert in this area, but I’ll repeat some of the explanations that I’ve heard that sound plausible.

  1. Relational databases were designed in an era of small, expensive storage media. They were designed to conserve a resource that is now abundant. Non-relational databases may be less conservative with storage.
  2. Relational databases were designed for usage scenarios that not all applications follow. In particular, they were designed to make writing easier than reading. But it’s not uncommon for a web application to do 10,000 reads for every write.
  3. The scalability of relational database transactions is fundamentally limited by Brewer’s CAP theorem. It says that you can’t have consistency, availability and partition tolerance all in one distributed system. You have to pick two out of three.
  4. Part of the justification for relational databases was that multiple applications can share the same data by going directly to the same tables. Now applications share data through APIs rather than through tables. With n-tier architecture, applications don’t access their own data directly through tables, much less another application’s data.

The object-oriented worldview of most application developers maps more easily to NoSQL databases than to relational databases. But in the past, this objection was brushed off. A manager might say “I don’t care if it takes a lot of work for you to map your objects to tables. Data must be stored in tables.”

And why must data be stored in tables? One reason would be consistency, but #3 above says you’ll have to relax your ideas of consistency if you want availability and partition tolerance. Another reason would be in order to share data with other applications, but #4 explains why that isn’t necessary. Still another reason would be that the relational model has a theoretical basis. But so do NoSQL databases, or as Erik Meijer calls them, CoSQL databases.
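
To make the object-mapping point concrete, here is a minimal sketch of the same nested object stored two ways: serialized whole, as a document store might do, and split across two tables. Everything in it is hypothetical (the order object, the table names, the load_order helper), and it uses JSON and SQLite from the Python standard library as stand-ins rather than any particular NoSQL or relational product.

```python
import json
import sqlite3

# Hypothetical application object: an order with nested line items.
order = {
    "id": 1,
    "customer": "Alice",
    "items": [
        {"sku": "A-100", "qty": 2},
        {"sku": "B-200", "qty": 1},
    ],
}

# Document-style storage: the object is serialized as it stands.
document = json.dumps(order)

# Relational storage: the same object is split across two tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
conn.execute("CREATE TABLE order_items (order_id INTEGER, sku TEXT, qty INTEGER)")
conn.execute("INSERT INTO orders VALUES (?, ?)", (order["id"], order["customer"]))
conn.executemany(
    "INSERT INTO order_items VALUES (?, ?, ?)",
    [(order["id"], item["sku"], item["qty"]) for item in order["items"]],
)

def load_order(conn, order_id):
    """Reassemble the nested object from its rows in both tables."""
    oid, customer = conn.execute(
        "SELECT id, customer FROM orders WHERE id = ?", (order_id,)
    ).fetchone()
    rows = conn.execute(
        "SELECT sku, qty FROM order_items WHERE order_id = ? ORDER BY rowid",
        (order_id,),
    ).fetchall()
    return {
        "id": oid,
        "customer": customer,
        "items": [{"sku": sku, "qty": qty} for sku, qty in rows],
    }

restored = load_order(conn, 1)
print(restored == json.loads(document))
```

The document version round-trips in one call, while the relational version needs a schema, two inserts, and a hand-written function to put the object back together. That extra code is the mapping work the manager in the anecdote was willing to pay for.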

I’m not an advocate of SQL or NoSQL databases. Each has its place. A few years ago developers assumed that nearly all data was best stored in a relational database. That was a mistake, just as it would be a mistake now to assume that all data should move to a non-relational database.
