Erik Meijer listed eight fallacies of declarative programming in his keynote address at YOW in Melbourne this morning:

Exceptions do not exist.

Statistics are precise.

Memory is infinite.

There are no side-effects.

Schema don’t change.

There is one developer.

Compilation time is free.

The language is homogeneous.

To put these in some context, Erik made several points about declarative programming in his talk. First, “declarative” is relative. For example, if you’re an assembly programmer, C looks declarative, but if you program in some higher level language, C looks procedural. Then he argued that SQL is not as declarative as people say and that in some ways SQL is quite procedural. Finally, the fallacies listed above correspond to things that can cause a declarative abstraction to leak.

(The videos of the YOW presentations should be available in January. I haven’t heard anyone say, but I imagine the slides from the presentations will be available sooner, maybe in a few days.)

Nassim Taleb popularized the phrase black swan in his book by that name. Taleb uses a black swan as a metaphor for rare events with significant impact. Europeans assumed that all swans were white until explorers reached Australia and saw black swans.

I took this photo at Albert Park in Melbourne this morning.

Last night I was talking with someone about the pros and cons of various programming languages and frameworks for data analysis. One of the pros of Python is its elegance. The primary con is that it can be slow.

The conversation reminded me of an apocryphal exchange between Winston Churchill and Bessie Braddock.

Braddock: Winston, you are drunk.

Churchill: Yes I am. And you, Bessie, are ugly. But I shall be sober in the morning, and you will still be ugly.

Python can be slow, though there are ways to improve its performance. But ugly code is just ugly, and there’s nothing you can do about it.

Last night, several of us at YOW were discussing professional secrets, inaccuracies and omissions that are corrected via apprenticeship but rarely in writing. We were arguing over whether these secrets were the result of conspiracy or laziness. Do people deliberately conceal information to keep the uninitiated from really knowing what’s going on, or do they wave their hands because being precise takes too much energy?

I argued for the latter, a sort of variation on Hanlon’s razor: Never attribute to malice that which is adequately explained by stupidity. In this case, I didn’t want to attribute to conspiracy what could adequately be explained by laziness. Sins of omission are more common than sins of comission.

Brian Beckman’s comment on Hanlon’s razor was that there is a sort of quantum superposition of malice and stupidity. That is, you have some indeterminate mixture of malice and stupidity (or in the context of our conversation, conspiracy and laziness) that leads to the same results. This closely resembles Grey’s law that any sufficiently advanced incompetence is indistinguishable from malice. Being a physicist, Brian used a physical metaphor. He commented later that it may be possible in retrospect to determine whether some action was malicious or stupid, collapsing a sort of wave function.

There are strange signs about water usage all over Melbourne. For example:

Should I be worried? The typography implies I should. But unless you’re combining your own hydrogen and oxygen atoms, it’s all water recycled?

Here’s another one.

Again, the typography implies this is a dire warning. Rainwater in use! Beware! But rainwater is usually in use. It waters plants, cleans streets, etc. It’s very useful.

From what I gather, the intention of the signs is to convey something like this:

Don’t be upset with us during a drought because you see we have thriving plants or a beautiful lawn. We’re not using municipally treated water. We’re using rainwater we’ve captured, or gray water, etc.

I would there were no age between ten and three-and-twenty, or that youth would sleep out the rest; for there is nothing in the between but getting wenches with child, wronging the ancientry, stealing, fighting.

My previous post about logarithms has generated far more discussion than I expected. One valuable comment cites Donald Knuth’s TAOCP. While looking up the reference, I stumbled on this curiosity:

lg x ≈ ln x + log_{10}x.

In words, log base 2 is approximately natural log plus log base 10. It’s a pleasant coincidence that there’s a simple relationship between the three most commonly used logarithms.

Knuth credits the approximation to R. W. Hamming and notes that the relative error is less than 1%. In fact, it’s easy to show that the relative error is exactly equal to

The ideas for relational databases were worked out in the 1970’s and the first commercial implementations appeared around 1980. By the 1990’s relational databases were the dominant way to store data. There were some non-relational databases in use, but these were not popular. Hierarchical databases seemed quaint, clinging to pre-relational approaches that had been deemed inferior by the march of progress. Object databases just seemed weird.

Now the tables have turned. Relational databases are still dominant, but all the cool kids are using NoSQL databases. A few years ago the implicit assumption was that nearly all data lived in a relational database, now you hear statements such as “Relational databases are still the best approach for some kinds of projects,” implying that such projects are a small minority. Some of the praise for NoSQL databases is hype, but some is deserved: there are numerous successful applications using NoSQL databases because they need to, not because they want to be cool.

So why the shift to NoSQL, and why now? I’m not an expert in this area, but I’ll repeat some of the explanations that I’ve heard that sound plausible.

Relational databases were designed in an era of small, expensive storage media. They were designed to conserve a resource that is now abundant. Non-relational databases may be less conservative with storage.

Relational databases were designed for usage scenarios that not all applications follow. In particular, they were designed to make writing easier than reading. But its not uncommon for a web application to do 10,000 reads for every write.

The scalability of relational database transactions is fundamentally limited by Brewer’s CAP theorem. It says that you can’t have consistency, availability and partition tolerance all in one distributed system. You have to pick two out of three.

Part of the justification for relational databases was that multiple applications can share the same data by going directly to the same tables. Now applications share data through APIs rather through tables. With n-tier architecture, applications don’t access their own data directly through tables, much less another application’s data.

The object oriented worldview of most application developers maps more easily to NoSQL databases than to relational databases. But in the past, this objection was brushed off. A manager might say “I don’t care if it takes a lot of work for you to map your objects to tables. Data must be stored in tables.”

And why must data be stored in tables? One reason would be consistency, but #3 above says you’ll have to relax your ideas of consistency if you want availability and partition tolerance. Another reason would be in order to share data with other applications, but #4 explains why that isn’t necessary. Still another reason would be that the relational model has a theoretical basis. But so do NoSQL databases, or as Erik Meijer calls them, CoSQL databases.

I’m not an advocate of SQL or NoSQL databases. Each has its place. A few years ago developers assumed that nearly all data was best stored in a relational database. That was a mistake, just as it would be a mistake now to assume that all data should now move to a non-relational database.

Suppose you have a regular pentagon inscribed in a unit circle, and connect one vertex to each of the other four vertices. Then the product of the lengths of these four lines is 5.

More generally, suppose you have a regular n-gon inscribed in a unit circle. Connect one vertex to each of the others and multiply the lengths of each of these line segments. Then the product is n.

I ran across this theorem recently thumbing through Mathematical Diamonds and had a flashback. This was a homework problem in a complex variables class I took in college. The fact that it was a complex variables class gives a big hint at the solution: Put one of the vertices at 1 and then the rest are nth roots of 1. Connect all the roots to 1 and use algebra to show that the product of the lengths is n. This will be much easier than a geometric proof.

Let ω = exp(2πi/n). Then the roots of 1 are powers of ω. The products of the diagonals equals

|1 – ω| |1 – ω^{2}| |1 – ω^{3}| … |1 – ω^{n-1}|

You can change the absolute value signs to parentheses because the terms come in conjugate pairs. That is,

For until this moment he had lived in a state of pure possibility, not knowing what sort of man he was or what he must do, and supposing therefore that he must be all men and do everything. But after this morning’s incident his life took a turn in a particular direction. Thereafter he came to see that he was not destined to do everything but only one or two things. Lucky is the man who does not secretly believe that every possibility is open to him.

As Lawler summarizes,

Without some such closure — without knowing somehow that you’re “not destined to do everything but only one or two things” — you never get around to living.

It’s taken me a long time to understand that deliberately closing off some options can open more interesting options.

If two random variables X and Y have the same first few moments, how different can their distributions be?

Suppose E[X^{i}] = E[Y^{i}] for i = 0, 1, 2, … 2p. Then there is a polynomial P(x) of degree 2p such that

|F(x) – G(x)| ≤ 1/P(x)

where F and G are the CDFs of X and Y respectively.

The polynomial P(x) is given by

V‘ M^{-1}V

where V is a vector of dimension p+1 and M is a (p+1) × (p+1) matrix. The ith element of V is x^{i} and the (i, j) element of M is E(X^{i+j}) if we start our indexes start from 0.

Reference: “Moments determine the tail of a distribution (but not much else)” by Bruce Lindsay and Prasanta Basak, The American Statistician, Vol 54, No 4, p. 248–251.

* * *

For daily posts on probability, follow @ProbFact on Twitter.

I occasionally get comments from people who see “log” in one of my posts and think “log base 10.” They’ll say they get a different result than I do and ask whether I made a mistake. So to eliminate confusion, let me explain my notation.

When I say “log,” I always mean natural log, that is, log base e. This is the universal convention in advanced mathematics. It’s also the convention of every programming language that I know of. If I want to use logarithms to a different base, I specify the base as a subscript, such as log_{10} for log base 10.

The reason logs base e are called natural, and the reason they’re most convenient to use, is that base e really is natural in a sense. For example, the function k^{x} is its own derivative only when k = e. And the derivative of log_{k}(x) is 1/x only when k = e.

All logarithms are proportional to each other. That is, log_{b}(x) = log_{e}(x) / log_{e}(b). That’s why we can say something is logarithmic without specifying the base. So we might as well pick the base that is easiest to work with, and most people agree that’s base e. (There are some exceptions. In computer science it’s often convenient to work with logs base 2, sometimes written lg.)

Logarithms base 10 have the advantage that they’re easy to compute mentally for special values. For example, the log base 10 of a 1,000,000 is 6: just count the zeros. So it’s good pedagogy to introduce logs base 10 first. But natural logs are simpler to use for theoretical work, and just as convenient to compute numerically.

Along these lines, when I use trig functions, I always measure angles in radians. Just like all advanced mathematics and all programming languages.

As with natural logs, radians are natural too. For example, the derivative of sine is cosine only when you work in radians. If you work in degrees, you pick up a proportionality constant every time you differentiate a trig function.

Natural logs and radian measure are related: Euler’s formula e^{ix} = cos(x) + i sin(x) assumes the base e and assumes that x measured in radians.