The smallest uninteresting number and fuzzy logic

I’ve tried to think of something interesting about the number 2013 and haven’t come up with anything. This reminds me of the interesting number paradox.

Theorem: All positive integers are interesting.

Proof: Let n be the smallest uninteresting positive integer. Then n is interesting by virtue of being the smallest such number.

The interesting number paradox is semi-serious, and so is the resolution I propose below. Both are jokes, but they touch on some serious ideas.

“Interestingness” is not an all-or-nothing property. Some numbers are more interesting than others, so perhaps we should use fuzzy logic to quantify how interesting a number is, say on a scale from 0 to 1.

For a given ε > 0, define the interesting numbers to be those whose interestingness is greater than ε. Suppose the interestingness of numbers trails off gradually after some point. (If interestingness dropped sharply instead, the first number after the drop would be interesting by virtue of being the first uninteresting number, and the paradox would return.) The largest interesting number is then only barely interesting. The number one larger than a barely interesting number is just slightly less interesting. So the proof of the interesting number paradox doesn’t apply in the continuous setting.

On a more serious note, many paradoxes in mathematics can be resolved by replacing a binary criterion with a continuous one.

For example, the sum of a trillion continuous functions is continuous, but the infinite sum of continuous functions may not be. How can that be? The problem is that we’re viewing continuity as an all-or-nothing property. If you have a series of continuous functions that converges to a discontinuous limit, the degree of continuity must be degrading. The partial sum after some large number of terms is continuous, but not very continuous. The modulus of continuity of each partial sum is finite, but is getting larger, and is infinite in the limit.
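To see the degradation concretely, here is a small sketch of my own (not from the original argument). The telescoping series x + (x^2 - x) + (x^3 - x^2) + … has partial sums S_N(x) = x^N. Each partial sum is continuous on [0, 1], but the limit is 0 on [0, 1) and 1 at x = 1, a discontinuous function. We can watch the interval over which S_N makes most of its jump shrink:

```python
# Partial sums S_N(x) = x**N of the telescoping series
# x + (x**2 - x) + (x**3 - x**2) + ...
# Each term is continuous, but the limit is 0 on [0, 1) and 1 at x = 1.
# Measure the width of the interval where S_N climbs from 0.25 to 0.75.
for N in [1, 10, 100, 1000]:
    lo = 0.25 ** (1 / N)   # x where S_N(x) = 0.25
    hi = 0.75 ** (1 / N)   # x where S_N(x) = 0.75
    print(N, hi - lo)
```

The width shrinks roughly like 1/N: every partial sum is continuous, but a fixed change in value happens over an ever smaller interval. That shrinking window is the degrading modulus of continuity in action.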

Classical statistics is filled with yes-no concepts that make more sense when replaced with continuous measures. For example, instead of asking whether an estimator is biased, it’s more practical to ask how biased it is.

Computer science is often concerned with whether something can be computed, i.e. computed exactly. But sometimes it’s more important to ask how well something can be computed. Many things that cannot be computed in theory can be computed well enough in practice.

Related post: How to solve supposedly intractable problems

Extreme change is easier

This last week I ran across a TED video about a couple who had a house full of stuff and $18,000 in debt. They sold all their stuff except what could fit in a couple bags and went backpacking in Australia.

Good for them for having the courage to make a big change. I am impressed, but I’d be more impressed if they had sold their new home and moved into another one 20 years older and half the size.

It’s easier to get rid of all your stuff than half your stuff. If you get rid of all your stuff, you’re deciding to hire other people to meet your needs. You can get rid of your house if you’re willing to rent your shelter from hotels. You can get rid of your pots and pans if you’re willing to pay restaurants to prepare your food with their pots and pans. You can get rid of your car if you’re willing to pay a cab driver to take you everywhere you need to go. Moving into a smaller home, with fewer pots and pans, and selling one of your two cars may be harder.

I don’t know whether these folks are still living as tourists. But if they haven’t bought another house yet, they probably will some day, though maybe one much smaller than their first house. The sequence

large house -> no house -> small house

may be easier than

large house -> small house.

Extreme change is often easier than moderate change, for better or for worse. Extreme change can be more impressive, so people who sell everything get invited to talk at TED, whereas people who cut their living expenses by 20% and slowly pay off their debts get 30 seconds on the Dave Ramsey Show. People who sacrifice to achieve their goals slowly while maintaining their responsibilities are less impressive at first glance, but more impressive after more thought.

Extreme change can also be temporary. Lottery winners go bankrupt. People on starvation diets end up heavier than ever. One extreme change can lead to another extreme change in the opposite direction.

However, you can also use the ease of extreme change to your advantage. The book Change or Die is all about making extreme changes wisely. (The book grew out of this article.) Radical change requires fewer decisions, and leads to encouraging results sooner. Along those lines, I love the story of Eric Coyle, a mediocre student who suddenly became extremely motivated and took up to 64 credit hours in a semester.


Ideas for blog posts

When George Will began his career as a syndicated columnist, he asked his editor William Buckley how he could ever come up with two columns per week. Buckley replied that at least twice a week something would annoy him [Will], and he just needed to write about it.

I don’t write about what I find annoying, but rather what I find interesting. I’m always running into things I find interesting, and sometimes I write about them.

One strategy for coming up with ideas for blog posts is to fill in details from your reading. I’ve done this several times lately. And many of my programming posts come from my research to fill in gaps or resolve ambiguity in software documentation. You can also subtract detail, i.e. write summaries, but posts that add detail are likely to be more original.

Lucky house prices

Here’s an interesting tidbit on the least significant digits of house prices.

In Nevada, the last non-zero number in the selling price of a house is a lucky seven 37 percent more often than in the rest of the country. 777 is used three times more often than in the rest of the country. … In neighborhoods with a majority of Asian people, the asking price for homes ends in the lucky number eight 20 percent of the time, compared with 4 percent in other neighborhoods.

From “While we’re at it” by David Mills, First Things, January 2013.

Napier’s mnemonic

John Napier (1550–1617) discovered a way to reduce 10 equations of spherical trigonometry to 2 and to make them easier to remember.

Draw a right triangle on a sphere and label the sides a, b, and c where c is the hypotenuse. Let A be the angle opposite side a, B the angle opposite side b, and C the right angle opposite the hypotenuse c.

There are 10 equations relating the sides and angles of the triangle:

sin a = sin A sin c = tan b cot B
sin b = sin B sin c = tan a cot A
cos A = cos a sin B = tan b cot c
cos B = cos b sin A = tan a cot c
cos c = cot A cot B = cos a cos b

Here’s how Napier reduced these equations to a more memorable form. Arrange the five parts a, b, co-A, co-c, co-B of the triangle in a circle in that order, where co-x denotes the complement π/2 − x. (The right angle C is left out.)

Then Napier has two rules:

  1. The sine of a part is equal to the product of the tangents of the two adjacent parts.
  2. The sine of a part is equal to the product of the cosines of the two opposite parts.

For example, if we start with a, the first rule says sin a = cot B tan b. (The tangent of the complementary angle to B is the cotangent of B.) Similarly, the second rule says that sin a = sin c sin A. (The cosine of the complementary angle is just the sine.)

For a more algebraic take on Napier’s rules, write the parts of the triangle as

(p_1, p_2, p_3, p_4, p_5) = (a, b, co-A, co-c, co-B).

Then the equations above can be reduced to

sin p_i = tan p_{i-1} tan p_{i+1} = cos p_{i+2} cos p_{i+3}

where the addition and subtraction in the subscripts is carried out mod 5. This is just using subscripts to describe the adjacent and opposite parts in Napier’s diagram.
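As a sanity check, we can verify both rules numerically (a sketch of mine, not from the book). Build a right triangle from two legs using identities quoted above, cos c = cos a cos b and sin a = sin A sin c, then test the sine of each part against the tangents of its neighbors and the cosines of its opposites:

```python
import math

a, b = 0.7, 0.5                            # two legs, in radians
c = math.acos(math.cos(a) * math.cos(b))   # hypotenuse: cos c = cos a cos b
A = math.asin(math.sin(a) / math.sin(c))   # angle opposite a: sin a = sin A sin c
B = math.asin(math.sin(b) / math.sin(c))   # angle opposite b

# Napier's circular arrangement of the five parts
p = [a, b, math.pi / 2 - A, math.pi / 2 - c, math.pi / 2 - B]

for i in range(5):
    lhs = math.sin(p[i])
    adjacent = math.tan(p[i - 1]) * math.tan(p[(i + 1) % 5])   # rule 1
    opposite = math.cos(p[(i + 2) % 5]) * math.cos(p[(i + 3) % 5])  # rule 2
    assert abs(lhs - adjacent) < 1e-12
    assert abs(lhs - opposite) < 1e-12
```

All ten equations pass, five from each rule, which is reassuring since the complements hide the cotangents and sines of the original table.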

Source: Heavenly Mathematics


Spotting sensitivity in an equation

The new book Heavenly Mathematics describes in the first chapter how the medieval scholar Abū Rayḥān al-Bīrūnī calculated the earth’s radius. The derivation itself is interesting, but here I want to expand on a parenthetical remark about the calculation.

The earth’s radius r can be found by solving the following equation.

cos θ = r / (r + 305.1 m)

The constant in the denominator comes from a mountain which is 305.1 meters tall. The angle θ is known to be 34 minutes, i.e. 34/60 degrees. Here is the remark that caught my eye as someone more interested in numerical analysis than trigonometry:

There is a delicate matter hidden in this solution however: a minute change in the value of θ results in a large change in the value of r.

How can you tell that the solution is sensitive to changes (i.e. measurement errors) in θ? That doesn’t seem obvious.

Think of r as a function of θ and differentiate both sides of the equation with respect to θ. We’ll convert θ to radians because that’s what we do. (Explanation at the bottom of this post.) We get

-sin θ = [305.1 m / (r + 305.1 m)^2] (dr/dθ)

and so

dr/dθ = -sin θ (r + 305.1 m)^2 / 305.1 m

Now let’s get a feel for the size of the terms in this equation. θ is approximately 0.01 radians, and since sin θ ≈ θ for small angles, sin θ is approximately 0.01 as well. The radius of the earth is about 6.4 million meters. So the right side of the equation above is about 1.3 billion meters, i.e. it’s big.

A tiny increase in θ leads to a large decrease in r. For example, if our measurement of θ increased by 1%, from 0.01 to 0.0101, our measurement of the earth’s radius would decrease by 130,000 meters.
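We can check this numerically. The equation cos θ = r/(r + 305.1 m) can be solved for r in closed form: r = 305.1 cos θ / (1 − cos θ). A quick sketch (my code, not al-Bīrūnī’s calculation):

```python
from math import cos

h = 305.1  # mountain height in meters

def radius(theta):
    # Solve cos(theta) = r / (r + h) for r.
    return h * cos(theta) / (1 - cos(theta))

r1 = radius(0.0100)   # about 6.1 million meters
r2 = radius(0.0101)   # angle measurement 1% larger
print(r1 - r2)        # roughly 120,000 meters lost
```

The exact drop, about 120,000 meters, is a bit smaller than the 130,000 meter first-order estimate because |dr/dθ| itself decreases as θ grows, but the message is the same: a 1% error in the angle swamps the answer.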

I’d like to point out a couple things about this analysis. First, it shows how it can be useful to think of constants as variables. After measuring θ we might be tempted to treat it as a known constant. But a more sophisticated analysis takes into account that our measurement of θ may differ from the true value.

Second, we used the radius of the earth to determine how sensitive our estimate of the earth’s radius is to changes in θ. Isn’t that circular reasoning? Not really. We can use a very crude estimate of the earth’s radius to estimate how sensitive a new estimate is to changes in its parameters. You always have some idea how big a value is before you measure it. If you want to measure the distance to the moon, you know not to pick up a yard stick.



Basics of Sweave and Pweave

Sweave is a tool for embedding R code in a LaTeX file. Pweave is an analogous tool for Python. Because the document contains the code itself rather than output pasted in from somewhere else, results are automatically recomputed when inputs change. This is especially useful with graphs: rather than pasting an image into your document, you include the code that creates the image.

To use either Sweave or Pweave, you create a LaTeX file and include source code inside. A code block begins with <<>>= and ends with @ on a line by itself. By default, code blocks appear in the LaTeX output. You can start a code block with <<echo=FALSE>>= to execute code without echoing its source. In Pweave you can also use <% and %> to mark a code block that executes but does not echo. You might want to do this at the top of a file, for example, for import statements.

Sweave echoes code like the R command line, with > for the command prompt. Pweave does not display the Python >>> command line prompt by default, though it will if you use the option term=TRUE at the start of your code block.

In Sweave, you can use \Sexpr to inline a little bit of R code. For example, $x = \Sexpr{sqrt(2)}$ will produce x = 1.414…. You can also use \Sexpr to reference variables defined in previous code blocks. The Pweave analog uses <%= and %>. The previous example would be $x = <%= sqrt(2) %>$.

You can include a figure in Sweave or Pweave by beginning a code block with <<fig=TRUE, echo=FALSE>>=, or with echo=TRUE if you want to display the code that produces the figure. With Sweave you don’t need to do anything else with your file. With Pweave you need to add \usepackage{graphicx} to the preamble.

To process an Sweave file foo.Rnw, run Sweave("foo.Rnw") from the R command prompt. To process a Pweave file foo.Pnw, run Pweave -f tex foo.Pnw from the shell. Either way you get a LaTeX file that you can then compile to a PDF.

Here are sample Sweave and Pweave files. First Sweave:


\documentclass{article}
\begin{document}

Invisible code that sets the value of the variable $a$.

<<echo=FALSE>>=
a <- 3.14
@

Visible code that sets $b$ and squares it.

<<bear, echo=TRUE>>=
b <- 3.15
b*b
@

Calling R inline: $\sqrt{2} = \Sexpr{sqrt(2)}$

Recalling the variable $a$ set above: $a = \Sexpr{a}$.

Here's a figure:

<<fig=TRUE, echo=FALSE>>=
x <- seq(0, 6*pi, length=200)
plot(x, sin(x))
@

\end{document}


And now Pweave:


\documentclass{article}
\usepackage{graphicx}
\begin{document}

<%
import matplotlib.pyplot as plt
from numpy import pi, linspace, sqrt, sin
%>

Invisible code that sets the value of the variable $a$.

<%
a = 3.14
%>

Visible code that sets $b$ and squares it.

<<echo=TRUE>>=
b = 3.15
print(b*b)
@

Calling Python inline: $\sqrt{2} = <%= sqrt(2) %>$

Recalling the variable $a$ set above: $a = <%= a %>$.

Here's a figure:

<<fig=TRUE, echo=FALSE>>=
x = linspace(0, 6*pi, 200)
plt.plot(x, sin(x))
@

\end{document}




Beethoven, Beatles, and Beyoncé: more on the Lindy effect

This post is a set of footnotes to my previous post on the Lindy effect. This effect says that creative artifacts have lifetimes that follow a power law distribution, and hence the things that have been around the longest have the longest expected future.

Works of art

The previous post looked at technologies, but the Lindy effect would apply, for example, to books, music, or movies. This suggests the future will be something like a mirror of the present. People have listened to Beethoven for two centuries, the Beatles for about four decades, and Beyoncé for about a decade. So we might expect Beyoncé to fade into obscurity a decade from now, the Beatles four decades from now, and Beethoven a couple centuries from now.


Lindy effect estimates are crude, only considering current survival time and no other information. And they’re probability statements. They shouldn’t be taken too seriously, but they’re still interesting.

Programming languages

Yesterday was the 25th birthday of the Perl programming language. The Go language was announced three years ago. The Lindy effect suggests there’s a good chance Perl will be around in 2037 and that Go will not. This goes against your intuition if you compare languages to mechanical or living things. If you look at a 25 year-old car and a 3 year-old car, you expect the latter to be around longer. The same is true for a 25 year-old accountant and a 3 year-old toddler.

Life expectancy

Someone commented on the original post that for a British female, life expectancy is 81 years at birth, 82 years at age 20, and 85 years at age 65. Your life expectancy goes up as you age, but your expected additional years of life go down. By contrast, imagine a pop song that has a life expectancy of 1 year when it comes out. If it’s still popular a year later, we could expect it to be popular for another couple years. And if people are still listening to it 30 years after it came out, we might expect it to have another 30 years of popularity.

Mathematical details

In my original post I looked at a simplified version of the Pareto density:

f(t) = c / t^(c+1)

starting at t = 1. The more general Pareto density is

f(t) = c a^c / t^(c+1)

and starts at t = a. It follows that if a random variable X has a Pareto distribution with exponent c and starting time a, then the conditional distribution of X given that X is at least b is another Pareto distribution, with the same exponent but starting time b. The expected value of X a priori is ac/(c-1), but conditional on having survived to time b, the expected value is bc/(c-1). That is, the expected value has gone up in proportion to the ratio of starting times, b/a.
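A quick Monte Carlo check of this property (my sketch, using inverse-CDF sampling; the parameter values c = 2.5, a = 1, b = 3 are arbitrary):

```python
import random

random.seed(42)
c, a = 2.5, 1.0   # tail exponent and starting time (arbitrary choices)

# Inverse-CDF sampling: if U ~ Uniform(0,1), then a / U**(1/c) is Pareto(c, a),
# since P(X > x) = (a/x)**c.
samples = [a / random.random() ** (1 / c) for _ in range(1_000_000)]

# Condition on surviving to time b = 3 and compare means.
b = 3.0
survivors = [x for x in samples if x >= b]
mean_given_b = sum(survivors) / len(survivors)

print(mean_given_b)      # near b*c/(c-1) = 5.0
print(a * c / (c - 1))   # unconditional mean: about 1.67
```

The conditional mean lands near bc/(c-1) = 5, three times the unconditional mean, matching the b/a = 3 ratio of starting times.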

The value of typing code

Tommy Nicholas recently wrote a blog post advocating typing rather than copying-and-pasting code samples. I thought this was the most interesting paragraph from the post:

When Hunter S. Thompson was working as a copy boy at Time Magazine in 1959, he spent his spare time typing out the entire Great Gatsby by F. Scott Fitzgerald and A Farewell to Arms by Ernest Hemingway in order to better understand what it feels like to write a great book. To be able to feel the author’s turns in logic and storytelling weren’t possible from reading the books alone, you had to feel what it feels like to actually create the thing. And so I have found it to be with coding.

Joe Armstrong had similar advice:

Forget about the tools … buy a decent book and type in the programs by hand. One at a time thinking as you go. After 30 years you will get the hang of this and be a good programmer.

Typing code may be like riding a bicycle. I’m surprised how much more detail I see the first time I ride my bicycle over a road I’ve driven on, mostly because I’m moving slower but also because there’s an additional muscular dimension to the experience.

Another advantage to typing example code is that you’ll make mistakes, just as you will in real work. This will give you the opportunity to see what happens and to learn debugging, even though you may not appreciate the opportunity.

The Lindy effect

The longer a technology has been around, the longer it’s likely to stay around. This is a consequence of the Lindy effect. Nassim Taleb describes this effect in Antifragile but doesn’t provide much mathematical detail. Here I’ll fill in some detail.

Taleb, following Mandelbrot, says that the lifetimes of intellectual artifacts follow a power law distribution. So assume the survival time of a particular technology is a random variable X with a Pareto distribution. That is, X has a probability density of the form

f(t) = c / t^(c+1)

for t ≥ 1 and for some c > 0. This is called a power law because the density is proportional to a power of t.

If c > 1, the expected value of X exists and equals c/(c-1). The conditional expectation of X given that X has survived to at least time k is ck/(c-1). So the expected additional life is ck/(c-1) - k = k/(c-1), which is proportional to the amount of life seen so far. The proportionality constant 1/(c-1) depends on the power c that controls the thickness of the tails. The closer c is to 1, the longer the tail and the larger the proportionality constant. If c = 2, the proportionality constant is 1. That is, the expected additional life equals the life seen so far.
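The conditional expectation is a short calculus exercise, and we can check it symbolically. Here is a sketch using sympy with the concrete exponent c = 2, so the integrals converge without case analysis (for k ≥ 1, where the density lives):

```python
from sympy import symbols, integrate, oo

t, k = symbols('t k', positive=True)
c = 2                   # concrete tail exponent, c > 1
f = c / t**(c + 1)      # Pareto density on [1, oo)

# E(X | X > k): restrict the density to [k, oo) and renormalize.
survival = integrate(f, (t, k, oo))                # P(X > k) = k**(-2)
cond_mean = integrate(t * f, (t, k, oo)) / survival

print(cond_mean)        # 2*k, i.e. ck/(c-1) with c = 2
```

With c = 2 the proportionality constant 1/(c-1) is 1, so the conditional mean 2k says the expected additional life k equals the life already seen, as claimed above.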

Note that this derivation computed E( X | X > k ), i.e. it only conditions on knowing that X > k. If you have additional information, such as evidence that a technology is in decline, then you need to condition on that information. But if all you know is that a technology has survived a certain amount of time, you can estimate that it will survive about that much longer.

This says that technologies have different survival patterns than people or atoms. The older a person is, the fewer expected years he has left, because human lifetimes follow thin-tailed distributions. Atomic decay follows a medium-tailed exponential distribution: the expected additional time to decay is independent of how long an atom has been around. But technologies, on this model, follow a thick-tailed distribution.

Another way to look at this is to say that human survival times have an increasing hazard function and atoms have a constant hazard function. The hazard function for a Pareto distribution is c/t and so decreases with time.

Update: Beethoven, Beatles, and Beyoncé: more on the Lindy effect



Small batch sizes II

A few days ago I wrote about an example from a presentation by Don Reinertsen on the benefits of small batch sizes. Nassim Taleb brings up similar ideas in Antifragile. He opens one chapter with the following rabbinical story.

A king, angry at his son, swore that he would crush him with a large stone. After he calmed down, he realized he was in trouble, as a king who breaks his oath is unfit to rule. His sage advisor came up with a solution. Have the stone cut into very small pebbles, and have the mischievous son pelted with them.

The harm done by being hit with a stone is a nonlinear function of the stone’s size. A stone half the size does less than half the harm. Cutting the stone into pebbles makes it harmless.

Related post: Appropriate scale

Most popular pages

Here are the most popular pages on my web site that are not blog posts.

Programming language notes:

Probability and statistics:



Two views of modernity

Here are a couple descriptions of modernity that I’ve run across lately and found interesting.

First, from Eva Brann:

Now what is actually meant by “modern times?” The term cannot just mean “contemporary” because all times are con-temporary with themselves. Modern is a Latin word which means “just now.” Modern times are the times which are in a special way “just now!” Modernity is just-nowness, up-to-date-ness.

… We live differently in our time from the way those who came before us lived in theirs. For instance, when we speak of something or even someone as being “up to date” we are implying that what time it is, is significant, that time marches, or races, on by itself, and we have the task of keeping up with it. Our time is not a comfortable natural niche within the cycle of centuries, but a fast sliding rug being pulled out from under us.

Furthermore, we have a sense of the extraordinariness of our times … Modernity itself is, apparently, a way of charging the Now with special significance.

Second, from Nassim Taleb:

Modernity corresponds to the systematic extraction of humans from their randomness-laden ecology. … It is rather the spirit of an age marked by rationalization (naive rationalism), the idea that society is understandable, hence must be designed, by humans. With it was born statistical theory, hence the beastly bell curve. So was linear science. So was the notion of “efficiency” — or optimization.

Modernity is a Procrustean bed, good or bad — a reduction of humans to what appears to be efficient and useful. Some aspects of it work: Procrustean beds are not all negative reductions. Some may be beneficial, though these are rare.