The approach to cancer research presented here sounds really exciting.

Watch on TED.com

(832) 422-8646

This morning at church a woman said she was running late because of a software issue. Her alarm clock was manufactured before the US changed the end date of daylight saving time. Her clock “fell back” an hour because daylight saving time would have ended today had the law not changed.

Here are a few thoughts about what went wrong and how it might have been prevented.

- Laws have unforeseen consequences. When the change was being debated, I doubt many asked about the impact on alarm clocks and other devices with embedded software.
- The clock tried to be helpful by automating the time change. It would have been better had it done nothing. Moderately smart software is often worse than no software.
- Should the clock have been designed to check for software updates? What would it have done to the cost to turn a simple clock into a computer with a network connection?
- The clock could depend on a radio signal for time. Some do, and they’re very accurate. But they’re also more expensive.
- Should we get rid of daylight saving time? It made more sense when nearly everyone had a 9:00 to 5:00 work schedule. But now that so different people work shifts or have flexible schedules, it doesn’t seem to add as much value.

**Related post**: Universal time

**Design**

The cup holder principle

Simplicity made easy

**Typography**

**Computing**

(McCarthyism)

Accuracy rates of novices in three languages

**Math**

Three Fermat trails to elliptic curves

Highlights in the history of spectral theory

**Infographics**

Scientific evidence for dietary supplements

NPR’s top 100 science fiction and fantasy books

“Of course, education has always aimed to be useful. The question has been, and continues to be, useful to what end?” — Richard Gamble

People who write Python choose to write Python.

I don’t hear people say “I use Python at work because I have to, but I’d rather be writing Java.” But often I do hear people say they’d like to use Python if their job would allow it. There must be someone out there writing Python who would rather not, but I think that’s more common with other languages.

My point isn’t that everyone loves Python, but rather that those who don’t care for Python simply don’t write it.

Since Python isn’t a common choice for enterprise software projects, it can resist the pressure to be all things to all people. Having a “Benevolent Dictator for Life” also helps Python maintain conceptual integrity. Python is popular enough to have a critical mass of users, but not so popular that it is under pressure to lose its uniqueness.

I don’t know much about the Ruby world, but I wonder whether the increasing popularity of Ruby for web development has created pressure for Ruby to compromise its original philosophy. And I wonder whether Ruby’s creator Yukihiro Matsumoto has “dictatorial” control over his language analogous to the control Guido van Rossum has over Python.

**Related posts**:

Joseph Fourier is perhaps best known for his work studying heat conduction. He developed what we now call Fourier series as part of this work.

I recently learned that Fourier had a personal problem with heat.

Even though Fourier conducted foundational work on heat transfer, he was never good at regulating his own heat. He was always so cold, even in the summer, that he wore several large overcoats.

Source: The Physics Book

Here’s an elegant proof from Paul Erdős that there are infinitely many primes. It also does more, giving a lower bound on π(*N*), the number of primes less than *N*.

First, note that every integer *n* can be written as a product *n = r s ^{2}* where

Next we over-estimate how many *r s ^{2}* factorizations there are less than or equal to

How many square-free numbers are there less than or equal to *N*? Every such number corresponds to a product of distinct primes each less than or equal to *N*. There are π(*N*) such primes, and so there are at most 2^{π(N)} square-free numbers less than or equal to *N*. (The number of subsets of a set with *m* elements is 2* ^{m}*.)

How many squares are there less than *N*? No more than √*N* because if s > √*N *then *s ^{2}* >

So if we factor every number ≤ *N* into the form *r s ^{2}* there are at most 2

2^{π(N)} √*N* ≥ *N*

Now divide both sides of this inequality by √*N* and take logarithms. This shows that

π(*N*) ≥ log(*N*) / log 4

The right side is unbounded as *N* increases, so the left side must be unbounded too, i.e. there are infinitely many primes.

Euclid had a simpler proof that there are infinitely many primes. But unlike Euclid, Erdős gives a lower bound on π(*N*).

Source: Not Always Buried Deep

**Related posts**:

As I write this, word has it that John McCarthy passed away yesterday. Tech Crunch is reporting this as fact, citing Hacker News, which in turn cites a single tweet as the ultimate source. So the only authority we have, for now, is one person on Twitter, and we don’t know what relation she has to McCarthy.

[**Update**: More recent comments on Hacker News corroborate the story. Also, the twitterer cited above, Wendy Grossman, said McCarthy’s daughter called her.]

I also have an unsubstantiated story about John McCarthy. I believe I read the following some time ago, but I cannot remember where. If you know of a reference, please let me know. [**Update 2**: Thanks to Leandro Penz for leaving a link to this article by Paul Graham in the comments below.]

As I recall, McCarthy invented Lisp to be a purely theoretical language, something akin to lambda calculus. When his graduate student Steve Russell spoke of implementing Lisp, McCarthy objected that he didn’t intend Lisp to actually run on a physical computer. Russell then implemented a Lisp interpreter and showed it to McCarthy.

Steve Russell is an unsung hero who deserves some of the credit for Lisp being an actual programming language and not merely a theoretical construct. This does not diminish McCarthy’s achievement, but it does mean that someone else also deserves recognition.

**Related posts**:

The idea of software maintenance sounds absurd. Why do you have to maintain software? Do the bits try to sneak off the disk so that someone has to put them back?

Software doesn’t change, but the world changes out from under it.

- People discover bugs. This does not change the software but rather our knowledge of the software.
- As people use the software, they get new ideas regarding how they want to use it.
- The human environment around the software changes. Organizational priorities change. Laws change. Project sponsors and users turn over.
- The technological environment of the software changes. Operating systems, networks, and hardware all change.
- New possibilities emerge and make us less content with old possibilities.

**People often perceive these changes as changes to the software**, like someone standing on a dock, eyes fixed on a ship, who feels the dock is moving. We speak of software as if it were some mechanical think that physically wears out. Of course it isn’t, but the effect may be the same.

**Related post**:

Some photos from the table next to me at lunch today. Alas, poor Yorick did enjoy a good burrito.

**Related post**: Not exactly rocket science

Suppose you take factorials of a lot of numbers and look at the leading digit of each result. You could argue that there’s no apparent reason that any digit would be more common than any other, so you’d expect each of the digits 1 through 9 would come up 1/9 of the time. Sounds plausible, but it’s wrong.

The leading digits of factorials follow Benford’s law as described in the previous post. In fact, factorials follow Benford’s law even better than physical constants do. Here’s a graph of the leading digits of the factorials of 1 through 500.

In the remainder of this post, I’ll explain why Benford’s law should apply to factorials, make an aside on statistics, and point out an interesting feature of the Python code used to generate the chart above.

Here’s a hand-waving explanation. One way to justify Benford’s law is to say that physical constants are uniformly distributed, but on a logarithmic scale. The same is true for factorials, and it’s easier to see why.

The leading digits of the logarithms depend on on their logarithms in base 10. The gamma function extends the factorial function and it is log-convex. The logarithm of the gamma function is fairly flat (see plot here), and so the leading digits of the log-gamma function applied to integers are uniformly distributed on a logarithmic scale. (I’ve mixed logs base 10 and natural logs here, but that doesn’t matter. All logarithms are the same up to a multiplicative constant. So if a plot is nearly linear on a log10 scale, it’s nearly linear on a natural log scale.)

Update: Graham gives a link in the comments below to a paper proving that factorials satisfy Benford’s law exactly in the limit.

This example brings up an important principle in statistics. Some say that if you don’t have a reason to assume anything else, use a uniform distribution. For example, some say that a uniform prior is the ideal uninformative prior for Bayesian statistics. But you have to ask “Uniform on what scale?” It turns out that the leading digits of physical constants and factorials are indeed uniformly distributed, but on a logarithmic scale.

I used nearly the same code to produce the chart above as I used in its counterpart in the previous post. However, one thing had to change: I couldn’t compute the leading digits of the factorials the same way. Python has extended precision integers, so I can compute 500! factorial without overflowing. Using floating point numbers, I could only go up to 170!. But when I used my previous code to find the leading digit, it first tried to apply `log10`

to an integer larger than the largest representable floating point number and failed. Converting numbers such as 500! to floating point numbers will overflow. (See Floating point numbers are a leaky abstraction.)

The solution was to find the leading digit using only integer operations.

def leading_digit_int(n): while n > 9: n = n/10 return n

This code works fine for numbers like 500! or even larger.

Imagine you picked up a dictionary and found that the pages with A’s were dirty and the Z’s were clean. In between there was a gradual transition with the pages becoming cleaner as you progressed through the alphabet. You might conclude that people have been looking up a lot of words that begin with letters near the beginning of the alphabet and not many near the end.

That’s what Simon Newcomb did in 1881, only he was looking at tables of logarithms. He concluded that people were most interested in looking up the logarithms of numbers that began with 1 and progressively less interested in logarithms of numbers beginning with larger digits. This sounds absolutely bizarre, but he was right. The pattern he described has been repeatedly observed and is called Benford’s law. (Benford re-discovered the the same principle in 1938, and per Stigler’s law, Newcomb’s observation was named after Benford.)

Benford’s law predicts that for data sets such as collections of physical constants, about 30% of the numbers will begin with 1 down to about 5% starting with 8 or 9. To be precise, it says the leading digit will be *d* with probability log_{10}(1 + 1/d). For a good explanation of Benford’s law, see TAOCP volume 2.

A couple days ago I blogged about using SciPy’s collection of physical constants to look for values that were approximately factorials. Let’s look at that set of constants again and see whether the most significant digits of these constants follows Benford’s law.

Here’s a bar chart comparing the actual number of constants starting with each digit to the results we would expect from Benford’s law.

Here’s the code that was used to create the data for the chart.

from math import log10, floor from scipy.constants import codata def most_significant_digit(x): e = floor(log10(x)) return int(x*10**-e) # count how many constants have each leading digit count = [0]*10 d = codata.physical_constants for c in d: (value, unit, uncertainty) = d[c] x = abs(value) count[ most_significant_digit(x) ] += 1 total = sum(count) # expected number of each leading digit per Benford's law benford = [total*log10(1 + 1./i) for i in range(1, 10)]

The chart itself was produced using `matplotlib`

, starting with this sample code.

The actual counts we see in `scipy.constants`

line up fairly well with the predictions from Benford’s law. The results are much closer to Benford’s prediction than to the uniform distribution that you might have expected before hearing of Benford’s law.

**Update**: See the next post for an explanation of why factorials also follow Benford’s law.

**Related posts**:

In my experience, software knowledge has a longer useful shelf life in the Unix world than in the Microsoft world. (In this post Unix is a shorthand for Unix and Linux.)

A pro-Microsoft explanation would say that Microsoft is more progressive, always improving their APIs and tools, and that Unix is stagnant.

A pro-Unix explanation would say that Unix got a lot of things right the first time, that it is more stable, and that Microsoft’s technology turn-over is more churn than progress.

Pick your explanation. But for better or worse, change comes slower on the Unix side. And when it comes, it’s less disruptive.

At least that’s how it seems to me. Although I’ve used Windows and Unix, I’ve done different kinds of work on the two platforms. Maybe the pace of change relates more to the task than the operating system. Also, I have more experience with Windows and so perhaps I’m more aware of the changes there. But nearly everything I knew about Unix 20 years ago is still useful, and much of what I knew about Windows 10 years ago is not.

**Related posts**:

How do you refer to the C# programming language in LaTeX? Simply typing `C#`

doesn’t work because `#`

is a special character in LaTeX. You could type `C#`

. That works, but it looks a little odd. The number sign is too big and too low.

What about using a musical sharp sign, i.e. `C$\sharp$`

? That also looks a little odd. Even though the language is pronounced “C sharp,” it’s usually written with a number sign, not a sharp.

Let’s look at recommended ways of typesetting C++ to see whether that helps. The top answer to this question on TeX Stack Exchange is to define a new command as follows:

\newcommand{\CC}{C\nolinebreak\hspace{-.05em}\raisebox{.4ex}{\tiny\bf +}\nolinebreak\hspace{-.10em}\raisebox{.4ex}{\tiny\bf +}}

This does several things. First, it prevents line breaks between the constituent characters. It also does several things to the plus signs:

- Draws them in closer
- Makes them smaller
- Raises them
- Makes them bold

The result is what we’re subconsciously accustomed to seeing in print.

Here’s an analogous command for C#.

\newcommand{\CS}{C\nolinebreak\hspace{-.05em}\raisebox{.6ex}{\tiny\bf #}}

And here’s the output. The number sign is a little too small.

To make a little larger number sign, replace `\tiny`

with `\scriptsize`

.

**Related posts**:

The previous post mentioned that Avogadro’s constant is approximately 24!. Are there other physical constants that are nearly factorials?

I searched SciPy’s collection of physical constants looking for values that are either nearly factorials or nearly reciprocals of factorials.

The best example is the “classical electron radius” *r*_{e} which is 2.818 × 10^{-15} m and 1/17! = 2.811 × 10^{-15}.

Also, the “Hartree-Hertz relationship” E_{h}/*h* equals 6.58 × 10^{15} and 18! = 6.4 × 10^{15}. (E_{h} is the Hartree energy and *h* is Plank’s constant.)

Here’s the Python code I used to discover these relationships.

from scipy.special import gammaln from math import log, factorial from scipy.optimize import brenth from scipy.constants import codata def inverse_factorial(x): # Find r such that gammaln(r) = log(x) # So gamma(r) = x and (r-1)! = x r = brenth(lambda t: gammaln(t) - log(x), 1.0, 100.0) return r-1 d = codata.physical_constants for c in d: (value, unit, uncertainty) = d[c] x = value if x < 0: x = abs(x) if x < 1.0: x = 1.0/x r = inverse_factorial(x) n = round(r) # Use n > 6 to weed out uninteresting values. if abs(r - n) < 0.01 and n > 6: fact = factorial(n) if value < 1.0: fact = 1.0/fact print c, n, value, fact