Contrasting Tolkien and Lewis

Ralph Wood gave a lecture for Big Ideas contrasting J. R. R. Tolkien and C. S. Lewis. Wood begins his lecture by explaining that though the two men have much in common, this commonality has been emphasized to the point of concealing some of their differences.

One way Tolkien and Lewis differed was in their writing processes. Tolkien would revise, revise, and revise; friends would take papers from him and publish them just to break the cycle of revision. Lewis, on the other hand, would send first drafts to the publisher and later make only minor corrections to the proofs. Wood also discusses much deeper differences between the two authors.

(Big Ideas makes it difficult to link to show notes for individual podcast episodes. Here’s a link to the audio of Wood’s lecture.)


How to compute log factorial

Factorials grow very quickly, so quickly that the results can easily exceed the limits of computer arithmetic. For example, 30! is too big to store in a typical 64-bit integer, and 200! is too big to store (even approximately) in a standard floating point number. It’s usually more convenient to work with the logarithms of factorials rather than the factorials themselves.
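To make the sizes concrete, here is a quick check in Python (a sketch added for illustration; Python’s arbitrary-precision integers let us compute the factorials exactly and compare):

    import math
    import sys

    # 30! does not fit in a signed 64-bit integer
    print(math.factorial(30) > 2**63 - 1)            # True

    # 200! exceeds the largest representable double (about 1.8e308)
    print(math.factorial(200) > sys.float_info.max)  # True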

So how might you compute log( n! )? You don’t want to compute n! first and then take logs because you’ll overflow for moderately large arguments. You want to compute the log factorial directly.

Since n! = 1 × 2 × 3 × … × n,

log(n!) = log(1) + log(2) + log(3) + … + log(n).

That gives a way to compute log(n!), but it’s slow for large arguments: the run time is proportional to the size of n.
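Here is what that direct summation looks like in Python (a straightforward sketch of the formula above):

    import math

    def log_factorial_slow(n):
        """Compute log(n!) by summing log(k) for k = 1, ..., n. Runs in O(n) time."""
        return sum(math.log(k) for k in range(1, n + 1))

Python’s math.lgamma(n + 1) computes the same quantity directly, but the goal here is to see how such a function could be built.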

If you only need to compute log(n!) for n within a moderate range, you could just tabulate the values. Calculate log(n!) for n = 1, 2, 3, …, N by any means, no matter how slow, and save the results in an array. Then at runtime, just look up the result.

But suppose you want to be able to compute log(n!) for any value of n such that log(n!) won’t overflow. Since log(n!) is on the order of n log(n), it stays within floating point range even for astronomically large n. You’ll run out of memory to store your table long before you run out of values of n for which log(n!) doesn’t overflow.

So now the problem becomes how to evaluate log(n!) for large values of n. Say we tabulate log(n!) for n up to some size and then use a formula to calculate log(n!) for larger arguments. I’m going to switch notation now and work with the gamma function Γ(x) because most references state their results in terms of the gamma function rather than in terms of factorial. It’s easy to move between the two since Γ(n+1) = n!.

Stirling’s approximation says that

log Γ(x) ≈ (x – 1/2) log(x) – x  + (1/2) log(2 π)

with an error on the order of 1/(12 x). So if n were around 1000, the approximation would be good to about four decimal places. What if we wanted more accuracy but not a bigger table? Stirling’s approximation above is part of an infinite series of approximations, an asymptotic series for log Γ(x):

log Γ(x) ≈ (x – 1/2) log(x) – x + (1/2) log(2 π) + 1/(12 x) – 1/(360 x³) + 1/(1260 x⁵) – …

This raises a couple questions.

  1. What is the form of a general term in the series?
  2. How accurate is the series when we truncate it at some point?

The answer to the first question is that the term with x²ᵐ⁻¹ in the denominator has coefficient B₂ₘ / (2m(2m-1)), where the B’s are Bernoulli numbers. Perhaps that’s not very satisfying, but that’s what it is.

Now to the question of accuracy. If you’ve never used asymptotic series before, you might be tempted to use one like you’d use a Taylor series: the more terms the better. But asymptotic series don’t work that way. Typically the error improves at first as you take more terms, reaches a minimum, then increases as you take more terms. Another difference is that while Taylor series approximations improve as arguments get smaller, asymptotic approximations improve as arguments get larger. That’s convenient for us since we’re looking for an approximation for n so large that it’s beyond our table of saved values.

For this particular series, the absolute value of the error in truncating the series is less than the absolute value of the first term that was left out. Suppose we make a look-up table for the values 1 through 256. If we truncate the series after 1/(12 x), the error will be less than 1/(360 x³). If x > 256, log(x!) > 1167 and the error in the asymptotic approximation is less than 1/(360 × 2²⁴) ≈ 1.66 × 10⁻¹⁰. Since the number we’re computing has at least four digits and the result is good to 10 decimal places, we should have at least 14 significant figures, near the limits of floating point precision. (For more details, see Anatomy of a floating point number.)
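Here is a quick numerical check of that bound (my own illustration, using math.lgamma as the reference value; x = 258 corresponds to n = 257, the first value beyond such a table):

    import math

    x = 258.0  # first argument passed to the formula once the table stops at n = 256
    stirling = (x - 0.5) * math.log(x) - x + 0.5 * math.log(2 * math.pi) + 1 / (12 * x)

    print(abs(stirling - math.lgamma(x)))  # observed error, slightly below the bound printed next
    print(1 / (360 * x**3))                # bound from the first omitted term, about 1.6e-10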

In summary, one way to compute log factorial is to pre-compute log(n!) for n = 1, 2, 3, … 256 and store the results in an array. For values of n ≤ 256, look up the result from the table. For n > 256, return

(x – 1/2) log(x) – x + (1/2) log(2 π) + 1/(12 x)

with x = n + 1. This has been coded up here.
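I don’t know exactly what that code looks like, but a sketch of the approach just described might be:

    import math

    # Precompute log(n!) for n = 0, 1, ..., 256 by direct summation.
    _N = 256
    _log_fact = [0.0] * (_N + 1)
    for i in range(2, _N + 1):
        _log_fact[i] = _log_fact[i - 1] + math.log(i)

    def log_factorial(n):
        """Return log(n!): table lookup for n <= 256, truncated Stirling series otherwise."""
        if n < 0:
            raise ValueError("n must be non-negative")
        if n <= _N:
            return _log_fact[n]
        x = n + 1  # log(n!) = log Gamma(n + 1)
        return (x - 0.5) * math.log(x) - x + 0.5 * math.log(2 * math.pi) + 1 / (12 * x)

For n > 256 this agrees with math.lgamma(n + 1) to roughly the accuracy estimated above.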

You could include the 1/(360 x³) term or higher terms from the asymptotic series and use a smaller table. This would use less memory but would require more computation for arguments outside the range of the table.


John Cleese on creativity

Here’s a 10-minute talk by John Cleese on creativity:

From about 6:20 into the video:

If you’re racing around all day, ticking things off on lists, looking at your watch, making phone calls, and generally just keeping all the balls in the air, you are not going to have any creative ideas.


What distribution does my data have?

“Which distribution describes my data?” Variations on that question pop up regularly on various online forums. Sometimes the person asking the question is looking for a goodness of fit test but doesn’t know the jargon “goodness of fit.” But more often they have something else in mind. They’re thinking of some list of familiar, named distribution families — normal, gamma, Poisson, etc. — and want to know which distribution from this list best fits their data. So the real question is something like the following:

Which distribution from the well-known families of probability distributions fits my data best?

Statistics classes can give the impression that there is a short list of probability distribution families, say the list in the index of the textbook for that class, and that something from one of those families will always fit any data set. This impression starts to seem absurd when stated explicitly. It raises two questions.

  1. What exactly is the list of well-known distributions?
  2. Why should a distribution from this list fit your data?

As for the first question, there is some consensus as to what the well-known distributions are. The distribution families in this diagram would make a good start. But the question of which distributions are “well known” is a sociological question, not a mathematical one. There’s nothing intrinsic to a distribution that makes it well-known. For example, most statisticians would consider the Kumaraswamy distribution obscure and the beta distribution well-known, even though the two are analytically similar.

You could argue that the canonical set of distributions is somewhat natural by a chain of relations. The normal distribution is certainly natural due to the central limit theorem. The chi-squared distribution is natural because the square of a normal random variable has a chi-squared distribution. The F distribution is related to the ratio of chi-squared variables, so perhaps it ought to be included. And so on and so forth. But each link in the chain is a little weaker than the previous. Also, why this chain of relationships and not some other?

Alternatively, you could argue that the distributions that made the canon are there because they have been found useful in practice. And so they have.  But had people been interested in different problems, a somewhat different set of distributions would have been found useful.

Now on to the second question: Why should a famous distribution fit a particular data set?

Suppose a police artist asked a witness which U. S. president a criminal most closely resembled. The witness might respond

Well, she didn’t look much like any of them, but if I have to pick one, I’d pick John Adams.

The U. S. presidents form a convenient set of faces. You can find posters of their faces in many classrooms. The U. S. presidents are historically significant, but a police artist would do better to pick a different set of faces as a first pass in making a sketch.

I’m not saying it is unreasonable to want to fit a famous distribution to your data. Given two distributions that fit the data equally well, go with the more famous distribution. This is a sort of celebrity version of Occam’s razor. It’s convenient to use distributions that other people recognize. Famous distributions often have nice mathematical properties and widely available software implementations. But the list of famous distributions can form a Procrustean bed that we force our data to fit.

The extreme of Procrustean statistics is a list of well-known distributions with only one item: the normal distribution. Researchers often apply a normal distribution where it doesn’t fit at all. More dangerously, experienced statisticians can assume a normal distribution when the lack of fit isn’t obvious. If you implicitly assume a normal distribution, then any data point that doesn’t fit the distribution is an outlier. Throw out the outliers and the normal distribution fits well! Nassim Taleb calls the normal distribution the “Great Intellectual Fraud” in his book The Black Swan because people so often assume the distribution fits when it does not.

Timing software

When measuring how long it takes to execute a program, people often report the best time out of several runs with the same input. That seems odd at first. Why report the best time? Why not report the average time?

Operating systems have more to do than just run your program. The time that elapses between the start and finish of your program may vary because of other demands on the operating system’s resources. By taking the best time, you (presumably) get an idea how much time it takes to run the program itself with minimal overhead from other operating system activity.

The best-time approach is appropriate when comparing the efficiency of two programs: compare the best time for Program A to the best time for Program B. But if you want to estimate how long you can expect to wait for Program A itself to run, not comparing its efficiency to anything else, then averaging the run times is reasonable.
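For example, Python’s timeit module supports both measurements (a sketch; the sorted call is just a stand-in for the program being timed):

    import statistics
    import timeit

    # Time the same statement several times with identical input.
    times = timeit.repeat("sorted(range(10000, 0, -1))", repeat=5, number=100)

    print("best:   ", min(times))               # least interference from the OS
    print("average:", statistics.mean(times))   # closer to the wait you'd actually experience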

(Everything above assumes constant input. This post is not about analyzing execution times with varying inputs. For example, you might use best-case timing to compare two sorting algorithms by sorting the exact same list several times. The only thing changing from run to run is the state of the operating system, not the input. Of course you might also want to measure the average run time with multiple randomly generated data sets, but that’s a different matter.)

Emacs command to add HTML tags

A while back I asked Jason Fruit how to add HTML tags to text in Emacs, something like the format painter in Microsoft applications. He said he didn’t know of anything and wrote the following macro for me.

(defun tag-word-or-region (tag)
  "Surround current word or region with a given tag."
  (interactive "sEnter tag (without <>): ")
  (let (bds start-tag end-tag)
    (setq start-tag (concat "<" tag ">"))
    (setq end-tag (concat "</" tag ">"))
    (if (and transient-mark-mode mark-active)
        (progn
          (goto-char (region-end))
          (insert end-tag)
          (goto-char (region-beginning))
          (insert start-tag))
      (progn
        (setq bds (bounds-of-thing-at-point 'symbol))
        (goto-char (cdr bds))
        (insert end-tag)
        (goto-char (car bds))
        (insert start-tag)))))

I added the following line to my .emacs file to bind Jason’s macro to the key sequence C-x t.

    (global-set-key "\C-xt" 'tag-word-or-region)

To add a tag to a single word, place the cursor before or in the word and execute the command. To tag a block of text, select the text first then execute the command.


The Stone Age didn’t end because we ran out of stones

According to Richard Sears, the world hit peak oil in 1985 in the sense that oil accounted for 50% of world energy in 1985 and the percentage has been declining since then. By that same measure, we hit peak coal in the 1920s and peak wood in the 1820s. Sears summarizes:

For 200 years we have been systematically de-carbonizing our energy system.

We didn’t stop using wood as our primary energy source because we ran out of trees; we moved on to something better. Sears believes we are now in the process of transitioning from oil to renewable energy sources, and not because we’re running out of oil. He concludes his presentation (link rotted) by saying

… the Stone Age ended, not because we ran out of stones. It’s ideas, it’s innovation, it’s technology that will end the Age of Oil before we run out of oil.

A tale of two espresso machines

This post tells the story of two espresso machines: one in Los Angeles and one in Brenham, Texas. But it’s more about deciding what you do and do not want to control.

* * *

In his book Made by Hand, Mark Frauenfelder chronicles his quest to make great espresso at his home in Los Angeles. He reviews some of the tricks to make good espresso from a relatively inexpensive espresso machine. (The machine he describes, a Rancilio Silvia, was around $500 when Frauenfelder bought it. That’s a lot more than a Mr. Coffee, but it’s cheap compared to espresso machines that cost over $10,000.) The problem with inexpensive espresso machines is that they have poor temperature control: the water temperature can vary by as much as 40 °F. Frauenfelder hacked his espresso machine by replacing its controller with a more sophisticated proportional-integral-derivative (PID) controller.

(Small changes in brewing temperature can have a large impact on coffee taste. This is because coffee is chemically complex. Brewing at different temperatures extracts these chemicals in different proportions. See this Scientific American article for details. [Update: looks like SA took the article down.])

The context of Frauenfelder’s story is his book on doing things yourself, not to save money, but to have more control of your environment. He describes his adventures from growing vegetables to making musical instruments for the joy of doing so.

* * *

Roger Sessions tells a very different story about why he does not have his own espresso machine in The ObjectWatch Newsletter. He describes why he drives 10 miles every day to the Starbucks in Brenham. He says he saves money, even though he spends more on gasoline than the price of his doppio macchiato. For starters, Sessions estimates it would take nearly four years to pay off his hardware investment. Then he lists the things that could go wrong:

  • Something might break down.
  • Something could short out his electrical system.
  • The roaster could burn his house down.
  • He might not be able to use the equipment well.

The context of Sessions’ story is the benefits of software as a service. Even when it appears to be more economical to create and host your own software, you may save money by paying someone else to offer that software as a service. Sessions pays Starbucks to make his espresso for him because they not only make the capital investment in hardware, they’re also responsible for operation and maintenance. (Update: See Roger Sessions’ comment below.)

* * *

Sessions makes as strong a case for not owning an espresso machine as Frauenfelder makes for owning one. Frauenfelder speaks of the confidence and joy that come from having detailed knowledge and control of the things around you. Sessions speaks of the hassle and expense of being responsible for the things around you. Both have valid points. I’m sure there are areas in which Frauenfelder is happy for someone else to take responsibility, and areas in which Sessions enjoys fine-grained control. But they disagree about which approach is preferable when it comes to making espresso.

One of the things that prompted me to buy Mark Frauenfelder’s book was an interview I heard on a podcast. He said that he was attracted to do-it-yourself projects after becoming editor of Make Magazine; the spirit of the writers rubbed off. At one point in the interview he says DIY is not about saving money; in fact, the DIY approach will often cost more money. I appreciated this comment since many DIY enthusiasts justify their projects with dubious financial arguments rather than simply say that they enjoy what they’re doing and that the extra expense is worth it to them. I suspect Frauenfelder may agree with Sessions on the economics of espresso making, though they have different perspectives on the non-monetary benefits. And although Sessions argues only from monetary benefits, he also gets non-monetary benefits from visiting Starbucks. I imagine he enjoys getting out of his house, seeing familiar faces, and so on.

* * *

Posts quoting Mark Frauenfelder:

Endless preparation
Getting women to smoke

Post quoting Roger Sessions:

Three quotes on software development