Random walk on a clock

Stand on a large clock, say on the 1. Now flip a coin and move ahead one hour if the coin turns up heads, and back one hour otherwise. Keep repeating the process until you’ve stood on all 12 numbers. How long, on average, will this random walk take? If you generalize to clocks with p positions, how does the expected time vary with p?

Here’s a little Python code to experiment with the problem.

from random import random

p = 12  # number of positions on the clock

def time_to_cover_circle():
    # mark positions as visited; you start out standing on position 0
    circle = [0 for i in range(p)]
    circle[0] = 1

    count, pos = 0, 0
    while True:
        count += 1
        # move ahead or back one position with equal probability
        pos += 1 if random() > 0.5 else -1
        pos %= p
        circle[pos] = 1
        # stop once every position has been visited
        if min(circle) == 1:
            return count
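To estimate the average, you could run the simulation many times and take the mean. A quick sketch (the trial count of 10,000 is arbitrary):

trials = 10000
average = sum(time_to_cover_circle() for _ in range(trials)) / trials
print(average)  # for p = 12 this hovers around 66, i.e. p*(p-1)/2,
                # the standard expected cover time of a cycle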

Examples bring a subject to life

Steve Awodey said of category theory:

Material at this level of abstraction is simply incomprehensible without the applications and examples that bring it to life.

Michael Atiyah would say this is not unique to categories.

When someone tells me a general theorem I say that I want an example that is both simple and significant. It’s very easy to give simple examples that are not very interesting or interesting examples that are very difficult. If there isn’t a simple, interesting case, forget it. … Without knowing the examples, the whole thing is pointless. It is a mistake to focus on the techniques without constantly, not only at the beginning, knowing what the good examples are, where they came from, why they’re done. The abstractions and the examples have to go hand in hand.

More from Michael Atiyah here.

Equation in stained glass

In the southwest corner of the Church of the Holy Spirit in Heidelberg there is a stained glass window containing Einstein’s equation E = mc² near the bottom.

Most of the windows in the church are much lighter, both in color and in content. This window is not prominent; I only noticed it the second time I was in the church.

I think the text at the top of the window quotes 2 Peter 3:10 and Isaiah 54:10. The date is the day the atomic bomb was dropped on Hiroshima.

Oktoberfest

Last night was “Oktoberfest” night at the HLF. Part of the entertainment for the evening was traditional Bavarian dancing. Marlene Knoche, one of the German bloggers for the HLF, drew a couple of the dancers in the sketch below.

Bavarian dancers

I had a chance to try a local beer, Heidelberg Hefe Weizen. I thought “hefeweizen” was one word, but the label breaks it into two words. This seems ironic given the German proclivity for creating compound words.

When Fred Brooks wrote The Design of Design three years ago I interviewed him via email. Last night I had a chance to talk with him in person. He is warm and friendly, and also very sharp.

In Heidelberg

I’m in Heidelberg this week at the Heidelberg Laureate Forum and having a great time. I’m writing for the conference blog and may not blog much here. We’ll see. Maybe later in the week I’ll have more time.

The network connection in my hotel is down most of the time, so I’m writing this quickly while I have a window of connectivity.

I’ve posted a few photos on Google+.

Posts for the conference so far:

Who was Nevanlinna and what is his prize?
Saxophone quartets and probability

I interviewed Sir Michael Atiyah this morning, so that will be posted sometime.


Least understood bit of basic math

Logarithms may be the least understood topic in basic math. In my experience, if an otherwise math-savvy person is missing something elementary, it’s usually logarithms.

For example, I have had conversations with people with advanced technical degrees where I’ve had to explain that logs in all bases are proportional to each other. For example, if one thing is proportional to the natural log of another, the former is also proportional to the log base 10 or log base anything else of the latter [1].
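Here’s a quick numerical way to see that proportionality constant: the ratio of the natural log to the log base 10 is ln 10, no matter what the argument is.

from math import log

for x in [2.0, 42.0, 1e6]:
    # log(x) is the natural log; log(x, 10) is log base 10
    print(log(x) / log(x, 10))  # prints ln(10) ≈ 2.302585 every time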

I’ve also noticed that quite often there’s a question on the front page of math.stackexchange of the form “How do I solve …” and the solution is invariably “take logarithms of both sides.” This seems to be a secret technique.

I suspect that more people understood logarithms when they had to use slide rules. A slide rule is essentially two sticks with log-scale markings. By moving one relative to the other, you’re adding lengths, which means adding logs, which does multiplication. If you do that for a while, it seems you’d have to get a feel for logs.
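For example, multiplying 2 by 8 on a slide rule amounts to adding two lengths:

\log_{10} 2 + \log_{10} 8 = \log_{10} 16 \approx 0.301 + 0.903 = 1.204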

Log tables also make logs more tangible. At first it seems there’s no skill required to use a table, but you often have to exercise a little bit of understanding. Because of the limitations of space, tables can’t be big enough to let you directly look up everything. You have to learn how to handle orders of magnitude and how to interpolate.
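For instance, a four-figure table only lists mantissas for arguments between 1 and 10, so looking up the log of 2345 means factoring out the power of 10 and interpolating between the entries for 2.34 and 2.35. A sketch of the process, using math.log10 to stand in for the printed table values:

from math import log10

low, high = log10(2.34), log10(2.35)  # what the printed table would list

# linear interpolation for 2.345, plus the characteristic for the factor of 10**3
mantissa = low + (2.345 - 2.34) / (2.35 - 2.34) * (high - low)
print(3 + mantissa, log10(2345))  # the two agree to about four figures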

If the first time you see logs is when it’s time to learn to differentiate them, you have to learn two things at once. And that’s too much for many students. They make mistakes, such as assuming logs are linear functions, that they would not make if they had an intuitive feel for what they’re working with.

Maybe schools could have a retro math week each year where students can’t use calculators and have to use log tables and slide rules. I don’t think it would do as much good to just make tables or slide rules a topic in the curriculum. It’s when you have to use these things to accomplish something else, when they are not simply an isolated forgettable topic of their own, that the ideas sink in.

Related posts:

The most interesting logs in the world
Approximation relating logs base 2, e, and 10

[1] That is, \log_a(x) = \log_a(b) \log_b(x). This says \log_a(b) is the proportionality constant for converting between logs in base a and b. To prove the equation, raise a to the power of both sides: a^{\log_a(b)\log_b(x)} = \left(a^{\log_a(b)}\right)^{\log_b(x)} = b^{\log_b(x)} = x = a^{\log_a(x)}.

To memorize this equation, notice the up-and-down pattern of the bases and arguments: a up to x = a up to b down to b up to x. The right side squeezes an up and down b in between a and x.


Information theory misnamed

Interesting perspective on information theory:

To me, the subject of “information theory” is badly named. That discipline is devoted to finding ideal compression schemes for messages to be sent quickly and accurately across a noisy channel. It deliberately does not pay any attention to what the messages mean. To my mind this should be called compression theory or redundancy theory. Information is inherently meaningful—that is its purpose—any theory that is unconcerned with the meaning is not really studying information per se. The people who decide on speed limits for roads and highways may care about human health, but a study limited to deciding ideal speed limits should not be called “human health theory”.

Despite what was said above, information theory has been extremely important in a diverse array of fields, including computer science but also in neuroscience and physics. I’m not trying to denigrate the field; I am only frustrated with its name.

From David Spivak, footnotes 13 and 14 here.

To err is human, to catch an error shows expertise

Experts are OK at avoiding mistakes, but they’re even better at recognizing and fixing mistakes.

If you ask an elementary student something like “How can we know that the answer to this problem cannot be 8769?” they might only be able to say “Because the correct answer is 8760.” That is, the only tool they have for checking a result is to compare it to a known quantity. A more sophisticated student might be able to say something like “We know the result must be even” or “The result cannot be more than 8764” for reasons that come from context.

Experienced programmers write fewer bugs than rookies. But more importantly, the bugs they do write don’t live as long. The experienced programmer may realize almost immediately that a piece of code cannot be right, while the rookie may not realize there’s a problem until the output is obviously wrong. The more experienced programmer might notice that the vertical alignment of the code looks wrong, or that incompatible pieces are being used together, or even that the code “smells wrong” for reasons he or she can’t articulate.

An engineer might know that an answer is wrong because it has the wrong magnitude or the wrong dimensions. Or maybe a result violates a conservation law. Or maybe the engineer thinks “That’s not the way things are done, and I bet there’s a good reason why.”

“Be more careful” only goes so far. Experts are not that much more careful than novices. We can only lower our rate of mistakes so much. There’s much more potential for being able to recognize mistakes than to prevent them.

A major step in maturing as a programmer is accepting the fact that you’re going to make mistakes fairly often. Maybe you’ll introduce a bug for every 10 lines of code, or at least one for every 100 lines. (Rookies write more bugs than this but think they write fewer.) Once you accept this, you begin to ask how you can write code to make bugs stand out. Given two approaches, which is more likely to fail visibly if it’s wrong? How can I write code so that logic errors are more likely to show up as compile errors? How am I going to debug this when it breaks?
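As one small illustration, stronger types can turn a logic error into something a checker catches before the program runs. A sketch in Python, assuming you run a type checker such as mypy; the unit types here are made up for the example:

from typing import NewType

Meters = NewType("Meters", float)
Seconds = NewType("Seconds", float)

def speed(distance: Meters, time: Seconds) -> float:
    return distance / time

speed(Meters(100.0), Seconds(9.58))   # fine
# speed(Seconds(9.58), Meters(100.0))  # runs, but a type checker flags it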

Theory pays off in the long run. Abstractions that students dismiss as impractical probably are impractical at first. But in the long run, these abstractions may prevent or catch errors. I’ve come to see the practicality in many things that I used to dismiss as pedantic: dimensional analysis, tensor properties, strongly typed programming languages, category theory, etc. A few weeks into a chemistry class I learned the value of dimensional analysis. It has taken me much longer to appreciate category theory.

Related posts:

Experienced programmers and lines of code
The Norris number
Important because it’s unimportant

The difference between machines and tools

From “The Inheritance of Tools” by Scott Russell Sanders:

I had botched a great many pieces of wood before I mastered the right angle with a saw, botched even more before I learned to miter a joint. The knowledge of these things resides in my hands and eyes and the webwork of muscles, not in the tools. There are machines for sale—powered miter boxes and radial arm saws, for instance—that will enable any casual soul to cut proper angles in boards. The skill is invested in the gadget instead of the person who uses it, and this is what distinguishes a machine from a tool.

Related post: Software exoskeletons

Ramanujan’s nested radical

Ramanujan posed the following puzzle to the Journal of the Indian Mathematical Society. What is the value of the following?

\sqrt{1+2\sqrt{1+3 \sqrt{1+4\sqrt{1 + \cdots}}}}

He waited over six months, and when nobody replied he gave his solution. The result is simply 3.

Now suppose you didn’t know an algebraic proof and wanted to evaluate the expression numerically. The code below will evaluate Ramanujan’s expression down to n levels.

from math import sqrt

def f(n):
    # evaluate Ramanujan’s radical truncated at n levels, from the inside out
    t = 1.0
    for k in range(n, 1, -1):
        t = sqrt(k*t + 1)
    return t

(The range function above produces the sequence n, n-1, … 2.)

The sequence of f(n) values converges rapidly to 3 as the plot below shows.
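If you’d rather check this numerically than read it off a plot, print a few values; they climb toward 3 quickly:

for n in [2, 4, 8, 16]:
    print(n, f(n))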

The code above evaluates the expression from the inside out, and so you need to know where you’re going to stop before you start. I don’t see how to write a different algorithm that would let you compute f(n+1) taking advantage of having computed f(n). Do you? Can you think of a way to evaluate f(1), f(2), f(3), … f(N) in less than O(N²) time? Will your algorithm still work if some or all of the numbers in Ramanujan’s expression change?

More Ramanujan posts:

Ramanujan approximation for circumference of an ellipse
Ramanujan’s most beautiful identity
Open question turned into exercise

Visualizing category concept dependencies

Category theory has a high ratio of definitions to theorems, and it can seem like every definition depends on an infinite regress of definitions. But if you graph the definition dependencies, the graph is wide rather than deep.

I picked out 65 concepts and created a visualization using GraphViz. No definition is more than six degrees away from first principles, and most are closer. The graph is pretty big, so here’s a thumbnail.

The full graph is available here.

It’s possible to define concepts in different ways and obtain a different graph. I used definitions from here.
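If you’d like to experiment with a graph like this, GraphViz reads a plain text description of the edges. Here’s a hypothetical fragment in Python (the nodes and dependencies below are just for illustration, not taken from my graph):

# write a DOT file, then render it with: dot -Tpng deps.dot -o deps.png
edges = [
    ("category", "functor"),
    ("functor", "natural transformation"),
    ("category", "monomorphism"),
]
with open("deps.dot", "w") as f:
    f.write("digraph deps {\n")
    for parent, child in edges:
        f.write('  "%s" -> "%s";\n' % (parent, child))
    f.write("}\n")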

Here is a list of other math diagrams I’ve created.

Heidelberg Laureate Forum

I will be going to Germany in a couple weeks to be a blogger for the Heidelberg Laureate Forum. This conference will bring young researchers in math and computer science together with some of the top people in their fields, winners of the Abel Prize, Turing Award, Fields Medal, and Nevanlinna Prize. I’m looking forward to visiting Heidelberg and being part of an incredible conference.

Heidelberg castle photo via Wikipedia

How hospitals make decisions

From operations research professor Michael Carter on The Science of Better:

If I went into a company, a private company, and said “I can save you a million dollars a day,” the CEO would implement it tomorrow. If I went into a hospital and said “I can save you a million dollars a day,” they’d have to think about it, because the CEO would have to convince people this was a good idea. It would have to be good for the doctors, good for the nurses, good for the bean counters, good for the patients, good for patient quality, good for the media. … All problems become multi-criteria problems. The good news is that things are so bad in health care we can always do that.

That matches my experience from when I used to work for a hospital.

Carter’s last line may not be clear in the excerpt above. In context, he’s saying that health care is so inefficient that an operations research consultant can always find ways to improve things, even though decisions have to satisfy numerous constituencies.

Nicholas Carr said something similar in his book Does IT Matter? Carr argues for most of the book that although information technology is important, it has become a given, like electricity. Then near the end of his book, he says that because health care IT is so far behind, his remarks don’t apply there.