Random walks and the arcsine law

Suppose you stand at 0 and flip a fair coin. If the coin comes up heads, you take a step to the right. Otherwise you take a step to the left. How much of the time will you spend to the right of where you started?

As the number of steps N goes to infinity, the probability that the proportion of your time in positive territory is less than x approaches 2 arcsin(√x)/π. The arcsine term gives this rule its name, the arcsine law.

Here’s a little Python script to illustrate the arcsine law.

import random
from numpy import arcsin, pi, sqrt

def step():
    u = random.random()
    return 1 if u < 0.5 else -1

M = 1000 # number of random walks
N = 1000 # steps in each walk

x = 0.3 # Use any 0 < x < 1 you'd like.

outer_count = 0
for _ in range(M):
    position = 0
    inner_count = 0
    for __ in range(N):
        position += step()
        if position > 0:
            inner_count += 1
    if inner_count/N < x:
        outer_count += 1

print(outer_count/M)          # estimated probability from simulation
print(2*arcsin(sqrt(x))/pi)   # probability predicted by the arcsine law

Playing with continued fractions and Khinchin’s constant

Take a real number x and expand it as a continued fraction. Compute the geometric mean of the first n coefficients.

Aleksandr Khinchin proved that for almost all real numbers x, as n → ∞ the geometric means converge. Not only that, they converge to the same constant, known as Khinchin’s constant, 2.685452001…. (“Almost all” here means in the sense of measure theory: the set of real numbers that are exceptions to Khinchin’s theorem has measure zero.)

To get an idea how fast this convergence is, let’s start by looking at the continued fraction expansion of π. In Sage, we can type

continued_fraction(RealField(100)(pi))

to get the continued fraction coefficients

[3, 7, 15, 1, 292, 1, 1, 1, 2, 1, 3, 1, 14, 2, 1, 1, 2, 2, 2, 2, 1, 84, 2, 1, 1, 15, 3]

for π computed to 100 bits of precision. The geometric mean of these coefficients is 2.84777288486, which only matches Khinchin’s constant to 1 significant figure.

Let’s try choosing random numbers and working with more decimal places.

There may be a more direct way to find geometric means in Sage, but here’s a function I wrote. It leaves out zero terms, such as the leading zero in the expansion of a number between 0 and 1, which would otherwise make the geometric mean zero.

from numpy import exp, mean, log
def geometric_mean(x):
    return exp( mean([log(k) for k in x if k > 0]) )

Now let’s find 10 random numbers, each computed to 1,000 bits of precision.

for _ in range(10):
    r = RealField(1000).random_element(0,1)
    print(geometric_mean(continued_fraction(r)))

This produced

2.66169890535
2.62280675227
2.61146463641
2.58515620064
2.58396664032
2.78152297661
2.55950338205
2.86878898900
2.70852612496
2.52689450535

Three of these agree with Khinchin’s constant to two significant figures but the rest agree only to one. Apparently the convergence is very slow.

If we go back to π, this time using 10,000 bits of precision, we get a little closer:

print(geometric_mean(continued_fraction(RealField(10000)(pi))))

produces 2.67104567579, which differs from Khinchin’s constant by about 0.5%.

Grand unification of mathematics

Greg Egan’s short story Glory features a “xenomathematician” who discovers that an ancient civilization had produced a sort of grand unification of their various branches of mathematics.

It was not a matter of everything in mathematics collapsing in on itself, with one branch turning out to have been merely a recapitulation of another under a different guise. Rather, the principle was that every sufficiently beautiful mathematical system was rich enough to mirror in part — and sometimes in a complex and distorted fashion — every other sufficiently beautiful system. Nothing became sterile and redundant, nothing proved to have been a waste of time, but everything was shown to be magnificently intertwined.

Natural optima occur in the middle

Akin’s eighth law of spacecraft design says

In nature, the optimum is almost always in the middle somewhere. Distrust assertions that the optimum is at an extreme point.

When I first read this I immediately thought of several examples where theory said that an optimum was at an extreme, but experience said otherwise.

Linear programming (LP) says the opposite of Akin’s law. The optimal point for a linear objective function subject to linear constraints is always at an extreme point. The constraints form a many-sided shape (you could think of it as something like a diamond) and the optimal point will always be at one of the corners.
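
As a quick illustration, here’s a sketch using SciPy’s linprog with arbitrary numbers; the point is simply that the reported optimum is a vertex of the feasible region.

from scipy.optimize import linprog

# Maximize x + 2y subject to x + y <= 4, 0 <= x <= 3, 0 <= y <= 3.
# linprog minimizes, so we negate the objective.
c = [-1, -2]
A_ub = [[1, 1]]
b_ub = [4]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, 3), (0, 3)])
print(res.x)  # [1. 3.], a corner of the feasible region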

Nature is not linear, though it is often approximately linear over some useful range. One way to read Akin’s law is to say that even when something is approximately linear in the middle, there’s enough non-linearity at the extremes to pull the optimal points back from the edges. Or said another way, when you have an optimal value at an extreme point, there may be some unrealistic aspect of your model that pushed the optimal point out to the boundary.

Related post: Data calls the model’s bluff

Another reason natural logarithms are natural

In mathematics, log means natural logarithm by default; the burden of explanation is on anyone taking logarithms to a different base. I elaborate on this a little here.

Looking through Andrew Gelman and Jennifer Hill’s regression book, I noticed a justification for natural logarithms I hadn’t thought about before.

We prefer natural logs (that is, logarithms base e) because, as described above, coefficients on the natural-log scale are directly interpretable as approximate proportional differences: with a coefficient of 0.06, a difference of 1 in x corresponds to an approximate 6% difference in y, and so forth.

This is because

exp(x) ≈ 1 + x

for small values of x based on a Taylor series expansion. So in Gelman and Hill’s example, a difference of 0.06 on a natural log scale corresponds to roughly multiplying by 1.06 on the original scale, i.e. a 6% increase.

The Taylor series expansion for exponents of 10 is not so tidy:

10^x ≈ 1 + 2.302585 x

where 2.302585 is the numerical value of the natural log of 10. This means that a change of 0.01 on a log10 scale corresponds to an increase of about 2.3% on the original scale.
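
Here’s a quick numerical check of both approximations:

import math

print(math.exp(0.06))  # 1.0618..., roughly a 6% increase
print(10**0.01)        # 1.0233..., roughly a 2.3% increase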

Related post: Approximation relating lg, ln, and log10

Miscellaneous math resources

Every Wednesday I’ve been pointing out various resources on my web site. So far they’ve all been web pages, but the following are all PDF files.

Probability and statistics:

Other math:

See also journal articles and technical reports.

Last week: Probability approximations

Next week: Code Project articles

Disappearing data projections

Suppose you have data in an N-dimensional space where N is large and consider the cube [-1, 1]^N. The coordinate basis vectors start in the center of the cube and poke out through the middle of the faces. The diagonals of the cube run from the center to one of the corners.

If your points cluster along one of the coordinate axes, then projecting them to that axis will show the full width of the data. But if your points cluster along one of the diagonal directions, the projection along every coordinate axis will be a tiny smudge near the origin. There are a lot more diagonal directions than coordinate directions, 2^N versus N, and so there are a lot of orientations of your points that could be missed by every coordinate projection.

Here’s the math behind the loose statements above. The diagonal directions are of the form (±1, ±1, …, ±1). A unit vector in one of these directions has the form (1/√N)(±1, ±1, …, ±1), and so its inner product with any of the coordinate basis vectors is ±1/√N, which goes to zero as N gets large. Said another way, taking a set of points along a diagonal and projecting it to a coordinate axis divides its width by √N.
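
Here’s a small NumPy sketch of the effect, with the dimension and number of points chosen arbitrarily: points spread along a diagonal keep their full width when projected onto that diagonal, but shrink by a factor of √N when projected onto a coordinate axis.

import numpy as np

N = 10_000                                 # dimension
rng = np.random.default_rng(0)
diag = np.ones(N) / np.sqrt(N)             # unit vector along one diagonal
t = rng.uniform(-1, 1, size=500)           # positions along the diagonal
points = np.outer(t, diag)                 # 500 points clustered on the diagonal

proj_diag = points @ diag                  # projection onto the diagonal direction
proj_axis = points[:, 0]                   # projection onto a coordinate axis
print(proj_diag.max() - proj_diag.min())   # roughly 2
print(proj_axis.max() - proj_axis.min())   # roughly 2/sqrt(N) = 0.02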

Confidence

Zig Ziglar said that if you increase your confidence, you increase your competence. I think that’s generally true. Of course you could be an idiot and become a more confident idiot. In that case confidence just makes things worse [1]. But otherwise when you have more confidence, you explore more options, and in effect become more competent.

There are some things you may need to learn not for the content itself but for the confidence boost. Maybe you need to learn them so you can confidently say you didn’t need to. Also, some things you need to learn before you can see uses for them. (More on that theme here.)

I’ve learned several things backward in the sense of learning the advanced material before the elementary. For example, I studied PDEs in graduate school before having mastered the typical undergraduate differential equation curriculum. That nagged at me. I kept thinking I might find some use for the undergrad tricks. When I had a chance to teach the undergrad course a couple times, I increased my confidence. I also convinced myself that I didn’t need that material after all.

My experience with statistics was similar. I was writing research articles in statistics before I learned some of the introductory material. Once again the opportunity to teach the introductory material increased my confidence. The material wasn’t particularly useful, but the experience of having taught it was.

Related post: Psychological encapsulation


[1] See Yeats’ poem The Second Coming:

The best lack all conviction, while the worst
Are full of passionate intensity.

 

Probability approximations

This week’s resource post lists notes on probability approximations.

Do we even need probability approximations anymore? They’re not as necessary for numerical computation as they once were, but they remain vital for understanding the behavior of probability distributions and for theoretical calculations.

Textbooks often leave out details such as quantifying the error when discussing approximations. The following pages are notes I wrote to fill in some of these details when I was teaching.

See also blog posts tagged Probability and statistics and the Twitter account ProbFact.

Last week: Numerical computing resources

Next week: Miscellaneous math notes

Striving for simplicity, arriving at complexity

This post is a riff on a line from Mathematics without Apologies, the book I quoted yesterday.

In an all too familiar trade-off, the result of striving for ultimate simplicity is intolerable complexity; to eliminate too-long proofs we find ourselves “hopelessly lost” among the too-long definitions. [emphasis added]

It’s as if there’s some sort of conservation of complexity, but not quite in the sense of a physical conservation law. Conservation of momentum, for example, means that if one part of a system loses 5 units of momentum, other parts of the system have to absorb exactly 5 units of momentum. But perceived complexity is psychological, not physical, and the accounting is not the same. By moving complexity around we might increase or decrease the overall complexity.

The opening quote suggests that complexity is an optimization problem, not an accounting problem. It also suggests that driving the complexity of one part of a system to its minimum may disproportionately increase the complexity of another part. Striving for the simplest possible proofs, for example, could make the definitions much harder to digest. There’s a similar dynamic in programming languages and programs.

Larry Wall said that Scheme is a beautiful programming language, but every Scheme program is ugly. Perl, on the other hand, is ugly, but it lets you write beautiful programs. Scheme can be simple because it requires libraries and applications to implement functionality that is part of more complex languages. I had similar thoughts about COM. It was an elegant object system that led to hideous programs.

Scheme is a minimalist programming language, and COM is a minimalist object framework. By and large the software development community prefers complex languages and frameworks in hopes of writing smaller programs. Additional complexity in languages and frameworks isn’t felt as strongly as additional complexity in application code. (Until something breaks. Then you might have to explore parts of the language or framework that you had blissfully ignored before.)

The opening quote deals specifically with the complexity of theorems and proofs. In context, the author was saying that the price of Grothendieck’s elegant proofs was a daunting edifice of definitions. (More on that here.) Grothendieck may have taken this to extremes, but many mathematicians agree with the general approach of pushing complexity out of theorems and into definitions. Michael Spivak defends this approach in the preface to his book Calculus on Manifolds.

… the proof of [Stokes’] theorem is, in the mathematician’s sense, an utter triviality — a straight-forward calculation. On the other hand, even the statement of this triviality cannot be understood without a horde of definitions … There are good reasons why the theorems should all be easy and the definitions hard. As the evolution of Stokes’ theorem revealed, a single simple principle can masquerade as several difficult results; the proofs of many theorems involve merely stripping away the disguise. The definitions, on the other hand, serve a twofold purpose: they are rigorous replacements for vague notions, and machinery for elegant proofs. [emphasis added]

Mathematicians like to push complexity into definitions just as software developers like to push complexity into languages and frameworks. Both strategies can make life easier on professionals while making it harder on beginners.

Related post: A little simplicity goes a long way

Problem solvers and theory builders

From Mathematics without Apologies:

It’s conventional to classify mathematicians as “problem solvers” or “theory builders,” depending on temperament. My experiences and the sources I consulted in writing this book convince me that curiosity about problems guides the growth of theories, rather than the other way around. Alexander Grothendieck and Robert Langlands … count among the most ambitious of all builders of mathematical theories, but everything they built was addressed to specific problems with ancient roots.

Related post: Examples bring a subject to life

Special function resources

This week’s resource post: some pages of notes on special functions:

See also blog posts tagged special functions.

I tweet about special functions occasionally on AnalysisFact.

Last week: HTML, LaTeX, and Unicode

Next week: Python resources

Counting primitive bit strings

A string of bits is called primitive if it is not the repetition of several copies of a smaller string of bits. For example, the string 101101 is not primitive because it can be broken down into two copies of the string 101. In Python notation, you could produce 101101 by "101"*2. The string 11001101, on the other hand, is primitive. (It contains substrings that are not primitive, but the string as a whole cannot be factored into multiple copies of a single string.)

For a given n, let’s count how many primitive bit strings there are of length n. Call this f(n). There are 2^n bit strings of length n, and f(n) of these are primitive. For example, there are f(12) primitive bit strings of length 12. The strings that are not primitive are made of copies of primitive strings: two copies of a primitive string of length 6, three copies of a primitive string of length 4, etc. This says

 2^{12} = f(12) + f(6) + f(4) + f(3) + f(2) + f(1)

and in general

2^n = \sum_{d \mid n} f(d)

Here the sum is over all positive integers d that divide n.

Unfortunately this formula is backward. It gives us a formula for something well known, 2^n, as a sum of things we’re trying to calculate. The Möbius inversion formula is just what we need to turn this formula around so that the new thing is on the left and sums of old things are on the right. It tells us that

f(n) = \sum_{d \mid n} \mu\left(\frac{n}{d}\right) 2^d

where μ is the Möbius function.

We could compute f(n) with Python as follows:

from sympy.ntheory import mobius, divisors

def num_primitive(n):
    # Sum over the divisors d of n; use integer division so mobius gets an integer argument.
    return sum( mobius(n//d)*2**d for d in divisors(n) )
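
As a quick sanity check, summing f(d) over the divisors of n should recover 2^n:

for n in [6, 12, 100]:   # arbitrary test values
    assert sum(num_primitive(d) for d in divisors(n)) == 2**n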

The latest version of SymPy, version 0.7.6, comes with a function mobius for computing the Möbius function. If you’re using an earlier version of SymPy, you can roll your own mobius function:

from sympy.ntheory import factorint

def mobius(n):
    exponents = factorint(n).values()
    lenexp = len(exponents)
    m = 0 if lenexp == 0 else max(exponents)
    return 0 if m > 1 else (-1)**lenexp

The version of mobius that comes with SymPy 0.7.6 may be more efficient. It could, for example, stop the factorization process early if it discovers a square factor.

After a coin comes up heads 10 times

Suppose you’ve seen a coin come up heads 10 times in a row. What do you believe is likely to happen next? Three common responses:

  1. Heads
  2. Tails
  3. Equal probability of heads or tails.

Each is reasonable in its own context. The last answer is correct assuming the flips are independent and heads and tails are equally likely.

But as I argued here, if you see nothing but heads, you have reason to question the assumption that the coin is fair. So there’s some justification for the first answer.

The reasoning behind the second answer is that tails are “due.” This isn’t true if you’re looking at independent flips of a fair coin, but it could be reasonable in other settings, such as sampling without replacement.

Say there are a number of coins on a table, covered by a cloth. A fixed number are on the table heads up, and a fixed number tails up. You reach under the cloth and slide a coin out. Every head you pull out increases the chances that the next coin will be tails. If there were an equal number of heads and tails under the cloth to begin with, then after pulling out 10 heads, tails are indeed more likely next time.
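
Here’s the arithmetic with some made-up numbers: say the cloth covers 20 coins heads up and 20 tails up, and you’ve pulled out 10 heads.

heads_left, tails_left = 20 - 10, 20             # 20 of each to begin with, 10 heads removed
print(tails_left / (heads_left + tails_left))    # 2/3, so tails is now more likely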

Related post: Long runs