From the category archives:

Math

Three rules of thumb

by John on July 1, 2009

Here are three rules of thumb for back-of-the-envelope estimates:

  1. Duff’s rule: Pi seconds is a nanocentury.
  2. Hopper’s rule: Light travels one foot in a nanosecond.
  3. Rule of 72: An investment at n% interest will double in 72/n years.

How might you use these? How accurate are they?

Duff’s rule comes in handy when converting from times measured in seconds to times measured on calendars. This may not sound useful, but it often happens in software. For example, if a task takes a second to complete, how long would it take to do it a billion times? Well, a billion seconds, obviously. But how long is that in familiar terms? Duff’s rule says a century is about 3.14 billion seconds, so a billion seconds would be something like 30 years.

How accurate is Duff’s rule? A year is 31,536,000 seconds, whereas Duff’s rule would estimate 31,415,927 seconds, so it underestimates the number of seconds in a year by about 0.4%.

Hopper’s rule is useful in electrical engineering. For example, you might need to know how long it would take a radio signal to travel between a transmitter and receiver. Hopper’s rule can explain why computer chip clock rates are not increasing. Electrical signals travel at some fraction of the speed of light, and current chip designs are limited by whether a signal can move across the chip during a clock cycle.

How accurate is Hopper’s rule? Light travels 299,792,458 meters per second. That corresponds to 0.983 feet per nanosecond, so Hopper’s rule overestimates by about 1.7%.

Here is a terrific video of Grace Hopper explaining Hopper’s rule to David Letterman, around 4:25. (Thanks Bill!)

The Rule of 72 is obviously useful in financial estimates. For example, $1000 invested at 6% interest will become $2000 in 72/6 = 12 years.

How accurate is the rule of 72? The value of an initial investment P at time t with under a continuous interest rate r is P exp(rt). Solving exp(rt) = 2 for t gives t = log 2 / r. If we express r as a percentage, we have to multiply t by 100. This says that for continuously compounded interest, the rule of 72 would be exact if “72″ were replaced with 100 log 2 = 69.3. So for continuous interest, the rule overestimates the doubling time by 0.72/log 2 or about 4%. So why use 72 rather than 69.3? There are two reasons. First, 72 is easy to work with mentally since it is divisible by lots of small integers. Second, interest is often compounded periodically — say annually or monthly — rather than continuously.

The doubling time is longer for investments with periodic interest rather than continuous interest. The overestimate from using 72 rather than 69.3 is partially canceled out by the accounting for the longer doubling time for periodic compounding and so 72 may work better than 69.3. Exactly how accurate the rule of 72 is for periodically compounded interest depends on the interest rate and the compounding period.

Related post:

Bancroft’s rule (rule of thumb for estimating linear regression)

{ 4 comments }

If you run into someone on the street, the probability that the other person shares your birthday is 1/365. If you run into five people, the probability that at least one of them shares your birthday is 5/365, right?

The answer 5/365 is quite accurate, but not exactly correct. It’s good enough for this problem since there are other practical difficulties besides the quality of the approximation. For example, the problem implicitly assumes there are 365 days in a year, i.e. that no one is ever born on Leap Day.

Now think about a similar problem. Suppose the chance of rain is 40% each day for the next three days. Does that mean there is a 120% chance that it will rain at least one of the next three days? That can’t be. In fact, the chance of some rain over the next three days is 78.4%.

The following rule appears to be correct for birthdays but not for predicting rain.

If the probability of a success in one attempt is p, then the probability of at least one success in n attempts is np.

Why does this rule hold sometimes and not at other times? If the probability of success on each attempt is p, the probability of failure on each attempt is (1-p). The probability of n failures in a row is (1-p)n and so the probability of at least one success is 1 - (1-p)n. That’s the right way to approach the birthday example and the rain example. In the birthday example, p = 1/365 and so the probability of running into at least one person in five who shares your birthday is 1 - (364/365)5 = 0.013624. And 5/365 = 0.013699 is a very good approximation. In the rain example, p = 0.4 and the probability of at least one day of rain out of the next three is 1 - 0.63 = 0.784. The difference between the birthday example and the rain example is the size of p. The following equation, based on the binomial theorem, explains why.

1 - (1-p)^n = np - {n \choose 2}p^2 + {n \choose 3} p^3 - {n \choose 4}p^4 + \cdots

When we use np as our approximation, we’re ignoring the terms involving p2 and higher powers of p. When p is small, higher powers of p are very small and can be ignored. That’s why the approximation worked well for p = 1/365. But when p is large, say p = 0.4, the error is large; the terms involving higher powers of p are important in that case. Notice also that the size of n matters. The birthday example breaks down when n is large. If you run into 400 people, it is likely that one of them will share your birthday, but far from certain. The probability in that case is about 2/3,  not 400/365.

When p and n are both small, the probability of at least one success out of n tries is approximately np. We can say more. Because the first term the approximation drops from the equation above has a negative sign, our approximation is also an upper bound. This says np slightly over-estimates the probability.

Now how small do p and n have to be? If you calculate the approximation np and get a small answer, then it’s a good answer. Why? The error in the np approximation is roughly n(n-1)p2/2, which is less than (np)2. And if np is small, (np)2 is very small.

See Sales tax included for a similar discussion. That also post looks at a common mistake and explains why it makes a good approximation under some circumstances and not under others.

{ 3 comments }

Optical illusion, mathematical illusion

by John on June 24, 2009

Someone sent me a link to an optical illusion while I was working on a math problem. The two things turned out to be related.

In the image below, what look like blues spiral and green spirals are actually exactly the same color. The spiral that looks blue is actually green inside, but the magenta stripes across it make the green look blue. I know you don’t believe me; I didn’t believe it either. See this blog post for an explanation, including a magnified close-up of the image. Or open it in an image editor and use a color selector to see for yourself.

My math problem was also a case of two things that look different even though they are not. Maybe you can think back to a time as a student when you knew your answer was correct even though it didn’t match the answer in the back of the book. The two answers were equivalent but written differently. In an algebra class you might answer 5 / √ 3 when the book has 5 √ 3 / 3. In trig class you might answer 1 - cos2x when the book has sin2x. In a differential equations class, equivalent answers may look very different since arbitrary constants can obfuscate differences.

In my particular problem, I was looking at weights for Gauss-Hermite integration. I was trying to reconcile two different expressions for the weights, one in some software I’d written years ago and one given in A&S. I thought I’d found a bug, at least in my comments if not in my software. My confusion was analogous to not recognizing a trig identity.  I wish I could say that the optical illusion link made me think that the two expressions may be the same and they just look different because of a mathematical illusion. That would make a good story. Instead, I discovered the equivalence of the two expressions by brute force, having Mathematica print out the values so I could compare them. Only later did I see the analogy between my problem and the optical illusion.

In case you’re interested in the details, my problem boiled down to the equivalence between Hn+1(xi)2 and 4n2Hn-1(xi)2 where Hn(x) is the nth Hermite polynomial and xi is the ith root of Hn. Here’s why these are the same. The Hermite polynomials satisfy a recurrence relation Hn+1(x) = 2x Hn(x) - 2n Hn-1(x) for all x. Since Hn(xi) = 0, Hn+1(xi) = -2nHn-1(xi). Now square both sides.

Related post: Orthogonal polynomials

{ 6 comments }

Three books on inequalities

by John on June 3, 2009

The classic book on inequalities is Inequalities by Hardy, Littlewood, and Pólya, first published in 1934. I’ve mentioned it a few times before. The book is quite thorough. I imagine more people use it as a reference than read it cover to cover.

Inequalities by Hardy, Littlewood, and Polya

The Cauchy-Schwarz Master Class by Michael Steele is a more recent book on inequalities, published in 2004. The preface explains what the title means by a “master class.”

In the fine arts, a master class is a small class where students and coaches work together to support a high level of technical and creative excellence. This book tries to capture the spirit of a master class while providing coaching for readers who want to refine their skills as solvers of problems, especially those problems dealing with mathematical inequalities.

The Cauchy-Schwarz Master Class book lives up to its stated aim. Reading the book and working through the exercises reminds me of taking music lessons. The emphasis is on learning techniques rather than cataloging specific results.

The final book I’ll mention is When Less is More: Visualizing Basic Inequalities by Claudi Alsina and Roger B. Nelsen. Like The Cauchy-Schwarz Master Class, this book emphasizes problem solving technique. More specifically, it emphasizes geometric techniques for understanding and proving inequalities. (I’ve written a review of When Less is More, but unfortunately you have to log in as an MAA member to read it.) When Less is More is as concerned with beauty as with truth.

You can tell from the cover art and the architectural allusion in the title that this book cares about aesthetics. Each proof is a polished gem.

Related posts:

Old math books
Means and inequalities
Random inequalities

{ 0 comments }

The silver ratio

by John on May 20, 2009

Most people have heard of the golden ratio, but have you ever heard of the silver ratio? I only heard of it this week. The golden ratio can be expressed by a continued fraction in which all coefficients equal 1.

The silver ratio is the analogous continued fraction with all coefficients equal to 2.

You might think for a moment that the silver ratio should be just twice the golden ratio, but the coefficients contribute to the series in a non-linear way. The silver ratio actually equals 1 + √2. The golden ratio has a simple geometric interpretation. I don’t know of a geometric interpretation of the silver ratio. (Update: See Maxwell’s Demon for geometric applications of the silver ratio.)

A previous post mentioned that the golden ratio and related numbers are the worst case for Hurwitz’s theorem. The silver ratio and its cousins are the second worst case for the theorem.

Related posts:

Breastfeeding, the golden ratio, and rational approximation
Golden ratio and special angles
Connecting Fibonacci and geometric sequences
Fibonacci numbers and numerical integration

{ 4 comments }

Gil Kalai’s blog featured a guest post the other day about breastfeeding twins. The post commented in passing that

φ, the golden ratio, is the number hardest to approximate by rationals.

What could this possibly have to do with breastfeeding? The post described a pair of twins with hunger cycles sin(t) and sin(φt), functions that hardly ever come close to synchronizing. The constant φ is difficult to approximate by rational numbers, in a sense that I describe below, and this explains why the two functions are so often out of sync.

graphs of sin(t) and sin(φ t)

In one sense, φ is very easy to approximate by rationals. The ratio of any two consecutive Fibonacci numbers is a rational approximation to φ, the approximation improving as you go further out the Fibonacci sequence. The same is true for any generalized Fibonacci sequence, though the approximation may not be very good until you go out a ways in the sequence, depending on your starting values. So φ is easy to approximate in the sense that it is convenient to think of approximations.

Now let’s look at the sense in which φ is hard to approximate. How accurately can you approximate an irrational number ξ by a rational number a/b? Obviously you can do a better and better job by picking larger and larger fractions. For example, you could always take the first n digits of the decimal expansion of ξ and use that to make a rational approximation with denominator 10n. But that might be very inefficient. How well can you approximate ξ relative to the size of the denominator? This question is answered in general by a theorem of Hurwitz.

Given any irrational number ξ, there exist infinitely many different rational numbers a/b such that

\left| \xi - \frac{a}{b} \right| < \frac{1}{\sqrt{5}\, b^2}

The decimal expansion idea mentioned above is wasteful because the error goes down in proportion to the denominator, while Hurwitz theorem says it’s possible for the error to go down in proportion to the square of the denominator.

Can we do any better than Hurwitz theorem? For example, is it possible to replace √5 with some larger constant? Not in general, and the golden ratio φ provides the counterexample to any would-be refinements of Hurwitz theorem. The constant φ is a worst case. That is the sense in which φ is the number hardest to approximate by rationals. If φ and some related numbers are excluded, then the constant √5 can be increased.

Going back to breastfeeding, suppose the twins had hunger cycles sin(t) and sin(ξt). If ξ is irrational, the two curves will never exactly have the same period. But if ξ were equal to a rational number a/b, then the two functions would have a common period of length bπ if a is even or of length 2bπ if a is odd. If ξ were (approximately) equal to a/b with b small, the feedings would often synchronize. Since φ requires large values of b to make a good rational approximation, ξ = φ is a worst case.

Related posts:

Golden ratio and special angles
Connecting Fibonacci and geometric sequences
Fibonacci numbers at work

{ 1 comment }

Golden ratio and special angles

by John on May 16, 2009

The golden ratio comes up in unexpected places. This post will show how the sines and cosines of some special angles can be expressed in terms of the golden ratio and its complement.

[click to continue...]

{ 2 comments }

Here’s a quick demonstration of a connection between the Fibonacci sequence and geometric sequences.

The famous Fibonacci sequence starts out 1, 1, 2, 3, 5, 8, 13 … The first two terms are both 1, then each subsequent terms is the sum of the two preceding terms.

A generalized Fibonacci sequence can start with any two numbers and then apply the rule that subsequent terms are defined as the sum of their two predecessors. For example, if we start with 3 and 4, we get the sequence 3, 4, 7, 11, 18, 29, …

Let φ be the golden ratio, the positive solution to the equation 1 + x = x2. Let φ’ be the conjugate golden ratio, the negative solution to the same quadratic equation. Then φ = (1 + √ 5)/2 and φ’ = (1 - √ 5)/2.

Now let’s look at a generalized Fibonacci sequence starting with 1 and φ. Then our terms are 1, φ, 1 + φ, 1 + 2φ, 2 + 3φ, 3 + 5φ, … Let’s see whether we can simplify this sequence.

Now 1 + φ = φ2 because of the quadratic equation φ satisfies. That tells us the third term equals φ2. So our series starts out 1, φ, φ2. This is looking like a geometric sequence. Could the fourth term be φ3? In fact, it is. Since the fourth term is the sum of the second and third terms, it equals φ +φ2 = φ(1 + φ) = φ(φ2) = φ3. You can continue this line of reasoning to prove that the generalized Fibonacci sequence starting with 1 and φ is in fact the geometric sequence 1, φ, φ2, φ3, …

Now start a generalized Fibonacci sequence with φ’. Because φ’ is also a solution to 1 + x = x2, it follows that the sequence 1, φ’, 1 + φ’, 1 + 2φ’, 2 + 3φ’, … equals the geometric sequence 1, φ’, (φ’)2, (φ’)3, …

Related posts:

Honey bee genealogy (elementary)
Fibonacci numbers at work (advanced)

{ 4 comments }

Here’s a strange theorem I ran across last week. I’ll state the theorem then give some footnotes.

Suppose f(z) and g(z) are two functions meromorphic in the plane. Suppose also that there are five distinct numbers a1, …, a5 such that the solution sets {z : f(z) = ai} and {z : g(z) = ai} are equal. Then either f(z) and g(z) are equal everywhere or they are both constant.

Notes

A complex function of a complex variable is merom0rphic if it is differentiable except at isolated singularities. The theorem above applies to functions that are (complex) differentiable in the entire plane except at isolated poles.

The theorem is due to Rolf Nevanlinna. There’s a whole branch of complex analysis based on Nevanlinna’s work, but I’d not heard of it until last week. I have no idea why the theorem is true. It doesn’t seem that it should be true; the hypothesis seems far too weak for such a strong conclusion. But that’s par for the course in complex variables.

Update: I edited this post in response to the first comment below to make the theorem statement clearer.

{ 7 comments }

Someone sent me email regarding my online calculator for computing the distance between to locations given their longitude and latitude values. He wants to so sort of the opposite. Starting with the longitude and latitude of one location, he wants to find the longitude and latitude of locations moving north/south or east/west of that location. I like to answer reader questions when I can, so here goes. I’ll give a theoretical derivation followed by some Python code.

Longitude and latitude and are usually measured in degrees, but theoretical calculations are cleaner in radians.  Someone using the Python code below could think in terms of degrees; radians will only be used inside function implementations. We’ll use the fact that on a circle of radius r, an arc of angle θ radians has length rθ. We’ll assume the earth is a perfect sphere. See this post for a discussion of how close the earth is to being a sphere.

Moving North/South

I’ll start with moving north/south since that’s simpler. Let R be the radius of the earth. An arc of angle φ radians on the surface of the earth has length M = Rφ, so an arc M miles long corresponds to an angle of φ = M/R radians. Moving due north or due south does not change longitude.

Moving East/West

Moving east/west is a little more complicated. At the equator, the calculation is just like the calculation above, except that longitude changes rather than latitude. But the distance corresponding to one degree of longitude changes with latitude. For example, one degree of longitude along the Arctic Circle doesn’t take you nearly as far as it does at the equator.

Suppose you’re at latitude φ degrees north of the equator. The circumference of a circle at constant latitude φ, a circle parallel to the equator, is cos φ times smaller than the circumference of the equator. So at latitude φ an angle of θ radians describes an arc of length M = R θ cos φ. A distance M miles east or west corresponds to a change in longitude of θ = M/(R cos φ). Moving due east or due west does not change latitude.

Python code

The derivation above works with angles in radians. Python’s cosine function also works in radians. But longitude and latitude are usually expressed in degrees, so function inputs and outputs are in degrees.

import math

# Distances are measured in miles.
# Longitudes and latitudes are measured in degrees.
# Earth is assumed to be perfectly spherical.

earth_radius = 3960.0
degrees_to_radians = math.pi/180.0
radians_to_degrees = 180.0/math.pi

def change_in_latitude(miles):
    "Given a distance north, return the change in latitude."
    return (miles/earth_radius)*radians_to_degrees

def change_in_longitude(latitude, miles):
    "Given a latitude and a distance west, return the change in longitude."
    # Find the radius of a circle around the earth at given latitude.
    r = earth_radius*math.cos(latitude*degrees_to_radians)
    return (miles/r)*radians_to_degrees

Related posts:

What is the shape of the earth?
Spherical trigonometry

{ 6 comments }

51st Carnival of Mathematics posted

by John on April 24, 2009

The long-awaited 51st Carnival of Mathematics is up at squareCircleZ.

{ 0 comments }

Draw a few points on a circle and then draw a straight line from every point to every other point. Count the number of regions created.

2 points, 2 regions.

3 points, 4 regions.

4 points, 8 regions.

5 points, 16 regions.

6 points? See this post from the f(t) blog for the answer.

{ 8 comments }

Metabolism and power laws

by John on April 16, 2009

Bigger animals have more cells than smaller animals. More cells means more cellular metabolism and so more heat produced. How does the amount of heat an animal produces vary with its size? We clearly expect it to go up with size, but does it increase in proportion to volume? Surface area? Something in between?

A first guess would be that metabolism (equivalently, heat produced) goes up in proportion to volume. If cells are all roughly the same size, then number of cells increases proportionately with volume. But heat is dissipated through the surface. Surface area increases in proportion to the square of length but volume increases in proportion to the cube of length. That means the ratio of surface area to volume decreases as overall size increases. The surface area to volume ratio for an elephant is much smaller than it is for a mouse. If an elephant’s metabolism per unit volume were the same as that of a mouse, the elephant’s skin would burn up.

So metabolism cannot be proportional to volume. What about surface area? Here we get into variety and controversy. Many people assume metabolism is proportional to surface area based on the argument above. This idea was first proposed by Max Rubner in 1883. Others emphasize data that supports the theory that suggests metabolism is proportional to surface area.

In the 1930’s, Max Kleiber proposed that metabolism increases according to body mass raised to the power 3/4. (I’ve been a little sloppy here using body mass and volume interchangeably. Body mass is more accurate, though to first approximation animals have uniform density.) If metabolism were proportional to volume, the exponent would be 1. If it were proportional to surface area, the exponent would be 2/3. But Kleiber’s law says it’s somewhere in between, namely 3/4. The image below comes from a paper by Kleiber from 1947.

Kleiber M. (1947). Body size and metabolic rate. Physiological Reviews 27: 511-541.

The graph shows that on a log-log plot, the metabolism rate versus body mass for a large variety of animals has slope approximately 3/4.

So why the exponent 3/4? There is a theoretical explanation called the metabolic scaling theory proposed by Geoffrey West, Brian Enquist, and James Brown. Metabolic scaling theory says that circulatory systems and other networks are fractal-like because this is the most efficient way to serve an animal’s physiological needs. To quote Enquist:

Although living things occupy a three-dimensional space, their internal physiology and anatomy operate as if they were four-dimensional. … Fractal geometry has literally given life an added dimension.

The fractal theory would explain the power law exponent exponent 3/4 simply: it’s the ratio of the volume dimension to the fractal dimension. However, as I suggested earlier, this theory is controversial. Some biologists dispute Kleiber’s law. Others accept Kleiber’s law as an empirical observation but dispute the theoretical explanation of West, Enquist, and Brown.

To read more about metabolism and power laws, see chapter 17 of Complexity: A Guided Tour.

Related posts:

Networks and power laws
Rate of regularizing English verbs

{ 8 comments }

Albrecht Dürer’s art and math

by John on April 10, 2009

Richard Elwes has a fun blog post this morning: Dürer, rhinos, and snowflakes. The post is primarily about the art and mathematics of Albrecht Dürer (1471-1528)  but also includes some related links to recent writings, such as Michael Croucher’s blog post on snowflakes.

{ 3 comments }

Anatomy of a floating point number

by John on April 6, 2009

In my previous post, I explained that floating point numbers are a leaky abstraction. Often you can pretend that they are mathematical real numbers, but sometimes you cannot. This post peels back the abstraction and explains exactly what a floating point number is. (Technically, this post describes an IEEE 754 double precision floating point number, by far the most common kind of floating point number in practice.)

A floating point number has 64 bits that encode a number of the form ± p × 2e. The first bit encodes the sign, 0 for positive numbers and 1 for negative numbers. The next 11 bits encode the exponent e, and the last 52 bits encode the precision p. The encoding of the exponent and precision require some explanation.

The exponent is stored with a bias of 1023. That is, positive and negative exponents are all stored in a single positive number by storing e + 1023 rather than storing e directly. Eleven bits can represent integers from 0 up to 2047. Subtracting the bias, this corresponds to values of e from -1023 to +1024. Define emin = -1022 and emax = +1023. The values emin - 1 and emax + 1 are reserved for special use. More on that below.

Floating point numbers are typically stored in normalized form. In base 10, a number is in normalized scientific notation if the significand is ≥ 1 and < 10. For example, 3.14 × 102 is in normalized form, but 0.314 × 103 and 31.4 × 102 are not. In general, a number in base β is in normalized form if it is of the form p × βe where 1 ≤ p < β. This says that for binary, i.e. β = 2, the first bit of the significand of a normalized number is always 1. Since this bit never changes, it doesn’t need to be stored. Therefore we can express 53 bits of precision in 52 bits of storage. Instead of storing the significand directly, we store f, the fractional part, where the significand is of the form 1.f.

The scheme above does not explain how to store 0. Its impossible to specify values of f and e so that 1.f × 2e = 0. The floating point format makes an exception to the rules stated above. When e = emin - 1 and f = 0, the bits are interpreted as 0. When e = emin - 1 and f ≠ 0, the result is a denormalized number. The bits are interpreted as 0.f × 2emin. In short, the special exponent reserved below emin is used to represent 0 and denormalized floating point numbers.

The special exponent reserved above emax is used to represent ∞ and NaN. If e = emax + 1 and f = 0, the bits are interpreted as ∞. But if e = emax + 1 and f ≠ 0, the bits are interpreted as a NaN or “not a number.” See IEEE floating point exceptions for more information about ∞ and NaN.

Since the largest exponent is 1023 and the largest significant is 1.f where f has 52 ones, the largest floating point number is 21023(2 - 2-52) = 21024 - 2971 ≈ 21024 ≈ 1.8 × 10308. In C, this constant is defined as DBL_MAX, defined in <float.h>.

Since the smallest exponent is -1022, the smallest positive normalized number is 1.0 × 2-1022 ≈ 2.2 × 10-308. In C, this is defined as DBL_MIN. However, it is not the smallest positive number representable as a floating point number, only the smallest normalized floating point number. Smaller numbers can be expressed in denormalized form, albeit at a loss of significance. The smallest denormalized positive number occurs with f has 51 0’s followed by a single 1. This corresponds to 2-52*2-1022 = 2-1074 ≈ 4.9 × 10-324. Attempts to represent any smaller number must underflow to zero.

C gives the name DBL_EPSILON to the smallest positive number ε such that 1 + ε ≠ 1 to machine precision. Since the significant has 52 bits, it’s clear that DBL_EPSILON = 2-52 ≈ 2.2 × 10-16. That is why we say a floating point number has between 15 and 16 significant (decimal) figures.

For more details see What Every Computer Scientist Should Know About Floating-Point Arithmetic.

First post in this series: Floating point numbers are a leaky abstraction

{ 8 comments }