Posts tagged as:

Math

Gil Kalai’s blog featured a guest post the other day about breastfeeding twins. The post commented in passing that

φ, the golden ratio, is the number hardest to approximate by rationals.

What could this possibly have to do with breastfeeding? The post described a pair of twins with hunger cycles sin(t) and sin(φt), functions that hardly ever come close to synchronizing. The constant φ is difficult to approximate by rational numbers, in a sense that I describe below, and this explains why the two functions are so often out of sync.

graphs of sin(t) and sin(φ t)

In one sense, φ is very easy to approximate by rationals. The ratio of any two consecutive Fibonacci numbers is a rational approximation to φ, the approximation improving as you go further out the Fibonacci sequence. The same is true for any generalized Fibonacci sequence, though the approximation may not be very good until you go out a ways in the sequence, depending on your starting values. So φ is easy to approximate in the sense that it is convenient to think of approximations.

Now let’s look at the sense in which φ is hard to approximate. How accurately can you approximate an irrational number ξ by a rational number a/b? Obviously you can do a better and better job by picking larger and larger fractions. For example, you could always take the first n digits of the decimal expansion of ξ and use that to make a rational approximation with denominator 10n. But that might be very inefficient. How well can you approximate ξ relative to the size of the denominator? This question is answered in general by a theorem of Hurwitz.

Given any irrational number ξ, there exist infinitely many different rational numbers a/b such that

\left| \xi - \frac{a}{b} \right| < \frac{1}{\sqrt{5}\, b^2}

The decimal expansion idea mentioned above is wasteful because the error goes down in proportion to the denominator, while Hurwitz theorem says it’s possible for the error to go down in proportion to the square of the denominator.

Can we do any better than Hurwitz theorem? For example, is it possible to replace √5 with some larger constant? Not in general, and the golden ratio φ provides the counterexample to any would-be refinements of Hurwitz theorem. The constant φ is a worst case. That is the sense in which φ is the number hardest to approximate by rationals. If φ and some related numbers are excluded, then the constant √5 can be increased.

Going back to breastfeeding, suppose the twins had hunger cycles sin(t) and sin(ξt). If ξ is irrational, the two curves will never exactly have the same period. But if ξ were equal to a rational number a/b, then the two functions would have a common period of length bπ if a is even or of length 2bπ if a is odd. If ξ were (approximately) equal to a/b with b small, the feedings would often synchronize. Since φ requires large values of b to make a good rational approximation, ξ = φ is a worst case.

Related posts:

Golden ratio and special angles
Connecting Fibonacci and geometric sequences
Fibonacci numbers at work

{ 1 comment }

Golden ratio and special angles

by John on May 16, 2009

The golden ratio comes up in unexpected places. This post will show how the sines and cosines of some special angles can be expressed in terms of the golden ratio and its complement.

[click to continue...]

{ 2 comments }

Here’s a strange theorem I ran across last week. I’ll state the theorem then give some footnotes.

Suppose f(z) and g(z) are two functions meromorphic in the plane. Suppose also that there are five distinct numbers a1, …, a5 such that the solution sets {z : f(z) = ai} and {z : g(z) = ai} are equal. Then either f(z) and g(z) are equal everywhere or they are both constant.

Notes

A complex function of a complex variable is merom0rphic if it is differentiable except at isolated singularities. The theorem above applies to functions that are (complex) differentiable in the entire plane except at isolated poles.

The theorem is due to Rolf Nevanlinna. There’s a whole branch of complex analysis based on Nevanlinna’s work, but I’d not heard of it until last week. I have no idea why the theorem is true. It doesn’t seem that it should be true; the hypothesis seems far too weak for such a strong conclusion. But that’s par for the course in complex variables.

Update: I edited this post in response to the first comment below to make the theorem statement clearer.

{ 7 comments }

Someone sent me email regarding my online calculator for computing the distance between to locations given their longitude and latitude values. He wants to so sort of the opposite. Starting with the longitude and latitude of one location, he wants to find the longitude and latitude of locations moving north/south or east/west of that location. I like to answer reader questions when I can, so here goes. I’ll give a theoretical derivation followed by some Python code.

Longitude and latitude and are usually measured in degrees, but theoretical calculations are cleaner in radians.  Someone using the Python code below could think in terms of degrees; radians will only be used inside function implementations. We’ll use the fact that on a circle of radius r, an arc of angle θ radians has length rθ. We’ll assume the earth is a perfect sphere. See this post for a discussion of how close the earth is to being a sphere.

Moving North/South

I’ll start with moving north/south since that’s simpler. Let R be the radius of the earth. An arc of angle φ radians on the surface of the earth has length M = Rφ, so an arc M miles long corresponds to an angle of φ = M/R radians. Moving due north or due south does not change longitude.

Moving East/West

Moving east/west is a little more complicated. At the equator, the calculation is just like the calculation above, except that longitude changes rather than latitude. But the distance corresponding to one degree of longitude changes with latitude. For example, one degree of longitude along the Arctic Circle doesn’t take you nearly as far as it does at the equator.

Suppose you’re at latitude φ degrees north of the equator. The circumference of a circle at constant latitude φ, a circle parallel to the equator, is cos φ times smaller than the circumference of the equator. So at latitude φ an angle of θ radians describes an arc of length M = R θ cos φ. A distance M miles east or west corresponds to a change in longitude of θ = M/(R cos φ). Moving due east or due west does not change latitude.

Python code

The derivation above works with angles in radians. Python’s cosine function also works in radians. But longitude and latitude are usually expressed in degrees, so function inputs and outputs are in degrees.

import math

# Distances are measured in miles.
# Longitudes and latitudes are measured in degrees.
# Earth is assumed to be perfectly spherical.

earth_radius = 3960.0
degrees_to_radians = math.pi/180.0
radians_to_degrees = 180.0/math.pi

def change_in_latitude(miles):
    "Given a distance north, return the change in latitude."
    return (miles/earth_radius)*radians_to_degrees

def change_in_longitude(latitude, miles):
    "Given a latitude and a distance west, return the change in longitude."
    # Find the radius of a circle around the earth at given latitude.
    r = earth_radius*math.cos(latitude*degrees_to_radians)
    return (miles/r)*radians_to_degrees

Related posts:

What is the shape of the earth?
Spherical trigonometry

{ 6 comments }

51st Carnival of Mathematics posted

by John on April 24, 2009

The long-awaited 51st Carnival of Mathematics is up at squareCircleZ.

{ 0 comments }

Draw a few points on a circle and then draw a straight line from every point to every other point. Count the number of regions created.

2 points, 2 regions.

3 points, 4 regions.

4 points, 8 regions.

5 points, 16 regions.

6 points? See this post from the f(t) blog for the answer.

{ 8 comments }

Metabolism and power laws

by John on April 16, 2009

Bigger animals have more cells than smaller animals. More cells means more cellular metabolism and so more heat produced. How does the amount of heat an animal produces vary with its size? We clearly expect it to go up with size, but does it increase in proportion to volume? Surface area? Something in between?

A first guess would be that metabolism (equivalently, heat produced) goes up in proportion to volume. If cells are all roughly the same size, then number of cells increases proportionately with volume. But heat is dissipated through the surface. Surface area increases in proportion to the square of length but volume increases in proportion to the cube of length. That means the ratio of surface area to volume decreases as overall size increases. The surface area to volume ratio for an elephant is much smaller than it is for a mouse. If an elephant’s metabolism per unit volume were the same as that of a mouse, the elephant’s skin would burn up.

So metabolism cannot be proportional to volume. What about surface area? Here we get into variety and controversy. Many people assume metabolism is proportional to surface area based on the argument above. This idea was first proposed by Max Rubner in 1883. Others emphasize data that supports the theory that suggests metabolism is proportional to surface area.

In the 1930’s, Max Kleiber proposed that metabolism increases according to body mass raised to the power 3/4. (I’ve been a little sloppy here using body mass and volume interchangeably. Body mass is more accurate, though to first approximation animals have uniform density.) If metabolism were proportional to volume, the exponent would be 1. If it were proportional to surface area, the exponent would be 2/3. But Kleiber’s law says it’s somewhere in between, namely 3/4. The image below comes from a paper by Kleiber from 1947.

Kleiber M. (1947). Body size and metabolic rate. Physiological Reviews 27: 511-541.

The graph shows that on a log-log plot, the metabolism rate versus body mass for a large variety of animals has slope approximately 3/4.

So why the exponent 3/4? There is a theoretical explanation called the metabolic scaling theory proposed by Geoffrey West, Brian Enquist, and James Brown. Metabolic scaling theory says that circulatory systems and other networks are fractal-like because this is the most efficient way to serve an animal’s physiological needs. To quote Enquist:

Although living things occupy a three-dimensional space, their internal physiology and anatomy operate as if they were four-dimensional. … Fractal geometry has literally given life an added dimension.

The fractal theory would explain the power law exponent exponent 3/4 simply: it’s the ratio of the volume dimension to the fractal dimension. However, as I suggested earlier, this theory is controversial. Some biologists dispute Kleiber’s law. Others accept Kleiber’s law as an empirical observation but dispute the theoretical explanation of West, Enquist, and Brown.

To read more about metabolism and power laws, see chapter 17 of Complexity: A Guided Tour.

Related posts:

Networks and power laws
Rate of regularizing English verbs

{ 8 comments }

Albrecht Dürer’s art and math

by John on April 10, 2009

Richard Elwes has a fun blog post this morning: Dürer, rhinos, and snowflakes. The post is primarily about the art and mathematics of Albrecht Dürer (1471-1528)  but also includes some related links to recent writings, such as Michael Croucher’s blog post on snowflakes.

{ 3 comments }

Anatomy of a floating point number

by John on April 6, 2009

In my previous post, I explained that floating point numbers are a leaky abstraction. Often you can pretend that they are mathematical real numbers, but sometimes you cannot. This post peels back the abstraction and explains exactly what a floating point number is. (Technically, this post describes an IEEE 754 double precision floating point number, by far the most common kind of floating point number in practice.)

A floating point number has 64 bits that encode a number of the form ± p × 2e. The first bit encodes the sign, 0 for positive numbers and 1 for negative numbers. The next 11 bits encode the exponent e, and the last 52 bits encode the precision p. The encoding of the exponent and precision require some explanation.

The exponent is stored with a bias of 1023. That is, positive and negative exponents are all stored in a single positive number by storing e + 1023 rather than storing e directly. Eleven bits can represent integers from 0 up to 2047. Subtracting the bias, this corresponds to values of e from -1023 to +1024. Define emin = -1022 and emax = +1023. The values emin - 1 and emax + 1 are reserved for special use. More on that below.

Floating point numbers are typically stored in normalized form. In base 10, a number is in normalized scientific notation if the significand is ≥ 1 and < 10. For example, 3.14 × 102 is in normalized form, but 0.314 × 103 and 31.4 × 102 are not. In general, a number in base β is in normalized form if it is of the form p × βe where 1 ≤ p < β. This says that for binary, i.e. β = 2, the first bit of the significand of a normalized number is always 1. Since this bit never changes, it doesn’t need to be stored. Therefore we can express 53 bits of precision in 52 bits of storage. Instead of storing the significand directly, we store f, the fractional part, where the significand is of the form 1.f.

The scheme above does not explain how to store 0. Its impossible to specify values of f and e so that 1.f × 2e = 0. The floating point format makes an exception to the rules stated above. When e = emin - 1 and f = 0, the bits are interpreted as 0. When e = emin - 1 and f ≠ 0, the result is a denormalized number. The bits are interpreted as 0.f × 2emin. In short, the special exponent reserved below emin is used to represent 0 and denormalized floating point numbers.

The special exponent reserved above emax is used to represent ∞ and NaN. If e = emax + 1 and f = 0, the bits are interpreted as ∞. But if e = emax + 1 and f ≠ 0, the bits are interpreted as a NaN or “not a number.” See IEEE floating point exceptions for more information about ∞ and NaN.

Since the largest exponent is 1023 and the largest significant is 1.f where f has 52 ones, the largest floating point number is 21023(2 - 2-52) = 21024 - 2971 ≈ 21024 ≈ 1.8 × 10308. In C, this constant is defined as DBL_MAX, defined in <float.h>.

Since the smallest exponent is -1022, the smallest positive normalized number is 1.0 × 2-1022 ≈ 2.2 × 10-308. In C, this is defined as DBL_MIN. However, it is not the smallest positive number representable as a floating point number, only the smallest normalized floating point number. Smaller numbers can be expressed in denormalized form, albeit at a loss of significance. The smallest denormalized positive number occurs with f has 51 0’s followed by a single 1. This corresponds to 2-52*2-1022 = 2-1074 ≈ 4.9 × 10-324. Attempts to represent any smaller number must underflow to zero.

C gives the name DBL_EPSILON to the smallest positive number ε such that 1 + ε ≠ 1 to machine precision. Since the significant has 52 bits, it’s clear that DBL_EPSILON = 2-52 ≈ 2.2 × 10-16. That is why we say a floating point number has between 15 and 16 significant (decimal) figures.

For more details see What Every Computer Scientist Should Know About Floating-Point Arithmetic.

First post in this series: Floating point numbers are a leaky abstraction

{ 8 comments }

Joel Spolsky coined the term leaky abstraction for programming concepts that usually shield you from messy details but sometimes break down. A perfect abstraction is a black box that you never have to open. A leaky abstraction is a black box that you have to open up occasionally.

Floating point numbers, the computer representations of real numbers,  are leaky abstractions. They work remarkably well: you can usually pretend that a floating point type is a mathematical real number. But sometimes you can’t. The abstraction leaks, though not very often.

Most explanations I’ve heard for the limitations of machine numbers are pedantic. “There are only a finite number of floating point numbers so they can’t represent real numbers well.” That’s not much help. It doesn’t explain why floating point numbers actually do represent real numbers sufficiently well for most applications, and it doesn’t suggest where the abstraction might leak.

A standard floating point number has roughly 16 decimal places of precision and a maximum value on the order of 10308, a 1 followed by 308 zeros. (According to IEEE standard 754, the typical floating point implementation.)

Sixteen decimal places is a lot. Hardly any measured quantity is known to anywhere near that much precision. For example, the constant in Newton’s Law of Gravity is only known to four significant figures. The charge of an electron is known to 11 significant figures, much more precision than Newton’s gravitational constant, but still less than a floating point number. So when are 16 figures not enough? One problem area is subtraction. The other elementary operations — addition, multiplication, division — are very accurate. As long as you don’t overflow or underflow,  these operations often produce results that are correct to the last bit. But subtraction can be anywhere from exact to completely inaccurate.  If two numbers agree to n figures, you can lose up to n figures of precision in their subtraction. This problem can show up unexpectedly in the middle of other calculations. For an example, see this post on calculating standard deviation. See also computing derivatives, the second example in  Five Tips for Floating Point Programming.

What about overflow or underflow? When do you need numbers bigger than 10308? Often you don’t. But in probability calculations, for example, you need them all the time unless you’re clever. It’s common in probability to compute a medium-sized number that is the product of an astronomically large number and an infinitesimally small number. The final result fits into a computer just fine, but the intermediate numbers might not due to overflow or underflow. For example, the maximum floating point number on most computers is somewhere between 170 factorial and 171 factorial. Such large factorials often appear in applications, often in ratios with other large factorials. See Avoiding Overflow, Underflow, and Loss of Precision for tricks on how to work with factorials that would overflow if computed directly.

Often you can afford to be blissfully ignorant of the details of floating point arithmetic, but sometimes you cannot. A great place to learn more is David Goldberg’s paper What Every Computer Scientist Should Know About Floating-Point Arithmetic.

Update: See follow-up post, Anatomy of a floating point number.

Related posts:

How to compute binomial probabilities
NaN, 1.#IND, 1.#INF, and all that

{ 10 comments }

Interpolation errors

by John on April 1, 2009

Say you have a function f(x) and you want to find a polynomial p(x) that agrees with f(x) at several points. Given a set of points x0, x1, x2, … xn you can always find a polynomial of degree n such that p(xi) = f(xi) for i = 0, 1, 2, …, n. It seems reasonable that the more points you pick, the better the interpolating polynomial p(x) will match the function f(x). If the two functions match exactly at a lot of points, they should match well everywhere. Sometimes that’s true and sometimes it’s not.

Here is a famous example due to Carle Runge. Let f(x) = 1/(1 + x2) and let pn be the polynomial that interpolates f(x) at n+1 evenly spaced nodes in the interval [-5, 5]. As n becomes larger, the fit becomes worse.

Here’s a graph of f(x) and p9(x). The smooth blue line is f(x) and the wiggly red line is p9(x).

graph of f(x) and p9(x)

Here’s the analogous graph for p16(x).

graph of f(x) and p16(x)

The fit is improving in the middle. In fact, the curves agree to within the thickness of the plot line from say -1 to 1. But the fit is so bad in the tails that the graph had to be cut off. Here’s another plot of f(x) and p16(x) on a different scale to show how far negative the polynomial dips.

graph of f(x) and p16(x) on different scale

The problem is the spacing of the nodes. Interpolation errors are bad for evenly spaced nodes. If we interpolate f(x) at different points, at the Chebyshev nodes, then the fit is good.

interpolating at Chebyshev nodes

The Chebyshev nodes on [-1, 1] are xi = cos( π i / n ). Here we multiplied these nodes by 5 to scale to the interval [-5, 5].

If the function f(x) is absolutely continuous, as in our example, then the interpolating polynomials converge uniformly when you interpolate at Chebyshev nodes. However, ordinary continuity is not enough. Given any sequence of nodes, there exists a continuous function such that the polynomial interpolation error grows like log(n) as the number of nodes n increases.

Some numerical integration methods are based on interpolating polynomials: fit a polynomial to the integrand, then integrate the polynomial exactly to approximate the original integral. The examples above suggest that increasing the order of such integration methods might not improve accuracy and might even make things worse.

{ 6 comments }

What does “classical” mean in science?

by John on March 27, 2009

The word “classical” has a standard meaning in the humanities, but not in science.

Ward Cheney and Will Light give a candid definition of “classical” in the scientific sense in the introduction to their book on approximation theory:

… the “classical” portion of approximation theory — understood to be the parts of the subject that were already in place when the authors were students.

There you have it: whatever was known when you were in school is classical. Yes, this definition is entirely relative. And it describes common usage pretty well.

{ 1 comment }

Here is a game developer’s view of the vagaries of floating point arithmetic.

Related posts:

Five tips for floating point programming
Overflow and loss of precision

{ 1 comment }

Springs, resistors, and harmonic means

by John on March 26, 2009

Harmonic mean has come up in a couple posts this week (with numbers and functions). This post will show how harmonic means come up in physical problems involving springs and resistors.

Suppose we have two springs in series with stiffness k1 and k2:

Then the combined stiffness k of the two springs satisfies 1/k = 1/k1 + 1/k2. Think about what this says in the extremes. If one of the springs were infinitely stiff, say k2 = ∞. Then k = k1. It would be as if the second spring were not there. Being infinitely stiff, we could think of it as an extension of the block it is attached to. Now think of one of the springs having no stiffness at all, say k1 = 0. Then k = 0. One mushy spring makes the combination mushy.

Next think of two springs in parallel:

Now the combined stiffness of the two springs is k = k1 + k2. Again think of the two extremes. If one spring is infinitely stiff, say k1 = ∞, then k = ∞ and the combination is infinitely stiff. And if one spring has no stiffness, say k2 = 0, then k = k1. We could imagine the spring with no stiffness isn’t there.

The stiffness of springs in series adds harmonically. The stiffness of the combination is half the harmonic mean of the two individual stiffnesses.

Electrical resistors combine in a way that is the opposite of mechanical springs. Resistors in parallel combine like springs in series, and vice versa.

If two resistors have resistance r1 and r2, the combined resistance r of the two resistors in parallel satisfies 1/r = 1/r1 + 1/r2. If one of the resistors has infinite resistance, say r2 = ∞, then r = r1. It would be as if the second resistor were not there. All electrons would flow through the first resistor.

If the two resistors were in series, then r = r1 + r2. If one resistor has infinite resistance, so does the combination. Electrons cannot flow through the combination if they cannot flow through one of the resistors. And if one resistor has zero resistance, say r2 = 0, then r = r1. Since the second resistor offers no resistance to the flow of electrons, it may as well not be there.

These physical problems illustrate why zeros as handled specially in the definition of means.

Image credit: Wikipedia

{ 1 comment }

Means and inequalities for functions

by John on March 26, 2009

A post on Monday looked at means an inequalities for a lists of non-negative numbers. This post looks at analogous means and inequalities for non-negative functions. We go from means defined in terms of sums to means defined in terms of integrals.

[click to continue...]

{ 0 comments }