Posts tagged as:

Math

Inverse Mercator projection

by John on September 21, 2009

In my earlier post on the Mercator projection, I derived the function h(φ) that maps latitude on the Earth to vertical height on a map. The inverse of this function turns out to hold a few surprises.

The height y corresponding to a positive latitude φ is given by

h(φ) = log( sec(φ) + tan(φ) ).

The inverse function, h-1(y) = φ gives the latitude as a function of height. This function is called the “Gudermannian” after Christoph Gudermann and is abbreviated gd(y). Gudermann was the student of one famous mathematician, Karl Friedrich Gauss, and the teacher of another famous mathematician, Karl Weierstrass.

The Gudermannian function gd(y) can be reduced to familiar functions:

gd(y) = arctan( sinh(y) ) = 2 arctan( ey ) – π/2.

That doesn’t look very promising. But here’s the interesting part: the function gd forms a bridge between hyperbolic trig functions and ordinary trig functions.

sin( gd(x) ) = tanh(x)
tan( gd(x) ) = sinh(x)
cos( gd(x) ) = sech(x)
sec( gd(x) ) = cosh(x)
csc( gd(x) ) = coth(x)
cot( gd(x) ) = csch(x)

By definition, gd(x) is an angle θ whose tangent is sinh(x).

In the figure, tan(θ) = sinh(x). Since cosh2(x) – sinh2(x) = 1, the hypotenuse of the triangle is cosh(x). The identities above follow directly from the figure. For example, sin(θ) = sinh(x) / cosh(x) = tanh(x).

Finally, it is easy to show that gd is the inverse of the Mercator scale function h:

h( gd(x) ) = log( sec( gd(x) ) + tan( gd(x) ) ) = log( cosh(x) + sinh(x) ) = log( ex ) = x.

Related links:

Mercator projection
Gudermannian on MathWorld

{ 1 comment }

Easy to guess, hard to prove

by John on September 9, 2009

Suppose you’re waiting for a friend and you have nothing to do. After a few minutes of boredom you pick up a pencil and some scrap paper. You start listing the prime numbers.

2, 3, 5, 7, 11, 13, 17,19, 23, 29, 31, …

Next you write down the forward differences, subtracting each number in the sequence from the one that follows it.

1, 2, 2, 4, 2, 4, 2, 4, …

Your friend is running late and so you repeat the process starting with the sequence you just created.

1, 0, 2, -2, 2, 2, 2, 2, 4, …

Hmm. That time you got a negative number in the list. You’re just doodling and you don’t want to think too hard, so you decide you’ll ignore signs and just write down the absolute values of the differences. So you erase the negative sign and take differences again.

1, 2, 0, 0, 0, 0, 0, 2, …

Your friend is quite late, so you keep doing this some more. After a while you notice that every new sequence has started with 1. Will every sequence start with 1? That’s Gilbreath’s conjecture, named after Norman Gilbreath who asked the question in 1958. I ran across the conjecture in The Math Book by Clifford Pickover. Gilbreath wasn’t the first to notice this pattern. François Proth noticed it in 1878 and published an incorrect proof of the conjecture.

Gilbreath’s conjecture has been verified for the first several billion sequences, but nobody has proved that every sequence will start with 1. Paul Erdős speculated that Gilbreath’s conjecture is true but it would be 200 years before anyone could prove it. I find Erdős’s conjecture more interesting than Gilbreath’s conjecture.

Here’s what I imagine that Erdős had in mind. While the process Gilbreath created is very simple, it is also a strange thing to study. It’s not the kind of thing that people have proven theorems about. No one knows how to approach the problem. There are far more complicated problems in the mainstream of mathematics that will probably be resolved sooner because they are related to previously solved problems and researchers have some idea where to start working on them.

Other posts on number theory:

Constructive proof of the Chinese remainder theorem
How to solve linear congruences
How many numbers are squares mod m
Connecting probability and number theory

Other posts on proofs:

Why proof by examples doesn’t work
Errors in math papers not a big deal?
Jenga mathematics
In praise of tedious proofs
Proofs of false statements

{ 3 comments }

Why care about spherical trig?

by John on September 7, 2009

Last spring I wrote a post on spherical trigonometry, the study of triangles drawn on a sphere (e.g. the surface of the Earth).

triangles drawn on a sphere

Mel Hagen left a comment on that post a few days ago saying

I am revisiting Spherical Trig after 30 years by going back over some of my books that I have collected over the years. …

I asked Mel via email why he was revisiting the subject. He wrote an interesting reply that I am including below with his permission.

[click to continue...]

{ 1 comment }

My mathematical opposite

by John on July 8, 2009

Eugenia Cheng may be my mathematical opposite. She did a great interview with Peter Rowlett in which she bubbles over with enthusiasm for category theory. She explains that she couldn’t stand applied math, but stuck with math because she believed there was something there she could love. The further she moved from applicable math, the happier she became. Abstract algebra was a big improvement, but still too concrete. When she discovered category theory, she was home.

Category theory is a sort of meta-mathematics. It aims to identify patterns across diverse areas of math the way a particular area of math may identify patterns in nature. I like the idea of category theory, but I get that deer-in-the-headlights look in my eyes almost immediately when I look at category theory in any detail.

I enjoy pure math, though I prefer analysis to algebra. I even enjoyed my first abstract algebra class, but when I ran into category theory I knew I’d exceeded my abstraction tolerance. I moved more toward the applied end of the spectrum the longer I was in college. Afterward, I moved so far toward the applied end that you might say I fell off the end and moved into things that are so applied that they’re not strictly mathematics: mathematical modeling, software development, statistics, etc. I call myself a very applied mathematician because I actually apply math and don’t just study areas of math that could potentially be applied.

I appreciate Eugenia Cheng’s enthusiasm even though I don’t share her taste in math. I have long intended to go back and learn a little category theory. It would be great mental exercise precisely because it is so foreign to my way of thinking. Cheng’s interview inspired me to give it one more try.

{ 7 comments }

Gil Kalai’s blog featured a guest post the other day about breastfeeding twins. The post commented in passing that

φ, the golden ratio, is the number hardest to approximate by rationals.

What could this possibly have to do with breastfeeding? The post described a pair of twins with hunger cycles sin(t) and sin(φt), functions that hardly ever come close to synchronizing. The constant φ is difficult to approximate by rational numbers, in a sense that I describe below, and this explains why the two functions are so often out of sync.

graphs of sin(t) and sin(φ t)

In one sense, φ is very easy to approximate by rationals. The ratio of any two consecutive Fibonacci numbers is a rational approximation to φ, the approximation improving as you go further out the Fibonacci sequence. The same is true for any generalized Fibonacci sequence, though the approximation may not be very good until you go out a ways in the sequence, depending on your starting values. So φ is easy to approximate in the sense that it is convenient to think of approximations.

Now let’s look at the sense in which φ is hard to approximate. How accurately can you approximate an irrational number ξ by a rational number a/b? Obviously you can do a better and better job by picking larger and larger fractions. For example, you could always take the first n digits of the decimal expansion of ξ and use that to make a rational approximation with denominator 10n. But that might be very inefficient. How well can you approximate ξ relative to the size of the denominator? This question is answered in general by a theorem of Hurwitz.

Given any irrational number ξ, there exist infinitely many different rational numbers a/b such that

\left| \xi - \frac{a}{b} \right| < \frac{1}{\sqrt{5}\, b^2}

The decimal expansion idea mentioned above is wasteful because the error goes down in proportion to the denominator, while Hurwitz theorem says it’s possible for the error to go down in proportion to the square of the denominator.

Can we do any better than Hurwitz theorem? For example, is it possible to replace √5 with some larger constant? Not in general, and the golden ratio φ provides the counterexample to any would-be refinements of Hurwitz theorem. The constant φ is a worst case. That is the sense in which φ is the number hardest to approximate by rationals. If φ and some related numbers are excluded, then the constant √5 can be increased.

Going back to breastfeeding, suppose the twins had hunger cycles sin(t) and sin(ξt). If ξ is irrational, the two curves will never exactly have the same period. But if ξ were equal to a rational number a/b, then the two functions would have a common period of length bπ if a is even or of length 2bπ if a is odd. If ξ were (approximately) equal to a/b with b small, the feedings would often synchronize. Since φ requires large values of b to make a good rational approximation, ξ = φ is a worst case.

Related posts:

Golden ratio and special angles
Connecting Fibonacci and geometric sequences
Fibonacci numbers at work

{ 2 comments }

Golden ratio and special angles

by John on May 16, 2009

The golden ratio comes up in unexpected places. This post will show how the sines and cosines of some special angles can be expressed in terms of the golden ratio and its complement.

[click to continue...]

{ 3 comments }

Here’s a strange theorem I ran across last week. I’ll state the theorem then give some footnotes.

Suppose f(z) and g(z) are two functions meromorphic in the plane. Suppose also that there are five distinct numbers a1, …, a5 such that the solution sets {z : f(z) = ai} and {z : g(z) = ai} are equal. Then either f(z) and g(z) are equal everywhere or they are both constant.

Notes

A complex function of a complex variable is merom0rphic if it is differentiable except at isolated singularities. The theorem above applies to functions that are (complex) differentiable in the entire plane except at isolated poles.

The theorem is due to Rolf Nevanlinna. There’s a whole branch of complex analysis based on Nevanlinna’s work, but I’d not heard of it until last week. I have no idea why the theorem is true. It doesn’t seem that it should be true; the hypothesis seems far too weak for such a strong conclusion. But that’s par for the course in complex variables.

Update: I edited this post in response to the first comment below to make the theorem statement clearer.

{ 7 comments }

Someone sent me email regarding my online calculator for computing the distance between to locations given their longitude and latitude values. He wants to do sort of the opposite. Starting with the longitude and latitude of one location, he wants to find the longitude and latitude of locations moving north/south or east/west of that location. I like to answer reader questions when I can, so here goes. I’ll give a theoretical derivation followed by some Python code.

Longitude and latitude and are usually measured in degrees, but theoretical calculations are cleaner in radians.  Someone using the Python code below could think in terms of degrees; radians will only be used inside function implementations. We’ll use the fact that on a circle of radius r, an arc of angle θ radians has length rθ. We’ll assume the earth is a perfect sphere. See this post for a discussion of how close the earth is to being a sphere.

Moving North/South

I’ll start with moving north/south since that’s simpler. Let R be the radius of the earth. An arc of angle φ radians on the surface of the earth has length M = Rφ, so an arc M miles long corresponds to an angle of φ = M/R radians. Moving due north or due south does not change longitude.

Moving East/West

Moving east/west is a little more complicated. At the equator, the calculation is just like the calculation above, except that longitude changes rather than latitude. But the distance corresponding to one degree of longitude changes with latitude. For example, one degree of longitude along the Arctic Circle doesn’t take you nearly as far as it does at the equator.

Suppose you’re at latitude φ degrees north of the equator. The circumference of a circle at constant latitude φ, a circle parallel to the equator, is cos φ times smaller than the circumference of the equator. So at latitude φ an angle of θ radians describes an arc of length M = R θ cos φ. A distance M miles east or west corresponds to a change in longitude of θ = M/(R cos φ). Moving due east or due west does not change latitude.

Python code

The derivation above works with angles in radians. Python’s cosine function also works in radians. But longitude and latitude are usually expressed in degrees, so function inputs and outputs are in degrees.

import math

# Distances are measured in miles.
# Longitudes and latitudes are measured in degrees.
# Earth is assumed to be perfectly spherical.

earth_radius = 3960.0
degrees_to_radians = math.pi/180.0
radians_to_degrees = 180.0/math.pi

def change_in_latitude(miles):
    "Given a distance north, return the change in latitude."
    return (miles/earth_radius)*radians_to_degrees

def change_in_longitude(latitude, miles):
    "Given a latitude and a distance west, return the change in longitude."
    # Find the radius of a circle around the earth at given latitude.
    r = earth_radius*math.cos(latitude*degrees_to_radians)
    return (miles/r)*radians_to_degrees

Related posts:

What is the shape of the earth?
Spherical trigonometry
Finding distances using latitude and longitude

{ 8 comments }

51st Carnival of Mathematics posted

by John on April 24, 2009

The long-awaited 51st Carnival of Mathematics is up at squareCircleZ.

{ 0 comments }

Draw a few points on a circle and then draw a straight line from every point to every other point. Count the number of regions created.

2 points, 2 regions.

3 points, 4 regions.

4 points, 8 regions.

5 points, 16 regions.

6 points? See this post from the f(t) blog for the answer.

{ 12 comments }

Metabolism and power laws

by John on April 16, 2009

Bigger animals have more cells than smaller animals. More cells means more cellular metabolism and so more heat produced. How does the amount of heat an animal produces vary with its size? We clearly expect it to go up with size, but does it increase in proportion to volume? Surface area? Something in between?

A first guess would be that metabolism (equivalently, heat produced) goes up in proportion to volume. If cells are all roughly the same size, then number of cells increases proportionately with volume. But heat is dissipated through the surface. Surface area increases in proportion to the square of length but volume increases in proportion to the cube of length. That means the ratio of surface area to volume decreases as overall size increases. The surface area to volume ratio for an elephant is much smaller than it is for a mouse. If an elephant’s metabolism per unit volume were the same as that of a mouse, the elephant’s skin would burn up.

So metabolism cannot be proportional to volume. What about surface area? Here we get into variety and controversy. Many people assume metabolism is proportional to surface area based on the argument above. This idea was first proposed by Max Rubner in 1883. Others emphasize data that supports the theory that suggests metabolism is proportional to surface area.

In the 1930’s, Max Kleiber proposed that metabolism increases according to body mass raised to the power 3/4. (I’ve been a little sloppy here using body mass and volume interchangeably. Body mass is more accurate, though to first approximation animals have uniform density.) If metabolism were proportional to volume, the exponent would be 1. If it were proportional to surface area, the exponent would be 2/3. But Kleiber’s law says it’s somewhere in between, namely 3/4. The image below comes from a paper by Kleiber from 1947.

Kleiber M. (1947). Body size and metabolic rate. Physiological Reviews 27: 511-541.

The graph shows that on a log-log plot, the metabolism rate versus body mass for a large variety of animals has slope approximately 3/4.

So why the exponent 3/4? There is a theoretical explanation called the metabolic scaling theory proposed by Geoffrey West, Brian Enquist, and James Brown. Metabolic scaling theory says that circulatory systems and other networks are fractal-like because this is the most efficient way to serve an animal’s physiological needs. To quote Enquist:

Although living things occupy a three-dimensional space, their internal physiology and anatomy operate as if they were four-dimensional. … Fractal geometry has literally given life an added dimension.

The fractal theory would explain the power law exponent exponent 3/4 simply: it’s the ratio of the volume dimension to the fractal dimension. However, as I suggested earlier, this theory is controversial. Some biologists dispute Kleiber’s law. Others accept Kleiber’s law as an empirical observation but dispute the theoretical explanation of West, Enquist, and Brown.

To read more about metabolism and power laws, see chapter 17 of Complexity: A Guided Tour.

Related posts:

Networks and power laws
Rate of regularizing English verbs

{ 10 comments }

Albrecht Dürer’s art and math

by John on April 10, 2009

Richard Elwes has a fun blog post this morning: Dürer, rhinos, and snowflakes. The post is primarily about the art and mathematics of Albrecht Dürer (1471-1528)  but also includes some related links to recent writings, such as Michael Croucher’s blog post on snowflakes.

{ 3 comments }

Anatomy of a floating point number

by John on April 6, 2009

In my previous post, I explained that floating point numbers are a leaky abstraction. Often you can pretend that they are mathematical real numbers, but sometimes you cannot. This post peels back the abstraction and explains exactly what a floating point number is. (Technically, this post describes an IEEE 754 double precision floating point number, by far the most common kind of floating point number in practice.)

A floating point number has 64 bits that encode a number of the form ± p × 2e. The first bit encodes the sign, 0 for positive numbers and 1 for negative numbers. The next 11 bits encode the exponent e, and the last 52 bits encode the precision p. The encoding of the exponent and precision require some explanation.

The exponent is stored with a bias of 1023. That is, positive and negative exponents are all stored in a single positive number by storing e + 1023 rather than storing e directly. Eleven bits can represent integers from 0 up to 2047. Subtracting the bias, this corresponds to values of e from -1023 to +1024. Define emin = -1022 and emax = +1023. The values emin – 1 and emax + 1 are reserved for special use. More on that below.

Floating point numbers are typically stored in normalized form. In base 10, a number is in normalized scientific notation if the significand is ≥ 1 and < 10. For example, 3.14 × 102 is in normalized form, but 0.314 × 103 and 31.4 × 102 are not. In general, a number in base β is in normalized form if it is of the form p × βe where 1 ≤ p < β. This says that for binary, i.e. β = 2, the first bit of the significand of a normalized number is always 1. Since this bit never changes, it doesn’t need to be stored. Therefore we can express 53 bits of precision in 52 bits of storage. Instead of storing the significand directly, we store f, the fractional part, where the significand is of the form 1.f.

The scheme above does not explain how to store 0. Its impossible to specify values of f and e so that 1.f × 2e = 0. The floating point format makes an exception to the rules stated above. When e = emin – 1 and f = 0, the bits are interpreted as 0. When e = emin – 1 and f ≠ 0, the result is a denormalized number. The bits are interpreted as 0.f × 2emin. In short, the special exponent reserved below emin is used to represent 0 and denormalized floating point numbers.

The special exponent reserved above emax is used to represent ∞ and NaN. If e = emax + 1 and f = 0, the bits are interpreted as ∞. But if e = emax + 1 and f ≠ 0, the bits are interpreted as a NaN or “not a number.” See IEEE floating point exceptions for more information about ∞ and NaN.

Since the largest exponent is 1023 and the largest significant is 1.f where f has 52 ones, the largest floating point number is 21023(2 – 2-52) = 21024 – 2971 ≈ 21024 ≈ 1.8 × 10308. In C, this constant is defined as DBL_MAX, defined in <float.h>.

Since the smallest exponent is -1022, the smallest positive normalized number is 1.0 × 2-1022 ≈ 2.2 × 10-308. In C, this is defined as DBL_MIN. However, it is not the smallest positive number representable as a floating point number, only the smallest normalized floating point number. Smaller numbers can be expressed in denormalized form, albeit at a loss of significance. The smallest denormalized positive number occurs with f has 51 0’s followed by a single 1. This corresponds to 2-52*2-1022 = 2-1074 ≈ 4.9 × 10-324. Attempts to represent any smaller number must underflow to zero.

C gives the name DBL_EPSILON to the smallest positive number ε such that 1 + ε ≠ 1 to machine precision. Since the significant has 52 bits, it’s clear that DBL_EPSILON = 2-52 ≈ 2.2 × 10-16. That is why we say a floating point number has between 15 and 16 significant (decimal) figures.

For more details see What Every Computer Scientist Should Know About Floating-Point Arithmetic.

First post in this series: Floating point numbers are a leaky abstraction

{ 8 comments }

Joel Spolsky coined the term leaky abstraction for programming concepts that usually shield you from messy details but sometimes break down. A perfect abstraction is a black box that you never have to open. A leaky abstraction is a black box that you have to open up occasionally.

Floating point numbers, the computer representations of real numbers,  are leaky abstractions. They work remarkably well: you can usually pretend that a floating point type is a mathematical real number. But sometimes you can’t. The abstraction leaks, though not very often.

Most explanations I’ve heard for the limitations of machine numbers are pedantic. “There are only a finite number of floating point numbers so they can’t represent real numbers well.” That’s not much help. It doesn’t explain why floating point numbers actually do represent real numbers sufficiently well for most applications, and it doesn’t suggest where the abstraction might leak.

A standard floating point number has roughly 16 decimal places of precision and a maximum value on the order of 10308, a 1 followed by 308 zeros. (According to IEEE standard 754, the typical floating point implementation.)

Sixteen decimal places is a lot. Hardly any measured quantity is known to anywhere near that much precision. For example, the constant in Newton’s Law of Gravity is only known to four significant figures. The charge of an electron is known to 11 significant figures, much more precision than Newton’s gravitational constant, but still less than a floating point number. So when are 16 figures not enough? One problem area is subtraction. The other elementary operations — addition, multiplication, division — are very accurate. As long as you don’t overflow or underflow,  these operations often produce results that are correct to the last bit. But subtraction can be anywhere from exact to completely inaccurate.  If two numbers agree to n figures, you can lose up to n figures of precision in their subtraction. This problem can show up unexpectedly in the middle of other calculations. For an example, see this post on calculating standard deviation. See also computing derivatives, the second example in  Five Tips for Floating Point Programming.

What about overflow or underflow? When do you need numbers bigger than 10308? Often you don’t. But in probability calculations, for example, you need them all the time unless you’re clever. It’s common in probability to compute a medium-sized number that is the product of an astronomically large number and an infinitesimally small number. The final result fits into a computer just fine, but the intermediate numbers might not due to overflow or underflow. For example, the maximum floating point number on most computers is somewhere between 170 factorial and 171 factorial. Such large factorials often appear in applications, often in ratios with other large factorials. See Avoiding Overflow, Underflow, and Loss of Precision for tricks on how to work with factorials that would overflow if computed directly.

Often you can afford to be blissfully ignorant of the details of floating point arithmetic, but sometimes you cannot. A great place to learn more is David Goldberg’s paper What Every Computer Scientist Should Know About Floating-Point Arithmetic.

Update: See follow-up post, Anatomy of a floating point number.

Related posts:

How to compute binomial probabilities
NaN, 1.#IND, 1.#INF, and all that

{ 15 comments }

Interpolation errors

by John on April 1, 2009

Say you have a function f(x) and you want to find a polynomial p(x) that agrees with f(x) at several points. Given a set of points x0, x1, x2, … xn you can always find a polynomial of degree n such that p(xi) = f(xi) for i = 0, 1, 2, …, n. It seems reasonable that the more points you pick, the better the interpolating polynomial p(x) will match the function f(x). If the two functions match exactly at a lot of points, they should match well everywhere. Sometimes that’s true and sometimes it’s not.

Here is a famous example due to Carle Runge. Let f(x) = 1/(1 + x2) and let pn be the polynomial that interpolates f(x) at n+1 evenly spaced nodes in the interval [-5, 5]. As n becomes larger, the fit becomes worse.

Here’s a graph of f(x) and p9(x). The smooth blue line is f(x) and the wiggly red line is p9(x).

graph of f(x) and p9(x)

Here’s the analogous graph for p16(x).

graph of f(x) and p16(x)

The fit is improving in the middle. In fact, the curves agree to within the thickness of the plot line from say -1 to 1. But the fit is so bad in the tails that the graph had to be cut off. Here’s another plot of f(x) and p16(x) on a different scale to show how far negative the polynomial dips.

graph of f(x) and p16(x) on different scale

The problem is the spacing of the nodes. Interpolation errors are bad for evenly spaced nodes. If we interpolate f(x) at different points, at the Chebyshev nodes, then the fit is good.

interpolating at Chebyshev nodes

The Chebyshev nodes on [-1, 1] are xi = cos( π i / n ). Here we multiplied these nodes by 5 to scale to the interval [-5, 5].

If the function f(x) is absolutely continuous, as in our example, then the interpolating polynomials converge uniformly when you interpolate at Chebyshev nodes. However, ordinary continuity is not enough. Given any sequence of nodes, there exists a continuous function such that the polynomial interpolation error grows like log(n) as the number of nodes n increases.

Some numerical integration methods are based on interpolating polynomials: fit a polynomial to the integrand, then integrate the polynomial exactly to approximate the original integral. The examples above suggest that increasing the order of such integration methods might not improve accuracy and might even make things worse.

{ 7 comments }