Drawing with a compass on a globe

Take a compass and draw a circle on a globe. Then take the same compass, opened to the same width, and draw a circle on a flat piece of paper. Which circle has more area?

If the circle is small compared to the radius of the globe, then the two circles will be approximately equal because a small area on a globe is approximately flat.

To get an idea what happens for larger circles, let’s a circle on the globe as large as possible, i.e. the equator. If the globe has radius r, then to draw the equator we need our compass to be opened a width of √2 r, the distance from the north pole to the equator along a straight line cutting through the globe.

The area of a hemisphere is 2πr². If we take our compass and draw a circle of radius √2 r on a flat surface we also get an area of 2πr². And by continuity we should expect that if we draw a circle that is nearly as big as the equator then the corresponding circle on a flat surface should have approximately the same area.

Interesting. This says that our compass will draw a circle with the same area whether on a globe or on a flat surface, at least approximately, if the width of the compass sufficiently small or sufficiently large. In fact, we get exactly the same area, regardless of how wide the compass is opened up. We haven’t proven this, only given a plausibility argument, but you can find a proof in [1].

Note that the width w of the compass is the radius of the circle drawn on a flat surface, but it is not the radius of the circle drawn on the globe. The width w is greater than the radius of the circle, but less than the distance along the sphere from the center of the circle. In the case of the equator, the radius of the circle is r, the width of the compass is √2 r , and the distance along the sphere from the north pole to the equator is πr/2.

Related posts

[1] Nick Lord. On an alternative formula for the area of a spherical cap. The Mathematical Gazette, Vol. 102, No. 554 (July 2018), pp. 314–316

The negative binomial distribution and Pascal’s triangle

The Poisson probability distribution gives a simple, elegant model for count data. You can even derive from certain assumptions that data must have a Poisson distribution. Unfortunately reality doesn’t often go along with those assumptions.

A Poisson random variable with mean λ also has variance λ. But it’s often the case that data that would seem to follow a Poisson distribution has a variance greater than its mean. This phenomenon is called over-dispersion: the dispersion (variance) is larger than a Poisson distribution assumption would allow.

One way to address over-dispersion is to use a negative binomial distribution. This distribution has two parameters, r and p, and has the following probability mass function (PMF).

P(X = x) = \binom{r + x - 1}{x} p^r(1-p)^x

As the parameter r goes to infinity, the negative binomial distribution converges to a Poisson distribution. So you can think of the negative binomial distribution as a generalization of the Poisson distribution.

These notes go into the negative binomial distribution in some detail, including where its name comes from.

If the parameter r is a non-negative integer, then the binomial coefficients in the PMF for the negative binomial distribution are on the (r+1)st diagonal of Pascal’s triangle.

Pascal's triangle

The case r = 0 corresponds to the first diagonal, the one consisting of all 1s. The case r = 1 corresponds to the second diagonal consisting of consecutive integers. The case r = 2 corresponds to the third diagonal, the one consisting of triangular numbers. And so forth.

Related posts

A strange take on the harmonic series

It is well known that the harmonic series

1 + ½ + ⅓ + ¼ + …

diverges. But if you take the denominators as numbers in base 11 or higher, the series converges [1].

I wonder what inspired this observation. Maybe Brewster was bored, teaching yet another cohort of students that the harmonic series diverges, and let his mind wander.

Proof

Let f(n) be the function that takes a positive integer n, writes it in base 10, then reinterprets the result as a number in base b where b > 10. Brewster is saying that the sum of the series 1/f(n) converges.

To see this, note that the first 10 terms are less than or equal to 1. The next 100 terms are less than 1/b. The next 1000 terms are less than 1/b², and so on. This means the series is bounded by the geometric series 10 (10/b)m.

Python

Incidentally, despite being an unusual function, f is very easy to implement in Python:

   def f(n, b): return int(str(n), b)

Citation

Brewster’s note was so brief that I will quote it here in full.

The [harmonic series] is divergent. But if the denominators of the terms are read as numbers in scale 11 or any higher scale, the series is convergent, and the sum is greater than 2.828 and less than 26.29. The convergence is rather slow. I estimate that, to find the last number by direct addition, one would have to work out 1090 terms, to about 93 places of decimals.

[1] G. W. Brewster. An Old Result in a New Dress. The Mathematical Gazette, Vol. 37, No. 322 (Dec., 1953), pp. 269–270.

Variance matters more than mean in the extremes

Suppose you have two normal random variables, X and Y, and that the variance of X is less than the variance of Y.

Let M be an equal mixture of X and Y. That is, to sample from M, you first chose X or Y with equal probability, then you choose a sample from the random variable you chose.

Now suppose you’ve observed an extreme value of M. Then it is more likely the that the value came from Y. The means of X and Y don’t matter, other than determining the cutoff for what “extreme” means.

High-level math

To state things more precisely, there is some value t such that the posterior probability that a sample m from M came from Y, given that |m| > t, is greater than the posterior probability that m came from X.

Let’s just look at the right-hand tails, even though the principle applies to both tails. If X and Y have the variance, but the mean of X is greater, then larger values of Z are more likely to have come from X. Now suppose the variance of Y is larger. As you go further out in the right tail of M, the posterior probability of an extreme value having come from Y increases, and eventually it surpasses the posterior probability of the sample having come from Y. If X has a larger mean than Y that will delay the point at which the posterior probability of Y passes the posterior probability of X, but eventually variance matters more than mean.

Detailed math

Let’s give a name to the random variable that determines whether we choose X or Y. Let’s call it C for coin flip, and assume C takes on 0 and 1 each with probability 1/2. If C = 0 we sample from X and if C = 1 we sample from Y. We want to compute the probability P(C = 1 | Mt).

Without loss of generality we can assume X has mean 0 and variance 1. (Otherwise transform X and Y by subtracting off the mean of X then divide by the standard deviation of X.) Denote the mean of Y by μ and the standard deviation by σ.

From Bayes’ theorem we have

\text{P}(C = 1 \mid M \geq t) = \frac{ \text{P}(Y \geq t) }{ \text{P}(X \geq t) + \text{P}(Y \geq t) } = \frac{\Phi^c\left(\frac{t-\mu}{\sigma}\right)}{\Phi^c(t) + \Phi^c\left(\frac{t-\mu}{\sigma}\right)}

where Φc(t) = P(Zt) for a standard normal random variable.

Similarly, to compute P(C = 1 | Mt) just flip the direction of the inequality signs replace Φc(t) = P(Zt) with Φ(t) = P(Zt).

The calculation for P(C = 1  |  |M| ≥ t) is similar

Example

Suppose Y has mean −2 and variance 10. The blue curve shows that a large negative sample from M very likely comes from Y and the orange line shows that large positive sample very likely comes from Y as well.

The dip in the orange curve shows the transition zone where Y‘s advantage due to a larger mean gives way to the disadvantage of a smaller variance. This illustrates that the posterior probability of Y increases eventually but not necessarily monotonically.

Here’s a plot showing the probability of a sample having come from Y depending on its absolute value.

Related posts

Increasing speed due to friction

Orbital mechanics is fascinating. I’ve learned a bit about it for fun, not for profit. I seriously doubt Elon Musk will ever call asking me to design an orbit for him. [1]

One of the things that makes orbital mechanics interesting is that it can be counter-intuitive. For example, atmospheric friction can make a satellite move faster. How can this be? Doesn’t friction always slow things down?

Friction does reduce a satellite’s tangential velocity, causing it to move into a lower orbit, which increases its velocity. It’s weird to think about, but the details are worked out in [2].

Note the date on the article: May 1958. The paper was written in response to Sputnik 1 which launched in October 1957. Parkyn’s described the phenomenon of acceleration due to friction in general, and how it applied to Sputnik in particular.

Related posts

[1] I had a lead on a project with NASA once, but it wasn’t orbital mechanics, and the lead didn’t materialize.

[2] D. G. Parkyn. The Effect of Friction on Elliptic Orbits. The Mathematical Gazette. Vol. 42, No. 340 (May, 1958), pp. 96-98

Ptolemy’s theorem

Draw a quadrilateral by pick four arbitrary points on a circle and connecting them cyclically.

inscribed quadrilateral

Now multiply the lengths of the pairs of opposite sides. In the diagram below this means multiplying the lengths of the two horizontal-ish blue sides and the two vertical-ish orange sides.

quadrilateral with opposite sides colored

Ptolemy’s theorem says that the sum of the two products described above equals the product of the diagonals.

inscribed quadrilateral with diagonals

To put it in colorful terms, the product of the blue sides plus the product of the orange sides equals the product of the green diagonals.

The converse of Ptolemy’s theorem also holds. If the relationship above holds for a quadrilateral, then the quadrilateral can be inscribed in a circle.

Note that if the quadrilateral in Ptolemy’s theorem is a rectangle, then the theorem reduces to the Pythagorean theorem.

Related posts

Rule for converting trig identities into hyperbolic identities

There is a simple rule of thumb for converting between (circular) trig identities and hyperbolic trig identities known as Osborn’s rule: stick an h on the end of trig functions and flip signs wherever two sinh functions are multiplied together.

Examples

For example, the circular identity

sin(θ + φ) = sin(θ) cos(φ) + cos(θ) sin(φ)

becomes the hyperbolic identity

sinh(θ + φ) = sinh(θ) cosh(φ) + cosh(θ) sinh(φ)

but the identity

2 sin(θ) sin(φ) = cos(θ − φ) − cos(θ + φ)

becomes

2 sinh(θ) sinh(φ) = cosh(θ + φ) − cosh(θ − φ)

because there are two sinh terms.

Derivation

Osborn’s rule isn’t deep. It’s a straight-forward application of Euler’s theorem:

exp(iθ) = cos(θ) + i sin(θ).

More specifically, Osborn’s rule follows from two corollaries of Euler’s theorem:

sin(iθ) = i sinh(θ)
cos(iθ) = cosh(θ)

Why bother?

The advantage of Osborn’s rule is that it saves time, and perhaps more importantly, it reduces the likelihood of making a mistake.

You could always derive any identity you need on the spot. All trig identities—circular or hyperbolic—are direct consequences of Euler’s theorem. But it saves time to work at a higher level of abstraction. And as I’ve often said in the context of more efficient computer usage, the advantage of doing things faster is not so much the time directly saved but the decreased probability of losing your train of thought.

Caveats

Osborn’s rule included implicit expressions of sinh, such as in tanh = sinh / cosh. So, for example, the circular identity

tan(2θ) = 2 tan(θ) / (1 − tan²(θ))

becomes

tanh(2θ) = 2 tanh(θ) / (1 + tanh²(θ))

because the tanh² term implicitly contains two sinh terms.

Original note

Osborn’s original note [1] from 1902 is so short that I include the entire text below:

Related posts

[1] G. Osborn. Mnemonic for Hyperbolic Formulae. The Mathematical Gazette, Vol. 2, No. 34 (Jul., 1902), p. 189

Interpolation and the cotanc function

This weekend I wrote three posts related to interpolation:

The first post looks at reducing the size of mathematical tables by switching for linear to quadratic interpolation. The immediate application is obsolete, but the principles apply to contemporary problems.

The second post looks at alternatives to Lagrange interpolation that are much better suited to hand calculation. The final post is a tangent off the middle post.

Tau and sigma functions

In the process of writing the posts above, I looked at Chambers Six Figure Mathematical Tables from 1964. There I saw a couple curious functions I hadn’t run into before, functions the author called τ and σ.

τ(x) = x cot x
σ(x) = x csc x

So why introduce these two functions? The cotangent and cosecant functions have a singularity at 0, and so its difficult to tabulate and interpolate these functions. I touched on something similar in my recent post on interpolating the gamma function: because the function grows rapidly, linear interpolation gives bad results. Interpolating the log of the gamma function gives much better results.

Chambers tabulates τ(x) and σ(x) because these functions are easy to interpolate.

The cotanc function

I’ll refer to Chambers’ τ function as the cotanc function. This is a whimsical choice, not a name used anywhere else as far as I know. The reason for the name is as follows. The sinc function

sinc(x) = sin(x)/x

comes up frequently in signal processing, largely because it’s the Fourier transform of the indicator function of an interval. There are a few other functions that tack a c onto the end of a function to indicate it has been divided by x, such as the jinc function.

The function tan(x)/x is sometimes called the tanc function, though this name is far less common than sinc. The cotangent function is the reciprocal of the tangent function, so I’m calling the reciprocal of the tanc function the cotanc function. Maybe it should be called the cotank function just for fun. For category theorists this brings up images of a tank that fires backward.

Practicality of the cotanc function

As noted above, the cotangent function is ill-behaved near 0, but the cotanc function is very nicely behaved near 0. The cotanc function has singularities at non-zero multiples of π but multiplying by x removes the singularity at 0.

As noted here, interpolation error depends on the size of the derivatives of the function being interpolated. Since the cotanc function is flat and smooth, it has small derivatives and thus small interpolation error.

Binomial coefficients with non-integer arguments

When n and r are positive integers, with nr, there is an intuitive interpretation of the binomial coefficient C(n, r), namely the number of ways to select r things from a set of n things. For this reason C(n, r) is usually pronounced “n choose r.”

But what might something like C(4.3, 2)? The number of ways to choose two giraffes out of a set of 4.3 giraffes?! There is no combinatorial interpretation for binomial coefficients like these, though they regularly come up in applications.

It is possible to define binomial coefficients when n and r are real or even complex numbers. These more general binomial coefficients are in this liminal zone of topics that come up regularly, but not so regularly that they’re widely known. I wrote an article about this a decade ago, and I’ve had numerous occasions to link to it ever since.

The previous post implicitly includes an application of general binomial coefficients. The post alludes to coefficients that come up in Bessel’s interpolation formula but doesn’t explicitly say what they are. These coefficients Bk can be defined in terms of the Gaussian interpolation coefficients, which are in turn defined by binomial coefficients with non-integer arguments.

\begin{eqnarray*} G_{2n} &=& {p + n - 1 \choose 2n} \\ G_{2n+1} &=& {p + n \choose 2n + 1} \\ B_{2n} &=& \frac{1}{2}G_{2n} \\ B_{2n+1} &=& G_{2n+1} - \frac{1}{2} G_{2n} \end{eqnarray*}

Note that 0 < p < 1.

The coefficients in Everett’s interpolation formula can also be expressed simply in terms of the Gauss coefficients.

\begin{eqnarray*} E_{2n} &=& G_{2n} - G_{2n+1} \\ F_{2n} &=& G_{2n+1} \\ \end{eqnarray*}

Bessel, Everett, and Lagrange interpolation

I never heard of Bessel or Everett interpolation until long after college. I saw Lagrange interpolation several times. Why Lagrange and not Bessel or Everett?

First of all, Bessel interpolation and Everett interpolation are not different kinds of interpolation; they are different algorithms for carrying out the same interpolation as Lagrange. There is a unique polynomial of degree n fitting a function at n + 1 points, and all three methods evaluate this same polynomial.

The advantages of Bessel’s approach or Everett’s approach to interpolation are practical, not theoretical. In particular, these algorithms are practical when interpolating functions from tables, by hand. This was a lost art by the time I went to college. Presumably some of the older faculty had spent hours computing with tables, but they never said anything about it.

Bessel interpolation and Everett interpolation are minor variations on the same theme. This post will describe both at such a high level that there’s no difference between them. This post is high-level because that’s exactly what seems to be missing in the literature. You can easily find older books that go into great detail, but I believe you’ll have a harder time finding a survey presentation like this.

Suppose you have a function f(x) that you want to evaluate at some value p. Without loss of generality we can assume our function has been shifted and scaled so that we have tabulated f at integers our value p lies between 0 and 1.

Everett’s formula (and Bessel’s) cleanly separates the parts of the interpolation formula that depend on f and the parts that depend on p.

The values of the function f enter through the differences of the values of f at consecutive integers, and differences of these differences, etc. These differences are easy to calculate by hand, and sometimes were provided by the table publisher.

The functions of p are independent of the function being interpolated, so these functions, the coefficients in Bessel’s formula and Everett’s formula, could be tabulated as well.

If the function differences are tabulated, and the functions that depend on p are tabulated, you could apply polynomial interpolation without ever having to explicitly evaluate a polynomial. You’d just need to evaluate a sort of dot product, a sum of numbers that depend on f multiplied by numbers that depend on p.

Another advantage of Bessel’s and Everett’s interpolation formulas is that they can be seen as a sequence of refinements. First you obtain the linear interpolation result, then refine it to get the quadratic interpolation result, then add terms to get the cubic interpolation result, etc.

This has several advantages. First, you have the satisfaction of seeing progress along the way; Lagrange interpolation may give you nothing useful until you’re done. Related to this is a kind of error checking: you have a sense of where the calculations are going, and intermediate results that violate your expectations are likely to be errors. Finally, you can carry out the calculations for the smallest terms in with less precision. You can use fewer and fewer digits in your hand calculations as you compute successive refinements to your result.

Related posts