A couple more variations on an ancient theme

I’ve written a couple posts on the approximation

\cos x \approx \frac{\pi^2 - 4x^2}{\pi^2 + x^2}

by the Indian astronomer Aryabhata (476–550). The approximation is accurate for x in [−π/2, π/2].

The first post collected a Twitter thread about the approximation. The second looked at how far the coefficients in Aryabhata’s approximation are from the coefficients of the optimal approximation by a ratio of quadratics.

This post will answer a couple of questions. First, what value of π did Aryabhata have, and how would that affect the approximation error? Second, how bad would Aryabhata’s approximation be if we used the approximation π² ≈ 10?

Using Aryabhata’s value of π

Aryabhata knew the value 3.1416 for π. We know this because he said that a circle of diameter 20,000 would have circumference 62,832. We don’t know for certain, but it’s plausible that he knew π more accurately and rounded it to the value these figures imply.

Substituting 3.1416 for π changes the approximation in the sixth decimal place. But the approximation is only good to three decimal places, so 3.1416 is as good as a more accurate value of π as far as the error in approximating cosine is concerned.

Using π² ≈ 10

Substituting 10 for π² in Aryabhata’s approximation gives an approximation that’s convenient to evaluate by hand.

\cos x \approx \frac{10 - 4x^2}{10 + x^2}

It’s very accurate for small values of x but the maximum error increases from 0.00163 to 0.01091. Here’s a plot of the error.
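
Here’s a quick numerical check of those two maximum errors, a minimal sketch using numpy over a fine grid:

    import numpy as np

    x = np.linspace(-np.pi/2, np.pi/2, 100001)
    aryabhata = (np.pi**2 - 4*x**2) / (np.pi**2 + x**2)
    approx10  = (10 - 4*x**2) / (10 + x**2)
    print(np.abs(np.cos(x) - aryabhata).max())  # about 0.00163
    print(np.abs(np.cos(x) - approx10).max())   # about 0.0109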

Finding pi in the alphabet

Write the letters of the alphabet around a circle, then strike out the letters that are symmetrical about a vertical line. The remaining letters are grouped in clumps of 3, 1, 4, 1, and 6 letters.

I’ve heard that this observation is due to Martin Gardner, but I don’t have a specific reference.

In case you’re interested, here’s the Python script I wrote to make the image above.

    from numpy import *
    import matplotlib.pyplot as plt
    
    for i in range(26):
        letter = chr(ord('A') + i)
        if letter in "AHIMOTUVWXY":
            # Strike out the letters that are symmetric about a vertical
            # line by overprinting an equivalence sign on the letter.
            latex = r"$\equiv\!\!\!\!\!" + letter + "$"
        else:
            latex = f"${letter}$"
        # Place the letters clockwise around a circle, with A at the top.
        theta = pi/2 - 2*pi*i/26
        pt = (0.5*cos(theta), 0.5*sin(theta))
        plt.plot(pt[0], pt[1], ' ')  # invisible marker, just to set the plot limits
        plt.annotate(latex, pt, fontsize="xx-large")
    plt.axis("off")
    plt.gca().set_aspect("equal")
    plt.savefig("alphabet_pi.png")

Ancient accurate approximation for sine

This post started out as a Twitter thread. The text below is the same as that of the thread, after correcting an error in the first part. I also added a footnote on a theorem the thread alluded to.

***

The following approximation for sin(x) is remarkably accurate for 0 < x < π.

\sin(x) \approx \frac{16x(\pi - x)}{5\pi^2 - 4x(\pi - x)}

The approximation is so good that you can’t see the difference between the exact value and the approximation until you get outside the range of the approximation.

Here’s a plot of just the error.
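
If you’d like to reproduce the error plot, here’s a minimal matplotlib sketch:

    import numpy as np
    import matplotlib.pyplot as plt

    x = np.linspace(0, np.pi, 1000)
    approx = 16*x*(np.pi - x) / (5*np.pi**2 - 4*x*(np.pi - x))
    plt.plot(x, np.sin(x) - approx)
    plt.xlabel("$x$")
    plt.ylabel("error")
    plt.show()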

This is a very old approximation, dating back to Aryabhata I, around 500 AD.

In modern terms, it is a rational approximation, quadratic in the numerator and denominator. It’s not quite optimal since the ripples in the error function are not of equal height [1], but the coefficients are nice numbers.

***

As pointed out in the comments, replacing x with π/2 − x in order to get an approximation for cosine gives a much nicer equation.

\cos x \approx \frac{\pi^2 - 4x^2}{\pi^2 + x^2}

***

[1] The equioscillation theorem says that the optimal approximation will have ripples of equal positive and negative amplitude. This post explores the equioscillation theorem further and finds how far Aryabhata’s approximation is from optimal.

A strange take on the harmonic series

It is well known that the harmonic series

1 + ½ + ⅓ + ¼ + …

diverges. But if you take the denominators as numbers in base 11 or higher, the series converges, an observation due to G. W. Brewster [1].

I wonder what inspired this observation. Maybe Brewster was bored, teaching yet another cohort of students that the harmonic series diverges, and let his mind wander.

Proof

Let f(n) be the function that takes a positive integer n, writes it in base 10, then reinterprets the result as a number in base b where b > 10. Brewster is saying that the sum of the series 1/f(n) converges.

To see this, note that the first 10 terms are each at most 1, the next 90 terms (those with two-digit denominators) are each at most 1/b, the next 900 terms are each at most 1/b², and so on. This means the series is bounded by the geometric series

\sum_{m=0}^\infty 10 \left(\frac{10}{b}\right)^m = \frac{10b}{b - 10}

which is finite because b > 10.

Python

Incidentally, despite being an unusual function, f is very easy to implement in Python:

   def f(n, b): return int(str(n), b)
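
For example, here’s a quick look at a partial sum with b = 11. As Brewster’s note below says, the convergence is far too slow for direct summation to get anywhere near the limit.

    total = 0
    for n in range(1, 100001):
        total += 1/f(n, 11)
    print(total)  # partial sum of the first 10^5 terms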

Citation

Brewster’s note was so brief that I will quote it here in full.

The [harmonic series] is divergent. But if the denominators of the terms are read as numbers in scale 11 or any higher scale, the series is convergent, and the sum is greater than 2.828 and less than 26.29. The convergence is rather slow. I estimate that, to find the last number by direct addition, one would have to work out 10⁹⁰ terms, to about 93 places of decimals.

[1] G. W. Brewster. An Old Result in a New Dress. The Mathematical Gazette, Vol. 37, No. 322 (Dec., 1953), pp. 269–270.

Rule for converting trig identities into hyperbolic identities

There is a simple rule of thumb for converting between (circular) trig identities and hyperbolic trig identities known as Osborn’s rule: stick an h on the end of trig functions and flip signs wherever two sinh functions are multiplied together.

Examples

For example, the circular identity

sin(θ + φ) = sin(θ) cos(φ) + cos(θ) sin(φ)

becomes the hyperbolic identity

sinh(θ + φ) = sinh(θ) cosh(φ) + cosh(θ) sinh(φ)

but the identity

2 sin(θ) sin(φ) = cos(θ − φ) − cos(θ + φ)

becomes

2 sinh(θ) sinh(φ) = cosh(θ + φ) − cosh(θ − φ)

because there are two sinh terms.
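
If you’d like to double-check an identity produced by the rule, a computer algebra system makes that easy. Here’s a quick check of the identity above with sympy:

    import sympy as sp

    theta, phi = sp.symbols("theta phi")
    lhs = 2*sp.sinh(theta)*sp.sinh(phi)
    rhs = sp.cosh(theta + phi) - sp.cosh(theta - phi)
    print(sp.simplify(sp.expand_trig(rhs) - lhs))  # prints 0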

Derivation

Osborn’s rule isn’t deep. It’s a straightforward application of Euler’s theorem:

exp(iθ) = cos(θ) + i sin(θ).

More specifically, Osborn’s rule follows from two corollaries of Euler’s theorem:

sin(iθ) = i sinh(θ)
cos(iθ) = cosh(θ)
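
For example, replacing the arguments of a product of two sines with iθ and iφ shows where the sign flip comes from:

sin(iθ) sin(iφ) = (i sinh(θ))(i sinh(φ)) = −sinh(θ) sinh(φ)

A product of two sines picks up a factor of i² = −1, while sines and cosines on their own simply turn into their hyperbolic counterparts.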

Why bother?

The advantage of Osborn’s rule is that it saves time, and perhaps more importantly, it reduces the likelihood of making a mistake.

You could always derive any identity you need on the spot. All trig identities—circular or hyperbolic—are direct consequences of Euler’s theorem. But it saves time to work at a higher level of abstraction. And as I’ve often said in the context of more efficient computer usage, the advantage of doing things faster is not so much the time directly saved but the decreased probability of losing your train of thought.

Caveats

Osborn’s rule includes implicit occurrences of sinh, such as the one in tanh = sinh / cosh. So, for example, the circular identity

tan(2θ) = 2 tan(θ) / (1 − tan²(θ))

becomes

tanh(2θ) = 2 tanh(θ) / (1 + tanh²(θ))

because the tanh² term implicitly contains two sinh terms.

Original note

Osborn’s original note [1] from 1902 is remarkably short, taking up just a single page of The Mathematical Gazette.


[1] G. Osborn. Mnemonic for Hyperbolic Formulae. The Mathematical Gazette, Vol. 2, No. 34 (Jul., 1902), p. 189.

Binomial coefficients with non-integer arguments

When n and r are positive integers, with n ≥ r, there is an intuitive interpretation of the binomial coefficient C(n, r), namely the number of ways to select r things from a set of n things. For this reason C(n, r) is usually pronounced “n choose r.”

But what might something like C(4.3, 2) mean? The number of ways to choose two giraffes out of a set of 4.3 giraffes?! There is no combinatorial interpretation for binomial coefficients like these, though they regularly come up in applications.

It is possible to define binomial coefficients when n and r are real or even complex numbers. These more general binomial coefficients are in this liminal zone of topics that come up regularly, but not so regularly that they’re widely known. I wrote an article about this a decade ago, and I’ve had numerous occasions to link to it ever since.
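
If you just need to evaluate such a coefficient, scipy computes binomial coefficients with non-integer arguments via the gamma function:

    from scipy.special import binom

    print(binom(4.3, 2))  # 7.095, i.e. 4.3*3.3/2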

The previous post implicitly includes an application of general binomial coefficients. The post alludes to coefficients that come up in Bessel’s interpolation formula but doesn’t explicitly say what they are. These coefficients Bₖ can be defined in terms of the Gaussian interpolation coefficients, which are in turn defined by binomial coefficients with non-integer arguments.

\begin{eqnarray*} G_{2n} &=& {p + n - 1 \choose 2n} \\ G_{2n+1} &=& {p + n \choose 2n + 1} \\ B_{2n} &=& \frac{1}{2}G_{2n} \\ B_{2n+1} &=& G_{2n+1} - \frac{1}{2} G_{2n} \end{eqnarray*}

Note that 0 < p < 1.

The coefficients in Everett’s interpolation formula can also be expressed simply in terms of the Gauss coefficients.

\begin{eqnarray*} E_{2n} &=& G_{2n} - G_{2n+1} \\ F_{2n} &=& G_{2n+1} \end{eqnarray*}
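
Here’s one way these definitions might be translated into Python, leaning on scipy’s binom for the general binomial coefficients. The function names are mine, just for illustration.

    from scipy.special import binom

    def G(k, p):
        # k-th Gaussian interpolation coefficient, 0 < p < 1
        n = k // 2
        return binom(p + n - 1, k) if k % 2 == 0 else binom(p + n, k)

    def B(k, p):
        # k-th coefficient in Bessel's interpolation formula
        return G(k, p)/2 if k % 2 == 0 else G(k, p) - G(k - 1, p)/2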

Math’s base 32 versus Linux’s base 32

The convention in math for writing numbers in bases larger than 10 is to insert capital letters after 9, starting with A. So, for example, the digits in base 12 are 0, 1, 2, …, 9, A, and B.

So if you’re familiar with math but not Linux, and you run across the base32 utility, you might naturally assume that the command converts numbers to base 32 using the symbols 0, 1, 2, …, 9, A, B, C, …, V. That’s a reasonable guess, but it actually uses the symbols A, B, C, …, Z, 2, 3, 4, 5, 6, and 7. It’s all described in RFC 3548.

What’s going on? The purpose of base 32 encoding is to render binary data in a way that is human readable and capable of being processed by software that was originally written with human readable input in mind. The purpose is not to carry out mathematical operations.

Note that the digit 0 is not used, because it’s visually similar to the letter O. The digit 1 is also not used, perhaps because it looks like a lowercase l in some fonts.
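
You can see the encoding in action using Python’s standard library:

    import base64

    # RFC 3548 base 32 alphabet: A through Z, then 2 through 7
    print(base64.b32encode(b"hello"))  # b'NBSWY3DP'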


Calculating when a planet will appear to move backwards

When we say that the planets in our solar system orbit the sun, not the earth, we mean that the motion of the planets is much simpler to describe from the vantage point of the sun. The sun is no more the center of the universe than the earth is. Describing the motion of the planets from the perspective of our planet is not wrong, but it is inconvenient. (For some purposes. It’s quite convenient for others.)

The word planet means “wanderer.” This is because the planets appear to wander in the night sky. Unlike the stars, which appear to orbit the earth in a circle as the earth orbits the sun, planets occasionally appear to reverse course. When a planet appears to move backward, this is called retrograde motion.

Apparent motion of Venus

Here’s what the motion of Venus would look like over a period of 8 years as explored here.

Venus from Earth

Venus completes 13 orbits around the sun in the time it takes Earth to complete 8 orbits. The ratio isn’t exactly 13 to 8, but it’s very close. Five times over the course of eight years Venus will appear to reverse course for a few days. How many days? We will get to that shortly.

When we speak of the motion of the planets through the night sky, we’re not talking about their rising and setting each day due to the rotation of the earth on its axis. We’re talking about their motion from night to night. The image above is how an observer far above the Earth and not rotating with the Earth would see the position of Venus over the course of eight years.

The orbit of Venus as seen from earth is beautiful but complicated. From the Copernican perspective, the orbits of Earth and Venus are simply concentric circles. You may bristle at my saying planets have circular rather than elliptical orbits [1]. The orbits are not exactly circles, but are so close to circular that you cannot see the difference. For the purposes of this post, we’ll assume planets orbit the sun in circles.

Calculating retrograde periods

There is a surprisingly simple equation [2] for finding the points where a planet will appear to change course:

\cos(kt) = \frac{\sqrt{Rr}}{R + r - \sqrt{Rr}}

Here r is the radius of Earth’s orbit and R is the radius of the other planet’s orbit [3]. The constant k is the difference in angular velocities of the two planets. You can solve this equation for the times when the apparent motion changes.

Note that the equation is entirely symmetric in r and R. So a Venusian observing Earth and an Earthling observing Venus would agree on the times when the apparent motions of the two planets reverse.

Example calculation

Let’s find when Venus enters and leaves retrograde motion. Here are the constants we need.

    from numpy import pi, sqrt

    r = 1       # AU, radius of Earth's orbit
    R = 0.72332 # AU, radius of Venus's orbit

    venus_year = 224.70 # days
    earth_year = 365.24 # days
    k = 2*pi/venus_year - 2*pi/earth_year  # difference in angular velocities

    c = sqrt(r*R) / (r + R - sqrt(r*R))    # right-hand side of the equation above

With these constants we can now plot cos(kt) and see when it equals c.
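
Here’s a minimal sketch of that plot, continuing from the constants above. Since cos(kt) = c can also be solved directly with an inverse cosine, the last two lines compute the numbers discussed below.

    from numpy import arccos, cos, linspace
    import matplotlib.pyplot as plt

    t = linspace(0, 8*earth_year, 1000)
    plt.plot(t, cos(k*t))
    plt.axhline(c, linestyle="--")
    plt.xlabel("days")
    plt.show()

    print(arccos(c)/k)  # about 21 days: when Venus first resumes prograde motion
    print(2*pi/k)       # about 584 days between alignments of Earth and Venus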

This shows there are five times over the course of eight years when Venus is in apparent retrograde motion.

If we set time t = 0 to be a time when Earth and Venus are aligned, we start in the middle of a retrograde period. Venus enters prograde motion 21 days later, and the next retrograde period begins at day 563. So out of every 584 days, Venus spends 42 days in retrograde motion and 542 days in prograde motion.


[1] Planets do not exactly orbit in circles. They don’t exactly orbit in ellipses either. Modeling orbits as ellipses is much more accurate than modeling them as circles, but still not perfectly accurate.

[2] 100 Great Problems of Elementary Mathematics: Their History and Solution. By Heinrich Dörrie. Dover, 1965.

[3] There’s nothing unique about observing planets from Earth. Here “Earth” simply means the planet you’re viewing from.

A “well-known” series

I was reading an article [1] that refers to “a well-known trigonometric series” that I’d never seen before. That article cites [2], which gives the series as

\begin{align*} \frac{\sin m\phi}{\cos \phi} &= m\sin\phi - \frac{m(m^2-2^2)}{3!}\sin^3\phi \\ &\phantom{=} \;+ \frac{m(m^2-2^2)(m^2 - 4^2)}{5!}\sin^5\phi - \cdots \end{align*}

Note that the right hand side is not a series in φ but rather in sin φ.

Motivation

Why might you know sin φ and want to calculate sin mφ / cos φ? This doesn’t seem like a sufficiently common task for the series to be well known. The references are over a century old; maybe the series was useful in hand calculations in a way that isn’t necessary anymore.

However, [1] was using the series for a theoretical derivation, not for calculation; the author was doing a somewhat hand-wavy derivation, sticking the difference operator E into a series as if it were a number, a technique known as “umbral calculus.” The name comes from the Latin word umbra for shadow. The name referred to the “shadowy” nature of the technique, which wasn’t made rigorous until much later.

Convergence

The series above terminates if m is an even integer. But there are no restrictions on m, and in general the series is infinite.

The series obviously has trouble if cos φ = 0, i.e. when φ = ±π/2, but it converges for all m if −π/2 < φ < π/2.

Tangent

If m = 1, sin mφ / cos φ is simply tan φ. The function tan φ has a complicated power series in φ involving Bernoulli numbers, but it has a simpler power series in sin φ.
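
Here’s a short Python sketch that checks the series numerically. It accumulates the running product m(m² − 2²)(m² − 4²) ⋯ one factor at a time; the function name is mine, not from the references.

    from math import factorial, sin, cos

    def hobson_series(m, phi, terms=20):
        s, total, prod = sin(phi), 0.0, m
        for k in range(terms):
            total += (-1)**k * prod * s**(2*k + 1) / factorial(2*k + 1)
            prod *= m*m - (2*(k + 1))**2
        return total

    m, phi = 2.5, 0.7
    print(hobson_series(m, phi))  # matches the exact value below
    print(sin(m*phi)/cos(phi))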

References

[1] G. J. Lidstone. Notes on Everett’s Interpolation Formula. 1922.

[2] E. W. Hobson. A Treatise on Plane Trigonometry. Fourth Edition, 1918. Page 276.

Approximation by prime powers

The well-known Weierstrass approximation theorem says that polynomials are dense in C[0, 1]. That is, given any continuous function f on the unit interval, and any ε > 0, you can find a polynomial P such that f and P are never more than ε apart.

This means that linear combinations of the polynomials

1, x, x², x³, …

are dense in C[0, 1].

Do you need all these powers of x? Could you approximate any continuous function arbitrarily well if you left out one of these powers, say x⁷? Yes, you could.

You cannot omit the constant polynomial 1, but you can leave out any other power of x. In fact, you can leave out a lot of powers of x, as long as the sequence of exponents doesn’t thin out too quickly.

Müntz approximation theorem

Herman Müntz proved in 1914 that a necessary and sufficient pair of conditions on the exponents of x is that the first exponent is 0 and that the sum of the reciprocals of the exponents diverges.

In other words, the sequence of powers of x

x^{\lambda_0},\; x^{\lambda_1},\; x^{\lambda_2},\; \ldots

with

λ₀ < λ₁ < λ₂ < …

is dense in C[0, 1] if and only if λ₀ = 0 and

1/λ₁ + 1/λ₂ + 1/λ₃ + … = ∞

Prime power example

Euler proved in 1737 that the sum of the reciprocals of the primes diverges, so the sequence

1, x², x³, x⁵, x⁷, x¹¹, …

is dense in C[0, 1]. We can find a polynomial as close as we like to any particular continuous function if we combine enough prime powers.

Let’s see how well we can approximate |x − ½| using prime exponents up to 11.

The polynomial above is

0.4605 − 5.233x² + 7.211x³ + 0.9295x⁵ − 4.4646x⁷ + 1.614x¹¹.

This polynomial is not the best possible uniform approximation: it’s the least squares approximation. That is, it minimizes the 2-norm and not the ∞-norm. That’s because it’s convenient to do a least squares fit in Python using scipy.optimize.curve_fit.

Incidentally, the Müntz approximation theorem holds for the 2-norm as well.
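
For what it’s worth, here’s roughly how such a fit can be set up with curve_fit. This is a sketch, not the exact script behind the numbers above, so the fitted coefficients may differ slightly in the later decimals.

    import numpy as np
    from scipy.optimize import curve_fit

    def p(x, a0, a2, a3, a5, a7, a11):
        # linear combination of prime powers of x (and the constant 1)
        return a0 + a2*x**2 + a3*x**3 + a5*x**5 + a7*x**7 + a11*x**11

    x = np.linspace(0, 1, 1000)
    coeffs, _ = curve_fit(p, x, np.abs(x - 0.5))
    print(coeffs)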
