Cycle of New Year’s Days

Here’s a visualization of how the day of the week for New Year’s Day changes.

The green diamonds represent leap years and the blue squares represent ordinary years.

The day of the week for New Year’s Day advances one day after each ordinary year and two days after each leap year, hence the diagonal stripes in the graph above.

The whole cycle repeats every 28 years. During that 28 year cycle, New Year’s Day falls on each day of the week four times: three times in an ordinary year and once in a leap year. Or to put it another way, each horizontal row of the graph above contains three blue squares and one green diamond.

The comments above are true under the Julian calendar, without exception. And they’re true for long stretches of time under the Gregorian calendar. For example, the pattern above repeats from 1901 to 2099.

The Julian calendar had a leap day every four years, period. This made the calendar year longer than the solar year by about 3 days every 400 years, so the Gregorian calendar removed 3 leap days. A year divisible by 100 is not a leap year unless it is also divisible by 400. So the Gregorian calendar disrupts the pattern above every 100 years.
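
Here's a quick check that the pattern does repeat with period 28 from 1901 to 2099 (a minimal sketch using Python's standard datetime module; weekday() returns 0 for Monday through 6 for Sunday):

    from datetime import date

    # day of the week of New Year's Day for each year from 1901 to 2099
    days = {year: date(year, 1, 1).weekday() for year in range(1901, 2100)}

    # within this range the pattern repeats with period 28
    print(all(days[y] == days[y + 28] for y in range(1901, 2072)))   # True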

Related posts

Interval arithmetic and fixed points

A couple days ago I analyzed the observation that repeatedly pressing the cosine key on a calculator leads to a fixed point. After about 90 iterations the number no longer changes. This post will analyze the same phenomenon a different way.

Interval arithmetic

Interval arithmetic is a way to get exact results of a sort from floating point arithmetic.

Suppose you start with a number x that cannot be represented exactly as a floating point number, and you want to compute f(x) for some function f. You can’t represent x exactly, but unless x is too large you can represent a pair of numbers a and b such that x is certainly in the interval [a, b]. Then f(x) is in the set f( [a, b] ).

Maybe you can represent f( [a, b] ) exactly. If not, you can enlarge the interval a bit to exactly represent an interval that contains f(x). After applying several calculations, you have an interval, hopefully one that’s not too big, containing the exact result.

(I said above that interval arithmetic gives you exact results of a sort because even though you don’t generally get an exact number at the end, you do get an exact interval containing the result.)

Cosine iteration

In this post we will use interval arithmetic, not to compensate for the limitations of computer arithmetic, but to illustrate the convergence of iterated cosines.

The cosine of any real number lies in the interval [−1, 1]. To put it another way,

cos( [−∞, ∞] ) = [−1, 1].

Because cosine is an even function,

cos( [−1, 1] ) = cos( [0, 1] )

and so we can limit our attention to the interval [0, 1].

Now the cosine is a monotone decreasing function from 0 to π, and so it’s monotone on [0, 1]. For any two points a and b with 0 ≤ a ≤ b ≤ π we have

cos( [a, b] ) = [cos(b), cos(a)].

Note that the order of a and b reverses on the right hand side of the equation because cosine is decreasing. When we apply cosine again we get back the original order.

cos(cos( [a, b] )) = [cos(cos(a)), cos(cos(b))].

Incidentally, this flip-flop explains why the cobweb plot from the previous post looks like a spiral rather than a staircase.

Now define a₀ = 0, b₀ = 1, and

[aₙ₊₁, bₙ₊₁] = cos( [aₙ, bₙ] ) = [cos(bₙ), cos(aₙ)].

We could implement this in Python with a pair of mutually recursive functions.

    from math import cos

    a = lambda n: 0 if n == 0 else cos(b(n-1))
    b = lambda n: 1 if n == 0 else cos(a(n-1))

Here’s a plot of the image of [0, 1] after n iterations.

Note that odd iterations increase the lower bound and even iterations decrease the upper bound.

Numerical interval arithmetic

This post introduced interval arithmetic as a numerical technique, then proceeded to do pure math. Now let’s think about computing again.

The image of [0, 1] under cosine is [cos(1), cos(0)] = [cos(1), 1]. A computer can represent 1 exactly but not cos(1). Suppose we compute

cos(1) = 0.5403023058681398

and assume each digit in the result is correct. Maybe the exact value of cos(1) was slightly smaller and was rounded to this value, but we know for sure that

cos( [0, 1] ) ⊂ [0.5403023058681397, 1]

So in this case we don’t know the image of [0, 1], but we know an interval that contains the image, hence the subset symbol.

We could iterate this process, next computing an interval that contains

cos( [0.5403023058681397, 1] )

and so forth. At each step we would round the left endpoint down to the nearest representable lower bound and round the right endpoint up to the nearest representable upper bound. In practice we’d be concerned with machine representable numbers rather than decimal representable numbers, but the principle is the same.
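
Here's a minimal sketch of that process in Python, assuming the interval stays inside [0, π], where cosine is decreasing. It uses math.nextafter (Python 3.9 and later) to round the endpoints outward at each step.

    from math import cos, inf, nextafter

    def cos_interval(a, b):
        # cosine is decreasing on [0, pi], so the image of [a, b] is [cos(b), cos(a)];
        # round each endpoint outward to the next representable float
        return nextafter(cos(b), -inf), nextafter(cos(a), inf)

    a, b = 0.0, 1.0
    for n in range(95):
        a, b = cos_interval(a, b)
    print(a, b)   # both endpoints end up very near the fixed point 0.7390851332151607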

The potential pitfall of interval arithmetic in practice is that intervals may grow so large that the final result is not useful. But that’s not the case here. The rounding error at each step is tiny, and contraction maps reduce errors at each step rather than magnifying them. In a more complicated calculation, we might have to resort to loose estimates and not have such tight intervals at each step.

Related posts

Pressing the cosine key over and over

No matter what number you start with, if you press the cos key on a calculator repeatedly, the numbers eventually quit changing. This fact has been rediscovered by countless children playing with calculators.

If you start with 0, which is likely the default when you turn on a calculator, you’ll hit the final value after 90 steps [1]. You could verify this with the following Python code.

    from math import cos

    x = 0
    for i in range(100):
        x = cos(x)
        print(i, x) 

Starting with iteration 90, the code prints 0.7390851332151607 every time.

Visualizing convergence

Let’s visualize the sequence of values using a cobweb plot. Here’s an image I made for a post four years ago, starting with x = 1.

cobweb plot for iterating cosine

To read the graph, start at x = 1 on the horizontal axis. The solid black line is a plot of the cosine function, and so the point above 1 where the blue line starts is y = cos(1) = 0.54.

Since we’re going to stick our output back into cosine as input, we need to turn our y into an x. We do this by sliding the point over to the dotted line representing y = x. This creates the horizontal blue line that is the first segment of our spiral. This takes us to the point (cos(1), cos(1)).

When we take the cosine again, we get cos(cos(1)) = 0.86, and so we move from our previous value of y = 0.54 up to our new value of y = 0.86. This gives the next segment of our spiral. Again we need to turn our y into an x, so we slide over to the line y = x as before, only this time we’re approaching from the left side rather than from the right.

We quickly get to where we can no longer see the convergence. The plot above used 20 iterations, and we end up with a blue blob around the point of convergence.
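
For reference, here's roughly how such a cobweb plot could be drawn (a sketch using matplotlib, starting at x = 1 and using 20 iterations as in the plot above):

    import numpy as np
    import matplotlib.pyplot as plt

    x = np.linspace(0, 1.2, 200)
    plt.plot(x, np.cos(x), "k")    # the cosine curve
    plt.plot(x, x, "k:")           # the dotted line y = x

    xs, ys = [1.0], [0.0]          # start at x = 1 on the horizontal axis
    t = 1.0
    for _ in range(20):
        y = np.cos(t)
        xs += [t, y]               # vertical step up to the curve,
        ys += [y, y]               # then horizontal step over to y = x
        t = y
    plt.plot(xs, ys, "b")
    plt.show()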

Proving convergence

We said that the iteration converges for any starting point. Why is that?

For one thing, we might as well assume x is between 0 and 1; if it isn’t initially, it will be within a couple of iterations, since cos(x) is in [−1, 1] for every real x and cos( [−1, 1] ) ⊂ [0, 1].

The mean value theorem says that for any pair x₁ and x₂ in [0, 1],

cos(x₁) − cos(x₂) = −sin(c) (x₁ − x₂)

for some c in [0, 1] because the derivative of cos(x) is −sin(x). It follows that

| cos(x₁) − cos(x₂) | ≤ sin(1) | x₁ − x₂ |

because the maximum value of sin(x) on [0, 1] is sin(1). Since sin(1) = 0.84… is less than 1, this says that cosine is a contraction mapping on [0, 1], and so by the contraction mapping theorem it has a unique fixed point p with cos(p) = p, and the iteration converges to p from any starting point.

Rate of convergence

We could use this to calculate an upper bound on how far x is from the fixed point p after k iterations:

| x − p | ≤ sin(1)ᵏ

and so if we want to be within a distance ε of the fixed point, we need

k ≥ log(ε) / log(sin(1)).

This says that to get within floating point precision (about 10⁻¹⁶) of the fixed point, 214 iterations will do. This is true, but it’s a pessimistic estimate because it’s based on a pessimistic estimate of sin(c).

Once we get close to the fixed point p, the rate of convergence is more like sin(p)ᵏ than sin(1)ᵏ. This suggests we should be within floating point precision of the fixed point after about 90 iterations, which is what we saw at the top of the post.
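
A couple of lines of Python reproduce both estimates, using the fixed point value computed at the top of the post:

    from math import log, sin

    eps = 1e-16
    p = 0.7390851332151607              # fixed point from the iteration above

    print(log(eps) / log(sin(1)))       # about 214, the pessimistic bound
    print(log(eps) / log(sin(p)))       # about 93, close to the observed 90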

More fixed point posts

[1] This is true if your calculator is using 64-bit floating point numbers. Your mileage may vary if you have a calculator that retains more or less precision.

Gregorian Calendar and Number Theory

The time it takes for the earth to orbit the sun is not an integer multiple of the time it takes for the earth to rotate on its axis, nor is it a rational number with a small denominator. Why should it be? Much of the complexity of our calendar can be explained by rational approximations to an irrational number.

Rational approximation

The ratio is of course approximately 365. A better approximation is 365.25, but that’s not right either. A still better approximation would be 365.2422.

A slightly less accurate, but more convenient, approximation is 365.2425. Why is that more convenient? Because 0.2425 = 97/400, and 400 is a convenient number to work with.

A calendar based on a year consisting of an average of 365.2425 days would have 365 days most years, with 97 out of 400 years having 366 days.

In order to spread 97 longer years among the cycle of 400 years, you could insert an extra day every four years, but make three exceptions, such as years that are divisible by 100 but not by 400. That’s the Gregorian calendar that we use.
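
A short sketch in Python spells out that rule and confirms the average year length:

    def is_leap(year):
        # a leap year every 4 years, except century years not divisible by 400
        return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

    leap_years = sum(is_leap(y) for y in range(2000, 2400))
    print(leap_years)               # 97 leap years per 400-year cycle
    print(365 + leap_years / 400)   # average year of 365.2425 days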

Its predecessor, the Julian calendar, had an average year of 365.25 days, which was good enough for a while, but the errors began to accumulate to the point that the seasons were drifting noticeably with respect to the calendar.

Not much room for improvement

It would be possible to create a calendar with an even more accurate average year length, but at the cost of more complexity. Even so, such a calendar wouldn’t be much more accurate. After all, even the number we’ve been trying to approximate, 365.2422, isn’t entirely accurate.

The ratio of the time of the earth’s orbit to the time of its rotation isn’t even entirely constant. The Gregorian calendar is off by about 1 day in 3030 years, but the length of the year varies by about 1 day in 7700 years.

I don’t know how accurately the length of the solar year was known when the Gregorian calendar was designed over four centuries ago. Maybe the error in the calendar was less than the uncertainty in the length of the solar year.

Days of the week

Four centuries of the Gregorian calendar contain 146097 days, which is a multiple of 7. This seems to be a happy coincidence. There was no discussion of weeks in the derivation above.
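
The arithmetic is easy to check:

    days = 400 * 365 + 97
    print(days)       # 146097
    print(days % 7)   # 0, so 400 years is a whole number of weeks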

The implicit optimization criteria in the design of the calendar were minimizing the discrepancy between the lengths of the average calendar year and the solar year, minimizing the length of the calendar cycle, and using a cycle length that is a round number. It’s plausible that there was no design goal of making the calendar cycle an integer number of weeks.

Related posts

Coiled logarithmic graph

A logarithmic scale is very useful when you need to plot data over an extremely wide range. However, sometimes even a logarithmic scale may not reduce the visual range enough.

I recently saw a timeline-like graph that was coiled into a spiral, packing more information into a limited visual window [1].

I got to thinking about when this could be useful. This raises two questions.

  1. When might you want to visualize something that grows faster than exponentially?
  2. How would this compare to the radial growth of a spiral as a function of arc length?

Let’s address the second question first. What exactly do we mean by spiral? Archimedean spirals are a class of spirals that include what many people think of as a spiral. These spirals have polar equation

r = b θ^(1/n)

where n is a constant. The choice n = 1 corresponds to spirals with evenly spaced arms, such as a roll of carpet.

When n = 1, the length of the spiral for θ running from 0 to T is on the order of T² when T is large, as shown in this post. For general n, the length is on the order of T^(1 + 1/n).

If n = 1 and the distance of a spiral from the origin grows linearly as a function of θ, the arc length is growing quadratically. If the logarithm is growing quadratically, the function itself must be something like exp(kθ²).

So now back to the first question. What is something that grows super exponentially that we might want to plot? The first thing that came to my mind was factorials. Stirling’s approximation shows that the logarithm of factorial grows faster than linearly but slower than any power with exponent larger than 1.

log(x!) = x log x − x + O(log x).

and so if we plot x! on a coiled logarithmic scale, the distance from the image of a point to the origin will grow less than linearly, even if we allow the spiral parameter n to be larger than 1. But for a limited range of x, a coiled logarithmic plot works well. Here’s a polar plot of log(x!)/10.
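
Such a plot might be produced along these lines (a sketch assuming the angle is x itself and the radius is log(x!)/10, with log(x!) computed via scipy's gammaln):

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.special import gammaln    # gammaln(x + 1) equals log(x!)

    x = np.linspace(1, 60, 1000)
    r = gammaln(x + 1) / 10              # radial coordinate log(x!)/10

    ax = plt.subplot(projection="polar")
    ax.plot(x, r)                        # angle in radians is x itself
    plt.show()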

I don’t know of a natural example of something that grows like exp(kθ²). Of course you could always construct something to have this growth, but I can’t think of a common algorithm, for example, whose run time is the exponential of a quadratic. If you know a good example, please leave a comment.

[1] I can trace the provenance of the image to here, but that page doesn’t say where the image came from.

Multiple angles and Osborn’s rule

This post was motivated by an exercise in [1] that says

Prove that for the hyperbolic functions … formulas hold similar to those in Section 2.3 with all the minuses replaced by pluses.

My first thought was that this sounds like Osborn’s rule, a heuristic for translating between (circular) trig identities and hyperbolic trig identities. As explained in that post, Osborn’s rule is an easy consequence of Euler’s identity.

Now what are the formulas the exercise refers to?

Sine to hyperbolic sine

Here’s the identity for sine.

\sin n\theta = \binom{n}{1} \cos^{n-1}\theta \sin\theta - \binom{n}{3} \cos^{n-3}\theta \sin^3\theta + \binom{n}{5} \cos^{n-5}\theta \sin^5\theta -\cdots

Osborn’s rule says to change sin to sinh and cos to cosh, and flip signs whenever two sinh terms are multiplied together. The term with sin³ θ loses its minus sign because two sines are multiplied together. The term with sin⁵ θ changes sign twice, and so the net result is that it doesn’t change sign. So we have the following.

\sinh n\theta = \binom{n}{1} \cosh^{n-1}\theta \sinh\theta + \binom{n}{3} \cosh^{n-3}\theta \sinh^3\theta + \binom{n}{5} \cosh^{n-5}\theta \sinh^5\theta + \cdots

Cosine to hyperbolic cosine

The cosine identity

\cos n\theta = \cos^n \theta - \binom{n}{2} \cos^{n-2}\theta \sin^2 \theta + \binom{n}{4} \cos^{n-4} \theta \sin^4\theta - \cdots

becomes

\cosh n\theta = \cosh^n \theta + \binom{n}{2} \cosh^{n-2}\theta \sinh^2 \theta + \binom{n}{4} \cosh^{n-4} \theta \sinh^4\theta + \cdots

by similar reasoning: the sinh² term flips sign once and the sinh⁴ term flips sign twice, so every term ends up with a plus sign.
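
As a sanity check, both hyperbolic identities can be verified numerically. Here's a sketch (math.comb requires Python 3.8 or later):

    from math import comb, cosh, sinh

    def sinh_n(n, t):
        # odd-index terms of the binomial expansion, as in the sinh identity above
        return sum(comb(n, k) * cosh(t)**(n - k) * sinh(t)**k
                   for k in range(1, n + 1, 2))

    def cosh_n(n, t):
        # even-index terms, as in the cosh identity above
        return sum(comb(n, k) * cosh(t)**(n - k) * sinh(t)**k
                   for k in range(0, n + 1, 2))

    n, t = 5, 0.3
    print(sinh(n*t), sinh_n(n, t))   # the two values in each line should agree
    print(cosh(n*t), cosh_n(n, t))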

Tangent to hyperbolic tangent

Osborn’s rule applies to tan and tanh as well, if you imagine each tangent as sin/cos.

Thus

\tan n\theta = \frac{\binom{n}{1} \tan \theta - \binom{n}{3} \tan^3 \theta + \binom{n}{5} \tan^5 \theta - \cdots}{1 - \binom{n}{2} \tan^2 \theta + \binom{n}{4} \tan^4\theta - \cdots}

becomes

\tanh n\theta = \frac{\binom{n}{1} \tanh \theta + \binom{n}{3} \tanh^3 \theta + \binom{n}{5} \tanh^5 \theta + \cdots}{1 + \binom{n}{2} \tanh^2 \theta + \binom{n}{4} \tanh^4\theta + \cdots}

Related posts

[1] Dmitry Fuchs and Serge Tabachnikov. Mathematical Omnibus: Thirty Lectures on Classical Mathematics.

Impersonating an Edwardian math professor

I’ve read some math publications from around a century or so ago, and I wondered if I could pull off being a math professor if a time machine dropped me into a math department from the time. I think I’d come across as something of an autistic savant, ignorant of what contemporaries would think of as basic math but fluent in what they’d consider more advanced.

There are two things in particular that were common knowledge at the time that I would be conspicuously ignorant of: interpolation tricks and geometry.

People from previous eras knew interpolation at a deeper level than citing the Lagrange interpolation theorem, out of necessity. They learned time-saving tricks that have since been forgotten.

The biggest gap in my knowledge would be geometry. Mathematicians a century ago had a far deeper knowledge of geometry, particularly synthetic geometry, i.e. geometry in the style of Euclid rather than in the style of Descartes.

Sometimes older math books use notation or terminology that has since changed. I imagine I’d make a few gaffes, not immediately understanding a basic term or using a term that wasn’t coined until later.

If I had to teach a class, I’d choose something like real and complex analysis. Whittaker & Watson’s book on the subject was first published in 1902 and remains a common reference today. The only thing I find jarring about that book is that “show” is spelled “shew.” Makes me think of Ed Sullivan. But I think I’d have a harder time teaching a less advanced class.

Related posts

Floating point: Everything old is new again

In the early days of computing hardware (and actually before) mathematicians put a lot of effort into understanding and mitigating the limitations of floating point arithmetic. They would analyze mundane tasks such as adding a list of numbers and think carefully about the best way to carry out such tasks as accurately as possible.

Now that most arithmetic is carried out in double precision, you can often get away with not thinking about such things. Except when you can’t. The vagaries of floating point computation still matter occasionally, even with double precision, though not as often as they did with single precision.

Although most computing has moved from single precision to double precision, there is increasing interest in going the opposite direction, from single precision to half precision. The main driver is neural networks. You don’t need a lot of precision in weights, and you’ve got a lot of numbers to store. So instead of taking 64 bits to store a double precision number, or 32 bits to store a single precision number, you might want to use a 16-bit or even 8-bit floating point number. That way you can fit more weights in memory at once.

However, when you move to lower precision numbers, you now have to think again about the things numerical analysts thought about a couple generations ago, such as different ways of rounding. You might think that floating point rounding could be modeled by random variables. If so, you’re in good company, because John von Neumann suggested this in 1947. But a few years later people began to realize that floating point rounding errors are not random. Or to be more precise, they began to realize that modeling rounding errors as random was inadequate; of course they knew that rounding errors weren’t literally random.

But what if rounding errors were random? This would lead to more error cancellation than we see in practice with floating point arithmetic. With stochastic rounding, the rounded values become unbiased estimators of the values they would like to represent but cannot represent exactly. Now the central limit theorem and all that come to your aid. More on applications of stochastic rounding here.
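
Here's a minimal sketch of the idea, rounding to integers rather than to a low-precision floating point format for simplicity:

    import random

    def stochastic_round(x):
        # round down or up with probability proportional to proximity,
        # so the expected value of the result equals x
        floor = int(x // 1)
        return floor + (random.random() < x - floor)

    samples = [stochastic_round(2.3) for _ in range(100_000)]
    print(sum(samples) / len(samples))   # close to 2.3 on average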

(To be pedantic a moment, stochastic rounding isn’t truly random, but uses pseudorandom numbers to implement a procedure which is well modeled by randomness. Random is as random does.)

Related posts

Birthday problem approximation

The birthday problem is a party trick with serious practical applications. It’s well known to people who have studied probability, but the general public is often amazed by it.

If you have a group of 23 people, there’s a 50-50 chance that at least two people have the same birthday. With a larger group, say 30, it’s quite likely two birthdays are the same. Not only is this a theoretical result, based on certain modeling assumptions, it actually works in practice, essentially as predicted.

Variations of the birthday problem come up routinely in applications. For example, in cryptography it’s important to know the probability of secure hash collisions. Hash functions are deterministic, but for practical purposes they act random. If you are hashing into a space of N possible hash values, you can expect to compute about √N items before two items have the same hash value.

The square root rule of thumb is very useful. For example, if you’re computing 128-bit hash values, there’s about a 50-50 chance of seeing two duplicate hash values after hashing about 2⁶⁴ items.

The square root heuristic works well for large N, but gives mediocre results for N as small as 365. When applied to the original birthday problem, it predicts even odds for seeing a pair of equal birthdays in a group of 19 people. That’s a little low, but not too far off.

As useful as the square root rule is, it is only good for finding when the probability of duplication is 1/2. What if you’d like to know when the probability of a collision is, say, 0.01?

Let N be the number of possible options and let r be the number of items chosen independently from the set of N options. Let P(N, r) be the probability that all r choices are distinct. Then

P(N, r) ≈ exp( −r²/(2N) ).

This approximation [1] is valid when N is large and r is small relative to N. We could be more precise about the error bounds, but suffice it to say that bigger N is better.

When N = 365 and r = 23, the approximation above computes the probability that all 23 choices are distinct as 0.48, matching the canonical birthday problem and showing an improvement over the square root heuristic.
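
Here's a quick sketch of that comparison in Python (math.prod requires Python 3.8 or later):

    from math import exp, prod

    N, r = 365, 23
    approx = exp(-r**2 / (2*N))
    exact = prod((N - k) / N for k in range(r))   # P(all r birthdays distinct)
    print(approx, exact)                          # about 0.48 and 0.49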

Related posts

[1] Anthony C. Robin. The Birthday Distribution: Some More Approximations. The Mathematical Gazette, Vol. 68, No. 445 (October, 1984), pp. 204–206

(1 − z) / (1 + z)

“I keep running into the function f(z) = (1 − z)/(1 + z).” I wrote this three years ago and it’s still true.

This function came up implicitly in the previous post. Ramanujan’s excellent approximation for the perimeter of an ellipse with semi-axes a and b begins by introducing

λ = (a − b)/(a + b).

If the problem is scaled so that a = 1, then λ = f(b). Kummer’s series for the exact perimeter of an ellipse begins by introducing the same variable squared.

As this post points out, the function f(z) comes up in the Smith chart from electrical engineering, and is also useful in mental calculation of roots. It also comes up in mentally calculating logarithms.

The function f(z) is also useful for computing the tangent of angles near half a right angle because

tan(π/4 − z) ≈ f(z)

with an error on the order of z³. So when z is small, the error is very, very small, much like the approximation sin(x) ≈ x for small angles.
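
A few lines of Python show the cubic error:

    from math import pi, tan

    f = lambda z: (1 - z) / (1 + z)

    for z in [0.1, 0.01, 0.001]:
        print(z, tan(pi/4 - z) - f(z))   # error shrinks roughly like z**3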