Fibonacci numbers, arctangents, and pi

Here’s an unusual formula for π. Let Fn be the nth Fibonacci number. Then

\pi = 4 \sum_{n=1}^\infty \arctan\left( \frac{1}{F_{2n+1}} \right)

As mysterious as this equation may seem, it’s not hard to prove. The arctangent identity

\arctan\left(\frac{1}{F_{2n+1}}\right) = \arctan\left(\frac{1}{F_{2n}}\right) - \arctan\left(\frac{1}{F_{2n+2}}\right)

shows that the sum telescopes, leaving only the first term, arctan(1) = π/4. To prove the arctangent identity, take the tangent of both sides, use the addition law for tangents, and use the Fibonacci identity

F_{n+1} F_{n-1} - F_n^2 = (-1)^n

(By the way, I didn’t come up with this. I saw it somewhere that implied it’s been around a while.)
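
As a quick sanity check, here's a numerical verification in Python (a sketch; fib is an ad hoc iterative Fibonacci helper):

from math import atan, pi

def fib(n):
    # iterative Fibonacci with F(1) = F(2) = 1
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

s = 4 * sum(atan(1.0/fib(2*n + 1)) for n in range(1, 40))
print(s, pi)  # both print 3.141592653589793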

Number of digits in n!

The other day I ran across the fact that 23! has 23 digits. That made me wonder how often n! has n digits.

There can only be a finite number of cases, because n! grows faster than 10^n for n > 10, and it’s reasonable to guess that 23 might be the largest case. Turns out it’s not, but it’s close. The only cases where n! has n digits are 1, 22, 23, and 24. Once you’ve found these by brute force, it’s not hard to show that they must be the only ones because of the growth rate of n!.

Is there a convenient way to find the number of digits in n! without having to compute n! itself? Sure. For starters, the number of digits in the base 10 representation of a number x is

⌊log₁₀ x⌋ + 1

where ⌊ z ⌋ is the floor of z, the largest integer less than or equal to z. The log of the factorial function is easier to compute than the factorial itself because it won’t overflow. You’re more likely to find a function to compute the log of the gamma function than the log of factorial, and more likely to find software that uses natural logs than logs base 10. So in Python, for example, you could compute the number of digits with this:

from scipy.special import gammaln
from math import log, floor

def digits_in_factorial(n):
    # log10(n!) = gammaln(n+1)/ln(10), so the digit count is the floor of that plus one
    return floor( gammaln(n+1)/log(10.0) ) + 1
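
For example, this recovers the observation that started this post:

print([n for n in range(1, 100) if digits_in_factorial(n) == n])  # [1, 22, 23, 24]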

What about a more elementary formula, one that doesn’t use the gamma function? If you use Stirling’s approximation for factorial and take the log of that, you should at least get a good approximation. Here it is in Python:

from math import log, floor, pi

def stirling(n):
    # Stirling's approximation: ln n! ≈ (n + 1/2) ln n - n + ln(2π)/2
    return floor( ((n+0.5)*log(n) - n + 0.5*log(2*pi))/log(10) ) + 1

The code above is exact for every n > 2 as far as I’ve tested, up to n = 1,000,000. (Note that one million factorial is an extremely large number. It has 5,565,709 digits. And yet we can easily say something about this number, namely how many digits it has!)

The code may break down somewhere because of the error in Stirling’s approximation or the limitations of floating point arithmetic. Stirling’s approximation gets more accurate as n increases, but it’s conceivable that a factorial value could be so close to a power of 10 that the approximation error pushes the computed value from one side of the power of 10 to the other. Maybe that’s not possible, and someone could prove it’s not.

You could extend the code above to optionally take another base besides 10.

def digits_in_factorial(n, b=10):
    return floor( gammaln(n+1)/log(b) ) + 1

def stirling(n, b=10):
    return floor( ((n+0.5)*log(n) - n + 0.5*log(2*pi))/log(b) ) + 1

The code using Stirling’s approximation still works for all n > 2, even for b as small as 2. This is slightly surprising since the number of bits in a number is more detailed information than the number of decimal digits.
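
Here’s a quick way to spot-check that claim, assuming the two generalized functions above are in scope:

# the Stirling-based count should agree with the exact gammaln-based count in base 2
assert all(stirling(n, 2) == digits_in_factorial(n, 2) for n in range(3, 5000))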

Doubly and triply periodic functions

A function f is periodic if there exists a constant period ω such that f(x) = f(x + ω) for all x. For example, sine and cosine are periodic with period 2π.

On the real line there’s only one direction in which a function can be periodic. But if you think of functions of a complex variable, it makes sense to look at functions that are periodic in two different directions. Sine and cosine are periodic as you move horizontally across the complex plane, but not if you move in any other direction. But you could imagine a function that’s periodic vertically as well as horizontally.

A doubly periodic function satisfies f(x) = f(x + ω₁) and f(x) = f(x + ω₂) for all x and for two different fixed complex periods, ω₁ and ω₂, with different angular components, i.e. the two periods are not real multiples of each other. For example, the two periods could be 1 and i.

How many doubly periodic functions are there? The answer depends on how much regularity you require. If you ask that the functions be differentiable everywhere as functions of a complex variable (i.e. entire), the only doubly periodic functions are constant functions [1]. But if you relax your requirements to allow functions to have singularities, there’s a wide variety of functions that are doubly periodic. These are the elliptic functions. They’re periodic in two independent directions, and meromorphic (i.e. analytic except at isolated poles). [2]

What about triply periodic functions? If you require them to be meromorphic, then the only triply periodic functions are constant functions. To put it another way, if a meromorphic function is periodic in three directions, it’s periodic in every direction for every period, i.e. constant. If a function has three independent periods, its set of periods has a limit point; a non-constant meromorphic function cannot take the same value on a set with a limit point, so the function must be constant.

* * *

[1] Another way to put this is to say that elliptic functions must have at least one pole inside the parallelogram determined by the lines from the origin to ω₁ and ω₂. A doubly periodic function’s values everywhere are repeats of its values on this parallelogram. If the function were continuous over this parallelogram (i.e. with no poles) then it would be bounded over the parallelogram and hence bounded everywhere. But Liouville’s theorem says a bounded entire function must be constant.

[2] We don’t consider arbitrary singularities, only isolated poles. There are doubly periodic functions with essential singularities, but these are outside the definition of elliptic functions.

Generalization of Fibonacci ratios

Each Fibonacci number is the sum of its two predecessors. My previous post looked at generalizing this to the so-called Tribonacci numbers, each being the sum of its three predecessors. One could keep going, defining the Tetrabonacci numbers and in general the n-Fibonacci numbers for any n at least 2.

For the definition to be complete, you have to specify the first n of the n-Fibonacci numbers. However, these starting values hardly matter for our purposes. We want to look at the limiting ratio of consecutive n-Fibonacci numbers, and this doesn’t depend on the initial conditions. (If you were determined, you could find starting values where this isn’t true. It’s enough to pick integer initial values, at least one of which is not zero.)

As shown in the previous post, the ratio is the largest eigenvalue of an n by n matrix with 1’s on the first row and 1’s immediately below the main diagonal. The characteristic polynomial of such a matrix is

\lambda^n - \lambda^{n-1} - \lambda^{n-2} - \cdots - 1

and so we look for the largest zero of this polynomial. We can sum the terms with negative coefficients as a geometric series and show that the eigenvalues satisfy

\lambda^n - \frac{1}{2 - \lambda} = 0

So the limiting ratio of consecutive n-Fibonacci numbers is the largest root of the above equation. You could verify that when n = 2, we get the golden ratio φ as we should, and when n = 3 we get around 1.8393 as in the previous post.

As n gets large, the limiting ratio approaches 2. You can see this by taking the log of the previous equation.

n = -log(2 – λ)/log(λ).

As n goes to infinity, λ must approach 2 so that the right side of the equation also goes to infinity.
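
Here’s a quick numerical check of all this, finding the largest root with SciPy’s brentq root-finder. For n ≥ 2 the root we want lies between 1.5 and 2, where 1 − x^n (2 − x) changes sign:

from scipy.optimize import brentq

def ratio(n):
    # largest root of x**n = 1/(2 - x), i.e. of 1 - x**n * (2 - x) = 0
    return brentq(lambda x: 1.0 - x**n * (2.0 - x), 1.5, 2.0)

print(ratio(2))   # 1.6180..., the golden ratio
print(ratio(3))   # 1.8392..., the 1.8393 above
print(ratio(10))  # 1.9990..., approaching 2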

Power method and Fibonacci numbers

Take an n × n matrix A and a vector x of length n. Now multiply x by A, then multiply the result by A, over and over again. The sequence of vectors generated by this process will converge to an eigenvector of A. (An eigenvector is a vector whose direction is unchanged when multiplied by A. Multiplying by A may stretch or shrink the vector, but it doesn’t rotate it at all. The amount of stretching is called the corresponding eigenvalue.)

The eigenvector produced by this process is the eigenvector corresponding to the largest eigenvalue of A, largest in absolute value. This assumes A has a unique eigenvector associated with its largest eigenvalue. It also assumes you’re not spectacularly unlucky in your choice of vector to start with.

Assume your starting vector x has some component in the direction of v, the eigenvector corresponding to the largest eigenvalue. (The vectors that don’t have such a component lie in an n-1 dimensional subspace, which has measure zero. So if you pick a starting vector at random, with probability 1 it will have some component in the direction we’re after. That’s what I meant when I said you can’t start with a spectacularly unlucky initial choice.) Each time you multiply by A, the component in the direction of v gets stretched more than the components orthogonal to v. After enough iterations, the component in the direction of v dominates the other components.
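
Here’s a minimal sketch of the power method in Python. It rescales the vector at each step so the entries don’t overflow, and it estimates the eigenvalue with the Rayleigh quotient x·Ax:

import numpy as np

def power_method(A, x, iters=100):
    # repeatedly multiply by A, rescaling to unit length each time
    for _ in range(iters):
        x = A @ x
        x = x / np.linalg.norm(x)
    return x, x @ (A @ x)  # eigenvector estimate and Rayleigh quotient

A = np.array([[1.0, 1.0], [1.0, 0.0]])  # the Fibonacci matrix discussed below
v, lam = power_method(A, np.array([1.0, 1.0]))
print(lam)  # 1.6180..., the golden ratio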

What does this have to do with Fibonacci numbers? The next number in the Fibonacci sequence is the sum of the previous two. In matrix form this says

\left[\begin{array}{c} x_{n+1} \\ x_{n} \end{array}\right] = \left[\begin{array}{cc} 1 & 1 \\ 1 & 0 \end{array}\right] \left[\begin{array}{c} x_{n} \\ x_{n-1} \end{array}\right]

The ratio of consecutive Fibonacci numbers converges to the golden ratio φ because φ is the largest eigenvalue of the matrix above.

The first two Fibonacci numbers are 1 and 1, so the Fibonacci sequence corresponds to repeatedly multiplying by the matrix above, starting with the initial vector x = [1 1]^T. But you could start with any other vector and the ratio of consecutive terms would converge to the golden ratio, provided you don’t start with a vector orthogonal to the dominant eigenvector [φ 1]^T. Starting with any pair of integers, unless both are zero, is enough to avoid this condition, since φ is irrational.

We could generalize this approach to look at other sequences defined by a recurrence relation. For example, we could look at the “Tribonacci” numbers. The Tribonacci sequence starts out 1, 1, 2, and then each successive term is the sum of the three previous terms. We can find the limiting ratio of Tribonacci numbers by finding the largest eigenvalue of the matrix below.

\left[\begin{array}{ccc} 1 & 1 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{array}\right]

This eigenvalue is the largest root of x^3 − x^2 − x − 1 = 0, which is about 1.8393. As before, the starting values hardly matter. Start with any three integers, at least one of them non-zero, and define each successive term to be the sum of the previous three terms. The ratio of consecutive terms in this sequence will converge to 1.8393.

By the way, you could compute the limiting ratio of Tribonacci numbers with the following bit of Python code:

      import numpy as np
      from scipy import linalg

      M = np.array([[1, 1, 1], [1, 0, 0], [0, 1, 0]])
      print(linalg.eigvals(M))  # largest eigenvalue: about 1.8393

Update: The next post generalizes this one to n-Fibonacci numbers.

Casting out sevens

A while back I wrote about a method to test whether a number is divisible by seven. I recently ran across another method for testing divisibility by 7 in Martin Gardner’s book The Unexpected Hanging and Other Mathematical Diversions. The method doesn’t save too much effort compared to simply dividing by 7, but it’s interesting. It looks a little mysterious at first, though the explanation of why it works is very simple.

Suppose you want to find whether a number n is divisible by 7. Start with the last digit of n and write a 1 under the last digit, and a 3 under the next to last digit. The digits under the digits of n cycle through 1, 3, 2, 6, 4, 5, repeatedly until there is something under each digit in n. Now multiply each digit of n by the digit under it and add the results.

For example, suppose n = 1394. Write the digits 1, 3, 2, and 6 underneath, from right to left:

    1 3 9 4
    6 2 3 1

The sum we need to compute is 1*6 + 3*2 + 9*3 + 4*1 = 6 + 6 + 27 + 4 = 43.

This sum, 43 in our example, has the same remainder when divided by 7 as the original number, 1394. Since 43 is not divisible by 7, neither is 1394. In fact the remainders are equal: 43 leaves a remainder of 1 when divided by 7, so 1394 also leaves a remainder of 1.

You could apply this method repeatedly, though in this case 43 is small enough that it’s easy to see that it leaves a remainder of 1.

Suppose you started with a 1000-digit number n. Each digit is no more than 9, and each is multiplied by a number no more than 6, so the sum would be less than 54,000. That takes you from a 1000-digit number to at most a five-digit number in one step. One or two more steps should be enough for the remainder to be obvious.

Why does this method work? The key is that the multipliers 1, 3, 2, 6, 4, 5 are the remainders when the powers of 10 are divided by 7. Since 10^6 has a remainder of 1 when divided by 7, the numbers 10^(6a+b) and 10^b have the same remainder when divided by 7, and that’s why the multipliers repeat with period 6.

All the trick is doing is expanding the base 10 representation of a number and adding up the remainders when each term is divided by seven. In our example, 1394 = 1*1000 + 3*100 + 9*10 + 4*1, and mod 7 this reduces to 1*6 + 3*2 + 9*3 + 4*1, the exact calculation above.

The trick presented here is analogous to casting out nines. But since every power of 10 leaves a remainder of 1 when divided by 9, all the multipliers in casting out nines are 1.

You could follow the pattern of this method to create a divisibility rule for any other divisor, say 13 for example, by letting the multipliers be the remainders when powers of 10 are divided by 13.
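
Here’s a sketch of the method in Python, written for a general divisor d as suggested above (with d = 7 by default):

def cast_out(n, d=7):
    # multiply each digit, right to left, by the remainder of the
    # corresponding power of 10 when divided by d, then add up the products
    total, power = 0, 1
    for digit in reversed(str(n)):
        total += int(digit) * power
        power = (power * 10) % d
    return total

print(cast_out(1394))      # 43
print(cast_out(1394) % 7)  # 1, the same remainder as 1394 % 7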

Related post: Divisibility rules in hex

Computing square triangular numbers

The previous post stated a formula for f(n), the nth square triangular number (i.e. the nth triangular number that is also a square number):

((17 + 12√2)^n + (17 – 12√2)^n – 2)/32

Now 17 – 12√2 is 0.029… and so the term (17 – 12√2)^n approaches zero very quickly as n increases. So f(n) is very nearly

((17 + 12√2)^n – 2)/32

The error in the approximation is less than 0.001 for all n, so the approximation gives the exact result when rounded to the nearest integer. Rounding z to the nearest integer is the same as computing ⌊z + 1/2⌋, and adding 1/2 here amounts to adding 16/32. So the nth square triangular number is

⌊((17 + 12√2)^n + 14)/32⌋

where ⌊x⌋ is the greatest integer less than or equal to x.

Here’s how you might implement this in Python:

    from math import sqrt, floor

    def f(n):
        x = 17 + 12*sqrt(2)
        return floor((x**n + 14)/32.0)

Unfortunately this formula isn’t that useful in ordinary floating point computation. When n = 11 or larger, the result needs more bits than are available in the significand of a floating point number. The computed result is accurate to around 15 significant digits, but the exact answer has more digits than that.

The result can also be computed with a recurrence relation:

f(n) = 34 f(n-1) – f(n-2) + 2

for n > 2. (Or n > 1 if you define f(0) to be 0, a reasonable thing to do).

This only requires integer arithmetic, so its only limitation is the size of the integers you can compute with. Since Python has arbitrarily large integers, the following works even for very large n.

    def f(n):
        if n < 2:
            return n
        return f(n-1)*34 - f(n-2) + 2

This is a direct implementation, though it’s inefficient because of the redundant function evaluations. It could be made more efficient by, for example, using memoization.
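
For example, memoizing with functools.lru_cache keeps the recursive form but caches intermediate values:

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def f(n):
        if n < 2:
            return n
        return f(n-1)*34 - f(n-2) + 2

For very large n you’d want to rewrite this iteratively anyway, to stay within Python’s recursion limit.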

When is a triangle a square?

Of course a triangle cannot be a square, but a triangular number can be a square number.

A triangular number is the sum of the first so many positive integers. For example, 10 is a triangular number because it equals 1+2+3+4. These numbers are called triangular because you can form a triangle from a row of one coin, a row of two coins, a row of three coins, and so on.

The smallest number that is both triangular and square is 1. The next smallest is 36. There are infinitely many numbers that are both triangular and square, and there’s even a formula for the nth such number:

((17 + 12√2)^n + (17 – 12√2)^n – 2)/32

Source: American Mathematical Monthly, February 1962, page 169.

For more on triangle numbers and their generalizations, see Twelve Days of Christmas and Tetrahedral Numbers.

There is also a way to compute the square triangular numbers recursively discussed in the next post.

Attributing changes to numerator and denominator

This afternoon I helped someone debug a financial spreadsheet. One of the reasons spreadsheets can be so frustrating to work with is that assumptions are hard to see. You have to click on cells one at a time to find formulas, then decode cell coordinates into their meanings.

The root problem turned out to be an assumption that sounds reasonable. You’re making two changes, one to a numerator and one to a denominator. The total change equals the sum of the results of each change separately. Except that’s not so.

At this point, a mathematician would say “Of course you can’t split the effects like that. It’s nonlinear.” But it’s worth pursuing a little further. For one thing, it doesn’t help a general audience to just say “it’s nonlinear.” For another, it’s worth seeing when it is appropriate, at least approximately, to attribute the effects this way.

You start with a numerator and denominator, N/D, then change N to N+n and change D to D+d. The total change is then (N+n)/(D+d) – N/D.

The result from only the change in the numerator is n/D. The result from only the change in denominator is N/(D+d) – N/D.

The difference between the total change and the sum of the two partial changes is

–nd/(D(D + d)).

The assumption that you can take the total change and attribute it to each change separately is wrong in general. But it is correct if n or d is zero, and it is approximately correct when nd is small. This can make the bug harder to find. It could also be useful when nd is indeed small and you don’t need to be exact.

Also, if all the terms are positive, the discrepancy is negative, i.e. the total change is less than the sum of the partial changes. Said another way, allocating the change to each cause separately over-estimates the total change.
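
To make this concrete, here’s a small numerical check with made-up values N = 100, D = 50, n = 10, d = 5:

N, D, n, d = 100.0, 50.0, 10.0, 5.0

total = (N + n)/(D + d) - N/D          # total change
partials = n/D + (N/(D + d) - N/D)     # sum of the two partial changes
print(total - partials)                # -0.01818..., the discrepancy
print(-n*d/(D*(D + d)))                # matches -nd/(D(D+d))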


Last digit of largest known prime

In my previous post, we looked at the largest known prime, P = 2^57885161 – 1, and how many digits it has in various bases. This post looks at how to find the last digit of P in base b. We assume b < P. (If b = P then the last digit is 0, and if b > P the last digit is P.)

If b is a power of 2, we showed in the previous post that the last digit of P is b-1.

If b is odd, we can find the last digit using Euler’s totient theorem. Let φ(b) be the number of positive integers less than b and relatively prime to b. Then Euler’s theorem tells us that 2^φ(b) ≡ 1 (mod b) since b is odd. So if r is the remainder when 57885161 is divided by φ(b), then the last digit of Q = P+1 in base b is the same as the last digit of 2^r in base b.

For example, suppose we wanted to know the last digit of P in base 15. Since φ(15) = 8, and the remainder when 57885161 is divided by 8 is 1, the last digit of Q base 15 is 2. So the last digit of P is 1.

So we know how to compute the last digit in base b if b is a power of 2 or odd. Factor out all the powers of 2 from b so that b = 2^k d and d is odd. We can find the last digit base 2^k, and we can find the last digit base d, so can we combine these to find the last digit base b? Yes, that’s exactly what the Chinese Remainder Theorem is for.

To illustrate, suppose we want to find the last digit of P in base 12. P ≡ 3 mod 4 and P ≡ 1 mod 3, so P ≡ 7 mod 12. (The numbers are small enough here to guess. For a systematic approach, see the post mentioned above.) So the last digit of P is 7 in base 12.

If you’d like to write a program to play around with this, you need to be able to compute φ(n). You may have an implementation of this function in your favorite environment. For example, it’s sympy.ntheory.totient in Python. If not, it’s easy to compute φ(n) if you can factor n. You just need to know two things about φ. First, it’s a multiplicative function, meaning that if a and b are relatively prime, then φ(ab) = φ(a) φ(b). Second, if p is a prime, then φ(p^k) = p^k – p^(k-1).
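
Here’s a sketch of the odd-base case in Python, using sympy’s totient as mentioned above. (Python’s three-argument pow does modular exponentiation, so the second print is an independent sanity check.)

from sympy.ntheory import totient

E = 57885161  # P = 2**E - 1

def last_digit_odd(b):
    # last digit of P in odd base b, via Euler's theorem
    r = E % int(totient(b))
    return (pow(2, r, b) - 1) % b

print(last_digit_odd(15))        # 1, matching the example above
print((pow(2, E, 15) - 1) % 15)  # 1, by direct modular exponentiation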

Playing with the largest known prime

The largest known prime at the moment is P = 2^57885161 – 1. This means that in binary, the number is a string of 57,885,161 ones.

You can convert binary numbers to other bases that are powers of two, say 2^k, by starting at the right end and converting to the new base in blocks of size k. So in base 4, we’d start at the right end and convert pairs of bits to base 4. Until we get to the left end, all the pairs are 11, and so they all become 3’s in base four. So in base 4, P would be written as a 1 followed by 28,942,580 threes.

Similarly, we could write P in base 8 by starting at the right end and converting triples of bits to base 8. Since 57885161 = 3*19295053 + 2, we’ll have two bits left over when we get to the left end, so in base 8, P would be written as 3 followed by 19,295,053 sevens. And since 57885161 = 4*14471290 + 1, the hexadecimal (base 16) representation of P is a 1 followed by 14,471,290 F’s.

What about base 10, i.e. how many digits does P have? The number of digits in a positive integer n is ⌊log₁₀ n⌋ + 1 where ⌊x⌋ is the greatest integer less than or equal to x. For positive numbers, this just means chopping off the decimals and keeping the integer part.

We’ll make things slightly easier for a moment and find how many digits P+1 has.

log₁₀(P+1) = log₁₀(2^57885161) = 57885161 log₁₀(2) = 17425169.76…

and so P+1 has 17425170 digits, and so does P. How do we know P and P+1 have the same number of digits? There are several ways you could argue this. One is that P+1 is not a power of 10, so the number of digits couldn’t change between P and P+1.

Update: How many digits does P have in other bases?

We can take the formula above and simply substitute b for 10: the base b representation of a positive integer n has ⌊log_b n⌋ + 1 digits.

As above, we’d rather work with Q = P+1 than P. The numbers P and Q will have the same number of digits unless Q is a power of the base b. But since Q is a power of 2, this could only happen if b is also a power of two, say b = 2^k, with k a divisor of 57885161. Since 57885161 is prime, that means b = 2 or b = Q, and the base 2 representation we already know. If b > P, then P has one digit base b. So we can concentrate on the case 2 < b < Q, in which case P and Q have the same number of digits. The number of digits in Q is then ⌊57885161 log_b 2⌋ + 1.

If b = 2^k, we can generalize the discussion above about bases 4, 8, and 16. Let q and r be the quotient and remainder when 57885161 is divided by k. If r > 0, the leading digit is 2^r – 1 and the remaining q digits are b-1; if r = 0 there is no leading partial digit.
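
In code, the digit count is a one-liner; double precision is plenty here since we only need the integer part of a number around 17 million:

from math import log2, floor

E = 57885161  # P = 2**E - 1

def num_digits(b):
    # number of digits of P in base b: floor(E * log_b(2)) + 1, for 2 < b < P
    return floor(E / log2(b)) + 1

print(num_digits(10))  # 17425170
print(num_digits(16))  # 14471291: a 1 followed by 14,471,290 F's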

Next post: Last digit of largest known prime


Last digits of Fibonacci numbers

If you write out a sequence of Fibonacci numbers, you can see that the last digits repeat every 60 numbers.

The 61st Fibonacci number is 2504730781961. The 62nd is 4052739537881. Since these end in 1 and 1, the 63rd Fibonacci number must end in 2, etc. and so the pattern starts over.

It’s not obvious that the cycle should have length 60, but it is fairly easy to see that there must be a cycle. There are only 10*10 possibilities for two consecutive digits. Since the Fibonacci numbers are determined by a two-term recurrence, and since the last digit of a sum is determined by the sum of the last digits, the sequence of last digits must repeat eventually. Here “eventually” means after at most 10*10 terms.

Replace “10” by any other base in the paragraph above to show that the sequence of last digits must be cyclic in any base. In base 16, for example, the period is 24. In hexadecimal notation the 25th Fibonacci number is 12511 and the 26th is 1DA31, so the 27th must end in 2, etc.

Here’s a little Python code to find the period of the last digits of Fibonacci numbers working in any base b.

from sympy import fibonacci as f

def period(b):
    # Pisano period: the smallest i > 0 with F(i) ≡ 0 and F(i+1) ≡ 1 (mod b)
    for i in range(1, b*b+1):
        if f(i)%b == 0 and f(i+1)%b == 1:
            return i
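
Calling this function recovers the periods mentioned above:

print(period(10))   # 60
print(period(100))  # 300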

This shows that in base 100 the period is 300. So in base 10 the last two digits repeat every 300 terms.

The period seems to vary erratically with base.

Life lessons from differential equations

Ten life lessons from differential equations:

  1. Some problems simply have no solution.
  2. Some problems have no simple solution.
  3. Some problems have many solutions.
  4. Determining that a solution exists may be half the work of finding it.
  5. Solutions that work well locally may blow up when extended too far.



High-dimensional integration

Numerically integrating functions of many variables almost always requires Monte Carlo integration or some variation. Numerical analysis textbooks often emphasize one-dimensional integration and, almost as a footnote, say that you can use a product scheme to bootstrap one-dimensional methods to higher dimensions. True, but impractical.

Traditional numerical integration routines work well in low dimensions. (“Low” depends on context, but let’s say less than 10 dimensions.) When you have the choice of a well-understood, deterministic method, and it works well, by all means use it. I’ve seen people who are so used to problems requiring Monte Carlo methods that they use these methods on low-dimensional problems where they’re not the most efficient (or robust) alternative.

Monte Carlo

Except for very special integrands, high dimensional integration requires either Monte Carlo integration or some variation such as quasi-Monte Carlo methods.

As I wrote about here, naive Monte Carlo integration isn’t practical for high dimensions. It’s unlikely that any of your integration points will fall in the region of interest without some sort of importance sampler. Constructing a good importance sampler requires some understanding of your particular problem. (Sorry. No one-size-fits-all black-box methods are available.)
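
As a toy illustration, consider estimating the volume of the unit ball in 20 dimensions by uniform sampling from the enclosing cube [-1, 1]^20. The ball’s volume is about 0.026 while the cube’s is about 10^6, so the chance of any one sample landing inside is roughly 2.5 × 10^-8, and a naive estimate almost surely comes back zero:

import numpy as np

rng = np.random.default_rng(42)
d, samples = 20, 100_000
x = rng.uniform(-1.0, 1.0, size=(samples, d))  # uniform points in the cube
hits = ((x**2).sum(axis=1) <= 1.0).mean()      # fraction landing in the unit ball
print(hits * 2.0**d)                           # naive volume estimate: almost surely 0.0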

Quasi-Monte Carlo (QMC)

Quasi-Monte Carlo methods (QMC) are interesting because they rely on something like a random sequence, but “more random” in a sense, i.e. more effective at exploring a high-dimensional space. These methods work well when an integrand has low effective dimension. The effective dimension is low when nearly all the action takes place on a lower dimensional manifold. For example, a function may ostensibly depend on 10,000 variables, while for practical purposes 12 of these are by far the most important. There are some papers by Art Owen that say, roughly, that Quasi-Monte Carlo methods work well if and only if the effective dimension is much smaller than the nominal dimension. (QMC methods are especially popular in financial applications.)

While QMC methods can be more efficient than Monte Carlo methods, it’s hard to get bounds on the integration error with these methods. Hybrid approaches, such as randomized QMC, can perform remarkably well and give theoretical error bounds.

Markov Chain Monte Carlo (MCMC)

Markov Chain Monte Carlo (MCMC) is a common variation on Monte Carlo integration that uses dependent random samples. In many applications, such as Bayesian statistics, you’d like to be able to create independent random samples from some probability distribution, but this is not practical. MCMC is a compromise. It produces a Markov chain of dependent samples, and asymptotically these samples have the desired distribution. The dependent nature of the samples makes it difficult to estimate error and to determine how well the integration estimates using the Markov chain have converged.

(Some people speak of the Markov chain itself converging. It doesn’t. The estimates using the Markov chain may converge. Speaking about the chain converging may just be using informal language, or it may betray a lack of understanding.)

There is little theory regarding the convergence of estimates from MCMC, aside from toy problems. An estimate can appear to have converged, while an important region of the integration problem remains unexplored. Despite its drawbacks, MCMC is the only game in town for many applications.


If you’d like help with high dimensional integration, or any other kind of numerical integration, please contact me to discuss how we might work together.

When the last digits of powers don’t change

If you raise any integer to the fifth power, its last digit doesn’t change. For example, 2^5 = 32.

It’s easy to prove this assertion by brute force. Since the last digit of b^n only depends on the last digit of b, it’s enough to verify that the statement above holds for 0, 1, 2, …, 9.
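
The brute-force verification is a one-liner in Python:

# the last digit of a**5 equals the last digit of a, for each possible last digit
assert all(pow(a, 5, 10) == a for a in range(10))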

Although that’s a valid proof, it’s not very satisfying. Why fifth powers? We’re implicitly working in base 10, as humans usually do, so what is the relation between 5 and 10 in this context? How might we generalize it to other bases?

The key is that 5 = φ(10) + 1 where φ(n) is the number of positive integers less than n and relatively prime to n.

Euler discovered the φ function and proved that if a and m are relatively prime, then

a^φ(m) = 1 (mod m)

This means that a^φ(m) – 1 is divisible by m. (Technically I should use the symbol ≡ (U+2261) rather than = above since the statement is a congruence, not an equation. But since Unicode font support on various devices is disappointing, not everyone could see the proper symbol.)

Euler’s theorem tells us that if a is relatively prime to 10 then a^4 ends in 1, so a^5 ends in the same digit as a. That proves our original statement for numbers ending in 1, 3, 7, and 9. But what about the rest, numbers that are divisible by 2 or 5? We’ll answer that question and a little more general one at the same time. Let m = αβ where α and β are distinct primes. In particular, we could choose α = 2 and β = 5. We will show that

a^(φ(m)+1) = a (mod m)

for all a, whether relatively prime to m or not. This would show in addition, for example, that in base 15, every number keeps the same last “digit” when raised to the 9th power since φ(15) = 8.
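
Before proving this, here’s a quick empirical check of the claim for a few products of two distinct primes, using sympy’s totient:

from sympy import totient

# check a**(phi(m)+1) == a (mod m) for every a, for m = 2*5, 3*5, 3*7, 5*7
for m in [10, 15, 21, 35]:
    e = int(totient(m)) + 1
    assert all(pow(a, e, m) == a for a in range(m))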

We only need to be concerned with the case of a being a multiple of α or β, since Euler’s theorem proves our theorem for the rest of the cases. If a is a multiple of both α and β, then a ≡ 0 (mod m) and the theorem obviously holds, so assume a is some multiple of α, a = kα with k less than β (and hence relatively prime to β).

We need to show that αβ divides (kα)^(φ(αβ)+1) – kα. This expression is clearly divisible by α, so the remaining task is to show that it is divisible by β. We’ll show that (kα)^φ(αβ) – 1 is divisible by β.

Since α and k are relatively prime to β, Euler’s theorem tells us that α^φ(β) and k^φ(β) are congruent to 1 (mod β). This implies that (kα)^φ(β) is congruent to 1, and so (kα)^(φ(α)φ(β)) is also congruent to 1 (mod β). One of the basic properties of φ is that for relatively prime arguments α and β, φ(αβ) = φ(α)φ(β), and so we’re done.

Exercise: How much more can you generalize this? Does the base have to be the product of two distinct primes?