Numerical differentiation

Today I needed to compute the derivative of the zeta function. SciPy implements the zeta function, but not its derivative, so I needed to write my own version.

The most obvious way to approximate a derivative would be to simply stick a small step size into the definition of derivative:

f’(x) ≈ (f(x+h) – f(x)) / h

However, we could do much better using

f’(x) ≈ (f(x+h) – f(x-h)) / 2h

To see why, expand f(x + h) in a power series:

f(x + h) = f(x) + h f’(x) + h² f”(x)/2 + O(h³)

A little rearrangement shows that the error in the one-sided difference, the first approximation above, is O(h). Now if you replace h with –h and do a little algebra you can also show that the error in the two-sided difference is O(h²). When h is small, h² is very small, so the two-sided version will be more accurate for sufficiently small h.

So how small should h be? The smaller the better, in theory. In computer arithmetic, however, you lose precision whenever you subtract two nearly equal numbers. The more bits two numbers share, the more bits of precision you may lose in the subtraction. In my application, h = 10⁻⁵ works well: the precision after the subtraction in the numerator is comparable to the precision of the (two-sided) finite difference approximation. The following code was adequate for my purposes.

    from scipy.special import zeta

    def zeta_prime(x):
        h = 1e-5
        return (zeta(x+h,1) - zeta(x-h,1))/(2*h)

The zeta function in SciPy is the Hurwitz zeta function, a generalization of the Riemann zeta function. Setting the second argument to 1 gives the Riemann zeta function.

There’s a variation on the method above that works for real-valued functions that extend to a complex analytic function. In that case you can use the complex step differentiation trick to use

Im( f(x+ih)/h )

to approximate the derivative. It amounts to the two-sided finite difference above, except you don’t need to have the computer carry out the subtraction, and so you save some precision. Why’s that? When x is real, x + ih and x – ih are complex conjugates, and f(x – ih) is the conjugate of f(x + ih), i.e. conjugation and function application commute in this setting. So f(x + ih) – f(x – ih) is twice the imaginary part of f(x + ih).

SciPy implements complex versions of many special functions, but unfortunately not the zeta function.
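Since SciPy’s zeta won’t take complex arguments, here’s a minimal sketch of the complex step trick applied to a function that does extend to the complex plane, using NumPy’s sin as a stand-in. (The function name is mine, for illustration.)

    import numpy as np

    def complex_step_derivative(f, x, h=1e-20):
        # No subtraction, so no cancellation error:
        # h can be far smaller than in a finite difference.
        return np.imag(f(x + 1j*h)) / h

    print(complex_step_derivative(np.sin, 1.0))  # ~ cos(1) = 0.5403...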

Splitting proofs in two

“Ever since Euclid, mathematical proofs have served a dual purpose: certifying that a statement is true and explaining why it is true. In the future these two epistemological functions may be divorced. In the future, the computer assistant may take care of the certification and leave the mathematician to look for an explanation that humans can understand.”

Dana Mackenzie, “What in the Name of Euclid Is Going On Here?”, Science, 2005


Distance to Mars

The distance between the Earth and Mars depends on their relative positions in their orbits and varies quite a bit over time. This post will show how to compute the approximate distance over time. We’re primarily interested in Earth and Mars, though this shows how to calculate the distance between any two planets.

The planets have elliptical orbits with the sun at one focus, but these ellipses are nearly circles centered at the sun. We’ll assume the orbits are perfectly circular and lie in the same plane. (Now that Pluto is not classified as a planet, we can say without qualification that the planets have nearly circular orbits. Pluto’s orbit is much more elliptical than any of the planets.)

We can work in astronomical units (AUs) so that the distance from the Earth to the sun is 1. We can also work in units of years so that the period is also 1. Then we could describe the position of the Earth at time t as exp(2πit).

Mars has a larger orbit and a longer period. By Kepler’s third law, the size of the orbit and the period are related: the square of the period is proportional to the cube of the radius. Because we’re working in AUs and years, the proportionality constant is 1. If we denote the radius of Mars’ orbit by r, then its orbit can be described by

r exp(2πi r^(-3/2) t)

Here we pick our initial time so that at t = 0 the two planets are aligned.

The distance between the planets is just the absolute value of the difference between their positions:

| exp(2πit) – r exp(2πi r^(-3/2) t) |

The following code computes and plots the distance from Earth to Mars over time.

    from numpy import exp, pi, absolute, linspace
    import matplotlib.pyplot as plt

    def earth(t):
        return exp(2*pi*1j*t)

    def mars(t):
        r = 1.524 # semi-major axis of Mars orbit in AU
        return r*exp(2*pi*1j*(r**-1.5*t))

    def distance(t):
        return absolute(earth(t) - mars(t))

    x = linspace(0, 20, 1000)
    plt.plot(x, distance(x))
    plt.xlabel("Time in years")
    plt.ylabel("Distance in AU")
    plt.ylim(0, 3)
    plt.show()

The resulting plot shows the distance oscillating as the planets move in and out of phase.

Notice that the distance varies from about 0.5 to about 2.5. That’s because the radius of Mars’ orbit is about 1.5 AU. So when the planets are exactly in phase, they are 0.5 AU apart and when they’re exactly out of phase they are 2.5 AU apart. In other words the distance ranges from 1.5 – 1 to 1.5 + 1.

The distance function seems to be periodic with period about 2 years. We can do a little calculation by hand to show that is the case and find the period exactly.

The distance squared is the difference of the positions times its complex conjugate. If we let ω = r^(-3/2), then the distance squared is

d²(t) = (exp(2πit) – r exp(2πiωt)) (exp(–2πit) – r exp(–2πiωt))

which simplifies to

1 + r² – 2r cos(2π(1 – ω)t)

and so the (squared) distance is periodic with period 1/(1 – ω) ≈ 2.13 years.
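As a quick check, we can compute the period numerically with the same value of r used in the code above:

    r = 1.524
    omega = r**-1.5
    print(1/(1 - omega))  # about 2.135 years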

Notice that the plot of distance looks more angular at the minima and more rounded near the maxima. Said another way, the distance changes more rapidly when the planets leave their nearest approach than when they leave their furthest approach. You can prove this by taking the square root of d²(t) and computing its derivative.

Let f(t) = 1 + r² – 2r cos(2π(1 – ω)t). By the chain rule, the derivative of the square root of f(t) is (1/2) f(t)^(-1/2) f’(t). Near a maximum or a minimum, f’(t) takes on the same values. But the factor f(t)^(-1/2) is largest when f(t) is smallest and vice versa, because of the negative exponent.

Permutations and tests

Suppose a test asks you to place 10 events in chronological order. Label these events A through J so that chronological order is also alphabetical order.

If a student answers BACDEFGHIJ, then did they make two mistakes or just one? Two events are in the wrong position, but they made one transposition error. The simplest way to grade such a test would be to count the number of events that are in the correct position. Is this the most fair way to grade?

If you decide to count how many transpositions are needed to correct a student’s answer, do you count any transposition or only adjacent transpositions? For example, if someone answered JBCDEFGHIA, then transposing the A and the J is enough to put the results in order. But reversing the first and last event seems like a bigger mistake than reversing the first two events. Counting only adjacent transpositions would penalize this mistake more. You would have to swap the J with each of the eight letters between J and A. But it hardly seems that answering JBCDEFGHIA is eight times worse than answering BACDEFGHIJ.

Maybe counting transpositions is too much work. So we just go back to counting how many events are in the right place. But then suppose someone answers JABCDEFGHI. This is completely wrong since every event is in the wrong position. But the student obviously knows something, since the relative order of nearly all of the events is correct. From one perspective there was only one mistake: J comes last, not first.

What is the worst possible answer? Maybe getting the order exactly backward? If you have an odd number of events, then getting the order backward means one event is in the right place, and so that doesn’t receive the lowest possible score.
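To make the trade-offs concrete, here’s a sketch of two of the scoring methods discussed above: counting correct positions, and counting the minimum number of arbitrary transpositions, which equals the number of events minus the number of cycles in the permutation. (Function names are mine, for illustration.)

    def exact_matches(answer, key):
        # Number of events in the correct position
        return sum(a == k for a, k in zip(answer, key))

    def min_transpositions(answer, key):
        # Minimum number of arbitrary transpositions to fix the answer:
        # n minus the number of cycles in the permutation
        perm = [key.index(a) for a in answer]
        seen = [False] * len(perm)
        cycles = 0
        for i in range(len(perm)):
            if not seen[i]:
                cycles += 1
                j = i
                while not seen[j]:
                    seen[j] = True
                    j = perm[j]
        return len(perm) - cycles

    key = "ABCDEFGHIJ"
    for answer in ["BACDEFGHIJ", "JBCDEFGHIA", "JABCDEFGHI"]:
        print(answer, exact_matches(answer, key), min_transpositions(answer, key))

The first two answers each need just one transposition, though adjacent-only counting would treat them very differently, and the last answer scores worst by both measures even though it arguably reflects a single conceptual mistake.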

This is an interesting problem beyond grading exams. (As for grading exams, I’d suggest simply not using questions of this type on an exam.) In manufacturing, how serious a mistake is it to reverse two consecutive components versus two distant components? You could also ask the same question when comparing DNA sequences or other digital signals. The best way to assign a distance between the actual and desired sequence would depend entirely on context.

Fibonacci formula for pi

Here’s an unusual formula for pi based on the product and least common multiple of the first m Fibonacci numbers.


\pi = \lim_{m\to\infty} \sqrt{\frac{6 \log F_1 \cdots F_m}{\log \mbox{lcm}( F_1, \ldots, F_m )}}

Unlike the formula I wrote about a few days ago relating Fibonacci numbers and pi, this one is not as simple to prove. The numerator inside the root is easy enough to estimate asymptotically, but estimating the denominator depends on the distribution of primes.
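Here’s a quick numerical sanity check of the formula. Convergence is slow, so don’t expect many correct digits:

    from math import gcd, log, sqrt

    def pi_from_fibonacci(m):
        # First m Fibonacci numbers
        fibs = [1, 1]
        while len(fibs) < m:
            fibs.append(fibs[-1] + fibs[-2])
        fibs = fibs[:m]
        log_product = sum(log(f) for f in fibs)
        lcm = 1
        for f in fibs:
            lcm = lcm * f // gcd(lcm, f)  # running least common multiple
        return sqrt(6 * log_product / log(lcm))

    print(pi_from_fibonacci(100))  # slowly approaches pi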

Source: Yuri V. Matiyasevich and Richard K. Guy, A new formula for π, American Mathematical Monthly, Vol 93, No. 8 (October 1986), pp. 631-635.


Fibonacci numbers, arctangents, and pi

Here’s an unusual formula for π. Let Fₙ be the nth Fibonacci number. Then

\pi = 4 \sum_{n=1}^\infty \arctan\left( \frac{1}{F_{2n+1}} \right)

As mysterious as this equation may seem, it’s not hard to prove. The arctangent identity

\arctan\left(\frac{1}{F_{2n+1}}\right) = \arctan\left(\frac{1}{F_{2n}}\right) - \arctan\left(\frac{1}{F_{2n+2}}\right)

shows that the sum telescopes, leaving only the first term, arctan(1) = π/4. To prove the arctangent identity, take the tangent of both sides, use the addition law for tangents, and use the Fibonacci identity

F_{n+1} F_{n-1} - F_n^2 = (-1)^n
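A few terms of the sum are enough to confirm this numerically, since the odd-indexed Fibonacci numbers grow so quickly:

    from math import atan, pi

    def fib(n):
        a, b = 1, 1
        for _ in range(n - 1):
            a, b = b, a + b
        return a

    approx = 4 * sum(atan(1 / fib(2*n + 1)) for n in range(1, 30))
    print(approx, pi)  # agree to many decimal places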

See this post for an even more remarkable formula relating Fibonacci numbers and π.

Number of digits in n!

The other day I ran across the fact that 23! has 23 digits. That made me wonder how often n! has n digits.

There can only be a finite number of cases, because n! grows faster than 10ⁿ for n > 10, and it’s reasonable to guess that 23 might be the largest case. Turns out it’s not, but it’s close. The only cases where n! has n digits are 1, 22, 23, and 24. Once you’ve found these by brute force, it’s not hard to show that they must be the only ones because of the growth rate of n!.
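The brute force search is a short exercise in Python:

    from math import factorial

    # All n (in a generous range) where n! has exactly n digits
    print([n for n in range(1, 100) if len(str(factorial(n))) == n])
    # [1, 22, 23, 24]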

Is there a convenient way to find the number of digits in n! without having to compute n! itself? Sure. For starters, the number of digits in the base 10 representation of a number x is

⌊ log₁₀ x ⌋ + 1,

where ⌊ z ⌋ is the floor of z, the largest integer less than or equal to z. The log of the factorial function is easier to compute than the factorial itself because it won’t overflow. You’re more likely to find a function to compute the log of the gamma function than the log of factorial, and more likely to find software that uses natural logs than logs base 10. So in Python, for example, you could compute the number of digits with this:

    from scipy.special import gammaln
    from math import log, floor

    def digits_in_factorial(n):
        return floor( gammaln(n+1)/log(10.0) ) + 1

What about a more elementary formula, one that doesn’t use the gamma function? If you use Stirling’s approximation for factorial and take log of that you should at least get a good approximation. Here it is again in Python:

    from math import log, floor, pi

    def stirling(n):
        return floor( ((n+0.5)*log(n) - n + 0.5*log(2*pi))/log(10) ) + 1

The code above is exact for every n > 2 as far as I’ve tested, up to n = 1,000,000. (Note that one million factorial is an extremely large number. It has 5,565,709 digits. And yet we can easily say something about this number, namely how many digits it has!)

The code may break down somewhere because of the error in Stirling’s approximation or the limitations of floating point arithmetic. Stirling’s approximation gets more accurate as n increases, but it’s conceivable that a factorial value could be so close to a power of 10 that the approximation error pushes it from one side of the power of 10 to the other. Maybe that’s not possible, and someone could prove that it’s not.

You could extend the code above to optionally take another base besides 10.

    def digits_in_factorial(n, b=10):
        return floor( gammaln(n+1)/log(b) ) + 1

    def stirling(n, b=10):
        return floor( ((n+0.5)*log(n) - n + 0.5*log(2*pi))/log(b) ) + 1

The code using Stirling’s approximation still works for all n > 2, even for b as small as 2. This is slightly surprising since the number of bits in a number is more detailed information than the number of decimal digits.

Doubly and triply periodic functions

A function f is periodic if there exists a constant period ω such that f(x) = f(x + ω) for all x. For example, sine and cosine are periodic with period 2π.

There’s only one way a function on the real line can be periodic. But if you think of functions of a complex variable, it makes sense to look at functions that are periodic in two different directions. Sine and cosine are periodic as you move horizontally across the complex plane, but not if you move in any other direction. But you could imagine a function that’s periodic vertically as well as horizontally.

A doubly periodic function satisfies f(x) = f(x + ω₁) and f(x) = f(x + ω₂) for all x and for two different fixed complex periods, ω₁ and ω₂, with different angular components, i.e. the two periods are not real multiples of each other. For example, the two periods could be 1 and i.

How many doubly periodic functions are there? The answer depends on how much regularity you require. If you ask that the functions be differentiable everywhere as functions of a complex variable (i.e. entire), the only doubly periodic functions are constant functions [1]. But if you relax your requirements to allow functions to have singularities, there’s a wide variety of functions that are doubly periodic. These are the elliptic functions. They’re periodic in two independent directions, and meromorphic (i.e. analytic except at isolated poles). [2]

What about triply periodic functions? If you require them to be meromorphic, then the only triply periodic functions are constant functions. To put it another way, if a meromorphic function is periodic in three directions, it’s periodic in every direction for every period, i.e. constant. If a function has three independent periods, you can construct a sequence with a limit point where the function is constant, and so it’s constant everywhere.

* * *

[1] Another way to put this is to say that elliptic functions must have at least one pole inside the parallelogram determined by the lines from the origin to ω₁ and ω₂. A doubly periodic function’s values everywhere are repeats of its values on this parallelogram. If the function were continuous over this parallelogram (i.e. had no poles there), then it would be bounded over the parallelogram and hence bounded everywhere. But Liouville’s theorem says a bounded entire function must be constant.

[2] We don’t consider arbitrary singularities, only isolated poles. There are doubly periodic functions with essential singularities, but these are outside the definition of elliptic functions.

Generalization of Fibonacci ratios

Each Fibonacci number is the sum of its two predecessors. My previous post looked at generalizing this to the so-called Tribonacci numbers, each being the sum of its three predecessors. One could keep going, defining the Tetrabonacci numbers and in general the n-Fibonacci numbers for any n at least 2.

For the definition to be complete, you have to specify the first n of the n-Fibonacci numbers. However, these starting values hardly matter for our purposes. We want to look at the limiting ratio of consecutive n-Fibonacci numbers, and this doesn’t depend on the initial conditions. (If you were determined, you could find starting values where this isn’t true. It’s enough to pick integer initial values, at least one of which is not zero.)

As shown in the previous post, the ratio is the largest eigenvalue of an n by n matrix with 1’s on the first row and 1’s immediately below the main diagonal. The characteristic polynomial of such a matrix is

λⁿ – λⁿ⁻¹ – λⁿ⁻² – … – 1

and so we look for the largest zero of this polynomial. We can sum the terms with negative coefficients as a geometric series and show that the eigenvalues satisfy

λⁿ – 1/(2 – λ) = 0.

So the limiting ratio of consecutive n-Fibonacci numbers is the largest root of the above equation. You could verify that when n = 2, we get the golden ratio φ as we should, and when n = 3 we get around 1.8393 as in the previous post.
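Here’s a sketch of that verification, finding the largest root numerically with NumPy:

    import numpy as np

    def n_fibonacci_ratio(n):
        # Largest root of λ^n - λ^(n-1) - ... - λ - 1 = 0
        coefficients = [1] + [-1]*n
        return max(np.roots(coefficients).real)

    for n in [2, 3, 4, 10]:
        print(n, n_fibonacci_ratio(n))
    # 1.618..., 1.8393..., 1.9276..., then approaching 2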

As n gets large, the limiting ratio approaches 2. You can see this by taking the log of the previous equation.

n = -log(2 – λ)/log(λ).

As n goes to infinity, λ must approach 2 so that the right side of the equation also goes to infinity.

Power method and Fibonacci numbers

Take an n × n matrix A and a vector x of length n. Now multiply x by A, then multiply the result by A, over and over again. The sequence of vectors generated by this process will converge to an eigenvector of A. (An eigenvector is a vector whose direction is unchanged when multiplied by A. Multiplying by A may stretch or shrink the vector, but it doesn’t rotate it at all. The amount of stretching is called the corresponding eigenvalue.)

The eigenvector produced by this process is the eigenvector corresponding to the largest eigenvalue of A, largest in absolute value. This assumes A has a unique eigenvector associated with its largest eigenvalue. It also assumes you’re not spectacularly unlucky in your choice of vector to start with.

Assume your starting vector x has some component in the direction of v, the eigenvector corresponding to the largest eigenvalue. (The vectors that don’t have such a component lie in an (n-1)-dimensional subspace, which has measure zero. So if you pick a starting vector at random, with probability 1 it will have some component in the direction we’re after. That’s what I meant when I said you can’t start with a spectacularly unlucky initial choice.) Each time you multiply by A, the component in the direction of v gets stretched more than the components orthogonal to v. After enough iterations, the component in the direction of v dominates the other components.

What does this have to do with Fibonacci numbers? The next number in the Fibonacci sequence is the sum of the previous two. In matrix form this says

\left[\begin{array}{c} x_{n+1} \\ x_{n} \end{array}\right] = \left[\begin{array}{cc} 1 & 1 \\ 1 & 0 \end{array}\right] \left[\begin{array}{c} x_{n} \\ x_{n-1} \end{array}\right]

The ratio of consecutive Fibonacci numbers converges to the golden ratio φ because φ is the largest eigenvalue of the matrix above.

The first two Fibonacci numbers are 1 and 1, so the Fibonacci sequence corresponds to repeatedly multiplying by the matrix above, starting with the initial vector x = [1 1]ᵀ. But you could start with any other vector and the ratio of consecutive terms would converge to the golden ratio, provided you don’t start with a vector orthogonal to [1 φ]ᵀ. Starting with any pair of integers, unless both are zero, is enough to avoid this condition, since φ is irrational.
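Here’s a small sketch of the power method recovering the golden ratio from the Fibonacci matrix:

    import numpy as np

    A = np.array([[1.0, 1.0], [1.0, 0.0]])
    x = np.array([1.0, 1.0])
    for _ in range(30):
        x = A @ x
        x = x / np.linalg.norm(x)  # normalize to avoid overflow
    print(x[0] / x[1])  # converges to the golden ratio, 1.618...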

We could generalize this approach to look at other sequences defined by a recurrence relation. For example, we could look at the “Tribonacci” numbers. The Tribonacci sequence starts out 1, 1, 2, and then each successive term is the sum of the three previous terms. We can find the limiting ratio of Tribonacci numbers by finding the largest eigenvalue of the matrix below.

\left[\begin{array}{ccc} 1 & 1 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{array}\right]

This eigenvalue is the largest root of x³ – x² – x – 1 = 0, which is about 1.8393. As before, the starting values hardly matter. Start with any three integers, at least one of them non-zero, and define each successive term to be the sum of the previous three terms. The ratio of consecutive terms in this sequence will converge to 1.8393.

By the way, you could compute the limiting ratio of Tribonacci numbers with the following bit of Python code:

    from numpy import array
    from scipy import linalg

    M = array([[1, 1, 1], [1, 0, 0], [0, 1, 0]])
    eigenvalues, _ = linalg.eig(M)
    print(max(eigenvalues.real))  # about 1.8393

Update: The next post generalizes this one to n-Fibonacci numbers.

Casting out sevens

A while back I wrote about a method to test whether a number is divisible by seven. I recently ran across another method for testing divisibility by 7 in Martin Gardner’s book The Unexpected Hanging and Other Mathematical Diversions. The method doesn’t save too much effort compared to simply dividing by 7, but it’s interesting. It looks a little mysterious at first, though the explanation of why it works is very simple.

Suppose you want to find whether a number n is divisible by 7. Start with the last digit of n and write a 1 under the last digit, and a 3 under the next to last digit. The digits under the digits of n cycle through 1, 3, 2, 6, 4, 5, repeatedly until there is something under each digit in n. Now multiply each digit of n by the digit under it and add the results.

For example, suppose n = 1394. Write the digits 1, 3, 2, and 6 underneath, from right to left:

    1 3 9 4
    6 2 3 1
The sum we need to compute is 1*6 + 3*2 + 9*3 + 4*1 = 6 + 6 + 27 + 4 = 43.

This sum, 43 in our example, has the same remainder when divided by 7 as the original number, 1394 in our example. Since 43 is not divisible by 7, neither is 1394. Moreover, 43 leaves a remainder of 1 when divided by 7, so 1394 also leaves a remainder of 1.

You could apply this method repeatedly, though in this case 43 is small enough that it’s easy enough to see that it leaves a remainder of 1.

Suppose you started with a 1000-digit number n. Each digit is no more than 9, and is being multiplied by a number no more than 6. So the sum would be less than 54000. So you’ve gone from a 1000-digit number to at most a 5-digit number in one step. One or two more steps should be enough for the remainder to be obvious.

Why does this method work? The key is that the multipliers 1, 3, 2, 6, 4, 5 are the remainders when the powers of 10 are divided by 7. Since 10⁶ has a remainder of 1 when divided by 7, the numbers 10⁶ᵃ⁺ᵇ and 10ᵇ have the same remainder by 7, and that’s why the multipliers have period 6.

All the trick is doing is expanding the base 10 representation of a number and adding up the remainders when each term is divided by seven. In our example, 1394 = 1000 + 3*100 + 9*10 + 4, and mod 7 this reduces to 1*6 + 3*2 + 9*3 + 4*1, the exact calculation above.
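Here’s the method as a short Python sketch:

    def remainder_mod_7(n):
        # Multipliers are the remainders of powers of 10 mod 7,
        # applied to the digits of n from right to left.
        multipliers = [1, 3, 2, 6, 4, 5]
        return sum(int(d) * multipliers[i % 6]
                   for i, d in enumerate(reversed(str(n)))) % 7

    print(remainder_mod_7(1394))  # 1, matching 1394 % 7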

The trick presented here is analogous to casting out nines. But since every power of 10 leaves a remainder of 1 when divided by 9, all the multipliers in casting out nines are 1.

You could follow the pattern of this method to create a divisibility rule for any other divisor, say 13 for example, by letting the multipliers be the remainders when powers of 10 are divided by 13.

Related post: Divisibility rules in hex

Computing square triangular numbers

The previous post stated a formula for f(n), the nth square triangular number (i.e. the nth triangular number that is also a square number):

((17 + 12√2)ⁿ + (17 – 12√2)ⁿ – 2)/32

Now 17 – 12√2 is 0.029…, and so the term (17 – 12√2)ⁿ approaches zero very quickly as n increases. So f(n) is very nearly

((17 + 12√2)ⁿ – 2)/32

The error in the approximation is less than 0.001 for all n, so the approximation is exact when rounded to the nearest integer. So the nth square triangular number is

⌊((17 + 12√2)ⁿ + 14)/32⌋

where ⌊x⌋ is the greatest integer less than or equal to x.

Here’s how you might implement this in Python:

    from math import sqrt, floor

    def f(n):
        x = 17 + 12*sqrt(2)
        return floor((x**n + 14)/32.0)

Unfortunately this formula isn’t that useful in ordinary floating point computation. When n = 11 or larger, the result needs more bits than are available in the significand of a floating point number. The result is accurate to around 15 digits, but the exact result is longer than 15 digits.

The result can also be computed with a recurrence relation:

f(n) = 34 f(n-1) – f(n-2) + 2

for n > 2. (Or n > 1 if you define f(0) to be 0, a reasonable thing to do).

This only requires integer arithmetic, so its only limitation is the size of the integers you can compute with. Since Python has arbitrarily large integers, the following works for very large integers.

    def f(n):
        if n < 2:
            return n
        return f(n-1)*34 - f(n-2) + 2

This is a direct implementation, though it’s inefficient because of the redundant function evaluations. It could be made more efficient by, for example, using memoization.
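For example, here’s a sketch of an iterative version that runs the recurrence forward and avoids the redundant evaluations:

    def f_iterative(n):
        a, b = 0, 1  # f(0), f(1)
        for _ in range(n):
            a, b = b, 34*b - a + 2
        return a

    print([f_iterative(n) for n in range(1, 6)])
    # [1, 36, 1225, 41616, 1413721]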

When is a triangle a square?

Of course a triangle cannot be a square, but a triangular number can be a square number.

A triangular number is the sum of the first so many positive integers. For example, 10 is a triangular number because it equals 1+2+3+4. These numbers are called triangular numbers because you can form a triangle from a row of one coin, a row of two coins, a row of three coins, etc.

The smallest number that is both triangular and square is 1. The next smallest is 36. There are infinitely many numbers that are both triangular and square, and there’s even a formula for the nth number that is both a triangle and a square:

((17 + 12√2)ⁿ + (17 – 12√2)ⁿ – 2)/32

Source: American Mathematical Monthly, February 1962, page 169.
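A brute force search turns up the first few:

    from math import isqrt

    def is_square(n):
        return isqrt(n)**2 == n

    # Triangular numbers t(t+1)/2 that are also perfect squares
    print([t*(t+1)//2 for t in range(1, 300) if is_square(t*(t+1)//2)])
    # [1, 36, 1225, 41616]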

For more on triangle numbers and their generalizations, see Twelve Days of Christmas and Tetrahedral Numbers.

There is also a way to compute the square triangular numbers recursively, discussed in the next post.

Attributing changes to numerator and denominator

This afternoon I helped someone debug a financial spreadsheet. One of the reasons spreadsheets can be so frustrating to work with is that assumptions are hard to see. You have to click on cells one at a time to find formulas, then decode cell coordinates into their meanings.

The root problem turned out to be an assumption that sounds reasonable: if you make two changes, one to a numerator and one to a denominator, then the total change equals the sum of the results of making each change separately. Except that’s not so.

At this point, a mathematician would say “Of course you can’t split the effects like that. It’s nonlinear.” But it’s worth pursuing a little further. For one thing, it doesn’t help a general audience to just say “it’s nonlinear.” For another, it’s worth seeing when it is appropriate, at least approximately, to attribute the effects this way.

You start with a numerator and denominator, N/D, then change N to N+n and change D to D+d. The total change is then (N+n)/(D+d) – N/D.

The result from only the change in the numerator is n/D. The result from only the change in denominator is N/(D+d) – N/D.

The difference between the total change and the sum of the two partial changes is

–nd / (D(D + d))
The assumption that you can take the total change and attribute it to each change separately is wrong in general. But it is correct if n or d is zero, and it is approximately correct when nd is small. This can make the bug harder to find. It could also be useful when nd is indeed small and you don’t need to be exact.
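A quick numerical illustration, with made-up values:

    N, D = 100.0, 50.0  # original numerator and denominator
    n, d = 3.0, 2.0     # changes to each

    total = (N + n)/(D + d) - N/D
    partial_sum = n/D + (N/(D + d) - N/D)

    print(total - partial_sum)   # -0.0023076...
    print(-n*d / (D*(D + d)))    # the same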

Also, if all the terms are positive, the discrepancy is negative, i.e. the total change is less than the sum of the partial changes. Said another way, allocating the change to each cause separately over-estimates the total change.


Last digit of largest known prime

In my previous post, we looked at the largest known prime, P = 2⁵⁷⁸⁸⁵¹⁶¹ – 1, and how many digits it has in various bases. This post looks at how to find the last digit of P in base b. We assume b < P. (If b = P then the last digit is 0, and if b > P the last digit is P.)

If b is a power of 2, we showed in the previous post that the last digit of P is b-1.

If b is odd, we can find the last digit using Euler’s totient theorem. Let φ(b) be the number of positive integers less than b and relatively prime to b. Then Euler’s theorem tells us that 2^φ(b) ≡ 1 mod b since b is odd. So if r is the remainder when 57885161 is divided by φ(b), then the last digit of Q = P + 1 in base b is the same as the last digit of 2^r in base b.

For example, suppose we wanted to know the last digit of P in base 15. Since φ(15) = 8, and the remainder when 57885161 is divided by 8 is 1, the last digit of Q base 15 is 2. So the last digit of P is 1.

So we know how to compute the last digit in base b if b is a power of 2 or odd. Factor out all the powers of 2 from b so that b = 2ᵏ d and d is odd. We can find the last digit base 2ᵏ, and we can find the last digit base d, so can we combine these to find the last digit base b? Yes, that’s exactly what the Chinese Remainder Theorem is for.

To illustrate, suppose we want to find the last digit of P in base 12. P ≡ 3 mod 4 and P ≡ 1 mod 3, so P ≡ 7 mod 12. (The numbers are small enough here to guess. For a systematic approach, see the post mentioned above.) So the last digit of P is 7 in base 12.

If you’d like to write a program to play around with this, you need to be able to compute φ(n). You may have an implementation of this function in your favorite environment. For example, it’s sympy.ntheory.totient in Python. If not, it’s easy to compute φ(n) if you can factor n. You just need to know two things about φ. First, it’s a multiplicative function, meaning that if a and b are relatively prime, then φ(ab) = φ(a) φ(b). Second, if p is a prime, then φ(pᵏ) = pᵏ – pᵏ⁻¹.
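For example, here’s a sketch for odd bases, using sympy’s totient as mentioned above:

    from sympy.ntheory import totient

    def last_digit_of_P(b):
        # Last digit of P = 2^57885161 - 1 in an odd base b.
        # Euler's theorem lets us reduce the exponent mod φ(b).
        assert b % 2 == 1
        r = 57885161 % totient(b)
        return (pow(2, r, b) - 1) % b

    print(last_digit_of_P(15))  # 1, as computed above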