Powers of 3 + √2

Yesterday’s post looked at the distribution of powers of x mod 1. For almost all x > 1 the distribution is uniform in the limit. But there are exceptions, and the post raised the question of whether 3 + √2 is an exception.

A plot made it look like 3 + √2 is an exception, but that turned out to be a numerical problem.

A higher precision calculation showed that the zeros on the right end of the plot were erroneous.

So this raises the question of how to calculate (3 + √2)^n accurately for large n. The way I created the second plot was to use bc to numerically calculate the powers of 3 + √2. In this post, I’ll look at using Mathematica to calculate the powers symbolically.

For all positive integers n,

(3 + √2)^n = a_n + b_n √2

where a_n and b_n are positive integers. We want to compute the a and b values.

If you ask Mathematica to compute (3 + √2)^n it will simply echo the expression. But if you use the Expand function it will give you what you want. For example

    Expand[(3 + Sqrt[2])^10]

returns

    1404491 + 993054 √2

We can use the Coefficient function to split a + b √2 into a and b.

    parts[n_] := 
        Module[{x = (3 + Sqrt[2])^n}, 
            {Coefficient[x, Sqrt[2], 0], Coefficient[x, Sqrt[2], 1]}]

Now parts[10] returns the pair {1404491, 993054}.
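If you’d rather avoid symbolic algebra, the coefficients can also be computed with exact integer arithmetic. Multiplying a + b√2 by 3 + √2 gives (3a + 2b) + (a + 3b)√2, so a and b satisfy a simple recurrence. Here’s a minimal Python sketch; the function name parts_int is just for illustration.

    def parts_int(n):
        # coefficients of (3 + sqrt(2))^n = a + b*sqrt(2), via
        # (a + b*sqrt(2))(3 + sqrt(2)) = (3a + 2b) + (a + 3b)*sqrt(2)
        a, b = 1, 0   # (3 + sqrt(2))^0 = 1
        for _ in range(n):
            a, b = 3*a + 2*b, a + 3*b
        return a, b

    print(parts_int(10))   # (1404491, 993054), matching the Mathematica result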

Here’s something interesting. If we set

(3 + √2)^n = a_n + b_n √2

as above, then the two halves of the expression on the right are asymptotically equal. That is, as n goes to infinity, the ratio

a_n / (b_n √2)

converges to 1.

We can see this by defining

    ratio[n_] := 
        Module[ {a = Part[ parts[n], 1], b = Part[parts[n], 2]}, 
        N[a / (b Sqrt[2])]]

and evaluating ratio at increasing values of n. ratio[12] returns 1.00001 and ratio[13] returns 1. Not that the ratio is exactly 1, but it is as close to 1 as a floating point number can represent.

This seems to be true more generally, as we can investigate with the following function.

    ratio2[p_, q_, r_, n_] := 
        Module[{x = (p + q Sqrt[r])^n}, 
            N[Coefficient[x, Sqrt[r], 0]/(Coefficient[x, Sqrt[r], 1] Sqrt[r])]]

When r is a prime and

(p + q√r)^n = a_n + b_n √r

then it seems that the ratio a_n / (b_n √r) converges to 1 as n goes to infinity. For example, ratio2[3, 5, 11, 40] returns 1, meaning that the two halves of the expression for (3 + 5√11)^n are asymptotically equal.

I don’t know whether the suggested result is true, or how to prove it if it is true. Feels like a result from algebraic number theory, which is not something I know much about.

Update: An anonymous person on X suggested a clever and simple proof. Observe that

\begin{align*} a_n &= \frac{(3+\sqrt{2})^n + (3-\sqrt{2})^n}{2} \\ b_n\sqrt{2} &= \frac{(3 + \sqrt{2})^n - (3-\sqrt{2})^n}{2} \end{align*}

In this form it’s clear that the ratio a_n / (b_n √2) converges to 1, and the proof can be generalized to cover more.
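To spell the limit out, divide the numerator and denominator by (3 + √2)^n:

\frac{a_n}{b_n\sqrt{2}} = \frac{(3+\sqrt{2})^n + (3-\sqrt{2})^n}{(3+\sqrt{2})^n - (3-\sqrt{2})^n} = \frac{1 + r^n}{1 - r^n}, \qquad r = \frac{3-\sqrt{2}}{3+\sqrt{2}}

Since |r| < 1, the term r^n goes to 0 and the ratio goes to 1. The same argument applies whenever the conjugate is smaller in absolute value than the original number.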

Multiples and powers mod 1

For a positive real number x, the expression x mod 1 means the fractional part of x:

x mod 1 = x − ⌊ x ⌋.

This post will look at the distributions of nx mod 1 and x^n mod 1 for n = 1, 2, 3, ….

The distribution of nx mod 1 is easy to classify. If x is rational then nx mod 1 will cycle through a finite number of values in the interval [0, 1]. If x is irrational then nx mod 1 will be uniformly distributed in the interval.

The distribution of xn mod 1 is more subtle. We have three basic facts.

  1. If 0 ≤ x < 1, then x^n → 0 as n → ∞.
  2. If x = 1 then x^n = 1 for all n.
  3. For almost all x > 1 the sequence x^n mod 1 is uniformly distributed. But for some values of x it is not, and there’s no known classification for these exceptional values of x.

The rest of the post will focus on the third fact.

Suppose x = √2. Then for all even n, x^n mod 1 = 0. The sequence isn’t uniformly distributed in [0, 1] because half the sequence is piled up at 0.

For another example, let’s look at powers of the golden ratio φ = (1 + √5)/2. For even values of n the sequence φ^n mod 1 converges to 1, and for odd values of n it converges to 0. You can find a proof here.

At one time it was not known whether x^n mod 1 is uniformly distributed for x = 3 + √2. (See HAKMEM, item 32.) I don’t know whether this is still unknown. Let’s see what we can tell from a plot.

Apparently sometime around n = 25 the sequence suddenly converges to 0. That looks awfully suspicious. Let’s calculate x^25 using bc.

    $ echo '(3 + sqrt(2))^25' | bc -l
    13223107213151867.43024035874391918714

This says x^25 mod 1 should be around 0.43, not near 0. The reason we’re seeing all zeros is that all the bits in the significand are being used to store the integer part and none are left over for the fractional part. A standard floating point number has 53 bits of significand and

(3 + √2)^25 > 2^53.

For more details, see Anatomy of a floating point number.
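If you’d rather stay in Python than use bc, the standard decimal module can carry enough precision. Here’s a minimal sketch; the 60 digits of precision is an arbitrary choice, comfortably more than the 17 digits in the integer part.

    from decimal import Decimal, getcontext

    getcontext().prec = 60            # far more precision than a 53-bit significand
    x = 3 + Decimal(2).sqrt()
    print(x**25 % 1)                  # about 0.4302..., in line with the bc result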

Now let’s see what happens when we calculate xn mod 1 correctly using bc. Here’s the revised plot.

Is this uniformly distributed? Well, it’s not obviously not uniformly distributed. But we can’t tell by looking at a plot because uniform distribution is an asymptotic property.

We didn’t give the definition of a uniformly distributed sequence until now because an intuitive understanding of the term was sufficient. Now we should be more precise.

Given a set E, a sequence ω, and an integer N, define A(E, ω, N) to be the number of the first N terms of the sequence ω that fall inside E. We say ω is uniformly distributed mod 1 if for every 0 ≤ a < b ≤ 1,

lim_{N → ∞} A([a, b), ω, N) / N = b − a.

That is, the proportion of points that fall in an interval is equal to the length of the interval.
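As a concrete illustration, here’s a small Python sketch that estimates A([a, b), ω, N)/N for the sequence n√2 mod 1, which is uniformly distributed; the interval and the value of N are arbitrary choices.

    from math import sqrt

    def empirical_fraction(seq, a, b):
        # proportion of the terms of seq, taken mod 1, that land in [a, b)
        return sum(1 for x in seq if a <= x % 1 < b) / len(seq)

    N = 100_000
    omega = [n * sqrt(2) for n in range(1, N + 1)]
    print(empirical_fraction(omega, 0.2, 0.5))   # should be close to 0.3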

Now let’s go back to the example x = √2. We know that the powers of this value of x are not uniformly distributed mod 1. But we also said that for almost all x > 1 the powers of x are uniformly distributed. That means every tiny little interval around √2 contains a number y such that the powers of y are uniformly distributed mod 1. Now if y is very close to √2 and we plot the first few values of y^n mod 1 then half of the values will be near 0. The sequence will not look uniformly distributed, though it is uniformly distributed in the limit.

Legendre and Ethereum

What does an eighteenth century French mathematician have to do with the Ethereum cryptocurrency?

A pseudorandom number generator based on Legendre symbols, known as the Legendre PRF, has been proposed as part of a zero knowledge proof mechanism to demonstrate proof of custody in Ethereum 2.0. I’ll explain what each of these terms means and include some Python code.

The equation x² = a mod p is solvable for some values of a and not for others. The Legendre symbol

\left(\frac{a}{p}\right)

is defined to be 1 if a has a square root mod p, and −1 if it does not. Here a is a positive integer and p is a (large) prime [1]. Note that this is not a fraction, though it looks like one.

As a varies, the Legendre symbols vary randomly. Not literally randomly, of course, but they act random enough to be useful as a pseudorandom number generator.

Legendre PRF

Generating bits by computing Legendre symbols is a lot more work than generating bits using a typical PRNG, so what makes the Legendre PRF of interest to Ethereum?

Legendre symbols can be computed fairly efficiently. You wouldn’t want to use the Legendre PRF to generate billions of random numbers for some numerical integration computation, but for a zero knowledge proof you only need to generate a few dozen bits.
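One standard way to compute the symbol is Euler’s criterion: a^{(p−1)/2} mod p is 1 when a is a square mod p and p − 1 (that is, −1) when it is not, ignoring the case where a is a multiple of p as in [1]. A minimal sketch:

    def legendre(a, p):
        # Euler's criterion: a^((p-1)/2) mod p is 1 for squares, p - 1 for non-squares
        r = pow(a, (p - 1) // 2, p)
        return -1 if r == p - 1 else r

    print(legendre(2, 7), legendre(3, 7))   # 1 -1: 3^2 = 2 mod 7, but 3 is not a square mod 7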

To prove that you know a key k, you can generate a string of pseudorandom bits that depend on the key. If you can do this for a few dozen bits, it’s far more likely that you know the key than that you have guessed the bits correctly. Given k, the Legendre PRF computes

L_k(x) = \frac{1}{2}\left( 1 - \left( \frac{k+x}{p}\right) \right)

for n consecutive values of x [2].

One reason this function is of interest is that it naturally lends itself to multiparty computation (MPC). The various parties could divide up the range of x values that each will compute.

The Legendre PRF has not been accepted to be part of Ethereum 2.0. It’s only been proposed, and there is active research into whether it is secure enough.

Python code

Here’s a little Python script to demonstrate computing L_k(x).

    from sympy import legendre_symbol

    def L(x, k, p):
        return (1 - legendre_symbol(x + k, p)) // 2

    k = 20250626
    p = 2**521 - 1
    print([L(x, k, p) for x in range(10)])

This produces the following.

    [1, 1, 1, 1, 0, 0, 1, 0, 0, 1]


[1] There is a third possibility: the Legendre symbol equals 0 if a is a multiple of p. We can safely ignore this possibility for this post.

[2] The Legendre symbol takes on the values ±1 and so we subtract this from 1 to get values {0, 2}, then divide by 2 to get bits {0, 1}.

Golden powers revisited

Years ago I wrote a post Golden powers are nearly integers. The post was picked up by Hacker News and got a lot of traffic. The post was commenting on a line from Terry Tao:

The powers φ, φ², φ³, … of the golden ratio lie unexpectedly close to integers: for instance, φ^11 = 199.005… is unusually close to 199.

In the process of writing my recent post on base-φ numbers I came across some equations that explain exactly why golden powers are nearly integers.

Let φ be the golden ratio and ψ = −1/φ. That is, φ and ψ are the larger and smaller roots of

x² − x − 1 = 0.

Then powers of φ reduce to an integer and an integer multiple of φ. This is true for negative powers of φ as well, and so powers of ψ also reduce to an integer and an integer multiple of ψ. And in fact, the integers alluded to are Fibonacci numbers.

φ^n = F_n φ + F_{n−1}
ψ^n = F_n ψ + F_{n−1}

These equations can be found in TAOCP 1.2.8 exercise 11.

Adding the two equations leads to [1]

φ^n = F_{n+1} + F_{n−1} − ψ^n

So yes, φ^n is nearly an integer. In fact, it’s nearly the sum of the (n + 1)st and (n − 1)st Fibonacci numbers. The error in this approximation is −ψ^n, and so the error decreases exponentially with alternating signs.
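For example, take n = 11. Then F_12 + F_10 = 144 + 55 = 199 and −ψ^11 ≈ 0.005, which recovers the value φ^11 = 199.005… in the quote above.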


[1] φ^n + ψ^n = F_n (φ + ψ) + 2 F_{n−1} = F_n + 2 F_{n−1} = F_n + F_{n−1} + F_{n−1} = F_{n+1} + F_{n−1}

Golden ratio base numbers

It is possible to express every positive integer as a sum of powers of the golden ratio φ using each power at most once. This means it is possible to create a binary-like number system using φ as the base with coefficients of 0 and 1 in front of each power of φ.

This system is sometimes called phinary because of the analogy with binary. I’ll use that term here rather than more formal names such as base-φ or golden base number system.

An interesting feature of phinary is that in general you need to include negative powers of φ to represent positive integers. For example,

2 = \varphi + \varphi^{-2}

and so you could write 2 in this system as 10.01.

To state things more formally, every positive integer n satisfies the following equation where a finite number of the coefficients a_k are equal to 1 and the rest are equal to 0.

n = \sum_{k=-\infty}^\infty a_k\varphi^k

The golden ratio satisfies φ² = φ + 1 and so phinary representations are not unique. But if you add the rule that number representations must not have consecutive 1s, then representations are unique, analogous to the Fibonacci number system.
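Here’s a rough Python sketch of a greedy expansion that finds such a representation for small integers: repeatedly subtract the largest power of φ that fits. The greedy choice automatically avoids consecutive 1s. The code uses floating point arithmetic, so the tolerance and the cutoff on negative powers are ad hoc choices, but it reproduces the example above.

    from math import floor, log

    phi = (1 + 5**0.5) / 2

    def phinary_powers(n, min_power=-20, tol=1e-9):
        # greedy expansion: exponents k with a_k = 1 in the base-phi representation of n
        powers = []
        x = float(n)
        k = floor(log(x, phi))
        while x > tol and k >= min_power:
            if phi**k <= x + tol:
                powers.append(k)
                x -= phi**k
            k -= 1
        return powers

    print(phinary_powers(2))   # [1, -2], i.e. 2 = phi + phi^(-2), written 10.01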

The original paper describing the phinary system [1] is awkwardly written. It has the flavor of “Here are some examples. You can see how this generalizes.” rather than a more typical mathematical style.

The end of the article says “Jr. High School 246 Brooklyn, N.Y.” and so when I got to that point I thought the style was due to the paper having been written by a public school teacher rather than a professional mathematician. I later learned from [2] that the author was not a math teacher but a student: George Bergman was 12 years old when he discovered and published his number system.

Phinary is not as simple to develop as you might expect. Bergman’s discovery was impressive, and not only because he was 12 years old at the time. You can find more sophisticated developments in [2] and in [3], but both require a few preliminaries and are not simple.

***

[1] George Bergman. A Number System with an Irrational Base. Mathematics Magazine 31 (2): 98–110, 1957.

[2] Cecil Rousseau. The Phi Number System Revisited. Mathematics Magazine 68 (4): 283–284, 1995.

[3] Donald Knuth. The Art of Computer Programming, volume 1.

Additive and multiplicative persistence

Casting out nines is a well-known way of finding the remainder when a number is divided by 9. You add all the digits of a number n. And if that number is bigger than 9, add all the digits of that number. You keep this up until you get a single-digit number.

This is an example of persistence. The persistence of a number, relative to some operation, is the number of times you have to apply that operation before you reach a fixed point, a point where applying the operation further makes no change.

The additive persistence of a number is the number of times you have to take the digit sum before getting a fixed point (i.e. a single-digit number). For example, the additive persistence of 8675309 is 3 because the digits in 8675309 sum to 38, the digits in 38 sum to 11, and the digits in 11 sum to 2.

The additive persistence of a number in base b is bounded above by its iterated logarithm in base b.

The smallest number with additive persistence n is sequence A006050 in OEIS.
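Here’s a short Python sketch for additive persistence, analogous to the multiplicative code below.

    def digit_sum(n):
        return sum(int(d) for d in str(n))

    def additive_persistence(n):
        # count how many digit sums it takes to reach a single digit
        steps = 0
        while n > 9:
            n = digit_sum(n)
            steps += 1
        return steps

    print(additive_persistence(8675309))   # 3, matching the example above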

Multiplicative persistence is analogous, except you multiply digits together. Curiously, it seems that multiplicative persistence is bounded. This is true for numbers with less than 30,000 digits, and it is conjectured to be true for all integers.

The smallest number with multiplicative persistence n is sequence A003001 in OEIS.

Here’s Python code to compute multiplicative persistence.

    def digit_prod(n):
        # product of the digits of n
        s = str(n)
        prod = 1
        for d in s:
            prod *= int(d)
        return prod

    def persistence(n):
        # number of digit products needed to reach a single digit;
        # prints the intermediate values along the way
        c = 0
        while n > 9:
            n = digit_prod(n)
            print(n)
            c += 1
        return c

You could use this to verify that 277777788888899 has persistence 11. It is conjectured that no number has persistence larger than 11.

The persistence of a large number is very likely 1. If you pick a number with 1000 digits at random, for example, it’s very likely that at least one of these digits will be 0, and then the product of the digits is 0. So it would seem that the probability of a large number having persistence larger than 11 would be incredibly small, but I would not have expected it to be zero.

Here are plots to visualize the fixed points of the numbers less than N for increasing values of N.


It’s not surprising that zeros become more common as N increases. And a moment’s thought will explain why even numbers are more common than odd numbers. But it’s a little surprising that 4 is much less common than 2, 6, or 8.

Incidentally, we have worked in base 10 so far. In base 2, the maximum persistence is 1: when you multiply a bunch of 0s and 1s together, you either get 0 or 1. It is conjectured that the maximum persistence in base 3 is 3.

False witnesses


Fermat’s primality test

Fermat’s little theorem says that if p is a prime and a is not a multiple of p, then

a^{p−1} = 1 (mod p).

The contrapositive of Fermat’s little theorem says if

a^{p−1} ≠ 1 (mod p)

then either p is not prime or a is a multiple of p. The contrapositive is used to test whether a number is prime. Pick a number a less than p (and so a is clearly not a multiple of p) and compute a^{p−1} mod p. If the result is not congruent to 1 mod p, then p is not a prime.

So Fermat’s little theorem gives us a way to prove that a number is composite: if a^{p−1} mod p equals anything other than 1, p is definitely composite. But if a^{p−1} mod p does equal 1, the test is inconclusive. Such a result suggests that p is prime, but it might not be.

Examples

For example, suppose we want to test whether 57 is prime. If we apply Fermat’s primality test with a = 2 we find that 57 is not prime, as the following bit of Python shows.

    >>> pow(2, 56, 57)
    4

Now let’s test whether 91 is prime using a = 64. Then Fermat’s test is inconclusive.

    >>> pow(64, 90, 91)
    1

In fact 91 is not prime because it factors into 7 × 13.

If we had used a = 2 as our base we would have had proof that 91 is composite. In practice, Fermat’s test is applied with multiple values of a (if necessary).
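Here’s a rough sketch of what that looks like in Python. The function name and the number of rounds are arbitrary, and this is the plain Fermat test, not the stronger Miller-Rabin test that libraries typically use.

    import random

    def fermat_test(p, rounds=20):
        # False means p is definitely composite; True means p passed every round
        for _ in range(rounds):
            a = random.randrange(2, p - 1)
            if pow(a, p - 1, p) != 1:
                return False
        return True

    print(fermat_test(91), fermat_test(97))   # almost certainly False, then True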

Witnesses

If a^{p−1} = 1 (mod p), then a is a witness to the primality of p. It’s not proof, but it’s evidence. It’s still possible p is not prime, in which case a is a false witness to the primality of p, or p is a pseudoprime to the base a. In the examples above, 64 is a false witness to the primality of 91.
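You can count the false witnesses of a small composite number by brute force. A quick sketch:

    def false_witnesses(n):
        # bases a for which the Fermat test wrongly suggests the composite number n is prime
        return [a for a in range(1, n) if pow(a, n - 1, n) == 1]

    fw = false_witnesses(91)
    print(len(fw), 64 in fw)   # 36 True: 91 has 36 false witnesses, 64 among them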

Until I stumbled on a paper [1] recently, I didn’t realize that on average composite numbers have a lot of false witnesses. For large N, the average number of false witnesses for composite numbers less than N is greater than N^{15/23}. Although N^{15/23} is big, it’s small relative to N when N is huge. And in applications, N is huge, e.g. N = 2^2048 for RSA encryption.

However, your chances of picking one of these false witnesses, if you were to choose a base a at random, is small, and so probabilistic primality testing based on Fermat’s little theorem is practical, and in fact common.

Note that the result quoted above is a lower bound on the average number of false witnesses. You really want an upper bound if you want to say that you’re unlikely to run into a false witness by accident. The authors in [1] do give upper bounds, but they’re much more complicated expressions than the lower bound.


[1] Paul Erdős and Carl Pomerance. On the Number of False Witnesses for a Composite Number. Mathematics of Computation. Volume 46, Number 173 (January 1986), pp. 259–279.

The bad version of a good test

Ever since 1952, the largest known primes have all had the form 2^n − 1, with one exception in 1989. The reason the largest known primes have this form is that it is easier to test whether these numbers are prime than other numbers.

A number of the form 2^n − 1 is called a Mersenne number, denoted M_n. A Mersenne prime is a Mersenne number that is also prime. There is a theorem that says that if M_n is prime then n must be prime.

Lehmer’s theorem

Derrick Henry Lehmer proved in 1930 that M_p is prime if p is an odd prime and M_p divides s_{p−2}, where the s terms are defined recursively as follows. Define s_0 = 4 and s_n = s_{n−1}² − 2 for n > 0. This is the Lucas-Lehmer primality test for Mersenne numbers, the test alluded to above.

Let’s try this out on M_7 = 2^7 − 1 = 127. We need to test whether 127 divides s_5. The first six values of s_n are

4, 14, 194, 37634, 1416317954, 2005956546822746114

and indeed 127 does divide 2005956546822746114. As you may have noticed, s_5 is a big number, and s_6 will be a lot bigger. It doesn’t look like Lehmer’s theorem is going to be very useful.

The missing piece

Here’s the idea that makes the Lucas-Lehmer [1] test practical. We don’t need to know s_{p−2} per se; we only need to know whether it is divisible by M_p. So we can carry out all our calculations mod M_p. In our example, we only need to calculate the values of s_n mod 127:

4, 14, 67, 42, 111, 0

Much better.

The Lucas-Lehmer test takes the remainder mod M_p at every step along the way when computing the recursion for the s terms. The naive version does not, but computes the s terms directly.

To make the two versions of the test explicit, here are Python implementations.

Naive code

    def lucas_lehmer_naive(p):
        # compute s_{p-2} exactly, reducing mod M_p only at the end
        M = 2**p - 1
        s = 4
        for _ in range(p - 2):
            s = s*s - 2
        return s % M == 0

Good code

    def lucas_lehmer(p):
        # reduce mod M_p at every step, keeping the numbers small
        M = 2**p - 1
        s = 4
        for _ in range(p - 2):
            s = (s*s - 2) % M
        return s == 0

The bad version

How bad is the naive version of the Lucas-Lehmer test?

The OEIS sequence A003010 consists of the s_n. OEIS gives the following formula:

s_n = \left\lceil (2 + \sqrt{3})^{2^n}\right\rceil

This shows that the s_n grow extremely rapidly, which suggests the naive algorithm will hit a wall fairly quickly.

When I used the two versions of the test above to verify that the first few Mersenne primes Mp really are prime, the naive version gets stuck after p = 19. The better version immediately works through p = 9689 before slowing down noticeably.

Historical example

Let’s look at M_521. This was the first Mersenne number verified by a computer to be prime. This was done on the vacuum tube-based SWAC computer in 1952 using the smart version of the Lucas-Lehmer test.

Carrying out the Lucas-Lehmer test to show that this number is prime requires doing modular arithmetic with 521-bit numbers, which was well within the SWAC’s 9472 bits of memory.

But it would not be possible now, or ever, to verify that M_521 is prime using the naive version of the Lucas-Lehmer test because we could not store, much less compute, s_519.

Since

\log_{2} s_{519} = 2^{519} \log_{2} (2 + \sqrt{3}) \approx 2^{520}

we’d need around 2^520 bits to store s_519. The number of bits we’d need is itself a 157-digit number. We’d need more bits than there are particles in the universe, which has been estimated to be on the order of 10^80.

The largest Mersenne prime that the SWAC could verify using the naive Lucas-Lehmer test would have been M_13 = 8191, which was found to be prime in the Middle Ages.


[1] Édouard Lucas conjectured in 1878 the theorem that Lehmer proved in 1930.

Distribution of run times for Euclidean algorithm

The worst case run time for the Euclidean algorithm occurs when you give the algorithm a pair of consecutive Fibonacci numbers. The algorithm takes n steps to compute the greatest common divisor of F_{n+1} and F_{n+2}.

The nth Fibonacci number is the nearest integer to φ^n/√5 where φ = (1 + √5)/2 is the golden ratio. You can take logs and solve for the largest n such that F_n is less than x, subtract 2, and have an upper bound on the number of steps necessary to find the gcd of two numbers less than x.
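For example, since F_n ≈ φ^n/√5, the largest n with F_n < x is roughly log_φ(√5 x). For 64-bit integers, x = 2^63, that’s about 92, so the Euclidean algorithm needs at most about 90 steps.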

But what about the average run time, or the distribution of run times?

For large x, the distribution of the number of steps needed to calculate the gcd of two numbers 0 < a ≤ b ≤ x using the Euclidean algorithm is approximately normal with mean

12 log(2) log(x) / π².

See, for example, [1].

Let’s illustrate this with a simulation. Here’s Python code to return the number of steps used in computing gcd(ab).

    def euclid(a, b):
        steps = 0
        while a != 0:
            a, b = b%a, a
            steps += 1
        return steps # gcd = b

I generated 10,000 pairs of random integers less than 2^63 (because the maximum signed 64 bit integer is 2^63 − 1) and plotted a histogram of the run times.
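Here’s roughly what that simulation looks like, reusing the euclid function above; the seed is arbitrary.

    import random
    import statistics

    random.seed(1)
    samples = [euclid(random.randrange(1, 2**63), random.randrange(1, 2**63))
               for _ in range(10_000)]
    print(statistics.mean(samples))   # should land near the theoretical mean below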

I got a nice bell curve with mean 36.3736. The theoretical mean is 0.8428 log(2^63) = 36.8021, where 0.8428 is 12 log(2)/π².

[1] Doug Hensley. The Number of Steps in the Euclidean Algorithm. Journal of Number Theory 49 (1994), 142–182.

Equipentatonic scale

I ran across a video that played around with the equipentatonic scale [1]. Instead of dividing the octave into 12 equal parts, as is most common in Western music, you divide the octave into 5 equal parts. Each note in the equipentatonic scale has a frequency 2^{1/5} times its predecessor.

The equipentatonic scale is used in several non-Western music systems. For example, the Javanese slendro tuning system is equipentatonic, as is Babanda music from Uganda.

In the key of C, the nearest approximants of the notes in the equipentatonic scale are C, D, F, G, A#.

Approximate equipentatonic scale

In the image above [2], the D is denoted as half sharp, 50 cents higher than D. (A cent is 1/100 of a half step.) The actual pitch is a D plus 40 cents, so the half sharp is closer, but still not exactly right.

The F should be 20 cents lower, the G should be 20 cents higher, and the A# should be 10 cents higher.

Notation

The conventional chromatic scale is denoted 12-TET, an abbreviation for 12 tone equal temperament. In general n-TET denotes the scale that results from dividing the octave into n parts. The previous discussion was looking at how 5-TET aligns with 12-TET.

A step in 5-TET corresponds to 2.4 steps in 12-TET. This is approximately 5 steps in 24-TET, the scale we’d get by adding quarter tones between all the notes in the chromatic scale.

Math

When we talk of dividing the octave evenly into n parts, we mean evenly on a logarithmic scale because we perceive music on a log scale.

The notation will be a little cleaner if we start counting from 0. Then the kth note of the n-TET scale is proportional to 2^{k/n}.

The proportionality constant is the starting pitch. So if we start on middle C, the frequencies are 262 × 2^{k/n} Hz. The nth frequency is twice the starting frequency, i.e. an octave higher.
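For example, here are the frequencies of one octave of 5-TET starting on middle C, taking middle C to be 262 Hz as above.

    freqs = [262 * 2**(k/5) for k in range(6)]
    print([round(f, 1) for f in freqs])
    # roughly [262.0, 301.0, 345.7, 397.1, 456.2, 524.0]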

We can measure how well an m-TET scale can be approximated by notes from an n-TET scale with the following function:

d(m, n) = \max_j \min_k \left|\frac{j}{m} - \frac{k}{n} \right|

Note that this function is asymmetric: d(m, n) does not necessarily equal d(n, m). For example, d(12, 24) = 0 because a quarter tone scale contains exact matches for every note in a semitone scale. But d(24, 12) = 1/24 because the quarter tone scale contains notes in between the notes of the semitone scale.

The equipentatonic scale lines up better with the standard chromatic scale than would a 7-note scale or an 11-note scale. That is, d(5, 12) is smaller than d(7, 12) or d(11, 12). Something similar holds if we replace 12 with 24: d(5, 24) is smaller than d(m, 24) for any m > 1 that is relatively prime to 24.
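Here’s a short Python sketch of d using exact rational arithmetic; it agrees with the claims above.

    from fractions import Fraction

    def d(m, n):
        # worst-case distance from a note of m-TET to the nearest note of n-TET
        return max(min(abs(Fraction(j, m) - Fraction(k, n)) for k in range(n + 1))
                   for j in range(m + 1))

    print(d(5, 12), d(7, 12), d(11, 12))   # 1/30, 1/28, 5/132
    print(d(12, 24), d(24, 12))            # 0, 1/24, the asymmetry mentioned above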


[1] The video first presents 10-TET and then defines 5-TET as taking every other note from this scale.

[2] The image was created with the following Lilypond code.

    \score {
      \new Staff {
        \relative c' {
          c1 dih f g aih c \bar "|."
        }
      }
      \layout {
        \context {
          \Staff
          \remove "Bar_engraver"
          \remove "Time_signature_engraver"
        }
      }
    }