# Golden ratio primes

The golden ratio is the larger root of the equation

φ² – φ – 1 = 0.

By analogy, golden ratio primes are prime numbers of the form

p = φ² – φ – 1

where φ is an integer. To put it another way, instead of solving the equation

φ² – φ – 1 = 0

over the real numbers, we’re looking for prime numbers p where the equation can be solved in the integers mod p.

## Application

When φ is a large power of 2, these prime numbers are useful in cryptography because their special form makes modular multiplication more efficient. (See the previous post on Ed448.) We could look for such primes with the following Python code.

    from sympy import isprime

for n in range(1000):
phi = 2**n
q = phi**2 - phi - 1
if isprime(q):
print(n)


This prints 19 results, including n = 224, corresponding to the golden ratio prime in the previous post. This is the only output where n is a multiple of 32, which was useful in the design of Ed448.

## Golden ratio primes in general

Of course you could look for golden ratio primes where φ is not a power of 2. It’s just that powers of 2 are the application where I first encountered them.

A prime number p is a golden ratio prime if there exists an integer φ such that

p = φ² – φ – 1

which, by the quadratic theorem, is equivalent to requiring that m = 4p + 5 is a square. In that case

φ = (1 + √m)/2.

Here’s some code for seeing which primes less than 1000 are golden ratio primes.

    from sympy import primerange

def issquare(m):
return int(m**0.5)**2 == m

for p in primerange(2, 1000):
m = 4*p + 5
if issquare(m):
phi = (int(m**0.5) + 1) // 2
assert(p == phi**2 - phi - 1)
print(p)


By the way, there are faster ways to determine whether an integer is a square. See this post for algorithms.

(Update: Aaron Meurer pointed out in the comments that SymPy has an efficient function sympy.ntheory.primetest.is_square for testing whether a number is a square.)

Instead of looping over primes and testing whether it’s possible to solve for φ, we could loop over φ and test whether φ leads to a prime number.

    for phi in range(1000):
p = phi**2 - phi - 1
if isprime(p):
print(phi, p)


## Examples

The smallest golden ratio prime is p = 5, with φ = 3.

Here’s a cute one: the pi prime 314159 is a golden ratio prime, with φ = 561.

The golden ratio prime that started this rabbit trail was the one with φ = 2224, which Mike Hamburg calls the Goldilocks prime in his design of Ed448.

# Goldilocks and the three multiplications Mike Hamburg designed an elliptic curve for use in cryptography he calls Ed448-Goldilocks. The prefix Ed refers to the fact that it’s an Edwards curve. The number 448 refers to the fact that the curve is over a prime field where the prime p has size 448 bits. But why Goldilocks?

## Golden primes and Goldilocks

The prime in this case is

p = 2448 – 2224 – 1,

which has the same form as the NIST primes. Hamburg says in his paper

I call this the “Goldilocks” prime because its form defines the golden ratio φ = 2224.

That sentence puzzled me. What does this have to do with the golden ratio? The connection is that Hamburg’s prime is of the form

φ² – φ – 1.

The roots of this polynomial are the golden ratio and its conjugate. But instead of looking for real numbers where the polynomial is zero, we’re looking for integers where the polynomial takes on a prime value. (See the followup post on golden ratio primes.)

The particular prime that Hamburg uses is the “Goldilocks” prime by analogy with the fairy tale: the middle term 2224 is just the right size. He explains

Because 224 = 32*7 = 28*8 = 56*4, this prime supports fast arithmetic in radix 228 or 232 (on 32-bit machines) or 256 (on 64-bit machines). With 16, 28-bit limbs it works well on vector units such as NEON. Furthermore, radix-264 implementations are possible with greater efficiency than most of the NIST primes.

## Karatsuba multiplication

The title of this post is “Goldilocks and the three multiplications.” Where do the three multiplications come in? It’s an allusion to an algorithm for multi-precision multiplication that lets you get by with three multiplications where the most direct approach would require four. The algorithm is called Karatsuba multiplication .

Hamburg says “The main advantage of a golden-ratio prime is fast Karatsuba multiplication” and that if we set φ = 2224 then Note that the first line on the right side involves four multiplications, but the bottom line involves three. Since the variables represent 224-bit numbers, removing a multiplication at the expense of an extra addition and subtraction is a net savings .

The most important line of the calculation above, and the only one that isn’t routine, is the second. That’s where the special form of p comes in. When you eliminate common terms from both sides, the calculation boils down to showing that which is obviously true since p = φ² – φ – 1.

## Curve Ed448-Goldilocks

Edwards curves only have one free parameter d (besides the choice of field) since they have the form

x² + y² = 1 + d x² y².

Hamburg chose d = -39081 for reasons explained in the paper.

Most elliptic curves used in ECC currently work over prime fields of order 256 bits, providing 128 bits of security. The motivation for developed Ed448 was much the same as for developing P-384. Both work over larger fields and so provide more bits of security, 224 and 192 bits respectively.

Unlike P-384, Ed448 is a safe curve, meaning that it lends itself to a secure practical implementation.

## Related posts

 Here we’ve just applied the Karatsuba algorithm one time. You could apply it recursively to multiply two n-bit numbers in O(nk) time, where k = log2 3 ≈ 1.86. This algorithm, discovered in 1960, was the first multiplication algorithm faster than O(n²).

 Addition and subtraction are O(n) operations. And what about multiplication? That’s not an easy question. It’s no worse than O(n1.68) by virtue of the Karatsuba algorithm. In fact, it’s O(n log n), but only for impractically large numbers. See the discussion here. But in any case, multiplication is slower than addition for multiprecision numbers.

# Tricks for arithmetic modulo NIST primes

The US National Institute of Standards and Technology (NIST) originally recommended 15 elliptic curves for use in elliptic curve cryptography . Ten of these are over a field of size 2n. The other five are over prime fields. The sizes of these fields are known as the NIST primes.

The NIST curves over prime fields are named after the number of bits in the prime: the name is “P-” followed by the number of bits. The primes themselves are named p with a subscript for the number of bits.

The five NIST primes are

p192 = 2192 – 264 – 1
p224 = 2224 – 296 + 1
p256 = 2256 – 2244 + 2192 + 296 – 1
p384 = 2384 – 2128 – 296 + 232 – 1
p521 = 2521 – 1

The largest of these, p521, is a Mersenne prime, and the rest are generalized Mersenne primes.

Except for p521, the exponents of 2 in the definitions of the NIST primes are all multiples of 32 or 64. This leads to efficient tricks for arithmetic modulo these primes carried out with 32-bit or 64-bit integers. You can find pseudocode implementations for these tricks in Mathematical routines for the NIST prime elliptic curves.

The elliptic curve Ed448 “Goldilocks” was not part of the original set of recommended curves from NIST but has been added. It employs a multiplication trick in the same spirit as the routines referenced above, but simpler. Ed448 uses

p = 2448 – 2224 – 1

which has the special form φ² – φ – 1 where φ = 2224. This enables a trick known as Karatsuba multiplication. More on that here.

## Related posts

 FIPS PUB 186-4. This publication is dated 2013, but the curve definitions are older. I haven’t found for certain when the curves were defined. I’ve seen one source that says 1997 and another that says 1999.

# Elliptic curve P-384

The various elliptic curves used in ellitpic curve cryptography (ECC) have different properties, and we’ve looked at several of them before. For example, Curve25519 is implemented very efficiently, and the parameters were transparently chosen. Curve1174 is interesting because it’s an Edwards curve and has a special addition formula.

This post looks at curve P-384. What’s special about this curve? It’s the elliptic curve that the NSA recommends everyone use until post-quantum methods have been standardized. It provides 192 bits of security, whereas more commonly used curves provide 128 bits.

Does the NSA recommend this method because they know how to get around it? Possibly, but they also need to recommend methods that they believe foreign governments cannot break.

The equation of the P-384 curve is

y² = x³ + ax + b

working over the field of integers modulo a prime p. We will go into each of the specific parameters ab, and p, and discuss how they were chosen.

## Modulus p

Consisting with the naming conventions for elliptic curves used in cryptography, the name “P-384” tells you that the curve is over a prime field where the prime is a 384-bit integer. Specifically, the order of the field is

p = 2384 – 2128 – 296 + 232 – 1

For a given number of bits, in this case 384, you want to pick a prime that’s relatively near the maximum size for that number of bits. In our case, our prime p is a prime near 2384 with a convenient bit pattern. (The special pattern allows implementation tricks that increase efficiency.)

Hasse’s theorem says that the number of points on a curve modulo a large prime is on the order of magnitude equal to the prime, so P-384 contains approximately 2384 points. In fact, the number of points n on the curve is

39402006196394479212279040100143613805079739270465446667946905279627659399113263569398956308152294913554433653942643

or approximately 2384 – 2190. The number n is a prime, and so it is the order of P-384 as a group.

## Linear coefficient a

According to a footnote in the standard defining P-384, FIPS PUB 186-4,

The selection a ≡ -3 for the coefficient of x was made for reasons of efficiency; see IEEE Std 1363-2000.

## Constant coefficient b

The curve P-384 has Weierstrass form

y² = x³ – 3x + b

where b is

27580193559959705877849011840389048093056905856361568521428707301988689241309860865136260764883745107765439761230575.

The parameter b is between 2383 and 2384 but doesn’t have any particular binary pattern:

101100110011000100101111101001111110001000111110111001111110010010011000100011100000010101101011111000111111100000101101000110010001100000011101100111000110111011111110100000010100000100010010000000110001010000001000100011110101000000010011100001110101101011000110010101100011100110001101100010100010111011010001100111010010101010000101110010001110110111010011111011000010101011101111

The specification says that b was chosen at random. How can you convince someone that you chose a parameter at random?

The standard gives a 160-bit seed s, and a hash-based algorithm that s was run through to create a 384-bit parameter c. Then b is the solution to

b² c = -27 mod p.

The algorithm going from the s to c is given in Appendix D.6 and is a sort of key-stretching algorithm. The standard cites ANS X9.62 and IEEE Standard 1363-2000 as the source of the algorithm.

If b was designed to have a back door, presumably a tremendous amount of computation had to go into reverse engineering the seed s.

Koblitz and Menezes wrote a paper in which they suggest a way that the NSA might have picked seeds that lead to weak elliptic curves, but then argue against it.

It is far-fetched to speculate that the NSA would have deliberately selected weak elliptic curves in 1997 for U.S. government usage … confident that no one else would be able to discover the weakness in these curves in the ensuing decades. Such a risky move by the NSA would have been inconsistent with the Agency’s mission.

# Bessel function crossings

The previous looked at the angles that graphs make when they cross. For example, sin(x) and cos(x) always cross with the same angle. The same holds for sin(kx) and cos(kx) since the k simply rescales the x-axis.

The post ended with wondering about functions analogous to sine and cosine, such as Bessel functions. This post will look at that question in more detail. Specifically we’ll look at the functions Jν and Yν.

Because these two Bessel functions satisfy the same second order linear homogeneous differential equation, the Strum separation theorem says that their zeros are interlaced: between each pair of consecutive zeros of Jν is exactly one zero of Yν, and between each pair of consecutive zeros of Yν there is exactly one zero of Jν. In the following Python code, we find zeros of Jν, then look in between for places where Jν and Yν cross. Next we find the angle the two curves make at each intersection and plot the angles.

    from scipy.special import jn_zeros, jv, yv
from scipy.optimize import bisect
from numpy import empty, linspace, arccos
import matplotlib.pyplot as plt

n = 3 # bessel function order
N = 100 # number of zeros

z = jn_zeros(n, N) # Zeros of J_n
crossings = empty(N-1)

f = lambda x: jv(n,x) - yv(n,x)
for i in range(N-1):
crossings[i] = bisect(f, z[i], z[i+1])

def angle(n, x):
# Derivatives of J_nu and Y_nu
dj = 0.5*(jv(n-1,x) - jv(n+1,x))
dy = 0.5*(yv(n-1,x) - yv(n+1,x))

top = 1 + dj*dy
bottom = ((1 + dj**2)*(1 + dy**2))**0.5
return arccos(top/bottom)

y = angle(n, crossings)
plt.plot(y)
plt.xlabel("Crossing number")
plt.ylabel("Angle in radians")
plt.show()


This shows that the angles steadily decrease, apparently quadratically. This quadratic behavior is what we should expect from the asymptotics of Jν and Yν: For large arguments they act like shifted and rescaled versions of sin(x)/√x. So if we looked at √xJν and √xYν rather than Jν and Yν we’d expect the angles to reach some positive asymptote, and they do, as shown below. # Orthogonal graphs

Colin Wright posted a tweet yesterday that said that the plots of cosine and tangent are orthogonal. Here’s a plot so you can see for yourself. Jim Simons replied with a proof so short it fits in a tweet: The product of the derivatives is -sin(x)sec²(x) = -tan(x)/cos(x), which is -1 if cos(x)=tan(x).

This made me wonder whether sine and cosine are orthogonal in the sense of graphs intersecting at right angles. They are orthogonal in the sense that their product integrates to zero over the interval [0, 2π] is zero, a fact that’s important fact in Fourier analysis, but are they orthogonal in the sense of their graphs intersecting at right angles? A graph makes this look plausible: But the graph is misleading. I made the plot without specifying the aspect ratio, using the default in Mathematica. This makes the angle between the graphs look smaller. Setting the aspect ratio to 1 shows the true picture. The two curves intersect at π/4 and 5π/4, and they intersect at an angle of 2π/3, not π/2.

The product of the slopes is not -1, but it is negative and constant, so you could multiply each function by some constant to make the product of slopes -1. Said another way, you could make the curves perpendicular by adjusting he aspect ratio.

Can you do this for other functions that are orthogonal in the inner product sense? Not easily. For example, sin(2x) and sin(3x) are orthogonal in the inner product (integral) sense, but the angles of intersection are different at different places where the curves cross.

What about other functions that are like sine and cosine? I looked at the Bessel functions J1 and J2, but the angles at the intersections vary. Ditto for Chebyshev polynomials. I suppose the difference is that sine and cosine are translations of each other, whereas that’s not the case for Bessel or Chebyshev functions. But it is true for wavelets, so you could find wavelets that are orthogonal in the sense of inner products and in the sense of perpendicular graphs.

Update: See more on how Bessel functions cross in the next post.

# Fascination burnout

Here a little dialog from Anathem by Neal Stephenson that I can relate to:

“… I don’t care …”

Asribalt was horrified. “But how can you not be fascinated by—”

“I am fascinated,” I insisted. “That’s the problem. I’m suffering from fascination burnout. Of all the things that are fascinating, I have to choose just one or two.”

# Area and volume of Menger sponge

The Menger sponge is the fractal you get by starting with a cube, dividing each face into a 3 by 3 grid (like a Rubik’s cube) and removing the middle square of each face and everything behind it. That’s M1, the Menger sponge at the 1st stage of its construction. The next stage repeats this process on all the little cubes that make up what’s left. That’s M2. Repeat the process over and over, and in the limit you get Menger’s sponge, a fractal with zero volume and infinite area!

This business of zero volume and infinite area may sound unreal, but the steps along the way to the limit are very tangible. Here’s a video by Aaron Benzel that let’s you fly through M3, and watch what happens if you split M3 apart.

You can compute the volume and area at each stage to show that and From these equations you can see that you can make the volume as small and you’d like, and the area as large as you like, by taking n big enough. And in case that sounds a little hand-wavey, we can get more concrete. Here’s a little code to find exactly how big a value of n is big enough.

    from math import log, ceil

def menger_volume(n):
return (20/27.)**n

def menger_area(n):
return 2*(20/9.)**n + 4*(8/9.)**n

def volume_below(v):
if v >=1:
return 1
else:
n = log(v)/log(20/27.)
return int(ceil(n))

def area_above(a):
if a < 2:
return 1
else:
n = (log(a) - log(2))/log(20/9.)
return int(ceil(n))

n = volume_below(0.001)
print( n, menger_volume(n) )

n =  area_above(1000)
print( n, menger_area(n) )


# Regular expression for ICD-9 and ICD-10 codes

Suppose you’re searching for medical diagnosis codes in the middle of free text. One way to go about this would be to search for each of the roughly 14,000 ICD-9 codes and each of the roughly 70,000 ICD-10 codes. A simpler approach would be to use regular expressions, though that may not be as precise.

In practice regular expressions may have some false positives or false negatives. The expressions given here have only false positives. That is, no valid ICD-9 or ICD-10 codes will go unmatched, but the regular expressions may match things that are not diagnosis codes. The latter is inevitable anyway since a string of characters could coincide with a diagnosis code but not be used as a diagnosis code. For example 1234 is a valid ICD-9 code, but 1234 in a document could refer to other things, such as a street address.

## ICD-9 diagnosis code format

Most ICD-9 diagnosis codes are just numbers, but they may also start with E or V.

Numeric ICD-9 codes are at least three digits. Optionally there may be a decimal followed by one of two more digits.

An E code begins with E and three digits. These may be followed by a decimal and one more digit.

A V code begins with a V followed by two digits. These may be followed by a decimal and one or two more digits.

Sometimes the decimals are left out.

Here are regular expressions that summarize the discussion above.

    N = "\d{3}\.?\d{0,2}"
E = "E\d{3}\.?\d?"
V = "V\d{2}\.?\d{0,2}"
icd9_regex = "|".join([N, E, V])


Usually E and V are capitalized, but they don’t have to be, so it would be best to do a case-insensitive match.

## ICD-10 diagnosis code format

ICD-10 diagnosis codes always begin with a letter (except U) followed by a digit. The third character is usually a digit, but could be an A or B . After the first three characters, there may be a decimal point, and up to three more alphanumeric characters. These alphanumeric characters are never U. Sometimes the decimal is left out.

So the following regular expression would match any ICD-10 diagnosis code.

    [A-TV-Z][0-9][0-9AB]\.?[0-9A-TV-Z]{0,4}

As with ICD-9 codes, the letters are usually capitalized, but not always, so it’s best to do a case-insensitive search.

## Testing the regular expressions

As mentioned at the beginning, the regular expressions here may have false positives. However, they don’t let any valid codes slip by. I downloaded lists of ICD-9 and ICD-10 codes from the CDC and tested to make sure the regular expressions here matched every code.

## Regular expression features used

Character ranges are supported everywhere, such as [A-TV-Z] for the letters A through T and V through Z.

Not every regular expression implementation supports \d to represent a digit. In Emacs, for example, you would have to use[0-9] instead since it doesn’t support \d.

I’ve used \.? for an optional decimal point. (The . is a special character in regular expressions, so it needs to be escaped to represent a literal period.) Some people wold write [.]? instead on the grounds that it may be more readable. (Periods are not special characters in the context of a character classes.)

I’ve used {m} for a pattern that is repeated exactly m times, and {m,n} for a pattern that is repeated between m and n times. This is supported in Perl and Python, for example, but not everywhere. You could write \d\d\d instead of \d{3} and \d?\d? instead of \d{0,2}.

## Related posts

 The only ICD-10 codes with a non-digit in the third position are those beginning with C4A, C7A, C7B, D3A, M1A, O9A, and Z3A.

# A misunderstanding of complexity

Iterating simple rules can lead to complex behavior. Many examples of this are photogenic, and so they’re good for popular articles. It’s fun to look at fractals and such. I’ve written several articles like that here, such as the post that included the image below. But there’s something in popular articles on complexity that bothers me, and it’s the following logical fallacy:

Complex systems can arise from iterating simple rules, therefore many complex systems do arise from iterating simple rules.

This may just be a popular misunderstanding of complexity theory, but I also get the impression that some people working in complexity theory implicitly fall for the same fallacy.

What fractals, cellular automata, and such systems show is that it is possible to start with simple rules and create a complex system. It says nothing about how often this happens. It does not follow that a particular complex system does in fact arise from iterating simple rules.

There’s a variation on the fallacy that says that while complex systems may not exactly come from iterating simple rules, it’s possible to approximate complex systems this way. Even that is dubious, as I discuss in The other butterfly effect.

At best I think these popular models serve as metaphors and cautionary tales. Strange attractors and such show that systems can exhibit unexpectedly complex behavior, and that forecasts can become useless when looking ahead too far. That’s certainly true more broadly.

But I’m skeptical of saying “You have a complex system. I have a complex system. Let’s see if my complex system models your complex system.” It’s often possible to draw loose analogies between complex systems, and these may be useful. But it’s not realistic to expect quantitative guidance to come out of this, such as using the Mandelbrot set to pick stocks.