Adding an imaginary unit to a finite field

Let p be a prime number. Then the integers mod p form a finite field.

The number of elements in a finite field must be a power of a prime, i.e. the order q = pn for some n. When n > 1, we can take the elements of our field to be polynomials of degree n − 1 with coefficients in the integers mod p.

Addition works just as you’d expect addition to work, adding coefficients mod p, but multiplication is a little more complicated. You multiply field elements by multiplying their polynomial representatives, but then you divide by an irreducible polynomial and take the remainder.

When n = 2, for some p you can define the field by adding an imaginary unit.

When you can and cannot adjoin an i

For some finite fields of order p, you can construct a field of order p² by joining an element i to the field, very much the way you form the complex numbers from the real numbers. For example, you can create a field with 49 elements by taking pairs of (a, b) of integers mod 7 and multiplying them as if they were abi. So

(ab) * (cd) = (ac − bdadbc).

This is equivalent to choosing the polynomial x² + 1 as your irreducible polynomial and following every polynomial multiplication by taking the remainder modulo x² + 1.

This works for a field with 49 elements, but not for a field of 25 elements. That’s because over the integers mod 5 the polynomial x² + 1 already has a root. Two of them in fact: x = 2 or x = 3. So you could say that mod 5, i = 2. Or i = 3 if you prefer. You can still form a field of 25 elements by taking pairs of elements from a field of 5 elements, but you have to choose a different polynomial as your irreducible polynomial because x² + 1 is not irreducible because

x² + 1 = (x − 2)(x + 2)

when working over the integers mod 5. You could use

x² + x + 1

as your irreducible polynomial. To prove that this polynomial is irreducible mod 5, plug in the numbers 0, 1, 2, 3, and 4 and confirm that none of them make the polynomial equal 0.

In general, you can create a field of order p² by adjoining an element i if and only if p = 3 mod 4.

Next we’ll look at an example of making a very large finite field even larger by adding an imaginary element.

Example from Ethereum

The Ethereum virtual machine has support for a pairing—more on that in a future post—of two elliptic curves, BN254 and alt_bn128. The BN254 curve is defined by

y² = x³ + 3

over the field Fp, the integers mod p, where

p = 21888242871839275222246405745257275088696311157297823662689037894645226208583.

The curve alt_bn128 is defined by

y² = x³ + 3/(9 + i)

over the field Fp[i], i.e. the field Fp, with an element i adjoined. Note the that last two digits of p are 83, and so p is congruent to 3 mod 4.

Special point on curve

The Ethereum documentation (EIP-197) singles out a particular point (xy) on alt_bn128:

xabi
ycdi

where

a = 10857046999023057135944570762232829481370756359578518086990519993285655852781
b = 11559732032986387107991004021392285783925812861821192530917403151452391805634
c = 8495653923123431417604973247489272438418190587263600148770280649306958101930
d = 4082367875863433681332203403145435568316851327593401208105741076214120093531.

We will show that this point is on the curve as an exercise in working in the field Fp[i]. We’ll write Python code from scratch, not using any libraries, so all the details will be explicit.

def add(pair0, pair1, p):
    a, b = pair0
    c, d = pair1
    return ((a + c) % p, (b + d) % p)

def mult(pair0, pair1, p):
    a, b = pair0
    c, d = pair1
    return ((a*c - b*d) % p, (b*c + a*d) % p)

p = 21888242871839275222246405745257275088696311157297823662689037894645226208583
a = 10857046999023057135944570762232829481370756359578518086990519993285655852781
b = 11559732032986387107991004021392285783925812861821192530917403151452391805634
c = 8495653923123431417604973247489272438418190587263600148770280649306958101930
d = 4082367875863433681332203403145435568316851327593401208105741076214120093531

# Find (e, f) such that (e, f)*(9, 1) = (1, 0).
# 9e - f = 1
# e + 9f = 0
# Multiply first equation by 9 and add.
e = (9*pow(82, -1, p)) % p
f = (-e*pow(9, -1, p)) % p
prod = mult((e, f), (9, 1), p)
assert(prod[0] == 1 and prod[1] == 0)

y2 = mult((c, d), (c, d), p)
x3 = mult((a, b), mult((a, b), (a, b), p), p)
rhs = add(x3, mult((3, 0), (e, f), p), p)

assert(y2[0] == rhs[0])
assert(y2[1] == rhs[1])

Related posts


			

Four generalizations of the Pythagorean theorem

Here are four theorems that generalize the Pythagorean theorem. Follow the links for more details regarding each equation.

1. Theorem by Apollonius for general triangles.

a^2 + b^2 = 2(m^2 + h^2)

2. Edsgar Dijkstra’s extension of the Pythagorean theorem for general triangles.

\text{sgn}(\alpha + \beta - \gamma) = \text{sgn}(a^2 + b^2 - c^2)

3. A generalization of the Pythagorean theorem to tetrahedra.

V_0^2 = \sum_{i=1}^n V_i^2

4. A unified Pythagorean theorem that covers spherical, plane, and hyperbolic geometry.

A(c) = A(a) + A(b) - \kappa \frac{A(a) \, A(b)}{2\pi}

Elementary symmetric polynomials and optimization

The mth elementary symmetric polynomial of degree n

e_m(x_1, x_2, \ldots, x_n)

is the sum of all terms containing a product of m variables. So, for example,

\begin{align*} e_1(w, x, y, z) &= w + x + y + z \\ e_2(w, x, y, z) &= wx + wy + wz + xy + xz + yz \\ e_3(w, x, y, z) &= xyz + wyz + wxz + wxy \\ e_4(w, x, y, z) &= wxyz \end{align*}
These polynomials came up in the previous post. The problem was choosing weights to minimize the variance of a weighted sum of random variables can be solved using elementary symmetric polynomials.

To state the optimization problem more generally, suppose you want to minimize

t_1^2 x_1 + t_2^2x_2 + \cdots + t_n^2 x_n

where the ti and xi are positive and the ti sum to 1. You can use Lagrange multipliers to show that the solution is

t_i = \frac{e_n(x_1, x_2, \cdots, x_n)}{x_i \,e_{n-1}(x_1, x_2, \cdots, x_n)}

Related posts

Weighting an average to minimize variance

Suppose you have $100 to invest in two independent assets, A and B, and you want to minimize volatility. Suppose A is more volatile than B. Then putting all your money on A would be the worst thing to do, but putting all your money on B would not be the best thing to do.

The optimal allocation would be some mix of A and B, with more (but not all) going to B. We will formalize this problem and determine the optimal allocation, then generalize the problem to more assets.

Two variables

Let X and Y be two independent random variables with finite variance and assume at least one of X and Y is not constant. We want to find t that minimizes

\text{Var}[tX + (1-t)Y]

subject to the constraint 0 ≤ t ≤ 1. Because X and Y are independent,

\text{Var}[tX + (1-t)Y] = t^2 \text{Var}[X] + (1-t)^2 \text{Var}[Y]

Taking the derivative with respect to t and setting it to zero shows that

t = \frac{\text{Var}[Y]}{\text{Var}[X] + \text{Var}[Y]}

So the smaller the variance on Y, the less we allocate to X. If Y is constant, we allocate nothing to X and go all in on Y.  If X and Y have equal variance, we allocate an equal amount to each. If X has twice the variance of Y, we allocate 1/3 to X and 2/3 to Y.

Multiple variables

Now suppose we have n independent random variables Xi for i running from 1 to n, and at least one of the variables is not constant. Then we want to minimize

\text{Var}\left[ \sum_{i=1}^n t_i X_i \right] = \sum_{i=1}^n t_i^2 \text{Var}[X_i]

subject to the constraint

\sum_{i=1}^n t_i = 1

and all ti non-negative. We can solve this optimization problem with Lagrange multipliers and find that

t_i \text{Var}[X_i] = t_j \text{Var}[X_j]

for all 1 ≤ i, jn. These (n − 1) equations along with the constraint that all the ti sum to 1 give us a system of equations whose solution is

t_i = \frac{\prod_{j \ne i} \text{Var}[X_j]}{\sum_{i = 1}^n \prod_{j \ne i} \text{Var}[X_j]}

Incidentally, the denominator has a name: the (n − 1)st elementary symmetric polynomial in n variables. More on this in the next post.

Related posts

Rolling correlation

Suppose you have data on the closing prices of two stocks over 1,000 days and you want to look at the correlation between the two asset prices over time in rolling 30 day windows.

It seems that the rolling correlation is periodic. peaking about every 50 days.

But this is an artifact of the rolling window, not a feature of the data. I created the two simulated stock time series by creating random walks. The price of the stock each day is the price the previous day plus a sample from a normal random variable with mean zero and variance 1.

import numpy as np
from scipy.stats import norm

n = 1000
x = np.cumsum(norm.rvs(size=n))
y = np.cumsum(norm.rvs(size=n))    

If you use a wider window, say 60 days, you’ll still see a periodic pattern in the rolling correlation, though with lower frequency.

Related posts

Analog of Heron’s formula on a sphere

The area of a triangle can be computed directly from the lengths of its sides via Heron’s formula.

A = \sqrt{s(s-a)(s-b)(s-c)}

Here s is the semiperimeter, s = (abc)/2.

Is there an analogous formula for spherical triangles? It’s not obvious there should be, but there is a formula by Simon Antoine Jean L’Huilier (1750–1840).

\tan^2 \frac{S}{4} = \tan \frac{s}{2} \tan \frac{s-a}{2} \tan \frac{s-b}{2} \tan \frac{s-c}{2}

Here we denote area by S for surface area, rather than A because in the context of spherical trigonometry A usually denotes the angle opposite side a. The same convention applies in plane trigonometry, but the potential for confusion is greater in L’Huilier’s formula since the area appears inside a tangent function.

Now tan θ ≈ θ for small θ, and so L’Huilier’s formula reduces to Heron’s formula for small triangles.

Imagine the Earth as a sphere of radius 1 and take a spherical triangle with one vertex at the north pole and two vertices on the equator 90° longitude apart. Then a = b = c = π/2 and s = 3π/4. Such a triangle takes of 1/8 of the Earth’s surface area of 4π, so the area S is π/2. You can verify that L’Huilier’s formula gives the correct area.

It’s not a proof, but it’s a good sanity check that L’Huilier’s formula is correct for small triangles and for at least one big triangle.

How much is a gigawatt?

There’s increasing talk of gigawatt data centers. Currently the largest data center, Switch’s Citadel Campus in Nevada, uses 850 megawatts of power. OpenAI’s Stargate data center, under construction, is supposed to use 1.2 gigawatts.

Gigawatt

An average French nuclear reactor produces about a gigawatt of power. If the US were allowed build nuclear reactors, we could simply build one reactor for every gigantic data center. Unfortunately, the Nuclear Regulatory Commission essentially prohibits the construction of profitable nuclear reactors.

An American home uses about 1200 watts of power, so a gigawatt of electricity could power 800,000 homes. So roughly, a gigawatt is a megahome.

Gigawatt-year

A gigawatt is a unit of power, not energy. Energy is power over some time period.

A gigawatt-year is about 3 × 1016 joules, or 30 petajoules.

A SpaceX Starship launch releases 50 terajoules of energy, so a gigawatt-year is 60 Starship launches.

A couple months ago I wrote about illustrating cryptographic strength in terms of the amount of energy needed to break it, and how much water that much energy would boil. Let’s do something similar for a gigawatt-year.

It takes about 300 kilojoules of energy to boil a liter of water [1], so 30 petajoules would boil 100 billion liters of water. So a gigawatt-year of energy would be enough to boil Coniston Water, the third largest lake in England.

If you could convert a kilogram of matter to energy according to Emc², this would release 90 petajoules. So a gigawatt-year is the energy in about 300 grams of matter.

 

[1] In detail, boiling a liter of water is defined as increases the temperature from 20° C to 100° C at sea level.

 

Japanese polygon theorem

Here’s an interesting theorem that leads to some aesthetically pleasing images. It’s known as the Japanese cyclic polygon theorem.

For all triangulations of a cyclic polygon, the sum of inradii of the triangles is constant. Conversely, if the sum of inradii is independent of the triangulation, then the polygon is cyclic.

The image above shows two triangulations of an irregular hexagon. By the end of the post we’ll see the code that created these images, and we’ll see one more triangulations. And we’ll see that the sum of the radii of the circles is the same in each image.

Glossary

In case any of the terms in the theorem above are unfamiliar, here’s a quick glossary.

A triangulation of a polygon is a way to divide the polygon into triangles, with the restriction that the triangle vertices come from the polygon vertices.

A polygon is cyclic if there exists a circle that passes through all of its vertices. Incidentally, if a polygon has n + 2 sides, the number of possible triangulations is the nth Catalan number. More on that here.

The inradius of a triangle is the radius of the largest circle that fits inside the triangle. More on inner and outer radii here.

Equations for inradius and incenter

We’d like to illustrate the theorem by drawing some images and by adding up the inradii. This means we need to be able to find the inradius and the incenter.

The inradius of a triangle equals its area divided by its semiperimeter. The area we can find using Heron’s formula. The semiperimeter is half the perimeter.

The incenter is the weighted sum of the vertices, weighted by the lengths of the opposite sides, and normalized. If the vertices are A, B, and C, then the incenter is

\frac{aA + bB + cC}{a + b + c}

where A is the vertex opposite side a, B is the vertex opposite side b, and C is the vertex opposite side c.

Python code

The following Python code will draw a triangle with its incircle. Calling this function for each triangle in the triangulation will create the illustrations we’re after.

from numpy import *
import matplotlib.pyplot as plt

def draw_triangle_with_incircle(A, B, C):
    a = linalg.norm(B - C)
    b = linalg.norm(A - C)
    c = linalg.norm(A - B)
    s = (a + b + c)/2
    r = sqrt(s*(s-a)*(s-b)*(s-c))/s
    center = (a*A + b*B + c*C)/(a + b + c)
    plt.plot([A[0], B[0]], [A[1], B[1]], color="C0")
    plt.plot([B[0], C[0]], [B[1], C[1]], color="C0")
    plt.plot([A[0], C[0]], [A[1], C[1]], color="C0")
    t = linspace(0, 2*pi)
    plt.plot(r*cos(t) + center[0], r*sin(t) + center[1], color="C2")
    return r

Next, we pick six points not quite evenly spaced around a circle.

angle = [0, 50, 110, 160, 220, 315] # degrees
p = [array([cos(deg2rad(angle[i])), sin(deg2rad(angle[i]))]) for i in range(6)]

Now we draw three different triangulations of the hexagon defined by the points above. We also print the sum of the inradii to verify that each sum is the same.

def draw_polygon(triangles):
    rsum = 0
    for t in triangles:
        rsum += draw_triangle_with_incircle(p[t[0]], p[t[1]], p[t[2]])        
    print(rsum)
    plt.axis("off")
    plt.gca().set_aspect("equal")
    plt.show()

draw_polygon([[0,1,2], [0,2,3], [0,3,4], [0,4,5]])
draw_polygon([[0,1,2], [2,3,4], [4,5,0], [0,2,4]])
draw_polygon([[0,1,2], [0,2,3], [0,3,5], [3,4,5]])

This produces the two images at the top of the post the one below. All the inradii sum are 1.1441361217691244 with a little variation in the last decimal place.

 

Related posts

Tetrahedral analog of the Pythagorean theorem

A tetrahedron has four triangular faces. Suppose three of those faces come together like the corner of a cube, each perpendicular to the other. Let A1, A2, and A3 be the areas of the three triangles that meet at this corner and let A0 be the area of remaining face, the one opposite the right corner.

De Gua’s theorem, published in 1783, says

A0² = A1² + A2² + A3².

We will illustrate De Gua’s theorem using Python.

from numpy import *

def area(p1, p2, p3):
    return 0.5*abs(linalg.norm(cross(p1 - p3, p2 - p3)))

p0 = array([ 0, 0,  0])
p1 = array([11, 0,  0])
p2 = array([ 0, 3,  0])
p3 = array([ 0, 0, 25])

A0 = area(p1, p2, p3)
A1 = area(p0, p2, p3)
A2 = area(p0, p1, p3)
A3 = area(p0, p1, p2)

print(A0**2 - A1**2 - A2**2 - A3**2)

This prints 0, as expected.

Higher dimensions

A natural question is whether De Gua’s theorem generalizes to higher dimensions, and indeed it does.

Suppose you have a 4-simplex where one corner fits in the corner of a hypercube. The 4-simplex has 5 vertices. If you leave out any vertex, the remaining 4 points determine a tetrahedron. Let V0 be the volume of the tetrahedron formed by leaving out the vertex at the corner of the hypercube, and let V1, V2, V3, and V4 be the volumes of the tetrahedra formed by dropping out each of the other vertices. Then

V0² = V1² + V2² + V3² + V4².

You can extend the theorem to even higher dimensions analogously.

Let’s illustrate the theorem for the 4-simplex in Python. The volume of a tetrahedron can be computed as

V = det(G)1/2/6

where G is the Gram matrix computed in the code below.

def volume(p1, p2, p3, p4):
    u = p1 - p4
    v = p2 - p4
    w = p3 - p4

    # construct the Gram matrix
    G = array( [
            [dot(u, u), dot(u, v), dot(u, w)],
            [dot(v, u), dot(v, v), dot(v, w)],
            [dot(w, u), dot(w, v), dot(w, w)] ])

    return sqrt(linalg.det(G))/6

p0 = array([ 0, 0,  0, 0])
p1 = array([11, 0,  0, 0])
p2 = array([ 0, 3,  0, 0])
p3 = array([ 0, 0, 25, 0])
p4 = array([ 0, 0,  0, 7])

V0 = volume(p1, p2, p3, p4)
V1 = volume(p0, p2, p3, p4)
V2 = volume(p0, p1, p3, p4)
V3 = volume(p0, p1, p2, p4)
V4 = volume(p0, p1, p2, p3)

print(V0**2 - V1**2 - V2**2 - V3**2 - V4**2)

This prints -9.458744898438454e-11. The result is 0, modulo the limitations of floating point arithmetic.

Numerical analysis

Floating point arithmetic is generally accurate to 15 decimal places. Why is the numerical error above relatively large? The loss of precision is the usual suspect: subtracting nearly equal numbers. We have

V0 = 130978.7777777778

and

V1**2 + V2**2 + V3**2 + V4**2 = 130978.7777777779

Both results are correct to 16 decimal places, but when we subtract them we lose all precision.