How to compute the area of a polygon

If you know the vertices of a polygon, how do you compute its area? This seems like this could be a complicated, with special cases for whether the polygon is convex or maybe other considerations. But as long as the polygon is “simple,” i.e. the sides meet at vertices but otherwise do not intersect each other, then there is a general formula for the area.

The formula

List the vertices starting anywhere and moving counterclockwise around the polygon: (x1y1), (x2y2), …, (xnyn). Then the area is given by the formula below.

A = \frac{1}{2} \begin{vmatrix} x_1 & x_2 & \ldots & x_n & x_1\\ y_1 & y_2 & \ldots & y_n & y_1 \end{vmatrix}

But what does that mean? The notation is meant to be suggestive of a determinant. It’s not literally a determinant because the matrix isn’t square. But you evaluate it in a way analogous to 2 by 2 determinants: add the terms going down and to the right, and subtract the terms going up and to the right. That is,

x1 y2 + x2 y3 + … xn y1y1 x2y2 x3 – … –  yn x1

This formula is sometimes called the shoelace formula because the pattern of multiplications resembles lacing a shoe. It’s also called the surveyor’s formula because it’s obviously useful in surveying.

Numerical implementation

As someone pointed out in the comments, in practice you might want to subtract the minimum x value from all the x coordinates and the minimum y value from all the y coordinates before using the formula. Why’s that?

If you add a constant amount to each vertex, you move your polygon but you don’t change the area. So in theory it makes no difference whether you translate the polygon before computing its area. But in floating point, it can make a difference.

The cardinal rule of floating point arithmetic is to avoid subtracting nearly equal numbers. If you subtract two numbers that agree to k significant figures, you can lose up to k figures of precision. We’ll illustrate this by taking a right triangle with base 2 and height π. The area should remain π as we translate the triangle away from the origin, we lose precision the further out we translate it.

Here’s a Python implementation of the shoelace formula.

    def area(x, y):
        n = len(x)
        s = 0.0
        for i in range(-1, n-1):
            s += x[i]*y[i+1] - y[i]*x[i+1]
        return 0.5*s

If you’re not familiar with Python, a couple things require explanation. First, range(n-1) is a list of integers starting at 0 but less than n-1. Second, the -1 index returns the last element of the array.

Now, watch how the precision of the area degrades as we shift the triangle by powers of 10.

    import numpy as np

    x = np.array([0.0, 0.0, 2.0])
    y = np.array([np.pi, 0.0, 0.0])

    for n in range(0, 10):
        t = 10**n
        print( area(x+t, y+t) )

This produces


Shifting by small amounts doesn’t make a noticeable difference, but we lose between one and two significant figures each time we increase t by a multiple of 10. We only have between 15 and 16 significant figures to start with in a standard floating point number, and so eventually we completely run out of significant figures.

When implementing the shoelace formula, we want to do the opposite of this example: instead of shifting coordinates so that they’re similar in size, we want to shift them toward the origin so that they’re not similar in size.

Related post: The quadratic formula and low-precision arithmetic

Passwords and power laws

According to this paper [1], the empirical distribution of real passwords follows a power law [2]. In the authors’ terms, a Zipf-like distribution. The frequency of the rth most common password is proportional to something like 1/r. More precisely,

fr = C rs

where s is on the order of 1. The value of s that best fit the data depended on the set of passwords, but their estimates of s varied from 0.46 to 0.91.

This means that the most common passwords are very common and easy to guess.

Size of password spaces

If passwords come from an alphabet of size A and have length n, then there are An possibilities. For example, if a password has length 10 and consists of uppercase and lowercase English letters and digits, there are

6210 = 839,299,365,868,340,224

possible such passwords. If users chose passwords randomly from this set, brute force password attacks would be impractical. But brute force attacks are practical because passwords are not chosen uniformly from this large space of possibilities, far from it.

Attackers do not randomly try passwords. They start with the most common passwords and work their way down the list. In other words, attackers use Pareto’s rule.

Rules requiring, say, one upper case letter, don’t help much because most users will respond by using exactly one upper case letter, probably the first letter. If passwords must have one special character, most people will use exactly one special character, most likely at the end of the word. Expanding the alphabet size A exponentially increases the possible passwords, but does little to increase the actual number of passwords.

What’s interesting about the power law distribution is that there’s not a dichotomy between naive and sophisticated users. If there were, there would be a lot of common passwords, and all the rest uniformly distributed. Instead, there’s a continuum between the most naive and most sophisticated. That means a lot of people are not exactly naive, but not as secure as they think they are.

Some math

Under the Zipf model [3], the number of times we’d expect to see the most common password is NC where N is the size of the data set. The constant C is what it has to be for the frequencies to sum to 1. That is, C depends on the number of data points N and the exponent s and is given by

C_{N, s} = \frac{1}{\sum_{r=1}^n r^{-s}}

We can bound the sum in the denominator from above and below with integrals, as in the integral test for series convergence. This gives us a way to see more easily how C depends on its parameters.

1 + \int_1^N x^{-s} \, dx > \frac{1}{C} = \sum_{r=1}^N r^{-s} > 1 + \int_2^{N+1} x^{-r}\, dx

When s = 1 this reduces to

1 + \log(N) > \frac{1}{C} > 1 + \log(N+1) - \log(2)

and otherwise to

1 + \frac{N^{1-s} - 1}{1-s} > \frac{1}{C} > 1 + \frac{(N+1)^{1-s} - 2^{1-s}}{1-s}

Suppose you have N = 1,000,000 passwords. The range of s values found by Wang et al varied from roughly 0.5 to 0.9. Let’s first set s = 0.5. Then C is roughly 0.0005. This mean the most common password appears about 500 times.

If we keep N the same and set s to 0.9, then C is roughly 0.033, and so the most common password would appear about 33,000 times.

How to choose passwords

If you need to come up with a password, randomly generated passwords are best, but hard to remember. So people either use weak but memorable passwords, or use strong passwords and don’t try to remember them. The latter varies in sophistication from password management software down to Post-it notes stuck on a monitor.

One compromise is to concatenate a few randomly chosen words. Something like “thebestoftimes” would be weak because they are consecutive words from a famous novel. Something like “orangemarbleplungersoap” would be better.

Another compromise, one that takes more effort than most people are willing to expend, is to use Manuel Blum’s mental hash function.

Related posts

[1] In case the link rots in the future, the paper is “Zipf’s Law in Passwords” by Ding Wang, Haibo Cheng, Ping Wang, Xinyi Huang, and Gaopeng Jian. IEEE Transactions on Information Forensics and Security, vol. 12, no. 11, November 2017.

[2] Nothing follows a power law distribution exactly and everywhere. But that’s OK: nothing exactly follows any other distribution everywhere: not Gaussian, not exponential, etc. But a lot of things, like user passwords, approximately follow a power law, especially over the middle of their range. Power law’s like Zipf’s law tend to not fit as well at the beginning and the end.

[3] Here I’m using a pure Zipf model for simplicity. The paper [1] used a Zipf-like model that I’m not using here. Related to the footnote [2] above, it’s often necessary to use a minor variation on the pure Zipf model to get a better fit.

Footnote on fifth root trick

Numberphile has a nice video on the fifth root trick: someone raises a two-digit number to the 5th power, reads the number aloud, and you tell them immediately what the number was.

Here’s the trick in a nutshell. For any number n, n5 ends in the same last digit as n. You could prove that by brute force or by Euler’s theorem. So when someone tells you n5, you immediately know the last digit. Now you need to find the first digit, and you can do that by learning, approximately, the powers (10k)5 for i = 1, 2, 3, …, 9. Then you can determine the first digit by the range.

Here’s where the video is a little vague. It says that you don’t need to know the powers of 10k very accurately. This is true, but just how accurately do you need to know the ranges?

If the two-digit number is a multiple of 10, you’ll recognize the zeros at the end, and the last non-zero digit is the first digit of n. For example, if n5 = 777,600,000 then you know n is a multiple of 10, and since the last non-zero digit is 6, n = 60.

So you need to know the fifth powers of multiples of 10 well enough to distinguish (10k – 1)5 from (10k + 1)5. The following table shows what these numbers are.

| k | (10k - 1)^5   | (10k + 1)^5   |
| 1 |        59,049 |       161,051 |
| 2 |     2,476,099 |     4,084,101 |
| 3 |    20,511,149 |    28,629,151 |
| 4 |    90,224,199 |   115,856,201 |
| 5 |   282,475,249 |   345,025,251 |
| 6 |   714,924,299 |   844,596,301 |
| 7 | 1,564,031,349 | 1,804,229,351 |
| 8 | 3,077,056,399 | 3,486,784,401 |
| 9 | 5,584,059,449 | 6,240,321,451 |

So any number less than a million has first digit 1. Any number between 1 million and 3 million has first digit 2. Etc.

You could choose the following boundaries, if you like.

| k | upper boundary |
| 1 |      1,000,000 |
| 2 |      3,000,000 |
| 3 |     25,000,000 |
| 4 |    100,000,000 |
| 5 |    300,000,000 |
| 6 |    800,000,000 |
| 7 |  1,700,000,000 |
| 8 |  3,200,000,000 |
| 9 |  6,000,000,000 |

The Numberphile video says you should have someone say the number aloud, in words. So as soon as you hear “six billion …”, you know the first digit of n is 9. If you hear “five billion” or “four billion” you know the first digit is 8. If you hear “three billion” then you know to pay attention to the next number, to decide whether the first digit is 7 or 8. Once you hear the first few syllables of the number, you can stop pay attention until you hear the last syllable or two.

A strange sort of product rule

Let u be a real-valued function of n variables, and let v be a vector-valued function of n variables, a function from n variables to a vector of size n. Then we have the following product rule:

D(uv) = v Duu Dv.

It looks strange that the first term on the right isn’t Du v.

The function uv is a function from n dimensions to n dimensions, so it’s derivative must be an n by n matrix. So the two terms on the right must be n by n matrices, and they are. But Du v is a 1 by 1 matrix, so it would not make sense on the right side.

Here’s why the product rule above looks strange: the multiplication by u is a scalar product, not a matrix product. Sometimes you can think of real numbers as 1 by 1 matrices and everything works out just fine, but not here. The product uv doesn’t make sense if you think of the output of u as a 1 by 1 matrix. Neither does the product u Dv.

If you think of v as an n by 1 matrix and Du as a 1 by n matrix, everything works. If you think of v and Du as vectors, then v Du is the outer product of the two vectors. You could think of Du as the gradient of u, but be sure you think of it horizontally, i.e. as a 1 by n matrix. And finally, D(uv) and Dv are Jacobian matrices.

Update: As Harald points out in the comments, the usual product rule applies if you write the scalar-vector product uv as the matrix product vu where now are are thinking of u as a 1 by 1 matrix! Now the product rule looks right

D(vu) = Dv uv Du

but the product vu looks wrong because you always write scalars on the left. But here u isn’t a scalar!

Finite simple group example: PSL(q)

A few days ago I wrote about finite simple groups and said that I might write more about them. This post is a follow-on to that one.

In my earlier post I said that finite simple groups fell into five broad categories, and that three of these were easy to describe. This post will chip away at one of the categories I said was hard to describe briefly, namely groups of Lie type or classical groups. These are finite groups that are analogous to (continuous) Lie groups. Continue reading

Simplest exponential sum

Today‘s exponential sum curve is simply a triangle.


But yesterday‘s curve was more complex


and tomorrow‘s curve will be more complex as well.


Why is today’s curve so simple? The vertices of the curves are the partial sums of the series

\sum_{n=0}^N \exp\left( 2\pi i \left( \frac{n}{m} + \frac{n^2}{d} + \frac{n^3}{y} \right ) \right )

where m is the month, d is the day, and y is the last two digits of the year. Typically the sums are too complicated to work out explicitly by hand, but today’s sum is fairly simple. We have m = 9, d = 6, and y = 18. The three fractions add to (2n + 3n² + n³)/18. Reduced mod 18, the numerators are

0, 6, 6, 6, 12, 12, 12, 0, 0, 0, 6, 6, 6, 12, 12, 12, 0, 0

The repetition in the terms of the sum leads to the straight lines in the plot. The terms in the exponential sum only take on three values, the three cube roots of 1. These three roots are 1, a, and b where

a = exp(2πi/3) = -1/2 + i√3/2


b = exp(-2πi/3) = -1/2 – i√3/2

is the complex conjugate of a.

Using Mathematica we have

    Table[ Exp[2 Pi I (2 n + 3 n^2 + n^3)/18], {n, 0, 17}] 
        /. {Exp[2 Pi I/3] -> a, Exp[-2 Pi I/3] -> b}

which produces

1, a, a, a, b, b, b, 1, 1, 1, a, a, a, b, b, b, 1, 1

When we take the partial sums, we get four points in a straight line because they differ by a:

1, 1 + a, 1 + 2a, 1 + 3a

 then three points in a straight line because they differ by b:

(1 + 3a), (1 + 3a) + b, (1 + 3a) + 2b, (1 + 3a) + 3b

and so forth.


Three ways to sum a divergent series

There’s no direct way to define the sum of an infinite number of terms. Addition takes two arguments, and you can apply the definition repeatedly to define the sum of any finite number of terms. But an infinite sum depends on a theory of convergence. Without a definition of convergence, you have no way to define the value of an infinite sum. And with different definitions of convergence, you can get different values.

In this post I’ll review two ways of assigning a meaning to divergent series that I’ve written about before, then mention a third way.

Asymptotic series

A few months ago I wrote about an asymptotic series solution to the differential equation

y' = y = \frac{1}{x}

You end up with the solution

y = \frac{1}{x} + \frac{1}{x^2} + \frac{2}{x^3} + \cdots + \frac{(n-1)!}{x^n} + \cdots

which diverges for all x. That is, for each x, the partial sums of the series do not get closer to any number that you could call the sum. In fact, the individual terms of the series eventually get bigger and bigger. Surely this is a useless solution, right?

Actually, it is useful if you change your perspective. Instead of holding x fixed and letting n go to infinity, fix a value of n and let x go to infinity. In that sense, the series converges. For fixed n and large x, this gives accurate approximations to the solution of the differential equation.

Analytic continuation

At the end of a post on Bernoulli numbers I briefly explain the interpretation of the apparently nonsensical equation

1 + 2 + 3 + … = -1/12.

In a nutshell, the Riemann zeta function is defined by a two-step process. First define

\zeta(s) = \sum_{n=1}^\infty \frac{1}{n^s}

for s with real part strictly bigger than 1. Then define the zeta function for the rest of the complex plane (except the point s = 1) by analytic continuation. If the infinite sum for zeta were valid for s = -1, which is it not, then it would equal 1 + 2 + 3 + …

The analytic continuation of the zeta function is defined at -1, and there the function equals -1/12. So to make sense of the sum of the positive integers, interpret the sum as a sort of pun, a funny way to write ζ(-1).

p-adic numbers

This is the most radical way to make sense of divergent series: change your number system so that they aren’t divergent!

The sum

1 + 2 + 4 + 8 + …

diverges because the partial sums (1, 3, 7, 15, …) are not getting closer to anything. But you can make the series converge by changing the way you measure distance between numbers. That’s what p-adic numbers do. For any fixed prime number p, define the distance between two numbers as the reciprocal of the largest power of p that divides their difference. That is, numbers are close together if they differ by a large power of p. We can make sense of the sum above in the 2-adic numbers, i.e. the p-adic numbers with p = 2.

The nth partial sum of the series above is 2n – 1. The 2-adic distance between 2n – 1 and -1 is 2n, which goes to zero, so the series converges to -1.

1 + 2 + 4 + 8 + … = -1.

Note that all the partial sums are the same, whether in the real numbers or the 2-adics, but the two number systems disagree on whether the partial sums converge.

If that explanation went by too quickly, here’s a 15-minute video expands on the same derivation.

Names for extremely large numbers

Names for extremely large numbers are kinda pointless. The purpose of a name is to communicate something, and if the meaning of the name is not widely known, the name doesn’t serve that purpose. It’s even counterproductive if two people have incompatible ideas what the word means. It’s much safer to simply use scientific notation for extremely large numbers.

Obscure or confusing

Everyone knows what a million is. Any larger than that and you may run into trouble. The next large number name, billion, means 109 to some people and 1012 to others.

When writing for those who take one billion to mean 109, your audience may or may not know that one trillion is 1012 according to that convention. You cannot count on people knowing the names of larger numbers: quadrillion, quintillion, sextillion, etc.

To support this last claim, let’s look at the frequency of large number names according to Google’s Ngram viewer. Presumably the less often a word is used, the less likely people are to recognize it.

Here’s a bar chart for the data from 2005, plotting on a logarithmic scale. The chart has a roughly linear slope, which means the frequency of the words drops exponentially. Million is used more than 24,000 times as often as septillion.

Frequency of large number names on log scale

Here’s the raw data.

    | Name        |    Frequency |
    | Million     | 0.0096086564 |
    | Billion     | 0.0030243298 |
    | Trillion    | 0.0002595139 |
    | Quadrillion | 0.0000074383 |
    | Quintillion | 0.0000016745 |
    | Sextillion  | 0.0000006676 |
    | Septillion  | 0.0000003970 |

Latin prefixes for large numbers

There is a consistency to these names, though they are not well known. Using the convention that these names refer to powers of 1000, the pattern is

[Latin prefix for n] + llion = 1000n+1

So for example, billion is 10002+1 because bi- is the Latin prefix for 2. A trillion is 10003+1 because tri- corresponds to 3, and so forth. A duodecillion is 100013 because the Latin prefix duodec- corresponds to 12.

    | Name              | Meaning |
    | Billion           |   10^09 |
    | Trillion          |   10^12 |
    | Quadrillion       |   10^15 |
    | Quintillion       |   10^18 |
    | Sextillion        |   10^21 |
    | Septillion        |   10^24 |
    | Octillion         |   10^27 |
    | Nonillion         |   10^30 |
    | Decillion         |   10^33 |
    | Undecillion       |   10^36 |
    | Duodecillion      |   10^39 |
    | Tredecillion      |   10^42 |
    | Quattuordecillion |   10^45 |
    | Quindecillion     |   10^48 |
    | Sexdecillion      |   10^51 |
    | Septendecillion   |   10^54 |
    | Octodecillion     |   10^57 |
    | Novemdecillion    |   10^60 |
    | Vigintillion      |   10^63 |


Here’s the Mathematica code that was used to create the plot above.

        {0.0096086564, 0.0030243298, 0.0002595139, 0.0000074383, 
         0.0000016745, 0.0000006676, 0.0000003970}, 
         ChartLabels -> {"million", "billion", "trillion", "quadrillion", 
                         "quintillion", "sextillion", "septillion"}, 
         ScalingFunctions -> "Log", 
         ChartStyle -> "DarkRainbow"

Fraktur symbols in mathematics

When mathematicians run out of symbols, they turn to other alphabets. Most math symbols are Latin or Greek letters, but occasionally you’ll run into Russian or Hebrew letters.

Sometimes math uses a new font rather than a new alphabet, such as Fraktur. This is common in Lie groups when you want to associate related symbols to a Lie group and its Lie algebra. By convention a Lie group is denoted by an ordinary Latin letter and its associated Lie algebra is denoted by the same letter in Fraktur font.

lower case alphabet in Fraktur


To produce Fraktur letters in LaTeX, load the amssymb package and use the command \mathfrak{}.

Symbols such as \mathfrak{A} are math symbols and can only be used in math mode. They are not intended to be a substitute for setting text in Fraktur font. This is consistent with the semantic distinction in Unicode described below.


The Unicode standard tries to distinguish the appearance of a symbol from its semantics, though there are compromises. For example, the Greek letter Ω has Unicode code point U+03A9 but the symbol Ω for electrical resistance in Ohms is U+2621 even though they are rendered the same [1].

The letters a through z, rendered in Fraktur font and used as mathematical symbols, have Unicode values U+1D51E through U+1D537. These values are in the “Supplementary Multilingual Plane” and do not commonly have font support [2].

The corresponding letters A through Z are encoded as U+1D504 through U+1D51C, though interestingly a few letters are missing. The code point U+1D506, which you’d expect to be Fraktur C, is reserved. The spots corresponding to H, I, and R are also reserved. Presumably these are reserved because they are not commonly used as mathematical symbols. However, the corresponding bold versions U+1D56C through U+ID585 have no such gaps [3].


[1] At least they usually are. A font designer could choose provide different glyphs for the two symbols. I used the same character for both because some I thought some readers might not see the Ohm symbol properly rendered.

[2] If you have the necessary fonts installed you should see the alphabet in Fraktur below:
𝔞 𝔟 𝔠 𝔡 𝔢 𝔣 𝔤 𝔥 𝔦 𝔧 𝔨 𝔩 𝔪 𝔫 𝔬 𝔭 𝔮 𝔯 𝔰 𝔱 𝔲 𝔳 𝔴 𝔵 𝔶 𝔷

I can see these symbols from my desktop and from my iPhone, but not from my Android tablet. Same with the symbols below.

[3] Here are the bold upper case and lower case Fraktur letters in Unicode:
𝖆 𝖇 𝖈 𝖉 𝖊 𝖋 𝖌 𝖍 𝖎 𝖏 𝖐 𝖑 𝖒 𝖓 𝖔 𝖕 𝖖 𝖗 𝖘 𝖙 𝖚 𝖛 𝖜 𝖝 𝖞 𝖟

King of high dimensional space

rook, king, knight

Which has more mobility on an empty chessboard: a rook, a king or a knight?

On an ordinary 8 by 8 chessboard, a rook can move to any one of 14 squares, no matter where it starts. A knight and a king both have 8 possible moves from the middle of the board, fewer moves from an edge.

If we make the board bigger or higher dimensional, what happens to the relative mobility of each piece? Assume we’re playing chess on a hypercube lattice, n points along each edge, in d dimensions. So standard chess corresponds to n = 8 and d = 2.

A rook can move to any square in a coordinate direction, so it can choose one of n-1 squares in each of d dimensions, for a total of (n-1)d possibilities.

A king can move a distance of 0, 1, or -1 in each coordinate. In d dimensions, this gives him 3d – 1 options. (Minus one for the position he’s initially on: moving 0 in every coordinate isn’t a move!)

What about a knight? There are C(d, 2) ways to choose two coordinate directions. For each pair of directions, there are 8 possibilities: two ways to choose which direction to move two steps in, and then whether to move + or – in each direction. So there are 4d(d – 1) possibilities.

In short:

  • A king’s mobility increases exponentially with dimension and is independent of the size of the board.
  • A rook’s mobility increases linearly with dimension and linearly with the size of the board.
  • A knight’s mobility increases quadratically with dimension and independent of the size of the board.

The rules above are not the only way to generalize chess rules to higher dimensions. Here’s an interesting alternative way to define a knight’s moves: a knight can move to any lattice point a distance √5 away. In dimensions up to 4, this corresponds to the definition above. But in dimension 5 and higher, there’s a new possibility: moving one step in each of five dimensions. In that case, the number of possible knight moves increases with dimension like d5.

Related post: A knight’s random walk in 3 or 4 dimensions


3D chess knight moves

I’ve written before about a knight’s random walk on an ordinary chess board. In this post I’d like to look at the generalization to three dimensions (or more).

three dimensional lattice

So what do we mean by 3D chess? For this post, we’ll have a three dimensional lattice of possible positions, of size 8 by 8 by 8. You could think of this as a set of 8 ordinary chess boards stacked vertically. To generalize a knight’s move to this new situation, we’ll say that a knight move consists of moving 2 steps in one direction and 1 step in an orthogonal direction. For example, he might move up two levels and over one position horizontally.

Suppose our knight walks randomly through the chess lattice. At each point, he evaluates all possible moves and chooses one randomly with all possible moves having equal probability. How long on average will it take our knight to return to where he started?

As described in the post about the two dimensional problem, we can find the average return time using a theorem about Markov chains.

The solution is to view the problem as a random walk on a graph. The vertices of the graph are the squares of a chess board and the edges connect legal knight moves. The general solution for the time to first return is simply 2N/k where N is the number of edges in the graph, and k is the number of edges meeting at the starting point.

The problem reduces to counting N and k. This is tedious in two dimensions, and gets harder in higher dimensions. Rather than go through a combinatorial argument, I’ll show how to compute the result with a little Python code.

To count the number of edges N, we’ll add up the number of edges at each node in our graph, and then divide by 2 since this process will count each edge twice. We will iterate over our lattice, generate all potential moves, and discard those that would move outside the lattice.

from numpy import all
from itertools import product, combinations

def legal(v):
    return all([ 0 <= t < 8 for t in v])

count = 0
for position in product(range(8), range(8), range(8)):
    # Choose two directions
    for d in combinations(range(3), 2):
        # Move 1 step in one direction
        # and 2 steps in the other.
        for step in [1, 2]:
            for sign0 in [-1, 1]:
                for sign1 in [-1, 1]:
                    move = list(position)
                    move[d[0]] += sign0*step
                    move[d[1]] += sign1*(3-step)
                    if legal(move):
                        count += 1
print(count // 2)

This tells us that there are N = 4,032 nodes in our graph of possible moves. The number of starting moves k depends on our starting point. For example, if we start at a corner, then we have 6 possibilities. In this case we should expect our knight to return to his starting point in an average of 2*4032/6 = 1,344 moves.

We can easily modify the code above. To look at different size lattices, we could change all the 8’s above. The function legal would need more work if the lattice was not the same size in each dimensions.

We could also look at four dimensional chess by adding one more range to the position loop and changing the combinations to come from range(4) rather than range(3). In case you’re curious, in four dimensions, the graph capturing legal knight moves in an 8 by 8 by 8 by 8 lattice would have 64,512 edges. If our knight started in a corner, he’d have 12 possible starting moves, so we’d expect him to return to his starting position on average after 5,275 moves.

Fibonacci meets Pythagoras

The sum of the squares of consecutive Fibonacci numbers is another Fibonacci number. Specifically we have

F_n^2 + F_{n+1}^2 = F_{2n+1}

and so we have the following right triangle.

The hypotenuse will always be irrational because the only Fibonacci numbers that are squares are 1 and 144, and 144 is the 12th Fibonacci number.

Bounds on the central binomial coefficient

It’s hard to find bounds on binomial coefficients that are both simple and accurate, but in 1990, E. T. H. Wang upper and lower bounds on the central coefficient that are both, valid for n at least 4.

\frac{2^{2n-1}}{\sqrt{n}} < \binom{2n}{n} < \frac{2^{2n-1/2}}{\sqrt{n}}

Here are a few numerical results to give an idea of the accuracy of the bounds on a log scale. The first column is the argument n, The second is the log of the CBC (central binomial coefficient) minus the log of the lower bound. The third column is the log of the upper bound minus the log of the CBC.

    | n |  lower |  upper |
    | 1 | 0.0000 | 0.3465 |
    | 2 | 0.0588 | 0.2876 |
    | 3 | 0.0793 | 0.2672 |
    | 4 | 0.0896 | 0.2569 |
    | 5 | 0.0958 | 0.2507 |
    | 6 | 0.0999 | 0.2466 |
    | 7 | 0.1029 | 0.2436 |
    | 8 | 0.1051 | 0.2414 |
    | 9 | 0.1069 | 0.2396 |

And here’s a plot of the same data taking n out further.

Wang's upper and lower bounds on CBC

So the ratio of the upper bound to the CBC and the ratio of the CBC to the lower bound both quickly approach an asymptote, and the lower bound is better.

Refactored code for division algebras

An earlier post included code for multiplying quaternions, octonions, and sedenions. The code was a little clunky, so I refactor it here.

    def conj(x):
        xstar = -x
        xstar[0] *= -1
        return xstar 

    def CayleyDickson(x, y):
        n = len(x)

        if n == 1:
            return x*y

        m = n // 2
        a, b = x[:m], x[m:]
        c, d = y[:m], y[m:]
        z = np.zeros(n)
        z[:m] = CayleyDickson(a, c) - CayleyDickson(conj(d), b)
        z[m:] = CayleyDickson(d, a) + CayleyDickson(b, conj(c))
        return z

The CayleyDickson function implements the Cayley-Dickson construction and can be used to multiply real, complex, quaternion, and octonion numbers. In fact, it can be used to implement multiplication in any real vector space of dimension 2n. The numeric types listed above correspond to n = 0, 1, 2, and 3. These are the only normed division algebras over the reals.

When n = 4 we get the sedenions, which are not a division algebra because they contain zero divisors, and the code can be used for any larger value of n as well. As noted before, the algebraic properties degrade as n increases, though I don’t think they get any worse after n = 4.

If you wanted to make the code more robust, you could add code to verify that the arguments x and y have the same length, and that their common length is a power of 2. (You could just check that the length is either 1 or even; if it’s not a power of 2 the recursion will eventually produce an odd argument.)

Related posts

How close is octonion multiplication to being associative?

Quaternion multiplication is associative but not commutative. An earlier post looked at the average size of the commutator xy – yx as a measure of how far quaternion multiplication is from being commutative.

This post looks at an analogous question for octonions. Octonion mulitplication is neither commutative nor associative. So in this post we look at the associator of three octonions (xy)z – x(yz) as a measure of how far octonion multiplication is from being associative.

(A post from yesterday looked at how close octonion multiplication comes to being associative in an algebraic sense, looking at weak forms of associativity. This post looks at how close multiplication comes to being associative in an analytical sense, looking at norm distances rather than algebraic identities.)

We will use a simulation to look at the average norm of the octonion associator over the octionions of unit length, analogous to the earlier post that looked at the commutator of the quaternions. We developed code for octonion multiplication in the previous post and will reuse that code here. We also developed code for generating random unit-length octonions in the same post.

Here’s code to find the average norm of the associator and plot a histogram of its values.

    import matplotlib.pyplot as plt

    # omult is octonion multiplication. 
    # See previous post.
    def associator(x, y, z):
        return omult(omult(x, y), z) - 
               omult(x, omult(y, z))

    N = 100000
    s = 0
    h = np.zeros(N)
    for n in range(N):
        x = random_unit_octonion()
        y = random_unit_octonion()
        z = random_unit_octonion()    
        t = norm(associator(x, y, z))
        s += t
        h[n] = t

    plt.hist(h, bins=50)

This gives an average of 1.095 and the histogram below.

histogram of octonion associator norm values

Update: Greg Egan calculated the exact mean value for the norm of the associator to be 147456/(42875 π) ≈ 1.0947336. Here are the details.