Asteroids named after mathematicians

asteroid image

This evening I stumbled on the fact that John von Neumann and Fibonacci both have asteroids named after them. Then I wondered how many more famous mathematicians have asteroids named after them. As it turns out, most of them: Euclid, Gauss, Cauchy, Noether, Gödel, Ramanujan, … It’s easier to look at the names that are not assigned to asteroids.

I wrote a little script to search for the 100 greatest mathematicians (according to James Allen’s list) and look for names missing from a list of named asteroids. Here are the ones that were missing, in order of Allen’s list.

So of the top 100 list of mathematicians, 80 have asteroids named after them and the 20 above do not.
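The script itself isn't included in the post. Here's a rough sketch of the sort of thing it could look like; the file names and the matching rule are placeholders of mine, not the original data sources. Real asteroid names often embellish the mathematician's name (Gaussia, for example), so a real script needs looser matching than simple equality.

    mathematicians = [line.strip() for line in open("greatest_mathematicians.txt")]
    asteroids = [line.strip() for line in open("named_asteroids.txt")]

    # A name is "missing" if it doesn't appear inside any asteroid name.
    missing = [m for m in mathematicians
               if not any(m.lower() in a.lower() for a in asteroids)]

    print(len(missing))
    print("\n".join(missing))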

Alexandre Grothendieck (11th on James Allen’s list) is the greatest mathematician not to have an asteroid named in his honor.

Sine of a googol

How do you evaluate the sine of a large number in floating point arithmetic? What does the result even mean?

Sine of a trillion

Let’s start by finding the sine of a trillion (10^12) using floating point arithmetic. There are a couple ways to think about this. The floating point number t = 1.0e12 can only be represented to 15 or 16 significant decimal figures (to be precise, 53 bits [1]), and so you could think of it as a representative of the interval of numbers that all have the same floating point representation. Any result that is the sine of a number in that interval should be considered correct.

Another way to look at this is to say that t can be represented exactly—its binary representation requires 40 bits and we have 53 bits of significand precision available—and so we expect sin(t) to return the sine of exactly one trillion, correct to full precision.

It turns out that IEEE arithmetic does the latter, computing sin(1e12) correctly to 16 digits.

Here’s the result in Python

    >>> from math import sin, pi
    >>> sin(1.0e12)
    -0.6112387023768895

and verified by Mathematica by computing the value to 20 decimal places

    In:= N[Sin[10^12], 20]
    Out= -0.61123870237688949819

Range reduction

Note that the result above is not what you’d get if you were to first take the remainder when dividing by 2π and then take the sine.

    >>> sin(1.0e12 % (2*pi))
    -0.6112078499756778

This makes sense. The result of dividing a trillion by the floating point representation of 2π, 159154943091.89536, is correct to full floating point precision. But most of that precision is in the integer part. The fractional part has only about five digits of precision, and so we should expect the result above to be correct to at most five digits. In fact it’s accurate to four digits.

When your processor computes sin(1e12) it does not naively take the remainder by 2π and then compute the sine. If it did, you’d get four significant figures rather than 16.

We started out by saying that there were two ways of looking at the problem, and according to the first one, returning only four significant figures would be quite reasonable. If we think of the value t as a measured quantity, measured to the precision of floating point arithmetic (though hardly anything can be measured that accurately), then four significant figures would be all we should expect. But the people who designed the sine function implementation on your processor did more than they might be expected to do, finding the sine of a trillion to full precision. They do this using a range reduction algorithm that retains far more precision than naively doing a division. (I worked on this problem a long time ago.)
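Here's a sketch of doing the range reduction in extended precision yourself, using the third-party mpmath library (my choice; any arbitrary precision package would do). Reducing mod 2π with 50 digits of π and then calling the ordinary sine reproduces sin(1.0e12) essentially to full double precision.

    from math import sin
    import mpmath

    mpmath.mp.dps = 50                    # work with 50 decimal digits
    two_pi = 2 * mpmath.pi
    t = mpmath.mpf(10) ** 12
    theta = t - mpmath.floor(t / two_pi) * two_pi   # reduced angle in [0, 2*pi)

    print(sin(float(theta)))              # about -0.6112387023768895
    print(sin(1.0e12))                    # same value to within the last digit or so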

Sine of a googol?

What if we try to take the sine of a ridiculously large number like a googol (10^100)? This won’t work because a googol cannot be represented exactly as a floating point number (i.e. as an IEEE 754 double). It’s not too big; floating point numbers can be as large as around 10^308. The problem is that a number that big cannot be represented to full precision. But we can represent 2^333 exactly, and a googol is between 2^332 and 2^333. And amazingly, Python’s sine function (or rather the sine function built into the AMD processor on my computer) returns the correct result to full precision.

    >>> sin(2**333)
    0.9731320373846353

How do we know this is right? I verified it in Mathematica:

    In:= Sin[2.0^333]
    Out= 0.9731320373846353

How do we know Mathematica is right? We can do naive range reduction using extended precision, say 200 decimal places.

    In:= x = 2^333
    In:= p = N[Pi, 200]
    In:= theta = x - IntegerPart[x/(2 p)] 2 p
    Out= 1.8031286178334699559384196689…

and verify that the sine of the range reduced value is correct.

    In:= Sin[theta]
    Out= 0.97313203738463526…

Interval arithmetic interpretation

Because floating point numbers have 53 bits of precision, all real numbers between 2^56 - 2^2 and 2^56 + 2^2 are represented as the floating point number 2^56. This is a range of width 8, wider than 2π, and so the sines of the numbers in this range cover the possible values of sine, i.e. [-1, 1]. So if you think of a floating point number as a sample from the set of all real numbers with the same binary representation, every possible value of sine is a valid return value for numbers bigger than 2^56. But if you think of a floating point number as representing itself exactly, then it makes sense to ask for the sine of numbers like 2^333 and larger, up to the limits of floating point representation [2].
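A quick way to see the collapse near 2^56, and the contrast with 2^333, is to check a couple of comparisons directly. This is just an illustration of mine, not from the original post.

    x = 2.0**56
    print(x + 3 == x)          # True: 2**56 + 3 rounds back to 2**56
    print(2.0**333 == 2**333)  # True: 2**333 is exactly representable as a double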


[1] An IEEE 754 double has 52 explicit significand bits, but these represent 53 bits of precision because the leading bit of a normalized number is always 1, and so it does not need to be stored. See more here.

[2] The largest exponent of an IEEE double is 1023, and the largest significand is 2 - 2^-52 (i.e. all 1s), so the largest possible value of a double is

(2^53 - 1) 2^(1024-53)

and in fact the Python expression sin((2**53 - 1)*2**(1024-53)) returns the correct value to 16 significant figures.

Sine sum

Sam Walters posted something interesting on Twitter yesterday I hadn’t seen before:

He points out that

-\frac{1}{7} < \sum_{n=1}^N \sin(n) < 2

for all positive integers N.

Here’s a little Python script to illustrate the sum in his tweet.

    import numpy as np
    import matplotlib.pyplot as plt

    x = np.arange(1, 100)
    y = np.sin(x).cumsum()   # partial sums sin(1) + sin(2) + ... + sin(n)
    plt.plot(x, y)
    plt.plot((1, 100), (-1/7, -1/7), "g--")   # lower bound -1/7
    plt.plot((1, 100), (2, 2), "g--")         # upper bound 2
    plt.savefig("sinsum.svg")

Exponential sums

Exponential sums are interesting because crude application of the triangle inequality won’t get you anywhere. All it will let you conclude is that the sum is between –N and N.

(Why is this an exponential sum? Because it’s the imaginary part of the sum over e^(in).)

For more on exponential sums, you might like the book Uniform Distribution of Sequences.

Also, I have a page that displays the plot of a different exponential sum each day, the coefficients in the sum

\sum_{n=0}^N \exp\left( 2\pi i \left( \frac{n}{m} + \frac{n^2}{d} + \frac{n^3}{y} \right ) \right )

being taken from the day’s date. Because consecutive numbers have very different number theoretic properties, the images vary quite a bit from day to day.
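The page's own code isn't shown here, but here is a sketch of how such a plot could be made; the date and the number of terms below are arbitrary choices of mine.

    import numpy as np
    import matplotlib.pyplot as plt

    def expsum(m, d, y, N=1000):
        # partial sums of exp(2 pi i (n/m + n^2/d + n^3/y)) are the vertices
        n = np.arange(N + 1)
        z = np.exp(2j * np.pi * (n/m + n**2/d + n**3/y))
        return z.cumsum()

    z = expsum(3, 14, 15)    # e.g. March 14 of a year ending in 15
    plt.plot(z.real, z.imag)
    plt.gca().set_aspect("equal")
    plt.savefig("expsum.svg")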

Here’s a sneak peek at what the exponential sum for Christmas will be this year.

Geometry of an oblate spheroid

We all live on an oblate spheroid [1], so it could be handy to know a little about oblate spheroids.

Eccentricity

Conventional notation uses a for the equatorial radius and c for the polar radius. Oblate means a > c. The eccentricity e is defined by

e = \sqrt{1 - \frac{c^2}{a^2}}

For a perfect sphere, a = c and so e = 0. The eccentricity for earth is small, e = 0.08. The eccentricity increases as the polar radius becomes smaller relative to the equatorial radius. For Saturn, the polar radius is 10% less than the equatorial radius, and e = 0.43.

Volume

The volume of an oblate spheroid is simple:

V = \frac{4}{3}\pi a^2 c

Clearly we recover the volume of a sphere when a = c.

Surface area

The surface area is more interesting. The surface of a spheroid is a surface of revolution, and so its area can easily be derived. It works out to be

A = 2\pi a^2 + \pi \frac{c^2}{e} \log\left(\frac{1 +e}{1 - e} \right)

It’s not immediately clear that we get the area of a sphere as c approaches a, but it becomes clear when we expand the log term in a Taylor series.

A = 2\pi a^2 + 2\pi c^2\left( 1 + \frac{e^2}{3} + \frac{e^4}{5} + \frac{e^6}{7} + \cdots \right)

This suggests that an oblate spheroid has approximately the same area as a sphere with radius √((a² + c²)/2), with error on the order of e².
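Here's a quick numerical check of that approximation, using Earth-like values a = 1 and e = 0.08; this is my own sanity check, not code from the post.

    from math import pi, sqrt, log

    a = 1.0
    e = 0.08
    c = a * sqrt(1 - e**2)                 # polar radius from the eccentricity

    exact  = 2*pi*a**2 + pi*(c**2/e)*log((1 + e)/(1 - e))
    approx = 4*pi*(a**2 + c**2)/2          # sphere of radius sqrt((a^2 + c^2)/2)

    print(exact, approx, abs(exact - approx)/exact)   # relative error is O(e^2)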

If we set a = 1 and let c vary from 0 to 1, we can plot how the surface area varies.

plot c vs area

Here’s the corresponding plot where we use the eccentricity e as our independent variable rather than the polar radius c.

plot of area as a function of eccentricity


[1] Not in a yellow submarine as some have suggested.

Comparing bfloat16 range and precision to other 16-bit numbers

server rack

Deep learning has spurred interest in novel floating point formats. Algorithms often don’t need as much precision as standard IEEE-754 doubles or even single precision floats. Lower precision makes it possible to hold more numbers in memory, reducing the time spent swapping numbers in and out of memory. Also, low-precision circuits are far less complex. Together these benefits can give significant speedup.

Here I want to look at bfloat16, or BF16 for short, and compare it to 16-bit number formats I’ve written about previously, IEEE and posit. BF16 is becoming a de facto standard for deep learning. It is supported by several deep learning accelerators (such as Google’s TPU), and will be supported in Intel processors two generations from now.

Bit layout

The BF16 format is sort of a cross between FP16 and FP32, the 16- and 32-bit formats defined in the IEEE 754-2008 standard, also known as half precision and single precision.

BF16 has 16 bits like FP16, but has the same number of exponent bits as FP32. Each number has 1 sign bit. The rest of the bits in each of the formats are allocated as in the table below.

    |--------+------+----------+----------|
    | Format | Bits | Exponent | Fraction |
    |--------+------+----------+----------|
    | FP32   |   32 |        8 |       23 |
    | FP16   |   16 |        5 |       10 |
    | BF16   |   16 |        8 |        7 |
    |--------+------+----------+----------|

BF16 has as many bits as FP16, but as many exponent bits as FP32. The latter makes conversion between BF16 and FP32 simple, except for some edge cases regarding denormalized numbers.

Precision

The epsilon value, the smallest number ε such that 1 + ε > 1 in machine representation, is 2^-e where e is the number of fraction bits. BF16 has much less precision near 1 than the other formats.

    |--------+------------|
    | Format |    Epsilon |
    |--------+------------|
    | FP32   | 0.00000012 |
    | FP16   | 0.00390625 |
    | BF16   | 0.03125000 |
    |--------+------------|

Dynamic range

The dynamic range of bfloat16 is similar to that of an IEEE single precision number. Relative to FP32, BF16 sacrifices precision to retain range. Range is mostly determined by the number of exponent bits, though not entirely.

Dynamic range in decades is the log base 10 of the ratio of the largest to smallest representable positive numbers. The dynamic ranges of the numeric formats are given below. (Python code to calculate dynamic range is given here.)

    |--------+-------|
    | Format | DR    |
    |--------+-------|
    | FP32   | 83.38 |
    | BF16   | 78.57 |
    | FP16   | 12.04 |
    |--------+-------|
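Since the linked code isn't reproduced here, here is a sketch of how these numbers can be computed, assuming the usual IEEE-style layout and counting subnormals as part of the range.

    from math import log10

    def dynamic_range(exponent_bits, fraction_bits):
        emax = 2**(exponent_bits - 1) - 1          # largest exponent (also the bias)
        emin = 1 - emax                            # smallest normal exponent
        largest  = 2.0**emax * (2 - 2.0**(-fraction_bits))
        smallest = 2.0**(emin - fraction_bits)     # smallest positive subnormal
        return log10(largest / smallest)

    print(dynamic_range(8, 23))   # FP32: about 83.4
    print(dynamic_range(5, 10))   # FP16: about 12.0
    print(dynamic_range(8, 7))    # BF16: about 78.6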

Comparison to posits

The precision and dynamic range of posit numbers depends on how many bits you allocate to the maximum exponent, denoted es by convention. (Note “maximum.” The number of exponent bits varies for different numbers.) This post explains the anatomy of a posit number.

Posit numbers can achieve more precision and more dynamic range than IEEE-like floating point numbers with the same number of bits. Of course there’s no free lunch. Posits represent numbers near 1 with high precision and very large or very small numbers with low precision, but this trade-off is often what you’d want.

For an n-bit posit, the number of fraction bits near 1 is n - 2 - es and so epsilon is 2^(es - n - 2). The dynamic range is

\log_{10}\left( \left( 2^{2^{es}} \right)^{2n-4} \right) = (2n-4)\, 2^{es}\log_{10}2

which is derived here. The dynamic range and epsilon values for 16-bit posits with es ranging from 1 to 4 are given in the table below.

    |----+--------+-----------|
    | es |     DR |   epsilon |
    |----+--------+-----------|
    |  1 |  16.86 | 0.0000076 |
    |  2 |  33.72 | 0.0000153 |
    |  3 |  67.43 | 0.0000305 |
    |  4 | 134.86 | 0.0000610 |
    |----+--------+-----------|

For all the values of es above, a 16-bit posit number has a smaller epsilon than either FP16 or BF16. The dynamic range of a 16-bit posit is larger than that of a FP16 for all values of es, and greater than BF16 and FP32 when es = 4.
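As a check, the table values follow directly from the two formulas above; here is a small sketch that reproduces them.

    from math import log10

    n = 16
    for es in range(1, 5):
        dr = (2*n - 4) * 2**es * log10(2)   # dynamic range in decades
        epsilon = 2.0**(es - n - 2)         # epsilon as given above
        print(es, round(dr, 2), epsilon)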


Continued fraction cryptography

Every rational number can be expanded into a continued fraction with positive integer coefficients. And the process can be reversed: given a sequence of positive integers, you can make them the coefficients in a continued fraction and reduce it to a simple fraction.

In 1952, Arthur Porges published a one-page article [1] pointing out that continued fractions could be used to create a cipher. To encrypt your cleartext, convert it to a list of integers, use them as continued fraction coefficients, and report the resulting fraction. To decrypt, expand the fraction into a continued fraction and convert the coefficients back to text.

We can implement this in Mathematica as follows:

    
    encode[text_] := FromContinuedFraction[ ToCharacterCode[ text ]]
    decode[frac_] := FromCharacterCode[ ContinuedFraction[ frac ]]

So, for example, suppose we want to encrypt “adobe.” If we convert each letter to its ASCII code we get {97, 100, 111, 98, 101}. When we make these numbers coefficients in a continued fraction we get

97 + \cfrac{1}{100 + \cfrac{1}{111 + \cfrac{1}{98 + \cfrac{1}{101}}}}

which reduces to 10661292093 / 109898899.
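Here's a Python analogue of the same encoding, using the standard fractions module; the function names are mine, and this is a sketch rather than a drop-in replacement for the Mathematica code above.

    from fractions import Fraction

    def encode(text):
        # build the continued fraction whose coefficients are the character codes
        codes = [ord(c) for c in text]
        frac = Fraction(codes[-1])
        for c in reversed(codes[:-1]):
            frac = c + 1 / frac
        return frac

    def decode(frac):
        # expand the fraction back into continued fraction coefficients
        codes = []
        while True:
            q, r = divmod(frac.numerator, frac.denominator)
            codes.append(q)
            if r == 0:
                break
            frac = Fraction(frac.denominator, r)
        return "".join(chr(c) for c in codes)

    print(encode("adobe"))           # 10661292093/109898899
    print(decode(encode("adobe")))   # adobe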

This isn’t a secure encryption scheme by any means, but it’s a cute one. It’s more of an encoding scheme, a (non-secret) way to convert a string of numbers into a pair of numbers.


[1] Arthur Porges. A Continued Fraction Cipher. The American Mathematical Monthly, Vol. 59, No. 4 (Apr., 1952), p. 236

How to compute the area of a polygon

If you know the vertices of a polygon, how do you compute its area? This seems like it could be complicated, with special cases for whether the polygon is convex or maybe other considerations. But as long as the polygon is “simple,” i.e. the sides meet at vertices but otherwise do not intersect each other, then there is a general formula for the area.

The formula

List the vertices starting anywhere and moving counterclockwise around the polygon: (x_1, y_1), (x_2, y_2), …, (x_n, y_n). Then the area is given by the formula below.

A = \frac{1}{2} \begin{vmatrix} x_1 & x_2 & \ldots & x_n & x_1\\ y_1 & y_2 & \ldots & y_n & y_1 \end{vmatrix}

But what does that mean? The notation is meant to be suggestive of a determinant. It’s not literally a determinant because the matrix isn’t square. But you evaluate it in a way analogous to 2 by 2 determinants: add the terms going down and to the right, and subtract the terms going up and to the right. That is,

x_1 y_2 + x_2 y_3 + \cdots + x_n y_1 - y_1 x_2 - y_2 x_3 - \cdots - y_n x_1

This formula is sometimes called the shoelace formula because the pattern of multiplications resembles lacing a shoe. It’s also called the surveyor’s formula because it’s obviously useful in surveying.

Numerical implementation

As someone pointed out in the comments, in practice you might want to subtract the minimum x value from all the x coordinates and the minimum y value from all the y coordinates before using the formula. Why’s that?

If you add a constant amount to each vertex, you move your polygon but you don’t change the area. So in theory it makes no difference whether you translate the polygon before computing its area. But in floating point, it can make a difference.

The cardinal rule of floating point arithmetic is to avoid subtracting nearly equal numbers. If you subtract two numbers that agree to k significant figures, you can lose up to k figures of precision. We’ll illustrate this by taking a right triangle with base 2 and height π. The area should remain π as we translate the triangle away from the origin, but we lose precision the further out we translate it.

Here’s a Python implementation of the shoelace formula.

    def area(x, y):
        n = len(x)
        s = 0.0
        for i in range(-1, n-1):
            s += x[i]*y[i+1] - y[i]*x[i+1]
        return 0.5*s

If you’re not familiar with Python, a couple things require explanation. First, range(-1, n-1) iterates over the integers from -1 up to n-2. Second, the index -1 refers to the last element of the array.

Now, watch how the precision of the area degrades as we shift the triangle by powers of 10.

    import numpy as np

    x = np.array([0.0, 0.0, 2.0])
    y = np.array([np.pi, 0.0, 0.0])

    for n in range(0, 10):
        t = 10**n
        print( area(x+t, y+t) )

This produces

    3.141592653589793
    3.1415926535897825
    3.1415926535901235
    3.1415926535846666
    3.141592651605606
    3.1415929794311523
    3.1416015625
    3.140625
    3.0
    0.0

Shifting by small amounts doesn’t make a noticeable difference, but we lose between one and two significant figures each time we increase t by a factor of 10. We only have between 15 and 16 significant figures to start with in a standard floating point number, and so eventually we completely run out of significant figures.

When implementing the shoelace formula, we want to do the opposite of this example: instead of shifting the coordinates away from the origin, making them large compared to their differences, we want to shift them toward the origin so that we are not subtracting nearly equal numbers.
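In code, the translation trick amounts to something like the following, using the area function and the arrays x and y defined earlier; area_stable is my name for it.

    def area_stable(x, y):
        # shift the polygon toward the origin before applying the formula
        return area(x - x.min(), y - y.min())

    t = 10**8
    print(area(x + t, y + t))          # 3.0 here: most precision lost
    print(area_stable(x + t, y + t))   # 3.141592653589793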

Related post: The quadratic formula and low-precision arithmetic

Passwords and power laws

According to this paper [1], the empirical distribution of real passwords follows a power law [2]. In the authors’ terms, a Zipf-like distribution. The frequency of the rth most common password is proportional to something like 1/r. More precisely,

f_r = \frac{C}{r^s}

where s is on the order of 1. The value of s that best fit the data depended on the set of passwords, but their estimates of s varied from 0.46 to 0.91.

This means that the most common passwords are very common and easy to guess.

Size of password spaces

If passwords come from an alphabet of size A and have length n, then there are A^n possibilities. For example, if a password has length 10 and consists of uppercase and lowercase English letters and digits, there are

62^10 = 839,299,365,868,340,224

possible such passwords. If users chose passwords randomly from this set, brute force password attacks would be impractical. But brute force attacks are practical because passwords are not chosen uniformly from this large space of possibilities, far from it.

Attackers do not randomly try passwords. They start with the most common passwords and work their way down the list. In other words, attackers use Pareto’s rule.

Rules requiring, say, one upper case letter, don’t help much because most users will respond by using exactly one upper case letter, probably the first letter. If passwords must have one special character, most people will use exactly one special character, most likely at the end of the word. Expanding the alphabet size A exponentially increases the possible passwords, but does little to increase the actual number of passwords.

What’s interesting about the power law distribution is that there’s not a dichotomy between naive and sophisticated users. If there were, there would be a lot of common passwords, and all the rest uniformly distributed. Instead, there’s a continuum between the most naive and most sophisticated. That means a lot of people are not exactly naive, but not as secure as they think they are.

Some math

Under the Zipf model [3], the number of times we’d expect to see the most common password is NC where N is the size of the data set. The constant C is what it has to be for the frequencies to sum to 1. That is, C depends on the number of data points N and the exponent s and is given by

C_{N, s} = \frac{1}{\sum_{r=1}^N r^{-s}}

We can bound the sum in the denominator from above and below with integrals, as in the integral test for series convergence. This gives us a way to see more easily how C depends on its parameters.

1 + \int_1^N x^{-s} \, dx > \frac{1}{C} = \sum_{r=1}^N r^{-s} > 1 + \int_2^{N+1} x^{-s}\, dx

When s = 1 this reduces to

1 + \log(N) > \frac{1}{C} > 1 + \log(N+1) - \log(2)

and otherwise to

1 + \frac{N^{1-s} - 1}{1-s} > \frac{1}{C} > 1 + \frac{(N+1)^{1-s} - 2^{1-s}}{1-s}

Suppose you have N = 1,000,000 passwords. The range of s values found by Wang et al varied from roughly 0.5 to 0.9. Let’s first set s = 0.5. Then C is roughly 0.0005. This means the most common password appears about 500 times.

If we keep N the same and set s to 0.9, then C is roughly 0.033, and so the most common password would appear about 33,000 times.
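These estimates are easy to check by computing the normalizing constant directly, summing a million terms; this is my own check, not code from the paper.

    N = 1_000_000

    def C(N, s):
        return 1 / sum(r**-s for r in range(1, N + 1))

    for s in [0.5, 0.9]:
        # expected count of the most common password:
        # about 500 for s = 0.5, about 33,000 for s = 0.9
        print(s, N * C(N, s))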

How to choose passwords

If you need to come up with a password, randomly generated passwords are best, but hard to remember. So people either use weak but memorable passwords, or use strong passwords and don’t try to remember them. The latter varies in sophistication from password management software down to Post-it notes stuck on a monitor.

One compromise is to concatenate a few randomly chosen words. Something like “thebestoftimes” would be weak because they are consecutive words from a famous novel. Something like “orangemarbleplungersoap” would be better.

Another compromise, one that takes more effort than most people are willing to expend, is to use Manuel Blum’s mental hash function.


[1] In case the link rots in the future, the paper is “Zipf’s Law in Passwords” by Ding Wang, Haibo Cheng, Ping Wang, Xinyi Huang, and Gaopeng Jian. IEEE Transactions on Information Forensics and Security, vol. 12, no. 11, November 2017.

[2] Nothing follows a power law distribution exactly and everywhere. But that’s OK: nothing exactly follows any other distribution everywhere: not Gaussian, not exponential, etc. But a lot of things, like user passwords, approximately follow a power law, especially over the middle of their range. Power laws like Zipf’s law tend not to fit as well at the beginning and the end.

[3] Here I’m using a pure Zipf model for simplicity. The paper [1] used a Zipf-like model that I’m not using here. Related to the footnote [2] above, it’s often necessary to use a minor variation on the pure Zipf model to get a better fit.

Footnote on fifth root trick

Numberphile has a nice video on the fifth root trick: someone raises a two-digit number to the 5th power, reads the number aloud, and you tell them immediately what the number was.

Here’s the trick in a nutshell. For any number n, n^5 ends in the same last digit as n. You could prove that by brute force or by Euler’s theorem. So when someone tells you n^5, you immediately know the last digit. Now you need to find the first digit, and you can do that by learning, approximately, the powers (10k)^5 for k = 1, 2, 3, …, 9. Then you can determine the first digit by the range.

Here’s where the video is a little vague. It says that you don’t need to know the powers of 10k very accurately. This is true, but just how accurately do you need to know the ranges?

If the two-digit number is a multiple of 10, you’ll recognize the zeros at the end, and the last non-zero digit is the first digit of n. For example, if n5 = 777,600,000 then you know n is a multiple of 10, and since the last non-zero digit is 6, n = 60.

So you need to know the fifth powers of multiples of 10 well enough to distinguish (10k - 1)^5 from (10k + 1)^5. The following table shows what these numbers are.

|---+---------------+---------------|
| k | (10k - 1)^5   | (10k + 1)^5   |
|---+---------------+---------------|
| 1 |        59,049 |       161,051 |
| 2 |     2,476,099 |     4,084,101 |
| 3 |    20,511,149 |    28,629,151 |
| 4 |    90,224,199 |   115,856,201 |
| 5 |   282,475,249 |   345,025,251 |
| 6 |   714,924,299 |   844,596,301 |
| 7 | 1,564,031,349 | 1,804,229,351 |
| 8 | 3,077,056,399 | 3,486,784,401 |
| 9 | 5,584,059,449 | 6,240,321,451 |
|---+---------------+---------------|

So any number up to 2,476,099 has first digit 1, any number from 4,084,101 up to 20,511,149 has first digit 2, and so on.

You could choose the following boundaries, if you like.

|---+----------------|
| k | upper boundary |
|---+----------------|
| 1 |      1,000,000 |
| 2 |      3,000,000 |
| 3 |     25,000,000 |
| 4 |    100,000,000 |
| 5 |    300,000,000 |
| 6 |    800,000,000 |
| 7 |  1,700,000,000 |
| 8 |  3,200,000,000 |
| 9 |  6,000,000,000 |
|---+----------------|

The Numberphile video says you should have someone say the number aloud, in words. So as soon as you hear “six billion …”, you know the first digit of n is 9. If you hear “five billion” or “four billion” you know the first digit is 8. If you hear “three billion” then you know to pay attention to the next number, to decide whether the first digit is 7 or 8. Once you hear the first few syllables of the number, you can stop paying attention until you hear the last syllable or two.
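Putting the whole trick into code makes the bookkeeping explicit. This is a sketch of mine; the boundaries are the round numbers above, except that the first one only needs to sit below 11^5 = 161,051.

    boundaries = [100_000, 3_000_000, 25_000_000, 100_000_000, 300_000_000,
                  800_000_000, 1_700_000_000, 3_200_000_000, 6_000_000_000]

    def fifth_root(m):
        # m is n**5 for some two-digit n
        if m % 10 == 0:
            while m % 10 == 0:   # multiple of 10: last nonzero digit gives n/10
                m //= 10
            return 10 * (m % 10)
        last = m % 10                            # n ends in the same digit as n**5
        first = sum(m >= b for b in boundaries)  # count boundaries at or below m
        return 10*first + last

    print(fifth_root(69**5))   # 69
    print(fifth_root(60**5))   # 60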

A strange sort of product rule

Let u be a real-valued function of n variables, and let v be a vector-valued function of n variables, a function from n variables to a vector of size n. Then we have the following product rule:

D(uv) = v Du + u Dv.

It looks strange that the first term on the right isn’t Du v.

The function uv is a function from n dimensions to n dimensions, so its derivative must be an n by n matrix. So the two terms on the right must be n by n matrices, and they are. But Du v is a 1 by 1 matrix, so it would not make sense on the right side.

Here’s why the product rule above looks strange: the multiplication by u is a scalar product, not a matrix product. Sometimes you can think of real numbers as 1 by 1 matrices and everything works out just fine, but not here. The product uv doesn’t make sense if you think of the output of u as a 1 by 1 matrix. Neither does the product u Dv.

If you think of v as an n by 1 matrix and Du as a 1 by n matrix, everything works. If you think of v and Du as vectors, then v Du is the outer product of the two vectors. You could think of Du as the gradient of u, but be sure you think of it horizontally, i.e. as a 1 by n matrix. And finally, D(uv) and Dv are Jacobian matrices.
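A quick numerical check of the rule, using forward differences and example functions of my own, may make the shapes concrete.

    import numpy as np

    def u(x):
        return np.sin(x[0]) * x[1] + x[2]**2                      # scalar-valued

    def v(x):
        return np.array([x[0]*x[1], np.exp(x[2]), x[0] + x[1]])   # vector-valued

    def jacobian(f, x, h=1e-6):
        fx = np.atleast_1d(f(x))
        J = np.empty((len(fx), len(x)))
        for j in range(len(x)):
            xh = x.copy()
            xh[j] += h
            J[:, j] = (np.atleast_1d(f(xh)) - fx) / h
        return J

    x = np.array([0.3, -1.2, 0.8])
    lhs = jacobian(lambda t: u(t) * v(t), x)                       # D(uv), 3 by 3
    rhs = np.outer(v(x), jacobian(u, x)) + u(x) * jacobian(v, x)   # v Du + u Dv
    print(np.allclose(lhs, rhs, atol=1e-4))                        # True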

Update: As Harald points out in the comments, the usual product rule applies if you write the scalar-vector product uv as the matrix product vu, where now we are thinking of u as a 1 by 1 matrix! Now the product rule looks right

D(vu) = Dv u + v Du

but the product vu looks wrong because you always write scalars on the left. But here u isn’t a scalar!

Finite simple group example: PSL(q)

A few days ago I wrote about finite simple groups and said that I might write more about them. This post is a follow-on to that one.

In my earlier post I said that finite simple groups fell into five broad categories, and that three of these were easy to describe. This post will chip away at one of the categories I said was hard to describe briefly, namely groups of Lie type or classical groups. These are finite groups that are analogous to (continuous) Lie groups.

Simplest exponential sum

Today’s exponential sum curve is simply a triangle.

2018_09_06

But yesterday’s curve was more complex

2018_09_05

and tomorrow’s curve will be more complex as well.

2018_09_07

Why is today’s curve so simple? The vertices of the curves are the partial sums of the series

\sum_{n=0}^N \exp\left( 2\pi i \left( \frac{n}{m} + \frac{n^2}{d} + \frac{n^3}{y} \right ) \right )

where m is the month, d is the day, and y is the last two digits of the year. Typically the sums are too complicated to work out explicitly by hand, but today’s sum is fairly simple. We have m = 9, d = 6, and y = 18. The three fractions add to (2n + 3n² + n³)/18. Reduced mod 18, the numerators are

0, 6, 6, 6, 12, 12, 12, 0, 0, 0, 6, 6, 6, 12, 12, 12, 0, 0

The repetition in the terms of the sum leads to the straight lines in the plot. The terms in the exponential sum only take on three values, the three cube roots of 1. These three roots are 1, a, and b where

a = exp(2πi/3) = -1/2 + i√3/2

and

b = exp(-2πi/3) = -1/2 – i√3/2

is the complex conjugate of a.

Using Mathematica we have

    Table[ Exp[2 Pi I (2 n + 3 n^2 + n^3)/18], {n, 0, 17}] 
        /. {Exp[2 Pi I/3] -> a, Exp[-2 Pi I/3] -> b}

which produces

1, a, a, a, b, b, b, 1, 1, 1, a, a, a, b, b, b, 1, 1

When we take the partial sums, we get four points in a straight line because they differ by a:

1, 1 + a, 1 + 2a, 1 + 3a

then three more points in a straight line because the terms now differ by b:

(1 + 3a), (1 + 3a) + b, (1 + 3a) + 2b, (1 + 3a) + 3b

and so forth.

 

Three ways to sum a divergent series

There’s no direct way to define the sum of an infinite number of terms. Addition takes two arguments, and you can apply the definition repeatedly to define the sum of any finite number of terms. But an infinite sum depends on a theory of convergence. Without a definition of convergence, you have no way to define the value of an infinite sum. And with different definitions of convergence, you can get different values.

In this post I’ll review two ways of assigning a meaning to divergent series that I’ve written about before, then mention a third way.

Asymptotic series

A few months ago I wrote about an asymptotic series solution to the differential equation

y' + y = \frac{1}{x}

You end up with the solution

y = \frac{1}{x} + \frac{1}{x^2} + \frac{2}{x^3} + \cdots + \frac{(n-1)!}{x^n} + \cdots

which diverges for all x. That is, for each x, the partial sums of the series do not get closer to any number that you could call the sum. In fact, the individual terms of the series eventually get bigger and bigger. Surely this is a useless solution, right?

Actually, it is useful if you change your perspective. Instead of holding x fixed and letting n go to infinity, fix a value of n and let x go to infinity. In that sense, the series converges. For fixed n and large x, this gives accurate approximations to the solution of the differential equation.
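One way to see this concretely: the function y(x) = exp(-x) Ei(x) satisfies y' + y = 1/x, and its large-x expansion is the series above. Here's a small check of mine using SciPy's exponential integral; at x = 10 the error of the partial sums shrinks as terms are added, until eventually the divergence takes over.

    from math import exp, factorial
    from scipy.special import expi

    x = 10.0
    exact = exp(-x) * expi(x)   # a true solution of y' + y = 1/x
    partial = 0.0
    for n in range(1, 11):
        partial += factorial(n - 1) / x**n
        print(n, abs(partial - exact))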

Analytic continuation

At the end of a post on Bernoulli numbers I briefly explain the interpretation of the apparently nonsensical equation

1 + 2 + 3 + … = -1/12.

In a nutshell, the Riemann zeta function is defined by a two-step process. First define

\zeta(s) = \sum_{n=1}^\infty \frac{1}{n^s}

for s with real part strictly bigger than 1. Then define the zeta function for the rest of the complex plane (except the point s = 1) by analytic continuation. If the infinite sum for zeta were valid for s = -1, which it is not, then it would equal 1 + 2 + 3 + …

The analytic continuation of the zeta function is defined at -1, and there the function equals -1/12. So to make sense of the sum of the positive integers, interpret the sum as a sort of pun, a funny way to write ζ(-1).
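If you want to see the -1/12 come out of software, mpmath's zeta function implements the analytic continuation:

    import mpmath
    print(mpmath.zeta(-1))   # -0.0833333333333333, i.e. -1/12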

p-adic numbers

This is the most radical way to make sense of divergent series: change your number system so that they aren’t divergent!

The sum

1 + 2 + 4 + 8 + …

diverges because the partial sums (1, 3, 7, 15, …) are not getting closer to anything. But you can make the series converge by changing the way you measure distance between numbers. That’s what p-adic numbers do. For any fixed prime number p, define the distance between two numbers as the reciprocal of the largest power of p that divides their difference. That is, numbers are close together if they differ by a large power of p. We can make sense of the sum above in the 2-adic numbers, i.e. the p-adic numbers with p = 2.

The nth partial sum of the series above is 2^n - 1. The 2-adic distance between 2^n - 1 and -1 is 2^-n, which goes to zero, so the series converges to -1.

1 + 2 + 4 + 8 + … = -1.
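Here is a small check of the 2-adic claim in code (my own illustration): the 2-adic distance between the nth partial sum and -1 shrinks as n grows.

    def dist_2adic(a, b):
        diff = abs(a - b)
        if diff == 0:
            return 0.0
        k = 0
        while diff % 2 == 0:   # largest power of 2 dividing the difference
            diff //= 2
            k += 1
        return 2.0**-k

    for n in [1, 5, 10, 20]:
        partial_sum = 2**n - 1       # 1 + 2 + 4 + ... + 2**(n-1)
        print(n, dist_2adic(partial_sum, -1))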

Note that all the partial sums are the same, whether in the real numbers or the 2-adics, but the two number systems disagree on whether the partial sums converge.

If that explanation went by too quickly, here’s a 15-minute video that expands on the same derivation.

Names for extremely large numbers

Names for extremely large numbers are kinda pointless. The purpose of a name is to communicate something, and if the meaning of the name is not widely known, the name doesn’t serve that purpose. It’s even counterproductive if two people have incompatible ideas what the word means. It’s much safer to simply use scientific notation for extremely large numbers.

Obscure or confusing

Everyone knows what a million is. Any larger than that and you may run into trouble. The next large number name, billion, means 10^9 to some people and 10^12 to others.

When writing for those who take one billion to mean 10^9, your audience may or may not know that one trillion is 10^12 according to that convention. You cannot count on people knowing the names of larger numbers: quadrillion, quintillion, sextillion, etc.

To support this last claim, let’s look at the frequency of large number names according to Google’s Ngram viewer. Presumably the less often a word is used, the less likely people are to recognize it.

Here’s a bar chart for the data from 2005, plotted on a logarithmic scale. The chart has a roughly linear slope, which means the frequency of the words drops exponentially. Million is used more than 24,000 times as often as septillion.

Frequency of large number names on log scale

Here’s the raw data.

    |-------------+--------------|
    | Name        |    Frequency |
    |-------------+--------------| 
    | Million     | 0.0096086564 |
    | Billion     | 0.0030243298 |
    | Trillion    | 0.0002595139 |
    | Quadrillion | 0.0000074383 |
    | Quintillion | 0.0000016745 |
    | Sextillion  | 0.0000006676 |
    | Septillion  | 0.0000003970 |
    |-------------+--------------|

Latin prefixes for large numbers

There is a consistency to these names, though they are not well known. Using the convention that these names refer to powers of 1000, the pattern is

[Latin prefix for n] + llion = 1000^(n+1)

So for example, billion is 1000^(2+1) because bi- is the Latin prefix for 2. A trillion is 1000^(3+1) because tri- corresponds to 3, and so forth. A duodecillion is 1000^13 because the Latin prefix duodec- corresponds to 12.

    |-------------------+---------|
    | Name              | Meaning |
    |-------------------+---------|
    | Billion           |   10^09 |
    | Trillion          |   10^12 |
    | Quadrillion       |   10^15 |
    | Quintillion       |   10^18 |
    | Sextillion        |   10^21 |
    | Septillion        |   10^24 |
    | Octillion         |   10^27 |
    | Nonillion         |   10^30 |
    | Decillion         |   10^33 |
    | Undecillion       |   10^36 |
    | Duodecillion      |   10^39 |
    | Tredecillion      |   10^42 |
    | Quattuordecillion |   10^45 |
    | Quindecillion     |   10^48 |
    | Sexdecillion      |   10^51 |
    | Septendecillion   |   10^54 |
    | Octodecillion     |   10^57 |
    | Novemdecillion    |   10^60 |
    | Vigintillion      |   10^63 |
    |-------------------+---------|

Code

Here’s the Mathematica code that was used to create the plot above.

    BarChart[ 
        {0.0096086564, 0.0030243298, 0.0002595139, 0.0000074383, 
         0.0000016745, 0.0000006676, 0.0000003970}, 
         ChartLabels -> {"million", "billion", "trillion", "quadrillion", 
                         "quintillion", "sextillion", "septillion"}, 
         ScalingFunctions -> "Log", 
         ChartStyle -> "DarkRainbow"
    ]

Fraktur symbols in mathematics

When mathematicians run out of symbols, they turn to other alphabets. Most math symbols are Latin or Greek letters, but occasionally you’ll run into Russian or Hebrew letters.

Sometimes math uses a new font rather than a new alphabet, such as Fraktur. This is common when working with Lie groups, where you want to associate related symbols with a Lie group and its Lie algebra. By convention a Lie group is denoted by an ordinary Latin letter and its associated Lie algebra is denoted by the same letter in Fraktur font.

lower case alphabet in Fraktur

LaTeX

To produce Fraktur letters in LaTeX, load the amssymb package and use the command \mathfrak{}.
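For example, a minimal document along these lines produces a Fraktur letter for the Lie algebra of a group G:

    \documentclass{article}
    \usepackage{amssymb}
    \begin{document}
    Let $G$ be a Lie group and let $\mathfrak{g}$ be its Lie algebra.
    \end{document}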

Symbols such as \mathfrak{A} are math symbols and can only be used in math mode. They are not intended to be a substitute for setting text in Fraktur font. This is consistent with the semantic distinction in Unicode described below.

Unicode

The Unicode standard tries to distinguish the appearance of a symbol from its semantics, though there are compromises. For example, the Greek letter Ω has Unicode code point U+03A9 but the symbol Ω for electrical resistance in Ohms is U+2126 even though they are rendered the same [1].

The letters a through z, rendered in Fraktur font and used as mathematical symbols, have Unicode values U+1D51E through U+1D537. These values are in the “Supplementary Multilingual Plane” and do not commonly have font support [2].

The corresponding letters A through Z are encoded as U+1D504 through U+1D51C, though interestingly a few letters are missing. The code point U+1D506, which you’d expect to be Fraktur C, is reserved. The spots corresponding to H, I, and R are also reserved. These code points are reserved because the corresponding letters were already encoded in the Letterlike Symbols block: ℭ is U+212D, ℌ is U+210C, ℑ is U+2111, and ℜ is U+211C. However, the corresponding bold versions U+1D56C through U+1D585 have no such gaps [3].

Footnotes

[1] At least they usually are. A font designer could choose to provide different glyphs for the two symbols. I used the same character for both because I thought some readers might not see the Ohm symbol properly rendered.

[2] If you have the necessary fonts installed you should see the alphabet in Fraktur below:
𝔞 𝔟 𝔠 𝔡 𝔢 𝔣 𝔤 𝔥 𝔦 𝔧 𝔨 𝔩 𝔪 𝔫 𝔬 𝔭 𝔮 𝔯 𝔰 𝔱 𝔲 𝔳 𝔴 𝔵 𝔶 𝔷

I can see these symbols from my desktop and from my iPhone, but not from my Android tablet. Same with the symbols below.

[3] Here are the bold upper case and lower case Fraktur letters in Unicode:
𝕬 𝕭 𝕮 𝕯 𝕰 𝕱 𝕲 𝕳 𝕴 𝕵 𝕶 𝕷 𝕸 𝕹 𝕺 𝕻 𝕼 𝕽 𝕾 𝕿 𝖀 𝖁 𝖂 𝖃 𝖄 𝖅
𝖆 𝖇 𝖈 𝖉 𝖊 𝖋 𝖌 𝖍 𝖎 𝖏 𝖐 𝖑 𝖒 𝖓 𝖔 𝖕 𝖖 𝖗 𝖘 𝖙 𝖚 𝖛 𝖜 𝖝 𝖞 𝖟