1 + 2 + 3 + … = −1/12

The other day MathMatize posted

roses are red
books go on a shelf
1+2+3+4+ …

with a photo of Ramanujan on X.

This was an allusion to the bizarre equation

1 + 2 + 3 + … = − 1/12.

This comes up often enough that I wanted to write a post that I could share a link to next time I see it.

The equation is nonsense if interpreted in the usual way. The sum on the left diverges. You could say the sum is ∞ if by that you mean you can make the sum as large as you like by taking the partial sum out far enough.

Here’s how the equation is mean to be interpreted. The Riemann zeta function is defined as

\zeta(s) = \sum_{n=1}^\infty \frac{1}{n^s}

for complex numbers s with positive real part, and defined for the rest of the complex plane by analytic continuation. The qualifiers matter. The infinite sum above does not define the zeta function for all numbers; it defines ζ(s) for numbers with real part greater than 1. The sum is valid for numbers like 7, or 42 −476i, or √2 + πi, but not for −1.

If the sum did define ζ(−1) then the sum would be 1 + 2 + 3 + …, but it doesn’t.

However, ζ(−1) is defined, and it equals −1/12.

What does it mean to define a function by analytic continuation? There is a theorem that essentially says there is only one way to extend an analytic function. It is possible to construct an analytic function that has the same values as ζ(s) when Re(s) > 1, where that function is defined for all s ≠ 1.

We could give that function a new name, say f(s). That is the function whose value at −1 equals − 1/12. But since there is only one possible analytic function f that overlaps with ζ(s) we go ahead and use the notation ζ(s) for this extended function.

To put it another way, the function ζ(s) is valid for all s ≠ 1, but the series representation for ζ(s) is not valid unless Re(s) > 1. There are other representations for ζ(s) for other regions of the complex plane, including for s = −1, and that’s what lets us compute ζ(−1) to find out that it equals −1/12.

So the rigorous but less sensational way to interpret the equation is to say

1s + 2s + 3s + …

is a whimsical way of referring to the function defined by the series, when the series converges, and defined by its analytic continuation otherwise. So in addition to saying

1 + 2 + 3 + … = − 1/12

we could also say

1² + 2² + 3² + … = 0

and

1³ + 2³ + 3³ + … = 1/120.

You can make up your own equation for any value of s for which you can calculate ζ(s).

Multiple angle asymmetry

The cosine of a multiple of θ can be written as a polynomial in cos θ. For example,

cos 3θ = 4 cos3 θ − 3 cos θ

and

cos 4θ = 8 cos4 θ − 8 cos2 θ + 1.

But it may or may not be possible to write the sine of a multiple of θ as a polynomial in sin θ. For example,

sin 3θ = −4 sin3 θ + 3 sin θ

but

sin 4θ =  − 8 sin3 θ cos θ + 4 sin θ cos θ

It turns out cos nθ can always be written as a polynomial in cos θ, but sin nθ can be written as a polynomial in sin θ if and only if n is odd. We will prove this, say more about sin nθ for even n, then be more specific about the polynomials alluded to.

Proof

We start by writing exp(inθ) two different ways:

cos nθ + i sin nθ = (cos θ + i sin θ)n

The real part of the left hand side is cos nθ and the real part of the right hand side contains powers of cos θ and even powers of sin θ. We can convert the latter to cosines by replacing sin2 θ with 1 − cos2 θ.

The imaginary part of the left hand side is sin nθ. If n is odd, the right hand side involves odd powers of sin θ and even powers of cos θ, in which case we can replace the even powers of cos θ with even powers of sin θ. But if n is even, every term in the imaginary part will involve odd powers of sin θ and odd powers of cos θ. Every odd power of cos θ can be turned into terms involving a single cos θ and an odd power of sin θ.

We’ve proven a little more than we set out to prove. When n is even, we cannot write sin nθ as a polynomial in sin θ, but we can write it as cos θ multiplied by an odd degree polynomial in sin θ. Alternatively, we could write sin nθ as sin θ multiplied by an odd degree polynomial in cos θ.

Naming polynomials

The polynomials alluded to above are not arbitrary polynomials. They are well-studied polynomials with many special properties. Yesterday’s post on Chebyshev polynomials defined Tn(x) as the nth degree polynomial for which

Tn(cos θ) = cos nθ.

That post didn’t prove that the right hand side is a polynomial, but this post did. The polynomials Tn(x) are known as Chebyshev polynomials of the first kind, or sometimes simply Chebyshev polynomials since they come up in application more often than the other kinds.

Yesterday’s post also defined Chebyshev polynomials of the second kind by

Un(cos θ) sin θ = sin (n+1)θ.

So when we say cos nθ can be written as a polynomial in cos θ, we can be more specific: that polynomial is Tn.

And when we say sin nθ can be written as sin θ times a polynomial in cos θ, we can also be more specific:

sin nθ = sin θ Un−1(cos θ).

Solving trigonometric equations

A couple years ago I wrote about systematically solving trigonometric equations. That post showed that any polynomial involving sines and cosines of multiples of θ could be reduced to a polynomial in sin θ and cos θ. The results in this post let us say more about this polynomial, that we can write it in terms of Chebyshev polynomials. This might allow us to apply some of the numerous identities these polynomials satisfy and find useful structure.

Related posts

Russian Morse Code

I read something once about an American telegraph operator who had to switch over to using Russian Morse code during WWII. I wondered how hard that would be, but let it go. The idea came back to me and I decided to settle it.

It would be hard to switch from being able to recognize English words to being able to recognize Russian words, but that’s not the the telegrapher had to do. He had to switch from receiving code groups made of English letters to ones made of Russian letters.

Switching from receiving encrypted English to encrypted Russian is an easier task than switching from English plaintext to Russian plaintext. Code groups were transmitted at a slower speed than words because you can never learn to recognize entire code groups. Also, every letter of a code group is important; you cannot fill in anything from context.

Russian Morse code consists largely of the same sequences of dots and dashes as English Morse code, with some additions. For example, the Russian letter Д is transmitted in Morse code as -.. just like the English letter D. So our telegraph operator could hear -.. and transcribe it as D, then later change D to Д.

The Russian alphabet has 33 letters so it needs Morse codes for 7 more letters than English. Actually, it uses 6 more symbols, transmitting Е and Ё with the same code. Some of the additional codes might have been familiar to our telegrapher. For example Я is transmitted as .-.- which is the same code an American telegrapher would use for ä (if he bothered to distinguish ä from a).

All the additional codes used in Russian correspond to uncommon Latin symbols (ö, ch, ñ, é, ü, and ä) and so our telegrapher could transcribe Russian Morse code without using any Latin letters.

The next question is how the Russian Morse code symbols correspond to the English. Sometimes the correspondence is natural. For example, Д is the same consonant sound as D. But the correspondence between Я and ä is arbitrary.

I wrote about entering Russian letters in Vim a few weeks ago, and I wondered how the mapping of Russian letters to English letters implicit in Morse code corresponds to the mapping used in Vim.

Most Russian letters can be entered in Vim by typing Ctrl-k followed by the corresponding English letter and an equal sign. The question is whether Morse code and Vim have the same idea of what corresponds to what. Many are the same. For example, both agree that Д corresponds to D. But there are some exceptions.

Here’s the complete comparison.

     Vim   Morse
А    A=    A
Б    B=    B
В    V=    W
Г    G=    G
Д    D=    D
Е    E=    E
Ё    IO    E
Ж    Z%    V
З    Z=    Z
И    I=    I
Й    J=    J
К    K=    K
Л    L=    L
М    M=    M
Н    N=    N
О    O=    O
П    P=    P
Р    R=    R
С    S=    S
Т    T=    T
У    U=    U
Ф    F=    F
Х    H=    H
Ц    C=    C
Ч    C%    ö
Ш    S%    ch
Щ    Sc    Q
Ъ    ="    ñ
Ы    Y=    Y
Ь    %"    X
Э    JE    é
Ю    JU    ü
Я    JA    ä

Related posts

Posthumous Chebyshev Polynomials

Two families of orthogonal polynomials are named after Chebyshev because he explored their properties. These are prosaically named Chebyshev polynomials of the first and second kind.

I recently learned there are Chebyshev polynomials of the third and fourth kind as well. You might call these posthumous Chebyshev polynomials. They were not developed by Mr. Chebyshev, but they bear a family resemblance to the polynomials he did develop.

The four kinds of Chebyshev polynomials may be defined in order as follows.

\begin{align*} T_n(\cos\theta) &= \cos n\theta \\ U_n(\cos\theta) &= \frac{\sin (n+1)\theta}{\sin \theta} \\ V_n(\cos\theta) &= \frac{\cos \left(n+\frac{1}{2}\right)\theta}{\cos \frac{1}{2}\theta} \\ W_n(\cos\theta) &= \frac{\sin \left(n+\frac{1}{2}\right)\theta}{\sin \frac{1}{2}\theta} \\ \end{align*}

It’s not obvious that these definitions even make sense, but in each case the right hand side can be expanded into a sum of powers of cos θ, i.e. a polynomial in cos θ. [1]

All four kinds of Chebyshev polynomials satisfy the same recurrence relation

P_n(x) = 2x\,P_{n-1}(x) - P_{n-2}(x)

for n ≥ 2 and P0 = 1 but with different values of P1, namely x, 2x, 2x − 1, and 2x + 1 respectively [2].

Plots

We can implement Chebyshev polynomials of the third kind using the recurrence relation above.

def V(n, x):
    if n == 0: return 1
    if n == 1: return 2*x - 1
    return 2*x*V(n-1, x) - V(n-2, x)

Here is a plot of Vn(x) for n = 0, 1, 2, 3, 4.

The code for implementing Chebyshev polynomials of the fourth kind is the same, except the middle line becomes

    if n == 1: return 2*x + 1

Here is the corresponding plot.

Square roots

The Chebyshev polynomials of the first and third kind, and polynomials of the second and fourth kind, are related as follows:

\begin{align*} V_n(x)&=\sqrt\frac{2}{1+x}T_{2n+1}\left(\sqrt\frac{x+1}{2}\right) \\ W_n(x)&=U_{2n}\left(\sqrt\frac{x+1}{2}\right) \end{align*}

To see that the expressions on the right hand side really are polynomials, note that Chebyshev polynomials of the first and second kinds are odd for odd orders and even for even orders [3]. This means that in the first equation, every term in T2n + 1 has a factor of √(1 + x) that is canceled out by the 1/√(1 + x) term up front. In the second equation, there are only even powers of the radical term so all the radicals go away.

You could take the pair of equations above as the definition of Chebyshev polynomials of the third and fourth kind, but the similarity between these polynomials and the original Chebyshev polynomials is more apparent in the definition above using sines and cosines.

The square roots hint at how these polynomials first came up in applications. According to [2], Chebyshev polynomials of the third and fourth kind

have been called “airfoil polynomials”, since they are appropriate for approximating the single square root singularities that occur at the sharp end of an airfoil.

Dirichlet kernel

There’s an interesting connection between Chebyshev polynomials of the fourth kind and Fourier series.

The right hand side of the definition of Wn is known in Fourier analysis as Dn, the Dirichlet kernel of order n.

D_n(\theta) = \frac{\sin \left(n+\frac{1}{2}\right)\theta}{\sin \frac{1}{2}\theta}

The nth order Fourier series approximation of f, i.e. the sum of terms −n through n in the Fourier series for f is the convolution of f with Dn, times 2π.

(D_n * f)(\theta) = 2\pi \sum_{k=-n}^n \hat{f}(k) \exp(ik\theta)

Note that Dn(θ) is a function of θ, not of x. The equation Wn(cos θ) = Dn(θ) defines Wn(x) where x = cos θ. To put it another way, Dn(θ) is not a polynomial, but it can be expanded into a polynomial in cos θ.

Related posts

[1] Each function on the right hand side is an even function, which implies it’s at least plausible that each can be written as powers of cos θ. In fact you can apply multiple angle trig identities to work out the polynomials in cos θ.

[2] J.C. Mason and G.H. Elliott. Near-minimax complex approximation by four kinds of Chebyshev polynomial expansion. Journal of Computational and Applied Mathematics 46 (1993) 291–300

[3] This is not true of Chebyshev polynomials of the third and fourth kind. To see this note that V1(x) = 2x − 1, and W1(x) = 2x + 1, neither of which is an odd function.

Sparse binary Pythagorean triples

I recently ran across an interesting family of Pythagorean triples [1].

\begin{align*} a &= 2^{4n} + 2^{2n+1} \\ b &= 2^{4n} - 2^{4n-2} - 2^{2n} - 1 \\ c &= 2^{4n} + 2^{4n-2} + 2^{2n} + 1 \\ \end{align*}

You can verify that a² + b² = c² for all n.

Sparseness

When written in binary, a has only two bits set, and c has only four bits set.

It’s not as immediately obvious, but b has only two bits that are not set.

For example, here’s what we get writing the Pythagorean triple (a, b, c) in binary when n = 5.

(100000000100000000000,  10111111101111111111, 101000000010000000001)

In linear algebra, we say a matrix is sparse if most of its entries are zeros. The word “sparse” is used similarly in other areas, generally meaning something contains a lot of zeros. So in that sense a and c are sparse.

I suppose you could call b dense since its binary representation is a string of almost all 1s. Or you could say that it is sparse in that has little variation in symbols.

Aside from sparseness, these triples touch on a couple other ideas from the overlap of math and computer science.

One’s complement

The number b is the one’s complement of c, i.e. if you flip every bit in c then you obtain b (with a leading zero).

More precisely, one’s complement is given relative to a number of bits N. The N-bit one’s complement of x equals 2Nx.

In our case b is the (4n + 1)-bit one’s complement of c. Also c is the (4n + 1)-bit one’s complement of b because one’s complement is its own inverse, i.e. it is an involution.

Run-length encoding

The binary representations of ab, and c are highly compressible strings when n is large. Run-length encoding (RLE) represents each string compactly.

RLE simply describes a string by stating symbols and how many times each is repeated. So to compute the run-length encoding of

100000000100000000000,10111111101111111111,101000000010000000001

from the example above, you’d observe one 1, eight 0s, one 1, eleven zeros, etc.

There’s ambiguity writing the RLE of a sequence of digits unless you somehow put the symbols and counts in a different namespace. For example, if we write 1180110 we intend this to be read as above, but someone could read this as 180 1s followed by 1o 1s.

Let’s replace 0s with z (for zero) and 1s with u (for unit) so our string will not contain any digits.

uzzzzzzzzuzzzzzzzzzzz,uzuuuuuuuzuuuuuuuuuu,uzuzzzzzzzuzzzzzzzzzu

Then the RLE of the string is

uz8uz11,uzu7zu10,uzuz7uz9u

Here a missing count is implicitly 1. So uz8… is read as u, followed by z repeated 8 times, etc.

As n increases, the length of the binary string grows much faster than the length of the corresponding RLE.

Exercise for the reader: What is the RLE of the triple for general n?

Related posts

[1] H. S. Uhler. A Colossal Primitive Pythagorean Triangle. The American Mathematical Monthly, Vol. 57, No. 5 (May, 1950), pp. 331–332.

DeepSeek-R1: Do we need less compute now?

 

The reactions to the new DeepSeek-R1 AI model in recent days seem limitless. Some say it runs so much faster than existing models that we will no longer need the billions of dollars in compute hardware that big tech is preparing to buy.

Is that plausible?

To get an answer, we need only look back at the experience of the recently-completed Exascale Computing Project. This large scale multi-lab project was tasked with developing technology (primarily software) to prepare for exascale computing, which has recently been achieved by Frontier, Aurora and El Capitan.

During the course of the project, various algorithm and implementation improvements were discovered by the the science teams, these leading to as much as 60X speedup or more, over and above speedups possible from hardware alone [1]. In response, are the teams just running the very same problems faster on older hardware? No — instead, they are now able to run much, much larger problems than previously possible, exploiting both hardware and software improvements.

Or suppose today there were no such thing as the fast Fourier transform (FFT) and scientists were computing Fourier transforms using (essentially) large dense matrix-vector products. If someone then discovered the FFT, I’d guarantee you that scientists would not only say, (1) “Wow, now I can run my existing problems much, much faster,” but also, (2) “Wow, now I can run problems much larger than I ever dreamed and solve problems larger than I could have ever imagined!”

Paradoxically, faster algorithms might even increase the demand for newer, faster hardware. For example, a new faster algorithm for designing medications to cure cancer might be judged so important that it’s worth building the largest machine possible to run it effectively.

All this is not to say whether you should buy or sell Nvidia stock right now. However, it does mean that there is no simplistic argument that faster algorithms and implementations necessarily lead to lower spend on computing hardware. History shows that sometimes this is not true at all. The smart money, on the other hand, is on research teams that are able to exploit any and every new discovery to improve what is possible with their codes, whether by hardware, data, code optimizations or algorithms.

Notes

[1] See slide 9 from Doug Kothe’s talk, “Exascale and Artificial Intelligence: A Great Marriage“. The “Figure of Merit” (FOM) number represents speedup of science output from an application compared to an earlier baseline system. Specifically, a FOM speedup of 50X is the anticipated speedup from baseline due to efficient use of hardware only, for example, on Frontier compared to the earlier OLCF Titan system.

Matrix representations of number systems

The previous post discussed complex numbers, dual numbers, and double numbers. All three systems are constructed by adding some element to the real numbers that has some special algebraic property. The complex numbers are constructed by adding an element i such that i² = −1. The dual numbers add an element ε ≠ 0 with ε² = 0, and the double numbers are constructed by adding j ≠ 1 with j² = 1.

If adding special elements seems somehow illegitimate, there is an alternative way to define these number systems that may seem more concrete using 2 × 2 matrices. (A reader from 150 years ago would probably be more comfortable with appending special numbers than with matrices, but now we’re accustomed to matrices.)

The following mappings provide isomorphisms between complex, dual, and double numbers and their embeddings in the ring of 2 × 2 matrices.

\begin{align*} a + ib &\leftrightarrow \begin{pmatrix} a & -b \\ b & a \end{pmatrix} \\ a + \varepsilon b &\leftrightarrow \begin{pmatrix} a & b \\ 0 & a \end{pmatrix} \\ a + jb &\leftrightarrow \begin{pmatrix} a & b \\ b & a \end{pmatrix} \\ \end{align*}

Because the mappings are isomorphisms, you can translate a calculation in one of these number systems into a calculation involving real matrices, then translate the result back to the original number system. This is conceptually interesting, but it could also be useful if you’re using software that supports matrices but does not directly support alternative number systems.

You can also apply the correspondences from right to left. If you need to carry out calculations on matrices of the special forms above, you could move over to complex (or dual, or double) numbers, do your algebra, then convert the result back to matrices.

Functions of a matrix

The previous post looked at variations on Euler’s theorem in complex, dual, and double numbers. You could verify these three theorems by applying exp, sin, cos, sinh, and cosh to matrices. In each case you define the function in terms of its power series and stick in matrices. You should be a little concerned about convergence, but it all works out.

You should also be concerned about commutativity. Multiplication of real numbers is commutative, but multiplication of matrices is not, so you can’t just stick matrices into any equation derived for real numbers and expect it to hold. For example, it’s not true in general that exp(A + B) equals exp(A) exp(B). But it is true if the matrices A and B commute, and the special matrices that represent complex (or dual, or double) numbers do commute.

Related posts

Euler’s formula for dual numbers and double numbers

The complex numbers are formed by adding an element i to the real numbers such that i² = − 1. We can create other number systems by adding other elements to the reals.

One example is dual numbers. Here we add a number ε ≠ 0 with the property ε² = 0. Dual numbers have been used in numerous applications, most recently in automatic differentiation.

Another example is double numbers [1]. Here we add a number j ≠ ±1 such that j² = 1. (Apologies to electrical engineers and Python programmers. For this post, j is not the imaginary unit from complex numbers.)

(If adding special numbers to the reals makes you uneasy, see the next post for an alternative approach to defining these numbers.)

We can find analogs of Euler’s formula

\exp(i\theta) = \cos(\theta) + i \sin(\theta)

for dual numbers and double numbers by using the power series for the exponential function

\exp(z) = \sum_{k=0}^\infty \frac{z^k}{k!}

to define exp(z) in these number systems.

For dual numbers, the analog of Euler’s theorem is

\exp(\varepsilon x) = 1 + \varepsilon x

because all the terms in the power series after the first two involve powers of ε that evaluate to 0. Although this equation only holds for dual numbers, not real numbers, it is approximately true of ε is a small real number. This is the motivation for using ε as the symbol for the special number added to the reals: Dual numbers can formalize calculations over the reals that are not formally correct.

For double numbers, the analog of Euler’s theorem is

\exp(j x) = \cosh(x) + j \sinh(x)

and the proof is entirely analogous to the proof of Euler’s theorem for complex numbers: Write out the power series, then separate the terms involving even exponents from the terms involving odd exponents.

Related posts

[1] Double numbers have also been called motors, hyperbolic numbers, split-complex numbers, spacetime numbers, …

Tricks for radix conversion by hand

The simplest trick for converting from one base to another is grouping. To convert between base b and base bk, group numbers in sets of k and convert one group at a time. To convert from binary to octal, for instance, group bits in sets of three, starting from the right end, and convert each group to octal.

11110010two → (11)(110)(010) → 362eight

For an example going the other direction, let’s convert 476 in base nine to base three.

476nine → (11)(21)(20) → 112120three

In general, conversion between bases is too tedious to do by hand, but one important case where it’s a little easier than it could be is converting between decimal and octal. In combination with the grouping trick above, this means you could, for example, convert between decimal and binary by first converting decimal to octal. Then the conversion from octal to binary is trivial.

The key to converting between decimal and octal is to exploit the fact that 10 = 8 + 2, so powers of 10 become powers of (8 + 2), or powers of 8 become powers of (10 − 2). These tricks are easier to carry out than to explain. You can find descriptions and examples in Knuth’s TAOCP, volume 2, section 4.4.

Knuth cites a note by Walter Soden from 1953 as the first description of the trick for converting octal to decimal.

The trick for moving between base 9 and base 10 (or by grouping, between base 3 and base 10) is simpler and left as an exercise by Knuth. (Problem 12 in section 4.4, with a solution given at the back of the volume.)

Related posts

Double rounding

The previous post started by saying that rounding has a surprising amount of detail. An example of this is double rounding: if you round a number twice, you might not get the same result as if you rounded directly to the final precision.

For example, let’s say we’ll round numbers ending in 0, 1, 2, 3, or 4 down, and numbers ending in 5, 6, 7, 8, or 9 up. Then if we have a number like 123.45 and round it to one decimal place we have 123.5, and if we round that to an integer we have 124. But if we had rounded 123.45 directly to an integer we would have gotten 123. This is not a mere curiosity; it comes up fairly often and has been an issue in lawsuits.

The double rounding problem cannot happen in odd bases. So, for example, if you have some fraction represented in base 7 and you round it first from three figures past the radix point to two, then from two to one, you’ll get the same result as if you directly rounded from three figures to one. Say we start with 4231.243seven. If we round it to two places we get 4231.24seven, and if we round again to one place we get 4231.3seven, the same result we would get by rounding directly from three places to one.

The reason this works is that you cannot represent ½ by a finite expression in an odd base.