with *n* terms can be written as the composition

where

As discussed in the previous post, a Möbius transformation can be associated with a matrix. And the composition of Möbius transformations is associated with the product of corresponding matrices. So the continued fraction at the top of the post is associated with the following product of matrices.

The previous post makes precise the terms “associated with” above: Möbius transformations on the complex plane ℂ correspond to linear transformations on the projective plane *P*(ℂ). This allows us to include ∞ in the domain and range without resorting to hand waving.

Matrix products are easier to understand than continued fractions, and so moving to the matrix product representation makes it easier to prove theorems.
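As a concrete sketch of this correspondence (the coefficients below are hypothetical examples, not taken from the post), the following Python evaluates a continued fraction a₀ + b₁/(a₁ + b₂/(a₂ + ⋯)) directly and also via the product of the matrices [[aₖ, bₖ], [1, 0]] associated with each partial map z ↦ aₖ + bₖ/z:

```python
from fractions import Fraction

def cf_eval(a, b):
    # Evaluate a0 + b1/(a1 + b2/(a2 + ...)) from the inside out.
    x = Fraction(a[-1])
    for ak, bk in zip(reversed(a[:-1]), reversed(b)):
        x = ak + Fraction(bk) / x
    return x

def matmul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def cf_matrix(a, b):
    # Each partial map z -> a_k + b_k/z corresponds to the matrix [[a_k, b_k], [1, 0]].
    M = [[1, 0], [0, 1]]
    for ak, bk in zip(a[:-1], b):
        M = matmul(M, [[ak, bk], [1, 0]])
    return M

a, b = [3, 7, 15, 1], [1, 1, 1]
M = cf_matrix(a, b)
# Applying the Möbius transformation of M to the innermost term a_n
# gives the value of the continued fraction.
z = Fraction(a[-1])
assert (M[0][0]*z + M[0][1]) / (M[1][0]*z + M[1][1]) == cf_eval(a, b)
```

With these particular coefficients the matrix product works out to [[333, 22], [106, 7]] and the continued fraction evaluates to 355/113.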

A function of the form

*g*(*z*) = (*az* + *b*) / (*cz* + *d*)

where *ad* – *bc* ≠ 0 is sometimes called a **fractional linear** transformation or a **bilinear** transformation. I usually use the name Möbius transformation.

In what sense are Möbius transformations **linear** transformations? They’re nonlinear functions unless *b* = *c* = 0. And yet they’re analogous to linear transformations. For starters, the condition *ad* – *bc* ≠ 0 appears to be saying that a determinant is non-zero, i.e. that a matrix is non-singular.

The transformation *g* is closely associated with the matrix

but there’s more going on than a set of analogies. The reason is that Möbius transformations **are** linear transformations, but not on the complex numbers ℂ.

When you’re working with Möbius transformations, you soon want to introduce ∞. Things get complicated if you don’t. Once you add ∞ theorems become much easier to state, and yet there’s a nagging feeling that you may be doing something wrong by informally introducing ∞. This feeling is justified because tossing around ∞ without being careful can lead to wrong conclusions.

So how can we rigorously deal with ∞? We could move from numbers (real or complex) to **pairs** of numbers, as with fractions. We replace the complex number *z* with the equivalence class of all pairs of complex numbers whose ratio is *z*. The advantage of this approach is that you get to add one special number, the equivalence class of all pairs whose second number is 0, i.e. fractions with zero in the denominator. This new number system is called *P*(ℂ), where “*P*” stands for “projective.”

Möbius transformations are projective linear transformations. They’re linear on *P*(ℂ), though not on ℂ.

When we multiply the matrix above by the column vector (*z* 1)^{T} we get

and since our vectors are essentially fractions, the right hand side corresponds to *g*(*z*) if the second component of the vector, *cz* + *d*, is not zero.

If *cz* + *d* = 0, that’s OK. Everything is fine while we’re working in *P*(ℂ), but we get an element of *P*(ℂ) that does not correspond to an element of ℂ, i.e. we get ∞.

We’ve added ∞ to the domain and range of our Möbius transformations without any handwaving. We’re just doing linear algebra on finite complex numbers.

There’s a little bit of fine print. In *P*(ℂ) we can’t allow both components of a pair to be 0, and non-zero multiples of the same vector are equivalent, so we’re not quite doing linear algebra. Strictly speaking a Möbius transformation is a **projective** linear transformation, not a linear transformation.

It takes a while to warm up to the idea of moving from complex numbers to equivalence classes of pairs of complex numbers. The latter seems unnecessarily complicated. And it often **is** unnecessary. In practice, you can work in *P*(ℂ) by thinking in terms of ℂ until you have to think about ∞. Then you go back to thinking in terms of *P*(ℂ). You can think of *P*(ℂ) as ℂ with a safety net for working rigorously with ∞.

Textbooks usually introduce higher dimensional projective spaces first, speaking later, if ever, of one-dimensional projective space. (Standard notation would write *P*¹(ℂ) rather than *P*(ℂ) everywhere above.) But one-dimensional projective space is easier to understand by analogy to fractions: fractions whose denominator is allowed to be zero, provided the numerator is not also zero.

I first saw projective coordinates as an unmotivated definition. “Good morning everyone. We define *P*^{n}(ℝ) to be the set of equivalence classes of ℝ^{n+1} where ….” There had to be some payoff for this added complexity, but we were expected to delay the gratification of knowing what that payoff was. It would have been helpful if someone had said “The extra coordinate is there to let us handle points at infinity consistently. These points are not special at all if you present them this way.” It’s possible someone *did* say that, but I wasn’t ready to hear it at the time.

Naming a language after its creators shows a certain paucity of imagination. In our defense, we didn’t have a better idea, and by coincidence, at some point in the process we were in three adjacent offices in the order Aho, Weinberger, and Kernighan.

By the way, here’s a nice line from near the end of the book.

The post Naming Awk first appeared on John D. Cook.

Realistically, if you’re going to learn only one programming language, Python is the one. But for small programs typed at the command line, Awk is hard to beat.

The geometric mean of two numbers is the square root of their product. For example, the geometric mean of 9 and 25 is 15.

More generally, the geometric mean of a set of *n* numbers is the *n*th root of their product.

Alternatively, the geometric mean of a set of *n* numbers is the exponential of their average logarithm.

The advantage of the alternative definition is that it extends to integrals. The geometric mean of a function over a set is the exponential of the average value of its logarithm. And the average of a function over a set is its integral over that set divided by the measure of the set.
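The equivalence of the two definitions is easy to check numerically. A small sketch using the log-average form:

```python
import math

def geometric_mean(xs):
    # exp of the average log equals the nth root of the product
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

assert abs(geometric_mean([9, 25]) - 15) < 1e-12
assert abs(geometric_mean([2, 4, 8]) - 4) < 1e-12   # cube root of 64
```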

The Mahler measure of a polynomial is the geometric mean over the unit circle of the absolute value of the polynomial.

The Mahler measure equals the product of the absolute values of the leading coefficient and roots outside the unit circle. That is, if

then

Let *p*(*z*) = 7(*z* − 2)(*z* − 3)(*z* + 1/2). Based on the leading coefficient and the roots, we would expect *M*(*p*) to be 42. The following Mathematica code shows this is indeed true by returning 42.

z = Exp[2 Pi I theta]
Exp[Integrate[Log[Abs[7 (z - 2) (z - 3) (z + 1/2)]], {theta, 0, 1}]]

Mahler measure is multiplicative: for any two polynomials *p* and *q*, the measure of their product is the product of their measures.
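Both the root formula and multiplicativity can be checked numerically. This sketch computes the Mahler measure from the roots using numpy; the second polynomial *q* is an arbitrary example chosen for illustration.

```python
import numpy as np

def mahler(coeffs):
    # Mahler measure via the root formula: |leading coefficient| times
    # the product of |root| over the roots outside the unit circle.
    m = abs(coeffs[0])
    for r in np.roots(coeffs):
        if abs(r) > 1:
            m *= abs(r)
    return m

p = [7, -31.5, 24.5, 21]     # 7(z - 2)(z - 3)(z + 1/2), expanded
q = [1, 0, -2]               # z^2 - 2, an arbitrary second polynomial
assert abs(mahler(p) - 42) < 1e-8
# Multiplicativity: the measure of a product is the product of the measures.
assert abs(mahler(np.convolve(p, q)) - mahler(p) * mahler(q)) < 1e-6
```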

A few days ago I wrote about height functions for rational numbers. Mahler measure is a height function for polynomials, and there are theorems bounding Mahler measure by other height functions, such as the sum or maximum of the absolute values of the coefficients.

The Gauss map [1] is the function

where ⌊*y*⌋ is the floor of *y*, the greatest integer no larger than *y*.

I’ve written about this map a couple times before. First, I wrote about how this map is measure-preserving. Second, I wrote about the image at the top of the post, based on Michael Trott’s idea of extending the floor function to the complex plane and plotting it.

This post is a third take on the Gauss map, expanding on a comment by Giovanni Panti. His paper opens by saying

The fact that the euclidean algorithm eventually terminates is pervasive in mathematics. In the language of continued fractions, it can be stated by saying that the orbits of rational points under the Gauss map *x* ↦ *x*^{−1} − ⌊*x*^{−1}⌋ eventually reach zero.

What does the Gauss map have to do with continued fractions or the Euclidean algorithm? We’ll show this by working through an example.

A continued fraction has the form

Let’s start with 162/47 and see how we would write it as a continued fraction. An obvious place to start would be to write this as its integer part plus a proper fraction: 162/47 = 3 + 21/47.

Next we turn 21/47 into 1 over something: 21/47 = 1/(47/21).

Now let’s do the same thing with 47/21: turn it into a proper fraction 2 + 5/21, then rewrite the fraction part 5/21 as the reciprocal of its reciprocal:

Finally, we write 21/5 as 4 + 1/5, and we’re done:

Now go back and look at what happens to the fraction in the bottom left corner at each step:

The sequence of bottom left fractions is 21/47, 5/21, 1/5. Each fraction is replaced by its Gauss map: *f*(21/47) = 5/21, and *f*(5/21) = 1/5. We applied the Gauss map above naturally in the process of creating a continued fraction.

Now suppose we wanted to find the greatest common divisor of 162 and 47 using the Euclidean algorithm.

Notice that these are the same numbers, produced by the same calculations as above.
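The parallel can be sketched in Python: iterating the Gauss map on 21/47 produces exactly the fractions whose numerators and denominators appear as remainders in the Euclidean algorithm for 162 and 47.

```python
from fractions import Fraction
from math import floor

def gauss(x):
    # Gauss map: f(x) = 1/x - floor(1/x), with f(0) = 0
    if x == 0:
        return Fraction(0)
    y = 1 / Fraction(x)
    return y - floor(y)

# Orbit of 21/47 under the Gauss map
x, orbit = Fraction(21, 47), []
while x != 0:
    orbit.append(x)
    x = gauss(x)
print(orbit)    # [Fraction(21, 47), Fraction(5, 21), Fraction(1, 5)]

# The Euclidean algorithm on 162 and 47 produces the same numbers
a, b, remainders = 162, 47, []
while b:
    remainders.append(b)
    a, b = b, a % b
print(remainders)    # [47, 21, 5, 1]
```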

[1] There are other things called the Gauss map, such as the map that takes a point on a surface to the unit normal at that point. That’s not the Gauss map we’re talking about here.

The post Gauss map, Euclidean algorithm, and continued fractions first appeared on John D. Cook.

… we can say this in fancier terms. Fix a field *k*…. We say that an elliptic curve *E* defined over *k* is that functor which …

Well that *is* fancy. But what does it mean?

A functor is a pair of functions [2]. At the base level, a functor takes objects in one category to objects in another category. The quote above continues

… that functor which associates fields *K* containing *k* to an algebraic set of the form …

The key word here is “associates.” That must be our function between objects. Our functor maps fields containing *k* to a certain kind of algebraic set. So the domain of our functor must be fields containing the field *k*, or fields more generally if you take “containing *k*” to apply on both sides for all fields *k*.

Categories are more than objects, and functors are more than morphisms.

A category consists of objects and morphisms between those objects. So our category of fields must also contain some sort of morphisms between fields, presumably field homomorphisms. And our category of algebraic sets must have some kind of morphisms between algebraic sets that preserve the algebraic structure.

The functor between fields and algebraic sets must map fields to algebraic sets, and maps between fields to maps between algebraic sets, in a structure-preserving way.

The algebraic sets are what we normally think of as elliptic curves, but the author is saying we can think of this *functor* as an elliptic curve. This is an example of **categorification**: taking something that doesn’t initially involve category theory, then building a categorical scaffold around it.

Why do this? In order to accentuate structure that is implicit before the introduction of category theory. In our case, that implicit structure has to do with fields.

The most concrete way to think of an elliptic curve is over a single field, but we can look at the same curve over larger fields as well. We might start, for example, by thinking of a curve over the rational numbers ℚ, then extending our perspective to include the real numbers ℝ, and then extending it again to include the complex numbers ℂ.

The categorification of elliptic curves emphasizes that things behave well when we go from thinking of an elliptic curve as being over a field *k* to thinking of it as a curve over an extension field *K* that contains *k*.

A functor between fields (and their morphisms) and algebraic sets (of a certain form, along with their morphisms) has to act in a “structure-preserving way” as we said above. This means that morphisms between fields carry over to morphisms between these special algebraic sets in a way that has all the properties one might reasonably expect.

This post has been deliberately short on details, in part because the line in [1] that it is expanding on is short on details. But we’ve seen how you might tease out a passing comment that something is a functor. You know the right questions to ask if you want to look into this further: what exactly are the morphisms in both categories, and what does functoriality tell us about elliptic curves as we change fields?

There are many ways to categorify something, some more useful than others. Useful categorifications express some structure; some fact that was a theorem before categorification becomes a corollary of the new structure after categorification. There are other ways to categorify elliptic curves, each with their own advantages and disadvantages.

- Category theory without categories
- Category theory made a little easier for programmers
- Applied category theory

[1] Edray Herber Goins. The Ubiquity of Elliptic Curves. Notices of the American Mathematical Society. February 2019.

[2] Strictly speaking, a pair of “morphisms” that may or may not be functions.

The post An elliptic curve is a functor first appeared on John D. Cook.

If one of *P* or *Q* is the point at infinity …

Else if *P* = *Q* …

Else if *P* and *Q* lie on a vertical line …

Else …

It would seem that an algorithm for adding points would have to have the same structure, which is unfortunate for at least a couple reasons: it adds complexity, and it raises the possibility of a timing attack since some branches will execute faster than others.

However, an algebraic formula for addition need not mirror its geometric motivation.

Jacobian coordinates are a different way to represent points on an elliptic curve. They have the advantage that addition formulas have fewer logic branches than the geometric description. They may have one test of whether a point is the point at infinity, but they don’t have multiple branches.

Jacobian coordinates also eliminate the need to do division. The geometric description of addition on an elliptic curve involves calculating where the line through two points intersects the curve, and this naturally involves calculating slopes. But in Jacobian coordinates, addition is done using only addition and multiplication, no division. When you’re working over the field of integers modulo some gigantic prime, multiplication and addition are much faster than division.

You can find much more on Jacobian coordinates, including explicit formulas and operation counts, here.
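To make the no-division point concrete, here is a sketch of the standard doubling formulas in Jacobian coordinates, where (X, Y, Z) represents the affine point (X/Z², Y/Z³). The curve, prime, and point below are toy values chosen for illustration, not ones from the post.

```python
P_MOD = 97        # toy field prime; real curves use primes of roughly 256 bits
A = 2             # coefficient a of the curve y^2 = x^3 + a*x + b

def jacobian_double(X, Y, Z):
    # Standard doubling formulas: additions and multiplications only, no division.
    S = 4 * X * Y * Y % P_MOD
    M = (3 * X * X + A * Z**4) % P_MOD
    X3 = (M * M - 2 * S) % P_MOD
    Y3 = (M * (S - X3) - 8 * Y**4) % P_MOD
    Z3 = 2 * Y * Z % P_MOD
    return X3, Y3, Z3

def to_affine(X, Y, Z):
    # A single inversion, deferred to the very end of the computation.
    zinv = pow(Z, -1, P_MOD)
    return X * zinv**2 % P_MOD, Y * zinv**3 % P_MOD

# The point (3, 6) lies on y^2 = x^3 + 2x + 3 over GF(97); doubling it
# in Jacobian coordinates agrees with the affine result (80, 10).
assert to_affine(*jacobian_double(3, 6, 1)) == (80, 10)
```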

Sometimes it’s possible to completely eliminate logic branches as well as division operations. An elliptic curve addition formula is called **complete** if it is valid for all inputs. The first surprise is that complete addition formulas sometimes exist. The second surprise is that complete addition formulas *often* exist.

You can find much more on complete addition formulas in the paper “Complete addition formulas for prime order elliptic curves” available here.

Whether the Jacobian coordinate formulas or complete formulas are more or less complex than direct implementation of the geometric definition depends on how you define complexity. The formulas are definitely less complex in terms of **McCabe complexity**, also known as **cyclomatic complexity**. But they are more complex in terms of **Kolmogorov complexity**.

The McCabe complexity of a function is essentially the number of independent paths through the function. A complete addition formula, one that could be implemented in software with no branching logic, has the smallest possible McCabe complexity.

I’m using the term Kolmogorov complexity to mean simply the amount of code it takes to implement a function. It’s intuitively clear what this means. But if you make it precise, you end up with a measure of complexity that is only useful in theory and cannot be calculated in practice.

Instead of literally computing Kolmogorov complexity you’d count the number of multiplications and additions necessary to execute a formula. The links above do just that. If you know how many cycles it takes to execute an addition or multiplication (in the field you’re working over), and how many cycles it would take to carry out addition (on the elliptic curve) implemented directly from the definition, including the division required, then you could estimate whether these alternative formulas would save time.

It used to be common to count how many floating point operations it would take to execute an algorithm. I did that back when I took numerical analysis in college. But this became obsolete not long after I learned it. Counting floating point operations no longer tells you as much about an algorithm’s runtime as it used to due to changes in the speed of arithmetic relative to memory access and changes in computer architecture. However, in the context of this post, counting operations still matters because each operation—such as adding two 512-bit numbers modulo a 512-bit prime—involves a lot of more basic operations.

Such informal statements can be made more precise using height functions. There are a variety of height functions designed for different applications, but the most common height function defines the height of a fraction *p*/*q* in lowest terms to be the sum of the numerator and denominator:

height(*p*/*q*) = |*p*| + |*q*|.

This post will look at how this applies to musical intervals, to approximations for π, and to the number of days in a year.

Here are musical intervals, ranked by the height of their just tuning frequency ratios.

- octave (2:1)
- fifth (3:2)
- fourth (4:3)
- major sixth (5:3)
- major third (5:4)
- minor third (6:5)
- minor sixth (8:5)
- minor seventh (9:5)
- major second (10:9)
- major seventh (15:8)
- minor second (16:15)
- augmented fourth (45:32)

The interval with the least tension is the octave. The next six intervals are considered consonant. A minor seventh is considered mildly dissonant, and the rest are considered more dissonant. The most dissonant interval is the augmented fourth, also known as a tritone because it is the same interval as three whole steps.

Incidentally, a telephone busy signal consists of two pitches, 620 Hz and 480 Hz. This is a ratio of 31:24, which has a height of 55. This is consistent with the signal being moderately dissonant.

The first four continued fraction approximations to π are 3, 22/7, 333/106, and 355/113.

Continued fraction convergents give the best rational approximation to an irrational for a given denominator. But for a height value that is not the height of a convergent, the best approximation might not be a convergent.

For example, the best approximation to π with height less than or equal to 333 + 106 is 333/106. But the best approximation with height less than or equal to 400 is 289/92, which is not a convergent of the continued fraction for π.
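This is easy to check by brute force. The sketch below uses a simple search strategy (assumed here, not taken from the post): for each denominator it tries the nearest numerator and keeps the best fraction whose height stays under the bound.

```python
from math import pi
from fractions import Fraction

def best_by_height(x, H):
    """Best rational approximation to x among fractions with |p| + q <= H."""
    target = Fraction(x)          # exact value of the float x
    best = None
    for q in range(1, H + 1):
        p = round(x * q)          # nearest numerator for this denominator
        f = Fraction(p, q)        # reduces to lowest terms
        if abs(f.numerator) + f.denominator > H:
            continue
        if best is None or abs(f - target) < abs(best - target):
            best = f
    return best

print(best_by_height(pi, 439))   # 333/106, the convergent (height 439)
print(best_by_height(pi, 400))   # 289/92, not a convergent
```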

The number of days in a year is 365.2424177. Obviously that’s close to 365 1/4, and 1/4 is the best approximation to 0.2424177 for its height.

The Gregorian calendar has 97 leap days every 400 years, which approximates 0.2424177 with 97/400. This approximation has practical advantages for humans, but 8 leap days every 33 years would be a more accurate approximation with a much smaller height.

Encryption can be analogous. The time it takes to encrypt data can leak information about the data being encrypted. It probably won’t reveal the data per se, but it may reveal enough about the data or the encryption process to reduce the effort needed to break the encryption.

There are two ways of thwarting timing attacks. One is to try to make the encryption take the same amount of time, independent of the data. This would prevent an attacker from inferring, for example, which branch of an algorithm was taken if one branch executes faster than the other.

If the encryption process always takes the same amount of time, then the execution time of the encryption process carries no information. But it’s enough that the execution time carries no *useful* information.

It may be easier to make execution time uncorrelated with content than to make execution time constant. Also, keeping the execution time of an algorithm constant may require making the program always run as slow as the worst case scenario. You may get faster average execution time by allowing the time to vary in a way that is uncorrelated with any information useful to an attacker.

One example of this would be the blinding technique used in decrypting RSA encoded messages.

Suppose you’re using RSA encryption with a public key *e*, private key *d*, and modulus *n*. You can decrypt ciphertext *c* to obtain the cleartext *m* by computing

*m* = *c*^{d} mod *n*.

An alternative would be to compute a random message *r* and decrypt *r*^{e}*c*:

(*r*^{e}*c*)^{d} = *r*^{ed} *c*^{d} = *rm* mod *n*

then multiply by the inverse of *r* mod *n* to obtain *m*. Because *r* is random, the time required to decrypt *r*^{e}*c* is uncorrelated with the time required to decrypt *c*.
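Here is a toy sketch of the blinding idea just described, with small, made-up RSA parameters (real keys are of course far larger):

```python
import math
import random

# Toy RSA parameters, assumed for illustration only; real moduli are 2048+ bits.
p, q = 61, 53
n = p * q
e = 17
d = pow(e, -1, (p - 1) * (q - 1))     # private exponent

m = 65                                # cleartext message
c = pow(m, e, n)                      # ciphertext

# Blinded decryption: decrypt r^e * c instead of c, then divide out r.
r = random.randrange(2, n)
while math.gcd(r, n) != 1:            # r must be invertible mod n
    r = random.randrange(2, n)
blinded = (pow(r, e, n) * c) % n
rm = pow(blinded, d, n)               # equals r*m mod n
recovered = (rm * pow(r, -1, n)) % n
assert recovered == m
```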

How much smaller are we talking about? According to NIST recommendations, a 256-bit elliptic curve provides about the same security as working over a 3072-bit finite field. Not only are elliptic curves smaller, they scale better. A 512-bit elliptic curve is believed to be about as secure as a 15360-bit finite field: a factor of 2x for elliptic curves and a factor of 5x for finite fields.

The core idea of Diffie-Hellman is to pick a group *G*, an element *g*, and a large number *x*. If *y* is the result of starting with *g* and applying the group operation *x* times, it is difficult to recover *x* from knowing *y*. This is called the discrete logarithm problem, taking its name from the case of the group operation being multiplication. But the inverse problem is still called the discrete logarithm problem when the group is additive.

In FFDHE the group *G* is the multiplicative group generated by *g* modulo a large prime *p*. Applying the group operation (i.e. multiplication) to *g* a number of times *x* is computing

*y* = *g*^{x}

and *x* is rightly called a discrete logarithm; the process is directly analogous to taking the logarithm of a real number.

In ECDHE the group is given by addition on an elliptic curve. Applying the group operation *x* times to *g*, adding *g* to itself *x* times, is written *xg*. The problem of recovering *x* from *xg* is still called the discrete *logarithm* problem, though you could call it the discrete “division” problem.

Some groups are unsuited for Diffie-Hellman cryptography because the discrete logarithm problem is easy. If we let *G* be the *additive* group modulo a prime (not the multiplicative group) then it is easy to recover *x* from *xg*.

Note that when we talk about applying the group operation a large number of times, we mean a *really* large number of times, *in theory*, though not in practice. If you’re working over an elliptic curve with on the order of 2^{256} elements, and *x* is on the order of 2^{256}, then *xg* is the result of adding *g* to itself on the order of 2^{256} times. But in practice you’d double *g* on the order of 256 times. See fast exponentiation.
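The doubling trick is the additive analog of fast exponentiation. A generic sketch over any group supplied as an `add` function:

```python
def scalar_mult(x, g, add, identity):
    # Compute x*g, i.e. g combined with itself x times,
    # using O(log x) group operations via repeated doubling.
    result = identity
    while x:
        if x & 1:
            result = add(result, g)
        g = add(g, g)
        x >>= 1
    return result

# Sanity check in the additive group of integers mod 97:
add_mod = lambda a, b: (a + b) % 97
assert scalar_mult(10**18, 5, add_mod, 0) == 10**18 * 5 % 97
```

The same function would compute *xg* on an elliptic curve if `add` were replaced by the curve's group law.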

In the post on FFDHE we said that you have to be careful that your choice of prime and generator doesn’t give the group structure that a cryptanalyst could exploit. This is also true for the elliptic curves used in ECDHE, and even more so because elliptic curves are more subtle than finite fields.

If large-scale quantum computing ever becomes practical, Diffie-Hellman encryption will be broken because a quantum computer can solve discrete logarithm problems efficiently via Shor’s algorithm. This applies equally to finite fields and elliptic curves.

The starting point is a large prime *p* and a generator 1 < *g* < *p*.

Alice generates a large random number *x*, her private key, and sends Bob *g*^{x} mod *p*.

Similarly, Bob generates a large random number *y*, his private key, and sends Alice *g*^{y} mod *p*.

Alice takes *g*^{y} and raises it to her exponent *x*, and Bob takes *g*^{x} and raises it to the exponent *y*. They arrive at a common key *k* because

*k* = (*g*^{y})^{x} = (*g*^{x})^{y} mod *p*.

The security of the system rests on the assumption that the discrete logarithm problem is hard, i.e. given *g* and *g*^{z} it is computationally impractical to solve for *z*. This assumption appears to be true in general, but can fail when the group generated by *g* has exploitable structure.
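The whole exchange fits in a few lines. This sketch uses a deliberately tiny prime for readability; a real deployment would use a standardized 2048-bit safe prime.

```python
import secrets

# Toy parameters, assumed for illustration; real deployments use
# standardized large primes such as those in RFC 7919.
p, g = 23, 5

x = secrets.randbelow(p - 2) + 1    # Alice's private key
y = secrets.randbelow(p - 2) + 1    # Bob's private key

A = pow(g, x, p)    # Alice sends this to Bob
B = pow(g, y, p)    # Bob sends this to Alice

# Each side raises the other's message to its own private exponent.
k_alice = pow(B, x, p)
k_bob = pow(A, y, p)
assert k_alice == k_bob    # the shared key
```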

You can read more about Diffie-Hellman here.

The choice of prime *p* and generator *g* can matter in subtle ways and so there are lists of standard choices that are believed to be secure.

IETF RFC 7919 recommends five standard primes. These have the form

*p* = 2^{b} − 2^{b−64} − 1 + 2^{64} (⌊2^{b−130} *e*⌋ + *X*)

where *b* is the size of *p* in bits, *e* is the base of natural logarithms, and *X* is the smallest value such that *p* is a safe prime. In every case the generator is *g* = 2.

The values of *b* are 2048, 3072, 4096, 6144, and 8192. The values of *X* and *p* are given in RFC 7919, but they’re both determined by *b*.

I don’t imagine there’s anything special about the constant *e* above. I suspect it’s there to shake things up a bit in a way that doesn’t appear to be creating a back door. Another irrational number like π or φ would probably do as well, but I don’t know this for sure.

The recommended primes have names of the form “ffdhe” followed by *b*. For *b* = 2048, the corresponding value of *X* is 560316.

I wrote a little Python code to verify that this value of *X* does produce a safe prime and that smaller values of *X* do not.

#!/usr/bin/env python3
from sympy import isprime, E, N, floor

b = 2048
e = N(E, 1000)
c = floor(2**(b-130) * e)
d = 2**b - 2**(b-64) + 2**64*c - 1

def candidate(b, x):
    p = d + 2**64*x
    return p

for x in range(560316, 0, -1):
    p = candidate(b, x)
    if isprime(p) and isprime((p-1)//2):
        print(x)

This took about an hour to run. It only printed 560316, verifying the claim in RFC 7919.

Finite field Diffie-Hellman is so called because the integers modulo a prime form a finite field. We don’t need a field per se; we’re working in the group formed by the orbit of *g* within that field. Such groups need to be very large in order to provide security.

It’s possible to use Diffie-Hellman over any group for which the discrete logarithm problem is intractable, and the discrete logarithm problem is harder over elliptic curves than over finite fields. The elliptic curve groups can be smaller and provide the same level of security. Smaller groups mean smaller keys to exchange. For this reason, elliptic curve Diffie-Hellman is more commonly used than finite field Diffie-Hellman.

The post Finite field Diffie Hellman primes first appeared on John D. Cook.

The Chinese Remainder Theorem assures us that the system of congruences

has a unique solution mod *m*, but the theorem doesn’t say how to compute *x* efficiently.

H. L. Garner developed an algorithm to directly compute *x* [1]:

You compute the inverse of *q* mod *p* once and save it, then solving the system above for multiple values of *a* and *b* is very efficient.
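In Python the two-factor case looks like this (toy primes assumed for illustration):

```python
p, q = 61, 53                # toy primes; real RSA primes are ~1024 bits
q_inv = pow(q, -1, p)        # the inverse of q mod p, computed once and saved

def crt2(a, b):
    # Garner's synthesis: find x with x = a mod p and x = b mod q.
    h = q_inv * (a - b) % p
    return b + h * q         # result lies in the range [0, p*q)

x = crt2(15, 40)
assert x % p == 15 and x % q == 40
```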

Garner’s algorithm extends to more than two factors. We will present the general case of his algorithm below, but first we do a concrete example with RSA keys.

This is a continuation of the example at the bottom of this post.

This shows that the numbers in the key file besides those that are strictly necessary for the RSA algorithm are numbers needed for Garner’s algorithm.

What the key file calls “coefficient” is the inverse of *q* modulo *p*.

What the key file calls “exponent1” is the decryption exponent *d* reduced mod *p*-1. Similarly, “exponent2” is *d* reduced mod *q*-1 as explained here.

from sympy import lcm

prime1 = 0xf33514...d9
prime2 = 0xfee496...51
publicExponent = 65537
privateExponent = 0x03896d...91
coefficient = 0x4d5a4c...b7  # q^-1 mod p
assert(coefficient*prime2 % prime1 == 1)
exponent1 = 0x37cc69...a1  # d mod (p-1)
exponent2 = 0x2aa06f...01  # d mod (q-1)
assert(privateExponent % (prime1 - 1) == exponent1)
assert(privateExponent % (prime2 - 1) == exponent2)

Garner’s algorithm can be used more generally when *m* is the product of more than two primes [2]. Suppose

where the *m*_{i} are pairwise relatively prime (not necessarily prime). Then the system of congruences

for *i* = 1, 2, 3, …, *n* can be solved by looking for a solution of the form

where

Again, in practice the modular inverses of the products of the *m*s would be precomputed and cached.
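Here is a sketch of the general algorithm in Python. It determines the mixed-radix digits *v*_{i} one at a time; the modular inverses computed on the fly here are exactly the quantities that would be precomputed and cached in practice.

```python
def garner(a, m):
    # Solve x = a[i] mod m[i] for pairwise relatively prime moduli m,
    # expressing x in mixed radix: x = v0 + v1*m0 + v2*m0*m1 + ...
    v = []
    for i in range(len(a)):
        t = a[i] % m[i]
        base = 1
        for j in range(len(v)):
            # remove the contribution of the earlier digits, working mod m[i]
            t = (t - v[j] * base) % m[i]
            base = base * m[j] % m[i]
        # divide by m0*...*m_{i-1}; this inverse would be cached in practice
        v.append(t * pow(base, -1, m[i]) % m[i])
    # synthesize x from its mixed-radix digits
    x, base = 0, 1
    for vi, mi in zip(v, m):
        x += vi * base
        base *= mi
    return x

assert garner([2, 3, 2], [3, 5, 7]) == 23    # 23 = 2 mod 3 = 3 mod 5 = 2 mod 7
```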

[1] Ferguson, Schneier, Kohno. Cryptography Engineering. Wiley. 2010.

[2] Geddes, Czapor, and Labahn. Algorithms for Computer Algebra. Kluwer Academic Publishers. 1992.

The post Chinese Remainder Theorem synthesis algorithm first appeared on John D. Cook.

You can carry out calculations mod *m* more efficiently by carrying out the same calculations mod *p* and mod *q*, then combining the results. We **analyze** *m* into its remainders by *p* and *q*, carry out our calculations, then **synthesize** the results to get back to a result mod *m*.

The Chinese Remainder Theorem (CRT) says that this synthesis is possible; Garner’s algorithm, the subject of the next post, shows how to compute the result promised by the CRT.

For example, if we want to multiply *xy* mod *m*, we can analyze *x* and *y* as follows.

Then

and by repeatedly multiplying *x* by itself we have

Now suppose *p* and *q* are 1024-bit primes, as they might be in an implementation of RSA encryption. We can carry out exponentiation mod *p* and mod *q*, using 1024-bit numbers, rather than working mod *m* with 2048-bit numbers.

Furthermore, we can apply Euler’s theorem (or the special case Fermat’s little theorem) to reduce the size of the exponents.

Assuming again that *p* and *q* are 1024-bit numbers, and assuming *e* is a 2048-bit number, by working mod *p* and mod *q* we can use exponents that are 1024-bit numbers.
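A quick check of the exponent reduction for a single prime factor, with toy numbers assumed:

```python
# Fermat's little theorem: if p is prime and gcd(x, p) = 1, then
# x^(p-1) = 1 mod p, so the exponent can be reduced mod p - 1.
p = 61
x, e = 42, 10**6
assert pow(x, e, p) == pow(x, e % (p - 1), p)
```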

We still have to put our pieces back together to get the value of *x*^{e} mod *m*, but that’s the subject of the next post.

The trick of working modulo factors is used to speed up RSA decryption. It cannot be used for encryption since *p* and *q* are secret.

The next post shows that this is in fact used in implementing RSA, and that a key file contains the decryption exponent reduced mod *p*-1 and mod *q*-1.

*c* = *m*^{e} mod *n*

where *e* is part of the public key. In practice, *e* is usually 65537 though it does not have to be.

As we discussed in the previous post, not all messages *m* can be decrypted unless we require *m* to be relatively prime to *n*. In practice this is almost certainly the case: discovering a message *m* not relatively prime to *n* is equivalent to finding a factor of *n* and breaking the encryption.

If we limit ourselves to messages which can be encrypted and decrypted, our messages come not from the integers mod *n* but from the **multiplicative group** of integers mod *n*: the integers less than and relatively prime to *n* form a group *G* under multiplication.

The encryption function that maps *m* to *m*^{e} is an invertible function on *G*. Its inverse is the function that maps *c* to *c*^{d} where *d* is the private key. Encryption is an automorphism of *G* because

(*m*_{1} *m*_{2}) ^{e} = *m*_{1}^{e} *m*_{2}^{e} mod *n*.

Euler’s theorem tells us that

*m*^{φ(n)} = 1 mod *n*

for all *m* in *G*. Here φ is **Euler’s totient function**. There are φ(*n*) elements in *G*, and so we could see this as a consequence of **Lagrange’s theorem**: the order of an element divides the order of a group.

Now the order of a particular *m* might be less than φ(*n*). That is, we know that if we raise *m* to the exponent φ(*n*) we will get 1, but maybe a smaller exponent would do. In fact, maybe a smaller exponent would do for all *m*.

**Carmichael’s totient function** λ(*n*) is the smallest exponent *k* such that

*m*^{k} = 1 mod *n*

for all *m*. For some values of *n* the two totient functions are the same, i.e. λ(*n*) = φ(*n*). But sometimes λ(*n*) is strictly less than φ(*n*). And going back to Lagrange’s theorem, λ(*n*) always divides φ(*n*).

For example, there are 4 positive integers less than and relatively prime to 8: 1, 3, 5, and 7. Since φ(8) = 4, Euler’s theorem says that the 4th power of any of these numbers will be congruent to 1 mod 8. That is true, but it’s also true that the square of any of these numbers is congruent to 1 mod 8. That is, λ(8) = 2.
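Chasing the definition directly by brute force (fine for small *n*) verifies these values:

```python
from math import gcd

def carmichael(n):
    # Smallest k such that m**k = 1 (mod n) for every m relatively prime to n.
    units = [m for m in range(1, n) if gcd(m, n) == 1]
    k = 1
    while not all(pow(m, k, n) == 1 for m in units):
        k += 1
    return k

assert carmichael(8) == 2      # while phi(8) = 4
assert carmichael(91) == 12    # 91 = 7 * 13; lcm(6, 12) = 12, while phi(91) = 72
```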

Now for RSA encryption, *n* = *pq* where *p* and *q* are large primes and *p* ≠ *q*. It follows that

φ(*pq*) = φ(*p*) φ(*q*) = (*p* − 1)(*q* − 1).

On the other hand,

λ(*pq*) = lcm( λ(*p*), λ(*q*) ) = lcm(*p* − 1, *q* − 1).

Since *p* − 1 and *q* − 1 at least share a factor of 2,

λ(*n*) ≤ φ(*n*)/2.

It’s possible that λ(*n*) is smaller than φ(*n*) by more than a factor of 2. For example,

φ(7 × 13) = 6 × 12 = 72

but

λ(7 × 13) = lcm(6, 12) = 12.

You could verify this last calculation with the following Python code:

```python
>>> from sympy import gcd
>>> G = set(n for n in range(1, 91) if gcd(n, 91) == 1)
>>> set(n**12 % 91 for n in G)
```

This returns `{1}`.

The significance of Carmichael’s totient to RSA is that φ(*n*) can be replaced with λ(*n*) when finding private keys. Given a public exponent *e*, we can find *d* by solving

*ed* = 1 mod λ(*n*)

rather than

*ed* = 1 mod φ(*n*).

This gives us a smaller private key *d* which might lead to faster decryption.
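Here is a toy-sized sketch of the idea, reusing the primes 7 and 13 from above. The public exponent *e* = 5 is a hypothetical choice for illustration (65537 is too large for a modulus this small); note that `pow(e, -1, m)` for the modular inverse requires Python 3.8 or later.

```python
p, q = 7, 13
n = p * q               # 91
lam = 12                # λ(91) = lcm(6, 12)

e = 5                   # toy public exponent, relatively prime to λ(n)
d = pow(e, -1, lam)     # private key: inverse of e mod λ(n)

m = 10                  # message, relatively prime to n
c = pow(m, e, n)        # encrypt
assert pow(c, d, n) == m  # decrypt recovers the message
```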

I generated an RSA key with `openssl`

as in this post

openssl genpkey -out fd.key -algorithm RSA \ -pkeyopt rsa_keygen_bits:2048 -aes-128-cbc

and read it using

openssl pkey -in fd.key -text -noout

The public exponent was 65537 as noted above. I then brought the numbers in the key over to Python.

from sympy import lcm modulus = xf227d5...a9 prime1 = 0xf33514...d9 prime2 = 0xfee496...51 assert(prime1*prime2 == modulus) publicExponent = 65537 privateExponent = 0x03896d...91 phi = (prime1 - 1)*(prime2 - 1) lamb = lcm(prime1 - 1, prime2 - 1) assert(publicExponent*privateExponent % lamb == 1) assert(publicExponent*privateExponent % phi != 1)

This confirms that the private key *d* is the inverse of *e* = 65537 using modulo λ(*pq*) and not modulo φ(*pq*).

The equation for the perimeter of an ellipse is

where *a* is the semimajor axis, *e* is eccentricity, and *E* is a special function. The equation is simple, in the sense that it has few terms, but it is not elementary, because it depends on an advanced function, the complete elliptic integral of the second kind.

However, there is an approximation for the perimeter that is both simple and elementary:

The generalization of an ellipse to three dimensions is an ellipsoid. The surface area of an ellipsoid is neither simple nor elementary. The surface area *S* is given by

where *E* is *incomplete* elliptic integral of the second kind and *F* is the incomplete elliptic integral of the first kind.

However, once again there is an approximation that is simple and elementary. The surface area is approximately

where *p* = 1.6075.
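This approximation (sometimes attributed to Knud Thomsen) is easy to code. The sketch below assumes the form described: 4π times the *p*-mean of the pairwise products of semiaxes. For a sphere, where all three semiaxes are equal, it reduces to the exact area 4π*r*².

```python
from math import pi

def ellipsoid_area_approx(a, b, c, p=1.6075):
    # 4π times the p-mean of the pairwise products of semiaxes
    mean = ((a*b)**p + (a*c)**p + (b*c)**p) / 3
    return 4 * pi * mean**(1/p)

# Sphere of radius 2: exact area is 16π
assert abs(ellipsoid_area_approx(2, 2, 2) - 16*pi) < 1e-9
```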

Notice the similarities between the approximation for the perimeter of an ellipse and the approximation for the area of an ellipsoid. The former is the perimeter of a unit circle times a kind of mean of the axes. The latter is the area of a unit sphere times a kind of mean of the products of pairs of axes. The former uses a *p*-mean with *p* = 1.5 and the latter uses a *p*-mean with *p* = 1.6075. More on such means here.

The complexity of expressions for the surface area of an ellipsoid apparently increases with dimension. The expressions get worse for hyperellipsoids, i.e. *n*-ellipsoids for *n* > 3. You can find such expressions in [1]. More on that in just a minute.

It is natural to conjecture, based on the approximations above, that the surface area of an *n*-ellipsoid is the area of a unit sphere in dimension *n* times the *p*-mean of all products of *n* − 1 semiaxes for some value of *p*.

For example, the surface area of an ellipsoid in 4 dimensions might be approximately

for some value of *p*.

Why this form? Permutations of the axes do not change the surface area, so we’d expect permutations not to affect the approximation either. (More here.) Also, we’d expect from dimensional analysis for the formula to involve products of *n* − 1 terms since the result gives (*n* − 1)-dimensional volume.

Surely I’m not the first to suggest this. However, I don’t know what work has been done along these lines.

In [1] the author gives some very complicated but general expressions for the surface area of a hyperellipsoid. The simplest of his expressions involves probability:

where the *X*s are independent normal random variables with mean 0 and variance 1/2.

At first it may look like this can be simplified. The sum of normal random variables is a normal random variable. But the squares of normal random variables are not normal, they’re gamma random variables. The sum of gamma random variables is a gamma random variable, but that’s only if the variables have the same scale parameter, and these do not unless all the semiaxes, the *q*s, are the same.

You could use the formula above as a way to approximate *S* via Monte Carlo simulation. You could also use asymptotic results from probability to get approximate formulas valid for large *n*.

[1] Igor Rivin. Surface area and other measures of ellipsoids. Advances in Applied Mathematics 39 (2007) 409–427

The post Hyperellipsoid surface area first appeared on John D. Cook.

If you are given the perimeter and one of the axes, you can solve for the other axis, though this involves a nonlinear equation with an elliptic integral. Not an insurmountable obstacle, but not trivial either.

However, the simple approximation for the perimeter is easy to invert. Since

we have

The same equation holds if you reverse the roles of *a* and *b*.

If this solution is not accurate enough, it at least gives you a good starting point for solving the exact equation numerically.
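Assuming the approximation in question is the *p*-mean formula with *p* = 3/2, that is *P* ≈ 2π((*a*^{3/2} + *b*^{3/2})/2)^{2/3}, the inversion is one line. A sketch, with a round-trip check:

```python
from math import pi

def perimeter_approx(a, b, p=1.5):
    # p-mean approximation to the ellipse perimeter
    return 2 * pi * ((a**p + b**p) / 2) ** (1/p)

def other_axis(P, a, p=1.5):
    # Solve the approximation for the remaining axis
    return (2 * (P / (2*pi))**p - a**p) ** (1/p)

# Recover b from the approximate perimeter and a
P = perimeter_approx(3.0, 2.0)
assert abs(other_axis(P, 3.0) - 2.0) < 1e-9
```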

If you’re not given either *a* or *b*, then you might as well assume *a* = *b* and so both equal *p*/2π.

How many scores are possible? It is possible to score any number of points except 1. You can score 2 points for a safety, so you could score any even number of points via safeties. You can score 3 points for a field goal, so you can score any odd number of points, except 1, by a field goal and as many safeties as necessary.

The highest score in an NFL game was 73 points. No team has scored 67, 68, 69, or 71 points. Otherwise, all possible scores up to 73 have been seen in actual games.

Assume a maximum possible score of *M*. Then there are *M* possible winning scores: 0, 2, 3, 4, …, *M*.

There are also *M* possible losing scores, but there are fewer than *M*² possible total scores since the winning score cannot be less than the losing or tying score.

Out of the *M*² pairs of two numbers coming from a set of *M* numbers, *M* of these pairs are tied, and in half of the rest the first number is higher than the second. So the number of possible scores, with each score bounded by *M*, is

*M* + (*M*² − *M*)/2 = *M*(*M* + 1)/2.

If *M* = 73, there are 2,701 possible scores.
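As a sanity check on the counting argument, a brute-force enumeration, treating every total except 1 as attainable per the safety-and-field-goal argument above:

```python
M = 73
valid = [s for s in range(M + 1) if s != 1]  # a score of 1 is impossible

# Pairs (winner, loser-or-tie) with winner >= loser
scores = [(w, l) for w in valid for l in valid if l <= w]
assert len(scores) == M * (M + 1) // 2 == 2701
```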

There have been 1,076 unique scores in NFL football. (There were 1,075 until yesterday.) That means there are 1,625 possible scores we haven’t seen yet (assuming the winning team scores no more than 73 points). There are 256 scores that have only been seen once.

The smallest score not yet seen is 4-0.

Here’s a visualization of actual scores. The vertical axis is the winner’s score, from 0 down to 73, and the horizontal axis is the tie or loser score, starting from 0 on the left.

I mentioned in that post that I moved the code for finding the center to its own function because in the future I might want to see what happens when you look at different choices of center. There are thousands of ways to define the center of a triangle.

This post will look at 4 levels of recursive division, using the barycenter, incenter, and circumcenter.

The barycenter of a set of points is the point that would be the center of mass if each point had the same weight. (The name comes from the Greek *baros* for weight. Think *barium* or *bariatric* surgery.)

This is the method used in the earlier post.

The incenter of a triangle is the center of the largest circle that can be drawn inside the triangle. When we use this definition of center and repeatedly divide our triangle, we get a substantially different image.

The circumcenter of a triangle is the center of the unique circle that passes through each of the three vertices. This results in a very different image because the circumcenter of a triangle may be outside of the triangle.

By recursively dividing our triangle, we get a hexagon!

I remember as a kid calculating the size difference (diameter) of a belt between each hole. Now I think about it every time I wear a belt.

Holes 1 inch apart change the diameter by about one-third of an inch (1/π). [Assuming people have a circular waistline.]

People do not have circular waistlines, unless they are obese, but the circular approximation is fine for reasons we’ll show below.

Good simplifications, such as approximating a human waist by a circle, are robust. It doesn’t matter how well a circle approximates a waistline but rather how well the *conclusion* assuming a circular waistline approximates the *conclusion* for a real waistline.

There’s a joke that physicists say things like “assume a spherical cow.” Obviously cows are not spherical, but depending on the context, assuming a spherical cow may be a very sensible thing to do.

A human waistline may be closer to an ellipse than a circle. It’s not an ellipse either—it varies from person to person—but my point here is to show that using a different model results in a similar conclusion.

For a circle, the perimeter equals π times the diameter. So an increase of 1 inch in the diameter corresponds to an increase of 1/π in the perimeter, as Dave said.

Suppose we increase the perimeter of an **ellipse** by 1 and keep the aspect ratio of the ellipse the same. How much do the major and minor axes change?

The answer will depend on the aspect ratio of the ellipse. I’m going to guess that the aspect ratio is maybe 2 to 1. This corresponds to eccentricity *e* equal to 0.87.

The ratio of the perimeter of an ellipse to its major axis is 2*E*(*e*) where *E* is the complete elliptic integral of the second kind. (See, there’s a good reason Dave used a circle rather than an ellipse!)

For a circle, the eccentricity is 0, and *E*(0) = π/2, so the ratio of perimeter to the major axis (i.e. diameter) is π. For eccentricity 0.87 this ratio is 2.42. So a change in belt size of 1 inch would correspond to a change in major axis of 0.41 and a change in minor axis of 0.21.
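SciPy can reproduce these numbers. One caveat: `scipy.special.ellipe` takes the parameter *m* = *e*², not the eccentricity itself, a common source of off-by-a-square errors.

```python
from math import pi
from scipy.special import ellipe

# Ratio of perimeter to major axis is 2 E(e);
# ellipe takes the parameter m = e^2
assert abs(2 * ellipe(0) - pi) < 1e-12   # circle: ratio is π

ratio = 2 * ellipe(0.87**2)              # eccentricity 0.87
assert 2.3 < ratio < 2.5                 # roughly 2.4, as in the text
```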

Dave’s estimate of 1/3 of an inch is the average of these two values. If you average the major and minor axes of an ellipse and call that the “diameter” then Dave’s circular model comes to about the same conclusion as our elliptical model, but avoids having to use elliptic integrals.

The following graph shows the ratio of perimeter to average axis length for an ellipse. On the left end, aspect 1, we have a circle and the ratio is π. As the aspect ratio goes to infinity, the limiting value is 4.

Even for substantial departures from a circle, such as a 2 : 1 or 3 : 1 aspect ratio, the ratio isn’t far from π.

Here are the first three steps:

I set the alpha value of the lines to 0.1 so that lines that get drawn repeatedly would appear darker.

The size of the images grows quickly as the number of subdivisions increases. Here are links to the images after five steps: SVG, PNG

**Update**: Plots using incenter and circumcenter look very different than the plots in this post. See here.

The code below can be used to subdivide any triangle, not just an equilateral triangle, to any desired depth.

I pulled the code to find the center of the triangle out into its own function because there are many ways to define the center of a triangle—more on that here—and I may want to come back and experiment with other centers.

```python
import matplotlib.pyplot as plt
import numpy as np
from itertools import combinations

def center(points):
    return sum(points)/len(points)

def draw_triangle(points):
    for ps in combinations(points, 2):
        xs = [ps[0][0], ps[1][0]]
        ys = [ps[0][1], ps[1][1]]
        plt.plot(xs, ys, 'b-', alpha=0.1)

def mesh(points, depth):
    if depth > 0:
        c = center(points)
        for pair in combinations(points, 2):
            pts = [pair[0], pair[1], c]
            draw_triangle(pts)
            mesh(pts, depth-1)

points = [
    np.array([0, 1]),
    np.array([-0.866, -0.5]),
    np.array([ 0.866, -0.5])
]
mesh(points, 3)
plt.axis("off")
plt.gca().set_aspect("equal")
plt.show()
```

As with most things in statistics, plugging numbers into a formula is not the hard part. The hard part is deciding what numbers to plug in, which in turn depends on understanding the context of the study. What are you trying to learn? What are the constraints? What do you know a priori?

In my experience, sample size calculation is very different in a **scientific setting** versus a **legal setting**.

In a scientific setting, sample size is often determined by budget. This isn’t done explicitly. There is a negotiation between the statistician and the researcher that starts out with talk of power and error rates, but the assumptions are adjusted until the sample size works out to something the researcher can afford.

In a legal setting, you can’t get away with statistical casuistry as easily because you have to defend your choices. (In theory researchers have to defend themselves too, but that’s a topic for another time.)

Opposing counsel or a judge may ask how you came up with the sample size you did. The difficulty here may be more expository than mathematical, i.e. the difficulty lies in explaining subtle concepts, not in carrying out calculations. A statistically defensible study design is no good unless there is someone there who can defend it.

One reason statistics is challenging to explain to laymen is that there are multiple levels of uncertainty. Suppose you want to determine the defect rate of some manufacturing process. You want to quantify the uncertainty in the quality of the output. But you also want to quantify your uncertainty about your estimate of uncertainty. For example, you may estimate the defect rate at 5%, but how sure are you that the defect rate is indeed 5%? How likely is it that the defect rate could be 10% or greater?

When there are multiple contexts of uncertainty, these contexts get confused. For example, variations on the following dialog come up repeatedly.

“Are you saying the quality rate is 95%?”

“No, I’m saying that I’m 95% confident of my estimate of the quality rate.”

Probability is subtle and there’s no getting around it.

- Think of the bits in a file as the coefficients in a polynomial *P*(*x*).
- Divide *P*(*x*) by a fixed polynomial *Q*(*x*) mod 2 and keep the remainder.
- Report the remainder as a sequence of bits.

In practice there’s a little more to the algorithm than this, such as appending the length of the file, but the above pattern is at the heart of the algorithm.
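The division step can be sketched in a few lines, representing a polynomial over GF(2) as a Python integer with one bit per coefficient. This illustrates the math only; a real CRC routine also handles bit ordering, initial values, and a final XOR.

```python
def poly_mod(p, q):
    # Remainder of p(x) divided by q(x) over GF(2).
    # Bit k of the integer is the coefficient of x^k.
    dq = q.bit_length() - 1
    while p.bit_length() - 1 >= dq:
        p ^= q << (p.bit_length() - 1 - dq)
    return p

# (x + 1)^2 = x^2 + 1 mod 2, so x + 1 divides x^2 + 1
assert poly_mod(0b101, 0b11) == 0
# x^3 + x + 1 divides x^7 + 1 over GF(2)
assert poly_mod(0b10000001, 0b1011) == 0
```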

There’s a common misconception that the polynomial *Q*(*x*) is irreducible, i.e. cannot be factored. This may or may not be the case.

Perhaps the most common choice of *Q* is

*Q*(*x*) = *x*^{32} + *x*^{26} + *x*^{23} + *x*^{22} + *x*^{16} + *x*^{12} + *x*^{11} + *x*^{10} + *x*^{8} + *x*^{7} + *x*^{5} + *x*^{4} + *x*^{3} + *x*^{2} + *x* + 1

This polynomial is used in the `cksum` utility and is part of numerous standards. It’s known as the CRC-32 polynomial, though there are other polynomials occasionally used in 32-bit implementations of the CRC algorithm.

It is far from irreducible, as the following Mathematica code shows. The command

```mathematica
Factor[x^32 + x^26 + x^23 + x^22 + x^16 + x^12 + x^11 + x^10 + x^8 + x^7 + x^5 + x^4 + x^3 + x^2 + x + 1, Modulus -> 2]
```

shows that *Q* can be factored as

(1 + *x*)^{5} (1 + *x* + *x*^{3} + *x*^{4} + *x*^{6}) (1 + *x* + *x*^{2} + *x*^{5} + *x*^{6})

(1 + *x* + *x*^{4} + *x*^{6} + *x*^{7}) (1 + *x* + *x*^{4} + *x*^{5} + *x*^{6} + *x*^{7} + *x*^{8})

(Mathematica displays polynomials in increasing order of terms.)

Note that the factorization is valid when done over the field with 2 elements, *GF*(2). Whether a polynomial can be factored, and what the factors are, depends on what field you do your arithmetic in. The polynomial *Q*(*x*) above *is* irreducible as a polynomial with real coefficients. It can be factored working mod 3, for example, but it factors differently mod 3 than it factors mod 2. Here’s the factorization mod 3:

(1 + 2 *x*^{2} + 2 *x*^{3} + *x*^{4} + *x*^{5}) (2 + *x* + 2 *x*^{2} + *x*^{3} + 2 *x*^{4} + *x*^{6} + *x*^{7})

(2 + *x* + *x*^{3} + 2 *x*^{7} + *x*^{8} + *x*^{9} + *x*^{10} + 2 *x*^{12} + *x*^{13} + *x*^{15} + 2 *x*^{16} + *x*^{17} + *x*^{18} + *x*^{19} + *x*^{20})
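If Mathematica isn’t at hand, SymPy can factor over finite fields too. A sketch checking the mod 2 factorization above by its factor degrees (1 five times, then 6, 6, 7, 8, summing to 32):

```python
from sympy import symbols, factor_list, degree

x = symbols('x')
exponents = [32, 26, 23, 22, 16, 12, 11, 10, 8, 7, 5, 4, 3, 2, 1, 0]
Q = sum(x**k for k in exponents)

_, factors = factor_list(Q, modulus=2)
# List each factor's degree, repeated by multiplicity
degs = sorted(degree(f, x) for f, e in factors for _ in range(e))
assert sum(degs) == 32
assert degs[:5] == [1, 1, 1, 1, 1]   # the (1 + x)^5 factor
```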

The polynomial

*Q*(*x*) = *x*^{64} + *x*^{4} + *x*^{3} + *x* + 1

is known as CRC-64, and is part of several standards, including ISO 3309. This polynomial *is* irreducible mod 2 as the following Mathematica code confirms.

```mathematica
IrreduciblePolynomialQ[x^64 + x^4 + x^3 + x + 1, Modulus -> 2]
```

The CRC algorithm uses this polynomial mod 2, but out of curiosity I checked whether it is irreducible in other contexts. The following code tests whether the polynomial is irreducible modulo the first 100 primes.

```mathematica
Table[IrreduciblePolynomialQ[x^64 + x^4 + x^3 + x + 1, Modulus -> Prime[n]], {n, 1, 100}]
```

It is irreducible mod *p* for *p* = 2, 233, or 383, but not for any other primes up to 541. It’s also irreducible over the rationals.
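SymPy can run the analogous irreducibility checks over finite fields:

```python
from sympy import symbols, Poly

x = symbols('x')
crc64 = x**64 + x**4 + x**3 + x + 1

assert Poly(crc64, x, modulus=2).is_irreducible
assert not Poly(crc64, x, modulus=3).is_irreducible
```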

Since *Q* is irreducible mod 2, the checksum essentially views its input *P*(*x*) as a member of the finite field *GF*(2^{64}).

and *J* has a particular form. The eigenvalues of *A* are along the diagonal of *J*, and the elements **above** the diagonal are 0s or 1s. There’s a particular pattern to the 1s, giving the matrix *J* a block structure, but that’s not the focus of this post.

Some books say a Jordan matrix *J* has the eigenvalues of *A* along the diagonal and 0s and 1s **below** the diagonal.

So we have two definitions. Both agree that the non-zero elements of *J* are confined to the main diagonal and an adjacent diagonal, but they disagree on whether the secondary diagonal is above or below the main diagonal. It’s my impression that placing the 1s below the main diagonal is an older convention. See, for example, [1]. Now I believe it’s more common to put the 1s above the main diagonal.

How are these two conventions related and how might you move back and forth between them?

It’s often harmless to think of linear transformations and matrices as being interchangeable, but for a moment we need to distinguish them. Let *T* be a linear transformation and let *A* be the matrix that represents *T* with respect to the basis

Now suppose we represent *T* by a new basis consisting of the same vectors but in the opposite order.

If we reverse the rows and columns of *A* then we have the matrix for the representation of *T* with respect to the new basis.

So if *J* is a matrix with the eigenvalues of *A* along the diagonal and 0s and 1s **above** the diagonal, and we reverse the order of our basis, then we get a new matrix *J*′ with the eigenvalues of *A* along the diagonal (though in the opposite order) and 0s and 1s **below** the diagonal. So *J* and *J*′ represent the same linear transformation with respect to different bases.

Let *R* be the matrix formed by starting with the identity matrix *I* and reversing all the rows. So while *I* has 1s along the NW-SE diagonal, *R* has 1s along the SW-NE diagonal.

Reversing the **rows** of *A* is the same as multiplying *A* by *R* on the **left**.

Reversing the **columns** of *A* is the same as multiplying *A* by *R* on the **right**.

Here’s a 3 by 3 example:

Note that the matrix *R* is its own inverse. So if we have

then we can multiply both sides on the left and right by *R*.

If *J* has 1s above the main diagonal, then *RJR* has 1s below the main diagonal. And if *J* has 1’s below the main diagonal, *RJR* has 1s above the main diagonal.
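A quick numerical check of these facts with NumPy, using a 3 × 3 Jordan block with eigenvalue 2:

```python
import numpy as np

R = np.eye(3)[::-1]          # identity with rows reversed
A = np.arange(9.0).reshape(3, 3)

assert np.array_equal(R @ A, A[::-1, :])   # left multiply: rows reversed
assert np.array_equal(A @ R, A[:, ::-1])   # right multiply: columns reversed
assert np.array_equal(R @ R, np.eye(3))    # R is its own inverse

J = np.array([[2., 1., 0.],
              [0., 2., 1.],
              [0., 0., 2.]])
# Conjugating by R moves the 1s from above to below the diagonal
assert np.array_equal(R @ J @ R, J.T)
```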

Since *R* is its own inverse, we have

This says that if the similarity transform by *P* puts *A* into Jordan form with 1’s above (below) the diagonal, then the similarity transform by *PR* puts *A* into Jordan form with 1’s below (above) the diagonal.

[1] Hirsch and Smale. Differential Equations, Dynamical Systems, and Linear Algebra. 1974.

The post Jordan normal form: 1’s above or below diagonal? first appeared on John D. Cook.

In more formal language, what can we say about the eigenvectors and eigenvalues of the DFT matrix?

I mentioned in the previous post that Mathematica’s default convention for defining the DFT has mathematical advantages. One of these is that it makes the DFT an **isometry**, that is, taking the DFT of a vector does not change its norm. We will use Mathematica’s convention here because that will simplify the discussion. Under this convention, the DFT matrix of size *N* is the square matrix whose (*j*, *k*) entry is

ω^{jk} / √*N*

where ω = exp(−2π*i*/*N*) and the indices *j* and *k* run from 0 to *N* − 1.

Using the definition above, if you take the discrete Fourier transform of a vector four times, you end up back where you started. With other conventions, taking the DFT four times takes you to a vector that is *proportional* to the original vector, but not the same.

It’s easy to see what the eigenvalues of the DFT are. If transforming a vector multiplies it by λ, then λ^{4} = 1. So λ = ±1 or ±*i*. This answers the second question at the top of the post: if the DFT of a vector is proportional to the original vector, the proportionality constant must be a fourth root of 1.

The eigenvectors of the DFT, however, are not nearly so simple.

Suppose *N* = 4*k* for some *k* > 1 (which it nearly always is in practice). I would expect by symmetry that the eigenspaces of 1, −1, *i* and −*i* would each have dimension *k*, but that’s not quite right.

In [1] the authors proved that the eigenspaces associated with 1, −1, *i* and −*i* have dimension *k*+1, *k*, *k*−1, and *k* respectively.
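Both facts are easy to check numerically. For *N* = 8 (so *k* = 2), the unitary DFT matrix satisfies *F*⁴ = *I*, and the eigenvalue multiplicities come out 3, 2, 1, 2 for 1, −1, *i*, −*i*, as the theorem predicts:

```python
import numpy as np

N = 8
J, K = np.meshgrid(np.arange(N), np.arange(N))
F = np.exp(-2j * np.pi * J * K / N) / np.sqrt(N)  # unitary DFT matrix

# Four transforms take you back where you started
assert np.allclose(np.linalg.matrix_power(F, 4), np.eye(N))

# Eigenvalue multiplicities: k+1, k, k-1, k for 1, -1, i, -i
eig = np.linalg.eigvals(F)
mult = {z: int(np.sum(np.isclose(eig, z, atol=1e-8)))
        for z in (1, -1, 1j, -1j)}
assert mult == {1: 3, -1: 2, 1j: 1, -1j: 2}
```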

This seems strange to me in two ways. First, I’d expect all the spaces to have the same dimension. Second, if the spaces did not have the same dimension, I’d expect 1 and −1 to differ, not *i* and −*i*. Usually when you see *i* and −*i* together like this, they’re symmetric. But the span of the eigenvectors associated with *i* has dimension one less than the dimension of the span of the eigenvectors associated with −*i*. I don’t see why this should be. I’ve downloaded [1] but haven’t read it yet.

[1] J. H. McClellan; T. W. Parks (1972). “Eigenvalues and eigenvectors of the discrete Fourier transformation”. IEEE Transactions on Audio and Electroacoustics. 20 (1): 66–74.

The post Eigenvectors of the DFT matrix first appeared on John D. Cook.

This post will look at two DFT conventions, one used in Python’s NumPy library, and one used in Mathematica. There are more conventions in use, but this post will just look at these two.

In some sense the differences between conventions are trivial, but trivial doesn’t mean unimportant [2]. If you don’t know that there are multiple conventions, you could be quite puzzled when the output of an FFT [1] doesn’t match your expectations.

NumPy’s `fft`

and related functions define the discrete Fourier transform of a sequence *a*_{0}, *a*_{1}, …, *a*_{N−1} to be the sequence *A*_{0}, *A*_{1}, …, *A*_{N−1} given by

Mathematica’s `Fourier`

function defines the discrete Fourier transform of a sequence *u*_{1}, *u*_{2}, …, *u*_{N} to be the sequence *v*_{1}, *v*_{2}, …, *v*_{N} given by

This is the default definition in Mathematica, but not the only possibility. More on that below in the discussion of compatibility.

Python arrays are indexed from 0 while Mathematica arrays are indexed starting from 1. This is why the inputs and outputs are numbered as they are.

Subtracting 1 from the *m* and *k* indices makes the two definitions visually less similar, but the terms in the two summations are the same. The only difference between the two implementations is the scaling factor in front of the sum.

Why does Mathematica divide the sum by √*N* while NumPy does not? As is often the case when there are differing conventions for defining the same thing, the differences are a result of which theorems you want to simplify. Mathematica complicates the definition of the DFT slightly, but in exchange makes the DFT and its inverse more symmetric.
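In NumPy terms, dividing the `fft` output by √N gives the unitary scaling, which NumPy also exposes directly through the `norm="ortho"` option:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])
N = len(a)

A = np.fft.fft(a)                      # NumPy: no factor in front
A_unitary = A / np.sqrt(N)             # unitary scaling

# NumPy puts the full 1/N on the inverse transform
assert np.allclose(np.fft.ifft(A), a)
# The unitary convention makes the DFT an isometry
assert np.isclose(np.linalg.norm(A_unitary), np.linalg.norm(a))
# Same result via the norm="ortho" option
assert np.allclose(np.fft.fft(a, norm="ortho"), A_unitary)
```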

The choice of scaling factor is consistent with the user bases of the two languages. Python skews more toward engineering and applied math, while Mathematica skews more toward pure math. In light of this, the choices made by Python and Mathematica seem inevitable.

Like Mathematica’s continuous Fourier transform function `FourierTransform`

, its discrete Fourier transform function `Fourier`

takes an optional `FourierParameters`

argument for compatibility with other conventions. Setting the *a* parameter to 1 eliminates the √*N* term and produces a result consistent with NumPy.

There are more variations in DFT definitions. For example, some definitions of the DFT do not have a negative sign inside the exponential. Mathematica can accommodate this by setting *b* to −1 in the `FourierParameters`

argument. There are other possibilities too. In some implementations, for example, the 0 frequency DC term is in the middle rather than at the beginning.

[1] The FFT is an algorithm for computing the DFT, but the transform itself is often called the FFT.

[2] In classical education, the trivium consisted of grammar, logic, and rhetoric. The original meaning of “trivial” is closer to “foundational” than to “insignificant.”

The post DFT conventions: NumPy vs Mathematica first appeared on John D. Cook.

William L. Briggs and Van Emden Henson wrote such a book, The DFT: An Owner’s Manual for the Discrete Fourier Transform. The cover features the following image.

At first glance, this image might look like a complete graph, one with an edge from every node along the circle to every other node. But that’s not it. For one thing, there’s only one line that goes through the center of the circle. And when you look closer it’s not as symmetric as it may have seemed at first.

Here’s the same image plotted in blue, except this time I reduced the alpha channel of the lines. Making the lines less opaque makes it possible to see that some lines are drawn more often than others, the darker lines being the ones that have been traced more than once.

What does this drawing represent? The explanation is given in the last exercise at the end of the first chapter.

The cover and frontmatter of this book display several polygonal, mandala-shaped figures which were generated using the DFT.

The exercise goes into some detail and invites the reader to reproduce versions of the cover figure with *N* = 4 or *N* = 8 nodes around the circle. For *n* from 1 to *N*, take the DFT (discrete Fourier transform) of the *n*th standard basis vector *e*_{n} and draw lines connecting the components of the DFT. These components are

*F*_{k} = exp(-2π*ink*/*N*) / *N*

for *k* = 1 to *N*.

The cover also has smaller figures which correspond to the same sort of image for other values of *N*. For example, here is the figure for *N* = 8.
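A sketch of the construction in Python, using NumPy’s FFT, which matches the *F*_{k} formula above after dividing by *N*. Connecting the components in order and closing the loop is an assumption about exactly which lines the exercise draws.

```python
import numpy as np
import matplotlib.pyplot as plt

N = 8
for n in range(N):
    e = np.zeros(N)
    e[n] = 1.0                     # nth standard basis vector
    F = np.fft.fft(e) / N          # F_k = exp(-2πi n k / N) / N
    pts = np.append(F, F[0])       # close the loop
    plt.plot(pts.real, pts.imag, "b-", alpha=0.1)

plt.gca().set_aspect("equal")
plt.axis("off")
plt.savefig("mandala.png")
```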

The miracle of the sonnet, you see, is that it is fourteen lines long and written almost always in iambic pentameter. … suffice it to say that most lines are going to have ten syllables and the others will be very close to ten. And ten syllables of English are about as long as fourteen lines are high: square.

For example, suppose you’re wondering whether dogs ever have two tails. You observe thousands of dogs and never see two tails. But then one day you see a dog with two tails. Now what can you say about the probability of dogs having two tails? It’s certainly not zero.

We’ll first look at the case of 0 successes out of *N* trials then look at the case of 1 success out of *N* trials.

If you’re observing a binary event and you’ve seen no successes out of *N* trials your point estimate of the probability of your event is 0. You can’t have any confidence in the *relative* accuracy of your estimate: if the true probability is positive, no matter how small, then the relative error in your estimate is infinite.

But you can have a great deal of confidence in its *absolute* accuracy. When you’re looking for a binary event and you have not seen any instances in *N* trials for large *N*, then a 95% confidence interval for the event’s probability is approximately [0, 3/*N*]. This is the statistical rule of three. This is a robust estimate, one you could derive from either a frequentist or Bayesian perspective.

Note that the confidence interval [0, 3/*N*] is exceptionally narrow. When observing a moderate mix of successes and failures the width of the confidence interval is on the order of 1/√*N*, not 1/*N*.
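The rule of three comes from solving (1 − *p*)^{N} = 0.05 exactly: since ln 0.05 ≈ −3, the upper bound is approximately 3/*N*. A quick check:

```python
N = 100_000
# Largest p for which observing 0 successes in N trials
# still has probability at least 5%: (1 - p)^N = 0.05
upper_exact = 1 - 0.05**(1/N)
upper_rule = 3 / N

# ln(0.05) ≈ -2.996, so the two agree to about 0.15%
assert abs(upper_exact - upper_rule) / upper_rule < 0.002
```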

After seeing your first success, your point estimate jumps from 0 to 1/*N*, an infinite relative increase. What happens to your confidence interval?

If we use Jeffreys’ beta(1/2, 1/2) prior, then the posterior distribution after seeing 1 success and *N* − 1 failures is a beta(3/2, *N* − 1/2). Now an approximate 95% confidence interval is

[0.1/*N*, 4.7/*N*]

So compared to the case of seeing zero successes, seeing one success makes our confidence interval about 50% wider and shifts it to the left by 0.1/*N*.

So if you’ve seen 100,000 dogs and only 1 had two tails, you could estimate that a 95% confidence interval for the probability of a dog having two tails is

[10^{−6}, 4.7 × 10^{−5}].

If we run the exact numbers we get

[ 1.07 × 10^{−6}, 4.67 ×10^{−5}].
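These numbers can be reproduced with SciPy’s beta distribution, taking the interval as the equal-tailed 95% posterior interval:

```python
from scipy.stats import beta

N = 100_000
# Posterior after 1 success and N - 1 failures,
# starting from Jeffreys' beta(1/2, 1/2) prior
posterior = beta(1.5, N - 0.5)

lo, hi = posterior.ppf(0.025), posterior.ppf(0.975)
# Roughly [1.07e-6, 4.67e-5], i.e. about [0.1/N, 4.7/N]
assert 1.0e-6 < lo < 1.15e-6
assert 4.5e-5 < hi < 4.85e-5
```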

“Logarithms are usually taken to integer bases, like 2 or 10.”

“What about *e*?”

“OK, that’s an example of an irrational base, but it’s the only one.”

“Decibels are logarithms to base 10^{1/10}.”

“Really?!”

“Yeah, you can read about this here.”

“That’s weird. But logarithms are always taken to bases bigger than 1.”

“Au contraire. Bases can be less than one, not just in theory but in practice.”

This post expands on the dialog above, especially the last line. We will show that stellar magnitude is a logarithm to a base smaller than 1.

Decibels are defined as 10 times the log base 10. But as explained here, decibels are not just a *multiple* of a logarithm, they *are* logarithms, logarithms base 10^{1/10}.

Raising a musical pitch a half-step (semitone) multiplies its frequency by 2^{1/12}, and so raising it 12 half-steps doubles it, raising it an octave. Semitones are logarithms base 2^{1/12}.

So here are two examples of irrational bases for logarithms: decibels are logs base 1.2589 and semitones are logs base 1.0595.

Stellar magnitude is strange for a couple reasons. First of all, the scale runs backward to what you might expect, with brighter objects having smaller magnitude. Perhaps stellar magnitude should be called stellar *dimness*. But the magnitude scale made more sense when it was limited to visible stars. The most visible stars were of the first category, the next most in the second category, and the least visible in the sixth category.

Second, stellar magnitude is defined so that a change in brightness of 100 corresponds to a change in magnitude of 5. This seems arbitrary until you realize the intention was to make the magnitudes of visible stars, running from 1 to 6, span a factor of 100 in brightness. The scale seems strange now that we apply it to a wider variety of objects than naked eye astronomy.

If a star *X* is 100 times as bright as a star *Y*, then the magnitude of *X* is 5 less than the magnitude of *Y*. The log base 10 of the brightness of *X* is 2 more than the log base 10 of the brightness of *Y*, so magnitude is −5/2 times log base 10 of brightness.

So stellar magnitude is a multiple of log base 10, like decibels.

Now

log_{a}(*x*) = log_{b}(*x*) / log_{b}(*a*)

for any bases *a* and *b*. If we let *b* = 10, this says that a multiple *k* of base 10 is the same as the log base *a* where log_{10}(*a*) = 1/*k*, so

*a* = 10^{1/k}.

For decibels, *k* = 10, so *a* = 10^{1/10} = 1.2589. For stellar magnitude, *k* = −5/2, so *a* = 10^{−2/5} = 0.3981.

That is, stellar magnitude is logarithm base 0.3981.
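A numerical check of the change-of-base identity and the two examples:

```python
from math import log, log10

def log_base(a, x):
    # log of x to base a, via the change-of-base formula
    return log(x) / log(a)

# Decibels: 10 times log base 10 = log base 10^(1/10)
assert abs(log_base(10**0.1, 100) - 10 * log10(100)) < 1e-9

# Stellar magnitude: -5/2 times log base 10 = log base 10^(-2/5)
assert abs(log_base(10**-0.4, 100) - (-2.5) * log10(100)) < 1e-9
```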

To be more precise, we need a reference point. The star Vega (image above) has magnitude 0, so the magnitude of a star is the logarithm base 0.3981 of the ratio of the star’s brightness to the brightness of Vega.

Image of Vega by Stephen Rahn via Wikipedia.

The post Stellar magnitude first appeared on John D. Cook.

It’s still mostly the case that states do not have consecutive area codes. But there is one exception: Colorado contains area codes 719 and 720. California, Texas, and New York all have a pair of area codes that differ by 2. For example, Tyler has area code 430 and Midland has area code 432. The median minimum difference between area codes within a state is 19.

I don’t know why this is. Only about 400 of the possible 1000 area codes are assigned to a geographic region. (Some are reserved for non-geographic uses, like toll-free numbers.) There’s no need to assign consecutive area codes within a state when adding a new area code.

I wrote a little script to look up the location corresponding to an area code. This is something I have to do fairly often in order to tell what time zone a client is *probably* in.

I debated whether to write such a script because it’s trivial to search for such information. Typing “area code 510” into a search bar is no harder than typing `areacode 510`

on the command line, but the latter is less disruptive to my workflow. I don’t have to click on any sketchy web sites filled with ads, I’m not tempted to go anywhere else while my browser is open, and the script works when my internet connection is down.

The script is trivial:

```python
#!/usr/bin/env python3
import sys

codes = {
    "202" : "District of Columbia",
    "203" : "Bridgeport, CT",
    "204" : "Manitoba",
    ....
    "985" : "Houma, LA",
    "986" : "Idaho",
    "989" : "Saginaw, MI",
}

area = sys.argv[1]
print(codes[area]) if area in codes else print("Unassigned")
```

If you’d like, you can grab the full script here.

The post Area codes first appeared on John D. Cook.