Exact sequences

A couple days ago, near the end of a post, I mentioned exact sequences. This term does not mean what you might reasonably think it means. It doesn’t mean exact in the sense of not being approximate.

It means that the stuff that comes out of one step is exactly the stuff that gets set to 0 in the next step. That is, the image of each function is exactly the kernel of the next function in the sequence [1].

Gradient and curl

For example, let f be gradient and let g be curl. Then the following sequence is exact

A \to^f B \to^g C

if A is the set of smooth functions from ℝ³ to ℝ, and B and C are set of smooth functions ℝ³ to ℝ³.

This says that the image of f, vector fields that are the gradient of something, is the kernel of g, vector fields with zero curl. In other lingo, gradient vector fields are irrotational, and all irrotational vector fields are the gradient of some potential function.

To show that

image f = kernel g

we have to show two things:

image f ⊂ kernel g


image f ⊃ kernel g

As is often the case, the former is easier than the latter. It’s a common homework problem [2] to show that the curl of a divergence is zero, i.e.

∇×(∇φ) = 0

It’s not so easy to show that for every vector field F with ∇×F = 0 there exists a potential φ such that F = ∇φ. It’s easier to show that the image of the first function part of the kernel of the next than to show that the kernel of the first is exactly the kernel of the next, because the latter requires proving the existence of something.

Short exact sequences

A short exact sequence is an exact sequence of the following form. (It’s called short because exact sequences are often longer.)

0 \to A \to^f B \to^g C \to 0

There’s no label on the arrow from 0 to A because there’s only one function from 0 to anywhere. There’s also no label on the arrow from C to 0 because there’s only one function that goes from anywhere to 0 [3].

The zero on the left implies that the function f is one-to-one (injective). It’s image is only one element, so the kernel of f is only one element.

Similarly the zero on the right end implies that the function g is onto (surjective). The kernel of the last arrow is everything in C, so the image of g has to be everything in C.

For example, let B be a group and suppose φ is a homomorphism from B to another group. Let A be the kernel of φ and let f be the inclusion map from the kernel into B. Let g be quotient map taking C = B/A. Then the so-called “first group homomorphism theorem” says that the sequence above is exact.

Div, grad, curl and all that

The title of this section is an homage to an excellent little book by H. M. Schey.

We said above that the curl of a gradient is zero, and that all vector fields with zero curl are gradients. It’s also true that the divergence of a curl is zero, and that a vector field is has zero divergence if it is the curl of something. That is, for a vector field F,

∇ · (∇ × F) = 0

and if ∇ · G = 0 for a vector field G then there exists a vector field F such that

∇×F = G.

This means we can extend our example above to

0 \to A \to^{grad} B \to^{curl} C \to^{div} D \to 0

if we define A carefully.

The discussion in this section only justifies the labeled arrows. We need to justify the two unlabeled arrows on the ends.

The zero on the left requires the gradient be one-to-one. But in general, the gradient is not one-to-one: functions that differ by a constant have the same gradient. But if we define A to be the set of integrable functions on ℝ³ then the gradient is one-to-one. Requiring the integral of a function over ℝ³ to exist means the functions must eventually approach zero in every direction.

The zero on the right requires every smooth function on ℝ³ to be the divergence of something. That’s easy. Given a function φ from ℝ³ to ℝ, let F be the vector field whose first component at (x, y, z) is the integral of φ from the origin to (x, 0, 0) and whose second and third components are 0. Then the divergence of F equals φ.

Long(er) exact sequences

The exact sequence in the previous section is longer than a short exact sequence, and longer exact sequences come up in practice. An exact sequence could be infinite in one or both directions. For example, the Mayer–Vietoris sequence, a foundational tool in homology, is infinite on its left and and terminates in 0 on its right end.

Related posts

[1] The term “zero” is overloaded here. It could be the integer 0, or the origin of a vector space, or the kernel of a group, etc.

Everything here generalizes to objects and morphisms in a category, in which case the objects are not necessarily sets, and the morphisms are not necessarily functions. Then we don’t talk about images and kernels but about morphisms being monomorphisms and epimorphisms.

[2] There is a lot of foreshadowing in textbooks. It’s common for an author to include exercises and examples that are important later. This is often intentional, sort of an Easter egg. But sometimes the author is unconsciously pulling from experience.

[3] This observation is turned into a definition in category theory: a zero object is defined as one that is initial and final. That is, there is only one morphism from it to any other object, and only one morphism from any object to it.

Data swamps


I recently heard the term data swamp, a humorous take on data lakes. I thought about data swamps yesterday when I hiked past the literal swamp in the photo above.

Swamps are a better metaphor than lakes for heterogeneous data collections because a lake is a homogeneous body of water. What makes a swamp a swamp is its heterogeneity.

The term “data lake” may bring up images of clear water, maybe an alpine lake like Lake Tahoe straddling California and Nevada. “Data swamp” makes me think of Louisiana’s Atchafalaya Basin, which is probably a more appropriate image for most data lakes.

Related posts

Numerical footnote

Yesterday’s post said that that you could construct a chain of linear relationships between the hypergeometric function

F(a, b; c; z)


F(a+i, b+j; c+k; z)

for integers i, j, and k. Toward the end of the post I said that this could be used to speed up programs by computing function values from previous values rather than from scratch. This is true, but you need to check whether there are numerical issues.

You have to be careful when using recurrence relations numerically. It’s common for a recurrence to be numerically stable in one direction but unstable in the opposite direction. I wrote about this here and gave examples.

The stability problems for recurrence relations are the discrete analog of the more familiar problem of sensitive dependence on initial conditions for differential equations described here.


Four, five, and nine lemmas

This post is similar in spirit to the previous post: reducing mathematical theorems to moves in a board game by looking at things from a ridiculously high level.

The theorems we’ll be looking at are known as the four lemma, the five lemma, and the nine lemma. The nine lemma is also known as the 3×3 lemma.

All the lemmas start with a commutative diagram. A diagram is commutative if any two ways of getting from one place to another are equal. If the arrows represent functions, then the diagram implies certain compositions of functions are equal.

Given hypotheses about most of the arrows in a diagram, the theorems conclude something about the rest of the arrows. This general category of theorems is known as diagram chasing. The proofs are tedious but shallow. If a theorem in topology, for example, were formulated as a diagram chase, someone with no intuition for topology could prove the theorem by applying the rules of a simple game.

The meaning of the dots and arrows isn’t important for this post, but you could think of the dots in the diagram as groups and the arrows as homomorphisms between groups. More generally you could think of the dots as objects in an Abelian category and the arrows as morphisms between objects.

The four lemma

The four lemma starts with with a commutative diagram of the following shape.

It’s called the four lemma because there are four dots across each row.

There are two versions of the four lemma. Both make assumptions about the rows and three out of the four columns and conclude something about the fourth column.

In the first part of the lemma, we have hypotheses about the two rows, and hypotheses about the first, third, and fourth vertical arrows. The conclusion is something about the second vertical arrow.

The second part of the lemma is very similar, but has hypotheses about the first, second, and fourth vertical arrows and concludes something about the third arrow.

Five lemma

The five lemma starts with a commutative diagram with five objects across the top and bottom rows.

Given assumptions about every arrow except the vertical arrow in the middle, the theorem concludes something about the middle arrow.

Nine lemma

The nine lemma, a.k.a. the 3×3 lemma, starts with a commutative diagram of the following shape.

It’s called the nine lemma because the attention is focused on the nine dots, the 3×3 grid of dots in the middle. All the objects around the perimeter, those outside the 3×3 grid, are 0.

There are three parts to the nine lemma. Each has hypotheses about all the columns and two of the rows, and concludes something about the row that was left out. So, for example, one part of the theorem has hypotheses about the three columns and the first two rows and concludes something about the last row.

Missing details

Here I zoom in from 100,000 feet to 10,000 feet, giving some of the missing details but not all. For full details, you could look up the theorems on Wikipedia or on nLab.

As promised at the beginning, this was a ridiculously high-level view of these theorems. The purpose was to point out the core of each theorem and making it possible to compare the three theorems at the highest level.

When I said a theorem makes a hypothesis about a row or a column, the hypothesis is that the row forms an exact sequence. The conclusions in the nine lemma are that a row is an exact sequence.

When I alluded to hypotheses about vertical arrows, these hypotheses are essentially saying that a function is one-to-one (injective), onto (surjective), or one-to-one and onto (bijective). More precisely, they are assumptions that a morphism is mono, epi, or iso. More on this terminology here. The conclusion of the four lemma is that an arrow is either epi or mono. The conclusion of the five lemma is that an arrow is iso.

Related posts

3D Go with identities

Let’s play a game that’s something like Go in three dimensions. Our game is played on the lattice of points in 3D that have integer coordinates. Someone places stones on two lattice points, and your job is to create a path connecting the two stones by adding stones to neighboring locations.

Game cube

We don’t really have a game “board” since we’re working in three dimensions. We have a game scaffold or game cube. And we can’t just put our stones down. We have to imagine the stones has having hooks that we can use to hang them where pieces of scaffolding come together.

Basic moves

Now for the rules. Our game is based on identities for a function with depending on three parameters: a, b, and c. A location in our game cube is a point with integer coordinates (i, j, k) corresponding to function parameters

(a+i, b+j, c+k).

If there is a stone at (i, j, k) then we are allowed to place stones at neighboring locations if there is a function identity corresponding to these points. The function identities are the “contiguous relations” for hypergeometric functions. You don’t need to know anything about hypergeometric functions to play this game. We’re going to leave out a lot of detail.

Our rule book for identities will be Handbook of Mathematical Function by Abramowitz and Stegun, widely known as A&S. You can find a copy online here.

For example, equation 15.2.10 from A&S relates

F(a, b; c; z)


F(a-1, b; c; z)


F(a+1, b; c; z).

So if we have a stone at (i, j, k) we can add stones at (i-1, j, k) and (i+1, j, k). Or we could add stones at (i+1, j, k) and (i+2, j, k). You can turn one stone into a row of three stones along the first axis overlapping the original stone.

Equation 15.2.11 lets you do something analogous for incrementing the 2nd coordinate, and Equation 15.2.12 lets you increment the 3rd coordinate. So you could replace a stone with a row of three stones along the second or third axes as well.

This shows that it is always possible to create a path between two lattice points. If you can hop between stones a distance 1 apart, you can create a path that will let you hop between any two points.

Advanced moves

There are other possible steps. For example, equation 15.2.25 has the form

F(a, b; c; z) + … F(a, b-1; c; z)  + … F(a, b; c+1; z) = 0

I’ve left out the coefficients above, denoting them by ellipses. (More on that below.) This identity says you don’t just have to place stones parallel to an axis. For example, you can add a stone one step south and another one step vertically in the same move.

The previous section showed that it’s always possible to create a path between two lattice points. This section implies the finding the shortest path between two points might be a little more interesting.


There are practical applications to the game described in this post. One reason hypergeometric functions are so useful is that they satisfy a huge number of identities. There are libraries of hypergeometric identities, and there are still new identities to discover and new patterns to discover for organizing the identities.

The moves in the game above can be used to speed up software. Hypergeometric functions are expensive to evaluate, and so it saves time to compute a new function in terms of previously computed functions rather than having to compute it from scratch.

I used this trick to speed up clinical trial simulations. Consecutive steps in the simulation required evaluating contiguous hypergeometric functions, so I could compute one or two expensive functions at the beginning of the simulation and bootstrap them to get the rest of the values I needed.


I’d like to close by saying something about the coefficients in these identities. The coefficients are complicated and don’t have obvious patterns, and so they distract from the underlying structure discussed here. You could stare at a list of identities for a while before realizing that apart from the coefficients the underlying relationships are simple.

All the coefficients in the identities cited above are polynomials in the variables a, b, c, and z. All have degree at most 2, and so they can be computed quickly.

The identities discussed here are often called linear relations. The relations are linear in the hypergeometric functions, but the coefficients are not linear.

Related posts

Area of spherical triangle

A few days ago I wrote about how you could calculate the area of a spherical triangle by measuring the angles at its vertices. As was pointed out in the comments, there’s a much more direct way to find the area.

Folke Eriksson gives the following formula for area in [1]. If a, b, and c are three unit vectors, the spherical triangle formed by their vertices has area E where

\tan\left(\frac{E}{2}\right) = \frac{|\mathbf{a} \cdot (\mathbf{b} \times \mathbf{c})|}{1 + \mathbf{a}\cdot \mathbf{b} + \mathbf{b}\cdot \mathbf{c} + \mathbf{a}\cdot \mathbf{c}}

Let’s try the formula out with an example and compare the area to that of a flat triangle. We’ll pick three cities as our three points, assuming the earth is a unit sphere.

Let a = Austin, Texas, b = Boston, Massachusetts, and c = Calgary, Alberta. The spherical coordinates of each point will be (1, θ, φ) where θ is longitude and φ is π/2 minus latitude.

    from numpy import *
    def latlong_to_cartesian(latlong):
        phi   = deg2rad( 90 - latlong[0] )
        theta = deg2rad( latlong[1] )
        x = sin(theta) * cos(phi)
        y = sin(theta) * sin(phi)
        z = cos(theta)
        return array([x,y,z])
    def spherical_area(a, b, c):
        t = abs( inner(a, cross(b, c) ) )
        t /= 1 + inner(a,b) + inner(b,c) + inner(a,c)
        return 2*arctan(t)
    def flat_area(a, b, c):
        x = cross(a - b, c - b)
        return 0.5*linalg.norm(x)
    # Latitude and longitude
    austin  = (30.2672,  97.7431)
    boston  = (42.3584,  71.0598)
    calgary = (51.0501, 114.0853)
    a = latlong_to_cartesian(austin)
    b = latlong_to_cartesian(boston)
    c = latlong_to_cartesian(calgary)
    round = spherical_area(a, b, c)
    flat  = flat_area(a, b, c)
    print(round, flat, round/flat)

This shows that our spherical triangle has area 0.1154 and the corresponding Euclidean triangle slicing through the earth would have area 0.1093, the former being 5.57% bigger.

Let’s repeat our exercise with a smaller triangle. We’ll replace Boston and Calgary with Texas cities Boerne and Carrollton.

    boerne  = (29.7947, 98.7320)
    carrollton = (32.9746, 96.8899)

Now we get area 0.000345 in both cases. The triangle is so small relative to the size of the earth that the effect of the earth’s curvature is negligible.

In fact the ratio of the two computed areas is 0.9999, i.e. the spherical triangle has slightly less area. The triangle is so flat that numerical error has a bigger effect than the curvature of the earth.

How could we change our code to get more accurate results for small triangles? That would be an interesting problem. Maybe I’ll look into that in another post.

[1] “On the measure of solid angles”, F. Eriksson, Math. Mag. 63 (1990) 184–187.

Triangles on a pseudosphere

The previous post was about triangles on a sphere. This post will be about triangles on a pseudosphere.

A pseudosphere looks something like the bell of a trumpet or a trombone.

Here’s a plot of a pseudosphere.

This was created in Mathematica with the code

            Cos[p] Sech[t],
           -Sin[p] Sech[t],
            t - Tanh[t] 
        {t,  0, 4},
        {p, 0, 2 Pi}

The pseudosphere has constant negative curvature just as the sphere has constant positive curvature.

By curvature I mean specifically Gaussian curvature, the product of the sectional curvature in orthogonal directions at each point. A sphere curves the same way in all directions. A pseudosphere curves opposite ways in different directions. Coordinate lines through a point are convex in one direction and concave in another. Their sectional curvatures have opposite signs and so their product is negative.

A spherical triangle is formed by great circles connecting the vertices of the triangles. Great circles are the geodesics on a sphere.

A pseudospherical triangle is a triangle formed by geodesics on the pseudosphere. These are the “straight” lines in the hyperbolic geometry of the surface.

On a sphere, the interior angles of a triangle add up to more than π, and in fact the amount by which the sum of the angles exceeds π is proportional to the area of the triangle.

On a pseudosphere, the interior angles of a triangle add up to less than π. And just as in the case of the sphere, the difference between the sum of the angles and π, now negative, is proportional to the area. This result was discovered by Johann Heinrich Lambert (1728–1777).

On a sphere or a pseudosphere, the interior angles of a small triangle add up to nearly π. As triangles get larger, the sum increases on a sphere but decreases on a pseudosphere.

The general principle behind both of these cases is the local Gauss-Bonnet theorem. For any surface, let T be triangle whose sides are geodesics. Then the integral of the Gaussian curvature over T is equal to the sum of the interior angles of T minus π.

When a surface has constant curvature, the integral of curvature over a region is proportional to the area of the region. So on a sphere or pseudosphere, the area of a geodesic triangle is proportional to the sum of the interior angles minus π.

The local Gauss-Bonnet theorem holds when the sides of a triangle are not geodesics, but in that case the theorem has an extra term, the integral of the geodesic curvature along the sides. Since the geodesic curvature of a geodesic is zero, this term drops out for geodesic triangles.

Related posts

A tale of three cities

Big blue marble

Pick three cities and form a spherical triangle by connecting each pair of cities with the shortest arc between them. How might you find the area of this triangle?

For this post, I’ll assume the earth is perfectly spherical. (Taking into account that the earth is slightly oblate makes the problem much more complicated. Maybe a post for another time.)

Spherical excess

If you draw a triangle on the plane, the interior angles add up to 180°. But if you draw a triangle on a sphere, the interior angles add up to more than 180°. The amount by which the sum of the interior angles exceeds 180° is called the spherical excess. It turns out that the area of a spherical triangle is proportional to its spherical excess.

For example, if a spherical triangle is small relative to the radius of the sphere, it’s nearly a Euclidean triangle, and the spherical excess is negligible. But consider a much bigger spherical triangle, one with a vertex at the north pole and vertices at two points 90° apart on the equator. This triangle has three right angles, so the sum of the interior angles is 270° and the spherical excess is 90°.

Relating spherical excess and area

If we measure spherical excess E in radians, then the area of a spherical triangle is

A = ER²

where R is the radius of the sphere. This remarkable theorem was discovered by Thomas Harriot in 1603.

Let’s go back to the example above of a triangle running from the north pole to two points 90° apart on the equator. This triangle takes up 1/8 of the earth’s surface, and area of the earth is 4πR², and so the triangle has area πR²/2. In this case the spherical excess E was π/2, and so we could have come to the same result by multiplying E by R².

Computing area

We can find the area of a spherical triangle by measuring the angle of arcs between pairs of points. We’ll compute area assuming the sphere has radius 1. (Otherwise temporarily assume radius 1 and then multiply by R² when we’re done.)

The previous post explains how to find a parameterization for the great circle connecting points on a sphere. We can take derivatives to find tangent vectors where the great circles intersect, and compute the angle between these tangents to find the interior angles of our triangles.

The previous post showed that if the central angle between two vectors A and B is θ then

cos(t) A + sin(t) (cot θ A – csc θ B)

parameterizes a great circle including A and B. This curve passes through A at time t = 0 with velocity

cot θ A – csc θ B

and so this gives us a tangent vector at A in the direction of B. Repeat the exercise for a great circle between A and C. Then the cosine of the angle of intersection is the two normalized tangent vectors. We can thus obtain all the interior angles, and from there we can compute the spherical excess and thus the area of the spherical triangle.

Related posts

Great circle through two points on a sphere

Given two points on a unit sphere, there is a unique great circle passing through the two points. This post will show two ways to find a parameterization for this circle.

Both approaches have their advantages. The first derivation is shorter and in some sense simpler. The second derivation is a little more transparent and generalizes to higher dimensions.

Let the two points on our sphere be v and w. If these two vectors are orthogonal, then the great circle connecting them is given by

cos(t) v + sin(t) w.

The problem is that w might not be orthogonal to v. So the task is to find a new vector u in the plane spanned by v and w that is perpendicular to v.

Cross product

The cross product of two vectors in three dimensions is perpendicular to both. So

z = v × w

is orthogonal to v and w. So

y = z × v

is perpendicular to v and lives in the same plane as w. So y is the vector we’re looking for, except that it might not have unit length. (In fact, it’s probably too short. It will have length equal to sin θ where θ is the angle between v and w. If sin θ were 1, then v and w were orthogonal and we wouldn’t need to go through this exercise.)

So we need to normalize y, setting

u = y / || y ||.

This solution is quick and simple, but it obscures the dependence on w. It also only works in 3 dimensions because cross product is only defined in 3 dimensions.

If you look back, we used the fact that we’re working in ℝ³ when we argued that y was in the plane spanned by v and w. In more dimensions, we could find a vector z perpendicular to v and w, and a vector y perpendicular to z but not in the plane of v and w.

More general solution

We need to find a unit vector u in the space spanned by v and w that is orthogonal to v. Since u is in the space spanned by v and w,

u = a v + b w,

for some a and b, and so a parameterization for our circle is

cos(t) v + sin(t) (a vb w).

We just need to find a and b. An advantage of this approach over the approach above is that the vector w is explicit in the parameterization.

Also, v and w could be vectors on some high-dimensional unit sphere. Even if v and w live in 100 dimensions, the subspace they span is two-dimensional and everything here works.

Since u is orthogonal to v, we have

u · v = (a v + b w) · v = a + b cos θ = 0

where the angle θ between v and w is given by

cos θ = v · w.

We can obtain another equation for a and b from the fact that u is a unit vector:

a² + b² + 2 ab cos θ = 1.

The solution to our system of equations for a and b is

a = ± cot θ
b = ∓ csc θ

and so an equation for our circle is

cos(t) v + sin(t) (cot θ v – csc θ w).

Wire gauge and user perspective

wire gauge measurement device

Wire gauge is a perennial source of confusion: larger numbers denote smaller wires. The reason is that gauge numbers were assigned from the perspective of the manufacturing process. Thinner wires require more steps in production. This is a common error in user interface design and business more generally: describing things from your perspective rather than from the customer’s perspective.


When you order food at a restaurant, the person taking your order may rearrange your words before repeating them back to you. The reason may be that they’re restating it in manufacturing order, the order in which the person preparing the food needs the information.


A rheostat is a device for controlling resistance in an electrical circuit. It would seem natural for an engineer to give a user a control to vary resistance in Ohms. But Ohm’s law says

V = IR,

i.e. voltage equals current times resistance. Users expect that when they turn a knob clockwise they get more of something—brighter lights, louder music, etc.—and that means more voltage or more current, which means less resistance. Asking users to control resistance reverses expectations.

If I remember correctly, someone designed a defibrillator once where a knob controlled resistance rather than current. If that didn’t lead to someone dying, it easily could have.


When I worked for MD Anderson Cancer Center, I managed the development of software for clinical trial design and conduct. Our software started out very statistician-centric and became more user-centric. This was a win, even for statisticians.

The general pattern was to move from eliciting technical parameters to eliciting desired behavior. Tell us how you want the design to behave and we’ll solve for the parameters to make that happen. Sometimes we weren’t able to completely automate parameter selection, but we were at least able to give the user a head start in knowing where to look.


Technical people don’t always want to have their technical hat on. Sometimes they want to relax and be consumers. When statisticians wanted to crank out a clinical trial design, they wanted software that was easy to use rather than technically transparent. That’s the backstory to this post.

It’s generally a good idea to conceal technical details, but provide a “service panel” to expose details when necessary.

Related posts