Category theory for programmers made easier

Posted on 2 November 2020 by John

I imagine most programmers who develop an interest in category theory do so after hearing about monads. They ask someone what a monad is, and they’re told that if they really want to know, they need to learn category theory.

Unfortunately, there are couple unnecessary difficulties anyone wanting to understand monads etc. is likely to face immediately. One is some deep set theory.

“A category is a collection of objects …”

“You mean like a set?”

“Ah, well, no. You see, Bertrand Russell showed that …”

There are reasons for such logical niceties, but they don’t matter to someone who wants to understand programming patterns.

Another complication is morphisms.

“As I was saying, a category is a collection of objects and morphisms between objects …”

“You mean like functions?”

“Well, they might be functions, but more generally …”

Yes, Virginia, morphisms are functions. It’s true that they might not always be functions, but they will be functions in every example you care about, at least for now.

Category theory is a framework for describing patterns in function composition, and so that’s why things like monads find their ultimate home in category theory. But doing category theory rigorously requires some setup that people eager to get into applications don’t have to be concerned with.

Patrick Honner posted on Twitter recently that his 8-year-old child asked him what area is. My first thought on seeing that was that a completely inappropriate answer would be that this is a deep question that wasn’t satisfactorily settled until the 20th century using measure theory. My joking response to Patrick was

Well, first we have to define σ-algebras. They’re kinda like topologies, but closed under countable union and intersection instead of arbitrarily union and finite intersection. Anyway, a measure is a …

It would be ridiculous to answer a child this way, and it is nearly as ridiculous to burden a programmer with unnecessary logical nuance when they’re trying to find out why something is called a functor, or a monoid, or a monad, etc.

I saw an applied category theory presentation that began with “A category is a graph …” That sweeps a lot under the rug, but it’s not a bad conceptual approximation.

So my advice to programmers learning category theory is to focus on the arrows in the diagrams. Think of them as functions; they probably are in your application [1]. Think of category theory as a framework for describing patterns. The rigorous foundations can be postponed, perhaps indefinitely, just as an 8-year-old child doesn’t need to know measure theory to begin understanding area.

Category theory for data science: cautious optimism

Posted on 26 June 2019 by John

I’m cautiously optimistic about applications of category theory. I believe category theory has the potential to be useful, but I’m skeptical of most claims to have found useful applications. Category theory has found useful application, especially behind the scenes, but a lot of supposed applications remind me of a line from Colin McLarty:

[Jean-Pierre] Serre created a series of concise elegant tools which Grothendieck and coworkers simplified into thousands of pages of category theory.

To be fair, Grothendieck put these recast tools to good use, but most mere mortals would stand a better chance of being able to use Serre’s versions.

My interest in category theory waxes and wanes, and just as it was at its thinnest crescent phase I ran across CQL, categorical query language. I haven’t had time to look very far into it, but it seems promising. The site’s modest prose relative to the revolutionary rhetoric of some category enthusiasts makes me have more confidence that the authors may be on to something useful.

How category theory is applied

Posted on 29 April 2019 by John

Instead of asking whether an area of mathematics can be applied, it’s more useful to as how it can be applied.

Differential equations are directly and commonly applied. Ask yourself what laws govern the motion of some system, write down these laws as differential equations, then solve them. Statistical models are similarly direct: propose a model and feed it data.

Linear algebra is extremely useful in application, but it’s not often applied so directly. Rarely would you look at a system and immediately see a linear system. Instead you might describe a system in terms of differential equations or statistical models, then use linear algebra as a key part of the solution process.

Numerical analysis is useful because, for example, it helps you practically solve large linear systems (which help you solve differential equations, which model physical systems).

Many areas of math are useful in application, but some are applied more directly than others. It’s a mistake to say something is not applied just because it is not applied directly.

The following quote from Colin McLarty describes how some of the most abstract math, including cohomology and category theory, is applied.

Cohomology does not itself solve hard problems in topology or algebra. It clears away tangled multitudes of individually trivial problems. It puts the hard problems in clear relief and makes their solution possible. The same holds for category theory in general.

While McLarty speaks of applications to other areas of math, the same applies to applications to other areas such as software engineering.

I suspect many supposed applications of category theory are post hoc glosses, dressing up ideas in categorical language that were discovered in a more tangible setting. At the same time, the applications of category theory may be understated because the theory works behind the scenes. As I discuss here, category theory may lead to insights that are best communicated in the language of the original problem, not in the language of categories.

Groups in categories

Posted on 21 April 2019 by John

The first time I saw a reference to a “group in a category” I misread it as something in the category of groups. But that’s not what it means. Due to an unfortunately choice of terminology, “in” is more subtle than just membership in a class.

This is related to another potentially misleading term, algebraic groups, mentioned in the previous post on isogenies. An algebraic group is a “group object” in the category of algebraic varieties. Note the mysterious use of the word “in.”

You may have heard the statement “A monad is just a monoid in the category of endofunctors.” While true, the statement is meant to be a joke because it abstracted so far from what most people want to know when they ask what a monad is in the context of functional programming. But notice this follows the pattern of an X in the category of Y‘s, even though here X stands for monoid rather than group. The meaning of “in the category of” is the same.

If you want to go down this rabbit hole, you could start with the nLab article on group objects. A group object is a lot like a group, but everything is happening “in” a category.

Take a look at the list of examples. The first says that a group object in the category of sets is a group. That’s reassuring. The second example says that a group object in the category of topological spaces is a topological group. At this point you may get the idea that an X in the category of Y‘s simply adds an X structure to a Y thing. But further down you’ll see that a group object in the category of groups is an Abelian group, which is an indication that something more subtle is going on.

Monads and generalized elements

Posted on 24 March 2019 by John

Paolo Perrone gives a nice, succinct motivation for monads in the introduction to his article on probability and monads.

… a monad is like a consistent way of extending spaces to include generalized elements of a specific kind.

He develops this idea briefly, and links to his dissertation where he gives a longer exposition (pages 8–14).

Related post: Monads are hard because …

Curry-Howard-Lambek correspondence

Posted on 20 November 2018 by John

Curry-Howard-Lambek is a set of correspondences between logic, programming, and category theory. You may have heard of the slogan proofs-as-programs or propositions-as-types. These refer to the Curry-Howard correspondence between mathematical proofs and programs. Lambek’s name is appended to the Curry-Howard correspondence to represent connections to category theory.

The term Curry-Howard isomorphism is often used but is an overstatement. Logic and programming are not isomorphic, nor is either isomorphic to category theory.

You’ll also hear the term computational trinitarianism. That may be appropriate because logic, programming, and category theory share common attributes without being identical.

The relations between logic, programming, and categories are more than analogies, but less than isomorphisms.

There are formal correspondences between parts of each theory. And aside from these rigorous connections, there is also a heuristic that something interesting in one area is likely to have an interesting counterpart in the other areas. In that sense Curry-Howard-Lambek is similar to the Langlands program relating number theory and geometry, a set of theorems and conjectures as well as a sort of philosophy.

Epi and mono

Posted on 25 August 2018 by John

My first math classes used the terms one to one and onto to describe functions. These Germanic names have largely been replaced with their French equivalents injective and surjective.

Monic and epic are the category theory analogs of injective and surjective respectively. This post will define these terms and show how they play out in several contexts. In some categories these new terms correspond exactly to their traditional counterparts, but in others they do not.

Sets and functions

A function f from a set A to a set B is injective (one-to-one) if different points on A go to different points in B. That is, f(x) = f(y) only if x = y.

The function f is surjective (onto) if everything in B is covered. That is, for every b in B, there is some a in A such that f(a) = b.

A function is bijective if it is both injective and surjective.

Categories and morphisms

How would we generalize these definitions to category theory? This is not obvious because the rules of category theory don’t let us peek inside objects and talk about elements. Everything has to be characterized in terms of objects and morphisms.

It turns out that the categorical way to view things is to focus on when composition can be cancelled out, either on the left or on the right.

Epimorphisms

A morphism f from an object X to an object Y is an epimorphism if for any other object Z, and any pair of morphisms g₁ and g₂ from Y to Z, if g₁ f = g₂ f then g₁ = g₂. Instead of saying f is an epimorphism, some authors would say f is epic or f is an epi.

If a morphism f is epic, we can cancel it out on the right.

(Unfortunately it is conventional to write function composition from right to left, like splicing in a little bit of Hebrew inside English prose. In the diagram above, f is on the left, and composition proceeds from left to right. But when we write the composition of functions we write from right to left, e.g. gf. When we say we can cancel out f on the right, we mean the right of the expression gf, not on the right of a commutative diagram!)

Monomorphisms

Monomorphism is the dual concept to epimorphism.

A morphism f from an object X to an object Y is an monomorphism if for any other object Z, and any pair of morphisms g₁ and g₂ from Z to X, if f g₁ = f g₂ then g₁ = g₂. Instead of saying f is a monomorphism, some authors would say f is monic or f is a mono.

If a morphism f is monic, we can cancel it out on the left, in the sense of fg as explained above.

Examples

In any category, an iso is epic and monic. That is, an isomorphism is always an epimorphism and a monomorphism. The converse holds in some categories but not in others.

Sets and groups

In the category of sets, a function f from X to Y is an epimorphism iff (if an only if) it is surjective.

Also in the category of sets, a function is a monomorphism iff it is injective.

Groups are similar in that a group homomorphism is an epimorphism iff it surjective, and a monomorphism iff it is injective.

Monoids

Recall that a monoid is like a group, except you don’t necessarily have inverses. That is, you have an associative operation and an identity element.

Let N be the non-negative integers with addition and let Z be the integers. Let f be the inclusion map that takes N into Z. Then f is injective and a monomorphism. It is not surjective, but it is an epimorphism because homomorphism from the integers to another monoid is determined by its values on the positive integers. Its range leaves out half the codomain, but we can still cancel it out on the right.

So here we have a function that is epic and monic, but not an isomorphism.

Rings

A ring homomorphism is a monomorphism iff it is injective. But as with monoids, a ring homomorphism can be an epimorphism without being surjective.

For example, consider the inclusion map from the integers to the rationals. A ring homomorphism is determined by its values on integers, so the inclusion map is an epimorphism even though it is not surjective.

Incidentally, homomorphism between fields is monic.

Currying in calculus, PDEs, programming, and categories

Posted on 11 August 2018 by John

logician Haskell Brooks Curry

Currying is a simple but useful idea. (It’s named after logician Haskell Curry [1] and has nothing to do with spicy cuisine.) If you have a function of two variables, you can think of it as a function of one variable that returns a function of one variable. So starting with a function f(x, y), we can think of this as a function that takes a number x and returns a function f(x, -) of y. The dash is a placeholder as in this recent post.

Calculus: Fubini’s theorem

If you’ve taken calculus then you saw this in the context of Fubini’s theorem:

$\int_a^b\!\int_c^d f(x, y) \, dx \, dy= \int_a^b\left(\int_c^d f(x, y)\, dx\right )\,dy$

To integrate the function of two variables f(x, y), you can temporarily fix y and integrate the remaining function of x. This gives you a number, the value of an integral, for each y, so it’s a function of y. Integrate that function, and you have the value of the original function of two variables.

The first time you see this you may think it’s a definition, but it’s not. You can define the integral on the left directly, and it will equal the result of the two nested integrations on the right. Or at least the two sides will often be equal. The conditions on Fubini’s theorem tell you exactly when the two sides are equal.

PDEs: Evolution equations

A more sophisticated version of the same trick occurs in partial differential equations. If you have an evolution equation, a PDE for a function on one time variable and several space variables, you can think of it as an ODE via currying. For each time value t, you get a function of the spatial variables. So you can think of your solution as a path in a space of functions. The spatial derivatives specify an operator on that space of functions.

(I’m glossing over details here because spelling everything out would take a lot of writing, and might obscure the big idea, which relevant for this post. If you’d like the full story, you can see, for example, my graduate advisor’s book. It was out of print when I studied it, but now it’s a cheap Dover paperback.)

Haskell programming

In the Haskell programming language (also named after Haskell Curry) you get currying for free. In fact, there’s no other way to express a function of two variables. For example, suppose you want to implement the function f(x, y) = x² + y.

    Prelude> f x y = x**2 + y

Then Haskell thinks of this as a function of one variable (i.e. x), that returns a function of one variable (i.e. f(x, -)) which itself returns a number (i.e. f(x, y)). You can see this by asking the REPL for the type of f:

    Prelude> :info f
    f :: Floating a => a -> a -> a

Technically, Haskell, just like lambda calculus, only has functions of one variable. You could create a product datatype consisting of a pair of variables and have your function take that as an argument, but it’s still a function on one variable, though that variable is not atomic.

Category theory

The way you’d formalize currying in category theory is to say that the following is a natural isomorphism:

$\mathrm{Hom}(A \times B, C) \cong \mathrm{Hom}(A, \mathrm{Hom}(B, C))$

For more on what Hom means, see this post.

Weakening the requirements for a group (especially section on semigroups)
Categorical products
Beta reduction

[1] In concordance with Stigler’s law of eponymy, currying was not introduced by Curry but Gottlob Frege. It was then developed by Moses Schönfinkel and developed further by Haskell Curry.

Hom functors and a glimpse of Yoneda

Posted on 6 August 2018 by John

Given two objects A and B, Hom(A, B) is simply the set of functions between A and B. From this humble start, things get more interesting quickly.

Hom sets

To make the above definition precise, we need to say what kinds of objects and what kinds of functions we’re talking about. That is, we specify a category C that the object belong to, and the functions are the morphisms of that category [1]. For example, in the context of groups, Hom(A, B) would be the set of group homomorphisms [2] between A and B, but in the context of continuous groups (Lie groups), we would restrict Hom(A, B) to be continuous group homomorphisms.

To emphasize that Hom refers to a set of morphisms in a particular category, sometimes you’ll see the name of the category as a subscript, as in Hom_C(A, B). Sometimes you’ll see the name of the category as a replacement for Hom as in C(A, B). You may also see the name of the category in square braces as in [C](A, B).

Hom functors

So far Hom has been a set, but you’ll also see Hom as a functor. This is where the notation takes a little more interpretation. You may see a capital H with objects as superscripts or subscripts:

$\begin{eqnarray*} H^A &=& \mathrm{Hom}(A, -) \\ H_A &=& \mathrm{Hom}(-, A) \end{eqnarray*}$

You may also see a lower case h instead. And I’ll use the name of the category instead of “Hom” just to throw that variation in too.

$\begin{eqnarray*} h^A &=& {\cal C}(A, -) \\ h_A &=& {\cal C}(-, A) \end{eqnarray*}$

I’ll explain what these mean and give a mnemonic for remembering which is which.

Action on objects

The dash in Hom(A, -) and Hom(-, A) means “insert your object here.” That is, Hom(A, –) takes an object B and maps it to the set of morphisms Hom(A, B), and Hom(-, A) takes an object B to Hom(B, A).

So Hom(A, -) and Hom(-, A) each take an object in the category C to a set of morphims, i.e. an element in the category Set. But that’s only half of what it takes to be a functor. A functor not only maps objects in one category to objects in another category, it also maps morphisms in one category to morphisms in the other. (And it does so in a way that the interactions between the maps of objects and morphisms interact coherently.)

Action on morphisms

Where do the morphisms come in? We’ve established what H^A and H_A do to objects, but what do they do to morphisms?

Suppose we have a morphism f: X → Y in the category C and a function g in Hom(A, X) to Hom(A, Y)? For each g in Hom(A, X), the composition fg is an element of Hom(A, Y).

Next suppose f: Y → X (note the reversed order) and a function g in Hom(X, A). Now the composition gf is an element of Hom(Y, A). Note that before we applied f after g, but here we pre-compose with f, i.e. apply f before g.

Aside from the notation, what’s going on is very simple. If you have a function from A to X and a function from X to Y, then the composition of the two is a function from A to Y. Similarly, if you have a function from Y to X and a function from X to A, the composition is a function from Y to A.

Note that H^A is a covariant functor, but H_A is a contravariant. More on covariant vs contravariant below.

Notation mnemonic

How can you keep H^A and H_A straight? It’s common to use superscript notation Y^X to indicate the set of functions from the superscript object X to the base object Y. You may have seen this before even if you don’t think you have.

The notation Y² denotes the product of Y with it self, such as R² for the plane, i.e. pairs of real numbers. A pair of things in Y is a function from a two-element set to Y. You could think of (y₁, y₂) as the result of mapping the set (1, 2) into Y.

You may also have seen the notation 2^X for the power set of X, i.e. the set of all subsets of X. You could think of the power set of X being the set of maps from X to the Boolean values (true, false) where an element x is mapped to true if and only if x is in that particular subset.

The notation using H or h with a superscript A stands for Hom(A, -), i.e. morphisms out of A, which is consistent with the usage described above. And so the other is the other, i.e. a subscript A stands for Hom(-, A), i.e morphisms into A.

(Unfortunately, some authors use the opposite of the convention used here, which blows the whole mnemonic away. But the convention used here is most common.)

Yoneda lemma

We’re close to being able to state one of the most important theorems in category theory, the Yoneda lemma. (Lemmas often turn out to be more useful and better known than the theorem they were first used to prove.) So while we’re in the neighborhood, we’ll take a look.

A corollary of the Yoneda lemma says

$\begin{eqnarray*} \mathrm{Hom}(H^A, H^B) &\cong& \mathrm{Hom}(B, A) \\ \mathrm{Hom}(H_A, H_B) &\cong& \mathrm{Hom}(A, B) \\ \end{eqnarray*}$

The meaning of “Hom” is different on the left and right because we’re looking at morphisms between different kinds of objects. On the right we have sets of morphisms in our category C as above. The left side takes more explanation.

What kind of things are H^A and H^B? They are functors from C to Set. The class of functors between two categories forms a category itself. The functors are the objects in this new category, and natural transformations are the morphisms. So Hom in this context is the set of natural transformations between the two functors.

What kind of things are H_A and H_B? They are contravariant functors from C to Set, and contravariant functors also form a category. However, contemporary category theory doesn’t like to speak of contravariant functors, preferring to only work with covariant functors, and leaving the term “covariant” implicit. So rather than saying H_A and H_B are contravariant functors on C, most contemporary writers would say they are (covariant) functors on a new category C^op, where “op” stands for opposite. That is, C^op is a category with all the same objects as C, but with all the arrows turned around. Every morphism from A to B in C corresponds to a morphism from B to A in C^op.

[1] Morphisms are a generalization of functions. Often morphisms are functions, but they might not be. But still, they have to be things that compose like functions.

[2] The name Hom is a shortened from of “homomorphism.” Different areas of math have different names for structure-preserving functions, and category theory wanted to have one name for them all. It used “Hom” as an allusion to what structure-preserving functions are called in group theory. Similarly, “morphism” is also a shorted form of “homomorphism.” I suppose the goal was to use names reminiscent of group theory, but different enough to remind the reader that the category theory counterparts are distinct.

Incidentally, “homomorphism” comes from the Greek roots meaning “similar” and “shape.” A homomorphism is a function between similar objects (objects in the same category) that preserves structure (shape).

Moore-Penrose pseudoinverse is not an adjoint

Posted on 24 April 2018 by John

The Moore-Penrose pseudoinverse of a matrix is a way of coming up with something like an inverse for a matrix that doesn’t have an inverse. If a matrix does have an inverse, then the pseudoinverse is in fact the inverse. The Moore-Penrose pseudoinverse is also called a generalized inverse for this reason: it’s not just like an inverse, it actually is an inverse when that’s possible.

Given an m by n matrix A, the Moore-Penrose pseudoinverse A⁺ is the unique n by m matrix satisfying four conditions:

A A⁺ A = A
A⁺ A A⁺ = A⁺
(A A⁺)* = A A⁺
(A⁺ A)* = A⁺ A

The first equation says that AA⁺ is a left identity for A, and A⁺A is a identity for A.

The second equation says A⁺A is a left identity for A⁺, and A A⁺ is a right identity for A⁺.

The third and fourth equations say that A A⁺ and A⁺A are Hermitian.

If A is invertible, A A⁺ and A⁺A are both the identity matrix. Otherwise A A⁺ and A⁺A act an awful lot like the identity, as much as you could expect, maybe a little more than you’d expect.

Update: See this post for the relationship between the singular value decomposition and pseudoinverses, and how to compute both in Python and Mathematica.

Galois connections and adjoints

John Baez recently wrote that a Galois connection, a kind of categorical adjunction, is

“the best approximation to reversing a computation that can’t be reversed.”

That sounds like a pseudoinverse! And the first two equations defining a pseudoinverse look a lot like things you’ll see in the context of adjunctions, so the pseudoinverse must be an adjunction, right?

The question was raised on MathOverflow and Michal R. Przybylek answered

I do not think the concept of Moore-Penrose Inverse and the concept of categorical adjunction have much in common (except they both try to generalise the concept of inverse) …

and gives several reasons why. (Emphasis added.)

Too bad. It would have made a good connection. Applied mathematicians are likely to be familiar with Moore-Penrose pseudoinverses but not categorical adjoints. And pure mathematicians, depending on their interests, may be more familiar with adjoint functors than matrix pseudoinverses.

So what about John Baez’ comment? His comment was expository (and very helpful) but not meant to be rigorous. To make it rigorous you’d have to be rigorous about what you mean by “best approximation” etc. And when you define your terms carefully, in the language of category theory, you get adjoints. This means that the Moore-Penrose inverse, despite its many nice properties [1], doesn’t mesh well with categorical definitions. It’s not the best approximate inverse from a categorical perspective because it doesn’t compose well, and category theory values composition above all else. The Moore-Penrose pseudoinverse may be the best approximate inverse from some perspectives, but not from a categorical perspective.

Przybylek explains

… adjunctions compose … but Moore-Penrose pseudoinverses—generally—do not. … pseudoinverses are not stable under isomorphisms, thus are not categorical.

That’s the gist of his final point. Now let me fill in and expand slightly part of what I cut out.

If f: A → B is left adjoint to f⁺: B → A and g: B → C is left adjoint to g⁺: C → B then the composition gf: A → C is left adjoint to the composition f⁺g⁺: C → A, but Moore-Penrose pseudoinverses do not compose this way in general.

This turns out to be an interesting example, but not of what I first expected. Rather than the pseudoinverse of a matrix being an example of an adjoint, it is an example of something that despite having convenient properties does not compose well from a categorical perspective.

Category theory

Category theory for programmers made easier

More category theory posts

Category theory for data science: cautious optimism

How category theory is applied

More on applying abstract math

Groups in categories

More category theory posts

Monads and generalized elements

Curry-Howard-Lambek correspondence

Epi and mono

Sets and functions

Categories and morphisms

Epimorphisms

Monomorphisms

Examples

Sets and groups

Monoids

Rings

More category theory posts

Currying in calculus, PDEs, programming, and categories

Calculus: Fubini’s theorem

PDEs: Evolution equations

Haskell programming

Category theory

Related posts

Hom functors and a glimpse of Yoneda

Hom sets

Hom functors

Action on objects

Action on morphisms

Notation mnemonic

Yoneda lemma

Related posts

Moore-Penrose pseudoinverse is not an adjoint

Galois connections and adjoints

Related math posts