Category theory for data science: cautious optimism

I’m cautiously optimistic about applications of category theory. I believe category theory has the potential to be useful, but I’m skeptical of most claims to have found useful applications. Category theory has found useful application, especially behind the scenes, but a lot of supposed applications remind me of a line from Colin McLarty:

[Jean-Pierre] Serre created a series of concise elegant tools which Grothendieck and coworkers simplified into thousands of pages of category theory.

To be fair, Grothendieck put these recast tools to good use, but most mere mortals would stand a better chance of being able to use Serre’s versions.

My interest in category theory waxes and wanes, and just as it was was at its thinnest crescent phase I ran across CQL, categorical query language. I haven’t had time to look very far into it, but it seems promising. The site’s modest prose relative to the revolutionary rhetoric of some category enthusiasts makes me have more confidence that the authors may be on to something useful.

Related post: Categorical (Data Analysis) vs (Categorical Data) Analysis

How category theory is applied

Instead of asking whether an area of mathematics can be applied, it’s more useful to as how it can be applied.

Differential equations are directly and commonly applied. Ask yourself what laws govern the motion of some system, write down these laws as differential equations, then solve them. Statistical models are similarly direct: propose a model and feed it data.

Linear algebra is extremely useful in application, but it’s not often applied so directly. Rarely would you look at a system and immediately see a linear system. Instead you might describe a system in terms of differential equations or statistical models, then use linear algebra as a key part of the solution process.

Numerical analysis is useful because, for example, it helps you practically solve large linear systems (which help you solve differential equations, which model physical systems).

Many areas of math are useful in application, but some are applied more directly than others. It’s a mistake to say something is not applied just because it is not applied directly.

The following quote from Colin McLarty describes how some of the most abstract math, including cohomology and category theory, is applied.

Cohomology does not itself solve hard problems in topology or algebra. It clears away tangled multitudes of individually trivial problems. It puts the hard problems in clear relief and makes their solution possible. The same holds for category theory in general.

While McLarty speaks of applications to other areas of math, the same applies to applications to other areas such as software engineering.

I suspect many supposed applications of category theory are post hoc glosses, dressing up ideas in categorical language that were discovered in a more tangible setting. At the same time, the applications of category theory may be understated because the theory works behind the scenes. As I discuss here, category theory may lead to insights that are best communicated in the language of the original problem, not in the language of categories.

Related posts

Groups in categories

The first time I saw a reference to a “group in a category” I misread it as something in the category of groups. But that’s not what it means. Due to an unfortunately choice of terminology, “in” is more subtle than just membership in a class.

This is related to another potentially misleading term, algebraic groups, mentioned in the previous post on isogenies. An algebraic group is a “group object” in the category of algebraic varieties. Note the mysterious use of the word “in.”

You may have heard the statement “A monad is just a monoid in the category of endofunctors.” While true, the statement is meant to be a joke because it abstracted so far from what most people want to know when they ask what a monad is in the context of functional programming. But notice this follows the pattern of an X in the category of Y‘s, even though here X stands for monoid rather than group. The meaning of “in the category of” is the same.

If you want to go down this rabbit hole, you could start with the nLab article on group objects. A group object is a lot like a group, but everything is happening “in” a category.

Take a look at the list of examples. The first says that a group object in the category of sets is a group. That’s reassuring. The second example says that a group object in the category of topological spaces is a topological group. At this point you may get the idea that an X in the category of Y‘s simply adds an X structure to a Y thing. But further down you’ll see that a group object in the category of groups is an Abelian group, which is an indication that something more subtle is going on.

Related posts

Monads and generalized elements

Paolo Perrone gives a nice, succinct motivation for monads in the introduction to his article on probability and monads.

… a monad is like a consistent way of extending spaces to include generalized elements of a specific kind.

He develops this idea briefly, and links to his dissertation where he gives a longer exposition (pages 8–14).

Related post: Monads are hard because …

Curry-Howard-Lambek correspondence

Curry-Howard-Lambek is a set of correspondences between logic, programming, and category theory. You may have heard of the slogan proofs-as-programs or propositions-as-types. These refer to the Curry-Howard correspondence between mathematical proofs and programs. Lambek’s name is appended to the Curry-Howard correspondence to represent connections to category theory.

The term Curry-Howard isomorphism is often used but is an overstatement. Logic and programming are not isomorphic, nor is either isomorphic to category theory.

You’ll also hear the term computational trinitarianism. That may be appropriate because logic, programming, and category theory share common attributes without being identical.

The relations between logic, programming, and categories are more than analogies, but less than isomorphisms.

There are formal correspondences between parts of each theory. And aside from these rigorous connections, there is also a heuristic that something interesting in one area is likely to have an interesting counterpart in the other areas. In that sense Curry-Howard-Lambek is similar to the Langlands program relating number theory and geometry, a set of theorems and conjectures as well as a sort of philosophy.

Epi and mono

My first math classes used the terms one to one and onto to describe functions. These Germanic names have largely been replaced with their French equivalents injective and surjective.

Monic and epic are the category theory analogs of injective and surjective respectively. This post will define these terms and show how they play out in several contexts. In some categories these new terms correspond exactly to their traditional counterparts, but in others they do not.

Sets and functions

A function f from a set A to a set B is injective (one-to-one) if different points on A go to different points in B. That is, f(x) = f(y) only if xy.

The function f  is surjective (onto) if everything in B is covered. That is, for every b in B, there is some a in A such that f(a) = b.

A function is bijective if it is both injective and surjective.

Categories and morphisms

How would we generalize these definitions to category theory? This is not obvious because the rules of category theory don’t let us peek inside objects and talk about elements. Everything has to be characterized in terms of objects and morphisms.

It turns out that the categorical way to view things is to focus on when composition can be cancelled out, either on the left or on the right.

Epimorphisms

A morphism f from an object X to an object Y is an epimorphism if for any other object Z, and any pair of morphisms g1 and g2 from Y to Z, if g1 f = g2 f then g1 = g2. Instead of saying f is an epimorphism, some authors would say f is epic or f is an epi.

epimorphism diagram

If a morphism f is epic, we can cancel it out on the right.

(Unfortunately it is conventional to write function composition from right to left, like splicing in a little bit of Hebrew inside English prose. In the diagram above, f is on the left, and composition proceeds from left to right. But when we write the composition of functions we write from right to left, e.g. gf. When we say we can cancel out f on the right, we mean the right of the expression gf, not on the right of a commutative diagram!)

Monomorphisms

Monomorphism is the dual concept to epimorphism.

A morphism f from an object X to an object Y is an monomorphism if for any other object Z, and any pair of morphisms g1 and g2from Z to X, if f g1 = f g2 then g1 = g2. Instead of saying f is a monomorphism, some authors would say f is monic or f is a mono.

monomorphism diagramIf a morphism f is monic, we can cancel it out on the left, in the sense of fg as explained above.

Examples

In any category, an iso is epic and monic. That is, an isomorphism is always an epimorphism and a monomorphism. The converse holds in some categories but not in others.

Sets and groups

In the category of sets, a function f from X to Y is an epimorphism iff (if an only if) it is surjective.

Also in the category of sets, a function is a monomorphism iff it is injective.

Groups are similar in that a group homomorphism is an epimorphism iff it surjective, and a monomorphism iff it is injective.

Monoids

Recall that a monoid is like a group, except you don’t necessarily have inverses. That is, you have an associative operation and an identity element.

Let N be the non-negative integers with addition and let Z be the integers. Let f be the inclusion map that takes N into Z. Then f is injective and a monomorphism. It is not surjective, but it is an epimorphism because homomorphism from the integers to another monoid is determined by its values on the positive integers. Its range leaves out half the codomain, but we can still cancel it out on the right.

So here we have a function that is epic and monic, but not an isomorphism.

Rings

A ring homomorphism is a monomorphism iff it is injective. But as with monoids, a ring homomorphism can be an epimorphism without being surjective.

For example, consider the inclusion map from the integers to the rationals. A ring homomorphism is determined by its values on integers, so the inclusion map is an epimorphism even though it is not surjective.

Incidentally, homomorphism between fields is monic.

Related posts

Currying in calculus, PDEs, programming, and categories

logician Haskell Brooks Curry

Currying is a simple but useful idea. (It’s named after logician Haskell Curry [1] and has nothing to do with spicy cuisine.) If you have a function of two variables, you can think of it as a function of one variable that returns a function of one variable. So starting with a function f(x, y), we can think of this as a function that takes a number x and returns a function f(x, -) of y. The dash is a placeholder as in this recent post.

Calculus: Fubini’s theorem

If you’ve taken calculus then you saw this in the context of Fubini’s theorem:

\int_a^b\!\int_c^d f(x, y) \, dx \, dy= \int_a^b\left(\int_c^d f(x, y)\, dx\right )\,dy

To integrate the function of two variables f(x, y), you can temporarily fix y and integrate the remaining function of x. This gives you a number, the value of an integral, for each y, so it’s a function of y. Integrate that function, and you have the value of the original function of two variables.

The first time you see this you may think it’s a definition, but it’s not. You can define the integral on the left directly, and it will equal the result of the two nested integrations on the right. Or at least the two sides will often be equal. The conditions on Fubini’s theorem tell you exactly when the two sides are equal.

PDEs: Evolution equations

A more sophisticated version of the same trick occurs in partial differential equations. If you have an evolution equation, a PDE for a function on one time variable and several space variables, you can think of it as an ODE via currying. For each time value t, you get a function of the spatial variables. So you can think of your solution as a path in a space of functions. The spatial derivatives specify an operator on that space of functions.

(I’m glossing over details here because spelling everything out would take a lot of writing, and might obscure the big idea, which relevant for this post. If you’d like the full story, you can see, for example, my graduate advisor’s book. It was out of print when I studied it, but now it’s a cheap Dover paperback.)

Haskell programming

In the Haskell programming language (also named after Haskell Curry) you get currying for free. In fact, there’s no other way to express a function of two variables. For example, suppose you want to implement the function f(xy) = x² + y.

    Prelude> f x y = x**2 + y

Then Haskell thinks of this as a function of one variable (i.e. x), that returns a function of one variable (i.e. f(x, -)) which itself returns a number (i.e. f(x, y)). You can see this by asking the REPL for the type of f:

    Prelude> :info f
    f :: Floating a => a -> a -> a

Technically, Haskell, just like lambda calculus, only has functions of one variable. You could create a product datatype consisting of a pair of variables and have your function take that as an argument, but it’s still a function on one variable, though that variable is not atomic.

Category theory

The way you’d formalize currying in category theory is to say that the following is a natural isomorphism:

\mathrm{Hom}(A \times B, C) \cong \mathrm{Hom}(A, \mathrm{Hom}(B, C))

For more on what Hom means, see this post.

Related posts

[1] In concordance with Stigler’s law of eponymy, currying was not introduced by Curry but Gottlob Frege. It was then developed by Moses Schönfinkel and developed further by Haskell Curry.

Hom functors and a glimpse of Yoneda

Given two objects A and B, Hom(A, B) is simply the set of functions between A and B. From this humble start, things get more interesting quickly.

Hom sets

To make the above definition precise, we need to say what kinds of objects and what kinds of functions we’re talking about. That is, we specify a category C that the object belong to, and the functions are the morphisms of that category [1]. For example, in the context of groups, Hom(A, B) would be the set of group homomorphisms [2] between A and B, but in the context of continuous groups (Lie groups), we would restrict Hom(A, B) to be continuous group homomorphisms.

To emphasize that Hom refers to a set of morphisms in a particular category, sometimes you’ll see the name of the category as a subscript, as in HomC(A, B). Sometimes you’ll see the name of the category as a replacement for Hom as in C(A, B). You may also see the name of the category in square braces as in [C](AB).

Hom functors

So far Hom has been a set, but you’ll also see Hom as a functor. This is where the notation takes a little more interpretation. You may see a capital H with objects as superscripts or subscripts:

\begin{eqnarray*} H^A &=& \mathrm{Hom}(A, -) \\ H_A &=& \mathrm{Hom}(-, A) \end{eqnarray*}

You may also see a lower case h instead. And I’ll use the name of the category instead of “Hom” just to throw that variation in too.

\begin{eqnarray*} h^A &=& {\cal C}(A, -) \\ h_A &=& {\cal C}(-, A) \end{eqnarray*}

I’ll explain what these mean and give a mnemonic for remembering which is which.

Action on objects

The dash in Hom(A, -) and Hom(-, A) means “insert your object here.” That is, Hom(A) takes an object B and maps it to the set of morphisms Hom(AB), and Hom(-, A) takes an object B to Hom(BA).

So Hom(A, -) and Hom(-, A) each take an object in the category C to a set of morphims, i.e. an element in the category Set. But that’s only half of what it takes to be a functor. A functor not only maps objects in one category to objects in another category, it also maps morphisms in one category to morphisms in the other. (And it does so in a way that the interactions between the maps of objects and morphisms interact coherently.)

Action on morphisms

Where do the morphisms come in? We’ve established what HA and HA do to objects, but what do they do to morphisms?

Suppose we have a morphism fXY in the category and a function g in Hom(A, X) to Hom(A, Y)? For each g in Hom(AX), the composition fg is an element of Hom(AY).

Next suppose fYX (note the reversed order) and a function g in Hom(X, A). Now the composition gf is an element of Hom(YA). Note that before we applied f after g, but here we pre-compose with f, i.e. apply f before g.

Aside from the notation, what’s going on is very simple. If you have a function from A to X and a function from X to Y, then the composition of the two is a function from A to Y. Similarly, if you have a function from Y to X and a function from X to A, the composition is a function from Y to A.

Note that HA is a covariant functor, but HA is a contravariant. More on covariant vs contravariant below.

Notation mnemonic

How can you keep HA and HA straight? It’s common to use superscript notation YX to indicate the set of functions from the superscript object X to the base object Y. You may have seen this before even if you don’t think you have.

The notation Y² denotes the product of Y with it self, such as R² for the plane, i.e. pairs of real numbers. A pair of things in Y is a function from a two-element set to Y. You could think of (y1, y2) as the result of mapping the set (1, 2) into Y.

You may also have seen the notation 2X for the power set of X, i.e. the set of all subsets of X. You could think of the power set of X being the set of maps from X to the Boolean values (true, false) where an element x is mapped to true if and only if x is in that particular subset.

The notation using H or h with a superscript A stands for Hom(A, -), i.e. morphisms out of A, which is consistent with the usage described above. And so the other is the other, i.e. a subscript A stands for Hom(-, A), i.e morphisms into A.

(Unfortunately, some authors use the opposite of the convention used here, which blows the whole mnemonic away. But the convention used here is most common.)

Yoneda lemma

We’re close to being able to state one of the most important theorems in category theory, the Yoneda lemma. (Lemmas often turn out to be more useful and better known than the theorem they were first used to prove.) So while we’re in the neighborhood, we’ll take a look.

A corollary of the Yoneda lemma says

\begin{eqnarray*} \mathrm{Hom}(H^A, H^B) &\cong& \mathrm{Hom}(B, A) \\ \mathrm{Hom}(H_A, H_B) &\cong& \mathrm{Hom}(A, B) \\ \end{eqnarray*}

The meaning of “Hom” is different on the left and right because we’re looking at morphisms between different kinds of objects. On the right we have sets of morphisms in our category C as above. The left side takes more explanation.

What kind of things are HA and HB? They are functors from C to Set. The class of functors between two categories forms a category itself. The functors are the objects in this new category, and natural transformations are the morphisms. So Hom in this context is the set of natural transformations between the two functors.

What kind of things are HA and HB? They are contravariant functors from C to Set, and contravariant functors also form a category. However, contemporary category theory doesn’t like to speak of contravariant functors, preferring to only work with covariant functors, and leaving the term “covariant” implicit. So rather than saying HA and HB are contravariant functors on C, most contemporary writers would say they are (covariant) functors on a new category Cop, where “op” stands for opposite. That is, Cop is a category with all the same objects as C, but with all the arrows turned around. Every morphism from A to B in C corresponds to a morphism from B to A in Cop.

Related posts

[1] Morphisms are a generalization of functions. Often morphisms are functions, but they might not be. But still, they have to be things that compose like functions.

[2] The name Hom is a shortened from of “homomorphism.” Different areas of math have different names for structure-preserving functions, and category theory wanted to have one name for them all. It used “Hom” as an allusion to what structure-preserving functions are called in group theory. Similarly, “morphism” is also a shorted form of “homomorphism.” I suppose the goal was to use names reminiscent of group theory, but different enough to remind the reader that the category theory counterparts are distinct.

Incidentally, “homomorphism” comes from the Greek roots meaning “similar” and “shape.” A homomorphism is a function between similar objects (objects in the same category) that preserves structure (shape).

Moore-Penrose pseudoinverse is not an adjoint

The Moore-Penrose pseudoinverse of a matrix is a way of coming up with something like an inverse for a matrix that doesn’t have an inverse. If a matrix does have an inverse, then the pseudoinverse is in fact the inverse. The Moore-Penrose pseudoinverse is also called a generalized inverse for this reason: it’s not just like an inverse, it actually is an inverse when that’s possible.

Given an m by n matrix A, the Moore-Penrose pseudoinverse A+ is the unique n by m matrix satisfying four conditions:

  1. A A+ A = A
  2. A+ A A+ = A+
  3. (A A+)* = A A+
  4. (A+ A)* = A+ A

The first equation says that AA+ is a left identity for A, and A+A is a identity for A.

The second equation says A+A is a left identity for A+, and A A+ is a right identity for A+.

The third and fourth equations say that A A+ and A+A are Hermitian.

If A is invertible, A A+ and A+A are both the identity matrix. Otherwise A A+ and A+A act an awful lot like the identity, as much as you could expect, maybe a little more than you’d expect.

Update: See this post for the relationship between the singular value decomposition and pseudoinverses, and how to compute both in Python and Mathematica.

Galois connections and adjoints

John Baez recently wrote that a Galois connection, a kind of categorical adjunction, is

“the best approximation to reversing a computation that can’t be reversed.”

That sounds like a pseudoinverse! And the first two equations defining a pseudoinverse look a lot like things you’ll see in the context of adjunctions, so the pseudoinverse must be an adjunction, right?

The question was raised on MathOverflow and Michal R. Przybylek answered

I do not think the concept of Moore-Penrose Inverse and the concept of categorical adjunction have much in common (except they both try to generalise the concept of inverse) …

and gives several reasons why. (Emphasis added.)

Too bad. It would have made a good connection. Applied mathematicians are likely to be familiar with Moore-Penrose pseudoinverses but not categorical adjoints. And pure mathematicians, depending on their interests, may be more familiar with adjoint functors than matrix pseudoinverses.

So what about John Baez’ comment? His comment was expository (and very helpful) but not meant to be rigorous. To make it rigorous you’d have to be rigorous about what you mean by “best approximation” etc. And when you define your terms carefully, in the language of category theory, you get adjoints. This means that the Moore-Penrose inverse, despite its many nice properties [1], doesn’t mesh well with categorical definitions. It’s not the best approximate inverse from a categorical perspective because it doesn’t compose well, and category theory values composition above all else. The Moore-Penrose pseudoinverse may be the best approximate inverse from some perspectives, but not from a categorical perspective.

Przybylek explains

… adjunctions compose … but Moore-Penrose pseudoinverses—generally—do not. … pseudoinverses are not stable under isomorphisms, thus are not categorical.

That’s the gist of his final point. Now let me fill in and expand slightly part of what I cut out.

If f: AB is left adjoint to f+: BA and g: BC is left adjoint to g+: CB then the composition gfAC is left adjoint to the composition f+g+: C → A, but Moore-Penrose pseudoinverses do not compose this way in general.

This turns out to be an interesting example, but not of what I first expected. Rather than the pseudoinverse of a matrix being an example of an adjoint, it is an example of something that despite having convenient properties does not compose well from a categorical perspective.

Related posts:

[1] The book Matrix Mathematics devotes about 40 pages to stating theorems about the Moore-Penrose pseudoinverse.

Categorical Data Analysis

Categorical data analysis could mean a couple different things. One is analyzing data that falls into unordered categories (e.g. red, green, and blue) rather than numerical values (e.g. height in centimeters).

Another is using category theory to assist with the analysis of data. Here “category” means something more sophisticated than a list of items you might choose from in a drop-down menu. Instead we’re talking about applied category theory.

So we have ((categorical data) analysis) and (categorical (data analysis)), i.e. analyzing categorical data and categorically analyzing data. The former is far, far more common.

I ran across Alan Agresti’s classic book the other day in a used book store. The image below if from the third (2012) edition. The book store had the 1st (1990) edition with a more austere cover.

I bought Agresti’s book because it’s a good reference to have. But I was a little disappointed. My first thought was  that someone has written a book on category theory and statistics, which is not the case, as far as I know.

The main reference for category theory and statistics is Peter McCullagh’s 2002 paper What is a statistical model? That paper raised a lot of interesting ideas, but the statistics community did not take McCullagh’s bait.

commutative diagram for statistical models

Maybe this just wasn’t a fruitful idea. I suspect it is a fruitful idea, but the number of people available to develop it, conversant in both statistics and category theory, is very small. I’ve seen category theory used in mathematical modeling more generally, but not in statistics per se.

At its most basic, category theory asks you to be explicit about the domain and range (codomain) of functions. It would be very helpful if statisticians merely did this. Statistical notation is notoriously bad at revealing where a function goes from and to, or even when a function is a function. Just 0th level category theory, defining categories, would be useful. Maybe it would be useful to go on to identifying limits or adjoints, but simply being explicit about “from” and “to” would be a good start.

Category theory is far too abstract to completely carry out a statistical analysis. But it can prompt you to ask questions that check whether your model has any inconsistencies you hadn’t noticed. The idea of a “categorical error” doesn’t differ that much moving from its philosophical meaning under Aristotle to its mathematical meaning under MacLane. Nor does the idea of something being “natural.” One of the primary motivations for creating category theory was to come up with a rigorous definition of what it means for something in math to be “natural.”