Broadcasting and functors

In my previous post, I looked at the map Δ that takes a column vector to a diagonal matrix. I even drew a commutative diagram, which foreshadows a little category theory.

Suppose you have a function f of a real or complex variable. To an R programmer, if x is a vector, it’s obvious that f(x) means to apply f to every component of a vector. Python (NumPy) works the same way, and calls this broadcasting. To a mathematician, this looks odd. What does the logarithm of a vector, for example, even mean?

As in the previous post, we can use Δ to formalize things. We said that Δ has some nice properties, and in fact we will show it is a functor.

To have a functor, we have to have categories. (Historically, functors came first; categories were defined in order to define functors.) We will define C to be the category of column vectors and M the category of square matrices as before. Or rather, we should say the objects of C are column vectors and the objects of M are square matrices.

Categories need morphisms, functions between objects [1]. We define the morphisms on C to be analytic functions applied componentwise. So, for example, if

z = [1, 2, -3],

then

tan(z) = [tan(1), tan(2), tan(-3)].

The morphisms on M will be analytic functions on square matrices, not applied componentwise but applied by power series. That is, given an analytic function f, we define f of a square matrix X as the result of sticking the matrix X into the power series for f. For an example, see What is the cosine of a matrix?

We said that Δ is a functor. It takes column vectors and turns them into square matrices by putting their contents along the diagonal of a matrix. We gave the example in the previous post that [4, i, π] would be mapped to the matrix with these elements on the diagonal, i.e.

\Delta: \begin{pmatrix} 4 \\ i \\ \pi \end{pmatrix} \mapsto \begin{pmatrix} 4 & 0 & 0\\ 0 & i & 0 \\ 0 & 0 & \pi \end{pmatrix}

That says what Δ does on objects, but what does it do on morphisms? It takes an analytic function that was applied componentwise to column vectors, and turns it into a function that is applied via its power series to square matrices. That is, starting with a function

f(z) = \sum_{n=0}^\infty a_nz^n

we define the morphism f on C by

f : \begin{pmatrix} z_1 \\ z_2 \\ \vdots \\ z_n \end{pmatrix} \mapsto \begin{pmatrix} f(z_1) \\ f(z_2) \\ \vdots \\ f(z_n) \end{pmatrix}

and the morphism Δ f on M by

\Delta f : Z \mapsto \sum_{n=0}^\infty a_n Z^n

where Z is a square matrix.

We can apply f to a column vector, and then apply Δ to turn the resulting vector into a diagonal matrix, or we could apply Δ to turn the vector into a diagonal matrix first, and then apply f (technically,  Δf). That is, the follow diagram commutes:

\begin{diagram}   C & \rTo^{\Delta} & M \\   \uTo^{f} & & \uTo_{\Delta f} \\   C & \rTo^{\Delta} & M  \end{diagram}

Python example

Applying an analytic function to a diagonal matrix gives the same result as simply applying the function to the elements of the diagonal. But for more general square matrices, this is not the case. We will illustrate this with some Python code.

    import numpy as np
    from scipy.linalg import funm

    d = np.array([1, 2])
    D = np.diag(d)
    M = np.array([[1, np.pi], [2, 0]])

Now let’s look at some output.

    >>> np.sin(d)                     
    array([0.84147098, 0.90929743])   

    >>> np.sin(D)                     
    array([[0.84147098, 0.        ],  
           [0.        , 0.90929743]]) 

    >>> funm(D, np.sin)               
    array([[0.84147098, 0.        ],  
           [0.        , 0.90929743]]) 

So if we take the sine of d and turn the result into a matrix, we get the same thing as if we turn d into a matrix D and then take the sine of D, either componentwise or as an analytic function (with funm, function of a matrix).

Now let’s look at a general, non-diagonal matrix.

    >>> np.sin(M)
    array([[0.84147099, 0],
       [0.90929743, 0]])

    >>> funm(D, np.sin)
    array([[0.84147098, 0.        ],
           [0.        , 0.90929743]])

Note that the elements in the bottom row are in opposite positions in the two examples.

[1] OK, morphisms are not necessarily functions, but in practice they usually are.

Drawing commutative diagrams with Quiver

I recently discovered quiver, a tool for drawing commutative diagrams. It looks like a nice tool for drawing diagrams more generally, but it’s designed particularly to include the features you need when drawing the kinds of diagrams that are ubiquitous in category theory.

You can draw diagrams using the online app and export the result to LaTeX (tikz).

Here are some examples.

quiver diagrams

The app is mostly easy to figure out. The object and arrow labels are specified using LaTeX.

Quiver lets you position labels on either side of an arrow, or even inside the arrow as in the e in the first diagram above.

You can change the arrow and arrowhead styles, as in the second example.

It took me a little while to figure out how to have two arrows with the same destination and source, as in the third example. You use the “offset” setting to move arrows up or down do that they’re not on top of each other. And by default, the software helpfully assumes that if you want to move one arrow up a bit from center, you probably want to move the other one down from center by the same amount.

You can use the “curve” setting to draw arrows as in the final example. You can also draw arrows between arrows, as is common when working with 2-categories.

More category theory posts

Category theory for programmers made easier

I imagine most programmers who develop an interest in category theory do so after hearing about monads. They ask someone what a monad is, and they’re told that if they really want to know, they need to learn category theory.

Unfortunately, there are couple unnecessary difficulties anyone wanting to understand monads etc. is likely to face immediately. One is some deep set theory.

“A category is a collection of objects …”

“You mean like a set?”

“Ah, well, no. You see, Bertrand Russell showed that …”

There are reasons for such logical niceties, but they don’t matter to someone who wants to understand programming patterns.

Another complication is morphisms.

“As I was saying, a category is a collection of objects and morphisms between objects …”

“You mean like functions?”

“Well, they might be functions, but more generally …”

Yes, Virginia, morphisms are functions. It’s true that they might not always be functions, but they will be functions in every example you care about, at least for now.

Category theory is a framework for describing patterns in function composition, and so that’s why things like monads find their ultimate home in category theory. But doing category theory rigorously requires some setup that people eager to get into applications don’t have to be concerned with.

Patrick Honner posted on Twitter recently that his 8-year-old child asked him what area is. My first thought on seeing that was that a completely inappropriate answer would be that this is a deep question that wasn’t satisfactorily settled until the 20th century using measure theory. My joking response to Patrick was

Well, first we have to define σ-algebras. They’re kinda like topologies, but closed under countable union and intersection instead of arbitrarily union and finite intersection. Anyway, a measure is a …

It would be ridiculous to answer a child this way, and it is nearly as ridiculous to burden a programmer with unnecessary logical nuance when they’re trying to find out why something is called a functor, or a monoid, or a monad, etc.

I saw an applied category theory presentation that began with “A category is a graph …” That sweeps a lot under the rug, but it’s not a bad conceptual approximation.

So my advice to programmers learning category theory is to focus on the arrows in the diagrams. Think of them as functions; they probably are in your application [1]. Think of category theory as a framework for describing patterns. The rigorous foundations can be postponed, perhaps indefinitely, just as an 8-year-old child doesn’t need to know measure theory to begin understanding area.

More category theory posts

[1] The term “contravariant functor” has unfortunately become deprecated. In more modern presentations, all functors are covariant, but some are covariant in an opposite category. That does make the presentation more slick, but at the cost of turning arrows around that used to represent functions and now don’t really. In my opinion, category theory would be more approachable if we got rid of all “opposite categories” and said that functors come in two flavors, covariant and contravariant, at least in introductory presentations.

Category theory for data science: cautious optimism

I’m cautiously optimistic about applications of category theory. I believe category theory has the potential to be useful, but I’m skeptical of most claims to have found useful applications. Category theory has found useful application, especially behind the scenes, but a lot of supposed applications remind me of a line from Colin McLarty:

[Jean-Pierre] Serre created a series of concise elegant tools which Grothendieck and coworkers simplified into thousands of pages of category theory.

To be fair, Grothendieck put these recast tools to good use, but most mere mortals would stand a better chance of being able to use Serre’s versions.

My interest in category theory waxes and wanes, and just as it was at its thinnest crescent phase I ran across CQL, categorical query language. I haven’t had time to look very far into it, but it seems promising. The site’s modest prose relative to the revolutionary rhetoric of some category enthusiasts makes me have more confidence that the authors may be on to something useful.

Related post: Categorical (Data Analysis) vs (Categorical Data) Analysis

How category theory is applied

Instead of asking whether an area of mathematics can be applied, it’s more useful to as how it can be applied.

Differential equations are directly and commonly applied. Ask yourself what laws govern the motion of some system, write down these laws as differential equations, then solve them. Statistical models are similarly direct: propose a model and feed it data.

Linear algebra is extremely useful in application, but it’s not often applied so directly. Rarely would you look at a system and immediately see a linear system. Instead you might describe a system in terms of differential equations or statistical models, then use linear algebra as a key part of the solution process.

Numerical analysis is useful because, for example, it helps you practically solve large linear systems (which help you solve differential equations, which model physical systems).

Many areas of math are useful in application, but some are applied more directly than others. It’s a mistake to say something is not applied just because it is not applied directly.

The following quote from Colin McLarty describes how some of the most abstract math, including cohomology and category theory, is applied.

Cohomology does not itself solve hard problems in topology or algebra. It clears away tangled multitudes of individually trivial problems. It puts the hard problems in clear relief and makes their solution possible. The same holds for category theory in general.

While McLarty speaks of applications to other areas of math, the same applies to applications to other areas such as software engineering.

I suspect many supposed applications of category theory are post hoc glosses, dressing up ideas in categorical language that were discovered in a more tangible setting. At the same time, the applications of category theory may be understated because the theory works behind the scenes. As I discuss here, category theory may lead to insights that are best communicated in the language of the original problem, not in the language of categories.

More on applying abstract math

Groups in categories

The first time I saw a reference to a “group in a category” I misread it as something in the category of groups. But that’s not what it means. Due to an unfortunately choice of terminology, “in” is more subtle than just membership in a class.

This is related to another potentially misleading term, algebraic groups, mentioned in the previous post on isogenies. An algebraic group is a “group object” in the category of algebraic varieties. Note the mysterious use of the word “in.”

You may have heard the statement “A monad is just a monoid in the category of endofunctors.” While true, the statement is meant to be a joke because it abstracted so far from what most people want to know when they ask what a monad is in the context of functional programming. But notice this follows the pattern of an X in the category of Y‘s, even though here X stands for monoid rather than group. The meaning of “in the category of” is the same.

If you want to go down this rabbit hole, you could start with the nLab article on group objects. A group object is a lot like a group, but everything is happening “in” a category.

Take a look at the list of examples. The first says that a group object in the category of sets is a group. That’s reassuring. The second example says that a group object in the category of topological spaces is a topological group. At this point you may get the idea that an X in the category of Y‘s simply adds an X structure to a Y thing. But further down you’ll see that a group object in the category of groups is an Abelian group, which is an indication that something more subtle is going on.

More category theory posts

Monads and generalized elements

Paolo Perrone gives a nice, succinct motivation for monads in the introduction to his article on probability and monads.

… a monad is like a consistent way of extending spaces to include generalized elements of a specific kind.

He develops this idea briefly, and links to his dissertation where he gives a longer exposition (pages 8–14).

Related post: Monads are hard because …

Curry-Howard-Lambek correspondence

Curry-Howard-Lambek is a set of correspondences between logic, programming, and category theory. You may have heard of the slogan proofs-as-programs or propositions-as-types. These refer to the Curry-Howard correspondence between mathematical proofs and programs. Lambek’s name is appended to the Curry-Howard correspondence to represent connections to category theory.

The term Curry-Howard isomorphism is often used but is an overstatement. Logic and programming are not isomorphic, nor is either isomorphic to category theory.

You’ll also hear the term computational trinitarianism. That may be appropriate because logic, programming, and category theory share common attributes without being identical.

The relations between logic, programming, and categories are more than analogies, but less than isomorphisms.

There are formal correspondences between parts of each theory. And aside from these rigorous connections, there is also a heuristic that something interesting in one area is likely to have an interesting counterpart in the other areas. In that sense Curry-Howard-Lambek is similar to the Langlands program relating number theory and geometry, a set of theorems and conjectures as well as a sort of philosophy.

Epi and mono

My first math classes used the terms one to one and onto to describe functions. These Germanic names have largely been replaced with their French equivalents injective and surjective.

Monic and epic are the category theory analogs of injective and surjective respectively. This post will define these terms and show how they play out in several contexts. In some categories these new terms correspond exactly to their traditional counterparts, but in others they do not.

Sets and functions

A function f from a set A to a set B is injective (one-to-one) if different points on A go to different points in B. That is, f(x) = f(y) only if xy.

The function f  is surjective (onto) if everything in B is covered. That is, for every b in B, there is some a in A such that f(a) = b.

A function is bijective if it is both injective and surjective.

Categories and morphisms

How would we generalize these definitions to category theory? This is not obvious because the rules of category theory don’t let us peek inside objects and talk about elements. Everything has to be characterized in terms of objects and morphisms.

It turns out that the categorical way to view things is to focus on when composition can be cancelled out, either on the left or on the right.

Epimorphisms

A morphism f from an object X to an object Y is an epimorphism if for any other object Z, and any pair of morphisms g1 and g2 from Y to Z, if g1 f = g2 f then g1 = g2. Instead of saying f is an epimorphism, some authors would say f is epic or f is an epi.

epimorphism diagram

If a morphism f is epic, we can cancel it out on the right.

(Unfortunately it is conventional to write function composition from right to left, like splicing in a little bit of Hebrew inside English prose. In the diagram above, f is on the left, and composition proceeds from left to right. But when we write the composition of functions we write from right to left, e.g. gf. When we say we can cancel out f on the right, we mean the right of the expression gf, not on the right of a commutative diagram!)

Monomorphisms

Monomorphism is the dual concept to epimorphism.

A morphism f from an object X to an object Y is an monomorphism if for any other object Z, and any pair of morphisms g1 and g2from Z to X, if f g1 = f g2 then g1 = g2. Instead of saying f is a monomorphism, some authors would say f is monic or f is a mono.

monomorphism diagramIf a morphism f is monic, we can cancel it out on the left, in the sense of fg as explained above.

Examples

In any category, an iso is epic and monic. That is, an isomorphism is always an epimorphism and a monomorphism. The converse holds in some categories but not in others.

Sets and groups

In the category of sets, a function f from X to Y is an epimorphism iff (if an only if) it is surjective.

Also in the category of sets, a function is a monomorphism iff it is injective.

Groups are similar in that a group homomorphism is an epimorphism iff it surjective, and a monomorphism iff it is injective.

Monoids

Recall that a monoid is like a group, except you don’t necessarily have inverses. That is, you have an associative operation and an identity element.

Let N be the non-negative integers with addition and let Z be the integers. Let f be the inclusion map that takes N into Z. Then f is injective and a monomorphism. It is not surjective, but it is an epimorphism because homomorphism from the integers to another monoid is determined by its values on the positive integers. Its range leaves out half the codomain, but we can still cancel it out on the right.

So here we have a function that is epic and monic, but not an isomorphism.

Rings

A ring homomorphism is a monomorphism iff it is injective. But as with monoids, a ring homomorphism can be an epimorphism without being surjective.

For example, consider the inclusion map from the integers to the rationals. A ring homomorphism is determined by its values on integers, so the inclusion map is an epimorphism even though it is not surjective.

Incidentally, homomorphism between fields is monic.

More category theory posts

Currying in calculus, PDEs, programming, and categories

logician Haskell Brooks Curry

Currying is a simple but useful idea. (It’s named after logician Haskell Curry [1] and has nothing to do with spicy cuisine.) If you have a function of two variables, you can think of it as a function of one variable that returns a function of one variable. So starting with a function f(x, y), we can think of this as a function that takes a number x and returns a function f(x, -) of y. The dash is a placeholder as in this recent post.

Calculus: Fubini’s theorem

If you’ve taken calculus then you saw this in the context of Fubini’s theorem:

\int_a^b\!\int_c^d f(x, y) \, dx \, dy= \int_a^b\left(\int_c^d f(x, y)\, dx\right )\,dy

To integrate the function of two variables f(x, y), you can temporarily fix y and integrate the remaining function of x. This gives you a number, the value of an integral, for each y, so it’s a function of y. Integrate that function, and you have the value of the original function of two variables.

The first time you see this you may think it’s a definition, but it’s not. You can define the integral on the left directly, and it will equal the result of the two nested integrations on the right. Or at least the two sides will often be equal. The conditions on Fubini’s theorem tell you exactly when the two sides are equal.

PDEs: Evolution equations

A more sophisticated version of the same trick occurs in partial differential equations. If you have an evolution equation, a PDE for a function on one time variable and several space variables, you can think of it as an ODE via currying. For each time value t, you get a function of the spatial variables. So you can think of your solution as a path in a space of functions. The spatial derivatives specify an operator on that space of functions.

(I’m glossing over details here because spelling everything out would take a lot of writing, and might obscure the big idea, which relevant for this post. If you’d like the full story, you can see, for example, my graduate advisor’s book. It was out of print when I studied it, but now it’s a cheap Dover paperback.)

Haskell programming

In the Haskell programming language (also named after Haskell Curry) you get currying for free. In fact, there’s no other way to express a function of two variables. For example, suppose you want to implement the function f(xy) = x² + y.

    Prelude> f x y = x**2 + y

Then Haskell thinks of this as a function of one variable (i.e. x), that returns a function of one variable (i.e. f(x, -)) which itself returns a number (i.e. f(x, y)). You can see this by asking the REPL for the type of f:

    Prelude> :info f
    f :: Floating a => a -> a -> a

Technically, Haskell, just like lambda calculus, only has functions of one variable. You could create a product datatype consisting of a pair of variables and have your function take that as an argument, but it’s still a function on one variable, though that variable is not atomic.

Category theory

The way you’d formalize currying in category theory is to say that the following is a natural isomorphism:

\mathrm{Hom}(A \times B, C) \cong \mathrm{Hom}(A, \mathrm{Hom}(B, C))

For more on what Hom means, see this post.

Related posts

[1] In concordance with Stigler’s law of eponymy, currying was not introduced by Curry but Gottlob Frege. It was then developed by Moses Schönfinkel and developed further by Haskell Curry.