Moore-Penrose pseudoinverse is not an adjoint

The Moore-Penrose pseudoinverse of a matrix is a way of coming up with something like an inverse for a matrix that doesn’t have an inverse. If a matrix does have an inverse, then the pseudoinverse is in fact the inverse. The Moore-Penrose pseudoinverse is also called a generalized inverse for this reason: it’s not just like an inverse, it actually is an inverse when that’s possible.

Given an m by n matrix A, the Moore-Penrose pseudoinverse A+ is the unique n by m matrix satisfying four conditions:

  1. A A+ A = A
  2. A+ A A+ = A+
  3. (A A+)* = A A+
  4. (A+ A)* = A+ A

The first equation says that A A+ is a left identity for A, and A+A is a right identity for A.

The second equation says A+A is a left identity for A+, and A A+ is a right identity for A+.

The third and fourth equations say that A A+ and A+A are Hermitian.

If A is invertible, A A+ and A+A are both the identity matrix. Otherwise A A+ and A+A act an awful lot like the identity, as much as you could expect, maybe a little more than you’d expect.
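Here is a minimal sketch that checks the four conditions exactly on one small example. It relies on an assumption that is not part of the definition above: for a real matrix of rank one, the pseudoinverse is the transpose divided by the sum of the squares of the entries. The helper names (matmul, transpose, pinv_rank_one) are made up for this illustration, and a general-purpose routine would use the singular value decomposition instead.

```python
from fractions import Fraction

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*X)]

def pinv_rank_one(A):
    """Pseudoinverse of a real rank-one matrix (assumption: A has rank exactly one)."""
    s = sum(Fraction(a) ** 2 for row in A for a in row)
    return [[Fraction(a) / s for a in row] for row in transpose(A)]

A = [[1, 2], [2, 4], [3, 6]]   # 3 by 2 with rank one, so it has no inverse
Ap = pinv_rank_one(A)          # 2 by 3

AAp = matmul(A, Ap)            # 3 by 3
ApA = matmul(Ap, A)            # 2 by 2

assert matmul(AAp, A) == A      # 1. A A+ A = A
assert matmul(ApA, Ap) == Ap    # 2. A+ A A+ = A+
assert transpose(AAp) == AAp    # 3. A A+ is symmetric (Hermitian for a real matrix)
assert transpose(ApA) == ApA    # 4. A+ A is symmetric

# A A+ is not the identity matrix, even though it acts like one on A.
identity3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
assert AAp != identity3
```

Using exact fractions rather than floating point means the equalities can be tested with == and no tolerance.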

Update: See this post for the relationship between the singular value decomposition and pseudoinverses, and how to compute both in Python and Mathematica.

Galois connections and adjoints

John Baez recently wrote that a Galois connection, a kind of categorical adjunction, is

“the best approximation to reversing a computation that can’t be reversed.”

That sounds like a pseudoinverse! And the first two equations defining a pseudoinverse look a lot like things you’ll see in the context of adjunctions, so the pseudoinverse must be an adjunction, right?

The question was raised on MathOverflow and Michal R. Przybylek answered

I do not think the concept of Moore-Penrose Inverse and the concept of categorical adjunction have much in common (except they both try to generalise the concept of inverse) …

and gave several reasons why. (Emphasis added.)

Too bad. It would have made a good connection. Applied mathematicians are likely to be familiar with Moore-Penrose pseudoinverses but not categorical adjoints. And pure mathematicians, depending on their interests, may be more familiar with adjoint functors than matrix pseudoinverses.

So what about John Baez’ comment? His comment was expository (and very helpful) but not meant to be rigorous. To make it rigorous you’d have to be rigorous about what you mean by “best approximation” etc. And when you define your terms carefully, in the language of category theory, you get adjoints. This means that the Moore-Penrose inverse, despite its many nice properties [1], doesn’t mesh well with categorical definitions. It’s not the best approximate inverse from a categorical perspective because it doesn’t compose well, and category theory values composition above all else. The Moore-Penrose pseudoinverse may be the best approximate inverse from some perspectives, but not from a categorical perspective.

Przybylek explains

… adjunctions compose … but Moore-Penrose pseudoinverses—generally—do not. … pseudoinverses are not stable under isomorphisms, thus are not categorical.

That’s the gist of his final point. Now let me fill in and expand slightly part of what I cut out.

If f: A → B is left adjoint to f+: B → A and g: B → C is left adjoint to g+: C → B, then the composition gf: A → C is left adjoint to the composition f+g+: C → A, but Moore-Penrose pseudoinverses do not compose this way in general.
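A small counterexample makes the failure concrete. The sketch below is my own illustration, not something taken from the MathOverflow answer. It writes F and G for two matrices playing the roles of f and g, and it again assumes the rank-one shortcut for the pseudoinverse (transpose divided by the sum of squared entries), which applies because F, G, and GF all have rank one.

```python
from fractions import Fraction

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*X)]

def pinv_rank_one(A):
    """Pseudoinverse of a real rank-one matrix (assumption: A has rank exactly one)."""
    s = sum(Fraction(a) ** 2 for row in A for a in row)
    return [[Fraction(a) / s for a in row] for row in transpose(A)]

F = [[1], [1]]    # 2 by 1: a map from R^1 to R^2
G = [[1, 0]]      # 1 by 2: a map from R^2 to R^1
GF = matmul(G, F) # 1 by 1: the composition

pinv_GF = pinv_rank_one(GF)                                     # (GF)+
product_of_pinvs = matmul(pinv_rank_one(F), pinv_rank_one(G))   # F+ G+

# If pseudoinverses composed the way adjoints do, these would be equal.
assert pinv_GF != product_of_pinvs
```

Here GF is the 1 by 1 matrix [1], which is its own inverse, while F+ G+ works out to [1/2].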

This turns out to be an interesting example, but not of what I first expected. Rather than the pseudoinverse of a matrix being an example of an adjoint, it is an example of something that despite having convenient properties does not compose well from a categorical perspective.

[1] The book Matrix Mathematics devotes about 40 pages to stating theorems about the Moore-Penrose pseudoinverse.

Categorical Data Analysis

Categorical data analysis could mean a couple different things. One is analyzing data that falls into unordered categories (e.g. red, green, and blue) rather than numerical values (e.g. height in centimeters).

Another is using category theory to assist with the analysis of data. Here “category” means something more sophisticated than a list of items you might choose from in a drop-down menu. Instead we’re talking about applied category theory.

So we have ((categorical data) analysis) and (categorical (data analysis)), i.e. analyzing categorical data and categorically analyzing data. The former is far, far more common.

I ran across Alan Agresti’s classic book the other day in a used book store. The image below is from the third (2012) edition. The book store had the 1st (1990) edition with a more austere cover.

I bought Agresti’s book because it’s a good reference to have. But I was a little disappointed. My first thought was that someone had written a book on category theory and statistics, which is not the case, as far as I know.

The main reference for category theory and statistics is Peter McCullagh’s 2002 paper What is a statistical model? That paper raised a lot of interesting ideas, but the statistics community did not take McCullagh’s bait.

commutative diagram for statistical models

Maybe this just wasn’t a fruitful idea. I suspect it is a fruitful idea, but the number of people available to develop it, conversant in both statistics and category theory, is very small. I’ve seen category theory used in mathematical modeling more generally, but not in statistics per se.

At its most basic, category theory asks you to be explicit about the domain and range (codomain) of functions. It would be very helpful if statisticians merely did this. Statistical notation is notoriously bad at revealing where a function goes from and to, or even when a function is a function. Just 0th level category theory, defining categories, would be useful. Maybe it would be useful to go on to identifying limits or adjoints, but simply being explicit about “from” and “to” would be a good start.

Category theory is far too abstract to completely carry out a statistical analysis. But it can prompt you to ask questions that check whether your model has any inconsistencies you hadn’t noticed. The idea of a “categorical error” doesn’t differ that much moving from its philosophical meaning under Aristotle to its mathematical meaning under Mac Lane. Nor does the idea of something being “natural.” One of the primary motivations for creating category theory was to come up with a rigorous definition of what it means for something in math to be “natural.”

Natural transformations

The ladder of abstractions in category theory starts with categories, then functors, then natural transformations. Unfortunately, natural transformations don’t seem very natural when you first see the definition. This is ironic since the original motivation for developing category theory was to formalize the intuitive notion of a transformation being “natural.” Historically, functors were defined in order to define natural transformations, and categories were defined in order to define functors, just the opposite of the order in which they are introduced now.

A category is a collection of objects and arrows between objects. Usually these “arrows” are functions, but in general they don’t have to be.

A functor maps a category to another category. Since a category consists of objects and arrows, a functor maps objects to objects and arrows to arrows.

A natural transformation maps functors to functors. Sounds reasonable, but what does that mean?

You can think of a functor as a way to create a picture of one category inside another. Suppose you have some category and pick out two objects in that category, A and B, and suppose there is an arrow f between A and B. Then a functor F would take A and B and give you objects FA and FB in another category, and an arrow Ff between FA and FB. You could do the same with another functor G. So the objects A and B and the arrow between them in the first category have counterparts under the functors F and G in the new category as in the two diagrams below.

A natural transformation α between F and G is something that connects these two diagrams into one diagram that commutes.

The natural transformation α is a collection of arrows in the new category, one for every object in the original category. So we have an arrow αA for the object A and another arrow αB for the object B. These arrows are called the components of α at A and B respectively.

Note that the components of α depend on the objects A and B but not on the arrow f. If f represents any other arrow from A to B in the original category, the same arrows αA and αB fill in the diagram.

Natural transformations are meant to capture the idea that a transformation is “natural” in the sense of not depending on any arbitrary choices. If a transformation does depend on arbitrary choices, the arrows αA and αB would not be reusable but would have to change when f changes.
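Programmers may find a loose analogy helpful here. The sketch below is an illustration under stated assumptions, not a rigorous construction: it treats Python types as objects and functions as arrows, takes F to be the list functor and G to be an “optional” functor represented as a tuple of length zero or one, and uses a made-up function safe_head as the natural transformation. The commuting square says that mapping f and then taking the head gives the same result as taking the head and then mapping f.

```python
def fmap_list(f, xs):
    """Action of the list functor F on an arrow f."""
    return [f(x) for x in xs]

def fmap_maybe(f, m):
    """Action of the optional functor G on an arrow f (m is () or a 1-tuple)."""
    return tuple(f(x) for x in m)

def safe_head(xs):
    """Component of the transformation: first element if there is one, else nothing."""
    return tuple(xs[:1])

def naturality_holds(f, xs):
    """Check that the square commutes for one arrow f and one list xs."""
    return safe_head(fmap_list(f, xs)) == fmap_maybe(f, safe_head(xs))

samples = [[], [3], [1, 2, 3]]
arrows = [str, lambda n: n * n, lambda n: n % 2 == 0]

# safe_head never looks at f, so the same component works for every arrow.
all_commute = all(naturality_holds(f, xs) for f in arrows for xs in samples)
assert all_commute
```

The definition of safe_head makes no reference to f at all, which is the code-level counterpart of the components depending on the objects but not on the arrow.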

The next post will discuss the canonical examples of natural and unnatural transformations.

Related: Applied category theory

Tidying up trivial details

The following quote gives a good description of the value of abstract mathematics. The quote speaks specifically of “universal algebra,” but consistent with the spirit of the quote you could generalize it to other areas of mathematics, especially areas such as category theory.

Universal algebra is the study of features common to familiar algebraic systems … [It] places the algebraic notions in their proper setting; it often reveals connexions between seemingly different concepts and helps to systemize one’s thoughts. … [T]his approach does not usually solve the whole problem for us, but only tidies up a mass of rather trivial detail, allowing us to concentrate our powers on the hard core of the problem.

Emphasis added. Source: Universal Algebra by P. M. Cohn

Related: Applied category theory

Category Theory and Facebook

From Drew Armstrong’s notes on adjoint functors:

Once upon a time, my opinion of category theory was the same as my opinion of Facebook: if I ignore it for long enough, hopefully it will go away. It is now my educated opinion that category theory will not go away, and in fact the language of category theory will continue to spread until it becomes the default foundation of mathematics.

Turning math inside-out

Here’s one of the things about category theory that takes a while to get used to.

Mathematical objects are usually defined internally. For example, the Cartesian product P of two sets A and B is defined to be the set of all ordered pairs (a, b) where a comes from A and b comes from B. The definition of P depends on the elements of A and B but it does not depend on any other sets.

Category theory turns this inside-out. Operations such as taking products are not defined in terms of elements of objects. Category theory makes no use of elements or subobjects [1]. It defines things by how they act, not their inner workings. People often stress what category theory does not depend on, but they less often stress what it does depend on. The definition of the product of two objects in any category depends on all objects in that category: The definition of the product of objects A and B contains the phrase “such that for any other object X …” [More on categorical products].

The payoff for this inside-out approach to products is that you can say something simultaneously about everything that acts like a product, whether it’s products of sets, products of fields (i.e. that they don’t exist), products of groups, etc. You can’t say something valid across multiple categories if you depend on details unique to one category.

This isn’t unique to products. Universal properties are everywhere. That is, you see definitions containing “such that for any other object X …” all the time. In this sense, category theory is extremely non-local. The definition of a widget often depends on all widgets.

There’s a symmetry here. Traditional definitions depend on the internal workings of objects, but only on the objects themselves. There are no third parties involved in the definition. Categorical definitions have zero dependence on internal workings, but depend on the behavior of everything in the category. There are an infinite number of third parties involved! [2] You can have a definition that requires complete internal knowledge but zero external knowledge, or a definition that requires zero internal knowledge and an infinite amount of external knowledge.

Related: Applied category theory

* * *

[1] Category theory does have notions analogous to elements and subsets, but they are defined the same way everything else is in category theory, in terms of objects and morphisms, not by appealing to the inner structure of objects.

[2] You can have a category with a finite number of objects, but usually categories are infinite. In fact, they are usually so large that they are “classes” of objects rather than sets.

Category theory and Koine Greek

Fragment of the Gospel of John in Greek

When I was in college, I sat in on a communication workshop for Latin American preachers. This was unusual since I’m neither Latin American nor a preacher, but I’m glad I was there.

I learned several things in that workshop that I’ve used ever since. For example, when you’re gesturing about something moving forward in time, move your hand from left to right from the audience’s perspective. Since English speakers (and for the audience of this workshop, Spanish speakers) read from left to right, we think of time progressing from left to right. If you see someone talking about time moving forward, but you see motion from right to left, you feel a subtle cognitive dissonance. (Presumably you should reverse this when speaking to an audience whose primary language is Hebrew or Arabic.)

Another lesson from that workshop, the one I want to focus on here, is that you don’t always need to convey how you arrived at an idea. Specifically, the leader of the workshop said that if you discover something interesting from reading the New Testament in Greek, you can usually present your point persuasively using the text in your audience’s language without appealing to Greek. This isn’t always possible—you may need to explore the meaning of a Greek word or two—but you can use Greek for your personal study without necessarily sharing it publicly. The point isn’t to hide anything, only to consider your audience. In a room full of Greek scholars, bring out the Greek.

This story came up in a recent conversation with Brent Yorgey about category theory. You might discover something via category theory but then share it without discussing category theory. If your audience is well versed in category theory, then go ahead and bring out your categories. But otherwise your audience might be bored or intimidated, as many people would be listening to an argument based on the finer points of Koine Greek grammar. Microsoft’s LINQ software, for example, was inspired by category theory principles, but you’d be hard pressed to find any reference to this because most programmers don’t want to know or need to know where it came from. They just want to know how to use it.

Some things may sound profound when expressed in esoteric language, such as category theory or Koine Greek, that don’t seem so profound in more down-to-earth language. Expressing yourself in a different language helps filter out pedantry from useful ideas. (On the other hand, some things that looked like pure pedantry have turned out to be very useful. Some hairs are worth splitting.)

Sometimes you have to introduce a new term because there isn’t a colloquial counterpart. Monads are a good example, a concept from category theory that has entered software development. A monad is what it is, and analogies to burritos and other foods don’t really help. Better to introduce the term and say plainly what it is.

More on applied category theory

New Twitter account for functional programming and categories

I’m starting a new Twitter account @FunctorFact for functional programming and category theory.

These two subjects have a lot of overlap, and some tweets will combine both, but many will be strictly about one or the other. So some content will be strictly about programming, some strictly about math, and some combining ideas from both.

Categorical products

Introduction

There’s an odd sort of partisan spirit to discussions of category theory. They often have the flavor of “Category theory is great!” or “Category theory is a horrible waste of time!” You don’t see this sort of partisanship around, say, probability. Probability theory is what it is, and if you need it, you use it. If you don’t need it, you don’t use it. I think of category theory in a similar way. It’s good for some things and not for others.

In this post I’ll look at just one little piece of category theory, the definition of products, and use it to give a flavor of category theory in general.

Initial objections

The first time I saw category theory’s definition of a product I thought it was a bizarre complication. “The product of A and B is an object P such that for any other object X  …”

What is this X doing in our definition? It’s not our product, nor is it one of the things we’re taking the product of.  And why introduce a diagram? Is the product of two mathematical objects a picture?! Why not come out and say what a product is rather than saying what it does? It’s just ordered pairs, right?

Category theory is all about how things behave rather than what they’re made of inside. So you could say that talking about pairs of elements violates the rules of the game. But that raises the question of why play this game at all. What do we get in return for placing such severe and unusual restrictions on ourselves?

The answer is that we get to see broader connections. When we focus on behavior rather than internal composition, we can see that two things behave the same even though they look different inside. Software developers should be familiar with this idea: depend on interface rather than implementation.

Definition

OK, so what is this mysterious definition of product? It’s a mouthful, but we’ll explain why it has to be what it is.

Given two objects A and B in some category, a product of A and B is an object P in that category and a pair of morphisms π1: P → A and π2: P → B such that for every object X with morphisms f1: X → A and f2: X → B, there exists a unique morphism f: X → P that makes the following diagram commute, i.e. π1 f = f1 and π2 f = f2.

Commutative diagram for categorical product
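To see the definition at work in the most familiar setting, here is a small brute-force sketch in the category of finite sets. The particular sets A, B, X and the maps f1, f2 are arbitrary choices made for illustration. The code enumerates every function from X to the set of ordered pairs and confirms that exactly one of them makes the diagram commute.

```python
from itertools import product

A = [0, 1]
B = ["a", "b"]
P = list(product(A, B))   # candidate product: the ordered pairs

def pi1(p):
    """Projection P -> A."""
    return p[0]

def pi2(p):
    """Projection P -> B."""
    return p[1]

X = [0, 1, 2]
f1 = {0: 0, 1: 1, 2: 1}          # an arbitrary function X -> A
f2 = {0: "a", 1: "a", 2: "b"}    # an arbitrary function X -> B

# Every function X -> P, each represented as a dictionary.
candidates = [dict(zip(X, images)) for images in product(P, repeat=len(X))]

# Keep the functions f with pi1(f(x)) == f1(x) and pi2(f(x)) == f2(x) for all x.
commuting = [
    f for f in candidates
    if all(pi1(f[x]) == f1[x] and pi2(f[x]) == f2[x] for x in X)
]

assert len(candidates) == 64
assert len(commuting) == 1   # existence and uniqueness of the mediating map
```

The one surviving function sends x to the pair (f1(x), f2(x)), as you would expect.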

Whew! That’s a lot more work than saying a product is the set of ordered pairs (a, b) with a from A and b from B. And it’s not the first definition of product a student should see. However, there are three reasons why it’s worth introducing later:

  1. The ordered pair definition is not complete.
  2. The categorical definition is not as complex as it seems.
  3. The categorical definition makes new connections visible.

Why not ordered pairs

Saying “a product is just ordered pairs” isn’t enough. You have to say how the product relates to the things it’s a product of. In the case of a Cartesian product of sets, the projections are so obvious that it’s hard to realize they’re even there, but in general they need to be specified.

Another reason the ordered pair definition isn’t complete is that you need to say how the product is structured. If you’re taking the product of groups, for example, then you have to say how the group operation is defined on these ordered pairs. (There’s more than one way to do this. See here.) Or if you’re taking the product of two topological spaces, then you have to say what the topology is on this set whose points are the ordered pairs.

The categorical definition doesn’t tell you how to construct a product, but it tells you how to know when you’ve found something that works. That’s the trade-off: in order to have a theory that exposes wider connections, it can’t be tied to a specific example. Whether that’s an acceptable trade-off depends on your aim.

To reach further with our theory, we have to look at how things behave rather than how they are constructed. So how does a product behave? It lets you take components: here’s the first component, here’s the second. That’s about it. The categorical definition formalizes this in terms of projections, and it says that this is a universal property of products: anything else that acts like a product factors uniquely through the product.

In general you can’t just say products are ordered pairs. Sometimes products are not pairs, and sometimes pairs are not products. So the ordered pair definition doesn’t always apply. And when it does apply, it keeps us from seeing how products relate to coproducts, limits, and other operations.

When products are not pairs

Here’s an example of a product that’s not a pair. A partially ordered set can be viewed as a category. The elements of the set are the objects of the category, and there is a morphism from a to b if a ≤ b. In that case the product of a and b, when it exists, is their greatest lower bound a ∧ b, which in a totally ordered set is simply their minimum.
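The following sketch checks this on two finite examples of my own choosing: the integers 1 through 30 under the usual order, where the product should be the minimum, and the same integers ordered by divisibility, where the greatest lower bound is the greatest common divisor. The function is_product is a hypothetical helper that tests the universal property directly. Uniqueness of the mediating arrow is automatic because a poset has at most one arrow between any two objects.

```python
from math import gcd

def is_product(p, a, b, leq, objects):
    """Test the universal property of a product in a poset viewed as a category."""
    # Projections: arrows p -> a and p -> b must exist.
    has_projections = leq(p, a) and leq(p, b)
    # Universality: any x with arrows to a and b must have an arrow to p.
    is_universal = all(leq(x, p) for x in objects if leq(x, a) and leq(x, b))
    return has_projections and is_universal

objects = range(1, 31)

def usual(a, b):
    """Usual order: an arrow a -> b when a <= b."""
    return a <= b

def divides(a, b):
    """Divisibility order: an arrow a -> b when a divides b."""
    return b % a == 0

min_is_product = all(
    is_product(min(a, b), a, b, usual, objects) for a in objects for b in objects
)
gcd_is_product = all(
    is_product(gcd(a, b), a, b, divides, objects) for a in objects for b in objects
)
assert min_is_product and gcd_is_product

# Searching for products of 12 and 18 under divisibility finds only their gcd.
products_of_12_and_18 = [p for p in objects if is_product(p, 12, 18, divides, objects)]
```

In neither case is the product an ordered pair. It is a single element of the original set.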

When pairs are not products

Here’s an example of a pair that’s not a product. The category of fields does not generally have products. You can form ordered pairs of elements from two fields, but you can’t always define any operation on these pairs that will turn them into a field.

For example, the number of elements in a finite field must be a power of a prime. If you take a field of order 5 and a field of order 7, there are 35 ordered pairs of elements, but there is no field of order 35.
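A short sketch can illustrate both halves of this argument. The counting half checks that 35 is not a prime power. The second half tries one natural candidate, componentwise arithmetic on the pairs, and shows that it produces nonzero elements whose product is zero, which cannot happen in a field. The componentwise operation is just one attempt chosen for illustration; it is the counting argument that rules out every possible operation.

```python
def is_prime_power(n):
    """Return True if n equals p**k for some prime p and integer k >= 1."""
    if n < 2:
        return False
    p = next(d for d in range(2, n + 1) if n % d == 0)  # smallest prime factor
    while n % p == 0:
        n //= p
    return n == 1

assert is_prime_power(5) and is_prime_power(7)
assert not is_prime_power(35)   # so there is no field with 35 elements

# Ordered pairs from the integers mod 5 and the integers mod 7.
pairs = [(a, b) for a in range(5) for b in range(7)]
zero = (0, 0)

def mul(x, y):
    """Componentwise multiplication, mod 5 in the first slot and mod 7 in the second."""
    return ((x[0] * y[0]) % 5, (x[1] * y[1]) % 7)

# Nonzero elements that multiply with some other nonzero element to give zero.
zero_divisors = [
    x for x in pairs
    if x != zero and any(mul(x, y) == zero for y in pairs if y != zero)
]

assert mul((1, 0), (0, 1)) == zero
assert len(zero_divisors) > 0   # a field has no zero divisors
```

The pairs with componentwise operations do form a perfectly good ring. They just don’t form a field.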

But is it worth it?

The categorical definition of products is difficult to understand. It’s analogous to the δ-ε definition of limits: not the first thing you think of, but the rigorous definition that will generalize well into new situations.

Abstraction should follow experience, not precede it. You need to have multiple examples of products in your mind before you see any advantage to abstracting the idea of a product.

So what does the abstraction buy you? Maybe nothing! It depends on what you’re after. One thing it might do for you is help you to be more consistent. Programming language designers, for example, use category theory to make languages more consistent and easier to think about. A language might want to handle various kinds of products uniformly, even when the products look very different at first. In addition to helping designers consistently implement what they should, category theory might guide them not to implement what they shouldn’t. For example, above we said that it doesn’t make sense in general to take the product of two fields.

Category theory also suggests new questions. For example, duality is pervasive throughout category theory. For every concept, there’s a co-concept. So once you identify a product in some context, it’s natural to ask what coproducts are, and these tend to be less obvious than products. And going back to consistency, category theory might guide you to handle dual concepts in a dual manner.
