Enriched categories

We begin with a couple examples.

First, the set of linear transformations from one vector space to another is itself a vector space.

Second, the set of continuous linear operators from one Banach space to another is itself a Banach space. Or maybe better, this set can be made into a Banach space.

In the first example, it’s pretty obvious how to add linear transformations and how to multiply a linear transformation by a scalar.

The second is a little more involved. Banach spaces are vector spaces, but with more structure. They have a norm, and the space is complete with respect to the topology defined by that norm. So while it’s obvious that the set of continuous linear operators between two Banach spaces is a vector space, it’s not quite obvious that it is in fact a Banach space. The latter requires that we define a norm on this space of continuous operators, and that we prove that this new space is complete. That’s why I said the set “can be made into” a Banach space: some construction is required.

The fancy way to describe these examples is to say that they are both examples of a category enriched over itself. A category is “enriched” if the set of morphisms [1] between two objects can be given the structure of an object in some category with more structure than the category of sets. This new category need not be the same as the one you started out with, but it can be.

If the sets of morphisms between objects in a category C have the structure of objects in a category D, then we say C is enriched over D. If C = D, then we say the category is enriched over itself. The category of vector spaces with linear transformations is enriched over itself, as is the category of Banach spaces with continuous linear operators.

This post was motivated by a recent comment that said

Another categorical difference between working with groups and working with abelian groups is that “the category of abelian groups is enriched over itself” — in plainer language, between two groups G and H there’s a *set* of homomorphisms from G to H, but if G and H are abelian then this set of homomorphisms has the structure of an *abelian group* as well!

The proof that the set of homomorphisms between two Abelian groups forms an Abelian group is very simple. We show that if we define the addition of homomorphisms f and g element-wise, the result is a homomorphism.

\begin{align*} (f + g)(x + y) &\equiv f(x + y) + g(x + y) \\ &= f(x) + f(y) + g(x) + g(y) \\ &= f(x) + g(x) + f(y) + g(y) \\ &\equiv (f + g)(x) + (f + g)(y) \end{align*}

The critical step is the third line where we swap the order of f(y) and g(x). That’s where we use the fact that we’re working with Abelian groups.
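Here’s a small Python sketch, my own illustration rather than anything from the post, that checks this element-wise construction on the Abelian group Z12, where every homomorphism is multiplication by a fixed constant mod 12:

```python
# A sketch checking the proof above on Z_12, where every homomorphism
# is multiplication by a fixed constant mod 12.
n = 12

def hom(k):
    """The homomorphism x -> k*x mod n on the cyclic group Z_n."""
    return lambda x: (k * x) % n

f, g = hom(3), hom(5)
h = lambda x: (f(x) + g(x)) % n   # f + g defined element-wise

# (f+g)(x+y) should equal (f+g)(x) + (f+g)(y) for all x, y
assert all(h((x + y) % n) == (h(x) + h(y)) % n
           for x in range(n) for y in range(n))
print("f + g is again a homomorphism")
```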

Denoting the group operation by + implies by convention that we’re working with Abelian groups; it goes against convention to use + to denote anything that isn’t commutative. But if our group operation were not commutative, the proof above would be invalid. And not only would the proof be invalid, the theorem would be false. There’s no way to salvage the theorem with a different proof. The set of homomorphisms between two general groups may not be a group.

[1] Think of morphisms as structure-preserving functions. Linear transformations preserve the structure of vector spaces. When our objects have more structure, the morphisms are more restrictive. We wouldn’t want to just consider linear maps between Banach spaces because arbitrary linear maps don’t preserve the topology of the spaces. Instead we look at continuous linear maps. In general morphisms don’t have to be functions; they just have to behave like them, i.e. satisfy the axioms that were motivated by structure-preserving functions.

p-norm trig functions and “squigonometry”

This is the fourth post in a series on generalizations of sine and cosine.

The first post looked at defining sine as the inverse of the inverse sine. The reason for this unusual approach is that the inverse sine is given in terms of an arc length and an integral. We can generalize sine by generalizing this arc length and/or generalizing the integral.

The first post mentioned that you could generalize the inverse sine by replacing “2” with “p” in an integral. Specifically, the function

F_p(x) = \int_0^x (1 - |t|^p)^{-1/p} \,dt

is the inverse sine when p = 2 and in general is the inverse of the function sin_p. Unfortunately, there are two different ways to define sin_p. We next present a generalization that includes both definitions as special cases.

Edmunds, Gurka, and Lang [1] define the function

F_{p,q}(x) = \int_0^x (1 - t^q)^{-1/p} \,dt

and define sin_{p,q} to be its inverse.
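As a quick sketch of how one might compute these functions numerically (my own illustration, not from the paper; the function names are mine), we can evaluate F_{p,q} by quadrature and invert it with root finding:

```python
# A sketch: evaluate F_{p,q} by quadrature and invert it by root finding.
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

def F(x, p, q):
    """F_{p,q}(x) = integral from 0 to x of (1 - t^q)^(-1/p) dt."""
    return quad(lambda t: (1 - t**q)**(-1/p), 0, x)[0]

def sin_pq(y, p, q):
    """The inverse of F_{p,q}, i.e. sin_{p,q}, found by root finding.
    We stop just short of 1 to avoid the integrand's singularity."""
    return brentq(lambda x: F(x, p, q) - y, 0, 1 - 1e-12)

# Sanity check: sin_{2,2} is the ordinary sine.
print(sin_pq(0.5, 2, 2), np.sin(0.5))
```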

The definition of sin_p at the top of the post corresponds to sin_{p,q} with p = q in the definition of Edmunds et al.

The other definition, and the one we’ll use for the rest of the post, corresponds to sin_{r,s} where s = p and r = p/(p − 1), the conjugate exponent of p. (Note that when p = 2 the conjugate exponent is also 2, so the two definitions agree there.)

This second definition of sin_p has a geometric interpretation analogous to that in the previous post for hyperbolic functions [2]. That is, we start at (1, 0) and move counterclockwise along the p-norm circle until we sweep out an area of α/2. When we have swept out that much area, we are at the point (cos_p α, sin_p α).

When p = 4, the p-norm circle is also known as a “squircle,” and the p-norm sine and cosine analogs are sometimes placed under the heading “squigonometry.”


[1] David E. Edmunds, Petr Gurka, Jan Lang. Properties of generalized trigonometric functions. Journal of Approximation Theory 164 (2012) 47–56.

[2] Chebolu et al. Trigonometric functions in the p-norm. https://arxiv.org/abs/2109.14036

Geometric derivation of hyperbolic trig functions

This is the third post in a series on generalizing sine and cosine.

The previous post looked at a generalization of the sine and cosine functions that comes from replacing a circle with a lemniscate, a curve that looks like a figure eight. This post looks at replacing the circle with a hyperbola.

On the unit circle, an arc of length θ starting at (1, 0) and running counterclockwise ends at (cos θ, sin θ), and this can be used to define the two trig functions. The lemniscate functions use an analogous approach, transferring arc length to a new curve.

Hyperbolic functions do not do this. Instead, they generalize a different property of the circle. Again we start out at (1, 0) and move counterclockwise around the unit circle, but this time we look at the area of the sector rather than the length of the arc. The sector that ends at (cos θ, sin θ) has area θ/2. We could define sine and cosine by this relation: cos α and sin α are the x and y coordinates of where we stop when we’ve swept out an area of α/2. This would be an awkward way to define sine and cosine, but it generalizes.

Start at (1, 0) and move along the hyperbola x² – y² = 1 in the first quadrant until the area bounded by the hyperbola, the x-axis, and a line from the origin to your location (x, y) is α/2. Then the x and y coordinates of the place where you stop are cosh α and sinh α respectively.
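Here’s a quick numerical check of this area property, a sketch of my own rather than anything from the post: the sector ending at (cosh t, sinh t) is the triangle under the ray from the origin minus the region under the hyperbola, and its area should come out to t/2.

```python
# Check numerically that the sector ending at (cosh t, sinh t) has area t/2.
import numpy as np
from scipy.integrate import quad

def sector_area(t):
    x, y = np.cosh(t), np.sinh(t)
    # triangle under the ray from the origin, minus the region under the hyperbola
    under_hyperbola, _ = quad(lambda u: np.sqrt(u*u - 1), 1, x)
    return 0.5 * x * y - under_hyperbola

for t in [0.5, 1.0, 2.0]:
    print(t/2, sector_area(t))   # the two columns should agree
```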

This is interesting in its own right, but it also ties together the first post in this series and the next, because it is the area generalization rather than the arc length generalization that gives a geometric interpretation to the functions sin_p and cos_p.

Lemniscate functions

In the previous post I said that you could define the inverse sine as the function that gives the arc length along a circle, then define sine to be the inverse of the inverse sine. The purpose of such a backward definition is that it generalizes to other curves besides the circle. For example, it generalizes to the lemniscate, a curve studied by Bernoulli.

The lemniscate in rectangular coordinates satisfies

(x^2 + y^2)^2 = x^2 - y^2

and in polar coordinates

r^2 = \cos 2\theta

The function arcsl(x), analogous to arcsin(x), is defined as the length of the arc along the lemniscate from the origin to the point at distance x from the origin. The length of the arc from that point to the vertex (1, 0) is arccl(x).

\begin{align*} \mbox{arcsl}(x) &= \int_0^x \frac{dt}{\sqrt{1 - t^4}} \\ \mbox{arccl}(x) &= \int_x^1 \frac{dt}{\sqrt{1 - t^4}} \\ \end{align*}

The lemniscate sine, sl, is the inverse of arcsl, and the lemniscate cosine, cl, is the inverse of arccl. These functions were first studied by Giulio Fagnano three centuries ago.

The lemniscate functions sl and cl are elliptic functions, and so they have a lot of nice properties and satisfy a lot of identities. See Wikipedia, for example. Update: see this follow-up post on addition theorems.

Lemniscate constant

As mentioned in the previous post, generalizations of the sine and cosine functions have corresponding generalizations of π.

Just as the period of sine and cosine is 2π, the period of lemniscate sine and lemniscate cosine is 2ϖ.

The number ϖ is called the lemniscate constant. It is written with the Unicode character U+03D6, GREEK PI SYMBOL, a variant form of pi. The LaTeX command is \upvarpi.

The lemniscate constant ϖ is related to Gauss’ constant G by ϖ = πG.

The area of the unit squircle is √2 ϖ.

There is also a connection to the beta function: 2ϖ = B(1/4, 1/2).
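These identities are easy to check numerically. Here’s a sketch of my own (not from the post) that computes ϖ three ways: from the arc length integral 2 arcsl(1), from Gauss’ constant via the arithmetic-geometric mean, and from the beta function.

```python
# Compute the lemniscate constant three ways; all three should agree.
import numpy as np
from scipy.integrate import quad
from scipy.special import beta

# 1. From the arc length integral: varpi = 2 * arcsl(1)
half_varpi, _ = quad(lambda t: (1 - t**4)**(-0.5), 0, 1)

# 2. From Gauss' constant G = 1/agm(1, sqrt(2)), using varpi = pi * G
a, b = 1.0, np.sqrt(2.0)
for _ in range(8):                 # arithmetic-geometric mean iteration
    a, b = (a + b) / 2, np.sqrt(a * b)

# 3. From the beta function: 2 * varpi = B(1/4, 1/2)
print(2 * half_varpi, np.pi / a, beta(0.25, 0.5) / 2)
```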

Generalized trigonometry

In a recent post I mentioned in passing that trigonometry can be generalized from functions associated with a circle to functions associated with other curves. This post will go into that a little further.

The equation of the unit circle is

x^2 + y^2 = 1

and so in the first quadrant

y = \sqrt{1 - x^2}

The length of an arc from (1, 0) to (cos θ, sin θ) is θ. If we write the arc length as an integral we have

\int_0^{\sin \theta} (1 -t^2)^{-1/2} \,dt = \theta

and so

F(x) = \int_0^x (1 - t^2)^{-1/2} \,dt

is the inverse sine of x. Sine is the inverse of the inverse of sine, so we could define the sine function to be the inverse of F.

This would be a complicated way to define the sine function, but it suggests ways to create variations on sine: take the length of an arc along a curve other than the circle, and call the inverse of this function a new kind of sine. Or tinker with the integral defining F, whether or not the resulting integral corresponds to the length along a familiar curve, and use that to define a generalized sine.

Example: sin_p

We can replace the 2’s in the integral above with p’s, defining F_p as

F_p(x) = \int_0^x (1 - |t|^p)^{-1/p} \,dt

and defining sin_p to be the inverse of F_p. When p = 2, sin_p(x) = sin(x). This idea goes back to E. Lundberg in 1879.

The function sin_p has its applications. For example, just as the sine function is an eigenfunction of the Laplacian, sin_p is an eigenfunction of the p-Laplacian.

We can extend sin_p to be a periodic function with period 4F_p(1). The constants π_p are defined as 2F_p(1) so that sin_p has period 2π_p and π_2 = π.
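As a sanity check, here’s a short sketch of mine (not from the post; the function name is my own) computing π_p = 2F_p(1) numerically. The p = 2 case recovers the ordinary π.

```python
# Compute pi_p = 2 F_p(1) numerically.
import numpy as np
from scipy.integrate import quad

def pi_p(p):
    value, _ = quad(lambda t: (1 - abs(t)**p)**(-1/p), 0, 1)
    return 2 * value

print(pi_p(2), np.pi)   # pi_2 recovers the ordinary pi
print(pi_p(4))          # the p = 4 analog of pi
```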

Future posts

I intend to explore several generalizations of sine and cosine. What happens if you replace a circle with an ellipse or a hyperbola? Or a squircle? How do these variations on sine and cosine compare to the originals? Do they satisfy analogous identities? How do they appear in applications? I’d like to address some of these questions in future posts.

From graph theory to category theory

Let G be a directed graph whose nodes are the positive integers and whose edges represent relations between two integers. In our first example we’ll draw an edge from x to y if x is a multiple of y. In our second example we’ll draw an edge from x to y if x ≥ y.

In both examples we define a function p(x, y) to be the unique node such that whenever a node z has directed edges going to x and to y, there is also a directed edge from z to p(x, y).

Multiplication example

In this example there will be an edge from x to y if (and only if) x is a multiple of y. So, for instance, there is an edge from every even number to 2. There are edges from 15 to 1, 3, 5, and 15.

Now suppose there is some node with edges to 6 and 7. Call this node p(6, 7) or just p. Then p must be some multiple of 6 and 7. Also, by our definition of p we know that if there is an edge from any node z to 6 and 7, there must also be an edge from z to p. This says every multiple of 6 and 7 must also be a multiple of p. So p = 42. The node labeled z could be 4200, for example, but p can only be 42.

To generalize from this example, the node p(x, y) is the least common multiple of x and y.

Order example

In this example there will be an edge from x to y if and only if x ≥ y. Every positive integer points to itself and to every smaller integer.

Now what would p(x, y) be? It’s something no less than x and no less than y. And by the definition of p, every number that is no less than both x and y is also no less than p(x, y). That is, p(x, y) is the smallest integer no less than both x and y, i.e. the maximum of x and y.
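Both examples can be checked by brute force. Here’s a small sketch of my own (not from the post) that finds p(x, y) directly from the defining property, over a finite range of nodes. The range has to be large enough to contain the answer.

```python
# Find p(x, y) directly from its defining property, over nodes 1..N.
N = 100

def product_node(x, y, edge):
    # nodes with edges to both x and y
    candidates = [p for p in range(1, N + 1) if edge(p, x) and edge(p, y)]
    # p(x, y) is the candidate that every candidate has an edge to
    for p in candidates:
        if all(edge(z, p) for z in candidates):
            return p

multiple_of = lambda x, y: x % y == 0   # edge from x to y if x is a multiple of y
geq = lambda x, y: x >= y               # edge from x to y if x >= y

print(product_node(6, 7, multiple_of))  # 42, the least common multiple
print(product_node(6, 7, geq))          # 7, the maximum
```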

The reveal

The integer p(x, y) is the product of x and y in the sense of category theory. It may also be the product in the basic sense of multiplying two numbers together, but it might not be. The definition in terms of nodes and edges generalizes the notion of product, so that the familiar product is an example, the canonical example, but not the only example.

The category theory notion of a product abstracts something that multiplication and maximum have in common. More on this here.

We could go back and define a function c(x, y) by replacing “to” with “from” in the definition of p. That is, c(x, y) is the unique node such that whenever a node z has directed edges coming from x and from y, there is also a directed edge to z coming from c(x, y). The function c is the coproduct.

Emphasizing the edges

In category theory, definitions, such as the definition of product and coproduct, depend not just on the objects but on the morphisms between them. In graph theory language, definitions depend not just on the nodes but also on the edges. Keep the same objects but define different morphisms and you get different products, as we did above.

Often there is a standard set of morphisms (edges), so standard that they are often left implicit. That’s usually OK, but sometimes the morphisms need to be emphasized, either because they are not the usual morphisms or because we need to stress some property of the morphisms. Morphisms are typically structure-preserving functions, and we may need to emphasize the structure-preserving part.

Test functions

Test functions are how you can make sense of functions that aren’t really functions.

The canonical example is the Dirac delta “function” that is infinite at the origin, zero everywhere else, and integrates to 1. That description is contradictory: a function that is 0 almost everywhere integrates to 0, even if you work in extended real numbers where a function can take on the value ∞.

You can make things like the delta function rigorous by saying they’re not functions of real numbers, but functions that operate on other functions, i.e. test functions. More on that here. These functions acting on test functions are called generalized functions or distributions. (See this post for how this kind of distribution differs from a probability distribution, but is analogous.)

Analogy with test charges

To say rigorously how these generalized functions behave you show how they act on test functions φ. Test functions are analogous to test charges: you can describe an electric field by saying what force it would exert on a test charge.

A test charge is somewhat idealized. It has to be so small that it tests a field without affecting it. This isn’t really possible, but you can think of it as a limit. You’re looking at the limit of the force per unit charge as the charge goes to zero.

Similarly, a test function is ideal in that it is very well behaved, so the generalized functions that act on it can be badly behaved. Test functions are infinitely differentiable, and they either have compact support or have extremely thin tails, depending on context.
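To make this concrete, here’s a small sketch of my own (not from the post): one way to realize the delta function is as the limit of narrower and narrower unit-mass Gaussians, and applying them to a smooth bump test function φ recovers φ(0) in the limit.

```python
# The delta function as a limit of unit-mass Gaussians applied to a test function.
import numpy as np
from scipy.integrate import quad

def phi(x):
    """A classic test function: infinitely differentiable, supported on [-1, 1]."""
    return np.exp(-1 / (1 - x**2)) if abs(x) < 1 else 0.0

def delta_n(phi, n):
    """Integrate phi against a Gaussian of width ~1/n and total mass 1."""
    kernel = lambda x: n / np.sqrt(np.pi) * np.exp(-(n * x)**2)
    return quad(lambda x: kernel(x) * phi(x), -1, 1, points=[0.0])[0]

for n in [1, 10, 100]:
    print(delta_n(phi, n))   # converges to phi(0) = exp(-1) = 0.3678...
```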

Analogy with category theory

While writing the previous post I thought about an analogy between distribution theory and category theory. I worked with distribution theory a lot in grad school, and I found it natural that the definition of a distribution depended on how it acted in relation to something else, i.e. how it acted on all test functions.

But I found category definitions that involved extraneous objects puzzling. For example, the product of two objects is a third object such that for any fourth object (!) a certain diagram commutes. Why is this superfluous object injecting itself into the definition? If I’d thought of it as a test object then I would have found the definition more palatable.

As with distribution theory, you’re defining something by how it relates to all elements of some collection. But in distribution theory, your distributions and your test functions are very distinct things. In category theory, your test objects are peers of the thing you’re testing.

Groups vs Abelian groups: Pedantic or profound?

This article will probably only be of interest to a small number of readers. Those unfamiliar with category theory may find it bewildering, and those well versed in category theory may find it trivial. My hope is that someone in between, someone just starting to get a handle on category theory, will find it helpful. I wish I’d run across an article like this when I was in school.

My confusion

As a student I was confused by the inordinate stress on the distinction between general groups and Abelian groups. This seemed like a very simple thing that was being overemphasized. Does multiplication commute or not? If it does, you have an Abelian group; otherwise you do not. That’s all. And yet my professor seemed to think something deep was going on.

What I didn’t appreciate at the time is that there is something deep going on, not when you look at individual groups but when you look at kinds of groups collectively. That is, the category of general groups is quite different from the category of Abelian groups. This distinction was totally lost on me at the time.

Clarifying example

I ran across an exercise recently that pinpoints what I was missing. The exercise asks the reader to show that the product of two cyclic groups is a coproduct in the category of Abelian groups but it is not a coproduct in the category of groups.

Wrong perspective

Here’s how I would have thought about the problem at the time. The coproduct of two cyclic groups is their direct sum, and that’s the same group as the product. The coproduct is an Abelian group, so it’s a group, so it’s in the category of groups. So the statement in the exercise is wrong!

The exercise wasn’t wrong; the thinking above is wrong. But it’s wrong in a very subtle way.

In my mind, a category was a label that you put on things after you’ve done your calculations. This is a bear, it’s brown, so it’s a brown bear. What’s hard about that? What I was missing was the idea of a category as a working context, not just a classification label.

Right perspective

Products and coproducts are defined in the context of a category. That’s what I was missing. In my mind, the coproduct of two groups was defined in some operational way. But what I thought of as a definition was a theorem. The definition depends on the context of a category, the category of Abelian groups, and the thing defined in that context turns out to have the operational properties that I took to be the definition.

You can’t just carry out some calculation and ask what category your result lies in, because the definition of what you’re calculating depends on the context of the category.

In category theory, products and coproducts are defined by universal properties. The (co)product of two objects A and B in a category is defined by saying that something holds for every object C in the category. (More on this here.)

In the category of Abelian groups, we’re saying that something happens for every Abelian group, but not necessarily for every group. That’s why a coproduct in the category of Abelian groups may not be a coproduct in the category of all groups.
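To make the difference concrete, here’s a small sketch of my own (not from the post). In the category of all groups the coproduct of two groups is their free product, where words in the generators simplify only via the groups’ own relations. In the free product of Z₂ with Z₂ the element ab has infinite order, while every element of the direct product Z₂ × Z₂ has order at most 2, so the direct product can’t be the coproduct in the category of groups.

```python
# Elements of the free product Z_2 * Z_2 are alternating words in 'a' and 'b'.
# Multiplication concatenates and cancels adjacent equal letters (a^2 = b^2 = e).
def mult(u, v):
    word = list(u)
    for ch in v:
        if word and word[-1] == ch:
            word.pop()           # cancel, since each generator squares to e
        else:
            word.append(ch)
    return "".join(word)

w = ""
for k in range(1, 5):
    w = mult(w, "ab")
    print(k, w or "e")   # ab, abab, ababab, ... never the identity

# So ab has infinite order in the coproduct of Z_2 and Z_2 in Grp,
# while every element of the direct product Z_2 x Z_2 has order at most 2.
```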

Supereggs, squigonometry, and squircles

The Depths of Wikipedia Twitter account posted a screenshot about supereggs that’s popular at the moment. It says

there’s no way this is real. they must be making these words up

above a screenshot from the Wikipedia article on supereggs saying

The definition can be changed to have an equality rather than an inequality; this changes the superegg to being a surface of revolution rather than a solid.

I assume the Twitter account is having fun, not seriously suggesting that the terms are made up.

The terms “superegg” and “squircle” are whimsical but have been around for decades and have precise meanings. I hadn’t heard of “squigonometry,” but there are many variations on trigonometry that replace a circle with another curve, the best known example being hyperbolic trigonometry.

The equation for the volume of the superegg looked familiar but not quite right. It turns out the definition of superegg is not quite what I thought it was.

Brass superegg by Piet Hein

Piet Hein coined the terms superellipse and superegg. The photo above is a brass superegg made by Piet Hein [1].

A superellipse is what mathematicians more commonly call a p-norm ball in two dimensions. I assumed that a superegg was a p-norm ball in three dimensions, but it’s not quite.

A unit p-norm ball in 3 dimensions has equation

|x|^p + |y|^p + |z|^p = 1

A superegg, however, has equation

\left(\sqrt{x^2 + y^2}\right)^p + |z|^p = 1

If you slice a p-norm ball horizontally or vertically you get another p-norm ball. So in three dimensions, either a vertical or horizontal slice gives you a superellipse.

But a horizontal slice of a superegg is a circle while a vertical slice is a superellipse, which is not a circle unless p = 2. Said another way, supereggs are symmetric about the z-axis but p-norm balls are not.
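One consequence of the circular horizontal slices: you can compute the volume of a superegg by stacking disks, which wouldn’t work for a p-norm ball. Here’s a numerical sketch of my own (not from the post) for the unstretched unit superegg.

```python
# Volume of the unstretched unit superegg by stacking circular slices.
import numpy as np
from scipy.integrate import quad

def superegg_volume(p):
    # At height z the slice is a circle of radius (1 - |z|^p)^(1/p),
    # from the equation sqrt(x^2 + y^2)^p + |z|^p = 1.
    r = lambda z: (1 - abs(z)**p)**(1 / p)
    return quad(lambda z: np.pi * r(z)**2, -1, 1)[0]

print(superegg_volume(2), 4 * np.pi / 3)   # p = 2 gives the unit sphere
print(superegg_volume(4))                  # a typical superegg exponent
```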

I’ve left out one detail: superellipses and supereggs typically stretch one of the axes. So you’d replace x with x/k in the definition of a superellipse or replace z with z/k in the definition of a superegg. A squircle is a superellipse with the two axes treated equally, and typically p is set to 4 or a value near 4.


[1] Photo by Malene Thyssen, licensed under the Creative Commons Attribution-Share Alike 3.0 Unported license.

Corny AI

Clippy

Meredith Whittaker posted on Twitter that

In addition to being the best in privacy, Signal is also the best in not subjecting you to corny ‘AI’ features no one asked for or wants.

I love the phrase “corny AI.” That’s exactly what a lot of AI features are.

“Would you like help composing that tweet?”

“No thank you, I can write tiny text messages by myself.”

AI is the new Clippy.

I’m sure someone will object that these are early days, and AI applications will get better. That’s probably true, but they’re corny today. Inserting gimmicky, annoying technology now on the basis that future versions might be useful is like serving someone unripe fruit.

“This banana is hard and bitter!”

“Yes, but you should enjoy it because it will be soft and sweet a few days from now.”

Of course not all AI is corny. For example, GPS has become reliable and unobtrusive. But there’s a rush to add AI just so a company can say their product includes AI. If the AI worked really well, the company would brag about how well the software works, not what technology it uses.