This post looks at how to partition complexity between definitions and theorems, and why it’s useful to be able to partition things more than one way.

## Quadratic equations

Imagine the following dialog in an algebra class.

“Quadratic equations always have two roots.”

“But what about (x − 5)² = 0? That just has one root, x = 5.”

“Well, the 5 counts twice.”
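Here is a minimal sketch of what "counts twice" means computationally. The function name `quadratic_roots` is just an illustrative helper, not anything from a library: it applies the quadratic formula over the complex numbers and always returns two values, which coincide when the discriminant is zero.

```python
import cmath

def quadratic_roots(a, b, c):
    """Return both roots of a*x**2 + b*x + c = 0, counted with multiplicity."""
    if a == 0:
        raise ValueError("not a quadratic: leading coefficient is zero")
    # cmath.sqrt handles negative discriminants by returning a complex number.
    root_of_disc = cmath.sqrt(b * b - 4 * a * c)
    return ((-b + root_of_disc) / (2 * a), (-b - root_of_disc) / (2 * a))

# (x - 5)^2 = x^2 - 10x + 25: the formula yields the root 5 twice.
double_root = quadratic_roots(1, -10, 25)
print(double_root)
```

The formula never returns "one root." It returns the same root in both slots, which is one way to read "the 5 counts twice."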

## Bézout’s theorem

Here’s a more advanced variation on the same theme.

“A curve of degree m and a curve of degree n intersect in mn places. That’s Bézout’s theorem.”

“What about the parabola y = (x − 5)² and the line y = 0? They intersect at one point, not two points.”

“The point of intersection has multiplicity two.”

“That sounds familiar. I think we talked about that before.”

“What about the parabola y = x² + 1 and the line y = 0? They don’t intersect at all.”

“You have to look at complex numbers. They intersect at x = i and x = −i.”

“Oh, OK. But what about the line y = 5 and the line y = 6? They don’t intersect, even for complex numbers.”

“They intersect at the point at infinity.”

In order to make the statement of Bézout’s theorem simple you have to establish a context that depends on complex definitions. Technically, you have to work in complex projective space.
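To make "the point at infinity" a little more concrete, here is a rough sketch using homogeneous coordinates. It assumes the standard convention that a line a·x + b·y + c·z = 0 is stored as the triple (a, b, c), that the affine plane is the part where z = 1, and that the common point of two distinct lines is the cross product of their coefficient triples. The helper names `line_intersection` and `describe` are made up for illustration.

```python
from fractions import Fraction

def line_intersection(line1, line2):
    """Intersect two projective lines given as (a, b, c) for a*x + b*y + c*z = 0.

    The common point [x : y : z] is the cross product of the coefficient triples.
    """
    a1, b1, c1 = line1
    a2, b2, c2 = line2
    point = (b1 * c2 - c1 * b2, c1 * a2 - a1 * c2, a1 * b2 - b1 * a2)
    if point == (0, 0, 0):
        raise ValueError("the two triples describe the same line")
    return point

def describe(point):
    """Return affine coordinates, or a label if the point lies at infinity (z = 0)."""
    x, y, z = point
    if z == 0:
        return "point at infinity"
    return (Fraction(x, z), Fraction(y, z))

y_equals_5 = (0, 1, -5)  # y - 5z = 0
y_equals_6 = (0, 1, -6)  # y - 6z = 0
x_equals_6 = (1, 0, -6)  # x - 6z = 0

parallel = line_intersection(y_equals_5, y_equals_6)
crossing = line_intersection(y_equals_5, x_equals_6)
print(parallel, describe(parallel))
print(crossing, describe(crossing))
```

Under these assumptions the parallel lines y = 5 and y = 6 meet at exactly one point, [1 : 0 : 0], which has z = 0 and so lies at infinity. The non-parallel lines y = 5 and x = 6 also meet at exactly one point, the ordinary point (6, 5). Either way the count is 1 · 1 = 1.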

## Definitions and theorems

Michael Spivak says in the preface to his book *Calculus on Manifolds*:

> … the proof of [Stokes’] theorem is … an utter triviality. On the other hand, even the statement of this triviality cannot be understood without a horde of definitions …
>
> There are good reasons why the theorems should all be easy and the definitions hard.

There are good reasons, *for the mathematician*, to make the theorems easy and the definitions hard. But for students, there may be good reasons to do the opposite.

Here math faces a tension that programming languages (and spoken languages) face: how to strike a balance between the needs of novices and the needs of experts.

In my opinion, math should be taught bottom-up, starting with simple definitions and hard theorems, valuing transparency over elegance. Then, motivated by the complication of doing things the naive way, you go back and say “In light of what we now know, let’s go back and define things differently.”

It’s tempting to think you can save a lot of time by starting with the abstract final form of a theory rather than working up to it. While that’s *logically* true, it’s not *pedagogically* true. A few people with an unusually high abstraction tolerance can learn this way, accepting definitions without motivation or examples, but not many. And the people who do learn this way may have a hard time applying what they learn.

## Applications

Application requires moving up and down levels of abstraction, generalizing and particularizing. And particularizing is harder than it sounds. This lesson was etched into my brain by an incident I relate here. Generalization can be formulaic, but recognizing specific instances of more general patterns often requires a flash of insight.

Spivak said there are good reasons why the theorems should all be easy and the definitions hard. But I’d add there are also good reasons to remember how things were formulated with hard theorems and easy definitions.

It’s good, for example, to understand analysis at a high level as in Spivak’s book, with all the machinery of differential forms etc. and also be able to swoop down and grind out a problem like a calculus student.

Going back to Bézout’s theorem, suppose you need to find real solutions to a system of equations that amounts to finding where a quadratic and a cubic curve intersect. You have a concrete problem, then you move up to the abstract setting of Bézout’s theorem and learn that there are at most six solutions. Then you go back down to the real world (literally, as in real numbers) and find two solutions. Are there any more solutions that you’ve overlooked? You zoom back up to the abstract world of Bézout’s theorem, and find four more by considering multiplicities, infinities, and complex solutions. Then you go back down to the real world, satisfied that you’ve found all the real solutions.
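A small worked instance may help. The curves below are chosen purely for illustration and are not from any particular application: the quadratic y = x² and the cubic y = x³. They have two real intersection points, yet the Bézout count of 2 · 3 = 6 is fully accounted for once multiplicity and the point at infinity are included.

```latex
\begin{align*}
\text{Affine curves:}\quad & y = x^2, \qquad y = x^3 \\
\text{Substitute } y = x^2:\quad & x^2 - x^3 = x^2(1 - x) = 0
  \;\Rightarrow\; (0,0) \text{ with multiplicity } 2,\quad (1,1) \text{ with multiplicity } 1 \\
\text{Homogenize:}\quad & yz = x^2, \qquad yz^2 = x^3 \\
\text{At infinity } (z = 0):\quad & x = 0 \;\Rightarrow\; [0:1:0] \\
\text{Chart } y = 1,\ z = x^2:\quad & z^2 - x^3 = x^4 - x^3 = x^3(x - 1)
  \;\Rightarrow\; [0:1:0] \text{ with multiplicity } 3 \\
\text{Total:}\quad & 2 + 1 + 3 = 6 = 2 \cdot 3
\end{align*}
```

In this particular example the four "missing" intersections come from multiplicity (one extra at the origin) and from infinity (three), with no complex points needed. Other pairs of curves would split the count differently.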

A pure mathematician might climb a mountain of abstraction and spend the rest of his career there, but applied mathematicians have to go up and down the mountain routinely.

The frogurt is also cursed.

The dialogue on Bézout’s theorem is very funny, but it bothers me. I have two problems:

1. It seems clear that y = (x – 5)² is being considered a curve of degree two. That’s fine with me. But I want to say that y = 0 is a curve of degree zero. That conflicts with the fact that the two curves intersect – regardless of how many times the intersection point counts, two times zero is zero, and one (or two) is more than that.

2. This problem becomes worse in the followup example where the two curves are y = 5 and y = 6. For the earlier problem to work out, we need y = 0 to be a curve of degree one. But that implies that y = 5 and y = 6 should have only a single intersection point. For their intersection at infinity to have multiplicity two — since these two curves would appear to be identical in their degree, whatever that might be — they would need to be curves of degree √2.

What am I missing? How are we calculating the degree of a curve?

I understand why y = 0 looks like it should be called a curve of degree zero, because there’s no x. You probably implicitly have in mind the definition “the degree of a curve y = p(x) is the degree of the polynomial in x.” That’s a definition you’ll see in school.

The definition of a curve implicit here is the zero set of a polynomial in two variables, and the degree of that curve is the degree of that polynomial.

The parabola is the zero set of the second degree polynomial y – (x – 5)². The line y = 0 is the zero set of the first degree polynomial y. The lines y = 5 and y = 6 are the zero sets of the polynomials y – 5 and y – 6.
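As a quick sketch of that definition, one could store a polynomial in x and y as a dictionary mapping exponent pairs (i, j) to coefficients, so that the degree is the largest i + j among the nonzero terms. The representation and the names `degree` and `bezout_count` are assumptions made for this illustration only.

```python
def degree(poly):
    """Total degree of a polynomial stored as {(i, j): coeff} for coeff * x**i * y**j.

    Raises ValueError for the zero polynomial, which has no nonzero terms.
    """
    return max(i + j for (i, j), coeff in poly.items() if coeff != 0)

def bezout_count(p, q):
    """Number of intersections Bezout's theorem predicts, counted with multiplicity."""
    return degree(p) * degree(q)

# y - (x - 5)^2 expands to y - x^2 + 10x - 25
parabola = {(0, 1): 1, (2, 0): -1, (1, 0): 10, (0, 0): -25}
x_axis = {(0, 1): 1}               # y
line_5 = {(0, 1): 1, (0, 0): -5}   # y - 5
line_6 = {(0, 1): 1, (0, 0): -6}   # y - 6

print(degree(parabola), degree(x_axis), degree(line_5))
print(bezout_count(parabola, x_axis), bezout_count(line_5, line_6))
```

With this definition y = 0 has degree one because the polynomial y has degree one in the two variables, even though no x appears. The parabola and the line y = 0 are predicted to meet twice, and two lines are predicted to meet once.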

That explains why y = 0 has degree one. But it immediately implies that y = 5 and y = 6 also have degree one, being the zero sets of “f(x,y) = y – 5” and “f(x,y) = y – 6”. How can the intersection of those two curves have multiplicity two? Doesn’t the theorem tell us that they only have one intersection point?

(I can think of a reason why the number of intersections “should” be two – if we consider the two lines y = 5 and x = 6, those should intersect at both the point at infinity and the conventional point (6, 5). But they’re still curves of degree one, and if the number of intersections is the product of their degrees, it must also be one. Parallel lines meet at infinity; do non-parallel lines fail to do the same?)

The theorem doesn’t say that the intersection of the lines y = 5 and y = 6 at infinity has order two, but that the intersection of either line with a parabola has order two. This is hard to see without getting into the details of projective coordinates, which I may do as its own blog post.

Update: This post goes into the details of why y = 5 and y = 6 intersect at exactly one point in the projective plane.

It’s difficult for me to read the dialog in the “Bézout’s theorem” section other than as stating that the lines “y = 5” and “y = 6” have an intersection at the point at infinity, and that that intersection is of order two. Should the dialog be changed?