# The world’s sneakiest substitution

Something that seems like an isolated trick may turn out to be much more important. This is the case with a change of variables discovered by Karl Weierstrass.

Every calculus student learns a handful of standard techniques: u-substitutions, partial fractions, integration by parts, and trig substitutions. Then there is one more technique that is like something off a secret menu, something known only to the cognoscenti: Weierstrass’ t = tan(x/2) trick [1]. Michael Spivak called this “the world’s sneakiest substitution.”

This was on page 325 of the first edition of Spivak’s Calculus, page 360 of the second edition.

This innocent but apparently unmotivated change of variables has the following magical consequences:

• sin(x) = 2t/(1 + t²)
• cos(x) = (1 − t²)/(1 + t²)
• dx = 2 dt/(1 + t²).
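The identities above are easy to check numerically. Here is a short Python sketch; the third identity is checked as dx/dt = 2/(1 + t²) using a finite difference, since x = 2 arctan(t).

```python
from math import sin, cos, tan, atan, isclose

# Weierstrass substitution: t = tan(x/2), equivalently x = 2*atan(t)
x = 1.234
t = tan(x / 2)

assert isclose(sin(x), 2*t / (1 + t**2))
assert isclose(cos(x), (1 - t**2) / (1 + t**2))

# dx = 2 dt/(1 + t^2) means dx/dt = 2/(1 + t^2); check by central difference
h = 1e-6
dx_dt = (2*atan(t + h) - 2*atan(t - h)) / (2*h)
assert isclose(dx_dt, 2 / (1 + t**2), rel_tol=1e-6)
```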

This means that the trick can convert an integral containing any rational combination of trig functions into one involving only rational functions, which then can (in principle) be integrated in closed form using partial fractions. It’s the key that unlocks all trig integrals.

However, this is not as practical as it sounds. You may get a high-degree rational function, and while in theory the rational function can be integrated using partial fractions, the decomposition may be tedious if not impossible. Still, it’s interesting that a single trick reduces one large class of integration problems to another.

Now let’s leave integration behind. The equations above say that as t varies, the functions 2t/(1 + t2) and (1 – t2) / (1 + t2) take on all the values that sin(x) and cos(x) do. This means, for example, that you can draw a circle using graphics hardware that does not support sines and cosines. In theory the range of t would have to be infinite, but in practice the range would only have to be large enough to fill in the right pixels.
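For example, here is a sketch in Python of generating points on the unit circle using nothing but arithmetic (the function name and the range of t are my choices): as t varies, ((1 − t²)/(1 + t²), 2t/(1 + t²)) traces the circle.

```python
from math import isclose

def circle_points(n=2000, t_max=50.0):
    """Approximate the unit circle using only rational operations.

    t = 0 gives (1, 0); large |t| approaches (-1, 0), so a finite
    range of t fills in all but a small arc near (-1, 0)."""
    pts = []
    for i in range(n):
        t = -t_max + 2 * t_max * i / (n - 1)
        d = 1 + t * t
        pts.append(((1 - t * t) / d, 2 * t / d))
    return pts

# every generated point lies exactly on the unit circle
assert all(isclose(x*x + y*y, 1.0) for x, y in circle_points())
```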

If we can parameterize a circle with rational functions, what else can we parameterize this way? It turns out, for example, that elliptic curves do not have rational parameterizations. This question is its own niche of algebraic geometry. I don’t know the history here, but it’s plausible some of this math was motivated by wondering what else could be done along the lines of Weierstrass’ trick.

[1] When I’ve mentioned this trick before, some have told me this was a routine part of their course. That was not my experience, either as a student or as an instructor. As a freshman I learned the trick from a friend who was the best of our cohort at integration, someone who would have done well at an integration bee if we’d had one. I felt I’d been let in on a secret.

# Certifying that a system of polynomial equations has no solution

It’s usually easier to show that a problem has a solution than to show that it does not have a solution.

## Analogy with prime numbers

Showing that a number is prime amounts to saying that the problem of finding nontrivial factors has no solution. How could you convince a skeptic that a large number N is prime? If N is small enough that it is possible to divide N by every integer less than √N then that would do it, but that’s not practical if, say, N has 100 digits. Nevertheless there is a way.

Primality certificates are a way to prove that a number is prime, just as demonstrating a factor is a way to prove that a number is not prime. A primality certificate is a set of calculations a skeptic can run that will prove that the number in question is prime. Producing such a certificate takes a lot of computation, but verifying the certificate does not.
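Here is a minimal sketch of the verification side, based on Lucas’s primality test, which underlies Pratt certificates. The certificate is assumed to supply a witness a and the distinct prime factors of N − 1 (in a full Pratt certificate each factor carries its own certificate, recursively); the function names are mine.

```python
def verify_pratt(N, a, factors):
    """Verify a Pratt-style certificate for N: a witness a and the
    distinct prime factors of N - 1 (assumed certified prime themselves).

    Lucas's theorem: N is prime if a^(N-1) = 1 (mod N) and
    a^((N-1)/q) != 1 (mod N) for every prime factor q of N - 1."""
    if pow(a, N - 1, N) != 1:
        return False
    return all(pow(a, (N - 1) // q, N) != 1 for q in factors)

# Producing the certificate takes work: search for a witness.
# 997 is prime and 996 = 2^2 * 3 * 83.
N, factors = 997, [2, 3, 83]
witness = next(a for a in range(2, N) if verify_pratt(N, a, factors))

# Verifying it is cheap: three modular exponentiations plus one more.
assert verify_pratt(N, witness, factors)
```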

I wrote several articles on primality certificates at the beginning of this year. For example, here is one on Pratt certificates and here is one on elliptic curve certificates.

## Analogy with convex optimization

Claiming that you’ve found the maximum value of some function amounts to saying a certain problem has no solution, namely the problem of finding where the function takes on a larger value than the one you claim is optimal. How can you convince a skeptic that you’ve really found the maximum?

You could say “I’ve tried really hard. Here’s a list of things I’ve tried, and see, none of them give a larger value.” Sometimes that’s all you can do. But for convex optimization problems, you can produce a certificate that a skeptic could evaluate that proves that your solution is optimal. More on that here.

## Systems of polynomial equations

Now suppose you have a system of polynomial equations. If you’re able to find a solution, then the solution is its own certificate. Want to prove that a supposed solution actually is a solution? Plug it in and see.

But how do you prove that you haven’t overlooked any possibilities and there simply isn’t a solution to be found? Once again there is a kind of certificate.

Suppose you have a system of m polynomial equations f1 = 0, f2 = 0, …, fm = 0 in n complex variables.

The system of equations has a solution if and only if the algebraic variety generated by the f's is non-empty.

If we can find a polynomial combination of the f's that equals 1, i.e. we can find g1 through gm such that

g1f1 + g2f2 + ⋯ + gmfm = 1,

then the variety generated by the f's is empty. Such a polynomial combination would be a certificate that the system of equations has no common solution, a certificate of infeasibility. There is an algorithm for finding this polynomial combination, the extended Buchberger algorithm, an analog of the extended Euclidean algorithm.
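As a toy example (mine, not from the discussion above): the system f1 = x = 0, f2 = xy − 1 = 0 has no common solution, and g1 = y, g2 = −1 certify this, since y·x − (xy − 1) = 1 identically. A sketch in Python spot-checking the certificate at random integer points:

```python
from random import randint

def f1(x, y): return x
def f2(x, y): return x * y - 1

# certificate of infeasibility: g1*f1 + g2*f2 = 1 identically
def g1(x, y): return y
def g2(x, y): return -1

# spot-check the polynomial identity at random points
for _ in range(100):
    x, y = randint(-50, 50), randint(-50, 50)
    assert g1(x, y) * f1(x, y) + g2(x, y) * f2(x, y) == 1
```

A spot check is not a proof, but verifying the identity symbolically is a routine expansion; the point is that the certificate makes infeasibility checkable.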

There is a corresponding theorem for polynomials over the real numbers, the so-called Positivstellensatz, but it is a little more complicated. The name “complex numbers” is unfortunate because passing to “complex” numbers often simplifies problems.

# An algebraic superegg

One year ago I wrote about a variant of the squircle that is quantitatively close to the customary definition but that has nicer algebraic properties.

That post used the term p-squircle for the usual squircle with equation

|x|^p + |y|^p = 1

where p > 2, and the term s-squircle for the variation with equation

x² + y² = s²x²y² + 1

where s is between 0 and 1. When p = 2 or s = 0 we get a circle. As p goes to infinity or s goes to 1 we get a square.

Now the superegg is formed by rotating a squircle about an axis. To match the orientation in the posts I’ve written about supereggs, replace y with z above and rotate about the z-axis. We will also introduce a scaling factor h, dividing z by h.

The superegg analog of the s-squircle would have equation

x² + y² + (z/h)² = s²(x² + y²)(z/h)² + 1

The s-superegg is much easier to work with (for some tasks) than the p-superegg with equation

(x² + y²)^(p/2) + |z/h|^p = 1
because the former is a polynomial equation and the latter is transcendental (unless p is an even integer). The s-superegg is an algebraic variety while the p-superegg is not in general.

Here’s a plot with s = 0.9 and h = 1.

This plot was made with the following Mathematica code.

    ContourPlot3D[
      x^2 + y^2 + z^2 == 0.9^2 (x^2 + y^2) z^2 + 1,
      {x, -1, 1}, {y, -1, 1}, {z, -1, 1}]

Yesterday I said that the p-superegg is stable for all p > 2 because the curvature at the bottom is 0. This means the center of curvature is at infinity, which is above the center of mass.

The s-superegg has positive curvature at the bottom for all 0 < s < 1 and so the center of curvature is finite. As s increases, the superegg becomes more like a cylinder: the radius of curvature at the bottom, and with it the height of the center of curvature, increases without bound. So the s-superegg will be stable for sufficiently large s, once the center of curvature rises above the center of mass.

I calculate the center of curvature in the next post.

# Nonlinear algebra

What is nonlinear algebra?

Negations are tricky. They may be the largest source of bugs in database queries. You have to think carefully about what exactly you are negating.

Any time you see “non-” attached to something, you have to ask what the context is in which the negation takes place. For example, if you hear someone say “non polar bear,” do they mean a (non-polar) bear, i.e. any bear besides a polar bear, such as a grizzly bear? Or do they mean a non-(polar bear), i.e. anything that isn’t a polar bear, such as a giraffe or an espresso machine?

Another subtlety is whether “non” means “definitely not” or “not necessarily.” I specialized in nonlinear PDEs in grad school, and “nonlinear” always meant “not necessarily linear.” I don’t recall ever seeing a theorem that required an operator to not be linear, only theorems that did not require operators to necessarily be linear.

With that introduction, what could nonlinear algebra mean? To many people, it would mean working with equations that contain nonlinear terms, such as solving sin(x) = 2x. The sine function is certainly nonlinear, but it’s not the kind of function algebraists are usually interested in.

To a large extent, the algebraist’s universe consists of polynomials. In that context, “nonlinear” means “polynomials that are not necessarily linear.” So things that are nonlinear in the sense that x³ is nonlinear, not in the sense that sin(x) is nonlinear.

If you think of linear algebra as the math surrounding the solution of systems of linear equations, nonlinear algebra is the math surrounding the solution of systems of polynomial equations. Note that this uses “non” in the “not necessarily” sense: polynomials that are not necessarily linear. Linearity is not assumed, but it’s not excluded either.

If you’re familiar with algebraic geometry, you might be thinking “isn’t that just algebraic geometry?” Strictly speaking the answer is yes, but there’s a difference in connotation if not denotation.

Nonlinear algebra, at least how the term has been used recently, means the application-motivated study of ways to solve polynomial equations. As a recent book review puts it,

Nonlinear algebra’s focus is on computation and applications, and the theoretical results that need to be developed accordingly. Michałek and Sturmfels explain that this name is not just a rebranding of algebraic geometry but that it is intended to capture this focus, and to be more friendly to applied mathematicians, questioning the historic boundaries between pure and applied mathematics.

I like the term nonlinear algebra. Algebraic geometry is a vast subject, and there needs to be a term that distinguishes solving polynomial equations for applications from studying some of the most esoteric math out there. One could argue that the latter is ultimately the proper theory of the former, but a lot of people don’t have the time or abstraction tolerance for that.

The term nonlinear algebra may be better than applied algebraic geometry because the latter could imply that the goal is to master the entire edifice of algebraic geometry and look for applications. Nonlinear algebra suggests math that is as immediately useful for solving nonlinear equations as linear algebra is for solving linear equations.

# Rational Trigonometry

Rational trigonometry is a very different way of looking at geometry. At its core are two key ideas. First, instead of distance, do all your calculations in terms of quadrance, which is distance squared. Second, instead of using angles to measure the separation between lines, use spread, which turns out to be the square of the sine of the angle [1].

What’s the point of these two changes? Quadrance and spread can be computed purely algebraically in terms of point coordinates. No square roots, no sines, and no cosines.

The word “rational” in rational trigonometry enters in two ways. The spread of two lines is a rational function of their coordinate descriptions, i.e. the ratio of polynomials. Also, if the coordinates of a set of points are rational numbers, so are the quadrances and spreads.
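To make this concrete, here is a short Python sketch (the names are mine) using exact rational arithmetic. The spread formula used, s = (u₁v₂ − u₂v₁)² / ((u₁² + u₂²)(v₁² + v₂²)) for lines with direction vectors u and v, is the standard planar formula.

```python
from fractions import Fraction as F

def quadrance(A, B):
    """Quadrance: the squared distance between points A and B.
    No square root required."""
    return (B[0] - A[0])**2 + (B[1] - A[1])**2

def spread(u, v):
    """Spread between lines with direction vectors u and v: the square
    of the sine of the angle, computed with no trig functions."""
    cross = u[0]*v[1] - u[1]*v[0]
    return F(cross**2, (u[0]**2 + u[1]**2) * (v[0]**2 + v[1]**2))

# rational inputs give exact rational outputs
assert quadrance((0, 0), (3, 4)) == 25
assert spread((1, 0), (0, 1)) == 1        # perpendicular lines
assert spread((1, 1), (2, 2)) == 0        # parallel lines
assert spread((1, 0), (1, 1)) == F(1, 2)  # 45 degrees: sine squared is 1/2
```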

I became interested in rational trigonometry when I was working on a project to formally verify drone collision avoidance software. Because all calculations are purely algebraic, involving no transcendental functions, theorems are easier to prove.

Rational trigonometry is such a departure from the conventional approach—trig without trig functions!—you have to ask whether it has advantages rather than simply being novel for the sake of being novel. I mentioned one advantage above, namely easier formal theorem proving, but this is a fairly esoteric application.

Another advantage is accuracy. Because a computer can carry out rational arithmetic exactly, rational trigonometry applied to rational points yields exact results. No need to be concerned about the complications of floating point numbers.

A less obvious advantage is that the theory of rational trigonometry is independent of the underlying field [2]. You can work over the rational numbers, but you don’t need to. You could work over real or complex numbers, or even finite fields. Because you don’t take square roots, you can work over fields that don’t necessarily have square roots. If you’re working with integers modulo a prime, half of the nonzero numbers have no square root and the other half have two square roots. More on that here.

Why would you want to do geometry over finite fields? Finite fields are important in applications: error-correcting codes, cryptography, signal processing, combinatorics, etc. And by thinking of problems involving finite fields as geometry problems, you can carry your highly developed intuition for plane geometry into a less familiar setting. You can forget for a while that you’re working over a finite field and proceed as you ordinarily would if you were working over real numbers, then check your work to make sure you didn’t do anything that is only valid over real numbers.
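For instance, the spread computation above carries over to GF(p) verbatim, with a modular inverse taking the place of division. A sketch, under the assumption that neither line is null (i.e. the denominator is nonzero mod p):

```python
def spread_mod_p(u, v, p):
    """Spread of two lines over GF(p), p an odd prime. Division is
    replaced by a modular inverse via Fermat's little theorem:
    pow(x, p - 2, p) is the inverse of x mod p.
    Assumes neither line is null (denominator nonzero mod p)."""
    cross = (u[0]*v[1] - u[1]*v[0]) % p
    denom = ((u[0]**2 + u[1]**2) * (v[0]**2 + v[1]**2)) % p
    return cross**2 * pow(denom, p - 2, p) % p

# familiar facts survive the change of field:
assert spread_mod_p((1, 0), (0, 1), 7) == 1  # perpendicular lines: spread 1
assert spread_mod_p((1, 3), (2, 6), 7) == 0  # parallel lines: spread 0
```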

[1] Expressing spread in terms of sine gives a correspondence between rational and classical trigonometry, but it is not a definition. Spread is defined as the ratio of certain quadrances. It is not necessary to first compute a sine, and indeed rational trigonometry extends to contexts where it is not possible to define a sine.

[2] Rational trig is not entirely independent of the field. It requires a field of characteristic other than 2, i.e. one in which 1 + 1 ≠ 0. This rules out finite fields of order 2ⁿ.

# Why determinants with columns of ones?

Geometric equations often involve a determinant with a column of 1s. For example, the equation of a line through two points (x1, y1) and (x2, y2):

    | x    y    1 |
    | x1   y1   1 | = 0
    | x2   y2   1 |

The equation of a circle through three points:

    | x² + y²     x    y    1 |
    | x1² + y1²   x1   y1   1 |
    | x2² + y2²   x2   y2   1 | = 0
    | x3² + y3²   x3   y3   1 |

The equation of a general conic section through five points:

    | x²    xy     y²    x    y    1 |
    | x1²   x1y1   y1²   x1   y1   1 |
    | x2²   x2y2   y2²   x2   y2   1 |
    | x3²   x3y3   y3²   x3   y3   1 | = 0
    | x4²   x4y4   y4²   x4   y4   1 |
    | x5²   x5y5   y5²   x5   y5   1 |

The equation of a Möbius (bilinear) transformation sending z1, z2, and z3 to w1, w2, and w3:

    | wz     z    w    1 |
    | w1z1   z1   w1   1 |
    | w2z2   z2   w2   1 | = 0
    | w3z3   z3   w3   1 |
Why all the determinants and why all the 1s?

When you see a determinant equal to zero, you immediately think of matrix rows or columns being linearly dependent. But in the examples above it isn’t the Cartesian coordinates that are linearly dependent but projective coordinates that are dependent.

The 1s are in the last column as a clue to where they came from, though they need not be: you could permute the rows and columns any way you like and the determinant would still be zero. They appear in the last column because you can turn Cartesian coordinates into projective coordinates by appending a 1.

This 1 is sort of a silent partner, and can be ignored much of the time. But the last projective coordinate is critical when it’s necessary to be rigorous about points at infinity. The examples above are interesting because they are an application of homogeneous coordinates when there’s no concern about points at infinity.
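As a quick illustration of the first example, the line through two points: a point P lies on the line through A and B exactly when the 3×3 determinant of their homogeneous coordinates vanishes. A small self-contained sketch:

```python
def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    a, b, c = M[0]
    d, e, f = M[1]
    g, h, i = M[2]
    return a*(e*i - f*h) - b*(d*i - f*g) + c*(d*h - e*g)

def on_line(P, A, B):
    """P is on the line through A and B iff the determinant of the
    matrix of homogeneous coordinates (Cartesian coordinates with a
    1 appended) is zero."""
    return det3([[P[0], P[1], 1],
                 [A[0], A[1], 1],
                 [B[0], B[1], 1]]) == 0

assert on_line((2, 2), (0, 0), (5, 5))      # on the line y = x
assert not on_line((2, 3), (0, 0), (5, 5))  # off the line
```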

# When a cubic or quartic has a double root

Thanks to the quadratic formula, it’s easy to tell whether a quadratic equation has a double root. The equation

ax² + bx + c = 0

has a double root if and only if the discriminant

b² − 4ac
is zero.

The discriminant of a cubic is much less known, and the analogs for higher order polynomials are unheard of. There is a discriminant for polynomials of all degrees, though the complexity of the discriminant grows quickly with the degree of the polynomial.

This post will derive the discriminant of a cubic and a quartic.

## Resultants

The resultant of a pair of polynomials is zero if and only if the two polynomials have a common root. Resultants have come up in a couple previous posts about solving trigonometric equations.

A polynomial p(x) has a double root if p and its derivative p′ are both zero somewhere. The discriminant of p is, up to a constant factor, the resultant of p and p′.

The resultant of two polynomials is a determinant of their Sylvester matrix. This matrix is easier to describe by example than by equation. You basically fill a matrix with shifts of the coefficients of both polynomials and fill in the gaps with zeros.

MathWorld gives the following Mathematica code for the Sylvester matrix of two inputs.

    SylvesterMatrix1[poly1_, poly2_, var_] :=
      Function[{coeffs1, coeffs2}, With[
        {l1 = Length[coeffs1], l2 = Length[coeffs2]},
        Join[
          NestList[RotateRight, PadRight[coeffs1, l1 + l2 - 2], l2 - 2],
          NestList[RotateRight, PadRight[coeffs2, l1 + l2 - 2], l1 - 2]
        ]
      ]][
        Reverse[CoefficientList[poly1, var]],
        Reverse[CoefficientList[poly2, var]]
      ]
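For readers who would rather not run Mathematica, here is a rough Python counterpart (my own sketch, not a line-for-line translation): build the Sylvester matrix from coefficient lists and take its determinant.

```python
def sylvester_matrix(p, q):
    """Sylvester matrix of two polynomials given as coefficient lists
    in descending order. For deg p = m and deg q = n, the matrix is
    (m+n) x (m+n): n shifted copies of p's coefficients stacked on
    m shifted copies of q's."""
    m, n = len(p) - 1, len(q) - 1
    size = m + n
    rows = []
    for i in range(n):
        rows.append([0]*i + list(p) + [0]*(size - m - 1 - i))
    for i in range(m):
        rows.append([0]*i + list(q) + [0]*(size - n - 1 - i))
    return rows

def det(M):
    """Determinant by cofactor expansion along the first row
    (fine for the small matrices here)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1)**j * M[0][j] * det([row[:j] + row[j+1:] for row in M[1:]])
               for j in range(len(M)))

def resultant(p, q):
    return det(sylvester_matrix(p, q))

# p(x) = (x - 1)^2 (x - 2) = x^3 - 4x^2 + 5x - 2 has a double root,
# so the resultant of p and its derivative 3x^2 - 8x + 5 is zero.
assert resultant([1, -4, 5, -2], [3, -8, 5]) == 0
# x^3 - x has distinct roots, so the resultant is nonzero.
assert resultant([1, 0, -1, 0], [3, 0, -1]) != 0
```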


## Cubic discriminant

If we apply this to the cubic polynomial

ax³ + bx² + cx + d

we get the following matrix.

    | a    b    c    d    0  |
    | 0    a    b    c    d  |
    | 3a   2b   c    0    0  |
    | 0    3a   2b   c    0  |
    | 0    0    3a   2b   c  |

We can compute the resultant by taking the determinant of the above matrix.

    g[x_] := a x^3 + b x^2 + c x + d
    SylvesterMatrix1[g[x], D[g[x], x], x]


We get the following result:

27a³d² − 18a²bcd + 4a²c³ + 4ab³d − ab²c²

and we can verify that this is the same result we would get from calling the Resultant directly with

    Resultant[g[x], D[g[x], x], x]

Although the resultant is defined in terms of a determinant, that doesn’t mean that resultants are necessarily computed by computing determinants. The Sylvester matrix is a very special matrix, and there are clever ways to exploit its structure to create more efficient algorithms.

Each term in the resultant has a factor of a, and the discriminant is the resultant divided by −a.

## Quartic discriminant

Now let’s repeat our exercise for the quartic. The Sylvester matrix for the quartic polynomial

ax⁴ + bx³ + cx² + dx + e

and its derivative is

    | a    b    c    d    e    0    0 |
    | 0    a    b    c    d    e    0 |
    | 0    0    a    b    c    d    e |
    | 4a   3b   2c   d    0    0    0 |
    | 0    4a   3b   2c   d    0    0 |
    | 0    0    4a   3b   2c   d    0 |
    | 0    0    0    4a   3b   2c   d |

I created the matrix above with the following Mathematica code.

    f[x_] := a x^4 + b x^3 + c x^2 + d x + e
    TeXForm[SylvesterMatrix1[f[x], D[f[x], x], x]]


If we take the determinant, we get the resultant, but it’s a mess.

Again each term has a factor of a, so we can divide by a to get the discriminant.

If we want to use this in code, we can have Mathematica export the expression in C code using CForm. To generate Python code, it’s more convenient to use FortranForm, since Python, like Fortran, uses ** for exponents.

The following Python code was created by pasting the output of

    FortranForm[Resultant[f[x], D[f[x], x], x]]

and making it into a function.

    def quartic_resultant(a, b, c, d, e):
        return a*b**2*c**2*d**2 - 4*a**2*c**3*d**2 - 4*a*b**3*d**3 + 18*a**2*b*c*d**3 \
            - 27*a**3*d**4 - 4*a*b**2*c**3*e + 16*a**2*c**4*e + 18*a*b**3*c*d*e \
            - 80*a**2*b*c**2*d*e - 6*a**2*b**2*d**2*e + 144*a**3*c*d**2*e \
            - 27*a*b**4*e**2 + 144*a**2*b**2*c*e**2 - 128*a**3*c**2*e**2 \
            - 192*a**3*b*d*e**2 + 256*a**4*e**3


Let’s try this on a couple of examples. First

x⁴ − 5x³ + 6x² = x²(x − 2)(x − 3)

which has a double root at 0.

As expected

    quartic_resultant(1, -5, 6, 0, 0)

returns 0.

Next let’s try

x⁴ − 10x³ + 35x² − 50x + 24 = (x − 1)(x − 2)(x − 3)(x − 4).

The call

    quartic_resultant(1, -10, 35, -50, 24)

returns 144. We expected a non-zero result since our polynomial has distinct roots at 1, 2, 3, and 4.

## Higher order discriminants

In general the discriminant of an nth degree polynomial is the resultant of the polynomial and its derivative, up to a constant. There’s no need to worry about this constant if your only concern is whether the discriminant is zero. To get the exact discriminant, you divide the resultant by the leading coefficient of the polynomial and adjust the sign.

The sign convention is a little strange. If you look back at the examples above, we divided by −a in the cubic case but we divided by a in the quartic case. You might reasonably guess that you should divide by a and multiply by (−1)ⁿ. But that won’t give you the right sign in the quadratic case. The conventional sign is

(−1)^(n(n−1)/2).

So when n equals 2 or 3 we get a negative sign, but when n equals 4 we don’t.
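The quadratic case makes the convention concrete. The 3×3 Sylvester determinant of ax² + bx + c and 2ax + b works out to −a(b² − 4ac), so dividing by a and applying the sign (−1)^(2·1/2) = −1 recovers b² − 4ac. A small sketch:

```python
def disc_sign(n):
    """Conventional sign relating discriminant and resultant:
    disc(p) = (-1)^(n(n-1)/2) * Res(p, p') / a for a degree-n
    polynomial with leading coefficient a."""
    return (-1)**(n*(n - 1)//2)

assert [disc_sign(n) for n in (2, 3, 4, 5)] == [-1, -1, 1, 1]

# Quadratic check with a = 1, b = 5, c = 6. The resultant below is the
# 3x3 Sylvester determinant of ax^2 + bx + c and 2ax + b, expanded by
# cofactors: a*b^2 - 2a*b^2 + 4a^2*c = -a(b^2 - 4ac).
a, b, c = 1, 5, 6
res = a*(b*b) - b*(2*a*b) + c*(4*a*a)
assert disc_sign(2) * res // a == b*b - 4*a*c
```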

# Finding where two quadratic curves intersect

Suppose you have two quadratic polynomials in two real variables, f(x, y) and g(x, y), and you want to know whether the two curves

f(x, y) = 0

and

g(x, y) = 0

intersect, and if they do, find where they intersect.

David Eberly has a set of notes on solving systems of polynomial equations that give the conditions you need. The conditions are quite complicated, but they are explicit and do not involve any advanced math. He gives 14 equations that combine to form the coefficients in a 4th degree polynomial that will tell you whether the curves intersect, and will get you started on finding the points of intersection.

## Algebraic geometry

There is a more elegant solution, but it requires much more background. It requires working in the complex projective plane ℙ² rather than the real plane ℝ². What does that mean? First, it means you work with complex numbers rather than real numbers. Second, it means adding “points at infinity.” This sounds awfully hand-wavy, but it can all be made rigorous using homogeneous coordinates with three complex numbers.

If you haven’t seen this before, you may be thinking “You want me to go from pairs of real numbers to triples of complex numbers? And this is supposed to make things easier?!” Yes, it actually does make things easier.

Once you move to ℙ², the question of whether the curves f(x, y) = 0 and g(x, y) = 0 intersect becomes trivial: yes, they intersect. Always. A special case of Bézout’s theorem says any two quadratic curves are either the same curve or they intersect in exactly 4 points, counting intersections with multiplicity.

Your original question was whether the curves intersect in ℝ², and now that question reduces to asking whether the 4 points guaranteed by Bézout’s theorem lie in ℝ². They might not be real, or they might be points at infinity.

Even if you go by David Eberly’s more elementary approach, you still face a similar question. You have a fourth degree polynomial, and you need to know whether it has real roots. How would you do that? You’d probably find all four complex roots first and see whether any are in fact real. Moving from the real numbers to the complex numbers makes it easier to find the roots of polynomials, and moving to complex projective space carries this idea further.
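Here is a sketch of that root-finding step using Durand-Kerner iteration, one simple way to get all complex roots at once (this is my illustration, not taken from Eberly's notes, and it assumes the roots are simple):

```python
def all_roots(coeffs, iters=500):
    """All complex roots of a polynomial with coefficients in
    descending order, by Durand-Kerner iteration. Assumes a nonzero
    leading coefficient and simple roots."""
    n = len(coeffs) - 1
    monic = [c / coeffs[0] for c in coeffs]
    def p(z):                       # Horner evaluation
        acc = 0j
        for c in monic:
            acc = acc * z + c
        return acc
    # standard initial guesses: powers of a non-real, non-unit complex number
    roots = [(0.4 + 0.9j) ** k for k in range(1, n + 1)]
    for _ in range(iters):
        new = []
        for i, z in enumerate(roots):
            q = 1 + 0j
            for j, w in enumerate(roots):
                if j != i:
                    q *= z - w
            new.append(z - p(z) / q)
        roots = new
    return roots

# x^4 - 10x^3 + 35x^2 - 50x + 24 = (x-1)(x-2)(x-3)(x-4): all roots real
roots = all_roots([1, -10, 35, -50, 24])
real = sorted(r.real for r in roots if abs(r.imag) < 1e-9)
assert all(abs(a - b) < 1e-6 for a, b in zip(real, [1, 2, 3, 4]))
```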

In the real plane ℝ² the zero set of a quadratic polynomial can be an ellipse, a parabola, or a hyperbola. (The circle is a special case of an ellipse.) But in ℙ² there is only one kind of conic section: there is a change of variables that turns the zero set of any quadratic polynomial into a circle. So now we’ve reduced the problem to finding where two circles intersect.

Finding the intersection of two conic sections illustrates the principles discussed in the preface to Algebraic Geometry: A Problem Solving Approach.

One way of doing mathematics is to ask bold questions about the concepts you are interested in studying. Usually this leads to fairly complicated answers having many special cases. An important advantage of this approach is that the questions are natural and easy to understand. A disadvantage is that the proofs are hard and often involve clever tricks, the origins of which are very hard to see.

A second approach is to spend time carefully defining the basic terms, with the aim that the eventual theorems and proofs are straightforward. Here the difficulty is in understanding how the definitions, which often initially seem somewhat arbitrary, ever came to be. The payoff is that the deep theorems are more natural, their insights more accessible, and the theory made more aesthetically pleasing.

# Ideal background for algebraic geometry

… in an ideal world, people would learn this material over many years, after having background courses in commutative algebra, algebraic topology, differential geometry, complex analysis, homological algebra, number theory, and French literature.

# Solving systems of polynomial equations

In a high school algebra class, you learn how to solve polynomial equations in one variable, and systems of linear equations. You might reasonably ask “So when do we combine these and learn to solve systems of polynomial equations?” The answer would be “Maybe years from now, but most likely never.” There are systematic ways to solve systems of polynomial equations, but you’re unlikely to ever see them unless you study algebraic geometry.

Here’s an example from [1]. Suppose you want to find the extreme values of

x³ + 2xyz − z²

on the unit sphere using Lagrange multipliers. This leads to the following system of polynomial equations, where λ is the Lagrange multiplier.

3x² + 2yz − 2xλ = 0
2xz − 2yλ = 0
2xy − 2z − 2zλ = 0
x² + y² + z² − 1 = 0

There’s no obvious way to go about solving this system of equations. However, there is a systematic way to approach it using a “lexicographic Gröbner basis.” This transforms the problem into something that looks much worse but that is actually easier to work with. And most importantly, the transformation is algorithmic. It requires some computation—there are numerous software packages for doing this—but doesn’t require a flash of insight.

The transformed system looks intimidating compared to the original:

We’ve gone from four equations to eight, from small integer coefficients to large fraction coefficients, from squares to seventh powers. And yet we’ve made progress because the four variables are less entangled in the new system.

The last equation involves only z and factors nicely:

This cracks the problem wide open. We can easily find all the possible values of z, and once we substitute values for z, the rest of the equations simplify greatly and can be solved easily.

The key is that Gröbner bases transform our problem into a form that, although it appears more difficult, is easier to work with because the variables are somewhat separated. Solving one variable, z, is like pulling out a thread that then makes the rest of the threads easier to separate.
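The same pull-one-thread pattern shows up in miniature in any triangular system. A toy example of my own, far simpler than the system above: solve x² + y² = 1 together with x − y = 0 by eliminating x, then back-substituting.

```python
from math import isclose, sqrt

# Toy elimination: solve x^2 + y^2 = 1 and x - y = 0.
# Substituting x = y into the first equation eliminates x, leaving
# an equation in y alone: 2y^2 - 1 = 0.
ys = [sqrt(1/2), -sqrt(1/2)]

# back-substitute each value of y to recover x
solutions = [(y, y) for y in ys]

# every (x, y) pair satisfies both original equations
for x, y in solutions:
    assert isclose(x**2 + y**2, 1)
    assert isclose(x - y, 0, abs_tol=1e-12)
```

A lexicographic Gröbner basis produces exactly this kind of triangular structure automatically, even when no simple substitution is apparent.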

* * *

[1] David Cox et al. Applications of Computational Algebraic Geometry. American Mathematical Society Short Course, January 6–7, 1997, San Diego, California. Proceedings of Symposia in Applied Mathematics.