Checksum polynomials

A large class of checksum algorithms has the following pattern:

  1. Think of the bits in a file as the coefficients in a polynomial P(x).
  2. Divide P(x) by a fixed polynomial Q(x) mod 2 and keep the remainder.
  3. Report the remainder as a sequence of bits.

In practice there’s a little more to the algorithm than this, such as appending the length of the file, but the above pattern is at the heart of the algorithm.
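
To make step 2 concrete, here is a bare-bones Python sketch of long division of bit polynomials mod 2. It’s my own illustration, not any standard’s reference implementation, and the function name and bit vectors are made up for the example.

    def poly_mod2_remainder(p_bits, q_bits):
        """Remainder of P(x) divided by Q(x) with coefficients mod 2, high-order bits first."""
        p = list(p_bits)
        n = len(q_bits) - 1                # degree of Q(x)
        for i in range(len(p) - n):        # one long-division step per position
            if p[i]:                       # leading bit is 1: subtract (XOR) a shifted copy of Q
                for j, q in enumerate(q_bits):
                    p[i + j] ^= q
        return p[-n:]                      # remainder has degree less than n

    # Toy example: x^7 + x^5 + x^2 + 1 divided by x^3 + x + 1
    print(poly_mod2_remainder([1, 0, 1, 0, 0, 1, 0, 1], [1, 0, 1, 1]))  # [0, 1, 1], i.e. x + 1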

There’s a common misconception that the polynomial Q(x) is irreducible, i.e. cannot be factored. This may or may not be the case.

CRC-32

Perhaps the most common choice of Q is

Q(x) = x^32 + x^26 + x^23 + x^22 + x^16 + x^12 + x^11 + x^10 + x^8 + x^7 + x^5 + x^4 + x^3 + x^2 + x + 1

This polynomial is used in the cksum utility and is part of numerous standards. It’s known as the CRC-32 polynomial, though other polynomials are occasionally used in 32-bit implementations of the CRC algorithm. And it is far from irreducible, as the following Mathematica code shows. The command

    Factor[x^32 + x^26 + x^23 + x^22 + x^16 + x^12 + 
           x^11 + x^10 + x^8 +  x^7 + x^5 + x^4 + 
           x^3 + x^2 + x + 1, Modulus -> 2]

shows that Q can be factored as

(1 + x)^5 (1 + x + x^3 + x^4 + x^6) (1 + x + x^2 + x^5 + x^6)
(1 + x + x^4 + x^6 + x^7) (1 + x + x^4 + x^5 + x^6 + x^7 + x^8)

(Mathematica displays polynomials in increasing order of terms.)

Note that the factorization is valid when done over the field with 2 elements, GF(2). Whether a polynomial can be factored, and what the factors are, depends on what field you do your arithmetic in. The polynomial Q(x) above is irreducible as a polynomial with rational coefficients. It can be factored working mod 3, for example, but it factors differently mod 3 than it factors mod 2. Here’s the factorization mod 3:

(1 + 2 x^2 + 2 x^3 + x^4 + x^5) (2 + x + 2 x^2 + x^3 + 2 x^4 + x^6 + x^7)
(2 + x + x^3 + 2 x^7 + x^8 + x^9 + x^10 + 2 x^12 + x^13 + x^15 + 2 x^16 + x^17 + x^18 + x^19 + x^20)

CRC-64

The polynomial

Q(x) = x^64 + x^4 + x^3 + x + 1

is known as CRC-64, and is part of several standards, including ISO 3309. This polynomial is irreducible mod 2 as the following Mathematica code confirms.

    IrreduciblePolynomialQ[x^64 + x^4 + x^3 + x + 1, Modulus -> 2]

The CRC algorithm uses this polynomial mod 2, but out of curiosity I checked whether it is irreducible in other contexts. The following code tests whether the polynomial is irreducible modulo the first 100 primes.

    Table[IrreduciblePolynomialQ[x^64 + x^4 + x^3 + x + 1, 
        Modulus -> Prime[n]], {n, 1, 100}]

It is irreducible mod p for p = 2, 233, or 383, but not for any other primes up to 541. It’s also irreducible over the rational numbers.

Since Q is irreducible mod 2, the checksum essentially views its input P(x) as a member of the finite field GF(2^64).

Curvature at Cairo

I was flipping through Gravitation [1] this weekend and was curious about an illustration on page 309. This post reproduces that graph.

The graph is centered at Cairo, Egypt and includes triangles whose side lengths are the distances between cities. The triangles are calculated using only distances, not by measuring angles per se.

The geometry of each triangle is Euclidean: giving the three edge lengths fixes all the features of the figure, including the indicated angle. … The triangles that belong to a given vertex [i.e. Cairo], laid out on a flat surface, fail to meet.

I will reproduce the plot in Python because I’m more familiar with making plots there. But I’ll get the geographic data out of Mathematica, because I know how to do that there.

Geographic information from Mathematica

I found the distances between the various cities using the GeoDistance function in Mathematica. The arguments to GeoDistance are “entities” which are a bit opaque. When using Mathematica interactively, you can use ctrl + = to enter the name of an entity. There’s some guesswork, e.g. whether I meant New York City or the state of New York when I entered “New York”, but Mathematica guessed correctly. The following code lists the city entities explicitly.

    cities = {
        Entity["City", {"Cairo", "Cairo", "Egypt"}], 
        Entity["City", {"Delhi", "Delhi", "India"}], 
        Entity["City", {"Moscow", "Moscow", "Russia"}], 
        Entity["City", {"Brussels", "Brussels", "Belgium"}], 
        Entity["City", {"Reykjavik", "Hofudhborgarsvaedhi", "Iceland"}], 
        Entity["City", {"NewYork", "NewYork", "UnitedStates"}], 
        Entity["City", {"CapeTown", "WesternCape", "SouthAfrica"}], 
        Entity["City", {"PortLouis", "PortLouis", "Mauritius"}] }

Most of these are predictable, but I would not have guessed the code for Reykjavik or Cape Town. I found these by using the command InputForm and entering the entities as above.

I found the distance from Cairo to each of the other cities with

    Table[GeoDistance[cities[[1]], cities[[i]]], {i, 2, 8}]

and the distances from the cities to their neighbors with

    Table[GeoDistance[cities[[i]], cities[[i + 1]]], {i, 2, 7}]
    GeoDistance[cities[[8]], cities[[2]]]

Drawing the plot

Now that we’ve got the data, how do we draw the plot?

Let’s put Cairo at the origin. First we draw a line from Cairo to Delhi. We might as well put Delhi on the x-axis to make things simple.

Next we need to plot Moscow. We know the distance R1 from Cairo to Moscow, and the distance R2 from Delhi to Moscow. So imagine drawing a circle of radius R1 centered at Cairo and a circle of radius R2 centered at Delhi. Moscow is located where the two circles intersect. The previous post shows how to find the intersection of circles.

The two circles intersect at two points, so which do we choose? We choose the intersection point that preserves the orientation of the original graph (and the globe). As we go through the cities in counterclockwise order, the cross product of the vector to the previous city with the vector to the next city should have a positive z component.
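
Here is a minimal Python sketch of both steps, intersecting the two circles and then picking the intersection that keeps the counterclockwise orientation. The function names and structure are my own illustration, not the code behind the original figure.

    import math

    def circle_intersections(p1, r1, p2, r2):
        """Return the two intersection points of circles centered at p1, p2 with radii r1, r2."""
        d = math.dist(p1, p2)
        a = (d*d + r1*r1 - r2*r2) / (2*d)          # distance from p1 to the chord through the intersections
        h = math.sqrt(r1*r1 - a*a)                 # half the chord length
        mx = p1[0] + a*(p2[0] - p1[0])/d
        my = p1[1] + a*(p2[1] - p1[1])/d
        ux, uy = (p2[1] - p1[1])/d, -(p2[0] - p1[0])/d   # unit normal to the line p1 -> p2
        return (mx + h*ux, my + h*uy), (mx - h*ux, my - h*uy)

    def ccw_choice(center, prev_pt, candidates):
        """Pick the candidate that keeps counterclockwise orientation around center."""
        vx, vy = prev_pt[0] - center[0], prev_pt[1] - center[1]
        for c in candidates:
            wx, wy = c[0] - center[0], c[1] - center[1]
            if vx*wy - vy*wx > 0:                  # positive z component of the cross product
                return c
        return candidates[-1]

For example, with Cairo at the origin and Delhi on the x-axis, Moscow would be placed at ccw_choice(cairo, delhi, circle_intersections(cairo, R1, delhi, R2)).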

This shows that the original graph was not to scale, though the gap between triangles was approximately to scale. In hindsight this should have been obvious: Brussels and Reykjavik are much closer to each other than Cape Town and New York are.

The gap

Why the gap? Because the earth is curved at Cairo (and everywhere else). If the earth were flat, the triangles would fit together without any gaps.

There’s no gap when you take spherical triangles on the globe. But even though the flat triangles preserve the edge lengths of the spherical triangles, they cannot preserve the angles too. The angles in a spherical triangle add up to more than 180°, and the amount by which the sum exceeds 180° is proportional to the area of the spherical triangle. Since the angles of a triangle in the plane do add up to 180°, each flat triangle fails to capture a bit of the corresponding spherical triangle, and the failures add up to the gap we see in the image.

[1] Gravitation by Misner, Thorne, and Wheeler. 1973.

Creating a Traveling Salesman Tour of Texas with Mathematica

A Traveling Salesman tour visits a list of destinations using the shortest path. There’s an obvious way to find the shortest path connecting N points: try all N! paths and see which one is shortest. Unfortunately, that might take a while.

Texas has 254 counties, and so calculating a tour of Texas counties by brute force would examine 254! paths, over 10^500 paths. In theory, large Traveling Salesman problems are intractable. In practice they can often be solved quickly. As is often the case, the key is to give yourself just a little slack and look for solutions that are close to optimal.
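
To make the combinatorial blowup concrete, here is a brute-force sketch in Python. It works only for a handful of points and is not how FindShortestTour operates; the points below are made up for illustration.

    from itertools import permutations
    from math import dist, inf

    def brute_force_tour(points):
        """Try every closed tour starting at points[0] and return the shortest."""
        best, best_tour = inf, None
        for perm in permutations(range(1, len(points))):
            tour = (0,) + perm + (0,)
            length = sum(dist(points[tour[i]], points[tour[i + 1]])
                         for i in range(len(tour) - 1))
            if length < best:
                best, best_tour = length, tour
        return best, best_tour

    # 8 points means 7! = 5040 tours; 254 counties would be hopeless this way
    points = [(0, 0), (3, 1), (1, 4), (5, 2), (2, 2), (4, 5), (6, 1), (0, 3)]
    print(brute_force_tour(points))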

I’ve used the example of a Traveling Salesman tour of Texas before because it makes a nice visual. People asked me for the code that made the image, but I didn’t save the code and didn’t remember offhand how to re-create it. So here’s the code for future reference.

Incidentally, computing the tour itself took only a second or two. Creating the visualization took several seconds.

    texas = Entity["AdministrativeDivision", "Texas"]; 
    counties = texas["Subdivisions"];
    tour = FindShortestTour[counties];
    GeoGraphics[{Thick, Red, GeoPath[counties[[tour[[2]]]]]}]

Here counties is a list of objects representing Texas counties, sorted in alphabetical order, from Anderson County to Zavala County.

The tour object is a pair of a distance and a list of integers. The distance, 6780.74 nautical miles, is the length of the tour. The integers are the indexes of the counties in the tour.

{6780.74 nmi, {1, 107, 234, …, 201, 37, 1}}

The tour starts with the first county, Anderson County. It’s got to start somewhere, and I expect it always starts with the first item in the list. Next it goes to the 107th county, Henderson County, and so on. Because FindShortestTour returns a closed tour, the tour ends where it started, in Anderson County.

Related posts: Traveling Salesman tours of Africa, Americas, Eurasia and Oceania.

Cayley graphs in Mathematica

The previous post mentioned the group S4, the group of all permutations of a set with four elements. This post will show a way to visualize this group.

The Mathematica command

    CayleyGraph[
        SymmetricGroup[4], 
        VertexLabels -> Placed["Name", Center],
        VertexSize -> 0.4]

generates the graph below.

Cayley graph of the symmetric group S4

This is an interesting image, but what does it mean?

The elements of S4 are represented by the circled numbers. The numbers correspond to the permutations of four elements, listed in lexicographical order. If you label the four elements a, b, c, and d then the permutations are listed in alphabetical order. Permutation 1 is the identity, which maps [1, 2, 3, 4] to itself, and Permutation 24 is its reverse [4, 3, 2, 1].

In the Mathematica application, mousing over a number shows which permutation it represents, though the static image above doesn’t have this feature.

The blue arrows represent the permutation that swaps the first two elements. So the blue arrow between node 1 and node 7 says that swapping the first two elements of Permutation 1 gives you Permutation 7, which is [2, 1, 3, 4]. The blue arrow going back from 7 to 1 says that the same swapping operation applied to Permutation 7 returns you to Permutation 1.

All the blue arrows come in pairs because swapping is its own inverse.

The green arrows represent a rotation. For example, the green arrow from 1 to 10 says that rotation turns [1, 2, 3, 4] into [2, 3, 4, 1]. The rotation operation is not its own inverse, so the arrows only go in one direction. But every green arrow is part of a diamond because applying the rotation operation four times sends you back where you started.

You can get from any permutation to any other permutation by repeatedly either swapping the first two elements or applying a rotation. In group theoretical terminology, these two permutations generate the group S4.
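
As a quick check of the claims above (my own Python sketch, not part of the original post), itertools lists permutations of a sorted sequence in exactly this lexicographic order:

    from itertools import permutations

    perms = list(permutations([1, 2, 3, 4]))

    print(perms[0])    # (1, 2, 3, 4): permutation 1, the identity
    print(perms[6])    # (2, 1, 3, 4): permutation 7, the first two elements swapped
    print(perms[9])    # (2, 3, 4, 1): permutation 10, a rotation of the identity
    print(perms[23])   # (4, 3, 2, 1): permutation 24, the reversal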

Permutations and centralizers in Mathematica

I had a fleeting thought about how little I use group theory, and then realized I had used it just this week.

A couple days ago I needed to know which permutations of 4 elements commute with reversal. If r takes a sequence and reverses it, I needed to find all permutations p such that pr = rp.

In group theory jargon, the group of all permutations of 4 elements is the symmetric group S4. The subgroup of elements that commute with r is the centralizer of r. So my task was to find the centralizer of r in S4. How do I pose this task to Mathematica?

Mathematica represents permutations as disjoint cycles. The permutation r is represented as

    Cycles[{{4, 1}, {2, 3}}]

because swapping the first and last elements, then swapping the middle two elements, reverses a list of four elements.

To find the centralizer of r I asked Mathematica

    GroupCentralizer[SymmetricGroup[4], Cycles[{{4, 1}, {2, 3}}]]

This returns

    PermutationGroup[{Cycles[{{1, 4}}], Cycles[{{2, 3}}], Cycles[{{1, 2}, {3, 4}}]}]

This does not list the permutations that commute with r, but rather the generators of the group of such permutations. If we ask for the elements of the group above with

    GroupElements[%]

this returns

    {Cycles[{}], 
     Cycles[{{2, 3}}], 
     Cycles[{{1, 2}, {3, 4}}], 
     Cycles[{{1, 2, 4, 3}}], 
     Cycles[{{1, 3, 4, 2}}], 
     Cycles[{{1, 3}, {2, 4}}], 
     Cycles[{{1, 4}}], 
     Cycles[{{1, 4}, {2, 3}}]}
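
As a sanity check, here is a brute-force Python sketch (mine, not from the post) confirming that exactly 8 of the 24 permutations of four elements commute with reversal:

    from itertools import permutations

    def compose(p, q):
        """Compose permutations in one-line notation: apply q first, then p."""
        return tuple(p[q[i]] for i in range(len(q)))

    r = (3, 2, 1, 0)   # reversal of a four-element sequence, zero-based

    centralizer = [p for p in permutations(range(4))
                   if compose(p, r) == compose(r, p)]
    print(len(centralizer))   # 8, matching the size of the group Mathematica returned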

I use basic group theory and other algebra all the time, but I don’t think of it that way. In this example, I had a question about permutations, and it only occurred to me later that I could phrase my question in the vocabulary of group theory. I use ideas from algebra more often than I use the vocabulary of algebra.

Drag equation exponent variation

The motion of a falling body of mass m is given by

m \frac{dv}{dt} = mg - kv^r

where the term −kv^r accounts for drag due to air resistance. One can derive r = 2 under simple physical assumptions, but if I remember correctly other values of r may be more realistic in certain circumstances. I don’t know much about the physics here; if you know about the use of other values of r, please let me know by leaving a comment.

Terminal velocity

When r = 1 or r = 2 the differential equation above can be solved in terms of elementary functions, but otherwise it cannot. Nevertheless one can show that for all values of r the object reaches a terminal velocity, and calculate that velocity without explicitly solving the differential equation. William Waterhouse demonstrated this in a one-page article [1]. He rewrites the equation to look at time as a function of velocity rather than velocity as a function of time

\frac{dt}{dv} = \frac{1}{g} \frac{1}{1 - (k/mg)v^r}

and concludes

t = \frac{1}{g} \int_0^v \frac{dv}{1 - (k/mg)v^r} + t_0

He notes that the integral diverges as v approaches

 \left(\frac{mg}{k}\right)^{1/r}

and so that is the terminal velocity, i.e. it takes an infinite amount of time to achieve this velocity. Waterhouse recommends this derivation as “a good example of deriving information about a problem without knowing an explicit solution.”

I would add that such an approach is the norm, not the exception. A closed-form solution to a differential equation is nice when you can get it, but usually one is not available. And even when you can find a closed-form solution, you may be able to achieve your goal more directly by not using it.
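
As a quick numerical illustration of the terminal velocity result (a sketch of my own, with made-up parameter values), a crude Euler integration of the equation levels off at (mg/k)^(1/r):

    import math

    def velocity_at(m, g, k, r, t_max=60.0, dt=1e-4):
        """Integrate m v' = m g - k v^r with Euler steps and return v at time t_max."""
        v = 0.0
        for _ in range(int(t_max / dt)):
            v += dt * (g - (k / m) * v**r)
        return v

    m, g, k, r = 1.0, 9.8, 0.5, 1.5
    print(velocity_at(m, g, k, r))    # close to the limit below
    print((m * g / k)**(1 / r))       # predicted terminal velocity (mg/k)^(1/r), about 7.27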

Hypergeometric solution

I suspected the differential equation could be solved for general values of r using special functions, and that is the case. Mathematica was able to evaluate the integral for t as a function of v in terms of a hypergeometric function.

v \, _2F_1\left(1,\frac{1}{r};1+\frac{1}{r};\frac{g k v^r}{m}\right)

When I asked Mathematica to solve the differential equation directly, it said that the solution is the inverse function of the function above. Apparently Waterhouse and Mathematica agree that it is easier to think of t as a function of v rather than the original formulation.

The notation 2F1 indicates there are two upper parameters and one lower parameter. In our application, the upper parameters are 1 and 1/r, the lower parameter is 1 + 1/r, and the function is evaluated at gkv^r/m. You can find a brief introduction to hypergeometric functions here. A hypergeometric function 2F1 has a singularity at 1, and so we could derive the terminal velocity from the explicit solution.

Mathematica implementation

Let c = gk/m. Then we can express velocity as a function of time in Mathematica by

    f[r_, c_, v_] := InverseFunction[
        #1 Hypergeometric2F1[1, 1/r, 1 + 1/r, c #1^r] &
    ][v]

and use this to make plots of the velocity for various values of c and r.

The following sets c = 2 and varies r over 1, 1.1, 1.2, … 2.

    Plot[Table[f[2, d/10, t], {d, 10, 20}], {t, 0, 4}, PlotRange -> All]

Here’s the output.

The terminal velocity decreases as r increases. The opposite is true for c < 1.

[1] William C. Waterhouse. A Fact about Falling Bodies. Mathematics Magazine, Vol. 44, No. 1 (Jan., 1971), pp. 33–34. The article straddles two pages, but takes up less than half of each page.

Double duals of polyhedra

The previous post mentioned the dual of a tetrahedron is another tetrahedron. The dual of a cube is an octahedron and the dual of an octahedron is a cube. And the dual of a dodecahedron is an icosahedron, and the dual of an icosahedron is a dodecahedron.

So if you take the dual of a regular solid twice, you get back another regular solid of the same type. But you do not get the same regular solid back. To see this, let’s look at how you form the dual of a polyhedron.

To find the dual of a polyhedron, you make a new polyhedron whose vertices are the centroids of the faces of the original polyhedron. The vertices of a regular polyhedron all lie on the surface of a sphere. Each face of the polyhedron lies inside the sphere, so the centroids of the faces are all inside the sphere. Since the centroids form the vertices of a regular solid, they also all lie on a sphere, but a smaller sphere inside the first sphere.
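
For a concrete example of the construction (a Python sketch of my own, not from the post), the face centroids of a cube are the vertices of an octahedron:

    import numpy as np

    # Vertices of a cube centered at the origin
    cube = np.array([[x, y, z] for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)])

    # Each face of the cube consists of the four vertices with one coordinate fixed at +1 or -1
    faces = [cube[cube[:, axis] == s] for axis in range(3) for s in (-1, 1)]

    # The dual's vertices are the face centroids: the six points (±1, 0, 0), (0, ±1, 0), (0, 0, ±1)
    print(np.array([face.mean(axis=0) for face in faces]))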

Size of duals

If you take the double dual of a regular solid, the dual of the dual, then you get a smaller solid of the same type. How much smaller? We can get Mathematica to calculate this for us.

The function DualPolyhedron does what the name implies. Let’s apply this to the default tetrahedron. When we enter

    DualPolyhedron[ Tetrahedron[] ]

we get back

    Tetrahedron[{2 Pi/3, Pi}, 1/3]

The default tetrahedron, returned by calling Tetrahedron[], has vertices

  • (1, 0, 0)
  • (0, 0, 1)
  • (1, 0, 1)
  • (1, 1, 1)

Each edge has length 1. Its dual has been rotated by 2π/3 about the z axis and by π about the y axis, and has edges of length 1/3. Taking the dual of a tetrahedron rotates the tetrahedron and shrinks it by a factor of 3 (in each dimension).

Size of double duals

If we take the double dual of a tetrahedron, we get a tetrahedron nine times smaller. Let’s see what happens to the rest of the regular solids.

    DualPolyhedron[DualPolyhedron[#]] & /@ 
        {Tetrahedron[], Octahedron[], Cube[],
         Dodecahedron[], Icosahedron[]}

This tells us again what we found above for the double dual of a tetrahedron. It also tells us that taking the double dual of an octahedron or cube results in an octahedron or cube 3 times smaller. And it tells us that taking the double dual of a dodecahedron or icosahedron reduces its size by a factor of

(5 + 2√ 5)/15 = 0.6314…

Taking the double dual shrinks a regular solid, and the amount of shrinkage goes down as the number of faces goes up.

Orientation of double duals

The process of taking the double dual may also rotate the solid. The second dual of the standard tetrahedron, for example, is not just 9 times smaller. It’s also been rotated. We can see this by calling PolyhedronCoordinates. The code

    PolyhedronCoordinates[ 
        DualPolyhedron[ DualPolyhedron[ Tetrahedron[] ] ] ]

shows that the new vertices are

  • (7/9, 2/9, 6/9)
  • (7/9, 2/9, 7/9)
  • (7/9, 3/9, 7/9)
  • (6/9, 2/9, 7/9)

So the second dual of the tetrahedron is not just 9 times smaller; its orientation has changed as well.

However, taking the double dual of the rest of the regular solids does not change the orientation. You could verify this for the cube and dodecahedron using Mathematica. It then follows that taking the double dual of octahedra and icosahedra does not change the orientation either, since these solids are the duals of the cube and dodecahedron, and taking the dual commutes with taking the double dual.

Sixth dual

If you take the dual of a tetrahedron 6 times you do return to the original orientation, but not if you take the dual fewer times. You can verify this by running the following:

    NestList[DualPolyhedron, Tetrahedron[], 6]

Poisson distribution tail bounds

Yesterday Terence Tao published a blog post on bounds for the Poisson probability distribution. Specifically, he wrote about Bennett’s inequalities and a refinement that he developed or at least made explicit. Tao writes

This observation is not difficult and is implicitly in the literature … I was not able to find a clean version of this statement in the literature, so I am placing it here on my blog.

Bennett’s inequalities say that for a Poisson random variable X with mean λ,

P(X \geq \lambda(1+u)) \leq \exp(-\lambda h(u))

for u ≥ 0 and

P(X \leq \lambda(1+u)) \leq \exp(-\lambda h(u))

for -1 < u ≤ 0 where

h(u) = (1+u) \log(1+u) - u

I wanted to visualize how tight Bennett’s bounds are and got some interesting images due to the discrete nature of the Poisson distribution.

Here’s a plot of how tight the right-tail estimate is, the gap between the two sides of the first inequality above.

Here u ranges from 0 to 2 along the front edge of the graphic and λ ranges from 5 to 10.

And here’s a plot of how tight the left-tail estimate is, the gap between the two sides of the second inequality above.
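
To see the gap at a single point, here is a small Python sketch (my own, not Tao’s code or the Mathematica behind the plots) comparing the right-tail bound with the exact tail probability:

    import math

    def h(u):
        """Bennett's rate function h(u) = (1 + u) log(1 + u) - u."""
        return (1 + u) * math.log(1 + u) - u

    def poisson_upper_tail(lam, x):
        """P(X >= x) for X ~ Poisson(lam), via the complementary sum."""
        k = math.ceil(x)
        term, cdf = math.exp(-lam), 0.0
        for i in range(k):                 # add up P(X = 0), ..., P(X = k - 1)
            cdf += term
            term *= lam / (i + 1)
        return 1.0 - cdf

    lam, u = 5.0, 1.0
    bound = math.exp(-lam * h(u))
    exact = poisson_upper_tail(lam, lam * (1 + u))
    print(bound, exact, bound - exact)     # the bound, the exact tail, and the gap plotted above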

Mathematica tweaking

The latter image was easy to make, but the first image required a couple of adjustments: the image had holes in it, and the viewpoint was awkward.

I fixed the holes in the plot by adding the option PlotPoints -> 50 to make the sampling finer. I fixed the viewpoint by grabbing the image and rotating it until I thought it looked better. I could have saved the rotated image directly, but I was curious how to do this with the Export command. To do this I needed to specify the ViewPoint explicitly in the plotting command, but I didn’t know how to get the value of ViewPoint that I’d implicitly chosen with my mouse. A comment on the Mathematica Stack Exchange site told me what I needed to know.

Simply edit the output cell and wrap Options[…, ViewPoint] around the already rotated output. The graphics should be in the place of …. ViewVertical may also change during rotating, as well as some other parameters.

Jacobi functions with complex parameter

Jacobi functions are complex-valued functions of a complex variable z and a parameter m. Often this parameter is real, and 0 ≤ m < 1. Mathematical software libraries, like Python’s SciPy, often have this restriction. However, m could be any complex number.

The previous couple of posts spoke of the fundamental rectangle for Jacobi functions. But in general, Jacobi functions (and all other elliptic functions) have a fundamental parallelogram.

When m is real, the two periods of the Jacobi sn function are 4K(m) and 2K(1-m) i and so the function repeats horizontally and vertically. (Here K is the complete elliptic integral of the first kind.) When m is complex, the two periods are again 4K(m) and 2K(1-m) i, but now 4K(m) and 2K(1-m) are complex numbers and their ratio is not necessarily real. This means sn repeats over a parallelogram which is not necessarily a rectangle.

The rest of the post will illustrate this with plots.

First, here is a plot of sn(K(1/2)z, 1/2). The height of the graph represents the absolute value of the function and the color represents its phase. (The argument z was multiplied by K(1/2) to make the periods have integer values, making it a little easier to see the periods.)

Notice that the plot features line up with the coordinate axes, the real axis running from 0 to 8 in the image and the imaginary axis running from 0 to 4.

Here’s the analogous plot for sn(z, 2 + 2i).

Now the features are running on a diagonal. The pits are where the function is zero, and the white ellipses are poles that have had their tops cut off to fit in a finite plot.

It will be easier to see what’s going on if we switch to flat plots. The color represents phase as before, but now magnitude is denoted by contour lines.

Here’s a plot of sn(K(1/2)z, 1/2)

and here’s a plot of sn(z, 2 + 2i).

According to theory the two periods should be

4 K(2 + 2i) = 4.59117 + 2.89266 i

and

2i K(-1 - 2i) = -1.44633 + 2.29559 i.

We can show that this is the case by plotting the real and imaginary parts of sn(z, 2 + 2i) as we move in these two directions.

The Mathematica code

    Plot[
        {Re[JacobiSN[t EllipticK[2 + 2 I], (2 + 2 I)]],
         Im[JacobiSN[t EllipticK[2 + 2 I], (2 + 2 I)]]}, 
        {t, 0, 4}]

produces this plot

and the code

    Plot[
        {Re[JacobiSN[t I EllipticK[-1 - 2 I], (2 + 2 I)]], 
         Im[JacobiSN[t I EllipticK[-1 - 2 I], (2 + 2 I)]]}, 
        {t, 0, 4}]

produces this plot.
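
Another way to check the periods numerically is with Python’s mpmath library. This is a sketch assuming mpmath’s ellipk and ellipfun accept complex arguments and use the same parameter convention as Mathematica (the parameter m); that’s worth verifying before relying on it.

    from mpmath import mp, mpc, ellipk, ellipfun

    mp.dps = 15
    m = mpc(2, 2)                      # the parameter m = 2 + 2i
    p1 = 4 * ellipk(m)                 # should be about  4.59117 + 2.89266 i
    p2 = 2j * ellipk(1 - m)            # should be about -1.44633 + 2.29559 i
    print(p1, p2)

    z = mpc(0.3, 0.4)                  # an arbitrary test point
    for p in (p1, p2):
        print(abs(ellipfun('sn', z + p, m) - ellipfun('sn', z, m)))   # both should be near zero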

When a cubic or quartic has a double root

Thanks to the quadratic formula, it’s easy to tell whether a quadratic equation has a double root. The equation

ax^2 + bx + c = 0

has a double root if and only if the discriminant

b^2 - 4ac

is zero.

The discriminant of a cubic is much less well known, and the analogs for higher order polynomials are practically unheard of. There is a discriminant for polynomials of all degrees, though the complexity of the discriminant grows quickly with the degree of the polynomial.

This post will derive the discriminant of a cubic and a quartic.

Resultants

The resultant of a pair of polynomials is zero if and only if the two polynomials have a common root. Resultants have come up in a couple previous posts about solving trigonometric equations.

A polynomial p(x) has a double root if and only if p and its derivative p′ are both zero somewhere. The discriminant of p is the resultant of p and p′.

The resultant of two polynomials is a determinant of their Sylvester matrix. This matrix is easier to describe by example than by equation. You basically fill a matrix with shifts of the coefficients of both polynomials and fill in the gaps with zeros.

MathWorld gives the following Mathematica code for the Sylvester matrix of two inputs.

    SylvesterMatrix1[poly1_, poly2_,  var_] :=
      Function[{coeffs1, coeffs2}, With[
        {l1 = Length[coeffs1], l2 = Length[coeffs2]},
          Join[
            NestList[RotateRight, PadRight[coeffs1,
              l1 + l2 -  2], l2 - 2],
            NestList[RotateRight, PadRight[coeffs2,
              l1 + l2 - 2], l1 - 2]
          ]
        ]
      ][
        Reverse[CoefficientList[poly1, var]],
        Reverse[CoefficientList[poly2, var]]
    ]

Cubic discriminant

If we apply this to the cubic polynomial

ax^3 + bx^2 + cx + d

we get the following matrix.

\left( \begin{array}{ccccc} a & b & c & d & 0 \\ 0 & a & b & c & d \\ 3 a & 2 b & c & 0 & 0 \\ 0 & 3 a & 2 b & c & 0 \\ 0 & 0 & 3 a & 2 b & c \\ \end{array} \right)

We can compute the resultant by taking the determinant of the above matrix.

    g[x_] := a x^3 + b x^2 + c x + d
    SylvesterMatrix1[g[x], D[g[x], x], x]

We get the following result

-a b^2 c^2 + 4 a^2 c^3 + 4 a b^3 d - 18 a^2 b c d + 27 a^3 d^2

and we can verify that this is the same result we would get from calling the Resultant function directly with

    Resultant[g[x], D[g[x], x], x]

Although the resultant is defined in terms of a determinant, that doesn’t mean that resultants are necessarily computed by computing determinants. The Sylvester matrix is a very special matrix, and there are clever ways to exploit its structure to create more efficient algorithms.

Each term in the resultant has a factor of a, and the discriminant is the resultant divided by –a.
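
Carrying out that division gives the familiar formula for the cubic discriminant. Here’s a short Python version in the same style as the quartic code below, with a quick check on a polynomial that has a repeated root. (The function name and test polynomials are my own.)

    def cubic_discriminant(a, b, c, d):
        # Resultant of a x^3 + b x^2 + c x + d and its derivative, divided by -a
        return 18*a*b*c*d - 4*b**3*d + b**2*c**2 - 4*a*c**3 - 27*a**2*d**2

    # (x - 1)^2 (x - 2) = x^3 - 4x^2 + 5x - 2 has a double root at 1
    print(cubic_discriminant(1, -4, 5, -2))    # 0

    # (x - 1)(x - 2)(x - 3) = x^3 - 6x^2 + 11x - 6 has distinct roots
    print(cubic_discriminant(1, -6, 11, -6))   # 4, nonzero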

Quartic discriminant

Now let’s repeat our exercise for the quartic. The Sylvester matrix for the quartic polynomial

ax^4 + bx^3 + cx^2 + dx + e

and its derivative is

\left( \begin{array}{ccccccc} a & b & c & d & e & 0 & 0 \\ 0 & a & b & c & d & e & 0 \\ 0 & 0 & a & b & c & d & e \\ 4 a & 3 b & 2 c & d & 0 & 0 & 0 \\ 0 & 4 a & 3 b & 2 c & d & 0 & 0 \\ 0 & 0 & 4 a & 3 b & 2 c & d & 0 \\ 0 & 0 & 0 & 4 a & 3 b & 2 c & d \\ \end{array} \right)

I created the image above with the following Mathematica code.

    f[x_] := a x^4 + b x^3 + c x^2 + d x + e
    TeXForm[SylvesterMatrix1[f[x], D[f[x], x], x]]

If we take the determinant, we get the resultant, but it’s a mess.

256 a^4 e^3-192 a^3 b d e^2-128 a^3 c^2 e^2+144 a^3 c d^2 e \\ -27 a^3 d^4+144 a^2 b^2 c e^2-6 a^2 b^2 d^2 e-80 a^2 b c^2 d e \\ +18 a^2 b c d^3+16 a^2 c^4 e-4 a^2 c^3 d^2-27 a b^4 e^2\\ +18 a b^3 c d e -4 a b^3 d^3 -4 a b^2 c^3 e+a b^2 c^2 d^2

Again each term has a factor of a, so we can divide by a to get the discriminant.

If we want to use this in code, we can have Mathematica export the expression in C code using CForm. To generate Python code, it’s more convenient to use FortranForm, since Python, like Fortran, uses ** for exponents.

The following Python code was created by pasting the output of

    FortranForm[Resultant[f[x], D[f[x], x], x]]

and making it into a function.

    def quartic_resultant(a, b, c, d, e):
        return (a*b**2*c**2*d**2 - 4*a**2*c**3*d**2 - 4*a*b**3*d**3 + 18*a**2*b*c*d**3
                - 27*a**3*d**4 - 4*a*b**2*c**3*e + 16*a**2*c**4*e + 18*a*b**3*c*d*e
                - 80*a**2*b*c**2*d*e - 6*a**2*b**2*d**2*e + 144*a**3*c*d**2*e
                - 27*a*b**4*e**2 + 144*a**2*b**2*c*e**2 - 128*a**3*c**2*e**2
                - 192*a**3*b*d*e**2 + 256*a**4*e**3)

Let’s try this on a couple examples. First

x^2(x-2)(x-3) = x^4 -5x^3 + 6x^2

which has a double root at 0.

As expected

    quartic_resultant(1, -5, 6, 0, 0)

returns 0.

Next let’s try

(x-1)(x-2)(x-3)(x-4) = x^4 -10x^3 +35x^2 - 50x + 24

The call

    quartic_resultant(1, -10, 35, -50, 24)

returns 144. We expected a non-zero result since our polynomial has distinct roots at 1, 2, 3, and 4.

Higher order discriminants

In general the discriminant of an nth degree polynomial is the resultant of the polynomial and its derivative, up to a constant. There’s no need to worry about this constant if your only concern is whether the discriminant is zero. To get the exact discriminant, you divide the resultant by the leading coefficient of the polynomial and adjust the sign.

The sign convention is a little strange. If you look back at the examples above, we divided by −a in the cubic case but we divided by a in the quartic case. You might reasonably guess that you should divide by a and multiply by (-1)^n. But that won’t give you the right sign for the quadratic case. The conventional sign is

(-1)^{n(n-1)/2}.

So when n equals 2 or 3 we get a negative sign, but when n equals 4 we don’t.
