Covariant and contravariant

The terms covariant and contravariant come up in many contexts. An earlier post discussed how the terms are used in programming and category theory. The meaning in programming is an instance of the general use in category theory.

Vector fields can be covariant or contravariant too. This is also an instance of the categorical usage, except the terminology is backward.

Michael Spivak explains:

Nowadays such situations are always distinguished by calling the things which go in the same direction “covariant” and the things which go in the opposite direction “contravariant.” Classical terminology used these same words, and it just happens to have reversed this: a vector field is called a contravariant vector field, while a section of T*M is called a covariant vector field. And no one had the gall or authority to reverse terminology sanctified by years of usage. So it’s very easy to remember which kind of vector field is covariant, and which is contravariant — it’s just the opposite of what it logically ought to be.

Emphasis added.

In defense of classical nomenclature, it was established decades before category theory. And as Spivak explains immediately following the quotation above, the original terminology made sense in its original context.

From Spivak’s Differential Geometry, volume 1. I own the 2nd edition and quoted from it. But it’s out of print so I linked to the 3rd edition. I doubt the quote changed between editions, but I don’t know.

Related: Applied category theory

Geeky company names

I started a discussion on Twitter this evening about consulting company names. Here are some of the names.

  • Turing Machine Computing: If we can’t do it, it can’t be done.
  • Heisenberg Consulting: You can have speed or quality, but not both at the same time.
  • Perelman Consulting: Please don’t pay us. We don’t want your money.
  • Gödel Systems: Your job is done but we can’t prove it.
  • Gödel Consulting: because no one is supplying ALL your needs.
  • Lebesgue Consulting: We’ve got your measure.
  • Noether Consulting: We find the conserved values of your system.
  • Fourier consulting: We transform your world periodically.
  • Zorn’s Consulting: Your choice is axiomatic.
  • Spherical Computing: Without parallel.
  • Markov Chain Consulting: It doesn’t matter how we got here.
  • Dirac Consulting: We get right to the point.
  • Shannon Consulting: We’ll find a way to deliver your message.
  • Neyman & Pearson Consulting: No one is more powerful than us.
  • Complex Conjugate Consulting: We make your product real.
  • Hadamard Consulting: Real solutions by complex methods.
  • Zeno Consulting: We’ll get you arbitrarily close to where you want to be.
  • Hilbert Consulting: You think you have a problem?
  • Riemann Hypothesis Consulting: When your job is on the line and everything is critical

Here are footnotes explaining the puns above.

  • Turing: In computer science, Turing machines define the limits of what is computable.
  • Heisenberg: The Heisenberg Uncertainty Principle says that there is a limit to how well you can know a particle’s momentum and position. The more accurately you know one, the less you know about the other.
  • Perelman: Turned down prize money from the Fields Institute and Clay Institute after solving the Poincaré conjecture.
  • Gödel: His incompleteness theorem says that number theory contains true theorems that cannot be proved.
  • Lebesgue: Founder of measure theory, a rigorous theory of length, area, volume, etc.
  • Noether: Established a deep connection between symmetry and conservation laws.
  • Fourier: Known for Fourier transforms and Fourier series, expressing functions as sums or integrals of periodic functions.
  • Zorn: Known for Zorn’s lemma, equivalent to the axiom of choice.
  • Spherical: There are no parallel lines in spherical geometry.
  • Markov Chain: The probability distribution for the next move in a Markov chain depends only on the current state and not on previous history.
  • Complex Conjugate: A complex number times its conjugate is a real number. See xkcd.
  • Dirac: The reference here is to the Dirac delta function. Informally, a point mass. Formally, a distribution.
  • Shannon: Founder of communication theory.
  • Neyman-Pearson: The Neyman-Pearson lemma concerns most powerful hypothesis tests.
  • Hadamard: Said “The shortest path between two truths in the real domain passes through the complex domain.” That is, techniques from complex analysis are often the easiest way to approach problems from real analysis.
  • Zeno: Zeno’s paradox says you cannot get anywhere because first you have to get halfway there, then halfway again, etc.
  • Hilbert: Created a famous list of 23 research problems in math in 1900.
  • Riemann: The Riemann hypothesis says that all the non-trivial zeros of the Riemann zeta function line on the critical line Re(z) = 1/2.

Pulp science fiction and vibrations

This morning I ran across Pulp-o-mizer and decided my series of posts on mechanical vibrations could use a little sensational promotion.

The posts are

Consulting in differential equations

Damped, driven oscillations

This is the final post in a four-part series on vibrating systems. The first three parts were

and now we consider driven, damped vibrations. We are looking at the equation

m u'' + γ u' + k u = F cos ωt

where all coefficients are positive.

As in the previous post, we need to find one solution to the equation with the forcing term F cos ωt and add it to the general solution to the homogeneous (free) equation.

The particular solution we’re looking for is simply R cos(ωt– φ), but the dependence of R and φ on the coefficients m, γ, and k is complicated. We’ll come back to this later, but first we’ll see what we can understand without knowing the exact form of R and φ.

The solutions to the unforced equation decay exponentially over time because of damping. The long-term behavior of the system, known as the steady state solution, is just R cos(ωt– φ). The frequency of the steady state solution is the same as the frequency of the forcing function, though the phase will be different in general.

To find the amplitude and phase of the steady state solution, we first define Δ > 0 by

Δ2 = m202 – ω2)2 + γ2 ω2

where ω0 = √(k/m) is the natural frequency of the system.

Then R = F /Δ . Note that Δ is small when the driving frequency ω is close to the natural frequency. That’s not where Δ is minimized, though when γ is small the minimum of Δ occurs close to the natural frequency. The minimum of Δ, and hence the maximum of the amplitude R, occurs at a lower frequency, namely

ω2 = ω02 – γ2 / 2m2.

The phase φ satisfies

cos φ = m02 – ω2)/ Δ


sin φ = γω/ Δ.

For low frequency ω, the phase φ is small, i.e. the response is nearly in phase with the excitation. For ω =ω0, i.e. forcing at the natural frequency, the response lags the excitation by π/2, a quarter cycle.

* * *

If your company needs help with an application of differential equations, please call or email to discuss your project.


Exact chaos

Pick a number x between 0 and 1. Then repeatedly replace x with 4x(1-x). For almost all starting values of x, the result exhibits chaos. Two people could play this game with starting values very close together, and eventually their sequences will diverge.

It’s somewhat surprising that the iterative process described above can be written down in closed form. Starting from a value x0, the value after n iterations is

sin( 2n arcsin( √ x0 ) )2.

Now suppose two people start with the same initial value. One repeatedly applies 4x(1-x) and the other uses the formula above. If both carried out their calculations exactly, both would produce the same output at every step. But what if both used a computer?

The two approaches correspond to the Python functions f and g below. Because both functions are executed in finite precision arithmetic, both have errors, but they have different errors. Suppose we want to look at the difference between the two functions as we increase n.

from scipy import arcsin, sin, sqrt, linspace
from matplotlib import pyplot as plt

def f(x0, n):
    x = x0
    for _ in range(n):
        x = 4*x*(1-x)
    return x

def g(x0, n):
    return sin(2.0**n * arcsin(sqrt(x0)))**2

n = 40
x = linspace(0, 1, 100)
plt.plot(x, f(x, n) - g(x, n))
plt.ylim(-1, 1)

When we run the code, nothing exciting happens. The difference is a flat line.

Next we increase n to 45 and we start to see the methods diverge.

The divergence is large when n is 50.

And the two functions are nearly completely uncorrelated when n is 55.


So which function is more accurate, f or g? As noted in the comments, the two functions have different kinds of numerical errors. The former accumulates arithmetic precision error at each iteration. The latter shifts noisy bits into significance by multiplying by 2^n. Apparently both about the same overall error, though they have different distributions of error.

I recomputed g using 100-digit precision with mpmath and used the results as the standard to evaluate the output of f and g in ordinary precision. Here’s a plot of the errors when n = 45, first with f

and then with g.

The average absolute errors of f and g are 0.0024 and 0.0015 respectively.

Wonky but free

Rachel Kroll wrote a blog post last Friday entitled I mortgaged my future with a Mac. The part I found most interesting is near the end of the post.

Instead of staying with my wonky-but-free ways of doing things, I shifted all of my stuff over to the Mac. … Now when I want to get back out, I have to do all of the work I thought I had managed to avoid by using a Mac in the first place.

I’m not interested in the pros and cons of using a Mac, but I really liked the phrase “wonky but free.” I have a growing appreciation for things that are wonky-but-free, technologies that make a poor first impression but have long-term benefits.

Simpler for whom?

Here’s an amusing sentence I ran across this morning:

The code was simplified in 2003 and is harder to understand.

Maybe the author is being sarcastic, but I doubt it. I believe he means something like this:

In 2003, the code was made more concise. This made it simpler from the perspective of the maintainers of the code, sophisticated programmers well versed with the project. Unfortunately, this makes the code harder for novices, such as the intended audience for this book, to understand.

Quotation from An Introduction to Programming in Emacs Lisp.

See Simple versus easy for a discussion of what it means to be simple. We often conflate the two terms. Rich Hickey argues that ease is subjective but that simplicity can be objectively defined. He makes an interesting case, though his definition of simplicity may not be what people have in mind or want when they talk about simplicity.

Undamped forced vibrations

This is the third in a series of four blog posts on mechanical vibrations. The first two posts were

We’re looking at solutions to the differential equation

m u'' + γ u' + k u = F cos ωt

The first two posts considered “free” vibrations, systems with no external input. The coefficient F is zero in such systems. Now we consider a the effect of an external driving force of the form F cos ωt. We’ll leave out damping effects in this post, that is, we will assume γ = 0. The final post in this series will cover damped, driven oscillations, i.e. the case of γ and F both positive.

When the right-hand side of a linear differential equation is 0, we say the equation is homogeneous. When the right-hand side is not zero, we say the equation is non-homogeneous.

The general solution to a non-homogeneous linear differential equation is the general solution to the corresponding homogeneous equation plus any particular solution to the non-homogeneous equation. That is, we all we have to do is find one solution to the equation with the forcing term, and add it to the general solution to the equation without the forcing term.

Recall that the natural frequency of our equation without forcing is ω defined by

ω02 = k/m.

and that the general solution without forcing is

u(t) = R cos(ω0t – φ)

where the amplitude R and the phase φ are determined by the initial conditions. To find the general solution to the non-homogeneous we need to find one solution to the non-homogeneous equation, and here’s one:

F cos(ωt) /(m02 – ω2))

provided that the natural frequency ω0 is not equal to the driving frequency ω. This says that the general solution is the sum of two periodic functions, one with the natural frequency ω0 and one with the frequency of the driving force ω. In general, the sum of two periodic functions can be fairly complicated, and not necessarily periodic. More on that here.

Next we will look at two phenomena that the solutions can illustrate: beats and resonance. The latter happens when the natural and driving frequencies are equal.


If the starts at position 0 and with velocity 0, then the amplitudes of the homogeneous and non-homogeneous solutions are the same. The solution is

F /(m02 – ω2)) ( cos(ωt) – cos(ω0t) )

We can apply the trig identity

cos(A) – cos(B) = 2 sin( (AB)/2 ) sin(( A+B)/2 )

to rewrite the solution as the amplitude 2F /(m02 – ω2)) times

sin( (ω0 – ω) t/2 ) sin( (ω0 + ω) t/2 )

If the natural and driving frequencies are similar, i.e. |ω0 – ω| is small, then the first term is a low frequency factor multiplying the second high frequency term. This phenomena is known as beats. You can hear it, for example, when two instruments are nearly but not quite in tune. If one is playing an A 440 and the other an A 441, you’ll hear a pitch of 440.5 getting louder and softer at a frequency of one cycle per second.

Here’s an example of beats, plotting sin(t) sin(10t). The solid blue line is the product. The dotted green lines are sin(t) and -sin(t).

illustrating beats

Incidentally, this is related to how AM radio works: the broadcast content is multiplied by the carrier wave.


When we wrote down the solution above, we assumed the natural and driving frequencies were not equal. If they are, if the system is being driven at its natural frequency, we have resonance. In that case, the solution to the non-homogeneous equation is

(F/2mω) t sin ωt.

Because the solution is proportional to t sin ωt, this says we have oscillations whose amplitude grows linearly with time. In theory, the oscillations can become arbitrarily large. In reality there is always some damping, which eventually puts some limit on the amplitude of the oscillations. Also, the system may break before damping would reign in the oscillations.

Here’s an example of resonance, plotting 0.25 t sin(t) in solid blue and ± 0.25 t in dotted green.


The final post in this series will look at the effect of damping on forced vibrations.

For daily posts about differential equations, follow @diff_eq on Twitter.

diff_eq twitter icon

How mathematicians see physics

From the preface to Physics for Mathematicians:

In addition to presenting the advanced physics, which mathematicians find so easy, I also want to explore the workings of elementary physics, and mysterious maneuvers — which physicists seem to find so natural — by which one reduces a complicated physical problem to a simple mathematical question, which I have always found so hard to fathom.

That’s exactly how I feel about physics. I’m comfortable with differential equations and manifolds. It’s blocks and pulleys that kick my butt.

Can regular expressions parse HTML or not?

Can regular expressions parse HTML? There are several answers to that question, both theoretical and practical.

First, let’s look at theoretical answers.

When programmers first learn about regular expressions, they often try to use them on HTML. Then someone wise will tell them “You can’t do that. There’s a computer science theorem that says regular expressions are not powerful enough.” And that’s true, if you stick to the original meaning of “regular expression.”

But if you interpret “regular expression” the way it is commonly used today, then regular expressions can indeed parse HTML. This post by Nikita Popov explains that what programmers commonly call regular expressions, such as PCRE (Perl compatible regular expressions), can match context-free languages.

Well-formed HTML is context-free. So you can match it using regular expressions, contrary to popular opinion.

So according to computer science theory, can regular expressions parse HTML? Not by the original meaning of regular expression, but yes, PCRE can.

Now on to the practical answers. The next lines in Nikita Popov’s post say

But don’t forget two things: Firstly, most HTML you see in the wild is not well-formed (usually not even close to it). And secondly, just because you can, doesn’t mean that you should.

HTML in the wild can be rather wild. On the other hand, it can also be simpler than the HTML grammar allows. In practice, you may be able to parse a particular bit of HTML with regular expressions, even old fashioned regular expressions. It depends entirely on context, your particular piece of (possibly malformed) HTML and what you’re trying to do with it. I’m not advocating regular expressions for HTML parsing, just saying that the question of whether they work is complicated.

This opens up an interesting line of inquiry. Instead of asking whether strict regular expressions can parse strict HTML, you could ask what is the probability that a regular expression will succeed at a particular task for an HTML file in the wild. If you define “HTML” as actual web pages rather than files conforming to a particular grammar, every technique will fail with some probability. The question is whether that probability is acceptable in context, whether using regular expressions or any other technique.

Related post: Coming full circle

For daily tips on regular expressions, follow @RegexTip on Twitter.

Regex tip icon

Free damped vibrations

This is the second post in a series on vibrations determine by the equation

m u'' + γ u' + k u = F cos ωt

The first post in the series looked at the simplest case, γ = 0 and F = 0. Now we’ll look at the more realistic and more interesting case of damping γ > 0. We won’t consider forcing yet, so F = 0 and our equation reduces to

m u'' + γ u' + k u = 0.

The effect of damping obviously depends on the amount of damping, i.e. the size of γ relative to other terms. Damping removes energy from the system and so the amplitude of the oscillations goes to zero over time, regardless of the amount of damping. However, the system can have three qualitatively different behaviors: under-damping, critical damping, and over-damping.

The characteristic polynomial of the equation is

m x2 + γ x + k

This polynomial has either two complex roots, one repeated real root, or two real roots depending on whether the discriminant γ2 – 4mk is negative, zero, or positive. These three states correspond to under-damping, critical damping, and over-damping.


When damping is small, the system vibrates at first approximately as if there were no damping, but the amplitude of the solutions decreases exponentially. As long as γ2 < 4mk the system is under-damped and the solution is

u(t) = R exp( –γt/2m ) cos( μt– φ )


μ = √(4mk – γ2) / 2m.

The amplitude R and the phase φ are determined by the initial conditions.

As an example, let γ =1, m = 1, and k = 5. This implies μ =  √19/2. Set the initial conditions so that R = 6 and φ = 1. The solid blue line below is the solution 6 exp(-t/2) cos(t– 1), and the dotted green lines above and below are the exponential envelope 6 exp(-t/2).

And here is a video of the mass-spring-dashpot system in action made using the code discussed here.

Note that the solution is not simply an oscillation at the natural frequency multiplied by an exponential term. The damping changes the frequency of the periodic term, though the change is small when the damping is small.

The factor μ is called the quasi-frequency. Recall from the previous post that the natural frequency, the frequency of the solution with no damping, is denoted ω0. The quasi-frequency is related to the natural frequency by

μ = √(1 – γ2/4mk) ω0 ≈ (1 – γ2/8mk) ω0.

The approximation above holds for small γ since a Taylor expansion shows that √(1 – x) ≈ 1 – x/2 for small x.

This says that the quasi-frequency decreases the natural frequency by a factor of approximately γ2/8mk. So when γ is small there is little change to the natural frequency, but the amount of change grows quadratically as γ increases.

Critical damping

Critical damping occurs when γ2 = 4mk. In this case the solution is given by

u(t) = (A + Bt) exp( – γt/2m).

The position of the mass will asymptotically approach 0. Depending on the initial velocity, it will either go monotonically to zero or reach some maximum displacement before it turns around and goes to 0.

For an example, let m = 1 and k = 5 as before. For critical damping, γ must equal √20. Choose initial conditions so that A = 1 and B = 10.


When γ2 > 4mk the system is over-damped. When damping is that large, the system does not oscillate at all.

The solution is

u(t) = A exp(r1 t) + B exp(r2 t)

where r1 and r2 are the two roots of

m x2 + γ x + k = 0.

Because γ > 0, both roots are negative and solutions decay exponentially to 0.

Coming up

The next two posts in the series will add a forcing term, i.e. F > 0 in our differential equation. The next post will look at undamped driven vibrations, and the final post will look at damped driven vibrations.

Click to find out more about consulting for numerical computing

Python animation for mechanical vibrations

Stéfan van der Walt wrote some Python code to animate the system described in yesterday’s post on mechanical vibrations.

Stéfan posted his code on github. It currently illustrates undamped free vibrations, but could be modified to work with damped or driven vibrations.

The code includes an option to save the output as an mp4 file. You must have ffmpeg installed to save a matplotlib animation as mp4.

Fourier series before Fourier

I always thought that Fourier was the first to come up with the idea of expressing general functions as infinite sums of sines and cosines. Apparently this isn’t true.

The idea that various functions can be described in terms of Fourier series … was for the first time proposed by Daniel Bernoulli (1700–1782) to solve the one-dimensional wave equation (the equation of motion of a string) about 50 years before Fourier. … However, no one contemporaneous to D. Bernoulli accepted the idea as a general method, and soon the study was forgotten.

Source: The Nonlinear World

Perhaps Fourier’s name stuck to the idea because he developed it further than Bernoulli did.

Related posts:

History of weather prediction

I’ve just started reading Invisible in the Storm: The Role of Mathematics in Understanding Weather, ISBN 0691152721.

The subtitle may be a little misleading. There is a fair amount of math in the book, but the ratio of history to math is pretty high. You might say the book is more about the role of mathematicians than the role of mathematics. As Roger Penrose says on the back cover, the book has “illuminating descriptions and minimal technicality.”

Someone interested in weather prediction but without a strong math background would enjoy reading the book, though someone who knows more math will recognize some familiar names and theorems and will better appreciate how they fit into the narrative.

Related posts: