The terms covariant and contravariant come up in many contexts. An earlier post discussed how the terms are used in programming and category theory. The meaning in programming is an instance of the general use in category theory.

Vector fields can be covariant or contravariant too. This is also an instance of the categorical usage, except the terminology is backward.

Michael Spivak explains:

Nowadays such situations are always distinguished by calling the things which go in the same direction “covariant” and the things which go in the opposite direction “contravariant.” Classical terminology used these same words, and it just happens to have reversed this: a vector field is called a contravariant vector field, while a section of T*M is called a covariant vector field. And no one had the gall or authority to reverse terminology sanctified by years of usage. So it’s very easy to remember which kind of vector field is covariant, and which is contravariant — it’s just the opposite of what it logically ought to be.

Emphasis added.

In defense of classical nomenclature, it was established decades before category theory. And as Spivak explains immediately following the quotation above, the original terminology made sense in its original context.

From Spivak’s Differential Geometry, volume 1. I own the 2nd edition and quoted from it. But it’s out of print so I linked to the 3rd edition. I doubt the quote changed between editions, but I don’t know.

I started a discussion on Twitter this evening about consulting company names. Here are some of the names.

Turing Machine Computing: If we can’t do it, it can’t be done.

Heisenberg Consulting: You can have speed or quality, but not both at the same time.

Perelman Consulting: Please don’t pay us. We don’t want your money.

Gödel Systems: Your job is done but we can’t prove it.

Gödel Consulting: because no one is supplying ALL your needs.

Lebesgue Consulting: We’ve got your measure.

Noether Consulting: We find the conserved values of your system.

Fourier Consulting: We transform your world periodically.

Zorn’s Consulting: Your choice is axiomatic.

Spherical Computing: Without parallel.

Markov Chain Consulting: It doesn’t matter how we got here.

Dirac Consulting: We get right to the point.

Shannon Consulting: We’ll find a way to deliver your message.

Neyman & Pearson Consulting: No one is more powerful than us.

Complex Conjugate Consulting: We make your product real.

Hadamard Consulting: Real solutions by complex methods.

Zeno Consulting: We’ll get you arbitrarily close to where you want to be.

Hilbert Consulting: You think you have a problem?

Riemann Hypothesis Consulting: When your job is on the line and everything is critical.

Here are footnotes explaining the puns above.

Turing: In computer science, Turing machines define the limits of what is computable.

Heisenberg: The Heisenberg Uncertainty Principle says that there is a limit to how well you can know a particle’s momentum and position. The more accurately you know one, the less you know about the other.

Perelman: Turned down the Fields Medal and the Clay Mathematics Institute’s million-dollar prize after solving the Poincaré conjecture.

Gödel: His incompleteness theorem says that number theory contains true statements that cannot be proved.

Lebesgue: Founder of measure theory, a rigorous theory of length, area, volume, etc.

Noether: Established a deep connection between symmetry and conservation laws.

Fourier: Known for Fourier transforms and Fourier series, expressing functions as sums or integrals of periodic functions.

Zorn: Known for Zorn’s lemma, equivalent to the axiom of choice.

Spherical: There are no parallel lines in spherical geometry.

Markov Chain: The probability distribution for the next move in a Markov chain depends only on the current state and not on previous history.

Complex Conjugate: A complex number times its conjugate is a real number. See xkcd.

Dirac: The reference here is to the Dirac delta function. Informally, a point mass. Formally, a distribution.

Shannon: Founder of communication theory.

Neyman-Pearson: The Neyman-Pearson lemma concerns most powerful hypothesis tests.

Hadamard: Said “The shortest path between two truths in the real domain passes through the complex domain.” That is, techniques from complex analysis are often the easiest way to approach problems from real analysis.

Zeno: Zeno’s paradox says you cannot get anywhere because first you have to get halfway there, then halfway again, etc.

Hilbert: Created a famous list of 23 research problems in math in 1900.

Riemann: The Riemann hypothesis says that all the non-trivial zeros of the Riemann zeta function lie on the critical line Re(z) = 1/2.

Pick a number x between 0 and 1. Then repeatedly replace x with 4x(1-x). For almost all starting values of x, the result exhibits chaos. Two people could play this game with starting values very close together, and eventually their sequences will diverge.

It’s somewhat surprising that the iterative process described above can be written down in closed form. Starting from a value x_{0}, the value after n iterations is

sin( 2^{n} arcsin( √ x_{0} ) )^{2}.

Now suppose two people start with the same initial value. One repeatedly applies 4x(1-x) and the other uses the formula above. If both carried out their calculations exactly, both would produce the same output at every step. But what if both used a computer?

The two approaches correspond to the Python functions f and g below. Because both functions are executed in finite precision arithmetic, both have errors, but they have different errors. Suppose we want to look at the difference between the two functions as we increase n.

from numpy import arcsin, sin, sqrt, linspace
from matplotlib import pyplot as plt

def f(x0, n):
    x = x0
    for _ in range(n):
        x = 4*x*(1-x)
    return x

def g(x0, n):
    return sin(2.0**n * arcsin(sqrt(x0)))**2

n = 40
x = linspace(0, 1, 100)
plt.plot(x, f(x, n) - g(x, n))
plt.ylim(-1, 1)
plt.show()

When we run the code, nothing exciting happens. The difference is a flat line.

Next we increase n to 45 and we start to see the methods diverge.

The divergence is large when n is 50.

And the two functions are nearly completely uncorrelated when n is 55.
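The same sensitivity to rounding can be seen within a single program by running the iteration in two different precisions. This sketch is my addition, not from the original post; it compares NumPy’s float32 and float64, whose rounding errors roughly double at each step:

```python
import numpy as np

def logistic_iterate(x0, n, dtype):
    # Iterate x -> 4x(1-x) n times, keeping all arithmetic in the given precision.
    x = dtype(x0)
    four, one = dtype(4), dtype(1)
    for _ in range(n):
        x = four * x * (one - x)
    return float(x)

x0 = 0.3
# Early on, single and double precision agree closely...
early = abs(logistic_iterate(x0, 5, np.float32) - logistic_iterate(x0, 5, np.float64))
# ...but after enough iterations the rounding errors have grown to order 1.
late = abs(logistic_iterate(x0, 40, np.float32) - logistic_iterate(x0, 40, np.float64))
print(early, late)
```

The value of x0 and the cutoff n = 40 are arbitrary choices for illustration; any typical starting value shows the same behavior.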

Update

So which function is more accurate, f or g? As noted in the comments, the two functions have different kinds of numerical errors. The former accumulates arithmetic precision error at each iteration. The latter shifts noisy bits into significance by multiplying by 2^n. Apparently both have about the same overall error, though they have different distributions of error.

I recomputed g using 100-digit precision with mpmath and used the results as the standard to evaluate the output of f and g in ordinary precision. Here’s a plot of the errors when n = 45, first with f

and then with g.

The average absolute errors of f and g are 0.0024 and 0.0015 respectively.

Rachel Kroll wrote a blog post last Friday entitled I mortgaged my future with a Mac. The part I found most interesting is near the end of the post.

Instead of staying with my wonky-but-free ways of doing things, I shifted all of my stuff over to the Mac. … Now when I want to get back out, I have to do all of the work I thought I had managed to avoid by using a Mac in the first place.

I’m not interested in the pros and cons of using a Mac, but I really liked the phrase “wonky but free.” I have a growing appreciation for things that are wonky-but-free, technologies that make a poor first impression but have long-term benefits.

Here’s an amusing sentence I ran across this morning:

The code was simplified in 2003 and is harder to understand.

Maybe the author is being sarcastic, but I doubt it. I believe he means something like this:

In 2003, the code was made more concise. This made it simpler from the perspective of the maintainers of the code, sophisticated programmers well versed with the project. Unfortunately, this makes the code harder for novices, such as the intended audience for this book, to understand.

See Simple versus easy for a discussion of what it means to be simple. We often conflate the two terms. Rich Hickey argues that ease is subjective but that simplicity can be objectively defined. He makes an interesting case, though his definition of simplicity may not be what people have in mind or want when they talk about simplicity.

Toward the end of his life, Beethoven added metronome markings to the scores of his symphonies to indicate exactly how fast they should be performed. The tempos indicated in the scores are consistently faster than how the symphonies are usually performed.

In addition to presenting the advanced physics, which mathematicians find so easy, I also want to explore the workings of elementary physics, and mysterious maneuvers — which physicists seem to find so natural — by which one reduces a complicated physical problem to a simple mathematical question, which I have always found so hard to fathom.

That’s exactly how I feel about physics. I’m comfortable with differential equations and manifolds. It’s blocks and pulleys that kick my butt.

Can regular expressions parse HTML? There are several answers to that question, both theoretical and practical.

First, let’s look at theoretical answers.

When programmers first learn about regular expressions, they often try to use them on HTML. Then someone wise will tell them “You can’t do that. There’s a computer science theorem that says regular expressions are not powerful enough.” And that’s true, if you stick to the original meaning of “regular expression.”

But if you interpret “regular expression” the way it is commonly used today, then regular expressions can indeed parse HTML. This post by Nikita Popov explains that what programmers commonly call regular expressions, such as PCRE (Perl compatible regular expressions), can match context-free languages.

Well-formed HTML is context-free. So you can match it using regular expressions, contrary to popular opinion.

So according to computer science theory, can regular expressions parse HTML? Not by the original meaning of regular expression, but yes, PCRE can.

Now on to the practical answers. The next lines in Nikita Popov’s post say

But don’t forget two things: Firstly, most HTML you see in the wild is not well-formed (usually not even close to it). And secondly, just because you can, doesn’t mean that you should.

HTML in the wild can be rather wild. On the other hand, it can also be simpler than the HTML grammar allows. In practice, you may be able to parse a particular bit of HTML with regular expressions, even old fashioned regular expressions. It depends entirely on context, your particular piece of (possibly malformed) HTML and what you’re trying to do with it. I’m not advocating regular expressions for HTML parsing, just saying that the question of whether they work is complicated.
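As a concrete illustration of that last point, here is a sketch of my own (the snippet and pattern are made up, not from the post). For HTML known to be this simple and regular, even an old-fashioned regular expression does the job:

```python
import re

# A hypothetical, well-behaved snippet; HTML in the wild is rarely this tame.
html = '<p>See <a href="https://example.com">this</a> or <a href="https://example.org">that</a>.</p>'

# A classic regular expression suffices here, though it would break on
# single-quoted attributes, extra attributes, or malformed markup.
links = re.findall(r'<a href="([^"]*)">', html)
print(links)  # ['https://example.com', 'https://example.org']
```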

This opens up an interesting line of inquiry. Instead of asking whether strict regular expressions can parse strict HTML, you could ask what is the probability that a regular expression will succeed at a particular task for an HTML file in the wild. If you define “HTML” as actual web pages rather than files conforming to a particular grammar, every technique will fail with some probability. The question is whether that probability is acceptable in context, whether using regular expressions or any other technique.

This is the second post in a series on vibrations determined by the equation

m u'' + γ u' + k u = F cos ωt

The first post in the series looked at the simplest case, γ = 0 and F = 0. Now we’ll look at the more realistic and more interesting case of damping γ > 0. We won’t consider forcing yet, so F = 0 and our equation reduces to

m u'' + γ u' + k u = 0.

The effect of damping obviously depends on the amount of damping, i.e. the size of γ relative to other terms. Damping removes energy from the system and so the amplitude of the oscillations goes to zero over time, regardless of the amount of damping. However, the system can have three qualitatively different behaviors: under-damping, critical damping, and over-damping.

The characteristic polynomial of the equation is

m x^{2} + γ x + k.

This polynomial has either two complex roots, one repeated real root, or two real roots depending on whether the discriminant γ^{2} – 4mk is negative, zero, or positive. These three states correspond to under-damping, critical damping, and over-damping.

Under-damping

When damping is small, the system vibrates at first approximately as if there were no damping, but the amplitude of the solutions decreases exponentially. As long as γ^{2} < 4mk the system is under-damped and the solution is

u(t) = R exp( –γt/2m ) cos( μt – φ )

where

μ = √(4mk – γ^{2}) / 2m.

The amplitude R and the phase φ are determined by the initial conditions.

As an example, let γ = 1, m = 1, and k = 5. This implies μ = √19/2. Set the initial conditions so that R = 6 and φ = 1. The solid blue line below is the solution 6 exp(-t/2) cos(√19 t/2 – 1), and the dotted green lines above and below are the exponential envelope ±6 exp(-t/2).
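As a sanity check, not part of the original post, we can verify numerically that this solution satisfies the differential equation, approximating u' and u'' with centered finite differences:

```python
import numpy as np

m, gamma, k = 1.0, 1.0, 5.0
mu = np.sqrt(19) / 2        # quasi-frequency: sqrt(4mk - gamma^2) / 2m
R, phi = 6.0, 1.0

def u(t):
    # Proposed under-damped solution
    return R * np.exp(-gamma*t/(2*m)) * np.cos(mu*t - phi)

# Approximate derivatives by centered differences and check the ODE residual
# m u'' + gamma u' + k u, which should be zero up to discretization error.
h = 1e-4
ts = np.linspace(0.0, 10.0, 101)
u1 = (u(ts + h) - u(ts - h)) / (2*h)
u2 = (u(ts + h) - 2*u(ts) + u(ts - h)) / h**2
max_residual = np.max(np.abs(m*u2 + gamma*u1 + k*u(ts)))
print(max_residual)  # tiny: discretization and rounding error only
```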

And here is a video of the mass-spring-dashpot system in action made using the code discussed here.

Note that the solution is not simply an oscillation at the natural frequency multiplied by an exponential term. The damping changes the frequency of the periodic term, though the change is small when the damping is small.

The factor μ is called the quasi-frequency. Recall from the previous post that the natural frequency, the frequency of the solution with no damping, is denoted ω_{0} = √(k/m). The quasi-frequency is related to the natural frequency by

μ/ω_{0} = √(1 – γ^{2}/4mk) ≈ 1 – γ^{2}/8mk.

The approximation above holds for small γ since a Taylor expansion shows that √(1 – x) ≈ 1 – x/2 for small x.

This says that damping lowers the frequency of oscillation by a relative amount of approximately γ^{2}/8mk. So when γ is small there is little change to the natural frequency, but the change grows quadratically as γ increases.
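A quick numerical check of this approximation (my addition, using the same m and k as the example above):

```python
import numpy as np

m, k = 1.0, 5.0
omega0 = np.sqrt(k/m)   # natural frequency omega_0 = sqrt(k/m)

# The ratio mu/omega_0 should approach 1 - gamma^2/(8mk) as gamma shrinks.
for gamma in [1.0, 0.5, 0.1]:
    mu = np.sqrt(4*m*k - gamma**2) / (2*m)   # quasi-frequency
    exact = mu / omega0
    approx = 1 - gamma**2 / (8*m*k)
    print(gamma, exact, approx)
```

For γ = 1 the two values agree to about three decimal places, and the agreement improves rapidly as γ decreases.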

Critical damping

Critical damping occurs when γ^{2} = 4mk. In this case the solution is given by

u(t) = (A + Bt) exp( –γt/2m ).

The position of the mass will asymptotically approach 0. Depending on the initial velocity, it will either go monotonically to zero or reach some maximum displacement before it turns around and goes to 0.

For an example, let m = 1 and k = 5 as before. For critical damping, γ must equal √20. Choose initial conditions so that A = 1 and B = 10.
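For this choice of initial conditions the mass overshoots before settling, and we can locate the turnaround point explicitly: u'(t) = 0 where B = (γ/2m)(A + Bt). A small sketch, my addition:

```python
import numpy as np

m, k = 1.0, 5.0
gamma = np.sqrt(20.0)    # critical damping: gamma^2 = 4mk
A, B = 1.0, 10.0
c = gamma / (2*m)

def u(t):
    # Critically damped solution
    return (A + B*t) * np.exp(-c*t)

# u'(t) = (B - c*(A + B*t)) exp(-c*t), so the maximum displacement
# occurs where B = c*(A + B*t).
t_peak = (B/c - A) / B
print(t_peak, u(t_peak))
```

The mass reaches its maximum displacement at t ≈ 0.347 and then decays monotonically to zero.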

Over-damping

When γ^{2} > 4mk the system is over-damped. When damping is that large, the system does not oscillate at all.

The solution is

u(t) = A exp(r_{1}t) + B exp(r_{2}t)

where r_{1} and r_{2} are the two roots of

m x^{2} + γ x + k = 0.

Because γ^{2} > 4mk, both roots are real, and because √(γ^{2} – 4mk) < γ, both roots are negative. So solutions decay exponentially to 0.
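To see this concretely, here is a sketch of my own. Take m = 1 and k = 5 as before, and γ = 5, so that γ² = 25 > 20 = 4mk:

```python
import numpy as np

m, k = 1.0, 5.0
gamma = 5.0    # gamma^2 = 25 > 4mk = 20, so the system is over-damped

# Roots of the characteristic polynomial m x^2 + gamma x + k
r1, r2 = np.roots([m, gamma, k])
print(r1, r2)  # both real and negative
```

Both roots come out real and negative, roughly -1.38 and -3.62, so both terms of the solution decay without oscillating.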

Coming up

The next two posts in the series will add a forcing term, i.e. F > 0 in our differential equation. The next post will look at undamped driven vibrations, and the final post will look at damped driven vibrations.

I always thought that Fourier was the first to come up with the idea of expressing general functions as infinite sums of sines and cosines. Apparently this isn’t true.

The idea that various functions can be described in terms of Fourier series … was for the first time proposed by Daniel Bernoulli (1700–1782) to solve the one-dimensional wave equation (the equation of motion of a string) about 50 years before Fourier. … However, no one contemporaneous to D. Bernoulli accepted the idea as a general method, and soon the study was forgotten.

The subtitle may be a little misleading. There is a fair amount of math in the book, but the ratio of history to math is pretty high. You might say the book is more about the role of mathematicians than the role of mathematics. As Roger Penrose says on the back cover, the book has “illuminating descriptions and minimal technicality.”

Someone interested in weather prediction but without a strong math background would enjoy reading the book, though someone who knows more math will recognize some familiar names and theorems and will better appreciate how they fit into the narrative.