Schwarzian derivative

There are many ways the basic derivative can be generalized: partial derivatives, directional derivatives, covariant derivatives, etc. These all reduce to the basic derivative under the right circumstances.

The Schwarzian derivative is not like that. It’s not a generalization of the familiar derivative but rather a differential operator analogous to a derivative. The Schwarzian derivative of a function f is defined [1] as

S(f) = \left(\frac{f''}{f'}\right)' - \frac{1}{2} \left(\frac{f''}{f'}\right)^2

To understand the motivation behind such an arbitrary-looking definition, we need to first look at functions of the form

g(z) = \frac{az + b}{cz + d}

called Möbius transformations, or more descriptively, fractional linear transformations [2]. These transformations are very important in complex analysis and have a lot of interesting properties. For example, the image of a circle in the complex plane under a Möbius transformation is another circle [3].

Möbius transformations are to the Schwarzian derivative roughly what constants are to the ordinary derivative. A function is a Möbius transformation if and only if its Schwarzian derivative is zero.

Since the Schwarzian derivative is defined in terms of ordinary derivatives, adding a constant to a function doesn’t change its Schwarzian derivative. Furthermore, the Schwarzian derivative is defined in terms of the ratio of ordinary derivatives, so multiplying a function by a constant doesn’t change its Schwarzian derivative either.

Even more generally, applying a Möbius transformation to a function doesn’t change its Schwarzian derivative. That is, for a fractional linear transformation like g(z) above

S(g \circ f) = S(f)

for any function f. So you can pull Möbius transformations out of a Schwarzian derivative sorta like the way you can pull constants out of an ordinary derivative. The difference though is that instead of the Möbius transformation moving to the outside, it simply disappears.

You can think of the Schwarzian derivative as measuring how well a function can be approximated by a Möbius transformation. Schwarzian derivatives come up frequently in applications of complex analysis, such as conformal mapping.

More kinds of derivatives

[1] The Schwarzian derivative of a constant function is defined to be zero.

[2] Möbius transformations require adbc to not equal zero.

[3] For this to always be true, you have to include a line as a special case of a circle, a circle of infinite radius if you like. If you don’t like that definition, then you can rephrase the statement above as saying Möbius transformations map circles and lines to circles and lines.

Searching Greek and Hebrew with regular expressions

According to the Python Cookbook, “Mixing Unicode and regular expressions is often a good way to make your head explode.” It is thus with fear and trembling that I dip my toe into using Unicode with Greek and Hebrew.

I heard recently that there are anomalies in the Hebrew Bible where the final form of a letter is deliberately used in the middle of a word. That made me think about searching for such anomalies with regular expressions. I’ll come back to that shortly, but I’ll start by looking at Greek where things are a little simpler. Continue reading

Descartes and Toolz

I was looking recently at the Python module toolz, a collection of convenience functions. A lot of these functions don’t do that much. They don’t save you much code, but they do make your code more readable by making it more declarative. You may not realize need them until you see them.

For example, there is a function partitionby that breaks up a sequence at the points where a given function’s value changes. I’m pretty sure that function would have improved some code I’ve written recently, making it more declarative than procedural, but I can’t remember what that was.

Although I can’t think of my previous example, I can think of a new one, and that is Descartes’ rule of signs.

Given a polynomial p(x), read the non-zero coefficients in order and keep note of how many times they change sign, either from positive to negative or vice versa. Call that number n. Then the number of positive roots of p(x) either equals n or n minus a positive even number.

For example, suppose

p(x) = 4x4 + 3.1x3x2 – 2x + 6.

The coefficients are 4, 3.1, -1, -2, and 6. The list of coefficients changes signs twice: from positive to negative, and from negative to positive. Here’s a first pass at how you might have Python split the coefficients to look sign changes.

    from toolz import partitionby

    coefficients = [4, 3.1, -1, -2, 6]
    parts = partitionby(lambda x: x > 0, coefficients)
    print([p for p in parts])

This prints

    [(4, 3.1), (-1, -2), (6,)]

The first argument to partitionby an anonymous function that tests whether its argument is positive. When this function changes value, we have a sign alteration. There are three groups of consecutive coefficients that have the same sign, so there are two times the signs change. So our polynomial either has two positive roots or no positive roots. (It turns out there are no positive roots.)

The code above isn’t quite right though, because Descartes said to only look at non-zero coefficients. If we change our anonymous function to

    lambda x: x >= 0

that will work for zeros in the middle of positive coefficients, but it will give a false positive for zeros in the middle of negative coefficients. We can fix the code with a list comprehension. The following example works correctly.

    coefficients = [4, 0, 3.1, -1, 0, -2, 6]
    nonzero = [c for c in coefficients if c != 0]
    parts = partitionby(lambda x: x > 0, nonzero)
    print([p for p in parts])

If our coefficients were in a NumPy array rather than a list, we could remove the zeros more succinctly.

    from numpy import array

    c = array(coefficients)
    parts = partitionby(lambda x: x > 0, c[c != 0])

The function partitionby returns an iterator rather than a list. That’s why we don’t just print parts above. Instead we print [p for p in parts] which makes a list. In applications, it’s often more efficient to have an iterator than a list, generating items if and when they are needed. If you don’t need all the items, you don’t have to generate them all. And even if you do need all the items, you could save memory by not keeping them all in memory at once. I’ll ignore such efficiencies here.

We don’t need the partitions per se, we just need to know how many there are. The example that escapes my mind would have been a better illustration if it needed to do more with each portion than just count it. We could count the number of sign alternations for Descartes rule as follows.

   len([p for p in parts]) - 1

Related posts

Driving vibrations with sawtooth waves

The previous post looked at driving a vibrating system with square waves rather than the more customary sine waves.

You could think of a square wave as a crude approximation to a sine wave. A sawtooth wave is another crude approximation to a sine wave, and so it would be interesting to see how systems driven by a sawtooth forcing function differ from those driven by a square or sinusoidal forcing function.

Sine wave, sawtooth wave, and square wave

With a square wave, the differences were most pronounced at low frequencies, i.e. when the driving frequency was low relative to the natural frequency. The same is true for sawtooth waves, but with some interesting differences.

As before we will look at the equation

u'' + u = sin(ωt)

and with sine replaced with a variation, this time a sawtooth wave rather than a square. We will use initial conditions u(0) = u‘(0) = 1.

We start with ω = 1, i.e. driving the system at its natural frequency.

Resonance with sawtooth forcing

We see the resonance we’d expect when driving a system at its natural frequency. The amplitudes are lower for the sawtooth forcing function than the sinusoidal forcing function.

When we back away a little from the resonant frequency, setting ω = 0.9, we see beats.

Sawtooth wave beats

When we reduce ω to 0.5, we get resonance again for the sawtooth forcing solutions.

Beats for sawtooth-driven solution

At ω = 0.4 we see something like beats again.

Sawtooth driven beats

And at ω = 1/3 it looks like we have resonance again for the sawtooth-driven system.

Sawtooth driven resonance

We also see beats at ω = 1/n for all integer n. I’ve tried this for n up to 12. But at other frequencies that are not reciprocals of integers I see periodic solutions.

I have only investigated this numerically. Do any of you know of analytical results that apply here?

Update: There’s a simple reason why driving frequencies of 1/n do indeed create resonance: The Fourier series for the driving function has a component at the resonant frequency. Thanks to gmvh for pointing this out.

The square wave has a similar resonance at driving frequency 1/n, but only when n is an odd number. I missed that because I didn’t happen to try any frequencies of that form. Here’s an example with n = 5.

Driving oscillations with square waves

What happens when you drive a vibrating system with a square wave rather than a sine wave? Do you still see the same kinds of behavior, such as beats and resonance? When does the difference between a square wave and a sine wave matter most? Those are the questions this post will address.


Basic mechanical oscillations are modeled by the equation

m u'' + γ u' + k u = F sin(ωt + φ)

You can think of m, γ, and k as mass, damping, and spring constant. The same equation describes non-mechanical oscillations, such as electrical systems. Sometimes the terms such as “mass” are still used when they are metaphorical rather than literal. I wrote a four-part series of posts on mechanical vibrations a while back starting here.

The term on the right hand side is the forcing function. F is the amplitude of the driving force and ω is its frequency.

The natural frequency of a system modeled by the equation above is

ω02 = k/m.

When F = 0, the solutions to the differential equation will have this frequency. When F is not zero, the solutions a component with the natural frequency and a component with the driving frequency. When the two frequencies are different, you get beats. When the two frequencies are the same, you get resonance. More on beats and resonance here.

In this post we will look at solutions to the equation above where the forcing function is a square wave rather than a sine wave. That is, we will replace

sin(ωt + φ)


sign( sin(ωt + φ) )



is 1 when x is positive and -1 when x is negative.


To simplify things a little, we will set the damping term γ to zero, and set the phase φ in the driving function to zero. Also, we will set m, k, and F all equal to 1. We will focus on varying only the driving frequency ω.

We need initial conditions for our differential equations, so let’s pick u(0) = 1 and u‘(0) = 0.

When we drive the system at its natural frequency, i.e. with ω = 1, we get the same kind of resonance from a sine wave and a square wave.

Square wave resonance

Update: There are more resonant frequencies [1].

When we lower the driving frequency to ω = 0.5 we see more of a difference.

Square beats omega = 0.5

In general we see more difference as ω gets smaller, and more difference when the period 1/ω is not an integer. For example, here is ω = 0.25, period T= 4:

Square wave beats omega = 0.25

And here is ω = 0.3, period T = 1.66.

Square wave beats, omega = 0.3Here’s an example of driving a system with a square wave at a frequency higher than the natural frequency.

Square wave beats, omega = 1.7

Here the difference between the solutions for the square wave and sine wave are closer together. Apparently the higher frequency makes more difference than the non-integer period.

In the examples above, the solution with a square wave forcing function starts out flat. That’s because of our initial conditions u(0) = 1 and u‘(0) = 0. If we change the initial velocity condition to u‘(0) = 1, we get smoother solutions.

Here are the solutions with u‘(0) = 1 and ω = 0.25

Square wave beats, omega = 0.25, initial velocity 1

and ω = 0.3.

Square wave beats, omega = 0.30, initial velocity 1

Changing the initial velocity made more of a difference when the period was an integer.

See the next post for systems driven by a sawtooth wave rather than a square wave.

Related posts

[1] Sine-driven systems exhibit resonance if and only if the driving frequency matches the natural frequency. While writing the next post on sawtooth forcing functions, I accidentally discovered resonance at lower frequencies. The same can happen here with square wave-driven systems if ω = 1/(2n + 1). For example, here’s a plot with ω = 1/5. The reason resonance shows up for odd periods is that the square wave has Fourier components at every odd frequency.

There’s more going on here

At a new faculty orientation, a professor encouraged us rookies to teach intro courses and to keep coming back to teach them periodically. I didn’t fully appreciate what he said at the time, though I remembered it, even though I left academia a couple years later.

Now I think I have an idea what he was referring to. There’s a LOT of stuff swept under the rug, out of necessity, when teaching intro courses. The students think they’re starting at the beginning, and maybe junior faculty think the same thing, but they’re really starting in medias res.

For example, Michael Spivak’s Physics for Mathematicians makes explicit many of the implicit assumptions in a freshman mechanics class. Hardly anyone could learn physics if they had to start with Spivak. Instead, you do enough homework problems that you intuitively get a feel for things you can’t articulate and don’t fully understand. But it’s satisfying to read Spivak later and feel justified in thinking that things didn’t quite add up.

When you learn to read English, you’re told a lot of half-truths or quarter-truths. You’re told, for example, that English has 10 vowel sounds, when in reality it has more. Depending on how you count them, there are more than 20 vowel sounds in English. A child learning to read shouldn’t be burdened with a college-level course in phonetics, so it’s appropriate not to be too candid about the complexities of language at first.

It would have been easier for me to teach statistics when I was fresh out of college rather than teaching a few courses while I was working at MD Anderson. As a fresh graduate I could have taught out of a standard textbook in good conscience. By the time I did teach statistics classes, I was aware of how much material was not completely true or not practical.

I was thinking this morning about how there’s much more going on in a simple change of coordinates than is apparent at first. Tensor calculus is essentially the science of changing coordinates. It points out hidden structure, and creates conventions for making calculations manageable and for reducing errors. That’s not to say tensor calculus is easy but rather to say that changes of coordinates are hard.

Related post: Coming full circle


The factorial of a positive integer n is the product of the numbers from 1 up to and including n:

n! = 1 × 2 × 3 × … × n.

The superfactorial of n is the product of the factorials of the numbers from 1 up to and including n:

S(n) = 1! × 2! × 3! × … × n!.

For example,

S(5) = 1! 2! 3! 4! 5! = 1 × 2 × 6 × 24 × 120 = 34560.

Here are three examples of where superfactorial pops up.

Vandermonde determinant

If V is the n by n matrix whose ij entry is ij-1 then its determinant is S(n-1). For instance,

\begin{vmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 4 & 8 \\ 1 & 3 & 9 & 27 \\ 1 & 4 & 16& 64 \end{vmatrix} = S(3) = 3!\, 2!\, 1! = 12

V is an example of a Vandermonde matrix.

Permutation tensor

One way to define the permutation symbol uses superfactorial:

\epsilon_{i_1 i_2 \cdots i_n} = \frac{1}{S(n-1)} \prod_{1 \leq k < j \leq n} (i_j - i_k)

Barnes G-function

The Barnes G-function extends superfactorial to the complex plane analogously to how the gamma function extends factorial. For positive integers n,

G(n) = \prod_{k=1}^{n-1}\Gamma(k) = S(n-2)

Here’s plot of G(x)

produced by

    Plot[BarnesG[x], {x, -2, 4}]

in Mathematica.

More posts related to factorial

Symplectic Euler

This post will look at simple numerical approaches to solving the predator-prey (Lotka-Volterra) equations. It turns out that the simplest approach does poorly, but a slight variation does much better.

Following [1] we will use the equations

u‘ = u (v – 2)
v‘ = v (1 – u)

Here u represents the predator population over time and v represents the prey population. When the prey v increase, the predators u increase, leading to a decrease in prey, which leads to a decrease in predators, etc. The exact solutions are periodic.

Euler’s method replaces the derivatives with finite difference approximations to compute the solution in increments of time of size h. The explicit Euler method applied to our example gives

u(th) = u(t) + h u(t) (v(t) – 2)
v(th) = v(t) + h v(t) (1 – u(t)).

The implicit Euler method gives

u(th) = u(t) + h u(t + h) (v(t + h) – 2)
v(th) = v(t) + h v(t + h) (1 – u(t + h)).

This method is implicit because the unknowns, the value of the solution at the next time step, appear on both sides of the equation. This means we’d either need to do some hand calculations first, if possible, to solve for the solutions at time t + h, or use a root-finding method at each time step to solve for the solutions.

Implicit methods are more difficult to implement, but they can have better numerical properties. See this post on stiff equations for an example where implicit Euler is much more stable than explicit Euler. I won’t plot the implicit Euler solutions here, but the implicit Euler method doesn’t do much better than the explicit Euler method in this example.

It turns out that a better approach than either explicit Euler or implicit Euler in our example is a compromise between the two: use explicit Euler to advance one component and use implicit Euler on the other. This is known as symplectic Euler for reasons I won’t get into here but would like to discuss in a future post.

If we use explicit Euler on the predator equation but implicit Euler on the prey equation we have

u(th) = u(t) + h u(t) (v(t + h) – 2)
v(th) = v(t) + h v(t + h) (1 – u(t)).

Conversely, if we use implicit Euler on the predator equation but explicit Euler on the prey equation we have

u(th) = u(t) + h u(t + h) (v(t) – 2)
v(th) = v(t) + h v(t) (1 – u(t + h)).

Let’s see how explicit Euler compares to either of the symplectic Euler methods.

First some initial setup.

    import numpy as np

    h  = 0.08  # Step size
    u0 = 6     # Initial condition
    v0 = 2     # Initial condition
    N  = 100   # Numer of time steps

    u = np.empty(N)
    v = np.empty(N)
    u[0] = u0
    v[0] = v0

Now the explicit Euler solution can be computed as follows.

    for n in range(N-1):
        u[n+1] = u[n] + h*u[n]*(v[n] - 2)
        v[n+1] = v[n] + h*v[n]*(1 - u[n])

The two symplectic Euler solutions are

    for n in range(N-1):
        v[n+1] = v[n]/(1 + h*(u[n] - 1))
        u[n+1] = u[n] + h*u[n]*(v[n+1] - 2)


    for n in range(N-1):
        u[n+1] = u[n] / (1 - h*(v[n] - 2))
        v[n+1] = v[n] + h*v[n]*(1 - u[n+1])

Now let’s see what our solutions look like, plotting (u(t), v(t)). First explicit Euler applied to both components:

And now the two symplectic methods, applying explicit Euler to one component and implicit Euler to the other.

Next, let’s make the step size 10x smaller and the number of steps 10x larger.

Now the explicit Euler method does much better, though the solutions are still not quite periodic.

The symplectic method solutions hardly change. They just become a little smoother.

More differential equations posts

[1] Ernst Hairer, Christian Lubich, Gerhard Wanner. Geometric Numerical Integration: Structure-Preserving Algorithms for Ordinary Differential Equations.

Identifying someone from their heart beat

electrocardiogram of a toddler

How feasible would it be to identify someone based from electrocardiogram (EKG, ECG) data? (Apparently the abbreviation “EKG” is more common in America and “ECG” is more common in the UK.)

Electrocardiograms are unique, but unique doesn’t necessarily mean identifiable. Unique data isn’t identifiable without some way to map it to identities. If you shuffle a deck of cards, you will probably produce an arrangement that has never occurred before. But without some sort of registry mapping card deck orders to their shufflers, there’s no chance of identification. (For identification, you’re better off dusting the cards for fingerprints, because there are registries of fingerprints.)

According to one survey [1], researchers have tried a wide variety of methods for identifying people from electrocardiograms. They’ve used time-domain features such as peak amplitudes, slopes, variances, etc., as well as a variety of frequency-domain (DFT) features. It seems that all these methods work moderately well, but none are great, and there’s no consensus regarding which approach is best.

If you have two EKGs on someone, how readily can you tell that they belong to the same person? The answer depends on the size of the set of EKGs you’re comparing it to. The studies surveyed in [1] do some sort of similarity search, comparing a single EKG to tens of candidates. The methods surveyed had an overall success rate of around 95%. But these studies were based on small populations; at least at the time of publication no one had looked at matching an single EKG against thousands of possible matches.

In short, an electrocardiogram can identify someone with high probability once you know that they belong to a relatively small set of people for which you you have electrocardiograms.

More identification posts

[1] Antonio Fratini et al. Individual identification via electrocardiogram analysis. Biomed Eng Online. 2015; 14: 78. doi 10.1186/s12938-015-0072-y

Overestimating the number of idiots

A comment on one of my recent blog posts on Gray codes led me to an article by Mark Dominus Gray code at the pediatrician’s office, which led me to his article explaining why the pediatrician used what was apparently an unnecessarily sophisticated piece of equipment.

Mark segues from appreciating the pediatrician’s stadiometer purchase to appreciating source code that he initially thought was idiotic.

Time and time again people would send me perfectly idiotic code, and when I asked why they had done it that way the answer was not that they were idiots, but that there was some issue I had not appreciated, some problem they were trying to solve that was not apparent. … These appeared at first to be insane, but on investigation turned out to be sane but clumsy. … [A]ssume that bad technical decisions are made rationally, for reasons that are not apparent.

The last sentence deserves to be widely used. I’d suggest calling it Dominus’s law, but unfortunately Mark’s name ends in “s”, and that lowers the probability of a possessive form of his name catching on as an eponymous law. However, there is a Gauss’s law and a few other similar examples, so maybe the name will catch on.