Solving the Van der Pol equation with solve_ivp

Van der Pol’s differential equation is

{d^2x \over dt^2}-\mu(1-x^2){dx \over dt}+x= 0

The equation describes a system with nonlinear damping, the degree of nonlinearity given by μ. If μ = 0 the system is linear and undamped, but as μ increases the strength of the nonlinearity increases. We will plot the phase portrait for the solution to Van der Pol’s equation in Python using SciPy’s new ODE solver solve_ivp.

The function solve_ivp does not solve second-order equations directly. It solves systems of first-order equations, but a second-order differential equation can be recast as a pair of first-order equations by introducing the first derivative as a new variable.

\begin{align*} {dx \over dt} &= y \\ {dy \over dt}&= \mu(1-x^2)y -x \\ \end{align*}

Since y is the derivative of x, the phase portrait is just the plot of (x, y).

Phase portrait of the Van der Pol oscillator

If μ = 0, we have a simple harmonic oscillator and the phase portrait is simply a circle. For larger values of μ the solutions enter limit cycles, but these cycles are more complicated than circles. The limit cycles are periodic attractors: every non-trivial solution converges to the limit cycle.

Here’s the Python code that made the plot.

from numpy import linspace
from scipy.integrate import solve_ivp
import matplotlib.pyplot as plt

def vdp(t, z, mu):
    x, y = z
    return [y, mu*(1 - x**2)*y - x]

a, b = 0, 10

mus = [0, 1, 2]
styles = ["-", "--", ":"]
t = linspace(a, b, 500)

for mu, style in zip(mus, styles):
    sol = solve_ivp(vdp, [a, b], [1, 0], t_eval=t, args=(mu,))
    plt.plot(sol.y[0], sol.y[1], style)

# make a little extra horizontal room for the legend
plt.xlim([-3, 3])
plt.legend([rf"$\mu={m}$" for m in mus])
plt.gca().set_aspect(1)
plt.show()

To plot the solutions as a function of time, rather than plotting phase portraits, change the line

    plt.plot(sol.y[0], sol.y[1], style)

to

    plt.plot(sol.t, sol.y[0], style)

and comment out the line setting xlim. This gives the following plot.

Van der Pol oscillator solutions as a function of time

Exponential Christmas wreath

Today’s exponential sum looks sorta like a Christmas wreath with candles.

Exponential sum 2019-12-20

Let me back up and say what “today’s exponential sum” means. I have a page that plots a graph each day, where the function being graphed takes parameters from the date. It plots the partial sums of

\sum_{n=0}^N \exp\left( 2\pi i \left( \frac{n}{m} + \frac{n^2}{d} + \frac{n^3}{y} \right ) \right )

where m is the month, d is the day, and y is the last two digits of the year.
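
Here’s a minimal sketch of how such a plot could be generated with NumPy. The number of terms N is an assumption; the page may use a different cutoff.

    import numpy as np
    import matplotlib.pyplot as plt

    # date parameters for 2019-12-20
    m, d, y = 12, 20, 19
    N = 1000  # number of terms; an assumption

    n = np.arange(N + 1)
    terms = np.exp(2j * np.pi * (n/m + n**2/d + n**3/y))
    sums = np.cumsum(terms)  # partial sums of the exponential sum

    plt.plot(sums.real, sums.imag)
    plt.gca().set_aspect(1)
    plt.show()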

Sometimes by coincidence the plots happen to look like something associated with the date, such as wreaths around Christmas. The plot for Christmas Eve this year looks like a branch of holly, and there’s a nice wreath on New Year’s Eve.

As I blogged about before, the images on Easter sometimes look like crosses. The image for Easter 2018 looked like an Iron Cross, and the image for Easter 2019 was a much more ornate cross. But the image for Easter 2020 looks nothing like a cross.

Convergence rate of random walk on integers mod p

Consider a random walk on the integers mod p. At each step, you either take a step forward or a step back, with equal probability. After just a few steps, you’ll have to be near where you started. After a few more steps, you could be far from your starting point, but you’re probably not. But eventually, every point is essentially equally likely.

How many steps would you have to take before your location is uniformly distributed? There’s a theorem that says you need about p² steps. We’ll give the precise statement of the theorem shortly, but first let’s do a simulation.

We’ll take random walks on the integers mod 25. You could think of this as a walk on a 24-hour clock, with one extra hour squeezed in. Suppose we want to take N steps. We would generate a +1 or -1 at random, add that to our current position, and reduce mod 25, doing that N times. That would give us one outcome of a random walk with N steps. We could do this over and over, keeping track of where each random walk ended up, to get an idea how the end of the walk is distributed.

We will do an optimization to speed up the simulation. Suppose we generated all our +1’s and -1’s at once. Then we just need to add up the number of +1’s and subtract the number of -1’s. The number n of +1’s is a binomial(N, 1/2) random variable, and the number of +1’s minus the number of -1’s is 2n - N. So our final position will be

(2n - N) mod 25.

Here’s our simulation code.

    from numpy import zeros
    from numpy.random import binomial
    import matplotlib.pyplot as plt

    p = 25          # modulus
    N = p*p//2      # number of steps in each walk
    reps = 100000   # number of walks to simulate

    count = zeros(p)

    for _ in range(reps):
        # final position: (# of +1's) - (# of -1's) = 2n - N, reduced mod p
        n = 2*binomial(N, 0.5) - N
        count[n%p] += 1

    plt.bar(range(p), count/reps)
    plt.show()

After doing half of p² steps the distribution is definitely not uniform.

Distribution after p^2/2 steps

But after p² steps the distribution is close to uniform, close enough that you start to wonder how much of the remaining non-uniformity is due to simulation error.

Distribution after p^2 steps

Now here’s the theorem I promised, taken from Group Representations in Probability and Statistics by Persi Diaconis.

For n ≥ p² with p odd and greater than 7, the maximum difference between the random walk density after n steps and the uniform density is no greater than exp(- π²n / 2p²).

There is no magic point at which the distribution suddenly becomes uniform. On the other hand, the difference between the random walk density and the uniform density does drop sharply at around p² steps. Between p²/2 steps and p² steps the difference drops by almost a factor of 12.
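
The size of that drop can be read straight off the bound; here’s a quick check of the arithmetic:

    from math import exp, pi

    p = 25
    bound = lambda n: exp(-pi**2 * n / (2*p**2))
    print(bound(p**2/2) / bound(p**2))  # about 11.8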

Stability criteria in a nutshell

A continuous-time linear system is stable if and only if all its eigenvalues have negative real parts.

There are various other stability criteria, but they boil down to the statement above. For example, there is the Routh stability criterion and the Hurwitz stability criterion. There’s also a continued fraction criterion.

But these criteria are just algorithms for determining whether the roots of the characteristic polynomial all have negative real parts; they’re not independent criteria. They were more important in the days of hand calculation; now it’s easy enough to simply compute the roots of the characteristic polynomial numerically.

A discrete-time linear system is stable if and only if all its eigenvalues are inside the unit circle. The change of variables

w = (z – 1) / (z + 1)

maps the interior of the unit disk to the open left half plane, and so the roots of the characteristic polynomial in z are inside the unit circle if and only if the roots of the corresponding function of w are in the left half plane.
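
Here’s a minimal sketch of checking both criteria numerically with NumPy; the system matrix is a made-up example:

    import numpy as np

    A = np.array([[0, 1], [-2, -3]])    # example system matrix; eigenvalues -1, -2
    eigenvalues = np.linalg.eigvals(A)

    # continuous-time: stable iff every eigenvalue has negative real part
    print(np.all(eigenvalues.real < 0))     # True

    # discrete-time: stable iff every eigenvalue lies inside the unit circle
    print(np.all(np.abs(eigenvalues) < 1))  # False, since |-2| > 1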

Related post: Cayley-Hamilton theorem

Random sample overlap

Here’s a simple probability problem with an answer you may find surprising. (The statement of the problem and solution are simple. The proof is not as simple.)

Suppose you draw 1,000 serial numbers at random from a set of 1,000,000. Then you make another random sample of 1,000. How likely is it that no numbers will be the same on both lists?

To make the problem slightly more general, take two samples of size n from a population of size n² where n is large.

The probability of no overlap in the two samples is approximately 1/e. That is, in the limit as n approaches infinity, the probability converges to 1/e.

For n = 1,000 the approximation is good to three figures.
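
Here’s a quick Monte Carlo check of that claim; the number of repetitions is arbitrary.

    import numpy as np

    rng = np.random.default_rng()
    reps = 1000
    count = 0

    for _ in range(reps):
        # two independent samples of 1,000 drawn without replacement from 1,000,000
        a = rng.choice(1_000_000, size=1_000, replace=False)
        b = rng.choice(1_000_000, size=1_000, replace=False)
        if len(np.intersect1d(a, b)) == 0:
            count += 1

    print(count/reps, np.exp(-1))  # both should be near 0.368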

Proof

There are Binomial(n², n) ways to choose a sample of size n from a population of size n². After we’ve drawn the first sample, there are Binomial(n² – n, n) ways to draw a new sample that doesn’t overlap with the first one. The probability of such a sample is

\begin{align*} \left.\binom{n^2 - n}{n} \middle/ \binom{n^2}{n}\right. &= \frac{(n^2 - n)!}{(n^2 - 2n)!} \frac{(n^2 - n)!}{(n^2)!} \\ & = \frac{(n^2 -n)(n^2 - n -1) \cdots (n^2 - 2n+1)}{n^2 (n^2-1) \cdots (n^2 - n+1)} \\ &= \left( \frac{n^2 - n}{n^2} \right) \left( \frac{n^2 - n - 1}{n^2 -1} \right) \cdots \left( \frac{n^2 - 2n + 1}{n^2 - n+1} \right) \\ \end{align*}

The fractions on the last line are in decreasing order, and so their product is less than the first one raised to the nth power, and greater than the last one raised to the nth power.

By taking logs one can show that

\lim_{n\to\infty} \left( \frac{n^2 - n}{n^2} \right)^n = \lim_{n\to\infty} \left( \frac{n^2 - 2n + 1}{n^2 - n+1} \right)^n = \frac{1}{e}

Since upper and lower bounds on the probability converge to 1/e, the probability converges to 1/e.

Comments

This problem is reminiscent of the birthday problem, but it’s a little different because the samples are drawn without replacement.

Incidentally, I didn’t make up this problem out of thin air. It came up naturally in the course of a consulting project this week.

Benefits of redundant coordinates

Since you can describe a point in the plane with two numbers, why would you choose to use three numbers? Why would you ever want to use a coordinate system with more coordinates than necessary?

Barycentric coordinates

One way to indicate the location of a point inside a triangle is to give the weights that express the point as a weighted average of the three vertices. These three weights are called barycentric coordinates. Why would you use three numbers when two would do?

Barycentric coordinates make some things much simpler. For example, the coordinates of the three vertices are (1, 0, 0), (0, 1, 0), and (0, 0, 1) for any triangle. The points inside are written as convex combinations of the vertices. The coordinates of the center of mass, the barycenter, are (1/3, 1/3, 1/3). The vertices are treated symmetrically, even if the triangle is far from symmetric.
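
A minimal sketch of the arithmetic, using an arbitrary example triangle:

    import numpy as np

    # vertices of an example triangle; any triangle would do
    vertices = np.array([[0.0, 0.0], [4.0, 0.0], [1.0, 3.0]])

    def barycentric_to_cartesian(w, vertices):
        # a point is the convex combination of the vertices with weights w
        return np.asarray(w) @ vertices

    # the barycenter, (1/3, 1/3, 1/3), is the average of the vertices
    print(barycentric_to_cartesian([1/3, 1/3, 1/3], vertices))  # about (1.67, 1)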

Barycentric coordinates are useful in applications, such as computer graphics and finite element analysis, because they are relative coordinates. When a triangle moves or is rescaled, you only need to keep track of where the vertices went; the coordinates of the points inside relative to the vertices haven’t changed.

This can be generalized to more dimensions. For example, you could describe a point in a tetrahedron with four coordinates; more generally, you could describe a point in an n-simplex by the convex combination coefficients of its n + 1 vertices.

Barycentric coordinates are related to Dirichlet probability distributions. When you have n probabilities that sum to 1, you’ve got n-1 degrees of freedom. But it often simplifies things to work with n variables. As with the discussion of triangles above, the extra variable makes expressions more symmetric.

Quaternions and rotations

A point in three-dimensional space can be described with three numbers, but it’s often useful to think of the usual three coordinates as the vector part of a quaternion, a set of four numbers.

Suppose you have a point

a = (x, y, z)

and you want to rotate it by an angle θ around an axis given by a unit vector

b = (u, v, w).

You can compute the rotation by associating the point a with the quaternion

p = (0, x, y, z)

and the axis b with the quaternion

q = (cos(θ/2), sin(θ/2) u, sin(θ/2) v, sin(θ/2) w)

The image of a is then given by the quaternion

q p q⁻¹.

This quaternion will have zero real part, and so the Euclidean coordinates are given by the vector part, the last three components.

Of course the product above is a quaternion product, which is not commutative. That’s why the q and the q⁻¹ don’t cancel out.
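
Here’s a minimal sketch of the computation, representing a quaternion as an array of four numbers:

    import numpy as np

    def quat_mult(q1, q2):
        # Hamilton product of quaternions stored as (real, i, j, k)
        a1, b1, c1, d1 = q1
        a2, b2, c2, d2 = q2
        return np.array([
            a1*a2 - b1*b2 - c1*c2 - d1*d2,
            a1*b2 + b1*a2 + c1*d2 - d1*c2,
            a1*c2 - b1*d2 + c1*a2 + d1*b2,
            a1*d2 + b1*c2 - c1*b2 + d1*a2])

    def rotate(point, axis, theta):
        # rotate a 3D point by angle theta about a unit vector axis via q p q⁻¹
        p = np.array([0.0, *point])
        q = np.array([np.cos(theta/2), *(np.sin(theta/2)*np.asarray(axis))])
        q_inv = q * np.array([1, -1, -1, -1])  # conjugate = inverse for a unit quaternion
        return quat_mult(quat_mult(q, p), q_inv)[1:]  # vector part

    # a quarter turn of (1, 0, 0) about the z-axis gives approximately (0, 1, 0)
    print(rotate([1, 0, 0], [0, 0, 1], np.pi/2))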

Using quaternions for rotations has several advantages over using rotation matrices. First, the quaternion representation is more compact, describing a rotation with four real numbers rather than nine. Second, the quaternion calculation can be better behaved numerically. But most importantly, the quaternion approach avoids gimbal lock, a sort of singularity that arises when rotations are represented by three angles, as with Euler angles.

Projective planes

In applications of algebra, such as elliptic curve cryptography, you often need to add “points at infinity” to make things work out. To formalize this, you add an extra coordinate. So while an elliptic curve is usually presented as an equation such as

y² = x³ + ax + b,

it’s more formally an equation in three variables

y²z = x³ + axz² + bz³.

Points in the projective plane have coordinates (x, y, z) where points are considered equivalent if they differ by a non-zero multiple, i.e. (x, y, z) is considered the same point as (λx, λy, λz) for any non-zero λ.

You can often ignore the z, choosing λ so that the z coordinate is 1. But when you need to work with the point at infinity in a uniform way, you bring out the full coordinates. Now the “point at infinity” is not some mysterious entity, but simply the point (0, 1, 0): setting z = 0 in the homogeneous equation forces x³ = 0, so up to scaling the only point at infinity on the curve is (0, 1, 0).
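
A small sketch makes this concrete; the curve coefficients below are arbitrary:

    def on_curve(point, a, b):
        # test whether projective (x, y, z) satisfies y²z = x³ + axz² + bz³
        x, y, z = point
        return y*y*z == x**3 + a*x*z*z + b*z**3

    # the point at infinity lies on the curve for any choice of a and b
    print(on_curve((0, 1, 0), a=-1, b=1))  # True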

Common themes

Projective coordinates, like barycentric coordinates, introduce symmetry. With the addition of an extra coordinate, the three coordinates all behave similarly, with no reason to distinguish any coordinate as special. And as with quaternion rotations, projective coordinates make singularities go away, which is a consequence of symmetry.

Determining fundamental frequency

My daughter had a homework problem the other day that gave the frequencies of several Fourier components and asked her to find the fundamental frequency. The numbers were nice enough that brute force worked, and I’m sure that’s what students were expected to do. But this could easily be a much more sophisticated problem.

If the frequencies are all integers and exact multiples of a fundamental frequency, you can simply take the greatest common divisor of the frequencies. If you’re told the frequencies are 1760, 2200, and 3080, then the fundamental frequency is apparently 440 since that’s the greatest common divisor.
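
A one-liner confirms the arithmetic:

    from math import gcd
    from functools import reduce

    print(reduce(gcd, [1760, 2200, 3080]))  # 440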

But what if the data are a little different? Say the highest pitch is 3081. Surely 440 should still be considered the fundamental frequency, even though now the greatest common divisor of the frequencies would be 1 Hz. What if the highest frequency was 3078 + π? Surely the fundamental frequency is still 440 for practical purposes.

And what might these practical purposes be? One purpose might be pitch detection. When several frequencies are combined that are small integer multiples of a fundamental frequency, we perceive the combination as having pitch given by that fundamental.

For something like a guitar string, the frequency components are close to small integer multiples of a fundamental frequency. But for something like a church bell, the frequencies don’t line up so neatly, though there’s still a clearly perceived pitch. For something like a metal mixing bowl, it may be difficult to predict what pitch a person will hear when something strikes the bowl.

One complication we haven’t addressed yet is that the fundamental frequency will not be unique without some constraint. In the example above, the frequencies were all multiples of 440, but they’re also all multiples of 440/n for every positive integer n. We might get around this by specifying some lower bound on the fundamental frequency. Or we could say that all other things being equal, we want the largest candidate for the fundamental frequency.

We could formulate the problem of finding the fundamental frequency as an optimization problem. For example, we could form a mixed integer program. Suppose we have three frequencies f₁, f₂, and f₃. We could find a fundamental frequency f and integers n₁, n₂, and n₃ that minimize

(f₁ - n₁f)² + (f₂ - n₂f)² + (f₃ - n₃f)²

subject to a lower bound on f.

We can eliminate the explicit dependence on the integer coefficients by minimizing

(f₁/f - [f₁/f])² + (f₂/f - [f₂/f])² + (f₃/f - [f₃/f])².

where [x] denotes the nearest integer to x. The first formulation has a more conventional form. The latter has a more complicated objective function, but it’s a function of only one variable.
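
A minimal sketch of how one might plot the latter objective with NumPy follows; the search range for f is an assumption.

    import numpy as np
    import matplotlib.pyplot as plt

    freqs = np.array([1760, 2200, 3080])

    def objective(f):
        # squared distance of each ratio f_i/f from its nearest integer
        r = freqs / f
        return np.sum((r - np.round(r))**2)

    fs = np.linspace(100, 600, 2000)  # assumed search range for f
    plt.plot(fs, [objective(f) for f in fs])
    plt.xlabel("f (Hz)")
    plt.show()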

Here’s what the latter looks like for frequencies 1760, 2200, and 3080.

objective function

Clearly there’s a minimum at 440 Hz.

Here’s the same plot with 10% random noise [1] added to each frequency: 1701, 2368, and 3339.

objective function

Now there’s a minimum near 336, but the local minimum at 566 is nearly as good.

[1] There are a couple of reasons you might want to solve a problem like this. One is that your frequencies really are integer multiples of a fundamental frequency, but there is measurement error. Another is that the frequencies are not exactly multiples of a fundamental, as when striking a bell or a mixing bowl. How might you formulate the two cases differently?

Illustrating Cayley-Hamilton with Python

If you take a square matrix M, subtract x from the elements on the diagonal, and take the determinant, you get a polynomial in x called the characteristic polynomial of M. For example, let

M = \left[ \begin{matrix} 5 & -2 \\ 1 & \phantom{-}2 \end{matrix} \right]

Then

\left| \begin{matrix} 5-x & -2 \\ 1 & 2-x \end{matrix} \right| = x^2 - 7x + 12

The characteristic equation is the equation that sets the characteristic polynomial to zero. The roots of this polynomial are the eigenvalues of the matrix.

The Cayley-Hamilton theorem says that if you take the original matrix and stick it into the polynomial, you’ll get the zero matrix.

\left[ \begin{matrix} 5 & -2 \\ 1 & \phantom{-}2 \end{matrix} \right]^2 - 7\left[ \begin{matrix} 5 & -2 \\ 1 & \phantom{-}2 \end{matrix} \right] + 12\left[ \begin{matrix} 1 & 0 \\ 0 & 1\end{matrix} \right] = \left[ \begin{matrix} 0 & 0 \\ 0 & 0\end{matrix} \right]

In brief, a matrix satisfies its own characteristic equation. Note that for this to hold we interpret constants, like 12 and 0, as corresponding multiples of the identity matrix.
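
You can check this example directly with a little NumPy:

>>> import numpy as np
>>> M = np.array([[5, -2], [1, 2]])
>>> M @ M - 7*M + 12*np.eye(2)
array([[0., 0.],
       [0., 0.]])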

You could also verify the Cayley-Hamilton theorem in Python using scipy.linalg.funm to compute a polynomial function of a matrix.

>>> from numpy import array
>>> from scipy.linalg import funm
>>> m = array([[5, -2], [1, 2]])
>>> funm(m, lambda x: x**2 - 7*x + 12)

This returns a zero matrix.

I imagine funm is factoring M into something like PDP⁻¹ where D is a diagonal matrix. Then

f(M) = P f(D) P⁻¹.

This is because f can be applied to a diagonal matrix by simply applying f to each diagonal entry independently. You could use this to prove the Cayley-Hamilton theorem for diagonalizable matrices.
