ODE to Fisher’s transform

I was calculating a correlation coefficient this afternoon and ran into something interesting.

Suppose you have two uncorrelated random variables X and Y. If you draw, say, a thousand samples each from X and Y and compute Pearson’s correlation coefficient, you almost certainly will not get 0, though you very likely will get a small number.

How do you find a confidence interval around a correlation coefficient?

Sample correlation coefficient values do not follow a normal distribution, though the distribution is approximately normal if the population correlation is small. The distribution gets further from normal as the correlation gets close to 1 or −1.

Enter Fisher’s transformation. If you run the sample correlation coefficient r through the function

½ log((1 + r)/(1 − r)) = arctanh(r)

you get something that has a distribution closer to the normal distribution. You find a confidence interval for the transformed variable, then undo the transformation.
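Here's a minimal sketch in Python of the procedure, assuming the standard 1/√(n − 3) approximation for the standard error of the transformed coefficient; the function and variable names are my own.

import numpy as np
from scipy import stats

def correlation_ci(r, n, conf=0.95):
    """Confidence interval for a correlation r computed from n samples."""
    z = np.arctanh(r)                      # Fisher transform
    se = 1 / np.sqrt(n - 3)                # approximate standard error of z
    h = stats.norm.ppf(0.5 + conf/2) * se
    return np.tanh(z - h), np.tanh(z + h)  # undo the transformation

# e.g. a small r from 1000 samples of uncorrelated variables
print(correlation_ci(0.03, 1000))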

Now where did the Fisher transform come from?

I don’t know whether this was Fisher’s derivation, but Hotelling came up with the following derivation. Assume you apply a transform G(r) to the correlation coefficient. Write an asymptotic expansion for the skewness κ3 and set the first term equal to zero. This leads to the ordinary differential equation

3(1 − r²) G″(r) − 6r G′(r) = 0

which has the solution G(r) = arctanh(r).
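Here's a quick sympy check, not part of Hotelling's derivation, that arctanh satisfies the equation above.

from sympy import symbols, atanh, diff, simplify

r = symbols('r')
G = atanh(r)
ode = 3*(1 - r**2)*diff(G, r, 2) - 6*r*diff(G, r)
print(simplify(ode))   # prints 0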

I found this interesting because I’ve worked with differential equations and with statistics, but I’ve rarely seen them overlap.

 

Differential equation on a doughnut

Here’s a differential equation from [1] that’s interesting to play with.

\begin{align*} x^\prime &= -my + nxz \\ y^\prime &= \phantom{-} mx + nyz \\ z^\prime &= (n/2)(1 + z^2 - x^2 - y^2) \\ \end{align*}

Even though it’s a nonlinear system, it has a closed-form solution, namely

\begin{align*} x(t) &= \frac{2a\cos(mt) - 2b \sin(mt)}{\Delta - 2c \sin(n t) + (2 - \Delta) \cos(n t)} \\ y(t) &= \frac{2a \sin(mt) + 2b \cos(mt)} {\Delta - 2 c \sin(n t) + (2 - \Delta) \cos(n t)} \\ z(t) &= \frac{2c \cos(nt) + (2 - \Delta) \sin(nt)}{\Delta - 2c \sin(nt) + (2 - \Delta) \cos(n t)} \\ \end{align*}

where (a, b, c) is the solution at t = 0 and Δ = 1 + a² + b² + c².

The solutions lie on a torus (doughnut). If m and n are coprime integers then the solutions form closed loops. If the ratio m/n is not rational then the solutions are dense on the torus.

Here’s an example with parameters a = 1, b = 1, c = 3, m = 3, and n = 5.

And now with parameters a = 1, b = 1, c = 0.3, m = 4, and n = 5.

And finally with parameters a = 1, b = 1, c = 0.3, m = π, and n = 5.
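Here's a sketch of the kind of code that generates these plots, evaluating the closed-form solution above; the parameters shown are those of the first example.

import numpy as np
import matplotlib.pyplot as plt

def solution(t, a, b, c, m, n):
    # closed-form solution from above; D stands for Δ
    D = 1 + a**2 + b**2 + c**2
    den = D - 2*c*np.sin(n*t) + (2 - D)*np.cos(n*t)
    x = (2*a*np.cos(m*t) - 2*b*np.sin(m*t)) / den
    y = (2*a*np.sin(m*t) + 2*b*np.cos(m*t)) / den
    z = (2*c*np.cos(n*t) + (2 - D)*np.sin(n*t)) / den
    return x, y, z

t = np.linspace(0, 2*np.pi, 2000)
ax = plt.figure().add_subplot(projection='3d')
ax.plot(*solution(t, a=1, b=1, c=3, m=3, n=5))
plt.show()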

Note that when m = 2 and n = 3 the trajectory traces out a trefoil knot.


[1] Richard Parris. A Three-Dimensional System with Knotted Trajectories. The American Mathematical Monthly, Vol. 84, No. 6 (Jun. – Jul., 1977), pp. 468–469

The biggest math symbol

The biggest math symbol that I can think of is the Riemann P-symbol

w(z)=P \left\{ \begin{matrix} a & b & c & \; \\ \alpha & \beta & \gamma & z \\ \alpha^\prime & \beta^\prime & \gamma^\prime & \; \end{matrix} \right\}

The symbol is also known as the Papperitz symbol because Erwin Papperitz invented the symbol for expressing solutions to Bernhard Riemann’s differential equation.

Before writing out Riemann’s differential equation, we note that the equation has regular singular points at a, b, and c. In fact, that is its defining feature: it is the most general linear second order differential equation with three regular singular points. The parameters a, b, and c enter the equation as roots of expressions in denominators; that’s as it has to be if these are the singular points.

The way the Greek letter parameters enter Riemann’s equation is more complicated, but there is a good reason for the complication: the notation makes solutions transform as simply as possible under a bilinear transformation. This is important because Möbius transformations are the conformal automorphisms of the Riemann sphere.

To be specific, let

m(z) = \frac{Az + B}{Cz + D}

be a Möbius transformation. Then

P \left\{ \begin{matrix} a & b & c & \; \\ \alpha & \beta & \gamma & z \\ \alpha^\prime & \beta^\prime & \gamma^\prime & \; \end{matrix} \right\} = P \left\{ \begin{matrix} m(a) & m(b) & m(c) & \; \\ \alpha & \beta & \gamma & m(z) \\ \alpha^\prime & \beta^\prime & \gamma^\prime & \; \end{matrix} \right\}

Since the parameters on the top row of the P-symbol are the locations of singularities, when you transform a solution, moving the singularities, the new parameters have to be the new locations of the singularities. And importantly the rest of the parameters do not change.

Now with the motivation aside, we’ll write out Riemann’s differential equation in all its glory.

\frac{d^2w}{dz^2} + p(z) \frac{dw}{dz} + q(z) \frac{w}{(z-a)(z-b)(z-c)} = 0

where

p(z) = \frac{1-\alpha-\alpha^\prime}{z-a} + \frac{1-\beta-\beta^\prime}{z-b} + \frac{1-\gamma-\gamma^\prime}{z-c}

and

q(z) = \frac{\alpha\alpha^\prime (a-b)(a-c)} {z-a} +\frac{\beta\beta^\prime (b-c)(b-a)} {z-b} +\frac{\gamma\gamma^\prime (c-a)(c-b)} {z-c}
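Here's a sympy sketch, my own construction rather than anything from the original sources, that assembles the equation from the P-symbol parameters.

from sympy import symbols, Function, Eq

z, a, b, c = symbols('z a b c')
al, alp, be, bep, ga, gap = symbols('alpha alphap beta betap gamma gammap')
w = Function('w')

# p(z) and q(z) as defined above; alp stands for alpha prime, etc.
p = (1 - al - alp)/(z - a) + (1 - be - bep)/(z - b) + (1 - ga - gap)/(z - c)
q = (al*alp*(a - b)*(a - c)/(z - a)
     + be*bep*(b - c)*(b - a)/(z - b)
     + ga*gap*(c - a)*(c - b)/(z - c))

riemann = Eq(w(z).diff(z, 2) + p*w(z).diff(z)
             + q*w(z)/((z - a)*(z - b)*(z - c)), 0)
print(riemann)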


Eigenvalues of the Laplacian on a square

What are the solutions to the equation

uxx + uyy + λu = 0

on the unit square with the requirement that u(x, y) = 0 on the boundary?

It’s easy to see that the functions

u(x, y) = sin(mπx) sin(nπy)

are solutions with

λ = (m² + n²)π²

for positive integers m and n. It’s not so obvious that these are the only solutions, but we’ll take that on faith.

The previous post looked at Pólya’s bounds on the eigenvalues of the Laplace operator Δ for a general region D, with only the requirement that copies of D can tile the plane without overlapping. Surely squares satisfy this requirement, so the problem in this post is a special case of the problem in the previous post. So how do they compare?

Pólya gives lower bounds on the kth eigenvalue, so how do we order the numbers of the form (m² + n²)π²?

There’s a theorem for counting the number of numbers n up to x where n is the sum of two squares, the Landau-Ramanujan theorem. But it seems to contradict Pólya’s bounds. That’s because Landau-Ramanujan only counts each n once, but eigenvalues need to be counted multiple times if a number can be written as a sum of squares multiple ways.

For example, 25π² should count as an eigenvalue twice, corresponding to (m, n) = (3, 4) and (4, 3).

About how large is the kth eigenvalue? If positive integers m and n satisfy

(m² + n²)π² ≤ x

then (m, n) is inside a circle of radius √x/π. For large x, the number of such pairs is approximately the area of the quarter of the disk of radius √x/π lying in the first quadrant, which is x / 4π. So the kth eigenvalue is approximately 4πk, which matches Pólya’s lower bound of 4πk for a region of area 1.
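Here's a quick numerical check of that estimate, comparing the kth eigenvalue to 4πk; the cutoff N is an arbitrary choice of mine, large enough to cover the eigenvalues inspected.

import numpy as np

N = 200
# eigenvalues (m² + n²)π² with multiplicity, for positive integers m, n
eigs = sorted((m*m + n*n) * np.pi**2
              for m in range(1, N) for n in range(1, N)
              if m*m + n*n < N*N)   # keep a full quarter disk

for k in [10, 100, 1000, 10000]:
    print(k, eigs[k-1], 4*np.pi*k)   # kth eigenvalue vs. 4πk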

The sound of drums that tile the plane

The vibration of a thin membrane is often modeled by the PDE

Δu + λu = 0

where u is the height of the membrane and Δ is the Laplacian. Solutions only exist for certain values of λ, the eigenvalues of Δ. You could think of u as giving the height of a vibrating drum head and the λs as frequencies of vibration.

The λs depend on the shape of the drum head D and the boundary conditions. If we clamp down the drumhead on the rim, i.e. specify that u equals 0 on the boundary ∂D, then we call this Dirichlet boundary conditions. If the drumhead is free to vibrate, i.e. we do not specify the height on ∂D, but we do specify that the membrane is flat on ∂D, i.e. that the normal derivative ∂u/∂n equals 0, then we call this Neumann boundary conditions.

George Pólya [1] gives lower bounds on the λs under Dirichlet boundary conditions and upper bounds on the λs under Neumann boundary conditions. His theorems require that D be bounded and that it is possible to tile the plane with congruent copies of D. For example, D could be a rectangle. Or it could have curved sides, like a figure in an Escher drawing.

Let A be the area of D. Under Dirichlet boundary conditions the kth eigenvalue is bounded below by

λk ≥ 4πk / A.

Under Neumann boundary conditions, the kth eigenvalue is bounded above by

λk ≤ 4π(k − 1) / A.

Update: See the next post for how the theorem in this post compares to the special case of a square.


[1] G. Pólya. On the Eigenvalues of Vibrating Membranes. Proceedings of the London Mathematical Society, Volume s3-11, Issue 1, 1961, Pages 419–433.

My favorite paper: H = W

A paper came out in 1964 with the title “H = W.” The remarkably short title was not cryptic, however. The people for whom the paper was written knew exactly what it meant. There were two families of function spaces, one denoted with H and another denoted with W, that were speculated to be the same, and this paper proved that indeed they were.

This post will explain the significance of the paper. It will be more advanced and less self-contained than most of my posts. If you’re still interested, read on.

Definitions

The derivative of a function might exist in a generalized sense when it doesn’t exist in the classical sense. I give an introduction to this idea in the post How to differentiate a non-differentiable function. The generalized derivative is a linear functional on test functions [1] and may not correspond to a classical function. The delta “function” δ(x) is the classic example.

A regular distribution is a distribution whose action on a test function is equal to multiplying the test function by a locally integrable function and integrating.

Given Ω ⊂ ℝn, the Sobolev space Wk,p(Ω) consists of functions whose partial derivatives of order up to k are regular distributions that lie in the space Lp(Ω).

For example, let I be the interval (−1, 1) and let f(x) = |x|. The function f is not differentiable at 0, but the generalized derivative of f is the sign function sgn(x), which is in Lp(I) for all p. The generalized derivative of sgn(x) is 2δ(x), which is not a regular distribution [2], and so f ∈ W1,p(I) but f ∉ W2,p(I).
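Here's a numerical illustration, my own and not from the paper, that sgn is the weak derivative of |x|: integration by parts says ∫ |x| φ′ dx = −∫ sgn(x) φ(x) dx. The φ below is a smooth function vanishing at ±1, standing in for a test function.

import numpy as np
from scipy.integrate import quad

phi  = lambda x: (1 - x**2)**2 * np.exp(x)                  # phi(±1) = 0
dphi = lambda x: (1 - x**2) * (1 - x**2 - 4*x) * np.exp(x)  # its derivative

lhs = quad(lambda x: abs(x) * dphi(x), -1, 1)[0]
rhs = -quad(lambda x: np.sign(x) * phi(x), -1, 1)[0]
print(lhs, rhs)   # the two integrals agree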

The norm on Wk,p(Ω) is the sum of the Lp norms of the function and each of its partial derivatives up to order k.

The Sobolev space Hk,p(Ω) is the closure of the set of smooth functions with finite Wk,p norm, taken in the norm of the space Wk,p(Ω).

Theorem

It’s not obvious a priori that these two ways of defining a Sobolev space are equivalent, but Norman George Meyers and James Serrin [3] proved in 1964 that for all domains Ω, all non-negative integers k, and all 1 ≤ p < ∞ we have

Hk,p(Ω) = Wk,p(Ω).

The proof is remarkably brief, less than a page.

Significance

Why does this theorem matter? Sobolev spaces are the machinery of the modern theory of differential equations. I spent a lot of time in my mid 20s working with Sobolev spaces.

The grand strategy of PDE research is to first search for generalized solutions to an equation, solutions belonging to a Sobolev space, then if possible prove that the generalized solution is in fact a classical solution.

This is analogous to first proving an algebraic equation has a complex solution, then proving that the complex solution is real; or proving that an equation has a real number solution, then proving that the real solution is in fact an integer. It’s easier to first find a solution in a larger space, then if possible show that the thing you found belongs to a smaller space.


[1] A test function in this context is an infinitely differentiable function of compact support. In other contexts a test function is not required to have compact support but is required to approach zero rapidly, faster than the reciprocal of any polynomial.

[2] The classical derivative of sgn(x) is equal to zero almost everywhere. But the derivative as a distribution is not zero. The pointwise derivative may not equal the generalized derivative.

[3] Meyers, Norman G.; Serrin, James (1964). “H = W”. Proceedings of the National Academy of Sciences, 51 (6): 1055–1056.

Fredholm index

The previous post on kernels and cokernels mentioned that for a linear operator T: V → W, the index of T is defined as the difference between the dimension of its kernel and the dimension of its cokernel:

index T = dim ker T − dim coker T.

The index was first called the Fredholm index because it came up in Fredholm’s investigation of integral equations. (More on this work in the next post.)

Robustness

The index of a linear operator is robust in the following sense. If V and W are Banach spaces and T: V → W is a continuous linear operator, then there is an open set around T in the space of continuous operators from V to W on which the index is constant. In other words, small changes to T don’t change its index.

Small changes to T may alter the dimension of the kernel or the dimension of the cokernel, but they don’t alter their difference.
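A finite-dimensional toy example, mine and not from the post, makes this concrete: for an m × n matrix the index is always n − m by the rank-nullity theorem, so a perturbation that changes the kernel and cokernel dimensions leaves their difference alone.

import numpy as np
from numpy.linalg import matrix_rank

def index(T):
    m, n = T.shape                # T maps R^n to R^m
    r = matrix_rank(T)
    return (n - r) - (m - r)      # dim ker - dim coker

T = np.zeros((3, 5))                   # dim ker = 5, dim coker = 3
S = T + 1e-6 * np.random.rand(3, 5)    # dim ker = 2, dim coker = 0
print(index(T), index(S))              # both print 2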

Relation to Fredholm alternative

The next post discusses the Fredholm alternative theorem. It says that if K is a compact linear operator on a Hilbert space and I is the identity operator, then the Fredholm index of I − K is zero. The post will explain how this relates to solving linear (integral) equations.

Analogy to Euler characteristic

We can make an exact sequence with the spaces V and W and the kernel and cokernel of T as follows:

0 → ker T → V → W → coker T → 0

All this means is that the image of one map is the kernel of the next.

We can take the alternating sum of the dimensions of the spaces in this sequence:

dim ker T − dim V + dim W − dim coker T.

If V and W have the same finite dimension, then this alternating sum equals the index of T.

The Euler characteristic is also an alternating sum. For a polyhedron, the Euler characteristic is defined by

V − E + F

where V is the number of vertices, E the number of edges, and F the number of faces. We can extend this to higher dimensions as the number of zero-dimensional objects (vertices), minus the number of one-dimensional objects (edges), plus the number of two-dimensional objects, minus the number of three-dimensional objects, etc.

A more sophisticated definition of Euler characteristic is the alternating sum of the dimensions of cohomology spaces. These also form an exact sequence.

The Atiyah-Singer index theorem says that for elliptic operators on manifolds, two kinds of index are equal: the analytical index and the topological index. The analytical index is essentially the Fredholm index. The topological index is derived from topological information about the manifold.

This is analogous to the Gauss-Bonnet theorem that says you can find the Euler characteristic, a topological invariant, by integrating Gauss curvature, an analytic calculation.

Other posts in this series

This is the middle post in a series of three. The first was on kernels and cokernels, and the next is on the Fredholm alternative.

Linear KdV dispersion

The Korteweg–De Vries (KdV) equation

u_t - 6 u\, u_x + u_{xxx} = 0

is a nonlinear PDE used to model shallow water waves. The linear counterpart omits the nonlinear term in the middle.

u_t + u_{xxx} = 0

This variant is useful in itself, but also for understanding the nonlinear KdV equation.

Solitons

Solutions to the linear KdV equation spread out over time. The nonlinear term in the KdV equation counterbalances the dispersion term uxxx so that solutions to the nonlinear PDE behave in some ways like linear solutions.

Solutions to a nonlinear equation don’t add, and yet they sorta add. Here’s a description from [1].

At the heart of these observations is the discovery that these nonlinear waves can interact strongly and then continue thereafter almost as if there had been no interaction at all. This persistence of the wave led Zabusky and Kruskal to coin the term ‘soliton’ (after photon, proton, etc.) to emphasize the particle-like character of these waves which retain their identities in a collision.

I added the emphasis on almost because many descriptions leave out this important qualifier.

Solution to linear KdV

There is a compact expression for the solution to the linear KdV equation if u, ux, and uxx all go to 0 as |x| → ∞. If u(x, 0) = f(x), then the solution is

u(x, t) = (3t)^{-1/3} \int_{-\infty}^\infty f(y) \text{ Ai}\!\left( \frac{x-y}{(3t)^{1/3}} \right) \,dy

Here Ai(z) is the Airy function. This function has come up several times here. For example, I’ve written about the Airy function in the context of triple factorials and in connection with Bessel functions.
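Here's a sketch of how one might evaluate the solution formula numerically; the Gaussian initial condition f is an arbitrary choice of mine.

import numpy as np
from scipy.special import airy
from scipy.integrate import quad

f = lambda y: np.exp(-y**2)    # assumed initial condition u(x, 0)

def u(x, t):
    s = (3*t)**(1/3)
    # airy(z)[0] is Ai(z)
    integrand = lambda y: f(y) * airy((x - y)/s)[0]
    return quad(integrand, -np.inf, np.inf)[0] / s

for x in [0.0, 1.0, 5.0]:
    print(x, u(x, 0.1), u(x, 1.0))   # the profile spreads as t grows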

Aside on third order equations

Third order differential equations are uncommon, and third order linear ODEs are quite rare. The third order equations that do arise in applications are usually nonlinear PDEs, like the KdV equation. The linear KdV equation is an example of a linear third order PDE that arises in applications.

 

[1] P. G. Drazin and R. S. Johnson. Solitons: an introduction. Cambridge University Press. 1989.

Closed-form minimal surface solutions

Differential equations, especially nonlinear differential equations, rarely have a closed-form solution, but sometimes it happens. As I wrote about a year ago

It is unusual for a nonlinear PDE to have a closed-form solution, but it is not unheard of. There are numerous examples of nonlinear PDEs, equations with important physical applications, that have closed-form solutions.

This post will present some closed-form solutions of the minimal surface equation

(1 + ux²) uyy − 2 ux uy uxy + (1 + uy²) uxx = 0

One trivial class of closed-form solutions consists of planes:

u(x, y) = ax + by + c.

There are three non-trivial classes of solutions as far as I know. Jean Baptiste Marie Meusnier discovered two of these in 1776, namely the helicoid

u(x, y) = tan−1(y/x)

and the catenoid

u(x, y) = cosh−1(a (x² + y²)½) / a

Heinrich Scherk discovered another closed form solution in 1830:

u(x, y) = log( cos(ay) / cos(ax) ) / a

Here’s a plot.
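The plot can be reproduced with something like the following sketch, taking a = 1 and staying inside a square where both cosines are positive; the grid size is arbitrary.

import numpy as np
import matplotlib.pyplot as plt

a = 1
lim = 0.45 * np.pi   # stay inside (−π/2, π/2) so cos > 0
x, y = np.meshgrid(np.linspace(-lim, lim, 100),
                   np.linspace(-lim, lim, 100))
u = np.log(np.cos(a*y) / np.cos(a*x)) / a   # Scherk's solution

ax = plt.figure().add_subplot(projection='3d')
ax.plot_surface(x, y, u)
plt.show()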

The surface formed by the graph of the solution is known as Scherk’s surface. You could imagine that if the edges of this surface were made of wire and the wire were dipped in soapy water, it would form a soap film in the shape of Scherk’s surface.

Note that the closed-form solutions satisfy the minimal surface PDE itself, but do not satisfy any given boundary conditions, unless the boundary values you’d like to specify happen to be exactly the values this function has.

Fundamental solution

The “fundamental solution” to a PDE solves the equation with the right-hand side set to δ. Intuitively, you can think of the delta function as striking something with a hammer in order to see how it rings.

An aside on rigor

A novice might be OK with the explanation above.

A sophomore might rightly object that this doesn’t make sense. This delta “function” isn’t even a function. How can you set one side of a differential equation to something that isn’t even a function?

An expert would understand that calling δ a function is just a convenient figure of speech for a rigorous construction using distribution theory. You can find a high-level introduction here.

Bell curve meme.

As with many of the bell curve memes, the horizontal axis is really experience rather than intelligence. “Whatever you say” could be an intelligent response to someone talking about things they understand but you don’t. And objecting that something doesn’t make sense (as stated) is an intelligent response when you’re exposed to a metaphor that you didn’t realize was a metaphor. A mature response is to appreciate the value of rigor and the value of metaphor.

Why fundamental

The reason a fundamental solution is called “fundamental” is that once you have the fundamental solution, you can find more solutions by convolving the right-hand side with it.

So if L is a linear differential operator and F is a fundamental solution, i.e.

L F = δ

then the convolution f = F ∗ h is a solution to

L f = h.

Poisson’s equation

The fundamental solution to Poisson’s equation

∇² f = h

depends on dimension.

For dimension d > 2 the solution is proportional to r²⁻ᵈ where r is the radial distance to the origin.

For dimension d = 2 the solution is proportional to log r.
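Both cases can be verified with a short sympy computation, mine and not from the post, using the Laplacian of a radial function in d dimensions, f″(r) + (d − 1) f′(r)/r:

from sympy import symbols, log, diff, simplify

r, d = symbols('r d', positive=True)

def radial_laplacian(f):
    return diff(f, r, 2) + (d - 1)/r * diff(f, r)

print(simplify(radial_laplacian(r**(2 - d))))         # 0 for any d
print(simplify(radial_laplacian(log(r)).subs(d, 2)))  # 0 when d = 2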

This is an example of the phenomenon alluded to in the article titled A Zeroth Power Is Often a Logarithm Yearning to Be Free by Sanjoy Mahajan. If we naively stuck d = 2 into the fundamental solution r²⁻ᵈ for higher dimensions we’d get r⁰ = 1, which doesn’t work. But if we’d read Mahajan’s article, we might guess log r and then verify that it works.

I give a couple more examples of “logarithms yearning to be free” in this post.