System of Jacobi elliptic functions

Jacobi’s elliptic functions are sorta like trig functions. His functions sn and cn have names that reminiscent of sine and cosine for good reason. These functions come up in applications such as the nonlinear pendulum (i.e. when θ is too
large to assume θ is a good enough approximation to sin θ) and in conformal mapping.

I ran across an article [1] yesterday that shows how Jacobi’s three elliptic functions—sn, cn, and dn—could be defined by one dynamical system

\begin{eqnarray*} x' &=& yz \\ y' &=& -zx \\ z' &=& -k^2 xy \end{eqnarray*}

with initial conditions x(0) = 0, y(0) = 1, and z(0) = 1.

The parameter k is the modulus. (In Mathematica’s notation below, k² is the modulus.) As k decreases to 0, sn converges to sine, cn to cosine, and dn to 1. As k increases to 1, sn converges tanh, and cn and dn converge to sech. So you could think of k as a knob you turn to go from being more like circular functions (ordinary trig functions) to more like hyperbolic functions.

Since we have a dynamical system, let’s plot the solution, varying the modulus each time.The Jacobi functions are periodic (in fact they’re doubly periodic) and so the plots will be closed loops.

f[t_, m_] = {JacobiSN[t, m], JacobiCN[t, m], JacobiDN[t, m]}
ParametricPlot3D[f[t, 0.707], {t, 0, 10}, AspectRatio -> 1]

phase portrait k = 1/2

ParametricPlot3D[f[t, 0.99], {t, 0, 20}, AspectRatio -> 1]

phase portrait k = 1/2

ParametricPlot3D[f[t, 0.01], {t, 0, 20}, AspectRatio -> 1]

phase portrait k = 1/2

Note that this last plot is nearly flat because the modulus is small and so z is nearly constant. The small modulus also makes the phase portrait nearly circular because x is approximately sine and y is approximately cosine.

[1] Kenneth R. Meyer. Jacobi Elliptic Functions from a Dynamical Systems Point of View. The American Mathematical Monthly, Vol. 108, No. 8 (Oct., 2001), pp. 729-737

Duffing equation for nonlinear oscillator

The Duffing equation is an ordinary differential equation describing a nonlinear damped driven oscillator.

\frac{d^2x}{dt} + h\frac{dx}{dt} + \Omega_0^2 x + \mu x^3 = F \cos \omega t

If the parameter μ were zero, this would be a damped driven linear oscillator. It’s the nonlinear x³ term that makes things nonlinear and interesting.

Using an analog computer in 1961, Youshisuke Ueda discovered that this system was chaotic. It was the first discovery of chaos in a second-order non-autonomous periodic system. Lorenz discovered his famous Lorenz attractor around the same time, though his system is third order.

Ueda’s system has been called “Ueda’s strange attractor” or the “Japanese attractor.”

In the linear case, when μ = 0, the phase portrait simply spirals outward from the origin towards its steady state.

But things get much more interesting when μ = 1. Here’s Mathematica code to numerically solve the differential equation and plot the phase portrait.

duff[t_] = 
    NDSolve[{x''[t] + 0.05 x'[t] + x[t] + x[t]^3 == 7.5 Cos[t], 
             x[0] == 0, x'[0] == 1}, x[t], {t, 0, 100}][[1, 1, 2]]
ParametricPlot[{duff[t], duff'[t]}, {t, 0, 50}, AspectRatio -> 1]

Here’s the same system integrated out further, to t = 200 rather 50.

Source: Wanda Szemplińska-Stupnicka. Chaos, Bifurcations and Fractals Around Us.

Asymptotic solution to ODE

Our adventure starts with the following ordinary differential equation:

y' = y = \frac{1}{x}

Analytic solution

We can solve this equation in closed-form, depending on your definition of closed-form, by multiplying by an integrating factor.

e^x y' + e^x y = \frac{e^x}{x}

The left side factors to

(e^x y)'

and so

y = e^{-x} \left( \int^x \frac{e^t}{t}\, dt + c\right)

The indefinite integral above cannot be evaluated in elementary terms, though it can be evaluated in terms of special functions.

We didn’t specify where the integration starts or the constant c. You can make the lower limit of integration whatever you want, and the value of c will be whatever is necessary to satisfy initial conditions.

The initial conditions hardly matter for large x because the constant c is multiplied by a negative exponential. As we’ll see below, the solution y decays like 1/x while the effect of the initial condition decays exponentially.

Asymptotic series solution

I wrote a few months ago about power series solutions for differential equations, but that approach won’t work here, not over a very large range. If y has a Taylor series expansion, so does it’s derivative, and so y‘ + y has a Taylor series. But the right side of our differential equation has a singularity at 0.

What if instead of assuming y has a Taylor series expansion, we assume it has a Laurant series expansion? That is, instead of assuming y is a weighted sum of positive powers of x, we assume it’s a weighted sum of negative powers of x:

y = \sum_{n=1}^\infty a_nx^{-n}

Then formally solving for the coefficients we find that

y = \frac{1}{x} + \frac{1}{x^2} + \frac{2}{x^3} + \cdots + \frac{(n-1)!}{x^n} + \cdots


There’s one problem with this approach: The series diverges for all x because n! grows faster than xn.

And yet the series gives a useful approximation! With a convergent series, we fix x and let n go to infinity. The divergent series above is an asymptotic series, and so we fix n and let x go to infinity.

To see how well the asymptotic solution compares to the analytic solution, we’ll pick the lower limit of integration to be 1 and pick c = 0 in the analytic solution.

The plot above shows the analytic solution, and the first and second order asymptotic solutions. (The nth order solution takes the first n terms of the asymptotic series.) Note that the three curves differ substantially at first but quickly converge together.

Now let’s look at the relative error.

That’s interesting. Eventually the second order approximation is more accurate than the first order, but not at first. In both cases the relative error hits a local minimum, bounces back up, then slowly decreases.

Does this pattern continue if we move on to a third order approximation. Yes, as illustrated below.

Superasymptotic series

The graphs above suggests that higher order approximations are more accurate, eventually, but we might do better than any particular order by letting the order vary, picking the optimal order for each x. That’s the idea behind superasymptotics. For each x, the superasymptotic series sums up terms in the asymptotic series until the terms start getting larger.

When this approach works, it can produce a series that converges exponentially fast, even though each truncated asymptotic series only converges polynomially fast.

Update: To get an idea how accuracy varies jointly with the argument x and the asymptotic approximation order n, consider the following graph. Note that the horizontal axis is now n, not x. The different curves correspond to different values of x, generally moving down as x increases.

Relative error as a function of asymptotic order

Every curve is non-monotone because for every value of x, increasing the order only helps to a point. Then the error gets worse as the series diverges. But as you can see from the graph, you can do quite well by picking the optimal approximation order for a given x.

Related posts

Liouville-Green approximation

Suppose you have a differential equation of the form

\frac{d^2y}{dx^2} = f(x)y

If the function f(x) is constant, the differential equation is easy to solve in closed form. But when it is not constant, it is very unlikely that closed form solutions exist. But there may be useful closed form approximate solutions.

There is no linear term in the equation above, but there is a standard change of variables to eliminate a linear term and put any second order linear equation with variable coefficients in the form above.

The Liouville-Green approximate solutions are linear combinations of the functions

f^{-1/4}(x)\exp\left(\pm\int f^{1/2}(x)\, dx \right )

The constant of integration doesn’t matter. It would only change the solution by a constant multiple, so any indefinite integral will do.

To try out the Liouville-Green approximation, let’s look at the equation

\frac{d^2y}{dx^2} = x^2y

Here f(x) = x² and so the LG approximation solutions are

A\, \frac{\exp\left(x^2/2 \right )}{ \sqrt{x} } + B\, \frac{\exp\left(-x^2/2 \right )}{ \sqrt{x} }

Let’s see how these solutions behave compare to a numerical solution of the equation. To make things more definite, we need initial conditions. Let’s say AB = 1, which corresponds to initial conditions y(1) = 2.25525 and y‘(1) = -0.0854354.

Here’s a plot of the L-G approximation and a numerical solution.

Plot of Liouville-Green approximation and numerical solution

Here’s the Mathematica code that was used to create the plot above.

s = NDSolve[
       y''[x] == x^2 y[x], 
       y[1] == 2.25525, 
       y'[1] == -0.0854354
    {x, 1, 3}
g[x_] = (Exp[-x^2/2] + Exp[x^2/2])/Sqrt[x]
    {Evaluate[y[x] /. s], g[x]}, 
    {x, 1, 3}, 
    PlotLabels -> {"Numerical", "LG"}

For more on Liouville-Green approximations, see Olver’s classic book Asymptotics and Special Functions.

Related postMechanical vibrations

Differential equations and recurrence relations

Series solutions to differential equations can be grubby or elegant, depending on your perspective.

Power series solutions

At one level, there’s nothing profound going on. To find a series solution to a differential equation, assume the solution has a power series, stick the series into the equation, and solve for the coefficients. The mechanics are tedious, but you just follow your nose.

But why should this work? Why should the solution have a power series? And why should it be possible to find the coefficients? After all, there are infinitely many of them!

As for why the solution should have a power series, sometimes it doesn’t. But there is a complete theory to tell you when the solution should have a power series or not. And when it doesn’t have a power series, it may have more general series solution.

Recurrence relations

The tedious steps in the middle of the solution process may obscure what’s happening between the first and last step; the hard part in the middle grabs our attention. But the big picture is that series solution process transforms a differential equation for a function y(x) into a recurrence relation for the coefficients in the power series for y. We turn a continuous problem into a discrete problem, a hard problem into an easy problem.

To look at an example, consider Airy’s equation:

y” – xy = 0.

I won’t go into all the details of the solution process—my claim above is that these details obscure the transformation I want to emphasize—but skip from the first to the last step.

If you assume y has a power series at 0 with coefficients an, then we find that the coefficients satisfy the recurrence relation

(n + 2)(n + 1)an+2 = an-1

for n ≥ 1. Initial conditions determine a0 and a1, and it turns out a2 must be 0. The recurrence relation shows how these three coefficients determine all the other coefficients.

Holonomic functions and sequences

There’s something interesting going on here, a sort of functor mapping differential equations to recurrence relations. If we restrict ourselves to differential equations and recurrence relations with polynomial coefficients, we get into the realm of holonomic functions.

A holonomic function is one that satisfies a linear differential equation with polynomial coefficients. A holonomic sequence is a sequence that satisfies a linear recurrence relation with polynomial coefficients. Holonomic functions are the generating functions for holonomic sequences.

Holonomic functions are interesting for a variety of reasons. They’re a large enough class of functions to include many functions of interest, like most special functions from analysis and a lot of generating functions from combinatorics. And yet they’re a small enough class to have nice properties, such as closure under addition and multiplication.

Holonomic functions are determined by a finite list of numbers, the coefficients of the polynomials in their defining differential equation. This means they’re really useful in computer algebra because common operations ultimately map one finite set of numbers to another.


How to eliminate the first order term from a second order ODE

Authors will often say that “without loss of generality” they will assume that a differential equation has no first order derivative term. They’ll explain that there’s no need to consider

y'' + p(x) y' + q(x) y = 0

because a change of variables can turn the above equation into one of the form

y'' + q(x) y = 0

While this is true, the change of variables is seldom spelled out. I’ll give the change of variables explicitly here in case this is helpful to someone in the future. Define u(x) and r(x) by

u(x) = \exp\left( \frac{1}{2} \int^x p(t)\,dt\right ) y(x)


r(x) = q(x) - \frac{1}{2}p'(x) - \frac{1}{4}p(x)^2

With this change of variables

u'' + r(x) u = 0

Proof: Calculate u” + ru and use the fact that y satisfies the original differential equation. The calculation is tedious but routine.

Example: Suppose we start with

xy'' + y' + x^3 y = 0

Then dividing by x we get

y'' + \frac{1}{x}y' + x^2 y = 0

Now applying the change of variables above gives

u'' + \left(x^2 + \frac{1}{4x^2}\right)u = 0
and our original y is u / √ x.

Making a problem easier by making it harder

In the oral exam for my PhD, my advisor asked me a question about a differential equation. I don’t recall the question, but I remember the interaction that followed.

I was stuck, and my advisor countered by saying “Let me ask you a harder question.” I was still stuck, and so he said “Let me ask you an even harder question.” Then I got it.

By “harder” he meant “more general.” He started with a concrete problem, then made it progressively more abstract until I recognized it. His follow-up questions were logically harder but psychologically easier.

This incident came to mind when I ran across an example in Lawrence Evans’ control theory course notes. He uses the example to illustrate what he calls an example of mathematical wisdom:

It is sometimes easier to solve a problem by embedding it within a larger class of problems and then solving the larger class all at once.

The problem is to evaluate the integral of the sinc function:

\int_0^\infty \frac{\sin(x)}{x}\, dx

He does so by introducing the more general problem of evaluating the function

I(\alpha) = \int_0^\infty \exp(-\alpha x) \frac{\sin(x)}{x}\, dx

which reduces to the sinc integral when α = 0.

We can find the derivative of I(α) by differentiating under the integral sign and integrating by parts twice.

I'(\alpha) &=& \int_0^\infty \frac{\partial}{\partial \alpha} \exp(-\alpha x) \frac{\sin(x)}{x}\, dx \\ &=& \int_0^\infty \exp(-\alpha x) \sin(x) \, dx \\ &=& - \frac{1}{1 + \alpha^2}


I(\alpha) = - \arctan(\alpha) + C

As α goes to infinity, I(α) goes to zero, and so C = π/2 and I(0) = π/2.

Incidentally, note that instead of computing an integral in order to solve a differential equation as one often does, we introduced a differential equation in order to compute an integral.

Most mathematical problem statement

Every so often college students will ask me for advice regarding going into applied math. I’ll tell them the first step in an application, and often the hardest step, is formulating a problem, not solving it. People don’t approach you with mathematical problems per se but problems that can be turned into mathematical problems. Nobody is going to hand you a precisely formulated math problem. That only happens on exams.

Saying “nobody” is a bit of hyperbole. It happens, but not often. Most problems come to you in the language of the client’s business, such as “We want to make sure this machine doesn’t run into that machine” or “We’re trying to predict kidney failure.” [1]

Recently Charles McCreary from CRM Engineering sent me the most specific problem statement I’ve ever seen:

I need to calculate the von Mises yield envelope in the axial force-internal/external pressure domain for Lame’s solution …

That really took me by surprise. Sometimes a lead will mention mathematical terminology like “particle filters” or “finite elements,” though even this level of detail is uncommon. I’ve never seen something so specific.

It’s still the case that a lot of work goes into formulating a problem. I’m sure Charles’ client didn’t approach him with this statement. I’m consulting to a consultant who has already done the hard work of reducing the original application down to a purely mathematical problem. (I’d call it “purely mathematical” rather than “pure mathematical” because it’s definitely applied math.) I look forward to digging into it as soon as I wrap up what I’m currently working on.

* * *

[1] Nobody has come to me wanting to predict kidney failure, but I’ve worked on predicting several other maladies that I’ll not mention to maintain confidentiality.

Mathematical modeling for medical devices

We’re about to see a lot of new, powerful, inexpensive medical devices come out. And to my surprise, I’ve contributed to a few of them.

Growing compute power and shrinking sensors open up possibilities we’re only beginning to explore. Even when the things we want to observe elude direct measurement, we may be able to infer them from other things that we can now measure accurately, inexpensively, and in high volume.

In order to infer what you’d like to measure from what you can measure, you need a mathematical model. Or if you’d like to make predictions about the future from data collected in the past, you need a model. And that’s where I come in. Several companies have hired me to help them create medical devices by working on mathematical models. These might be statistical models, differential equations, or a combination of the two. I can’t say much about the projects I’ve worked on, at least not yet. I hope that I’ll be able to say more once the products come to market.

I started my career doing mathematical modeling (partial differential equations) but wasn’t that interested in statistics or medical applications. Then through an unexpected turn of events, I ended up spending a dozen years working in the biostatistics department of the world’s largest cancer center.

After leaving MD Anderson and starting my consultancy, several companies have approached me for help with mathematical problems associated with their idea for a medical device. These are ideal projects because they combine my earlier experience in mathematical modeling with my more recent experience with medical applications.

If you have an idea for a medical device, or know someone who does, let’s talk. I’d like to help.


Another way to define fractional derivatives

There are many ways to define fractional derivatives, and in general they coincide on nice classes of functions. A long time ago I wrote about one way to define fractional derivatives using Fourier transforms. From that post:

Here’s one way fractional derivatives could be defined. Suppose the Fourier transform of f(x) is g(ξ). Then for positive integer n, the nth derivative of f(x) has Fourier transform (2π i ξ)n g(ξ). So you could take the nth derivative of f(x) as follows: take the Fourier transform, multiply by (2π i ξ)n, and take the inverse Fourier transform. This suggests the same procedure could be used to define the nth derivative when n is not an integer.

Here’s another way to define fractional derivatives that doesn’t use Fourier transforms, the Grünwald-Letnikov derivative. It’s a fairly direct approach.

The starting point is to note that for positive integer n, the nth derivative can be written as

f^{(n)}(x) = \lim_{h\to 0} \frac{(\Delta^n_h f)(x)}{h^n}


(\Delta_h f)(x) = f(x) - f(x - h)

and Δn iterates Δ. For example,

(\Delta_h^2 f)(x) = (\Delta_h (\Delta_h) f)(x) = f(x) - 2 f(x-h) + f(x - 2h).

In general, for positive integer n, we have

(\Delta^n_h f)(x) = \sum_{k=0}^\infty (-1)^k {n \choose k} f(x - kh).

We could set the upper limit of the sum to n, but there’s no harm in setting it to ∞ because the binomial coefficients will be zero for k larger than n. And in this form, we can replace n with the integer n with any positive real number α to define the αth derivative. That is, the Grünwald-Letnikov derivative of f is given by

f^{(\alpha)}(x) = \lim_{h\to 0} \frac{(\Delta^\alpha_h f)(x)}{h^\alpha} = \lim_{h\to 0} h^{-\alpha} \sum_{k=0}^\infty (-1)^k {\alpha \choose k} f(x - kh).

See these notes for the definition of binomial coefficients for possibly non-integer arguments and for an explanation why for integer n the coefficients are eventually zero.

Notice that fractional derivatives require non-local information. Ordinary derivatives at a point are determined by the values of the function in an arbitrarily small neighborhood of that point. But notice how the fractional derivative, as defined in this post, depends on the values of the function at an evenly spaced infinite sequence of points. If we define fractional derivatives via Fourier transform, the non-local nature is more apparent since the Fourier transform at any point depends on the values of the function everywhere.

This non-local feature can be good or bad. If you want to model a phenomena with non-local dependence, fractional differential equations might be a good way to go. But if your phenomena is locally determined, fractional differential equations might be a poor fit.

Related post: Mittag-Leffler functions

Update: Yet another way to define fractional derivatives, this time via fractional integrals, which can be defined in terms of ordinary integrals

ODE solver as a functional fold

One of the most widely used numerical algorithms for solving differential equation is the 4th order Runge-Kutta method. This post shows how the Runge-Kutta method can be written naturally as a fold over the set of points where the solution is needed. These points do not need to be evenly spaced.

Given a differential equation of the form

y' = f(t, y)

with initial condition y(t0) = y0, the 4th order Runge-Kutta method advances the solution by an amount of time h by

y_{n+1} = y_n + \frac{h}{6}\left( k_{n1} + 2k_{n2} + 2k_{n3} + k_{n4}\right)


k_{n1} &=& f(t_n, y_n) \\ k_{n2} &=& f(t_n + 0.5h, y_n + 0.5hk_{n1}) \\ k_{n3} &=& f(t_n + 0.5h, y_n + 0.5hk_{n2}) \\ k_{n4} &=& f(t_n + h, y_n + hk_{n3}) \\

The Haskell code for implementing the accumulator function for foldl looks very much like the mathematical description above. This is a nice feature of the where syntax.

    rk (t, y) t' = (t', y + h*(k1 + 2.0*k2 + 2.0*k3 + k4)/6.0)
            h  = t' - t
            k1 = f t y
            k2 = f (t + 0.5*h) (y + 0.5*h*k1)
            k3 = f (t + 0.5*h) (y + 0.5*h*k2)
            k4 = f (t + 1.0*h) (y + 1.0*h*k3)     

Suppose we want to solve the differential equation y ‘ = (t2y2) sin(y) with initial condition y(0) = -1, and we want to approximate the solution at [0.01, 0.03, 0.04, 0.06]. We would implement the right side of the equation as

     f t y = (t**2 - y**2)*sin(y)

and fold the function rk over our time steps with

    foldl rk (0, -1) [0.01, 0.03, 0.04, 0.06]

This returns (0.06, -0.9527). The first part, 0.06, is no surprise since obviously we asked for the solution up to 0.06. The second part, -0.9527, is the part we’re more interested in.

If you want to see the solution at all the specified points and not just the last one, replace foldl with scanl.

    scanl rk (0, -1) [0.01, 0.03, 0.04, 0.06]

This returns

    [(0.0, -1.0), (0.01, -0.9917), (0.03, -0.9756), (0.042, -0.9678), (0.06, -0.9527)]

As pointed out in the previous post, writing algorithms as folds separates the core of the algorithm from data access. This would allow us, for example, to change rk independently, such as using a different order Runge-Kutta method. (Which hardly anyone does. Fourth order is a kind of sweet spot.) Or we could swap out Runge-Kutta for a different ODE solver entirely just by passing a different function into the fold.

Next see how you could use a fold to compute mean, variance, skewness, and kurtosis in one pass through a stream of data using a fold.

Kalman filters and bottom-up learning

radio antennae

Kalman filtering is a mixture of differential equations and statistics. Kalman filters are commonly used in tracking applications, such as tracking the location of a space probe or tracking the amount of charge left in a cell phone battery. Kalman filters provide a way to synthesize theoretical predictions and actual measurements, accounting for error in both.

Engineers naturally emphasize the differential equations and statisticians naturally emphasize the statistics. Both perspectives are valuable, but in my opinion/experience, the engineering perspective should come first.

From an engineering perspective, a Kalman filtering problem starts as a differential equation. In an ideal world, one would simply solve the differential equation and be done. But the experienced engineer realizes his or her differential equations don’t capture everything. (Unlike the engineer in this post.) Along the road to the equations at hand, there were approximations, terms left out, and various unknown unknowns.

The Kalman filter accounts for some level of uncertainty in the process dynamics and in the measurements taken. This uncertainty is modeled as randomness, but this doesn’t mean that there’s necessarily anything “random” going on. It simply acknowledges that random variables are an effective way of modeling miscellaneous effects that are unknown or too complicated to account for directly. (See Random is as random does.)

The statistical approach to Kalman filtering is to say that it is simply another estimation problem. You start from a probability model and apply Bayes’ theorem. That probability model has a term inside that happens to come from a differential equation in practice, but this is irrelevant to the statistics. The basic Kalman filter is a linear model with normal probability distributions, and this makes a closed-form solution for the posterior possible.

You’d be hard pressed to start from a statistical description of Kalman filtering, such as that given here, and have much appreciation for the motivating dynamics. Vital details have simply been abstracted away. As a client told me once when I tried to understand his problem starting from the top-down, “You’ll never get here from there.”

The statistical perspective is complementary. Some things are clear from the beginning with the statistical formulation that would take a long time to see from the engineering perspective. But while both perspectives are valuable, I believe it’s easier to start on the engineering end and work toward the statistics end rather than the other way around.

History supports this claim. The Kalman filter from the engineering perspective came first and its formulation in terms of Bayesian statistics came later. Except that’s not entirely true.

Rudolf Kálmán published his seminal paper in 1960 and four years later papers started to come out making the connection to Bayesian statistics. But while Kálmán and others were working in the US starting from the engineering end, Ruslan Stratonovich was working in Russia starting from the statistical end. Still, I believe it’s fair to say that most of the development and application of Kalman filters has proceeded from the engineering to the statistics rather than the other way around.

RelatedMore on Kalman filters


Bring out your equations!

Nice discussion from Fundamentals of Kalman Filtering: A Practical Approach by Paul Zarchan and Howard Musoff:

Often the hardest part in Kalman filtering is the subject that no one talks about—setting up the problem. This is analogous to the quote from the recent engineering graduate who, upon arriving in industry, enthusiastically says, “Here I am, present me with your differential equations!” As the naive engineering graduate soon found out, problems in the real world are frequently not clear and are subject to many interpretations. Real problems are seldom presented in the form of differential equations, and they usually do not have unique solutions.

Whether it’s Kalman filters, dynamical systems, or anything else, setting up the problem is the hard part, or at least a hard part.

On the other hand, it’s about as impractical to only be able to set up problems as it is to only be able to solve them. You have to know what kinds of problems can be solved, and how accurately, so you can formulate a problem in a tractable way. There’s a feedback loop: provisional problem formulation, attempted solution, revised formulation, etc. It’s ideal when one person can set up and solve a problem, but it’s enough for the formulators and solvers to communicate well and have some common ground.

Related posts:

Personal growth and discrete harmonic functions

“You are the average of the five people you spend the most time with.”

A Google search says this quote is by Jim Rohn. I think other people have said similar things. I’ve heard it quoted many times. The implication is usually that you can improve your life by hanging around better people.

Here are three things it makes me think of.

  1. It sounds approximately true.
  2. It can’t be literally true.
  3. It reminds me of harmonic functions.

There are numbers to back up the assertion that things like your income and even your weight approximately match the average of the people around you. Cause and effect go both ways: people like to hang out with people like themselves, and we become like the people we hang around.

It’s an aphorism, not meant to be taken literally. But a moment’s thought shows that it can’t be literally true for everybody. In any social network, someone has the most money, for example. That person’s net worth cannot be the average of the net worth of others in the group, unless everyone has the exact same net worth. The same applies to the poorest person in the network.

The reason I say that this reminds me of harmonic functions is the mean value theorem. If a function satisfies Laplace’s equation in some region, at any point in the interior of the region, the value of the function equals the average of the function over a spherical region centered at the point. But this is only true in the interior. On the boundary, you might have a maximum or minimum. If the boundary is compact, you will have a maximum and a minimum, provided the function extends continuously to its boundary.

I think of the continuous case first because I spent years thinking about such things. But there’s a discrete analog of harmonic functions that applies directly to the example above. If you have some network, such as a social network, and assign number to each node, you have a discrete harmonic function if the value at every node is the average of the values at its neighboring nodes. For a finite network, a function cannot be harmonic at every point unless it is constant, for reasons given above. But a function could be harmonic at all but two nodes of a graph, or approximately harmonic at all nodes.

Related posts:

Types of nonlinearity in PDEs

My advisor in grad school used to say

“Nonlinear” is not a hypothesis but the lack of a hypothesis.

To say something positive about nonlinear equations, you have to replace linearity with some specific property. You want to partially remove the restriction of linearity without letting just anything in.

In partial differential equations, one pattern of nonlinearity is to replace linear with monotone.

We say a function on the real line is monotone if x ≥ y implies f(x) ≥ f(y). Strictly speaking this is the definition of monotone non-decreasing, but in this context the “non-decreasing” qualifier is dropped. Now suppose f is a linear transformation on Rn. What could it mean for f to be monotone when statements like x > y don’t make sense? We could rewrite the one dimensional definition as saying

(f(x) – f(y))(xy) ≥ 0

for all x and y. This form generalizes to linear transformations if we interpret the multiplication above as inner product. More generally we say that an operator A from a Banach space V to its dual V* is monotone if

(A(x) – A(y))(xy) ≥ 0

where now instead of an inner product we have more generally the action of the linear functional A(x) – A(y) on the vector x – y. If the space V is a Hilbert space, then this is just an inner product, but in general it doesn’t have to be.

In applications to PDEs, the operator A would represent an operator between function spaces so that the PDE has the form Auf where u is a solution in V and the right hand side f is in V*. The operator A represents some weak form of the PDE and the space V is some sort of Sobolev space with the necessary boundary value assumptions baked in.

Monotonicity alone isn’t enough to prove existence or uniqueness. We need a few other properties.

We say the operator A from V to V* is coercive if Au(u) / ||u|| goes to infinity as ||u|| goes to infinity.

We say A is Type M if whenever un converges weakly to u, Aun converges weakly to f, and the lim sup of Aunf(u) all apply, then Au = f.

Here’s an example of the kind of theorems you can prove with these definitions.

If A is Type M, bounded, and coercive on a separable reflexive Banach space V to its dual V*, then A is surjective.

In application, the Banach space in the theorem is some sort of Sobolev space, functions in some Lp space whose generalized derivatives are in the same space, along with some boundary conditions. (You might wonder how boundary conditions can be defined for functions in such a space. They can’t directly, but they can indirectly via trace operators. Generalized boundary values for generalized functions. It’s all very generalized!)

Saying that A is surjective means the equation Au = f has a solution for any f in V*. So we reduce the problem of showing that the equation has a solution to verifying that A is Type M, bounded, and coercive. Type M is a form of continuity; bounded and coercive follow from a priori estimates.

Related posts: