Making a problem easier by making it harder

In the oral exam for my PhD, my advisor asked me a question about a differential equation. I don’t recall the question, but I remember the interaction that followed.

I was stuck, and my advisor countered by saying “Let me ask you a harder question.” I was still stuck, and so he said “Let me ask you an even harder question.” Then I got it.

By “harder” he meant “more general.” He started with a concrete problem, then made it progressively more abstract until I recognized it. His follow-up questions were logically harder but psychologically easier.

This incident came to mind when I ran across an example in Lawrence Evans’ control theory course notes. He uses the example to illustrate what he calls an example of mathematical wisdom:

It is sometimes easier to solve a problem by embedding it within a larger class of problems and then solving the larger class all at once.

The problem is to evaluate the integral of the sinc function:

\int_0^\infty \frac{\sin(x)}{x}\, dx

He does so by introducing the more general problem of evaluating the function

I(\alpha) = \int_0^\infty \exp(-\alpha x) \frac{\sin(x)}{x}\, dx

which reduces to the sinc integral when α = 0.

We can find the derivative of I(α) by differentiating under the integral sign and integrating by parts twice.

I'(\alpha) &=& \int_0^\infty \frac{\partial}{\partial \alpha} \exp(-\alpha x) \frac{\sin(x)}{x}\, dx \\ &=& \int_0^\infty \exp(-\alpha x) \sin(x) \, dx \\ &=& - \frac{1}{1 + \alpha^2}


I(\alpha) = - \arctan(\alpha) + C

As α goes to infinity, I(α) goes to zero, and so C = π/2 and I(0) = π/2.

Incidentally, note that instead of computing an integral in order to solve a differential equation as one often does, we introduced a differential equation in order to compute an integral.

Most mathematical problem statement

Every so often college students will ask me for advice regarding going into applied math. I’ll tell them the first step in an application, and often the hardest step, is formulating a problem, not solving it. People don’t approach you with mathematical problems per se but problems that can be turned into mathematical problems. Nobody is going to hand you a precisely formulated math problem. That only happens on exams.

Saying “nobody” is a bit of hyperbole. It happens, but not often. Most problems come to you in the language of the client’s business, such as “We want to make sure this machine doesn’t run into that machine” or “We’re trying to predict kidney failure.” [1]

Recently Charles McCreary from CRM Engineering sent me the most specific problem statement I’ve ever seen:

I need to calculate the von Mises yield envelope in the axial force-internal/external pressure domain for Lame’s solution …

That really took me by surprise. Sometimes a lead will mention mathematical terminology like “particle filters” or “finite elements,” though even this level of detail is uncommon. I’ve never seen something so specific.

It’s still the case that a lot of work goes into formulating a problem. I’m sure Charles’ client didn’t approach him with this statement. I’m consulting to a consultant who has already done the hard work of reducing the original application down to a purely mathematical problem. (I’d call it “purely mathematical” rather than “pure mathematical” because it’s definitely applied math.) I look forward to digging into it as soon as I wrap up what I’m currently working on.

* * *

[1] Nobody has come to me wanting to predict kidney failure, but I’ve worked on predicting several other maladies that I’ll not mention to maintain confidentiality.

Mathematical modeling for medical devices

We’re about to see a lot of new, powerful, inexpensive medical devices come out. And to my surprise, I’ve contributed to a few of them.

Growing compute power and shrinking sensors open up possibilities we’re only beginning to explore. Even when the things we want to observe elude direct measurement, we may be able to infer them from other things that we can now measure accurately, inexpensively, and in high volume.

In order to infer what you’d like to measure from what you can measure, you need a mathematical model. Or if you’d like to make predictions about the future from data collected in the past, you need a model. And that’s where I come in. Several companies have hired me to help them create medical devices by working on mathematical models. These might be statistical models, differential equations, or a combination of the two. I can’t say much about the projects I’ve worked on, at least not yet. I hope that I’ll be able to say more once the products come to market.

I started my career doing mathematical modeling (partial differential equations) but wasn’t that interested in statistics or medical applications. Then through an unexpected turn of events, I ended up spending a dozen years working in the biostatistics department of the world’s largest cancer center.

After leaving MD Anderson and starting my consultancy, several companies have approached me for help with mathematical problems associated with their idea for a medical device. These are ideal projects because they combine my earlier experience in mathematical modeling with my more recent experience with medical applications.

If you have an idea for a medical device, or know someone who does, let’s talk. I’d like to help.


Another way to define fractional derivatives

There are many ways to define fractional derivatives, and in general they coincide on nice classes of functions. A long time ago I wrote about one way to define fractional derivatives using Fourier transforms. From that post:

Here’s one way fractional derivatives could be defined. Suppose the Fourier transform of f(x) is g(ξ). Then for positive integer n, the nth derivative of f(x) has Fourier transform (2π i ξ)n g(ξ). So you could take the nth derivative of f(x) as follows: take the Fourier transform, multiply by (2π i ξ)n, and take the inverse Fourier transform. This suggests the same procedure could be used to define the nth derivative when n is not an integer.

Here’s another way to define fractional derivatives that doesn’t use Fourier transforms, the Grünwald-Letnikov derivative. It’s a fairly direct approach.

The starting point is to note that for positive integer n, the nth derivative can be written as

f^{(n)}(x) = \lim_{h\to 0} \frac{(\Delta^n_h f)(x)}{h^n}


(\Delta_h f)(x) = f(x) - f(x - h)

and Δn iterates Δ. For example,

(\Delta_h^2 f)(x) = (\Delta_h (\Delta_h) f)(x) = f(x) - 2 f(x-h) + f(x - 2h).

In general, for positive integer n, we have

(\Delta^n_h f)(x) = \sum_{k=0}^\infty (-1)^k {n \choose k} f(x - kh).

We could set the upper limit of the sum to n, but there’s no harm in setting it to ∞ because the binomial coefficients will be zero for k larger than n. And in this form, we can replace n with the integer n with any positive real number α to define the αth derivative. That is, the Grünwald-Letnikov derivative of f is given by

f^{(\alpha)}(x) = \lim_{h\to 0} \frac{(\Delta^\alpha_h f)(x)}{h^\alpha} = \lim_{h\to 0} h^{-\alpha} \sum_{k=0}^\infty (-1)^k {\alpha \choose k} f(x - kh).

See these notes for the definition of binomial coefficients for possibly non-integer arguments and for an explanation why for integer n the coefficients are eventually zero.

Notice that fractional derivatives require non-local information. Ordinary derivatives at a point are determined by the values of the function in an arbitrarily small neighborhood of that point. But notice how the fractional derivative, as defined in this post, depends on the values of the function at an evenly spaced infinite sequence of points. If we define fractional derivatives via Fourier transform, the non-local nature is more apparent since the Fourier transform at any point depends on the values of the function everywhere.

This non-local feature can be good or bad. If you want to model a phenomena with non-local dependence, fractional differential equations might be a good way to go. But if your phenomena is locally determined, fractional differential equations might be a poor fit.

Related post: Mittag-Leffler functions

Update: Yet another way to define fractional derivatives, this time via fractional integrals, which can be defined in terms of ordinary integrals

ODE solver as a functional fold

One of the most widely used numerical algorithms for solving differential equation is the 4th order Runge-Kutta method. This post shows how the Runge-Kutta method can be written naturally as a fold over the set of points where the solution is needed. These points do not need to be evenly spaced.

Given a differential equation of the form

y' = f(t, y)

with initial condition y(t0) = y0, the 4th order Runge-Kutta method advances the solution by an amount of time h by

y_{n+1} = y_n + \frac{h}{6}\left( k_{n1} + 2k_{n2} + 2k_{n3} + k_{n4}\right)


k_{n1} &=& f(t_n, y_n) \\ k_{n2} &=& f(t_n + 0.5h, y_n + 0.5hk_{n1}) \\ k_{n3} &=& f(t_n + 0.5h, y_n + 0.5hk_{n2}) \\ k_{n4} &=& f(t_n + h, y_n + hk_{n3}) \\

The Haskell code for implementing the accumulator function for foldl looks very much like the mathematical description above. This is a nice feature of the where syntax.

    rk (t, y) t' = (t', y + h*(k1 + 2.0*k2 + 2.0*k3 + k4)/6.0)
            h  = t' - t
            k1 = f t y
            k2 = f (t + 0.5*h) (y + 0.5*h*k1)
            k3 = f (t + 0.5*h) (y + 0.5*h*k2)
            k4 = f (t + 1.0*h) (y + 1.0*h*k3)     

Suppose we want to solve the differential equation y ‘ = (t2y2) sin(y) with initial condition y(0) = -1, and we want to approximate the solution at [0.01, 0.03, 0.04, 0.06]. We would implement the right side of the equation as

     f t y = (t**2 - y**2)*sin(y)

and fold the function rk over our time steps with

    foldl rk (0, -1) [0.01, 0.03, 0.04, 0.06]

This returns (0.06, -0.9527). The first part, 0.06, is no surprise since obviously we asked for the solution up to 0.06. The second part, -0.9527, is the part we’re more interested in.

If you want to see the solution at all the specified points and not just the last one, replace foldl with scanl.

    scanl rk (0, -1) [0.01, 0.03, 0.04, 0.06]

This returns

    [(0.0, -1.0), (0.01, -0.9917), (0.03, -0.9756), (0.042, -0.9678), (0.06, -0.9527)]

As pointed out in the previous post, writing algorithms as folds separates the core of the algorithm from data access. This would allow us, for example, to change rk independently, such as using a different order Runge-Kutta method. (Which hardly anyone does. Fourth order is a kind of sweet spot.) Or we could swap out Runge-Kutta for a different ODE solver entirely just by passing a different function into the fold.

Next see how you could use a fold to compute mean, variance, skewness, and kurtosis in one pass through a stream of data using a fold.

Kalman filters and bottom-up learning

radio antennae

Kalman filtering is a mixture of differential equations and statistics. Kalman filters are commonly used in tracking applications, such as tracking the location of a space probe or tracking the amount of charge left in a cell phone battery. Kalman filters provide a way to synthesize theoretical predictions and actual measurements, accounting for error in both.

Engineers naturally emphasize the differential equations and statisticians naturally emphasize the statistics. Both perspectives are valuable, but in my opinion/experience, the engineering perspective should come first.

From an engineering perspective, a Kalman filtering problem starts as a differential equation. In an ideal world, one would simply solve the differential equation and be done. But the experienced engineer realizes his or her differential equations don’t capture everything. (Unlike the engineer in this post.) Along the road to the equations at hand, there were approximations, terms left out, and various unknown unknowns.

The Kalman filter accounts for some level of uncertainty in the process dynamics and in the measurements taken. This uncertainty is modeled as randomness, but this doesn’t mean that there’s necessarily anything “random” going on. It simply acknowledges that random variables are an effective way of modeling miscellaneous effects that are unknown or too complicated to account for directly. (See Random is as random does.)

The statistical approach to Kalman filtering is to say that it is simply another estimation problem. You start from a probability model and apply Bayes’ theorem. That probability model has a term inside that happens to come from a differential equation in practice, but this is irrelevant to the statistics. The basic Kalman filter is a linear model with normal probability distributions, and this makes a closed-form solution for the posterior possible.

You’d be hard pressed to start from a statistical description of Kalman filtering, such as that given here, and have much appreciation for the motivating dynamics. Vital details have simply been abstracted away. As a client told me once when I tried to understand his problem starting from the top-down, “You’ll never get here from there.”

The statistical perspective is complementary. Some things are clear from the beginning with the statistical formulation that would take a long time to see from the engineering perspective. But while both perspectives are valuable, I believe it’s easier to start on the engineering end and work toward the statistics end rather than the other way around.

History supports this claim. The Kalman filter from the engineering perspective came first and its formulation in terms of Bayesian statistics came later. Except that’s not entirely true.

Rudolf Kálmán published his seminal paper in 1960 and four years later papers started to come out making the connection to Bayesian statistics. But while Kálmán and others were working in the US starting from the engineering end, Ruslan Stratonovich was working in Russia starting from the statistical end. Still, I believe it’s fair to say that most of the development and application of Kalman filters has proceeded from the engineering to the statistics rather than the other way around.

RelatedMore on Kalman filters


Bring out your equations!

Nice discussion from Fundamentals of Kalman Filtering: A Practical Approach by Paul Zarchan and Howard Musoff:

Often the hardest part in Kalman filtering is the subject that no one talks about—setting up the problem. This is analogous to the quote from the recent engineering graduate who, upon arriving in industry, enthusiastically says, “Here I am, present me with your differential equations!” As the naive engineering graduate soon found out, problems in the real world are frequently not clear and are subject to many interpretations. Real problems are seldom presented in the form of differential equations, and they usually do not have unique solutions.

Whether it’s Kalman filters, dynamical systems, or anything else, setting up the problem is the hard part, or at least a hard part.

On the other hand, it’s about as impractical to only be able to set up problems as it is to only be able to solve them. You have to know what kinds of problems can be solved, and how accurately, so you can formulate a problem in a tractable way. There’s a feedback loop: provisional problem formulation, attempted solution, revised formulation, etc. It’s ideal when one person can set up and solve a problem, but it’s enough for the formulators and solvers to communicate well and have some common ground.

Related posts:

Personal growth and discrete harmonic functions

“You are the average of the five people you spend the most time with.”

A Google search says this quote is by Jim Rohn. I think other people have said similar things. I’ve heard it quoted many times. The implication is usually that you can improve your life by hanging around better people.

Here are three things it makes me think of.

  1. It sounds approximately true.
  2. It can’t be literally true.
  3. It reminds me of harmonic functions.

There are numbers to back up the assertion that things like your income and even your weight approximately match the average of the people around you. Cause and effect go both ways: people like to hang out with people like themselves, and we become like the people we hang around.

It’s an aphorism, not meant to be taken literally. But a moment’s thought shows that it can’t be literally true for everybody. In any social network, someone has the most money, for example. That person’s net worth cannot be the average of the net worth of others in the group, unless everyone has the exact same net worth. The same applies to the poorest person in the network.

The reason I say that this reminds me of harmonic functions is the mean value theorem. If a function satisfies Laplace’s equation in some region, at any point in the interior of the region, the value of the function equals the average of the function over a spherical region centered at the point. But this is only true in the interior. On the boundary, you might have a maximum or minimum. If the boundary is compact, you will have a maximum and a minimum, provided the function extends continuously to its boundary.

I think of the continuous case first because I spent years thinking about such things. But there’s a discrete analog of harmonic functions that applies directly to the example above. If you have some network, such as a social network, and assign number to each node, you have a discrete harmonic function if the value at every node is the average of the values at its neighboring nodes. For a finite network, a function cannot be harmonic at every point unless it is constant, for reasons given above. But a function could be harmonic at all but two nodes of a graph, or approximately harmonic at all nodes.

Related posts:

Types of nonlinearity in PDEs

My advisor in grad school used to say

“Nonlinear” is not a hypothesis but the lack of a hypothesis.

To say something positive about nonlinear equations, you have to replace linearity with some specific property. You want to partially remove the restriction of linearity without letting just anything in.

In partial differential equations, one pattern of nonlinearity is to replace linear with monotone.

We say a function on the real line is monotone if x ≥ y implies f(x) ≥ f(y). Strictly speaking this is the definition of monotone non-decreasing, but in this context the “non-decreasing” qualifier is dropped. Now suppose f is a linear transformation on Rn. What could it mean for f to be monotone when statements like x > y don’t make sense? We could rewrite the one dimensional definition as saying

(f(x) – f(y))(xy) ≥ 0

for all x and y. This form generalizes to linear transformations if we interpret the multiplication above as inner product. More generally we say that an operator A from a Banach space V to its dual V* is monotone if

(A(x) – A(y))(xy) ≥ 0

where now instead of an inner product we have more generally the action of the linear functional A(x) – A(y) on the vector x – y. If the space V is a Hilbert space, then this is just an inner product, but in general it doesn’t have to be.

In applications to PDEs, the operator A would represent an operator between function spaces so that the PDE has the form Auf where u is a solution in V and the right hand side f is in V*. The operator A represents some weak form of the PDE and the space V is some sort of Sobolev space with the necessary boundary value assumptions baked in.

Monotonicity alone isn’t enough to prove existence or uniqueness. We need a few other properties.

We say the operator A from V to V* is coercive if Au(u) / ||u|| goes to infinity as ||u|| goes to infinity.

We say A is Type M if whenever un converges weakly to u, Aun converges weakly to f, and the lim sup of Aunf(u) all apply, then Au = f.

Here’s an example of the kind of theorems you can prove with these definitions.

If A is Type M, bounded, and coercive on a separable reflexive Banach space V to its dual V*, then A is surjective.

In application, the Banach space in the theorem is some sort of Sobolev space, functions in some Lp space whose generalized derivatives are in the same space, along with some boundary conditions. (You might wonder how boundary conditions can be defined for functions in such a space. They can’t directly, but they can indirectly via trace operators. Generalized boundary values for generalized functions. It’s all very generalized!)

Saying that A is surjective means the equation Au = f has a solution for any f in V*. So we reduce the problem of showing that the equation has a solution to verifying that A is Type M, bounded, and coercive. Type M is a form of continuity; bounded and coercive follow from a priori estimates.

Related posts:

Bayesian and nonlinear

Someone said years ago that you’ll know Bayesian statistics has become mainstream when people no longer put “Bayesian” in the titles of their papers. That day has come. While the Bayesian approach is still the preferred approach of a minority of statisticians, it’s no longer a novelty. If you want people to find your paper interesting, the substance needs to be interesting. A Bayesian approach isn’t remarkable alone.

You could say the same about nonlinear differential equations. Differential equations are so often nonlinear that the “nonlinear” qualifier isn’t always necessary to say explicitly. Just as a Bayesian analysis isn’t interesting just because it’s Bayesian, a differential equation isn’t necessarily interesting just because it’s nonlinear.

The analogy between Bayesian statistics and nonlinear differential equations breaks down though. Nonlinear equations are intrinsically more interesting than linear ones. But it’s no longer remarkable to solve a nonlinear differential equation numerically.

When an adjective becomes the default, it drops off and the previous default now requires an adjective. Terms like “electronic” and “digital” are fading from use. If you say you’re going to mail someone something, the default assumption is usually that you are going to email it. What used to be simply “mail” is now “snail mail.” Digital signal processing is starting to sound quaint. The abbreviation DSP is still in common use, but digital signal processing is simply signal processing. Now non-digital signal processing requires a qualifier, i.e. analog.

There was no term for Frequentist statistics when it was utterly dominant. Now of course there is. (Some people use the term “classical,” but that’s an odd term given that Bayesian analysis is older.) The term linear has been around a long time. Even when nearly all analysis was linear, people were aware that linearity was a necessary simplification.


Click to learn more about Bayesian statistics consulting


Related posts:

Maximum principles and bounds for initial value problems

The previous post looked at using maximum principles to bound the solution to a boundary value problem. This is a similar post, focusing on an initial value problem. As before we start with the differential operator

L[u] = u” + g(x)u’ + h(x)

where g and h are bounded and h is non-positive. We are interested in solutions to the equation L[u] = f, but this time we want to specify the initial conditions u(0) = u0 and u(0) = u0. We will use the same maximum principle as before, but a slightly different approach to constructing upper and lower bounds on solutions. Our conclusion will also be a little different: this time we’ll get upper and lower bounds not only on the solution u but also on its derivative u‘.

We look for a function z(x) satisfying the differential inequality

L[u] ≥ f

and the two initial inequalities z(0) ≥ u0 and z(0) ≥ u0. We also look for a function w(x) satisfying the same inequalities in the opposite direction, i.e. replacing ≥ with ≤. (Note that replacing equality with inequality in the differential equation and initial conditions makes it much easier to find solutions.)

The maximum principle tells us that z and w provide upper and lower bounds respectively on u, and their derivatives provide bounds on the derivative of u. That is,

w(x) ≤ u(x) ≤ z(x)


w‘(x) ≤ u‘(x) ≤ z‘(x).

We illustrate this technique on a variation on Bessel’s equation:

u”(x) + u‘(x)/xu(x) = 0

with initial conditions u(0) = 1 and u‘(0) = 0. We wish to bound the solution u on the inverval [0, 1].

As candidates, we try zc1x2 + 1 and wc2x2 + 1. (Why not a general quadratic polynomial? The initial conditions tell us that the linear coefficient will be 0 and the constant term will be 1.)


L[cx2 + 1] = c(4 – x2) – 1

it suffices to pick c1 = 1/3 and c2 = 1/4. This tells us that

x/4 + 1 ≤ u(x) ≤ x/3 + 1.

To see how well these bounds work, we compare with the exact solution I0, the so-called modified Bessel function of the first kind.

The lower bound is very good, though a comparably good upper bound would take a more careful choice of z.

Source: Maximum Principles in Differential Equations by Protter and Weinberger

Related posts:

Maximum principle and approximating boundary value problems

Solutions to differential equations often satisfy some sort of maximum principle, which can in turn be used to construct upper and lower bounds on solutions.

We illustrate this in one dimension, using a boundary value problem for an ordinary differential equation (ODE).

Maximum principles

If the second derivative of a function is positive over an open interval (ab), the function cannot have a maximum in that interval. If the function has a maximum over the closed interval [ab] then it must occur at one of the ends, at a or b.

This can be generalized, for example, to the following maximum principle. Let L be the differential operator

L[u] = u” + g(x)u’ + h(x)

where g and h are bounded functions on some interval [a, b] and h is non-positive. Suppose L[u] ≥ 0 on (a, b). If u has an interior maximum, then u must be constant.

Boundary value problems

Now suppose that we’re interested in the boundary value problem L[u] = f where we specify the values of u at the endpoints a and b, i.e. u(a) = ua and u(b) = ub. We can construct an upper bound on u as follows.

Suppose we find a function z such that L[z] ≤ f and z(a) ≥ ua and z(b) ≥ ub. Then by applying the maximum principle to u – z, we see that u – z must be ≤ 0, and so z is an upper bound for u.

Similarly, suppose we find a function w such that L[w] ≥ f and w(a) ≤ ua and w(b) ≤ ub. Then by applying the maximum principle to w – u, we see that w – u must be ≤ 0, and so w is an lower bound for u.

Note that any functions z and w that satisfy the above requirements give upper and lower bounds, though the bounds may not be very useful. By being clever in our choice of z and w we may be able to get tighter bounds. We might start by choosing polynomials, exponentials, etc. Any functions that are easy to work with and see how good the resulting bounds are.

Tomorrow’s post is similar to this one but looks at bounds for an initial value problem rather than a boundary value problem.

Airy equation example

The following is an elaboration on an example from [1]. Suppose we want to bound solutions to

u”(x) – x u(x) = 0

where u(0) = 0 and u(1) = 1. (This is a well-known equation, but for purposes of illustration we’ll pretend at first that we know nothing about its solutions.)

For our upper bound, we can simply use z(x) = x. We have L[z] ≤ 0 and z satisfies the boundary conditions exactly.

For our lower bound, we use w(x) = x – βx(1 – x). Why? The function z already satisfies the boundary condition. If we add some multiple of x(1 – x) we’ll maintain the boundary condition since x(1 – x) is zero at 0 and 1. The coefficient β gives us some room to maneuver. Turns out L[w] ≥ 0 if β ≥ 1/2. If we choose β = 1/2 we have

(xx2)/2 ≤ u(x) ≤ x

In general, you don’t know the function you’re trying to bound. That’s when bounds are most useful. But this is a sort of toy example because we do know the solution. The equation in this example is well known and is called Airy’s equation. The Airy functions Ai and Bi are independent solutions. Here’s a plot of the solution with its upper and lower bounds.

Here’s the Python code I used to solve for the coefficients of Ai and Bi and make the plot.

import numpy as np
from scipy.linalg import solve
from scipy.special import airy
import matplotlib.pyplot as plt

# airy(x) returns (Ai(x), Ai'(x), Bi(x), Bi'(x))
def Ai(x):
    return airy(x)[0]

def Bi(x):
    return airy(x)[2]

M = np.matrix([[Ai(0), Bi(0)], [Ai(1), Bi(1)]])
c = solve(M, [0, 1])

t = np.linspace(0, 1, 100)
plt.plot(t, (t + t**2)/2, 'r-', t, c[0]*Ai(t) + c[1]*Bi(t), 'k--', t, t, 'b-',)
plt.legend(["lower bound $(x + x^2)/2$", 
    "exact solution $c_0Ai + c_1Bi$", 
    "upper bound $x$"], loc="upper left")

SciPy’s function airy has an optimization that we waste here. The function computes Ai and Bi and their first derivatives all at the same time. We could take advantage of that to remove some redundant computations, but that would make the code harder to read. We chose instead to wait an extra nanosecond for the plot.

Help with differential equations

* * *

[1] Murray Protter and Hans Weinberger. Maximum Principles in Differential Equations.

Structure in jazz and math

Last night I went to a concert by the Branford Marsalis Quartet. One of the things that impressed me about the quartet was how creative they are while also being squarely within a tradition. People who are not familiar with jazz may not realize how structured it is and how much it respects tradition. The spontaneous and creative aspects of jazz are more obvious than the structure. In some ways jazz is more tightly structured than classical music. To use Francis Schaeffer’s phrase, there is form and freedom, freedom within form.

Every field has its own structure, its tropes, its traditions. Someone unfamiliar with the field can be overwhelmed, not having the framework that an insider has to understand things. They may think something is completely original when in fact the original portion is small.

In college I used to browse the journals in the math library and be completely overwhelmed. I didn’t learn until later that usually very little in a journal article is original, and even the original part isn’t that original. There’s a typical structure for a paper in PDEs, for example, just as there are typical structures for romantic comedies, symphonies, or space operas. A paper in partial differential equations might look like this:

  1. Motivation / previous work
  2. Weak formulation of PDE
  3. Craft function spaces and PDE as operator
  4. A priori estimates imply operator properties
  5. Well posedness results
  6. Regularity

An expert knows these structures. They know what’s boilerplate, what’s new, and just how new the new part is. When I wrote something up for my PhD advisor I remember him saying “You know what I find most interesting?” and pointing to one inequality. The part he found interesting, the only part he found interesting, was not that special from my perspective. It was all hard work for me, but only one part of it stood out as slightly original to him. An expert in partial differential equations sees a PDE paper the way a professional musician listens to another or the way a chess master sees a chess board.

While a math journal article may look totally incomprehensible, an expert in that specialization might see 10% of it as somewhat new. An interesting contrast to this is the “abc conjecture.” Three and a half years ago Shinichi Mochizuki proposed a proof of this conjecture. But his approach is so entirely idiosyncratic that nobody has been able to understand it. Even after a recent conference held for the sole purpose of penetrating this proof, nobody but Mochizuki really understands it. So even though most original research is not that original, once in a while something really new comes out.


Mathematics of medical plastics

In this interview, I talk with Ray Rilling about applying mathematics to manufacturing medical plastics.

Photo of Ray Rilling

JC: Ray, could you start by saying a little about yourself?

RR: Sure. My name is Ray Rilling, and I am the Director of Technology at Putnam Plastics.

My initial training was cellular biology with an emphasis on epidemiology, but I decided to move to a more applied (and lucrative) field. I’ve been in medical plastics for the last 23 years.

When I saw that epidemiology was a crowded field, I pursued substantial amounts of additional coursework that allowed me to get into the medical engineering plastics space. In lots of ways I think this pick-and-choose approach was much better than a traditional bio-medical engineering degree would have been. It allowed me to focus on exactly what I needed to learn.

JC: Thanks. And tell me a bit about where you work.

RR: Putnam Plastics is a 30 year old medical plastics manufacturer, specializing in polymer-based medical components (i.e. plastic tubing for medical usage).

JC: I did some consulting work for a company that manufactures stents, so I imagine your companies have a few things in common. What is your day-to-day mathematical toolkit?

RR: We cross a lot of boundaries and there are many layers of computational needs. The majority of the computations are basic algebraic formulas to calculate simple things like annular tooling draw down ratios or the ultimate tensile of a tube. From there, you may find that calculating the burst pressure of a composite or may require the solving of a linear system.

The most challenging computations that we perform are for our complex polymer extrusion tools. These are based on Computational Fluid Dynamics (CFD) and Finite or Boundary Element Methods (FEM / BEM). Many complexities emerge because polymer flow is compressible, non-isothermal, and viscoelastic in nature. It does not allow us to use conventional Navier-Stokes or other common equations. In all reality, we rely on proprietary equations to point us in the right direction. No model has ever provided a perfect solution. We combine our experience and expertise with the applied calculations.

The most complex of the computations are performed with third party software packages and often require the use of complex meshing, materials testing for inputs, and can even require us to create modified code to suit the specific application. We work with some very creative minds and there is never a shortage of new challenges.

Aside from the design engineering mathematics, we think statistically about our results. This can be to analyze the process capability, equivalency, and even product risk analysis. This statistical analysis can be intertwined with the design engineering aspects and deal with a range of biological and medical constants (i.e. devices might need to handle liquid nitrogen or remain in bodies for years or decades).

One of my mentors used to say, “The best equation is teamwork.” Teamwork is a sort of expectation, a crowd-sourced form of expert opinion. It allows you bring a focus on what the real variables are.

Some calculations can take weeks, so having a good appreciation of how to approach the challenge is important. You can get away with not knowing what’s under the hood. But that’s dangerous. It is much better to get things done more quickly, and with better understanding. Especially when a client is waiting on results.

JC: What is some math you wish you learned more of?

Vector and tensor-based physics. As the bar in the medical device industry increases, so do the expectations of the products that we make. Being able take large number of variables or inputs and transform them into a workable model is extremely valuable. My dream is to be able to reliably calculate extrusion tooling, product failure modes, and performance properties before I start cutting steel. But in our time and age, we still rely on some development iterations and calculate what we can.

I also wish I had more applied materials science. When I am developing something in the lab, sometimes the product does not perform the way you want it to. Everytime I start learning something new, like applied surface chemistry or the effects of radiation on polymers, I think of 100 things that I could have done in previous projects.

JC: Did you learn any math you haven’t had much use for yet? I’ve used most of the math I learned, though sometimes there’s been a gap of twenty years between learning something and applying it.

RR: I actually use pretty much all of the math I’ve learned. But that’s likely because I was able to pick and choose because I went back and did supplemental studies. Even the epidemiology is useful.

JC: I wanted to pick up on what you said earlier about physical models. Could you say more about the interplay of physical and mathematical models at Putnan?

RR: The models are never completely right. We use them as a tool. They often fail in three key ways:

  1. We find additional variables we need to account for or constrain. In many cases our first attempt failed because the product did not perform properly in an unaccounted for way. At the end of the day it might take many iterations before we have all of the performance properties that are needed.
  2. We are translating soft or subjective numbers from physicians or engineers. A model will provide concrete numbers. It is often important to create a baseline product that can take the customers subjective description and correlate it to a model.
  3. We need to additionally constrain for cost effectiveness (i.e. minimize the amount of expensive wire in a catheter).

Those three trade offs mean that we are never able to just take a model print out and run with it. There always has to be some back and forth.

JC: It’s interesting that you consider shortcomings besides inaccuracy as failures. Someone new to applying math in the real world might think of your first point but the other points. The latter two aren’t failures from an academic perspective, but they certainly are from a business perspective.

* * *

Other interviews:

It all boils down to linear algebra

When I was in college, my view of applied math was something like the following.

Applied math is mostly mathematical physics. Mathematical physics is mostly differential equations. Numerical solution of differential equations boils down to linear algebra. Therefore the heart of applied math is linear algebra.

I still think there’s a lot of truth in the summary above. Linear algebra is very important, and a great deal of applied math does ultimately depend on efficient solutions of large linear systems. The weakest link in the argument may be the first one: there’s a lot more to applied math than mathematical physics. Mathematical physics hasn’t declined, but other areas have grown. Still, areas of applied math outside of mathematical physics and outside of differential equations often depend critically on linear algebra.

I’d certainly recommend that someone interested in applied math become familiar with numerical linear algebra. If you’re going to be an expert in differential equations, or optimization, or many other fields, you need to be at leas familiar with numerical linear algebra if you’re going to compute anything. As Stephen Boyd points out in his convex optimization class, many of the breakthroughs in optimization over the last 20 years have at their core breakthroughs in numerical linear algebra. Improved algorithms have sped up the solution of very large systems more than Moore’s law has.

It may seem questionable to say that linear algebra is at the heart of applied math because it’s linear. What about nonlinear applications, such as nonlinear PDEs? Nonlinear differential equations lead to nonlinear algebraic equations when discretized. But these nonlinear systems are solved via iterations of linear systems, so we’re back to linear algebra.

Click to find out more about consulting for numerical computing