# Fourier transform of a function on a graph

What is a Fourier transform at its core? An expansion of a function in terms of eigenfunctions of the Laplacian. For a function on the real line, the Laplacian is simply the second derivative. The functions mapped to multiples of themselves by the second derivative are sines and cosines of various frequencies. A Fourier series is a change of basis, using as basis vectors those functions that behave the simplest under the second derivative.

The Fourier transform of a function on a graph is also a change of basis, expanding a discrete function in terms of eigenvectors of the Laplacian, in this case the graph Laplacian.

The Fourier transform of a function f, evaluated at a frequency ω, is the inner product of f with the eigenfunction exp(2πiωt).

The inner product of two complex functions f and g is the integral of the product of f and the conjugate of g. Conjugation is why exp(2πiωt) became exp(-2πiωt).

The Fourier transform of a discrete function f on a graph, evaluated at an eigenvalue λi, is the inner product of f (i.e. the vector of values of f at each node) with the eigenvector associated with λi.

Here the inner product is a discrete sum rather than an integral. As before, we take the complex conjugate of the second item in the product.

The eigenvectors associated with the smallest eigenvalues of the graph Laplacian are analogous to low frequency sines and cosines. The components of these eigenvectors corresponding to nearby vertices in a graph should be close together. This analogy explains why spectral coordinates work so well.
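To make the analogy concrete, here is a small pure-Python sketch. The path graph and the quadratic signal are my illustrative choices, not anything from the post. For a path graph the Laplacian eigenvectors are sampled cosines (the DCT-II basis), so the graph Fourier transform below really is a discrete cosine analysis.

```python
import math

def path_laplacian(n):
    # Graph Laplacian L = D - A for the path graph on n nodes
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        L[i][i] = float((i > 0) + (i < n - 1))
        if i > 0:
            L[i][i - 1] = -1.0
        if i < n - 1:
            L[i][i + 1] = -1.0
    return L

n = 8
L = path_laplacian(n)

def eigenpair(k):
    # Known eigenpairs of the path Laplacian: sampled cosines,
    # with eigenvalue 2 - 2 cos(pi k / n)
    lam = 2 - 2 * math.cos(math.pi * k / n)
    v = [math.cos(math.pi * k * (j + 0.5) / n) for j in range(n)]
    return lam, v

for k in range(n):
    lam, v = eigenpair(k)
    Lv = [sum(L[i][j] * v[j] for j in range(n)) for i in range(n)]
    assert max(abs(Lv[i] - lam * v[i]) for i in range(n)) < 1e-9

# Graph Fourier coefficient at eigenvalue lambda_k: the inner product
# of the signal with the (normalized, real) eigenvector for lambda_k
f = [j * j for j in range(n)]            # an arbitrary signal on the nodes
fhat = []
for k in range(n):
    lam, v = eigenpair(k)
    norm = math.sqrt(sum(x * x for x in v))
    fhat.append(sum(f[j] * v[j] for j in range(n)) / norm)
```

Since the signal is smooth along the path, most of its energy lands in the coefficients for the smallest eigenvalues, the "low frequencies."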


# Connection between hypergeometric distribution and series

What’s the connection between the hypergeometric distribution, hypergeometric functions, and hypergeometric series?

The hypergeometric distribution is a probability distribution with parameters N, M, and n. Suppose you have an urn containing N balls, M red and the rest, N − M, blue, and you select n balls at a time. The hypergeometric distribution gives the probability of selecting k red balls.

The probability generating function for a discrete distribution is the series formed by summing the products of the probability of each outcome k and x^k. So the probability generating function f for a hypergeometric distribution is given by

f(x) = Σk C(M, k) C(N − M, n − k) x^k / C(N, n)

where C(·, ·) denotes a binomial coefficient.

The summation is over all integers, but the terms are only non-zero for k between 0 and M inclusive. (This may be more general than the definition of binomial coefficients you’ve seen before. If so, see these notes on the general definition of binomial coefficients.)

It turns out that f is a hypergeometric function of x because it can be written as a hypergeometric series. (Strictly speaking, f is a constant multiple of a hypergeometric function. More on that in a moment.)

A hypergeometric function is defined by a pattern in its power series coefficients. The hypergeometric function F(a, b; c; x) has the power series

F(a, b; c; x) = Σk (a)_k (b)_k x^k / ((c)_k k!)

where (n)_k is the kth rising power of n. It’s a sort of opposite of the factorial. Start with n and multiply consecutive increasing integers for k terms. (n)_0 is an empty product, so it is 1. (n)_1 = n, (n)_2 = n(n + 1), etc.

If the ratio of the (k+1)st term to the kth term in a power series is a rational function of k, then the series is a (multiple of a) hypergeometric series, and you can read the parameters of the hypergeometric series off that ratio. This ratio for our probability generating function works out to be

(k − M)(k − n) x / ((k + N − M − n + 1)(k + 1))

and so the corresponding hypergeometric function is F(−M, −n; N − M − n + 1; x). The constant term of a hypergeometric function is always 1, so evaluating our probability generating function at 0 tells us what constant is multiplying F(−M, −n; N − M − n + 1; x). Now

f(0) = C(N − M, n) / C(N, n)

and so

f(x) = [C(N − M, n) / C(N, n)] F(−M, −n; N − M − n + 1; x)

The hypergeometric series above gives the original hypergeometric function as defined by Gauss, and may be the most common form in application. But the definition has been extended to have any number of rising powers in the numerator and denominator of the coefficients. The classical hypergeometric function of Gauss is denoted 2F1 because it has two rising powers on top and one on bottom. In general, the hypergeometric function pFq has p rising powers in the numerator and q rising powers in the denominator.
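The identity above is easy to check numerically. The sketch below builds the probability generating function directly from binomial coefficients and compares it with the constant multiple of the hypergeometric series; the values of N, M, and n are arbitrary illustrative choices.

```python
from math import comb, factorial

N, M, n = 20, 7, 5   # urn: N balls total, M red, n drawn (illustrative values)

def pgf(x):
    # Probability generating function: sum over k of P(k red) * x^k
    return sum(comb(M, k) * comb(N - M, n - k) * x**k
               for k in range(0, n + 1)) / comb(N, n)

def rising(a, k):
    # Rising power (a)_k = a (a+1) ... (a+k-1), with (a)_0 = 1
    p = 1
    for i in range(k):
        p *= a + i
    return p

def F(a, b, c, x, terms=30):
    # Hypergeometric series; it terminates here since a and b are
    # negative integers, so a modest number of terms is exact
    return sum(rising(a, k) * rising(b, k) * x**k / (rising(c, k) * factorial(k))
               for k in range(terms))

const = pgf(0)   # = comb(N - M, n) / comb(N, n), the constant multiple
for x in (0.0, 0.3, 1.0):
    assert abs(pgf(x) - const * F(-M, -n, N - M - n + 1, x)) < 1e-12
```

At x = 1 the generating function sums all probabilities, so the product const · F(−M, −n; N − M − n + 1; 1) comes out to exactly 1, a nice consistency check.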

The CDF of a hypergeometric distribution turns out to involve a more general hypergeometric function, 3F2(a, b, c; d, e; 1), where a = 1, b = k + 1 − M, c = k + 1 − n, d = k + 2, and e = N + k + 2 − M − n.

Thanks to Jan Galkowski for suggesting this topic via a comment on an earlier post, Hypergeometric bootstrapping.

# Structure in jazz and math

Last night I went to a concert by the Branford Marsalis Quartet. One of the things that impressed me about the quartet was how creative they are while also being squarely within a tradition. People who are not familiar with jazz may not realize how structured it is and how much it respects tradition. The spontaneous and creative aspects of jazz are more obvious than the structure. In some ways jazz is more tightly structured than classical music. To use Francis Schaeffer’s phrase, there is form and freedom, freedom within form.

Every field has its own structure, its tropes, its traditions. Someone unfamiliar with the field can be overwhelmed, not having the framework that an insider has to understand things. They may think something is completely original when in fact the original portion is small.

In college I used to browse the journals in the math library and be completely overwhelmed. I didn’t learn until later that usually very little in a journal article is original, and even the original part isn’t that original. There’s a typical structure for a paper in PDEs, for example, just as there are typical structures for romantic comedies, symphonies, or space operas. A paper in partial differential equations might look like this:

1. Motivation / previous work
2. Weak formulation of PDE
3. Craft function spaces and PDE as operator
4. A priori estimates imply operator properties
5. Well posedness results
6. Regularity

An expert knows these structures. They know what’s boilerplate, what’s new, and just how new the new part is. When I wrote something up for my PhD advisor, I remember him saying “You know what I find most interesting?” and pointing to one inequality. The part he found interesting, the only part he found interesting, was not that special from my perspective. It was all hard work for me, but only one part of it stood out as slightly original to him. An expert in partial differential equations sees a PDE paper the way a professional musician listens to another musician or the way a chess master sees a chess board.

While a math journal article may look totally incomprehensible, an expert in that specialization might see 10% of it as somewhat new. An interesting contrast to this is the “abc conjecture.” Three and a half years ago Shinichi Mochizuki proposed a proof of this conjecture. But his approach is so entirely idiosyncratic that nobody has been able to understand it. Even after a recent conference held for the sole purpose of penetrating this proof, nobody but Mochizuki really understands it. So even though most original research is not that original, once in a while something really new comes out.

# Consulting for consultants

They say that doctors make terrible patients, but in my experience consultants make great consulting clients. The best are confident in their own specialization and respect you in yours. They get going quickly and pay quickly. (I’ve only worked for consultants who have small companies. I imagine large consulting companies are as slow as other companies the same size.)

Sometimes consultants working in software development will ask me to help out with some mathematical part of their projects. And sometimes math/stat folks will ask me to help out with some computational part of their projects.

I started my consulting business three years ago. Since then I’ve gotten to know a few other consultants well. This lets me offer a broader range of services to a client by bringing in other people, and sometimes it helps me find projects.

If you’re a consultant and interested in working together, please send me an email introducing yourself. I’m most interested in meeting consultants who have some overlap with what I do but who also have complementary strengths.

# Dilogarithm, polylogarithm, and related functions

The functions dilogarithm, trilogarithm, and more generally polylogarithm are meant to be generalizations of the logarithm. I first came across the dilogarithm in college when I was evaluating some integral with Mathematica, and they’ve paid a visit occasionally ever since.

Unfortunately polylogarithms are defined in several slightly different and incompatible ways. I’ll start by following An Atlas of Functions and then mention differences in A&S, SciPy, and Mathematica.

## Polylogarithms

According to Atlas,

Polylogarithms are themselves special cases of Lerch’s function. Also known as Jonquière’s functions (Ernest Jean Philippe Fauque de Jonquières, 1820–1901, French naval officer and mathematician), they appear in the Feynman diagrams of particle physics.

The idea is to introduce an extra parameter ν in the power series for the natural log:

polyln_ν(x) = Σk (−1)^(k+1) x^k / k^ν

where the sum runs over k ≥ 1.

When ν = 1 we get the ordinary logarithm: polyln_1(x) = log(1 + x). Then polyln_2 is the dilogarithm diln, and polyln_3 is the trilogarithm triln.

One advantage of the definition in Atlas is that the logarithm is a special case of the polylogarithm. Other conventions don’t have this property.

## Other conventions

The venerable A&S defines dilogarithms in a way that’s equivalent to the negative of the definition above and does not define polylogarithms of any other order. SciPy’s special function library follows A&S. SciPy uses the name spence for the dilogarithm for reasons we’ll get to shortly.

Mathematica has the function PolyLog[ν, x] that evaluates to

PolyLog[ν, x] = Σk x^k / k^ν

with the sum again over k ≥ 1.

So polylnν above corresponds to -PolyLog[ν, -x] in Mathematica. Matlab’s polylog is the same as Mathematica’s PolyLog.
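A quick numeric check of the two conventions, as a sketch. The series truncation is my choice; with Mathematica's PolyLog[ν, x] = Σ x^k/k^ν, the Atlas function is its negative at −x, and ν = 1 recovers the ordinary logarithm.

```python
import math

def PolyLog(nu, x, terms=200):
    # Mathematica's convention: sum_{k>=1} x^k / k^nu, valid for |x| < 1
    return sum(x**k / k**nu for k in range(1, terms + 1))

def polyln(nu, x):
    # Atlas convention: polyln_nu(x) = -PolyLog[nu, -x]
    return -PolyLog(nu, -x)

x = 0.5
# nu = 1 recovers the ordinary logarithm: polyln_1(x) = log(1 + x)
assert abs(polyln(1, x) - math.log(1 + x)) < 1e-10
```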

## Relation to other functions

Spence’s integral is the function of x given by the integral

∫ from 0 to x of log(1 + t) / t dt

and equals diln(x). Note that the SciPy function spence returns the negative of the integral above.

The Lerch function mentioned above is named for Mathias Lerch (1860–1922) and is defined by the integral

Φ(x, ν, u) = (1/Γ(ν)) ∫ from 0 to ∞ of t^(ν−1) e^(−ut) / (1 − x e^(−t)) dt

The connection with polylogarithms is easier to see from the series expansion:

Φ(x, ν, u) = Σk x^k / (u + k)^ν

where the sum runs over k ≥ 0. The connection with polylogarithms is then

polyln_ν(x) = x Φ(−x, ν, 1)

Note that the Lerch function also generalizes the Hurwitz zeta function, which in turn generalizes the Riemann zeta function. When x = 1, the Lerch function reduces to ζ(ν, u).

# Regular expression to match any chemical element

Here’s a frivolous exercise in regular expressions: Write a regex to match any chemical element symbol.

Here’s one solution.

A[cglmrstu]|B[aehikr]?|C[adeflmnorsu]?|D[bsy]|E[rsu]|F[elmr]?|G[ade]|H[efgos]?|I[nr]?|Kr?|L[airuv]|M[dgnot]|N[abdeiop]?|Os?|P[abdmortu]?|R[abefghnu]|S[bcegimnr]?|T[abcehilm]|U(u[opst])?|V|W|Xe|Yb?|Z[nr]


Here’s the same expression in more readable form:

/
A[cglmrstu]     |
B[aehikr]?      |
C[adeflmnorsu]? |
D[bsy]          |
E[rsu]          |
F[elmr]?        |
G[ade]          |
H[efgos]?       |
I[nr]?          |
Kr?             |
L[airuv]        |
M[dgnot]        |
N[abdeiop]?     |
Os?             |
P[abdmortu]?    |
R[abefghnu]     |
S[bcegimnr]?    |
T[abcehilm]     |
U(u[opst])?     |
V               |
W               |
Xe              |
Yb?             |
Z[nr]
/x


The /x option in Perl says to ignore white space. Other regular expression implementations have something similar. Python has two such options, X for similarity with Perl, and VERBOSE for readability. Both have the same behavior.

## Regex syntax

The regular expression says that a chemical element symbol may start with A, followed by c, g, l, m, r, s, t, or u; or a B, optionally followed by a, e, h, i, k, or r; or …

The most complicated part of the regex is the part for symbols starting with U. There’s Uranium, whose symbol is simply U, and there are the elements that have temporary names based on their atomic numbers: Ununtrium, Ununpentium, Ununseptium, and Ununoctium. These are just Latin for one-one-three, one-one-five, one-one-seven, and one-one-eight. The symbols are U, Uut, Uup, Uus, and Uuo. The regex U(u[opst])? can be read “U, optionally followed by u and one of o, p, s, or t.”

Note that the regex will match any string that contains a chemical element symbol, but it could match more. For example, it would match “I’ve never been to Boston in the fall” because that string contains B, the symbol for boron. Exercise: Modify the regex to only match chemical element symbols.
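Here's a quick Python check of that distinction, using the same pattern. `re.search` finds a symbol anywhere in the string, while `re.fullmatch` (equivalent to wrapping the whole regex in `^(...)$`) accepts only strings that are exactly one symbol, which is one way to answer the exercise.

```python
import re

# The element-symbol alternation from above
pattern = (r"A[cglmrstu]|B[aehikr]?|C[adeflmnorsu]?|D[bsy]|E[rsu]|F[elmr]?"
           r"|G[ade]|H[efgos]?|I[nr]?|Kr?|L[airuv]|M[dgnot]|N[abdeiop]?"
           r"|Os?|P[abdmortu]?|R[abefghnu]|S[bcegimnr]?|T[abcehilm]"
           r"|U(u[opst])?|V|W|Xe|Yb?|Z[nr]")

# search: matches any string that *contains* a symbol
assert re.search(pattern, "I've never been to Boston in the fall")

# fullmatch: the whole string must be a symbol
assert re.fullmatch(pattern, "He")
assert re.fullmatch(pattern, "Uut")
assert re.fullmatch(pattern, "Boston") is None
```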

## Regex golf

There may be clever ways to use fewer characters at the cost of being more obfuscated. But this is for fun anyway, so we’ll indulge in a little regex golf.

There are five elements whose symbols start with I or Z: I, In, Ir, Zn, and Zr. You could write [IZ][nr] to match four of these. The regex I|[IZ][nr] would represent all five with 10 characters, while I[nr]?|Z[nr] uses 12. Two characters saved! Can you cut out any more?

# Discrete harmonic functions

A (continuous) harmonic function f on an open set is a function that is twice differentiable and satisfies Laplace’s equation: ∇2 f = 0. Such functions have a mean value property: at a point x interior to the set, f(x) equals the average value of f over any ball around x that fits inside the region. It turns out that the converse is true: a locally integrable function satisfying the mean value property is harmonic.

We can take the mean value property from the continuous case as the definition in the discrete case. A function on a graph is harmonic at a node x if f(x) equals the average of f’s values on the nodes connected to x. A function is said to have a singularity at points of a graph where it is not harmonic.

A non-constant function on a finite connected graph cannot be harmonic at every node. It has to have at least two singularities, namely the points where it takes on its maximum and its minimum. This is analogous to Liouville’s theorem for (continuous) harmonic functions: a bounded harmonic function on all of Euclidean space must be constant. Some infinite graphs have non-constant functions that are harmonic everywhere. For example, linear functions on the integers are harmonic.

In the continuous case, we want to find solutions to Laplace’s equation satisfying a boundary condition. We want to find a function that has the specified values on the boundary and is harmonic in the interior. With the right regularity conditions, this solution exists and is unique.

We can do the same for discrete harmonic functions. If we specify the values of a function on some non-empty subset of nodes, there is a unique discrete harmonic function that extends the function to the rest of the graph.
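Here's a minimal sketch of computing such an extension. On a path graph with values fixed at the two end nodes, repeatedly replacing each interior value with the average of its neighbors (a Jacobi-style iteration, my choice of method) converges to the unique harmonic extension, which on a path is just linear interpolation.

```python
# Path graph 0-1-2-3-4 with boundary values fixed at the end nodes.
f = [0.0, 0.0, 0.0, 0.0, 4.0]   # f[0] and f[4] are the boundary data

for _ in range(2000):
    g = f[:]
    for i in range(1, 4):        # interior nodes
        g[i] = (f[i - 1] + f[i + 1]) / 2   # enforce the mean value property
    f = g

# The harmonic extension on this path is linear: f(i) = i
assert all(abs(f[i] - i) < 1e-6 for i in range(5))
```

For a general graph the same fixed-point iteration works, or one can solve the equivalent sparse linear system directly.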

Discrete harmonic functions come up in applications such as random walks and electrical networks.

# Mathematics of medical plastics

In this interview, I talk with Ray Rilling about applying mathematics to manufacturing medical plastics.

JC: Ray, could you start by saying a little about yourself?

RR: Sure. My name is Ray Rilling, and I am the Director of Technology at Putnam Plastics.

My initial training was cellular biology with an emphasis on epidemiology, but I decided to move to a more applied (and lucrative) field. I’ve been in medical plastics for the last 23 years.

When I saw that epidemiology was a crowded field, I pursued substantial amounts of additional coursework that allowed me to get into the medical engineering plastics space. In lots of ways I think this pick-and-choose approach was much better than a traditional bio-medical engineering degree would have been. It allowed me to focus on exactly what I needed to learn.

JC: Thanks. And tell me a bit about where you work.

RR: Putnam Plastics is a 30 year old medical plastics manufacturer, specializing in polymer-based medical components (i.e. plastic tubing for medical usage).

JC: I did some consulting work for a company that manufactures stents, so I imagine your companies have a few things in common. What is your day-to-day mathematical toolkit?

RR: We cross a lot of boundaries and there are many layers of computational needs. The majority of the computations are basic algebraic formulas to calculate simple things like annular tooling draw down ratios or the ultimate tensile strength of a tube. From there, you may find that calculating the burst pressure of a composite may require solving a linear system.

The most challenging computations that we perform are for our complex polymer extrusion tools. These are based on Computational Fluid Dynamics (CFD) and Finite or Boundary Element Methods (FEM / BEM). Many complexities emerge because polymer flow is compressible, non-isothermal, and viscoelastic in nature. It does not allow us to use conventional Navier-Stokes or other common equations. In all reality, we rely on proprietary equations to point us in the right direction. No model has ever provided a perfect solution. We combine our experience and expertise with the applied calculations.

The most complex of the computations are performed with third party software packages and often require the use of complex meshing, materials testing for inputs, and can even require us to create modified code to suit the specific application. We work with some very creative minds and there is never a shortage of new challenges.

Aside from the design engineering mathematics, we think statistically about our results. This can be to analyze the process capability, equivalency, and even product risk analysis. This statistical analysis can be intertwined with the design engineering aspects and deal with a range of biological and medical constants (i.e. devices might need to handle liquid nitrogen or remain in bodies for years or decades).

One of my mentors used to say, “The best equation is teamwork.” Teamwork is a sort of expectation, a crowd-sourced form of expert opinion. It allows you to bring a focus on what the real variables are.

Some calculations can take weeks, so having a good appreciation of how to approach the challenge is important. You can get away with not knowing what’s under the hood. But that’s dangerous. It is much better to get things done more quickly, and with better understanding. Especially when a client is waiting on results.

JC: What is some math you wish you learned more of?

RR: Vector and tensor-based physics. As the bar in the medical device industry increases, so do the expectations of the products that we make. Being able to take a large number of variables or inputs and transform them into a workable model is extremely valuable. My dream is to be able to reliably calculate extrusion tooling, product failure modes, and performance properties before I start cutting steel. But in our time and age, we still rely on some development iterations and calculate what we can.

I also wish I had more applied materials science. When I am developing something in the lab, sometimes the product does not perform the way you want it to. Every time I start learning something new, like applied surface chemistry or the effects of radiation on polymers, I think of 100 things that I could have done in previous projects.

JC: Did you learn any math you haven’t had much use for yet? I’ve used most of the math I learned, though sometimes there’s been a gap of twenty years between learning something and applying it.

RR: I actually use pretty much all of the math I’ve learned. But that’s likely because I was able to pick and choose because I went back and did supplemental studies. Even the epidemiology is useful.

JC: I wanted to pick up on what you said earlier about physical models. Could you say more about the interplay of physical and mathematical models at Putnam?

RR: The models are never completely right. We use them as a tool. They often fail in three key ways:

1. We find additional variables we need to account for or constrain. In many cases our first attempt failed because the product did not perform properly in an unaccounted for way. At the end of the day it might take many iterations before we have all of the performance properties that are needed.
2. We are translating soft or subjective numbers from physicians or engineers. A model will provide concrete numbers. It is often important to create a baseline product that can take the customer’s subjective description and correlate it to a model.
3. We need to additionally constrain for cost effectiveness (i.e. minimize the amount of expensive wire in a catheter).

Those three trade offs mean that we are never able to just take a model print out and run with it. There always has to be some back and forth.

JC: It’s interesting that you consider shortcomings besides inaccuracy as failures. Someone new to applying math in the real world might think of your first point but not the other points. The latter two aren’t failures from an academic perspective, but they certainly are from a business perspective.

* * *


# Hypergeometric bootstrapping: implement one, get seven free

Suppose you have software to compute one hypergeometric function. How many more hypergeometric functions can you compute from it?

Hypergeometric functions satisfy a lot of identities, so you can bootstrap one such function into many more. That’s one reason they’re so useful in applications. For this post, I want to focus on just three formulas, the so-called linear transformation formulas that relate one hypergeometric function to another. (There are more linear formulas that relate one hypergeometric function to a combination of two others. I may consider those in a future post, but not today.)

## Linear transformations

The classical hypergeometric function has three parameters. The linear transformations discussed above relate the function with parameters (a, b, c) to those with parameters

• (c − a, c − b, c)
• (a, c − b, c)
• (b, c − a, c)

Here are the linear transformations in detail:

F(a, b; c; x) = (1 − x)^(c − a − b) F(c − a, c − b; c; x)

F(a, b; c; x) = (1 − x)^(−a) F(a, c − b; c; x/(x − 1))

F(a, b; c; x) = (1 − x)^(−b) F(b, c − a; c; x/(x − 1))

## How many more?

The three transformations above are the ones listed in Handbook of Mathematical Functions by Abramowitz and Stegun. How many more transformations can we create by combining them? To answer this, we represent each of the transformations with a matrix, called A, B, and C in the order above. The transformations correspond to multiplying the matrix by the column vector (a, b, c). Reading the matrices off the parameter lists gives

A = [−1 0 1; 0 −1 1; 0 0 1], B = [1 0 0; 0 −1 1; 0 0 1], C = [0 1 0; −1 0 1; 0 0 1]

where semicolons separate rows.

The question of how many transformations we can come up with can be recast as asking for the order of the group generated by A, B, and C. (I’m jumping ahead a little by presuming these matrices generate a group. Conceivably they don’t have inverses, but in fact they do. The inverses of A, B, and C are A, B, and AC respectively.)

A little experimentation reveals that there are eight transformations: I, A, B, C, D = AB, E = AC, F = BC, and G = CB. So in addition to the identity I, the do-nothing transformation, we have found four more: D, E, F, and G. These last four correspond to taking (a, b, c) to

• (c − a, b, c)
• (c − b, a, c)
• (b, a, c)
• (c − b, c − a, c)

This means that if we have software to compute the hypergeometric function with one fixed set of parameters, we can bootstrap that into computing the hypergeometric function with up to seven more sets of parameters. (Some of the seven new combinations could be repetitive if, for example, a = b.)

## Identifying the group

We have uncovered a group of order 8, and there are only 5 groups that size. We should be able to find a familiar group isomorphic to our group of transformations.

The five groups of order 8 are Z8 (cyclic group), Z4×Z2 (direct product), E8 (elementary Abelian group), D8 (dihedral group), and the quaternion group. The first three are Abelian, and our group is not, so our group must be isomorphic to either the quaternion group or the dihedral group. The quaternion group has only 1 element of order 2, and the dihedral group has 5 elements of order 2. Our group has five elements of order 2 (A, B, D, F, and G) and so it must be isomorphic to the dihedral group of order 8, the rotations and reflections of a square. (Some authors call this D8, because the group has eight elements. Others call it D4, because it is the group based on a 4-sided figure.)
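The group computations above can be checked mechanically. The sketch below encodes the three parameter maps as 3×3 integer matrices (my encoding, read off the maps (a, b, c) → (c − a, c − b, c), (a, c − b, c), and (b, c − a, c)) and closes the identity under multiplication by the generators, recovering both the order of the group and the count of involutions.

```python
# Generators as 3x3 integer matrices acting on the column vector (a, b, c)
A = ((-1, 0, 1), (0, -1, 1), (0, 0, 1))   # (a,b,c) -> (c-a, c-b, c)
B = (( 1, 0, 0), (0, -1, 1), (0, 0, 1))   # (a,b,c) -> (a, c-b, c)
C = (( 0, 1, 0), (-1, 0, 1), (0, 0, 1))   # (a,b,c) -> (b, c-a, c)
I = (( 1, 0, 0), (0,  1, 0), (0, 0, 1))

def mul(X, Y):
    # 3x3 integer matrix product
    return tuple(tuple(sum(X[i][k] * Y[k][j] for k in range(3))
                       for j in range(3)) for i in range(3))

# Close {I} under right-multiplication by the generators
group = {I}
frontier = [I]
while frontier:
    M = frontier.pop()
    for G in (A, B, C):
        P = mul(M, G)
        if P not in group:
            group.add(P)
            frontier.append(P)

assert len(group) == 8       # the group has order 8
involutions = [M for M in group if M != I and mul(M, M) == I]
assert len(involutions) == 5  # five elements of order 2, as for D8
```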

# Can you hear the shape of a network?

Mark Kac asked in 1966 whether you can hear the shape of a drum. In other words, can two drums, made of the same material, produce the exact same sound but have different shapes? More formally, Kac asked whether the eigenvalues of Laplace’s equation with zero boundary conditions uniquely determine the shape of a region in the plane. The question remained open until 1992. No, you can’t always hear the shape of a drum. But often you can. With some restrictions on the regions, the shape is uniquely determined by the sound, i.e., the Laplace spectrum.

Ten years before Kac asked about hearing the shape of a drum, Günthard and Primas asked the analogous question about graphs. Can you hear the shape of a graph? That is, can two different graphs have the same eigenvalues? At the time, the answer was believed to be yes, but a year later it was found to be no, not always [1]. So the next natural question is when can you hear the shape of a graph, i.e. under what conditions is a graph determined by its eigenvalues? It depends on which matrix you’re taking the eigenvalues of, but under some conditions some matrix spectra uniquely determine graphs. We don’t know in general how common it is for spectra to uniquely determine graphs.

Here are two graphs that have the same adjacency matrix spectra, first published in [2]:

Both have adjacency spectra [-2, 0, 0, 0, 2]. But the graphs are not cospectral as far as the Laplacian is concerned. Their Laplace spectra are [0, 0, 2, 2, 4] and [0, 1, 1, 1, 5] respectively. We could tell that the Laplace spectra would be different before computing them because the second smallest Laplace eigenvalue is positive if and only if a graph is connected.
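The figures are missing here, but a standard pair with these spectra is the 4-cycle plus an isolated node and the star K1,4; that reading of the figure is an assumption on my part. The sketch below verifies the cospectrality claims exactly by comparing characteristic polynomials, computed with the Faddeev–LeVerrier algorithm in rational arithmetic (two symmetric matrices are cospectral exactly when their characteristic polynomials agree).

```python
from fractions import Fraction

def charpoly(A):
    # Faddeev-LeVerrier: exact coefficients of det(xI - A)
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    B = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    coeffs = [Fraction(1)]
    for k in range(1, n + 1):
        M = [[sum(A[i][t] * B[t][j] for t in range(n)) for j in range(n)]
             for i in range(n)]
        c = -sum(M[i][i] for i in range(n)) / k
        coeffs.append(c)
        B = [[M[i][j] + (c if i == j else 0) for j in range(n)] for i in range(n)]
    return coeffs

def adjacency(edges, n):
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = A[v][u] = 1
    return A

def laplacian(A):
    n = len(A)
    return [[(sum(A[i]) if i == j else 0) - A[i][j] for j in range(n)]
            for i in range(n)]

g1 = adjacency([(0, 1), (1, 2), (2, 3), (3, 0)], 5)   # 4-cycle + isolated node
g2 = adjacency([(0, 1), (0, 2), (0, 3), (0, 4)], 5)   # star K_{1,4}

# Cospectral for the adjacency matrix: identical characteristic polynomials
assert charpoly(g1) == charpoly(g2)

# ... but not for the Laplacian, since only the star is connected
assert charpoly(laplacian(g1)) != charpoly(laplacian(g2))
```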

The graphs below are cospectral for the adjacency, Laplacian, and signless Laplacian matrices.

Both graphs have the same number of nodes and edges, and every node has degree 4 in both graphs. But the graph on the left contains more triangles than the one on the right, so they cannot be isomorphic.

One way to test whether two graphs are isomorphic is to compute their spectra. If the spectra are different, the graphs are not isomorphic. So spectral analysis gives a way to show that two graphs are not isomorphic in polynomial time, though the test may be inconclusive.

If two graphs do have the same spectra, what is the probability that they are isomorphic? In [1] the authors answer this question empirically for graphs of order up to 11. Very roughly, there’s about an 80% chance graphs with the same adjacency matrix spectrum are isomorphic. The chances go up to 90% for the Laplacian and 95% for the signless Laplacian.

* * *

[1] Edwin R. van Dam, Willem H. Haemers. Which graphs are determined by their spectrum? Linear Algebra and its Applications 373 (2003) 241–272

[2] D.M. Cvetković, Graphs and their spectra, Univ. Beograd. Publ. Elektrotehn. Fak. Ser. Mat. Fiz. 354–356 (1971) 1–50.

# Discrete Laplace transform

The relationship between the discrete Laplace transform and discrete Fourier transform is not quite the same as that between their continuous counterparts.

## Continuous Fourier and Laplace transforms

The continuous versions of the Fourier and Laplace transforms are given as follows.

Fourier transform:

F(ω) = ∫ from −∞ to ∞ of exp(−iωx) f(x) dx

Laplace transform:

F(s) = ∫ from 0 to ∞ of exp(−sx) f(x) dx

The Fourier transform is defined several ways, and I actually prefer the convention that puts a factor of 2π in the exponential, but the convention above makes the analogy with the Laplace transform simpler. There are two differences between the Fourier and Laplace transforms. The Laplace transform integrates over only half the real line, compared to the entire real line for Fourier. But a variation on the Laplace transform, the bilateral Laplace transform, integrates over the entire real line. The bilateral Laplace transform at s is simply the Fourier transform at −is. And of course the same is true for the (one-sided) Laplace transform if the function f is only non-zero for positive values.

I’ve encountered the Fourier transform more in application, and the Laplace transform more in teaching. This is not to say the Laplace transform isn’t used in practice; it certainly is used in applications. But the two transforms serve similar purposes, and the Laplace transform is easier to teach. Because the factor exp(-sx) decays rapidly, the integral defining the Laplace transform converges for functions where the integral defining the Fourier transform would not. Such functions may still have Fourier transforms, but the transforms require distribution theory whereas the Laplace transforms can be computed using basic calculus.

## Discrete Fourier and Laplace Transforms

There’s more difference between the discrete versions of the Fourier and Laplace transforms than between the continuous versions.

The discrete Fourier transform (DFT) approximates the integral defining the (continuous) Fourier transform with a finite sum. It discretizes the integral and truncates its domain. The discrete Laplace transform is an infinite sum. It discretizes the integral defining the Laplace transform, but it does not truncate the domain. Given a step size η > 0, the discrete Laplace transform of f is

F(s) = η Σn exp(−snη) f(nη)

with the sum over n ≥ 0.

The discrete Laplace transform isn’t “as discrete” as the discrete Fourier transform. The latter takes a finite sequence and returns a finite sequence. The former evaluates a function at an infinite number of points and produces a continuous function.

The discrete Laplace transform is used in applications such as signal processing, as well as in the theory of analytic functions.

## Connection with the z-transform and generating functions

If η = 1 and z = exp(-s), the discrete Laplace transform becomes the z-transform of the values of f at non-negative integers. And if we replace z with 1/z, or equivalently set z = exp(s) instead of z = exp(-s), we get the generating function of the values of f at non-negative integers.
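A quick numeric illustration (the truncation of the infinite sum is mine): for f identically 1 and η = 1, the discrete Laplace transform is a geometric series, so it should equal 1/(1 − z) with z = exp(−s), which is also the generating function of the sequence 1, 1, 1, …

```python
import math

def discrete_laplace(f, s, eta=1.0, terms=2000):
    # Truncated discrete Laplace transform:
    # eta * sum_{n >= 0} f(n * eta) * exp(-s * n * eta)
    return eta * sum(f(n * eta) * math.exp(-s * n * eta) for n in range(terms))

s = 0.5
z = math.exp(-s)
f = lambda t: 1.0   # the constant sequence 1, 1, 1, ...

# With eta = 1 this is the z-transform / generating function of the
# sequence, here the geometric series 1/(1 - z)
assert abs(discrete_laplace(f, s) - 1 / (1 - z)) < 1e-9
```

The truncation error is exp(−s · terms), which is negligible here; for small s (z near 1) more terms would be needed.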

z-transforms are common in digital signal processing, while generating functions are common in combinatorics. They are essentially the same thing.

# Reproducible randomized controlled trials

“Reproducible” and “randomized” don’t seem to go together. If something was unpredictable the first time, shouldn’t it be unpredictable if you start over and run it again? As is often the case, we want incompatible things.

But the combination of reproducible and random can be reconciled. Why would we want a randomized controlled trial (RCT) to be random, and why would we want it to be reproducible?

One of the purposes of randomized experiments is the hope of scattering complicating factors evenly between two groups. For example, one way to test two drugs on 1000 people would be to gather 1000 people and give the first drug to all the men and the second to all the women. But maybe a person’s sex has something to do with how the drug acts. If we randomize between two groups, it’s likely that about the same number of men and women will be in each group.

The example of sex as a factor is oversimplified because there’s reason to suspect a priori that sex might make a difference in how a drug performs. The bigger problem is that factors we can’t anticipate or control may matter, and we’d like them scattered evenly between the two treatment groups. If we knew what the factors were, we could assure that they’re evenly split between the groups. The hope is that randomization will do that for us with things we’re unaware of. For this purpose we don’t need a process that is “truly random,” whatever that means, but a process that matches our expectations of how randomness should behave. So a pseudorandom number generator (PRNG) is fine. No need, for example, to randomize using some physical source of randomness like radioactive decay.

Another purpose in randomization is for the assignments to be unpredictable. We want a physician, for example, to enroll patients on a clinical trial without knowing what treatment they will receive. Otherwise there could be a bias, presumably unconscious, against assigning patients with poor prognosis if the physicians know the next treatment will be the one they hope or believe is better. Note here that the randomization only has to be unpredictable from the perspective of the people participating in and conducting the trial. The assignments could be predictable, in principle, by someone not involved in the study.

And why would you want randomization assignments to be reproducible? One reason would be to test whether the randomization software is working correctly. Another might be to satisfy a regulatory agency or some other oversight group. Still another reason might be to defend your randomization in a lawsuit. A physical source of randomness, such as the time down to the millisecond at which the randomization is conducted, would achieve random assignments and unpredictability, but not reproducibility.

Computer algorithms for generating random numbers (technically pseudo-random numbers) can achieve reproducibility, practically random allocation, and unpredictability. The randomization outcomes are predictable, and hence reproducible, to someone with access to the random number generator and its state, but unpredictable in practice to those involved in the trial. The internal state of the random number generator has to be saved between assignments and passed back into the randomization software each time.
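A sketch of the state-passing idea in Python, using the standard library generator. (Python’s generator is a Mersenne Twister, so its internal state is large; the mechanics of saving and restoring state are the same for any generator.)

```python
import random

rng = random.Random(12345)

# make one assignment, then capture the generator's internal state
first_arm = "A" if rng.random() < 0.5 else "B"
saved_state = rng.getstate()  # persist this between assignments

# later: a fresh generator restored from the saved state continues the sequence
rng2 = random.Random()
rng2.setstate(saved_state)
next_arm = "A" if rng2.random() < 0.5 else "B"

# restoring the same state again reproduces the same next assignment
rng3 = random.Random()
rng3.setstate(saved_state)
assert ("A" if rng3.random() < 0.5 else "B") == next_arm

print(first_arm, next_arm)
```

In a real system the saved state would be written to a database between assignments rather than held in memory.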

Random number generators such as the Mersenne Twister have good statistical properties, but they also carry a large amount of state. The random number generator described here has very small state, 64 bits, and so storing and returning the state is simple. If you needed to generate a trillion random samples, the Mersenne Twister would be preferable, but since RCTs usually have fewer than a trillion subjects, the RNG in the article is perfectly fine. I have run the DieHarder battery of random number generator quality tests on this generator and it performs quite well.
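The generator referred to above is described elsewhere and its details aren’t reproduced here. As an illustration of the idea, not that specific generator, SplitMix64 is a well-known generator whose entire state is a single 64-bit integer:

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def splitmix64(state):
    # advance the 64-bit state and return (new_state, 64-bit output)
    state = (state + 0x9E3779B97F4A7C15) & MASK64
    z = state
    z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
    return state, z ^ (z >> 31)

state = 42  # the whole generator state: one integer to store between assignments
state, x = splitmix64(state)
state, y = splitmix64(state)
print(hex(x), hex(y))
```

Storing one integer per trial between assignments is trivial compared to serializing the Mersenne Twister’s 2.5 KB of state.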



# Joukowsky transformation

The Joukowsky transformation, or Joukowsky map, is a simple function that comes up in aerospace and electrical engineering calculations.

f(z) = (z + 1/z)/2

(Here z is a complex variable.) Despite its simplicity, it’s interesting to look at in detail.

## Mapping lines and circles

Let z = r exp(iθ) and let w = u + iv be its image. Writing the Joukowsky transformation in terms of its real and imaginary parts,

u = (r + 1/r) cos θ / 2 and v = (r − 1/r) sin θ / 2,

makes it easier to understand how it transforms lines and circles.

We can see how circles are mapped by holding r constant and letting θ vary. The unit circle gets crushed to the interval [-1, 1] on the real axis, traversed twice. Circles of radius ρ ≠ 1 get mapped to ellipses

u²/a² + v²/b² = 1

where

a = (ρ + 1/ρ)/2 and b = (ρ − 1/ρ)/2.

Next we hold θ constant and let r vary. If sin θ = 0 and cos θ > 0 then the image is [1, ∞). If sin θ = 0 and cos θ < 0 then the image is (-∞, -1]. In both cases the image is traced out twice. The image of the line with cos θ = 0 is the vertical axis. Otherwise lines through the origin are mapped to hyperbolas with equation

u²/cos² θ − v²/sin² θ = 1.

## Inverse functions

If (z + 1/z)/2 = w then z² − 2wz + 1 = 0. The discriminant of this equation is 4(w² − 1), and so the Joukowsky transformation is 2-to-1 unless w = ±1, in which case z = ±1. The product of the two roots of a quadratic equation equals the constant term divided by the leading coefficient, which in this case is 1. So the two preimages of w are reciprocals of each other, and the Joukowsky transformation is 1-to-1 in any region that doesn’t contain both z and 1/z. This is the case for the interior or exterior of the unit circle, and for the upper or lower half planes. In any of those four regions, one can invert the Joukowsky transformation by solving a quadratic equation and choosing the correct root.
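A sketch of the inversion for the exterior of the unit circle (the function names and the sample point are mine, chosen for illustration):

```python
import cmath

def joukowsky(z):
    return (z + 1/z) / 2

def inverse_exterior(w):
    # roots of z^2 - 2wz + 1 = 0 are w ± sqrt(w^2 - 1); their product is 1,
    # so one root lies on or outside the unit circle and the other inside
    z = w + cmath.sqrt(w*w - 1)
    return z if abs(z) >= 1 else w - cmath.sqrt(w*w - 1)

w = 1.5 + 0.7j
z = inverse_exterior(w)
print(abs(joukowsky(z) - w))  # essentially zero
```

Choosing the root inside the circle, or in a half plane, works the same way with a different selection test.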

# General birthday problem

The birthday problem, sometimes called the birthday paradox, says that it’s more likely than you’d expect that two people in a group have the same birthday. Specifically, in a random sample of 23 people, there’s about a 50-50 chance that two people share the same birthday.

The birthday problem makes a nice party trick, but generalizations of the problem come up frequently in applications. I wrote in the previous post how it comes up in seeding distributed Monte Carlo simulations. In computer science, it’s a concern in hashing algorithms.

If you have a set of N things to choose from, such as N = 365 birthdays, and take r samples, the probability p that all r samples are unique is

p = N! / (N^r (N − r)!)
and the probability that at least two of the samples are the same is 1 – p. (This assumes that N is at least as big as r. Otherwise the denominator is undefined, but in that case we know p is 0.)

With moderately large values of N and r the formula is likely to overflow if implemented directly. So as usual the trick is to use logarithms to avoid overflow or underflow. Here’s how you could compute the probability above in Python. SciPy doesn’t have a log factorial function, but does have a log gamma function, so we use that instead.

    from numpy import exp, log
    from scipy.special import gammaln

    def prob_unique(N, r):
        # log of N!/(N-r)! minus log of N^r, exponentiated only at the end
        return exp(gammaln(N+1) - gammaln(N-r+1) - r*log(N))

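Repeating the function so the snippet runs on its own, the classic birthday numbers fall out:

```python
from numpy import exp, log
from scipy.special import gammaln

def prob_unique(N, r):
    # probability that r draws from N possibilities are all distinct
    return exp(gammaln(N+1) - gammaln(N-r+1) - r*log(N))

# probability that at least two of 23 people share a birthday
p_shared = 1 - prob_unique(365, 23)
print(p_shared)  # a little over 0.5
```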

# Random number generator seed mistakes

## Long run or broken software?

I got a call one time to take a look at randomization software that wasn’t randomizing. My first thought was that the software was working as designed, and that the users were just seeing a long run. Long sequences of the same assignment are more likely than you think. You might argue, for example, that the chances of flipping five heads in a row would be (1/2)⁵ = 1/32, but that underestimates the probability because a run could start at any time. The chances that the first five flips are heads would indeed be 1/32. But the probability of seeing five heads in a row any time during a series of flips is higher.
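A quick simulation (the number of flips, trial count, and seed are arbitrary choices) makes the point: in 100 fair coin flips, a run of five heads somewhere in the sequence is far more likely than 1/32.

```python
import random

def has_run(flips, k=5):
    # True if the sequence contains k consecutive True values
    run = 0
    for f in flips:
        run = run + 1 if f else 0
        if run >= k:
            return True
    return False

rng = random.Random(20230101)
trials = 10_000
hits = sum(has_run([rng.random() < 0.5 for _ in range(100)])
           for _ in range(trials))
print(hits / trials)  # roughly 0.8, versus 1/32 ≈ 0.03 for the first five flips
```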

Most of the times that I’ve been asked to look at randomization software that “couldn’t be right,” the software was fine. But in this case, there wasn’t simply a long run of random results that happened to be equal. The results were truly constant. At least for some users. Some users would get different results from time to time, but others would get the same result every single time.

The problem turned out to be how the software set the seed in its random number generator. When the program started up it asked the user “Enter a number.” No indication of what kind of number or for what purpose. This number, unbeknownst to the user, was being used as the random number generator seed. Some users would enter the same number every time, and get the same randomization result, every time. Others would use more whimsy in selecting numbers and get varied output.

How do you seed a random number generator in a situation like this? A better solution would be to seed the generator with the current time, though that has drawbacks too. I write about that in another post.

## Seeding many independent processes

A more subtle problem I’ve seen with random number generator seeding is spawning multiple processes that each generate random numbers. In a well-intentioned attempt to give each process a unique seed, the developers ended up virtually assuring that many of the processes would have exactly the same seed.

If you parallelize a huge simulation by spreading out multiple copies, but two of the processes use the same seed, then their results will be identical. Throwing out the redundant simulation would reduce your number of samples, but not noticing and keeping the redundant output would be worse because it would cause you to underestimate the amount of variation.

To avoid duplicate seeds, the developers used a random number generator to assign the RNG seeds for each process. Sounds reasonable. Randomly assigned RNG seeds give you even more random goodness. Except they don’t.

The developers had run into a variation on the famous birthday problem. In a room of 23 people, there’s a 50% chance that two people share the same birthday. And with 50 people, the chances go up to 97%. It’s not certain that two people will have the same birthday until you have 367 people in the room, but the chances approach 1 faster than you might think.

Applying the analog of the birthday problem to the RNG seeds explains why the project was launching processes with the same seed. Suppose you seed each process with an unsigned 16-bit integer. That means there are 65,536 possible seeds. Now suppose you launch 1,000 processes. With 65 times as many possible seeds as processes, surely every process should get its own seed, right? Not at all. There’s a 99.95% chance that two processes will have the same seed.
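The 99.95% figure can be checked with the same log-gamma trick used for the birthday problem above, here with N = 2¹⁶ possible seeds and r = 1,000 processes:

```python
from numpy import exp, log
from scipy.special import gammaln

def prob_unique(N, r):
    # probability that r draws from N possibilities are all distinct
    return exp(gammaln(N+1) - gammaln(N-r+1) - r*log(N))

# chance that at least two of 1000 processes get the same 16-bit seed
p_collision = 1 - prob_unique(2**16, 1000)
print(p_collision)  # about 0.9995
```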

In this case it would have been better to seed each process with sequential seeds: give the first process seed 1, the second seed 2, etc. The seeds don’t have to be random; they just have to be unique. If you’re using a good random number generator, the outputs of 1,000 processes seeded with 1, 2, 3, …, 1000 will be independent.