Function on cover of Abramowitz & Stegun

Someone mailed me this afternoon asking if I knew what function was graphed on the cover of Abramowitz and Stegun’s famous Handbook of Mathematical Functions.

Here’s a close-up of the graph from a photo of my copy of A&S.

This was my reply.

It looks like a complex function of a complex variable. I assume the height is the magnitude and the markings on the graph are the phase. That would make it an elliptic function because it’s periodic in two directions.

It has one pole and one zero in each period. I think elliptic functions are determined, up to a constant, by their periods, zeros, and poles, so it should be possible to identify the function.

In fact, I expect it’s the Weierstrass p function. More properly, the Weierstrass ℘ function, sometimes called Weierstrass’ elliptic function. (Some readers will have a font installed that will properly render ℘ and some not. More on the symbol ℘ here.)


Bessel series for a constant

Fourier series express functions as a sum of sines and cosines of different frequencies. Bessel series are analogous, expressing functions as a sum of Bessel functions of different orders.

Fourier series arise naturally when working in rectangular coordinates. Bessel series arise naturally when working in polar coordinates.

The Fourier series for a constant is trivial. You can think of a constant as a cosine with frequency zero.

The Bessel series for a constant is not as simple, but more interesting. Here we have

1 = J_0(x) + 2J_2(x) + 2J_4(x) + 2J_6(x) + \cdots

Since

J_{-n}(x) = (-1)^n J_n(x)

we can write the series above more symmetrically as

1 = \sum_{n=-\infty}^\infty J_{2n}(x)
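Here’s a quick numerical check of the identity, a small sketch using SciPy’s Bessel function jv; the test point and truncation are arbitrary choices.

    from scipy.special import jv

    # Check 1 = J_0(x) + 2 J_2(x) + 2 J_4(x) + ... at an arbitrary point
    x = 3.7
    total = jv(0, x) + 2*sum(jv(2*n, x) for n in range(1, 30))
    print(total)   # very close to 1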


Approximate inverse of the gamma function

The other day I ran across a blog post by Brian Hayes that linked to an article by David Cantrell on how to compute the inverse of the gamma function. Cantrell gives an approximation in terms of the Lambert W function.

In this post we’ll write a little Python code to kick the tires on Cantrell’s approximation. The post also illustrates how to do some common tasks using SciPy and matplotlib.

Here are the imports we’ll need.

      import matplotlib.pyplot as plt
      from numpy import pi, e, sqrt, log, linspace
      from scipy.special import lambertw, gamma, psi
      from scipy.optimize import root

First of all, the gamma function has a local minimum k somewhere between 1 and 2, and so it only makes sense to speak of its inverse to the left or right of this point. Gamma is strictly increasing for real values larger than k.

To find k we look for where the derivative of gamma is zero. It’s more common to work with the derivative of the logarithm of the gamma function than the derivative of the gamma function itself. That works just as well because gamma has a minimum where its log has a minimum. The derivative of the log of the gamma function is called ψ and is implemented in SciPy as scipy.special.psi. We use the function scipy.optimize.root to find where ψ is zero.

The root function returns more information than just the root we’re after. The root(s) are returned in the array x, and in our case there’s only one root, so we take the first element of the array:

      k = root(psi, 1.46).x[0]

Now here is Cantrell’s algorithm:

      c = sqrt(2*pi)/e - gamma(k)
      
      def L(x):
          return log((x+c)/sqrt(2*pi))
      
      def W(x):
          # lambertw returns a complex value; we only want the real part
          return lambertw(x).real
      
      def AIG(x):
          return L(x) / W( L(x) / e) + 0.5

Cantrell uses AIG for Approximate Inverse Gamma.

How well does this algorithm work? For starters, we’ll see how well it does when we do a round trip, following the exact gamma with the approximate inverse.

      x = linspace(5, 30, 100)
      plt.plot(x, AIG(gamma(x)))
      plt.show()

This produces the following plot:

We get a straight line, as we should, so next we do a more demanding test. We’ll look at the absolute error in the approximate inverse. We’ll use a log scale on the x-axis since gamma values get large quickly.

      y = gamma(x)
      plt.plot(y, x - AIG(y))
      plt.xscale("log")
      plt.show()

This shows the approximation error is small, and gets smaller as its argument increases.

Cantrell’s algorithm is based on an asymptotic approximation, so it’s not surprising that it improves for large arguments.


Mittag-Leffler function and probability distribution

The Mittag-Leffler function is a generalization of the exponential function. Since k! = Γ(k + 1), we can write the exponential function’s power series as

\exp(x) = \sum_{k=0}^\infty \frac{x^k}{\Gamma(k+1)}

and we can generalize this to the Mittag-Leffler function

E_{\alpha, \beta}(x) = \sum_{k=0}^\infty \frac{x^k}{\Gamma(\alpha k+\beta)}

which reduces to the exponential function when α = β = 1. There are a few other values of α and β for which the Mittag-Leffler function reduces to more familiar functions. For example,

E_{2,1}(x) = \cosh(\sqrt{x})

and

E_{1/2, 1}(x) = \exp(x^2) \, \mbox{erfc}(-x)

where erfc(x) is the complementary error function.

History

Mittag-Leffler was one person, not two. When I first saw the Mittag-Leffler theorem in complex analysis, I assumed it was named after two people, Mittag and Leffler. But the theorem and the function discussed here are named after one man, the Swedish mathematician Magnus Gustaf (Gösta) Mittag-Leffler (1846–1927).

The function that Mr. Mittag-Leffler originally introduced did not have a β parameter; that generalization came later. The function Eα is Eα, 1.

Mittag-Leffler probability distributions

Just as you can make a couple probability distributions out of the exponential function, you can make a couple probability distributions out of the Mittag-Leffler function.

Continuous Mittag-Leffler distribution

The exponential function exp(-x) is positive over [0, ∞) and integrates to 1, so we can define a probability distribution whose density function (PDF) is f(x) = exp(-x) and whose distribution function (CDF) is F(x) = 1 – exp(-x). The Mittag-Leffler distribution has CDF 1 – Eα(-xα) and so reduces to the exponential distribution when α = 1. For 0 < α < 1, the Mittag-Leffler distribution is a heavy-tailed generalization of the exponential. [1]

Discrete Mittag-Leffler distribution

The Poisson distribution comes from taking the power series for exp(λ), normalizing it to 1, and using the kth term as the probability mass for k. That is,

P(X = k) = \frac{1}{\exp(\lambda)} \frac{\lambda^k}{k!}

The analogous discrete Mittag-Leffler distribution [2] has probability mass function

P(X = k) = \frac{1}{E_{\alpha, \beta}(\lambda)} \frac{\lambda^k}{\Gamma(\alpha k + \beta)}.
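There’s no Mittag-Leffler function in SciPy, but for small to moderate arguments the power series above is good enough for a sketch. Here’s one way to compute Eα, β and the discrete Mittag-Leffler probability mass function; the parameter values and the crude truncation are arbitrary choices.

    from math import gamma

    def mittag_leffler(x, alpha, beta, terms=100):
        # truncated power series; adequate for small to moderate x
        return sum(x**k / gamma(alpha*k + beta) for k in range(terms))

    def dml_pmf(k, lam, alpha, beta):
        # discrete Mittag-Leffler probability mass function
        return lam**k / gamma(alpha*k + beta) / mittag_leffler(lam, alpha, beta)

    alpha, beta, lam = 0.7, 1.0, 2.0
    print(sum(dml_pmf(k, lam, alpha, beta) for k in range(50)))  # essentially 1
    print(mittag_leffler(1, 1, 1))  # e, since E_{1,1} is the exponential function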

Fractional differential equations

In addition to probability and statistics, the Mittag-Leffler function comes up in fractional calculus. It plays a role analogous to that of the exponential function in classical calculus. Just as the solution to the simple differential equation

\frac{d}{dx} f(x) = a\, f(x)

is exp(ax), for 0 < μ < 1, the solution to the fractional differential equation

\frac{d^\mu}{dx^\mu} f(x) = a\, f(x)

is xμ-1 Eμ, μ(a xμ). Note that this reduces to exp(ax) when μ = 1. [3]

References

[1] Gwo Dong Lin. On the Mittag–Leffler distributions. Journal of Statistical Planning and Inference 74 (1998) 1–9.

[2] Subrata Chakraborty, S. H. Ong. Mittag-Leffler function distribution: A new generalization of hyper-Poisson distribution. arXiv:1411.0980v1

[3] Keith Oldham, Jan Myland, Jerome Spanier. An Atlas of Functions. Springer.

Cornu’s spiral

Cornu’s spiral is the curve parameterized by

x(t) = C(t) = \int_0^t \cos \left( \frac{\pi}{2} s^2 \right) \, ds \\ y(t) = S(t) = \int_0^t \sin \left( \frac{\pi}{2} s^2 \right) \, ds

where C and S are the Fresnel functions, sometimes called the Fresnel cosine integral and Fresnel sine integral. Here’s a plot of the spiral.

Cornu's spiral

Both Fresnel functions approach ½ as t → ∞ and so the curve slowly spirals toward (½, ½) in the first quadrant. And by symmetry, because both functions are odd, the curve spirals toward (-½, -½) in the third quadrant.

Here’s the Python code used to make the plot.

    from scipy.special import fresnel
    from numpy import linspace
    import matplotlib.pyplot as plt

    t = linspace(-7, 7, 1000)
    y, x = fresnel(t)

    plt.plot(x, y)
    plt.gca().set_aspect("equal")
    plt.show()

The SciPy function fresnel returns both Fresnel functions at the same time. It returns them in the order (S, C) so the code reverses the order of these to match the Cornu curve.

One interesting feature of Cornu’s spiral is that its curvature increases linearly with time. This is easy to verify: because of the fundamental theorem of calculus, the Fresnel functions reduce to sines and cosines when you take derivatives, and you can show that the curvature at time t equals πt.

How fast does the curve spiral toward (½, ½)? Since the curvature at time t is πt, that says that at time t the curve is instantaneously bending like a circle of radius 1/πt. So the radius of the spiral is decreasing like 1/πt.
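Here’s a sketch of a numerical check that the curvature is πt, using centered finite differences of the Fresnel functions; the step size and test points are arbitrary choices.

    import numpy as np
    from scipy.special import fresnel

    h = 1e-4                       # finite difference step, an arbitrary small value
    t = np.linspace(0.5, 3.0, 6)

    def xy(t):
        s, c = fresnel(t)
        return c, s                # (x, y), in the same order as in the plot above

    x0, y0 = xy(t)
    x1, y1 = xy(t + h)
    x2, y2 = xy(t - h)

    xp  = (x1 - x2) / (2*h)        # first derivatives
    yp  = (y1 - y2) / (2*h)
    xpp = (x1 - 2*x0 + x2) / h**2  # second derivatives
    ypp = (y1 - 2*y0 + y2) / h**2

    curvature = np.abs(xp*ypp - yp*xpp) / (xp**2 + yp**2)**1.5
    print(np.max(np.abs(curvature - np.pi*t)))  # small; limited by the finite differences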

Cornu’s spiral was actually discovered by Euler. Cornu was an engineer who independently discovered the curve much later. Perhaps because Cornu used the curve in applications, his name is more commonly associated with the curve. At least I’ve more often seen it named after Cornu. This is an example of Stigler’s law that things are usually not named after the first person to discover them.


Energy in frequency modulated signals

In an earlier post we proved that if you modulate a cosine carrier by a sine signal you get a signal whose sideband amplitudes are given by Bessel functions. Specifically:

\cos( 2\pi f_c t + \beta \sin(2\pi f_m t) ) = \sum_{n=-\infty}^\infty J_n(\beta) \cos(2\pi(f_c + nf_m)t)

When β = 0, we have the unmodulated carrier, cos(2π fct), on both sides. When β is positive but small, J0(β) is near 1, and so the frequency component corresponding to the carrier is only slightly diminished. Also, the sideband amplitudes, the values of Jn(β) for n ≠ 0, are small and decay rapidly as |n| increases. As β increases, the amplitude of the carrier component decreases, the sideband amplitudes increase, and the sidebands decay more slowly.

We can be much more precise: the energy in the modulated signal is the same as the energy in the unmodulated signal. As β increases, more energy transfers to the sidebands, but the total energy stays the same. This conservation of energy result applies to more complex signals than just pure sine waves, but it’s easier to demonstrate in the case of a simple signal.

Proof

To prove the energy stays constant, we show that the sum of the squares of the coefficients of the cosine components is the same for the modulated and unmodulated signal. The unmodulated signal is just cos(2π fct), and so the only coefficient is 1. That means we have to prove

 \sum_{n=-\infty}^\infty J_n(\beta)^2 = 1

This is a well-known result. For example, it is equation 9.1.76 in Abramowitz and Stegun. We’ll show how to prove it from first principles. We’ll actually prove a more general result, the Neumann-Schläfli addition formula, then show our result follows easily from that.

Neumann-Schläfli addition formula

Whittaker and Watson define the Bessel functions by their generating function:

\exp\left(\frac{z}{2}\left(t - \frac{1}{t}\right)\right) = \sum_{n=-\infty}^\infty t^n J_n(z)

This means that when you expand the expression on the left as a power series in t, whatever is multiplied by tn is Jn(z) by definition. (There are other ways of defining the Bessel functions, but this way leads quickly to what we want to prove.)

We begin by factoring the Bessel generating function applied to z + w.

\exp\left(\frac{z+w}{2}\left(t - \frac{1}{t}\right)\right) = \exp\left(\frac{z}{2}\left(t - \frac{1}{t}\right)\right) \exp\left(\frac{w}{2}\left(t - \frac{1}{t}\right)\right)

Next we expand both sides as power series.

\sum_{n=-\infty}^\infty t^n J_n(z+w) = \sum_{j=-\infty}^\infty t^j J_j(z) \sum_{k=-\infty}^\infty t^k J_k(w)

and look at the terms involving tn on both sides. On the left this is Jn(z + w). On the right, we multiply two power series. We will get a term containing tn whenever we multiply terms tj and tk where j and k sum to n.

 J_n(z+w) = \sum_{j+k = n} J_j(z) J_k(w) = \sum_{m=-\infty}^\infty J_m(z) J_{n-m}(w)

The equation above is the Neumann-Schläfli addition formula.

Sum of squared coefficients

To prove that the sum of the squared sideband coefficients is 1,  we apply the addition formula with n = 0, z = β, and w = -β.

1 = J_0(\beta - \beta) = \sum_{m=-\infty}^\infty J_m(\beta) J_{-m}(-\beta) = \sum_{m=-\infty}^\infty J_m(\beta)^2

This proves what we were after:

 \sum_{n=-\infty}^\infty J_n(\beta)^2 = 1

We used a couple of facts in the last step that we haven’t discussed. The first was that J0(0) = 1. This follows from the generating function: setting z to 0 makes the left side 1, so the coefficient of t0 is 1 and all the other coefficients are 0. The second was that J-m(-β) = Jm(β). You can also see this from the generating function, since negating z has the same effect as swapping t and 1/t, which swaps the coefficients of tm and t-m.
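The identity is also easy to check numerically. Here’s a quick sketch using SciPy’s jv; the modulation index and the truncation range are arbitrary choices.

    import numpy as np
    from scipy.special import jv

    beta = 2.5                     # arbitrary modulation index
    n = np.arange(-40, 41)
    print(np.sum(jv(n, beta)**2))  # very close to 1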



Analyzing an FM signal

Frequency modulation combines a signal with a carrier wave by changing (modulating) the carrier wave’s frequency.

Starting with a cosine carrier wave with frequency fc Hz and adding a signal with amplitude β and frequency fm Hz results in the combination

\cos( 2\pi f_c t + \beta \sin(2\pi f_m t) )

The factor β is known as the modulation index.

We’d like to understand this signal in terms of cosines without any frequency modulation. It turns out the result is a set of cosines weighted by Bessel functions of β.

\cos( 2\pi f_c t + \beta \sin(2\pi f_m t) ) = \sum_{n=-\infty}^\infty J_n(\beta) \cos(2\pi(f_c + nf_m)t)

Component amplitudes

We will prove the equation above, but first we’ll discuss what it means for the amplitudes of the cosine components.

For small values of β, Bessel functions decay quickly, which means the first cosine component will be dominant. For larger values of β, the Bessel function values increase to a maximum then decay like one over the square root of the index. To see this we compare the coefficients for modulation index β = 0.5 and β = 5.0.

First, β = 0.5:

and now for β = 5.0:

For fixed β and large n we have

J_n(\beta) \approx \frac{\beta^n}{2^n \, n!}

and so the sideband amplitudes eventually decay very quickly.
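Here’s a short sketch for computing the sideband amplitudes discussed above, using SciPy’s jv; it prints the first several coefficients for the two values of β rather than plotting them.

    import numpy as np
    from scipy.special import jv

    n = np.arange(0, 11)
    for beta in (0.5, 5.0):
        # sideband amplitudes J_n(beta) for n = 0, 1, ..., 10
        print(beta, np.round(jv(n, beta), 4))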

Update: See this post for what the equation above says about energy moving from the carrier to sidebands.

Proof

To prove the equation above, we need three basic trig identities

\cos(A + B) &=& \cos A \cos B - \sin A \sin B \\ 2\cos A \cos B &=& \cos(A-B) + \cos(A+B) \\ 2\sin A \sin B &=& \cos(A-B) - \cos(A+B)

and three Bessel function identities

\cos( z \sin \theta) &=& J_0(z) + 2\sum_{k=1}^\infty J_{2k}(z) \cos(2k\theta) \\ \sin( z \sin \theta) &=& 2\sum_{k=0}^\infty J_{2k+1}(z) \sin((2k+1)\theta) \\ J_{-n}(z) &=& (-1)^n J_n(z)

The Bessel function identities above can be found in Abramowitz and Stegun as equations 9.1.42, 9.1.43, and 9.1.5.

And now the proof. We start with

\cos( 2\pi f_c t + \beta \sin(2\pi f_m t) )

and apply the sum identity for cosines to get

\cos(2\pi f_c t) \cos(\beta \sin(2\pi f_m t)) - \sin(2\pi f_c t) \sin(\beta \sin(2\pi f_m t))

Now let’s take the first term

 \cos(2\pi f_c t) \cos(\beta \sin(2\pi f_m t))

and apply one of our Bessel identities to expand it to

J_0(\beta) \cos(2\pi f_c t) + \sum_{k=1}^\infty J_{2k}(\beta) \left\{ \cos(2\pi (f_c - 2k f_m)t) + \cos(2\pi(f_c + 2k f_m)t) \right\}

which can be simplified to

\sum_{n \,\, \mathrm{even}} J_n(\beta) \cos(2\pi(f_c + nf_m)t)

where the sum runs over all even integers, positive and negative.

Now we do the same with the second half of the cosine sum. We expand

\sin(2\pi f_c t) \, \sin(\beta \sin(2\pi f_m t))

to

\sum_{k=0}^\infty J_{2k+1}(\beta) \left\{ \cos(2\pi (f_c - (2k+1) f_m)t) - \cos(2\pi(f_c + (2k+1) f_m)t) \right\}

which simplifies to

-\sum_{n \,\, \mathrm{odd}} J_n(\beta) \cos(2\pi(f_c + nf_m)t)

where again the sum is over all (odd this time) integers. This second piece enters the original expansion with a minus sign, so combining the two halves gives our result

\cos( 2\pi f_c t + \beta \sin(2\pi f_m t) ) = \sum_{n=-\infty}^\infty J_n(\beta) \cos(2\pi(f_c + nf_m)t)


Related post: Visualizing Bessel functions

Connection between hypergeometric distribution and series

What’s the connection between the hypergeometric distributions, hypergeometric functions, and hypergeometric series?

The hypergeometric distribution is a probability distribution with parameters N, M, and n. Suppose you have an urn containing N balls, M of them red and the rest, N – M, blue, and you draw n balls without replacement. The hypergeometric distribution gives the probability of selecting k red balls.

The probability generating function for a discrete distribution is the series formed by summing, over all outcomes k, the probability of the outcome times xk. So the probability generating function for a hypergeometric distribution is given by

f(x) = \sum \frac{{M\choose k}{N-M \choose n-k}}{{N \choose n}} x^k

The summation is over all integers, but the terms are only non-zero for k between 0 and M inclusive. (This may be more general than the definition of binomial coefficients you’ve seen before. If so, see these notes on the general definition of binomial coefficients.)

It turns out that f is a hypergeometric function of x because it can be written as a hypergeometric series. (Strictly speaking, f is a constant multiple of a hypergeometric function. More on that in a moment.)

A hypergeometric function is defined by a pattern in its power series coefficients. The hypergeometric function F(a, b; c; x) has the power series

F(a, b; c; x) = \sum_{k=0}^\infty \frac{(a)_k (b)_k}{(c)_k} \frac{x^k}{k!}

where (n)k is the kth rising power of n. It’s a sort of opposite of factorial. Start with n and multiply consecutive increasing integers for k terms. (n)0 is an empty product, so it is 1. (n)1 = n, (n)2 = n(n+1), etc.

If the ratio of the (k+1)st term to the kth term in a power series is a rational function of k, then the series is (a multiple of) a hypergeometric series, and you can read the parameters of the hypergeometric series off that ratio. This ratio for our probability generating function works out to be

\frac{P(X=k+1)}{P(X=k)} = \frac{(k-M)(k-n)}{(k+N-M-n+1)(k+1)}

and so the corresponding hypergeometric function is F(-M, -n; N – M – n + 1; x). The constant term of a hypergeometric function is always 1, so evaluating our probability generating function at 0 tells us what the constant is multiplying F(-M, -n; N – M – n + 1; x). Now

f(0) = P(X = 0) = \frac{{N-M \choose n}}{{N \choose n}}

and so

f(x) = \frac{{N-M \choose n}}{{N \choose n}} F(-M, -n; N - M - n + 1; x)
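Here’s a sketch of a numerical check of the last equation, using SciPy’s comb and hyp2f1; the parameter values and the evaluation point are arbitrary choices.

    from scipy.special import comb, hyp2f1

    N, M, n = 20, 7, 5   # arbitrary parameters
    x = 0.3              # arbitrary evaluation point

    # probability generating function computed directly from the pmf
    pgf = sum(comb(M, k) * comb(N - M, n - k) / comb(N, n) * x**k
              for k in range(0, min(M, n) + 1))

    # the same function via the hypergeometric representation
    hyp = comb(N - M, n) / comb(N, n) * hyp2f1(-M, -n, N - M - n + 1, x)

    print(pgf, hyp)      # the two values agree to floating point accuracy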

The hypergeometric series above gives the original hypergeometric function as defined by Gauss, and may be the most common form in application. But the definition has been extended to have any number of rising powers in the numerator and denominator of the coefficients. The classical hypergeometric function of Gauss is denoted 2F1 because it has two rising powers on top and one on bottom. In general, the hypergeometric function pFq has p rising powers in the numerator and q rising powers in the denominator.

The CDF of a hypergeometric distribution turns out to be a more general hypergeometric function:

P(X \leq k) = 1 - \frac{{n\choose k+1}{N-n\choose M-k-1}}{{N\choose M}} \phantom{ }_3F_2(1, k+1-M, k+1-n; k+2, N+k+2-M-n; 1)

where a = 1, b = k + 1 – M, c = k + 1 – n, d = k + 2, and e = N + k + 2 – M – n are the parameters of the 3F2 above.

Thanks to Jan Galkowski for suggesting this topic via a comment on an earlier post, Hypergeometric bootstrapping.


Dilogarithm, polylogarithm, and related functions

The functions dilogarithm, trilogarithm, and more generally polylogarithm are meant to be generalizations of the logarithm. I first came across the dilogarithm in college when I was evaluating some integral with Mathematica, and they’ve paid a visit occasionally ever since.

Unfortunately polylogarithms are defined in several slightly different and incompatible ways. I’ll start by following An Atlas of Functions and then mention differences in A&S, SciPy, and Mathematica.

Polylogarithms

According to Atlas,

Polylogarithms are themselves special cases of Lerch’s function. Also known as Jonquière’s functions (Ernest Jean Philippe Fauque de Jonquières, 1820–1901, French naval officer and mathematician), they appear in the Feynman diagrams of particle physics.

The idea is to introduce an extra parameter ν in the power series for natural log:

\mbox{polyln}_\nu(x) = -\sum_{j=1}^\infty \frac{(1-x)^j}{j^\nu}

When ν = 1 we get the ordinary logarithm, i.e. polyln1 = log. Then polyln2 is the dilogarithm diln, and polyln3 is the trilogarithm triln.

One advantage of the definition in Atlas is that the logarithm is a special case of the polylogarithm. Other conventions don’t have this property.

Other conventions

The venerable A&S defines dilogarithms in a way that’s equivalent to the negative of the definition above and does not define polylogarithms of any other order. SciPy’s special function library follows A&S. SciPy uses the name spence for the dilogarithm for reasons we’ll get to shortly.

Mathematica has the function PolyLog[ν, x] that evaluates to

\mbox{Li}_\nu(x) = \sum_{j=1}^\infty \frac{x^j}{j^\nu}

So polylnν above corresponds to -PolyLog[ν, 1 - x] in Mathematica. Matlab’s polylog is the same as Mathematica’s PolyLog.

Relation to other functions

Spence’s integral is the function of x given by the integral

\int_1^x \frac{\log t}{t-1}\, dt

and equals diln(x). Note that the SciPy function spence returns the negative of the integral above.
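Here’s a sketch comparing the conventions numerically, using SciPy’s spence and mpmath’s polylog; the evaluation point is an arbitrary choice.

    from scipy.special import spence
    from mpmath import polylog

    x = 0.3
    diln = -float(polylog(2, 1 - x))  # the Atlas dilogarithm, -Li_2(1 - x)
    print(diln, -spence(x))           # the two values agree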

The Lerch function mentioned above is named for Mathias Lerch (1860–1922) and is defined by the integral

\Phi(x, \nu, u) = \frac{1}{\Gamma(\nu)} \int_0^\infty \frac{t^{\nu-1} \exp(-ut)}{1 - x\exp(-t)} \, dt

The connection with polylogarithms is easier to see from the series expansion:

\Phi(x, \nu, u) = \sum_{j=0}^\infty \frac{x^j}{(j+u)^\nu}

The connection with polylogarithms is then

\Phi(x, \nu,1) = -\frac{1}{x} \mbox{polyln}_\nu(1-x)

Note that the Lerch function also generalizes the Hurwitz zeta function, which in turn generalizes the Riemann zeta function. When x = 1, the Lerch function reduces to ζ(ν, u).

Related: Applied complex analysis

Hypergeometric bootstrapping: implement one, get seven free

Suppose you have software to compute one hypergeometric function. How many more hypergeometric functions can you compute from it?

Hypergeometric functions satisfy a lot of identities, so you can bootstrap one such function into many more. That’s one reason they’re so useful in applications. For this post, I want to focus on just three formulas, the so-called linear transformation formulas that relate one hypergeometric function to another. (There are more linear formulas that relate one hypergeometric function to a combination of two others. I may consider those in a future post, but not today.)

Linear transformations

The classical hypergeometric function has three parameters. The linear transformations discussed above relate the function with parameters (a, b, c) to those with parameters

  • (c – a, c – b, c)
  • (a, c – b, c)
  • (b, c – a, c)

Here are the linear transformations in detail:

\begin{eqnarray*} F(a, b; c; z) &=& (1-z)^{c -a-b} F(c-a, c-b; c; z) \\ &=& (1-z)^{-a} F\left(a, c-b; c; \frac{z}{z-1}\right) \\ &=& (1-z)^{-b} F\left(b, c-a; c; \frac{z}{z-1}\right) \end{eqnarray*}

How many more?

The three transformations above are the ones listed in Handbook of Mathematical Functions by Abramowitz and Stegun. How many more transformations can we create by combining them? To answer this, we represent each of the transformations with a matrix. The transformations correspond to multiplying the matrix by the column vector (a, b, c).

A &=& \left( \begin{array}{rrr} -1 & 0 & \phantom{-}1 \\ 0 & -1 & 1 \\ 0 & 0 & 1 \end{array} \right) \\ B &=& \left( \begin{array}{rrr} \phantom{-}1 & 0 & 0 \\ 0 & -1 & \phantom{-}1 \\ 0 & 0 & 1 \end{array} \right) \\ C &=& \left( \begin{array}{rrr} 0 & \phantom{-}1 & 0 \\ -1 & 0 & \phantom{-}1 \\ 0 & 0 & 1 \end{array} \right)

The question of how many transformations we can come up with can be recast as asking for the order of the group generated by A, B, and C. (I’m jumping ahead a little by presuming these matrices generate a group. Conceivably they don’t have inverses, but in fact they do. The inverses of A, B, and C are A, B, and AC respectively.)

A little experimentation reveals that there are eight transformations: I, A, B, C, D = AB, E = AC, F = BC, and G = CB. So in addition to the identity I, the do-nothing transformation, we have found four more: D, E, F, and G. These last four correspond to taking (a, b, c) to

  • (c – a, b, c)
  • (c – b, a, c)
  • (b, a, c)
  • (c – b, c – a, c)

This means that if we have software to compute the hypergeometric function with one fixed set of parameters, we can bootstrap that into computing the hypergeometric function with up to seven more sets of parameters. (Some of the seven new combinations could be repetitive if, for example, a = b.)

Identifying the group

We have uncovered a group of order 8, and there are only 5 groups that size. We should be able to find a familiar group isomorphic to our group of transformations.

The five groups of order 8 are Z8 (cyclic group), Z4×Z2 (direct product), E8 (elementary Abelian group), D8 (dihedral group), and the quaternion group. The first three are Abelian, and our group is not, so our group must be isomorphic to either the quaternion group or the dihedral group. The quaternion group has only 1 element of order 2, and the dihedral group has 5 elements of order 2. Our group has five elements of order 2 (A, B, D, F, G) and so it must be isomorphic to the dihedral group of order 8, the rotations and reflections of a square. (Some authors call this D8, because the group has eight elements. Others call it D4, because it is the group based on a 4-sided figure.)

Dihedral group of order 8
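If you’d like to double check the count, here’s a sketch that generates the group from the matrices above by brute force and counts the elements of order 2; it’s a verification aid, not part of the derivation.

    import numpy as np

    A = np.array([[-1,  0, 1], [ 0, -1, 1], [0, 0, 1]])
    B = np.array([[ 1,  0, 0], [ 0, -1, 1], [0, 0, 1]])
    C = np.array([[ 0,  1, 0], [-1,  0, 1], [0, 0, 1]])
    I = np.eye(3, dtype=int)

    # generate the group by repeatedly multiplying by the generators
    elements = {tuple(I.ravel())}
    frontier = [I]
    while frontier:
        new = []
        for M in frontier:
            for G in (A, B, C):
                P = G @ M
                key = tuple(P.ravel())
                if key not in elements:
                    elements.add(key)
                    new.append(P)
        frontier = new

    print(len(elements))   # 8 transformations in all

    mats = [np.array(e).reshape(3, 3) for e in elements]
    order2 = [M for M in mats
              if not np.array_equal(M, I) and np.array_equal(M @ M, I)]
    print(len(order2))     # 5 elements of order 2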

Related: Applied complex analysis

Special function resources

This week’s resource post: some pages of notes on special functions:

See also blog posts tagged special functions.

I tweet about special functions occasionally on AnalysisFact

Last week: HTML, LaTeX, and Unicode

Next week: Python resources


Uses for orthogonal polynomials

When I interviewed Daniel Spielman at this year’s Heidelberg Laureate Forum, we began our conversation by looking for common mathematical ground. The first thing that came up was orthogonal polynomials. (If you’re wondering what it means for two polynomials to be orthogonal, see here.)

JC: Orthogonal polynomials are kind of a lost art, a topic that was common knowledge among mathematicians maybe 50 or 100 years ago and now they’re obscure.

DS: The first course I taught I spent a few lectures on orthogonal polynomials because they kept coming up as the solutions to problems in different areas that I cared about. Chebyshev polynomials come up in understanding solving systems of linear equations, such as if you want to understand how the conjugate gradient method behaves. The analysis of error correcting codes and sphere packing has a lot of orthogonal polynomials in it. They came up in a course in multi-linear algebra I had in grad school. And they come up in matching polynomials of graphs, which is something people don’t study much anymore. … They’re coming back. They come up a lot in random matrix theory. … There are certain things that come up again and again and again so you got to know what they are.


 

Multiple zeta

The Riemann zeta function, introduced by Leonhard Euler, is defined by

\zeta(k) = \sum n^{-k}

where the sum is over all positive integers n.

Euler also introduced a multivariate generalization of the zeta function

\zeta(k_1, \ldots, k_r) = \sum n_1^{-k_1}\cdots n_r^{-k_r}

where the sum is over all strictly decreasing r-tuples of positive integers, that is, n1 > n2 > … > nr ≥ 1. This generalized zeta function satisfies the following beautiful identity:

 \zeta(a)\,\zeta(b) = \zeta(a, b) + \zeta(b, a) + \zeta(a+b)

The multivariate zeta function and identities such as the one above are important in number theory and are the subject of open conjectures.
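Here’s a sketch of a numerical check of the identity. With a consistent truncation at N the identity holds term by term, so the two sides agree to floating point accuracy; the exponents and cutoff are arbitrary choices.

    from math import fsum

    N = 1000   # truncation point, an arbitrary choice

    def zeta1(s):
        # truncated Riemann zeta sum
        return fsum(n**-s for n in range(1, N + 1))

    def zeta2(a, b):
        # truncated double zeta: sum over N >= n1 > n2 >= 1
        return fsum(n1**-a * n2**-b for n1 in range(1, N + 1) for n2 in range(1, n1))

    a, b = 3, 2
    print(zeta1(a) * zeta1(b), zeta2(a, b) + zeta2(b, a) + zeta1(a + b))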

Source

Relating Airy and Bessel functions

The Airy functions Ai(x) and Bi(x) are independent solutions to the differential equation

y'' - xy = 0

For negative x they act something like sin(x) and cos(x). For positive x they act something like exp(x) and exp(-x). This isn’t surprising if you look at the differential equation. If you replace x with a negative constant, you get sines and cosines, and if you replace it with a positive constant, you get positive and negative exponentials.

The Airy functions can be related to Bessel functions as follows:

\mathrm{Ai}(x) = \left\{ \begin{array}{ll} \frac{1}{3}\sqrt{\phantom{-}x} \left(I_{-1/3}(\hat{x}) - I_{1/3}(\hat{x})\right) & \mbox{if } x > 0 \\ \frac{1}{3}\sqrt{-x} \left(J_{-1/3}(\hat{x}) + J_{1/3}(\hat{x})\right) & \mbox{if } x < 0 \end{array} \right.

and

\mathrm{Bi}(x) = \left\{ \begin{array}{ll} \sqrt{\phantom{-}x/3} \left(I_{-1/3}(\hat{x}) + I_{1/3}(\hat{x})\right) & \mbox{if } x > 0 \\ \sqrt{-x/3} \left(J_{-1/3}(\hat{x}) - J_{1/3}(\hat{x})\right) & \mbox{if } x < 0 \end{array} \right.

Here J is a “Bessel function of the first kind” and I is a “modified Bessel function of the first kind.” Also

\hat{x} = \frac{2}{3} \left(\sqrt{|x|}\right)^3

To verify the equations above, and to show how to compute these functions in Python, here’s some code.

The SciPy function airy computes both functions, and their first derivatives, at once. I assume that’s because it doesn’t take much longer to compute all four functions than to compute one. The code for Ai2 and Bi2 below uses np.where instead of if … else so that it can operate on NumPy vectors all at once. You can plot Ai and Ai2 and see that the two curves lie on top of each other. The same holds for Bi and Bi2.

 

from scipy.special import airy, jv, iv
from numpy import sqrt, where

def Ai(x):
    (ai, ai_prime, bi, bi_prime) = airy(x)
    return ai

def Bi(x):
    (ai, ai_prime, bi, bi_prime) = airy(x)
    return bi

def Ai2(x):
    third = 1.0/3.0
    hatx = 2*third*(abs(x))**1.5
    return where(x > 0,
        third*sqrt( x)*(iv(-third, hatx) - iv(third, hatx)),
        third*sqrt(-x)*(jv(-third, hatx) + jv(third, hatx)))

def Bi2(x):
    third = 1.0/3.0
    hatx = 2*third*(abs(x))**1.5
    return where(x > 0,
        sqrt( x/3.0)*(iv(-third, hatx) + iv(third, hatx)),
        sqrt(-x/3.0)*(jv(-third, hatx) - jv(third, hatx)))

 

There is a problem with Ai2 and Bi2: they return nan at 0. A more careful implementation would avoid this problem, but that’s not necessary since these functions are only for illustration. In practice, you’d simply use airy and it does the right thing at 0.


Fibonacci numbers and orthogonal polynomials

The famous Fibonacci numbers are defined by the initial conditions

F0 = 0,  F1 = 1

and the recurrence relation

Fn = Fn-1 + Fn-2

for n > 1.

The Fibonacci polynomials are defined similarly. They have the same initial conditions

F0(x) = 0, F1(x) = 1

but a slightly different recurrence relation

Fn(x) = x Fn-1(x) + Fn-2(x)

for n > 1.

Several families of orthogonal polynomials satisfy a similar recurrence relationship

Fn(x) = p(x) Fn-1(x) + c(n) Fn-2(x).

The table below gives the values of p(x) and c(n) for a few families of polynomials.

Family        p(x)    c(n)      P0    P1(x)
Fibonacci     x       1         0     1
Chebyshev T   2x      -1        1     x
Chebyshev U   2x      -1        1     2x
Hermite       2x      2 – 2n    1     2x

There are two common definitions of the Hermite polynomials, sometimes called the physicist convention and the probabilist convention. The table above gives the physicist convention. For the probabilist definition, change the 2’s to 1’s in the last row and leave the 1 alone.
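Here’s a sketch that generates these families from the shared recurrence using NumPy’s polynomial module; the helper function and the number of terms are arbitrary choices, and coefficient arrays are stored lowest degree first.

    import numpy as np
    from numpy.polynomial import polynomial as P

    def recurrence(p, c, f0, f1, N):
        # F_n(x) = p(x) F_{n-1}(x) + c(n) F_{n-2}(x), coefficients lowest degree first
        fs = [np.array(f0, dtype=float), np.array(f1, dtype=float)]
        for n in range(2, N + 1):
            fs.append(P.polyadd(P.polymul(p, fs[-1]), c(n) * fs[-2]))
        return fs

    # Fibonacci polynomials: p(x) = x, c(n) = 1, F0 = 0, F1 = 1
    fib = recurrence([0, 1], lambda n: 1, [0], [1], 10)
    print([int(P.polyval(1, f)) for f in fib])  # 0, 1, 1, 2, 3, 5, ... at x = 1

    # Chebyshev T: p(x) = 2x, c(n) = -1, T0 = 1, T1 = x
    cheb = recurrence([0, 2], lambda n: -1, [1], [0, 1], 5)
    print(cheb[5])  # coefficients of T_5 = 16x^5 - 20x^3 + 5x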

(If you haven’t heard of orthogonal polynomials, you might be wondering “orthogonal to what?”. These polynomials are orthogonal in the sense of an inner product formed by multiplying two polynomials and a weight, then integrating. Orthogonal polynomials differ by the weight and the limits of integration. For example, the (physicist) Hermite polynomials are orthogonal with weight function exp(-x2) and integrating over the entire real line. The probabilist Hermite polynomials are very similar, but use the standard normal distribution density exp(-x2/2)/√(2π) as the weight instead of exp(-x2).)
