Python resources

Each Wednesday I post a list of some of the resources on this site. This week: Python notes.

See also blog posts tagged Python, SciPy, and SymPy.

Last week: Special functions

Next week: Probability resources

For daily tips on Python and scientific computing, follow @SciPyTip on Twitter.

Scipytip twitter icon

Ellipsoid surface area

How much difference does the earth’s equatorial bulge make in its surface area?

To first approximation, the earth is a sphere. The next step in sophistication is to model the earth as an ellipsoid.

The surface area of an ellipsoid with semi-axes abc is

A = 2\pi \left( c^2 + \frac{ab}{\sin\phi} \left( E(\phi, k) \sin^2\phi + F(\phi, k) \cos^2 \phi\right)\right)




m = k^2 = \frac{a^2(b^2 - c^2)}{b^2(a^2 - c^2)}

The functions E and F are incomplete elliptic integrals

 F(\phi, k) = \int_0^\phi \frac{d\theta}{\sqrt{1 - k^2 \sin^2\theta}}


E(\phi, k) = \int_0^\phi \sqrt{1 - k^2 \sin^2\theta}\,d\theta

implemented in SciPy as ellipeinc and ellipkinc. Note that the SciPy functions take m as their second argument rather its square root k.

For the earth, a = b and so m = 1.

The following Python code computes the ratio of earth’s surface area as an ellipsoid to its area as a sphere.

from scipy import pi, sin, cos, arccos
from scipy.special import ellipkinc, ellipeinc

# values in meters based on GRS 80
equatorial_radius = 6378137
polar_radius = 6356752.314140347

a = b = equatorial_radius
c = polar_radius

phi = arccos(c/a)
# in general, m = (a**2 * (b**2 - c**2)) / (b**2 * (a**2 - c**2))
m = 1

temp = ellipeinc(phi, m)*sin(phi)**2 + ellipkinc(phi, m)*cos(phi)**2
ellipsoid_area = 2*pi*(c**2 + a*b*temp/sin(phi))

# sphere with radius equal to average of polar and equatorial
r = 0.5*(a+c)
sphere_area = 4*pi*r**2


This shows that the ellipsoid model leads to 0.112% more surface area relative to a sphere.

Source: See equation 19.33.2 here.

Update: It was suggested in the comments that it would be better to compare the ellipsoid area to that of a sphere of the same volume. So instead of using the average of the polar and equatorial radii, one would take the geometric mean of the polar radius and two copies of the equatorial radius. Using that radius, the ellipsoid has 0.0002% more area than the sphere.

For daily posts on analysis, follow @AnalysisFact on Twitter.

AnalysisFact twitter icon

Relating Airy and Bessel functions

The Airy functions Ai(x) and Bi(x) are independent solutions to the differential equation

y'' - xy = 0

For negative x they act something like sin(x) and cos(x). For positive x they act something like exp(x) and exp(-x). This isn’t surprising if you look at the differential equation. If you replace x with a negative constant, you get sines and cosines, and if you replace it with a positive constant, you get positive and negative exponentials.

The Airy functions can be related to Bessel functions as follows:

\mathrm{Ai}(x) = \left\{ \begin{array}{ll} \frac{1}{3}\sqrt{\phantom{-}x} \left(I_{-1/3}(\hat{x}) - I_{1/3}(\hat{x})\right) & \mbox{if } x > 0 \\<br /><br /><br /><br /> \\<br /><br /><br /><br /> \frac{1}{3}\sqrt{-x} \left(J_{-1/3}(\hat{x}) + J_{1/3}(\hat{x})\right) & \mbox{if } x < 0 \end{array} \right.


\mathrm{Bi}(x) = \left\{ \begin{array}{ll} \sqrt{\phantom{-}x/3} \left(I_{-1/3}(\hat{x}) + I_{1/3}(\hat{x})\right) & \mbox{if } x > 0 \\<br /> \\<br /> \sqrt{-x/3} \left(J_{-1/3}(\hat{x}) - J_{1/3}(\hat{x})\right) & \mbox{if } x < 0 \end{array} \right.

Here J is a “Bessel function of the first kind” and I is a “modified Bessel function of the first kind.” Also

\hat{x} = \frac{2}{3} \left(\sqrt{|x|}\right)^3

To verify the equations above, and to show how to compute these functions in Python, here’s some code.

The SciPy function airy computes both functions, and their first derivatives, at once. I assume that’s because it doesn’t take much longer to compute all four functions than to compute one. The code for Ai2 and Bi2 below uses np.where instead of if … else so that it can operate on NumPy vectors all at once. You can plot Ai and Ai2 and see that the two curves lie on top of each other. The same holds for Bi and Bi2.


from scipy.special import airy, jv, iv
from numpy import sqrt, where

def Ai(x):
    (ai, ai_prime, bi, bi_prime) = airy(x)
    return ai

def Bi(x):
    (ai, ai_prime, bi, bi_prime) = airy(x)
    return bi

def Ai2(x):
    third = 1.0/3.0
    hatx = 2*third*(abs(x))**1.5
    return where(x > 0,
        third*sqrt( x)*(iv(-third, hatx) - iv(third, hatx)),
        third*sqrt(-x)*(jv(-third, hatx) + jv(third, hatx)))

def Bi2(x):
    third = 1.0/3.0
    hatx = 2*third*(abs(x))**1.5
    return where(x > 0,
        sqrt( x/3.0)*(iv(-third, hatx) + iv(third, hatx)),
        sqrt(-x/3.0)*(jv(-third, hatx) - jv(third, hatx)))


There is a problem with Ai2 and Bi2: they return nan at 0. A more careful implementation would avoid this problem, but that’s not necessary since these functions are only for illustration. In practice, you’d simply use airy and it does the right thing at 0.

Related links:

Ramanujan approximation for circumference of an ellipse

There’s no elementary formula for the circumference of an ellipse, but there is an elementary approximation that is extremely accurate.

An ellipse has equation (x/a)² + (y/b)² = 1. If a = b, the ellipse reduces to a circle and the circumference is simply 2πa. But the general formula for circumference requires the hypergeometric function 2F1:

\setlength\arraycolsep{1pt} \pi (a + b) \,  {}_2 F_1\left(\begin{matrix}-1/2& &-1/2 \\&1& \end{matrix}\middle;\lambda^2\right)

where λ = (ab)/(a + b).

However, if the ellipse is anywhere near circular, the following approximation due to Ramanujan is extremely good:

\pi (a + b) \left(1 + \frac{3\lambda^2}{10 + \sqrt{4 - 3\lambda^2}}\right)

To quantify what we mean by extremely good, the error is O(λ10). When an ellipse is roughly circular, λ is fairly small, and the error is on the order of λ to the 10th power.

To illustrate the accuracy of the approximation, I tried the formula out on some planets. The error increases with ellipticity, so I took the most most elliptical orbit of a planet or object formerly known as a planet. That distinction belongs to Pluto, in which case λ = 0.016. If Pluto’s orbit were exactly elliptical, you could use Ramanujan’s approximation to find the circumference of its orbit with an error less than one micrometer.

Next I tried it on something with a much more elliptical orbit: Halley’s comet. Its orbit is nearly four times longer than it is wide. For Halley’s comet, λ = 0.59 and Ramanujan’s approximation agrees with the exact result to seven significant figures. The exact result is 11,464,319,022 km and the approximation is 11,464,316,437 km.

Here’s a video showing how elliptical the comet’s orbit is.

If you’d like to experiment with the approximation, here’s some Python code:

from scipy import pi, sqrt
from scipy.special import hyp2f1

def exact(a, b):
    t = ((a-b)/(a+b))**2
    return pi*(a+b)*hyp2f1(-0.5, -0.5, 1, t)

def approx(a, b):
    t = ((a-b)/(a+b))**2
    return pi*(a+b)*(1 + 3*t/(10 + sqrt(4 - 3*t)))

# Semimajor and semiminor axes for Halley's comet orbit
a = 2.667950e9 # km
b = 6.782819e8 # km

print exact(a, b)
print approx(a, b)

Rolling dice for normal samples: Python version

A handful of dice can make a decent normal random number generator, good enough for classroom demonstrations. I wrote about this a while ago.

My original post included Mathematica code for calculating how close to normal the distribution of the sum of the dice is. Here I’d like to redo the code in Python to show how to do the same calculations using SymPy. [Update: I’ll also give a solution that does not use SymPy and that scales much better.]

If you roll five dice and add up the spots, the probability of getting a sum of k is the coefficient of xk in the expansion of

(x + x2 + x3 + x4 + x5 + x6)5 / 65.

Here’s code to find the probabilities by expanding the polynomial and taking coefficients.

from sympy import Symbol

sides = 6
dice = 5
rolls = range( dice*sides + 1 )

# Tell SymPy that we want to use x as a symbol, not a number
x = Symbol('x')

# p(x) = (x + x^2 + ... + x^m)^n
# where m = number of sides per die
# and n = number of dice
p = sum([x**i for i in range(1, sides + 1)])**dice

# Extract the coefficients of p(x) and divide by sides**dice
pmf = [sides**(-dice) * p.expand().coeff(x, i) for i in rolls]

If you’d like to compare the CDF of the dice sum to a normal CDF you could add this.

from scipy import array, sqrt
from scipy.stats import norm

cdf = array(pmf).cumsum()

# Normal CDF for comparison
mean = 0.5*(sides + 1)*dice
variance = dice*(sides**2 -1)/12.0
temp = [norm.cdf(i, mean, sqrt(variance)) for i in roles]
norm_cdf = array(temp)

diff = abs(cdf - norm_cdf)
# Print the maximum error and where it occurs
print diff.max(), diff.argmax()

Question: Now suppose you want a better approximation to a normal distribution. Would it be better to increase the number of dice or the number of sides per dice? For example, would you be better off with 10 six-sided dice or 5 twelve-sided dice? Think about it before reading the solution.

Update: The SymPy code does not scale well. When I tried the code with 50 six-sided dice, it ran out of memory. Based on Andre’s comment, I rewrote the code using polypow. SymPy offers much more symbolic calculation functionality than NumPy, but in this case NumPy contains all we need. It is much faster and it doesn’t run out of memory.

from numpy.polynomial.polynomial import polypow
from numpy import ones

sides = 6
dice = 100

# Create an array of polynomial coefficients for
# x + x^2 + ... + x^sides
p = ones(sides + 1)
p[0] = 0

# Extract the coefficients of p(x)**dice and divide by sides**dice
pmf = sides**(-dice) * polypow(p, dice)
cdf = pmf.cumsum()

That solution works for up to 398 dice. What’s up with that? With 399 dice, the largest polynomial coefficient overflows. If we divide by the number of dice before raising the polynomial to the power dice, the code becomes a little simpler and scales further.

p = ones(sides + 1)
p[0] = 0
p /= sides
pmf = polypow(p, dice)
cdf = pmf.cumsum()

I tried this last approach on 10,000 dice with no problem.

* * *

For daily posts on probability, follow @ProbFact on Twitter.

ProbFact twitter icon

New introduction to SciPy

The Python stack for scientific computing is more modular than say R or Mathematica. Python is a general-purpose programming language that has libraries for scientific computing. R and Mathematica are statistical and mathematical programming languages that have general-purpose features. The Python approach has its advantages — I’d rather do math in a general language than do general programming in a mathematical language — but it takes longer to learn. The components of the Python stack work well together, but someone new to Python has to discover what components they’ll need.

Several books have come out recently to help someone learn Python and the components for numerical computing. The latest is Learning SciPy for Numerical and Scientific Computing by Francisco J. Blanco-Silva.

This book covers the things you’d expect, including SciPy, NumPy, and Matplotlib. The only exception may be IPython. But no book can cover everything. And since IPython is an environment more than a library, it makes some sense to leave it out.

In addition to the usual topics, the book includes several important topics that are not as commonly covered. For example, it devotes a good amount of space to special functions and integration in a chapter on numerical analysis. I share the author’s assessment that this is “one of the most interesting chapters in the book.”

There are three chapters on more specific applications: signal processing, data mining, and computational geometry. These chapters give an introduction to their topics as well as how to carry out computations in SciPy.

The final chapter of the book is on integration with other programming languages.

Learning SciPy for Numerical and Scientific Computing covers the material you need to get going. It’s easy to read, and still fairly small: 150 pages total, about 130 pages of content. This is the right size for such a book in my opinion. There’s plenty of online documentation for more details, but it helps to have a good overview such as this book before diving into reference material.

* * *

For daily tips on Python and scientific computing, follow @SciPyTip on Twitter.

Scipytip twitter icon

Three new Python books

This post reviews three Python books that have come out recently:

SciPy and NumPy (ISBN 1449305466) by Eli Bressert is the smallest book I’ve seen from O’Reilly, aside from books in their pocket guide series. The SciPy and NumPy libraries are huge, and it can be hard to know where to start. This book gives a good, brisk overview.  In addition to SciPy and NumPy, the it also gives a brief introduction to SciKit, in particular scikit-learn for machine learning and scikit-image for image processing.

(Eli told me that he is working on supplementary material for the book. Everyone who bought the book electronically will automatically receive the new material when it is available.)

Python for Kids (ISBN 1593274076) by Jason R. Briggs is an introduction to programming aimed at kids. It starts with with an introduction to Python and moves to developing a simple game. It seems to me that kids would find the book interesting. It’s about seven times longer than the SciPy and NumPy book. It moves at a slow pace, has many illustrations, and has a casual tone.

NumPy Cookbook by Ival Idris contains around 70 small recipes, about three pages each. Many of these are about NumPy itself, but the book covers much more than its title would imply. Out of 10 chapters, four are strictly about NumPy. The first chapter of the book is about IPython. Another chapter is about “connecting NumPy with the rest of the world,” i.e. interfacing with Java, R, Matlab, and cloud services. Two chapters are devoted to profiling, debugging, and optimizing performance. There is a chapter on quality assurance (static analysis, unit testing, and documentation). And the final chapter is about Scikits and Pandas.

Winston Churchill, Bessie Braddock, and Python

Last night I was talking with someone about the pros and cons of various programming languages and frameworks for data analysis. One of the pros of Python is its elegance. The primary con is that it can be slow.

The conversation reminded me of an apocryphal exchange between Winston Churchill and Bessie Braddock.

Braddock: Winston, you are drunk.

Churchill: Yes I am. And you, Bessie, are ugly. But I shall be sober in the morning, and you will still be ugly.

Python can be slow, though there are ways to improve its performance. But ugly code is just ugly, and there’s nothing you can do about it.

Python for data analysis

I recommend using Python for data analysis, and I recommend Wes McKinney’s book Python for Data Analysis.

I prefer Python to R for mathematical computing because mathematical computing doesn’t exist in a vacuum; there’s always other stuff to do. I find doing mathematical programming in a general-purpose language is easier than doing general-purpose programming in a mathematical language. Also, general-purpose languages like Python have larger user bases, are better designed, have better tool support, etc.

Python per se doesn’t have everything you need for mathematical computing. You need to combine several tools and libraries, typically at least SciPy, matplotlib, and IPython. Because there are different pieces involved, it’s hard to find one source to explain using them all together. Also, even with the three additional components mentioned before, there is a need for additional software for working with structured data.

Wes McKinney developed the pandas library to give Python “rich data structures and functions designed to make working with structured data fast, easy, and expressive.” And now he has addressed the need for unified exposition by writing a single book that describes how to use the Python mathematical computing stack. Importantly, the book covers two recent developments that make Python more competitive with other environments for data analysis: enhancements to IPython and Wes’ own pandas project.

Python for Data Analysis is available for pre-order. I don’t know when the book will be available but Amazon lists the publication date as October 29. My review copy was a PDF, but at least one paper copy has been spotted in the wild:

Wes McKinney holding his book at O’Reilly’s Strata Conference. Photo posted on Twitter yesterday.

For daily tips on Python and scientific computing, follow @SciPyTip on Twitter.

Scipytip twitter icon

Computing log gamma differences

Statistical computing often involves working with ratios of factorials. These factorials are often too big to fit in a floating point number, and so we work with logarithms. So if we need to compute log(a! / b!), we call software that calculates log(a!) and log(b!) directly without computing a! or  b! first. More on that here. But sometimes this doesn’t work.

Suppose a = 10^40 and b = a – 10^10. Our first problem is that we cannot even compute b directly. Since ab is 30 orders of magnitude smaller than a, we’d need about 100 bits of precision to begin to tell a and b apart, about twice as much as a standard floating point number. (Why 100? That’s the log base 2 of 10^30. And how much precision does a floating point number have? See here.)

Our next problem is that even if we could accurately compute b, the log gamma function is going to be approximately the same for a and b, and so the difference will be highly inaccurate.

So what do we do? We could use some sort of extended precision package, but that is not necessary. There’s an elegant solution using ordinary precision.

Let f(x) = log(Γ(x)). The mean value theorem says that

f(a+1) – f(b+1) = (ab) f‘(c+1)

for some c between a+1 and b+1. We don’t need to compute b, only ab, which we know is 10^10. The derivative of the log of the gamma function is called the digamma function, written ψ(x). So

log(a! / b!) = 10^10 ψ(c+1).

But what is c? The mean value theorem only says that the equation above is true for some c. What c do we use? It doesn’t matter. Since a and b are so relatively close, the digamma function will take on essentially the same value for any value between a+1 and b+1. Therefore

log(a! / b!) ≈ 10^10 ψ(a).

We could compute this in two lines of Python:

from scipy.special import digamma
print 1e10*digamma(1e40)

Related posts:

Click to find out more about consulting for numerical computing

Using SciPy with IronPython

Three years ago I wrote a post about my disappointment using SciPy with IronPython. A lot has changed since then, so I thought I’d write a short follow-up post.

To install NumPy and SciPy for use with IronPython, follow the instructions here. [Update: no longer available.] After installation, NumPy works as expected.

There is one small gotcha with SciPy. To use SciPy with IronPython, start ipy with the command line argument -X:Frames. Then you can use SciPy as you would from CPython. For example.

c:> ipy -X:Frames
>>> import scipy as sp
>>> sp.pi

Without the -X:Frames option you’ll get an error when you try to import scipy.

AttributeError: 'module' object has no attribute '_getframe'

According to this page,

The issue is that SciPy makes use of the CPython API for inspecting the current stack frame which IronPython doesn’t enable by default because of a small runtime performance hit. You can turn on this functionality by passing the command line argument “-X:Frames” to on the command line.

* * *

For daily tips on Python and scientific computing, follow @SciPyTip on Twitter.

Scipytip twitter icon

Math languages vs. application languages

Last Friday I posted on @SciPyTip a summary of why I like SciPy, the scientific programming library for Python.

I’d rather do math in a general-purpose language than try to do general-purpose programming in a math language.

Mathematical software is never purely mathematical. The math has to connect to something, and that’s where most of the programming effort may go.

If you’re responsible for the math and for the larger system it’s embedded in, you have three options.

  1. Write the math in an application programming language.
  2. Do application programming in a mathematical language.
  3. Use different languages for the math and the larger system.

My preferred option is #1. My second choice is #3. My last choice, by far, is #2.

Application languages are typically better for math than mathematical languages are for applications. Application languages also have a larger user base, and hence better documentation and tools.

Mixing two languages works well in a team that has someone specialized in the math language, someone specialized in the application language, and someone fluent in plumbing the two languages together. If you have be all three people, you might wonder whether you could do everything in one language.

Related posts:

For daily tips on Python and scientific computing, follow @SciPyTip on Twitter.

Scipytip twitter icon

SciPy integration misunderstanding

Today I needed to compute an integral similar to this:

int_{1000}^infty frac{dx}{100, x^3}

I used the following SciPy code to compute the integral:

from scipy.integrate import quad

def f(x):
    return 0.01*x**-3

integral, error = quad(f, 1000, sp.inf, epsrel = 1e-6)
print integral, error

My intention was to compute the integral to 6 significant figures. (epsrel is a shortened form of epsilon relative, i.e. relative error.) To my surprise, the estimated error was larger than the value of the integral. Specifically, the integral was computed as 5.15 × 10-9 and the error estimate was 9.07 × 10-9.

What went wrong? The integration routine quad lets you specify either a desired bound on your absolute error (epsabs) or a desired bound on your relative error (epsrel). I assumed that since I specified the relative error, the integration would stop when the relative error requirement was met. But that’s not how it works.

The quad function has default values for both epsabs and epsrel.

def quad(... epsabs=1.49e-8, epsrel=1.49e-8, ...):

I thought that since I did not specify an absolute error bound, the bound was not effective, or equivalently, that the absolute error target was 0. But no! It was as if I’d set the absolute error bound to 1.49 × 10-8. Because my integral is small (the exact value is 5 × 10-9) the absolute error requirement is satisfied before the relative error requirement and so the integration stops too soon.

The solution is to specify an absolute error target of zero. This condition cannot be satisfied, and so the relative error target will determine when the integration stops.

integral, error = quad(f, 1000, sp.inf, epsrel = 1e-6, epsabs = 0)

This correctly computes the integral as 5 × 10-9 and estimates the integration error as 4 ×10-18.

It makes some sense for quad to specify non-zero default values for both absolute and relative error, though I imagine most users expect small relative error rather than small absolute error, so perhaps the latter could be set to 0 by default.

Approximating Earth as a sphere

Isaac Newton suggested in 1687 that the earth is not a perfectly round sphere but rather an ellipsoid, and he was right. But since our planet is roughly a sphere, it’s often useful to approximate it by a sphere. So if you’re going to do that, what radius do you use? More generally, what radius do you use when approximating any ellipsoid by a sphere?

This post will discuss the more general problem of finding the radius when approximating any ellipsoid by a sphere. We will give the answer for Earth in particular, and we’ll show how to carry out the calculations. Most of the calculations are easy, but some involve elliptic integrals and we show how to compute these in Python.

First of all, what is an ellipsoid? It is a surface whose (x, y, z) coordinates satisfy

frac{x^2}{a^2} + frac{y^2}{b^2} + frac{z^2}{c^2} = 1

Earth is an oblate spheroid, which means a = b > c. Specifically, a = b = 6,378,137 meters, and c = 6,356,752 meters.

If you wanted to approximate an ellipsoid by a sphere, you could use

r = (a + b + c)/3.

Why? Because the knee-jerk reaction whenever you need to reduce a set of numbers to one number is to average them.

We could do a little better, depending on what property of the ellipsoid we’d like to preserve in our approximation. For example, we might want to create a sphere with the same volume as the ellipsoid. In that case we’d use the geometric mean

r = (abc)1/3.

This is because the volume of an ellipsoid is 4πabc/3 and the volume of a sphere is 4πr3/3.

For the particular case of the earth, we’d use

(a2c)1/3 = 6371000.7

For some applications we might want a sphere with the same surface area as the ellipsoid rather than the same volume.

The surface area of an ellipsoid is considerably more complicated than the volume. For the special case of an oblate spheroid, like earth, the area is given by

2pi a^2 left( 1 + frac{1 - e^2}{e} tanh^{-1}e right)


e^2 = 1 - frac{c^2}{a^2}

The surface area of a sphere is 4 πr2 and so the following code computes r.

from math import sqrt, atanh
e = sqrt(1 - (c/a)**2)
r = a*sqrt(1 + (1-e**2)*atanh(e)/e) / sqrt(2)

This gives r = 6371007.1 for the earth, about 6.4 meters more than the number we got matching volume rather than area.

For a general ellipsoid, the surface area is given by

2pi c^2 + frac{2pi a b}{sin varphi} left( E(varphi, k) sin^2varphi + F(varphi, k) cos^2 varphiright)


cos varphi = frac{c}{a}


k^2 = frac{a^2(b^2 - c^2)}{b^2(a^2 - c^2)}

Here F is the “incomplete elliptic integral of the first kind” and E is the “incomplete elliptic integral of the second kind.” The names are historical artifacts, but the “elliptic” part of name comes from the fact that these functions were discovered in the context of arc lengths with ellipses, so it shouldn’t be too surprising to see them here.

In SciPy, F(φ, k) is given by ellipkinc and E(φ, k) is given by ellipeinc. Both function names start with ellip because they are elliptic functions, and end in inc because they are “incomplete.” In the middle, ellipeinc has an “e” because it computes the mathematical function E(φ, k).

But why does ellipkinc have a “k” in the middle? The “complete” elliptic integral of the first kind is K(k) = F(π/2, k). The “k” in the function name is a reminder that we’re computing the incomplete version of the K function.

Here’s the code for computing the surface area of a general ellipsoid:

from math import sin, cos, acos, sqrt, pi
from scipy.special import ellipkinc, ellipeinc

def area(a, b, c):
    phi = acos(c/a)
    k = a*sqrt(b**2 - c**2)/(b*sqrt(a**2 - c**2))
    E = ellipeinc(phi, k)
    F = ellipkinc(phi, k)
    elliptic = E*sin(phi)**2 + F*cos(phi)**2
    return 2.0*pi*c**2 + 2*pi*a*b*elliptic/sin(phi)

The differences between the various approximation radii are small for Earth. See my next post on elliptical galaxies where the differences are much larger.

Related posts:

Example of not inverting a matrix: optimization

People are invariably surprised when they hear it’s hardly ever necessary to invert a matrix. It’s very often necessary solve linear systems of the form Ax = b, but in practice you almost never do this by inverting A. This post will give an example of avoiding matrix inversion. I will explain how the Newton-Conjugate Gradient method works, implemented in SciPy by the function fmin_ncg.

If a matrix A is large and sparse, it may be possible to solve Ax = b but impossible to even store the matrix A-1 because there isn’t enough memory to hold it. Sometimes it’s sufficient to be able to form matrix-vector products Ax. Notice that this doesn’t mean you have to store the matrix A; you have to produce the product Ax as if you had stored the matrix A and multiplied it by x.

Very often there are physical reasons why the matrix A is sparse, i.e. most of its entries are zero and there is an exploitable pattern to the non-zero entries. There may be plenty of memory to store the non-zero elements of A, even though there would not be enough memory to store the entire matrix. Also, it may be possible to compute Ax much faster than it would be if you were to march along the full matrix, multiplying and adding a lot of zeros.

Iterative methods of solving Ax = b, such as the conjugate gradient method, create a sequence of approximations that converge (in theory) to the exact solution. These methods require forming products Ax and updating x as a result. These methods might be very useful for a couple reasons.

  1. You only have to form products of a sparse matrix and a vector.
  2. If don’t need a very accurate solution, you may be able to stop very early.

In Newton’s optimization method, you have to solve a linear system in order to find a search direction. In practice this system is often large and sparse. The ultimate goal of Newton’s method is to minimize a function, not to find perfect search directions. So you can save time by finding only approximately solutions to the problem of finding search directions. Maybe an exact solution would in theory take 100,000 iterations, but you can stop after only 10 iterations! This is the idea behind the Newton-Conjugate Gradient optimization method.

The function scipy.optimize.fmin_ncg can take as an argument a function fhess that computes the Hessian matrix H of the objective function. But more importantly, it lets you provide instead a function fhess_p that computes the product of the H with a vector. You don’t have to supply the actual Hessian matrix because the fmin_ncg method doesn’t need it. It only needs a way to compute matrix-vector products Hx to find approximate Newton search directions.

For more information, see the SciPy documentation for fmin_ncg.

More: Applied linear algebra