Posts tagged as:

SciPy

People are invariably surprised when they hear it’s hardly ever necessary to invert a matrix. It’s very often necessary solve linear systems of the form Ax = b, but in practice you almost never do this by inverting A. This post will give an example of avoiding matrix inversion. I will explain how the Newton-Conjugate Gradient method works, implemented in SciPy by the function fmin_ncg.

[click to continue...]

{ 2 comments }

How to compute jinc(x)

by John on February 2, 2012

The function jinc(x) that I wrote about yesterday is almost trivial to implement, but not quite. I’ll explain why it’s not quite as easy as it looks and how one might implement it in C and Python. [click to continue...]

{ 8 comments }

Benford’s law and SciPy

by John on October 19, 2011

Imagine you picked up a dictionary and found that the pages with A’s were dirty and the Z’s were clean. In between there was a gradual transition with the pages becoming cleaner as you progressed through the alphabet. You might conclude that people have been looking up a lot of words that begin with letters near the beginning of the alphabet and not many near the end.

That’s what Simon Newcomb did in 1881, only he was looking at tables of logarithms. He concluded that people were most interested in looking up the logarithms of numbers that began with 1 and progressively less interested in logarithms of numbers beginning with larger digits. This sounds absolutely bizarre, but he was right. The pattern he described has been repeatedly observed and is called Benford’s law. (Benford re-discovered the the same principle in 1938, and per Stigler’s law, Newcomb’s observation was named after Benford.)

Benford’s law predicts that for data sets such as collections of physical constants, about 30% of the numbers will begin with 1 down to about 5% starting with 8 or 9. To be precise, it says the leading digit will be d with probability log10(1 + 1/d). For a good explanation of Benford’s law, see TAOCP volume 2.

A couple days ago I blogged about using SciPy’s collection of physical constants to look for values that were approximately factorials. Let’s look at that set of constants again and see whether the most significant digits of these constants follows Benford’s law.

Here’s a bar chart comparing the actual number of constants starting with each digit to the results we would expect from Benford’s law.

Here’s the code that was used to create the data for the chart.

from math import log10, floor
from scipy.constants import codata

def most_significant_digit(x):
    e = floor(log10(x))
    return int(x*10**-e)

# count how many constants have each leading digit
count = [0]*10
d = codata.physical_constants
for c in d:
    (value, unit, uncertainty) = d[c]
    x = abs(value)
    count[ most_significant_digit(x) ] += 1
total = sum(count)

# expected number of each leading digit per Benford's law
benford = [total*log10(1 + 1./i) for i in range(1, 10)]

The chart itself was produced using matplotlib, starting with this sample code.

The actual counts we see in scipy.constants line up fairly well with the predictions from Benford’s law. The results are much closer to Benford’s prediction than to the uniform distribution that you might have expected before hearing of Benford’s law.

Update: See the next post for an explanation of why factorials also follow Benford’s law.

Related posts:

Physical constants and factorials
Slide rules

{ 6 comments }

Physical constants and factorials

by John on October 17, 2011

The previous post mentioned that Avogadro’s constant is approximately 24!. Are there other physical constants that are nearly factorials?

[click to continue...]

{ 11 comments }

Python for high performance computing

by John on March 21, 2011

William Scullin’s talk from PyCon 2011: Python for high performance computing.

At least in our shop [Argonne National Laboratory] we have three accepted languages for scientific computing. In this order they are C/C++, Fortran in all its dialects, and Python. You’ll notice the absolute and total lack of Ruby, Perl, Java.

If you’re interested in Python and HPC, check out SciPyTip.

{ 8 comments }

Scientific Python on Twitter

by John on February 21, 2011

Next week I’m starting a new daily tip Twitter account: @SciPyTip.

This account will post on things related to scientific computing in Python, including the SciPy library, related software, and scientific computing in general.

Full list of daily tip accounts

{ 2 comments }

Ruby, Python, and Science

by John on November 28, 2010

David Jacobs has written a long blog post Ruby is beautiful (but I’m moving to Python). Here’s my summary.

Ruby is much better than Java, but the Ruby community is too focused on web development and the language has no scientific library. Python has a lot of the same advantages as Ruby, is used for more than web programming, and has SciPy.

Update: There is now a fledgling SciRuby project.

Further reading:

Plain Python
Getting started with SciPy
Replacing Mathematica with Python
SciPy and NumPy for .NET

{ 9 comments }

Bug in SciPy’s erf function

by John on September 2, 2010

Last night I produced the plot below and was very surprised at the jagged spike. I knew the curve should be smooth and strictly increasing.

My first thought was that there must be a numerical accuracy problem in my code, but it turns out there’s a bug in SciPy version 0.8.0b1. I started to report it, but I saw there were similar bug reports and one such report was marked as closed, so presumably the fix will appear in the next release.

The problem is that SciPy’s erf function is inaccurate for arguments with imaginary part near 5.8. For example, Mathematica computes erf(1.0 + 5.7i) as -4.5717×1012 + 1.04767×1012 i. SciPy computes the same value as -4.4370×1012 + 1.3652×1012 i. The imaginary component is off by about 30%.

Here is the code that produced the plot.

from scipy.special import erf
from numpy import linspace, exp
import matplotlib.pyplot as plt

def g(y):
    z = (1 + 1j*y) /  sqrt(2)
    temp = exp(z*z)*(1 - erf(z))
    u, v = temp.real, temp.imag
    return -v / u

x = linspace(0, 10, 101)
plt.plot(x, g(x))

{ 4 comments }

Replacing Mathematica with Python

by John on July 9, 2010

Everything I do regularly in Mathematica can be done in Python. Even though Mathematica has a mind-boggling amount of functionality, I only use a tiny proportion of it. I skimmed through some of my Mathematica files to see what functions I use and then looked for Python counterparts. I found I use less of Mathematica than I imagined.

The core mathematical functions I need are in SciPy. The plotting features are in matplotlib. The SymPy library appears to have the symbolic functionality I need, though I’m as not sure about this one.

I don’t have much experience with the Python libraries listed above. I haven’t used SymPy at all; I’ve only browsed its web site. Maybe I’ll find I’d rather work in Mathematica, particularly when I’m just trying out ideas. But I want to experiment with using Python for more tasks.

As I’ve blogged about before, I’d like to consolidate my tools. I started using Emacs again because I was frustrated with using a different editor for every kind of file. One of the things I find promising about Python is that I may be able to do more in Python and reduce the number of programming languages I use regularly.

Related posts:

Languages that are easy to pick back up
Where the Unix philosophy breaks down

{ 17 comments }

SciPy and NumPy for .NET

by John on July 1, 2010

Travis Oliphant announced this morning at the SciPy 2010 conference that Microsoft is partnering with Enthought to produce a version of NumPy and SciPy for .NET. NumPy and SciPy are Python libraries for scientific computing. Oliphant is the president of Enthought and the original developer of NumPy.

It is possible to call NumPy and SciPy from IronPython now by using IronClad. However, going through IronClad can be inefficient.  The new libraries will enable efficient access to NumPy and SciPy from .NET languages and in particular from IronPython.

Here is the official press release from Enthought.

Photo credit: Paul Ivanov

{ 15 comments }

Using py2exe with SciPy

by John on February 12, 2010

py2exe is a program that takes Python code and produces a Windows executable that can run on computers that do not have Python installed. My focus here is in using py2exe on Python code that depends on SciPy. [click to continue...]

{ 1 comment }

A few days ago I wrote a post on finding parameters so that a probability distribution satisfies two percentile conditions. Since then I’ve written Python code to carry out the calculations described in that article and the accompanying technical report.

The article is Finding probability distribution parameters from percentiles posted on CodeProject. The article comes with Python source code and some commentary. The article shows how SciPy and the functools module make it possible for the code to be very succinct.

Related posts:

Probability distribution parameters in SciPy
Numerical computing in IronPython with Ironclad
Getting started with SciPy

{ 0 comments }

Parameterizations are the bane of statistical software. One of the most common errors is to assume that one software package uses the same parameterization as another package. For example, some packages specify the exponential distribution in terms of the mean but others use the rate. [click to continue...]

{ 4 comments }

CodeProject just published my article
Getting Started with SciPy (Scientific Python)

{ 1 comment }

IEEE floating point arithmetic in Python

by John on July 21, 2009

Sometimes a number is not a number. Numeric data types represent real numbers in a computer fairly well most of the time, but sometimes the abstraction leaks. The sum of two numeric types is always a numeric type, but the result might be a special bit pattern that says overflow occurred. Similarly, the ratio of two numeric types is a numeric type, but that type might be a special type that says the result is not a number.

The IEEE 754 standard dictates how floating point numbers work. I’ve talked about IEEE exceptions in C++ before. This post is the Python counterpart. Python’s floating point types are implemented in terms of C’s double type  and so the C++ notes describe what’s going on at a low level. However, Python creates a higher level abstraction for floating point numbers. (Python also has arbitrary precision integers, which we will discuss at the end of this post.)

There are two kinds of exceptional floating point values: infinities and NaNs. Infinite values are represented by inf and can be positive or negative. A NaN, not a number, is represented by nan. Let x = 10200. Then x2 will overflow because 10400 is too big to fit inside a C double. (To understand just why, see Anatomy of a floating point number.) In the following code, y will contain a positive infinity.

x = 1e200; y = x*x

If you’re running Python 3.0 and you print y, you’ll see inf. If you’re running an earlier version of Python, the result may depend on your operating system. On Windows, you’ll see 1.#INF but on Linux you’ll see inf. Now keep the previous value of y and run the following code.

z = y; z /= y

Since z = y/y, you might think z should be 1. But since y was infinite, it doesn’t work that way. There’s no meaningful way to assign a numeric value to the ratio of infinite values and so z contains a NaN. (You’d have to know “how they got there” so you could take limits.) So if you print z you’d see nan or 1.#IND depending on your version of Python and your operating system.

The way you test for inf and nan values depends on your version of Python. In Python 3.0, you can use the functions math.isinf and math.isnan respectively. Earlier versions of Python do not have these functions. However, the SciPy library has corresponding functions scipy.isinf and scipy.isnan.

What if you want to deliberately create an inf or a nan? In Python 3.0, you can use float('inf') or float('nan'). In earlier versions of Python you can use scipy.inf and scipy.nan if you have SciPy installed.

IronPython does not yet support Python 3.0, nor does it support SciPy directly. However, you can use SciPy with IronPython by using Ironclad from Resolver Systems. If you don’t need a general numerical library but just want functions like isinf and isnan you can create your own.


def isnan(x): return type(x) is float and x != x
def isinf(x): inf = 1e5000; return x == inf or x == -inf

The isnan function above looks odd. Why would x != x ever be true? According to the IEEE standard, NaNs don’t equal anything, even each other. (See comments on the function IsFinite here for more explanation.) The isinf function is really a dirty hack but it works.

To wrap things up, we should talk a little about integers in Python. Although Python floating point numbers are essentially C floating point numbers, Python integers are not C integers. Python integers have arbitrary precision, and so we can sometimes avoid problems with overflow by working with integers. For example, if we had defined x as 10**200 in the example above, x would be an integer and so would y = x*x and y would not overflow; a Python integer can hold 10400 with no problem. We’re OK as long as we keep producing integer results, but we could run into trouble if we do anything that produces a non-integer result. For example,

x = 10**200; y = (x + 0.5)*x

would cause y to be inf, and

x = 10**200; y = x*x + 0.5

would throw an OverflowError exception.

Related posts:

Floating point numbers are a leaky abstraction
Anatomy of a floating point number
Overflow and loss of precision

{ 5 comments }