Python code for computing distribution parameters from percentiles

Posted on 3 February 2010 by John

A few days ago I wrote a post on finding parameters so that a probability distribution satisfies two percentile conditions. Since then I’ve written Python code to carry out the calculations described in that article and the accompanying technical report.

The article, Finding probability distribution parameters from percentiles, is posted on CodeProject. It comes with Python source code and some commentary, and it shows how SciPy and the functools module make it possible for the code to be very succinct.
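To give a flavor of the idea, here is a minimal sketch. It is not the code from the CodeProject article. It assumes the distribution is used purely as a location-scale family, in which case the two percentile conditions reduce to two linear equations. The function name `location_scale_from_percentiles` and the numbers are made up for illustration.

```python
from functools import partial

from scipy import stats


def location_scale_from_percentiles(dist, x1, p1, x2, p2):
    """Find loc and scale so that P(X <= x1) = p1 and P(X <= x2) = p2.

    Assumes dist is a SciPy continuous distribution with no shape
    parameters, so X = loc + scale * Z where Z is the standard form.
    """
    z1, z2 = dist.ppf(p1), dist.ppf(p2)  # standard quantiles
    scale = (x2 - x1) / (z2 - z1)
    loc = x1 - scale * z1
    return loc, scale


# functools.partial fixes the distribution, leaving only the percentile data.
normal_parameters = partial(location_scale_from_percentiles, stats.norm)

loc, scale = normal_parameters(10.0, 0.05, 20.0, 0.95)
fitted = stats.norm(loc=loc, scale=scale)
print(loc, scale)
print(fitted.cdf(10.0), fitted.cdf(20.0))  # should recover 0.05 and 0.95
```

Distributions with shape parameters generally need a numerical root finder rather than this closed form.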

Probability distribution parameterizations in SciPy

Posted on 3 February 2010 by John

Parameterizations are the bane of statistical software. One of the most common errors is to assume that one software package uses the same parameterization as another package. For example, some packages specify the exponential distribution in terms of the mean but others use the rate.

Python’s SciPy library has a somewhat unusual approach to parameterization with some advantages. SciPy makes every continuous distribution a location-scale family, even those distributions that typically do not have a location or scale parameter. This eliminates, for example, the question of whether an exponential distribution is parameterized by its mean or its rate. There is no mean or rate parameter per se. But there is a scale parameter, which happens to also be the mean.
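A small sketch of what this looks like in practice, with illustrative numbers: an exponential distribution with mean 2 (rate 0.5) is specified through `scale`, and adding a `loc` simply shifts it.

```python
import math

from scipy import stats

mean = 2.0
rate = 1.0 / mean

# SciPy takes neither a mean nor a rate; the scale plays the role of the mean.
dist = stats.expon(scale=mean)
print(dist.mean())

# The density agrees with the textbook rate form: rate * exp(-rate * x).
x = 1.5
pdf_by_rate = rate * math.exp(-rate * x)
print(dist.pdf(x), pdf_by_rate)

# A location parameter shifts the whole distribution, so the mean is loc + scale.
shifted = stats.expon(loc=1.0, scale=mean)
print(shifted.mean())
```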

Some methods on distribution classes have unusual names. For example, the inverse CDF function, often called the quantile function, is ppf for “percentile point function.” The complementary CDF function, or CCDF, is called sf for “survival function.” (Survival function is not an unusual name, though my preference would have been ccdf since that would make the API more symmetric.)
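For example, the following sketch uses a standard normal distribution (an arbitrary choice) to show how `ppf` inverts `cdf` and how `sf` relates to `cdf`. SciPy also provides `isf`, the inverse of `sf`.

```python
from scipy import stats

dist = stats.norm(loc=0.0, scale=1.0)
x = 1.5

p = dist.cdf(x)            # P(X <= x)
x_back = dist.ppf(p)       # quantile function: inverse of cdf
tail = dist.sf(x)          # survival function: P(X > x) = 1 - cdf(x)
x_from_tail = dist.isf(tail)  # inverse survival function

print(p, x_back)
print(tail, 1.0 - p)
print(x_from_tail)
```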

Discrete distributions in SciPy do not have a scale parameter. Also, instead of a pdf method the discrete distributions have a pmf method: continuous distributions have a probability density function, but discrete distributions have a probability mass function.
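A short sketch with a Poisson distribution (the mean of 3 is just an example value) shows the `pmf` method. Discrete distributions do still accept a `loc` argument, which shifts the support.

```python
import math

from scipy import stats

mu = 3.0
dist = stats.poisson(mu)

# Probability mass at k = 2, compared with the formula exp(-mu) * mu**k / k!
k = 2
mass = dist.pmf(k)
by_formula = math.exp(-mu) * mu**k / math.factorial(k)
print(mass, by_formula)

# Discrete distributions expose pmf rather than pdf.
print(hasattr(dist, "pmf"), hasattr(dist, "pdf"))

# loc shifts the support: P(Y = k) for Y = X + 1 equals P(X = k - 1).
shifted_mass = stats.poisson.pmf(k, mu, loc=1)
print(shifted_mass, dist.pmf(k - 1))
```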

One surprise with SciPy distributions is that the SciPy implementation of the lognormal distribution does not correspond to the definition I’m more familiar with unless the location is 0. In order to be consistent with other continuous distributions, SciPy shifts the PDF argument x whereas I believe it is more common to shift log(x). This isn’t just a difference in parameterization. It actually amounts to different distributions.
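The sketch below illustrates the difference, using made-up values for mu, sigma, and the shift. With location 0, SciPy's `lognorm` with shape `s = sigma` and `scale = exp(mu)` matches the usual definition, where log(X) is normal with mean mu and standard deviation sigma. A nonzero location moves the support of X itself, which is not the same as adding a constant to log(X).

```python
import math

from scipy import stats

mu, sigma = 0.5, 0.8

# With loc = 0 this is the familiar lognormal: log(X) ~ Normal(mu, sigma).
usual = stats.lognorm(s=sigma, loc=0.0, scale=math.exp(mu))
x = 2.0
print(usual.cdf(x), stats.norm.cdf((math.log(x) - mu) / sigma))
print(usual.median(), math.exp(mu))

# SciPy's loc shifts x: the support now starts at 2 and the median is 2 + exp(mu).
shift = 2.0
shifted_x = stats.lognorm(s=sigma, loc=shift, scale=math.exp(mu))
print(shifted_x.cdf(shift), shifted_x.median())

# Shifting log(x) instead gives log(X) ~ Normal(mu + shift, sigma),
# which rescales X: the support still starts at 0 and the median is exp(mu + shift).
shifted_log = stats.lognorm(s=sigma, loc=0.0, scale=math.exp(mu + shift))
print(shifted_log.median())
```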

For more details, see these notes on distributions in SciPy. See also these notes on distributions in R and in Mathematica for comparison.

Little programs versus big programs

Posted on 3 February 2010 by John

From You Are Not a Gadget:

Little programs are delightful to write in isolation, but the process of maintaining large-scale software is always miserable. … Technologists wish every program behaved like a brand-new, playful little program, and will use any available psychological strategy to avoid thinking about computers realistically.

Sleep debt and industrial accidents

Posted on 2 February 2010 by John

From The Power of Full Engagement:

… every one of the great industrial disasters of the past twenty years — Chernobyl, the Exxon Valdez, Bhopal, Three Mile Island — occurred in the middle of the night. For the most part, those in charge had worked very long hours and built up considerable sleep debt.

New Python podcast: A little bit of Python

Posted on 1 February 2010 by John

There’s a new Python podcast: A little bit of Python [link rotted] with Michael Foord, Brett Cannon, Jesse Noller, Steve Holden, and Andrew Kuchling.

So far I’ve found the first episode most interesting. It discusses the “moratorium”, the plan to give Python library authors time to catch up with Python 3 before extending the core language further. This sounds like a very smart move.

Updating “PowerShell Day 1” for PowerShell version 2.0

Posted on 1 February 2010 by John

Last year I wrote a little 10-page booklet called PowerShell Day 1. It covers many of the things I wish I had known when I started using PowerShell.

How do I configure PowerShell?
How do I make PowerShell launch faster?
How do I get documentation?
Why did the PowerShell team make some of the design decisions they did?
Once I’ve written some useful functions and scripts, where do I put them?
Where can I find more PowerShell resources?

Now I’ve started updating the booklet to reflect changes in PowerShell version 2.0. I haven’t had a lot of experience with version 2.0 and would appreciate your help updating the booklet. I have put a link to an alpha version of the update for version 2.0 on the download page.
