<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Endeavour &#187; Python</title>
	<atom:link href="http://www.johndcook.com/blog/category/python/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.johndcook.com/blog</link>
	<description>The blog of John D. Cook</description>
	<lastBuildDate>Fri, 10 Feb 2012 23:03:26 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Mixing R, Python, and Perl in 14 lines of code</title>
		<link>http://www.johndcook.com/blog/2012/02/09/mixing-r-python-and-perl-in-13-lines-of-code/</link>
		<comments>http://www.johndcook.com/blog/2012/02/09/mixing-r-python-and-perl-in-13-lines-of-code/#comments</comments>
		<pubDate>Thu, 09 Feb 2012 16:25:39 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[Python]]></category>
		<category><![CDATA[Emacs]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=10703</guid>
		<description><![CDATA[This is a continuation of my previous post, Running Python and R inside Emacs. That post shows how to execute independent code blocks in Emacs org-mode. This post illustrates calling one code block from another, each written in a different language.
The example below computes sin²(x) + cos²(x) by computing the sine function in R, the [...]]]></description>
			<content:encoded><![CDATA[<p>This is a continuation of my previous post, <a href="http://www.johndcook.com/blog/2012/02/09/python-org-mode/">Running Python and R inside Emacs</a>. That post shows how to execute independent code blocks in Emacs org-mode. This post illustrates calling one code block from another, each written in a different language.<span id="more-10703"></span></p>
<p>The example below computes sin<sup>2</sup>(x) + cos<sup>2</sup>(x) by computing the sine function in R, the cosine function in Python, and summing their squares in Perl. As you&#8217;d hope, it returns 1. (Actually, it returns 0.99999999999985 on my machine.)</p>
<p>To execute the code, go to the <code>#+call</code> line and type C-c C-c.</p>
<pre>#+name: sin_r(x=1)
#+begin_src R
sin(x)
#+end_src

#+name: cos_p(x=0)
#+begin_src python
import math
return math.cos(x)
#+end_src

#+name: sum_sq(a = 0, b = 0)
#+begin_src perl
$a*$a + $b*$b;
#+end_src

#+call: sum_sq(sin_r(1), cos_p(1))</pre>
<p>Apparently each function argument has to have a default value. If that&#8217;s documented, I missed it. I gave the sine and cosine functions default values that would cause the call to <code>sum_sq</code> to return more than 1 if the defaults were used.</p>
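<p>For comparison, here is a minimal sketch of the same computation written entirely in Python. Each org-mode block becomes an ordinary function that keeps the block&#8217;s name and default arguments. The names are simply borrowed from the <code>#+name</code> lines above. Computed in a single Python process with double precision throughout, the sum should agree with 1 to within a few units in the last place. The sketch does not model how org-babel passes values from one language to another.</p>

```python
import math

# Stand-ins for the three org-mode blocks, with the same default arguments.
def sin_r(x=1):
    return math.sin(x)

def cos_p(x=0):
    return math.cos(x)

def sum_sq(a=0, b=0):
    return a*a + b*b

# The equivalent of "#+call: sum_sq(sin_r(1), cos_p(1))"
result = sum_sq(sin_r(1), cos_p(1))
print(result)

# Using the defaults gives sin(1)**2 + cos(0)**2, which is clearly more than 1
print(sum_sq(sin_r(), cos_p()))
```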
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2012/02/09/mixing-r-python-and-perl-in-13-lines-of-code/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Running Python and R inside Emacs</title>
		<link>http://www.johndcook.com/blog/2012/02/09/python-org-mode/</link>
		<comments>http://www.johndcook.com/blog/2012/02/09/python-org-mode/#comments</comments>
		<pubDate>Thu, 09 Feb 2012 13:00:58 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[Python]]></category>
		<category><![CDATA[Emacs]]></category>
		<category><![CDATA[Literate programming]]></category>
		<category><![CDATA[Reproducibility]]></category>
		<category><![CDATA[Rstats]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=10685</guid>
		<description><![CDATA[Emacs org-mode lets you manage blocks of source code inside a text file. You can execute these blocks and have the output display in your text file. Or you could export the file, say to HTML or PDF, and show the code and/or the results of executing the code.
Here I&#8217;ll show some of the most [...]]]></description>
			<content:encoded><![CDATA[<p>Emacs org-mode lets you manage blocks of source code inside a text file. You can execute these blocks and have the output display in your text file. Or you could export the file, say to HTML or PDF, and show the code and/or the results of executing the code.</p>
<p>Here I&#8217;ll show some of the most basic possibilities. For much more information, see  <a href="http://orgmode.org/">orgmode.org</a>. And for the use of org-mode in research, see <a href="http://www.jstatsoft.org/v46/i03/paper">A Multi-Language Computing Environment for Literate Programming and Reproducible Research</a>.</p>
<p><span id="more-10685"></span>Source code blocks go between lines of the form</p>
<pre>#+begin_src
#+end_src</pre>
<p>On the <code>#+begin_src</code> line, specify the programming language. Here I&#8217;ll demonstrate Python and R, but org-mode currently supports C++, Java, Perl, etc. for a total of <a href="http://orgmode.org/manual/Languages.html#Languages">35 languages</a>.</p>
<p>Suppose we want to compute &radic;42 using R.</p>
<pre>#+begin_src R
sqrt(42)
#+end_src</pre>
<p>If we put the cursor somewhere in the code block and type C-c C-c, org-mode will add these lines:</p>
<pre>#+results:
: 6.48074069840786</pre>
<p>Now suppose we do the same with Python:</p>
<pre>#+begin_src python
from math import sqrt
sqrt(42)
#+end_src</pre>
<p>This time we get disappointing results:</p>
<pre>#+results:
: None</pre>
<p>What happened? The org-mode manual explains:</p>
<blockquote><p>… code should be written as if it were the body of such a function.  In particular, note that Python does not automatically return a value from a function unless a <code>return</code> statement is present, and so a ‘<code>return</code>’ statement will usually be required in Python.</p></blockquote>
<p>If we change <code>sqrt(42)</code> to <code>return sqrt(42)</code> then we get the same result that we got when using R.</p>
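<p>The manual&#8217;s point can be illustrated in plain Python. This is only an analogy, assuming the block&#8217;s code is used as the body of a function, as the quoted passage describes. It is not org-babel&#8217;s actual wrapper code, and the two function names are made up for the illustration.</p>

```python
from math import sqrt

# The block body as written: the expression is evaluated, then discarded,
# so the function falls off the end and returns None.
def block_without_return():
    sqrt(42)

# The block body with an explicit return: the value is handed back.
def block_with_return():
    return sqrt(42)

print(block_without_return())  # None
print(block_with_return())     # approximately 6.48074069840786
```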
<p>By default, evaluating a block of code returns a single result. If you want to see the output as if you were interactively using Python from the REPL, you can add <code>:results output :session</code> following the language name.</p>
<pre>#+begin_src python :results output :session
print("There are %d hours in a week." % (7*24))
2**10
#+end_src</pre>
<p>This produces the lines</p>
<pre>#+results:
: There are 168 hours in a week.
: 1024</pre>
<p>Without the <code>:session</code> tag, the second line would not appear because there was no <code>print</code> statement.</p>
<p>I had to do a couple of things before I could get the examples above to work. First, I had to upgrade org-mode. The version of org-mode that shipped with Emacs 23.3 was quite out of date. Second, the only language you can run by default is Emacs Lisp. You have to turn on support for other languages in your <code>.emacs</code> file. Here&#8217;s the code to turn on support for Python and R.</p>
<pre>(org-babel-do-load-languages
    'org-babel-load-languages '((python . t) (R . t)))</pre>
<p><strong>Update</strong>: My <a href="http://www.johndcook.com/blog/2012/02/09/mixing-r-python-and-perl-in-13-lines-of-code/">next post</a> shows how to call code written in one language from code written in another language.</p>
<p><strong>Related posts</strong>:</p>
<p><a href="http://www.johndcook.com/blog/2011/04/15/personal-organization-software/">Personal organization software</a><br />
<a href="http://www.johndcook.com/blog/2008/04/29/preventing-an-unpleasant-sweave-surprise/">Preventing an unpleasant Sweave surprise</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2012/02/09/python-org-mode/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Example of not inverting a matrix: optimization</title>
		<link>http://www.johndcook.com/blog/2012/02/08/newton-conjugate-gradient/</link>
		<comments>http://www.johndcook.com/blog/2012/02/08/newton-conjugate-gradient/#comments</comments>
		<pubDate>Wed, 08 Feb 2012 13:00:56 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[Python]]></category>
		<category><![CDATA[Optimization]]></category>
		<category><![CDATA[SciPy]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=10461</guid>
		<description><![CDATA[People are invariably surprised when they hear it&#8217;s hardly ever necessary to invert a matrix. It&#8217;s very often necessary to solve linear systems of the form Ax = b, but in practice you almost never do this by inverting A. This post will give an example of avoiding matrix inversion. I will explain how the Newton-Conjugate [...]]]></description>
			<content:encoded><![CDATA[<p>People are invariably surprised when they hear it&#8217;s <a href="http://www.johndcook.com/blog/2010/01/19/dont-invert-that-matrix/">hardly ever necessary to invert a matrix</a>. It&#8217;s very often necessary to solve linear systems of the form <em>Ax</em> = <em>b</em>, but in practice you almost never do this by inverting <em>A</em>. This post will give an example of avoiding matrix inversion. I will explain how the Newton-Conjugate Gradient method works, implemented in SciPy by the function <code>fmin_ncg</code>.</p>
<p><span id="more-10461"></span></p>
<p>If a matrix <em>A</em> is large and sparse, it may be possible to solve <em>Ax</em> = <em>b</em> but impossible to even store the matrix <em>A</em><sup>-1</sup> because there isn&#8217;t enough memory to hold it. Sometimes it&#8217;s sufficient to be able to form matrix-vector products <em>Ax</em>. Notice that this doesn&#8217;t mean you have to store the matrix <em>A</em>; you have to produce the product <em>Ax</em> <span style="text-decoration: underline;">as if</span> you had stored the matrix <em>A</em> and multiplied it by <em>x</em>.</p>
<p>Very often there are physical reasons why the matrix <em>A</em> is sparse, i.e. most of its entries are zero and there is an exploitable pattern to the non-zero entries. There may be plenty of memory to store the non-zero elements of <em>A</em>, even though there would not be enough memory to store the entire matrix. Also, it may be possible to compute <em>Ax</em> much faster than it would be if you were to march along the full matrix, multiplying and adding a lot of zeros.</p>
<p>Iterative methods of solving <em>Ax</em> = <em>b</em>, such as the conjugate gradient method, create a sequence of approximations that converge (in theory) to the exact solution. These methods require forming products <em>Ax</em> and updating <em>x</em> as a result. These methods might be very useful for a couple of reasons.</p>
<ol>
<li>You only have to form products of a sparse matrix and a vector.</li>
<li>If you don&#8217;t need a very accurate solution, you may be able to stop very early.</li>
</ol>
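<p>Here is a minimal pure-Python sketch of both points. It is a textbook conjugate gradient iteration, not SciPy&#8217;s implementation, and it only ever calls a function <code>matvec(v)</code> that returns <em>Av</em>. The test matrix is an assumption chosen for illustration: the tridiagonal matrix with 2 on the diagonal and -1 next to it, whose products are computed from that pattern without ever storing the matrix. The <code>max_iter</code> argument allows stopping early with a rough answer.</p>

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def tridiag_matvec(v):
    """Return A v for the n x n matrix with 2 on the diagonal and -1 on the
    two neighboring diagonals. Only the nonzero pattern is used; A is never stored."""
    n = len(v)
    out = []
    for i in range(n):
        left = v[i - 1] if i > 0 else 0.0
        right = v[i + 1] if i + 1 < n else 0.0
        out.append(2.0 * v[i] - left - right)
    return out

def conjugate_gradient(matvec, b, tol=1e-10, max_iter=None):
    """Approximately solve A x = b for symmetric positive definite A,
    using only matrix-vector products. Stops when ||b - A x|| <= tol * ||b||
    (as tracked by the updated residual) or after max_iter iterations."""
    n = len(b)
    if max_iter is None:
        max_iter = 10 * n          # generous cap to allow for rounding error
    x = [0.0] * n
    r = list(b)                    # residual b - A x for the starting guess x = 0
    p = list(r)                    # first search direction
    rr = dot(r, r)
    stop = (tol ** 2) * dot(b, b)  # compare squared norms
    for _ in range(max_iter):
        if rr <= stop:
            break
        Ap = matvec(p)
        alpha = rr / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x

n = 50
b = [1.0] * n
x = conjugate_gradient(tridiag_matvec, b)                    # converged solution
x_rough = conjugate_gradient(tridiag_matvec, b, max_iter=5)  # stopped very early
```

<p>The solver never sees <em>A</em> itself, only <code>tridiag_matvec</code>, so the same code would work with any function that can produce matrix-vector products for a symmetric positive definite matrix.</p>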
<p>In Newton&#8217;s optimization method, you have to solve a linear system in order to find a search direction. In practice this system is often large and sparse. The ultimate goal of Newton&#8217;s method is to minimize a function, not to find perfect search directions. So you can save time by finding only approximate solutions to the linear systems that determine the search directions. Maybe an exact solution would in theory take 100,000 iterations, but you can stop after only 10 iterations! This is the idea behind the Newton-Conjugate Gradient optimization method.</p>
<p>The function <code>scipy.optimize.fmin_ncg</code> can take as an argument a function <code>fhess</code> that computes the Hessian matrix <em>H</em> of the objective function. But more importantly, it lets you provide instead a function <code>fhess_p</code> that computes the product of <em>H</em> with a vector. You don&#8217;t have to supply the actual Hessian matrix because <code>fmin_ncg</code> doesn&#8217;t need it. It only needs a way to compute matrix-vector products <em>Hx</em> to find approximate Newton search directions.</p>
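<p>A minimal usage sketch, assuming a reasonably recent SciPy. It minimizes SciPy&#8217;s built-in Rosenbrock test function <code>rosen</code>, passing its gradient <code>rosen_der</code> and, instead of a Hessian, the function <code>rosen_hess_prod</code>, which returns the product of the Hessian with a vector. The starting point is arbitrary. In newer SciPy versions the same method is also available through <code>scipy.optimize.minimize</code> with <code>method='Newton-CG'</code> and a <code>hessp</code> argument.</p>

```python
import numpy as np
from scipy.optimize import fmin_ncg, rosen, rosen_der, rosen_hess_prod

x0 = np.array([1.3, 0.7, 0.8, 1.9, 1.2])   # arbitrary starting point

# fhess_p(x, p) must return H(x) p; the full Hessian is never passed in.
xopt = fmin_ncg(rosen, x0, fprime=rosen_der, fhess_p=rosen_hess_prod,
                avextol=1e-8, disp=False)
print(xopt)   # the Rosenbrock function's minimum is at a vector of all ones
```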
<p>For more information, see the <a href="http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fmin_ncg.html#scipy.optimize.fmin_ncg">SciPy documentation</a> for <code>fmin_ncg</code>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2012/02/08/newton-conjugate-gradient/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>How to compute jinc(x)</title>
		<link>http://www.johndcook.com/blog/2012/02/02/how-to-compute-jincx/</link>
		<comments>http://www.johndcook.com/blog/2012/02/02/how-to-compute-jincx/#comments</comments>
		<pubDate>Thu, 02 Feb 2012 16:01:04 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[Python]]></category>
		<category><![CDATA[SciPy]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=10607</guid>
		<description><![CDATA[The function jinc(x) that I wrote about yesterday is almost trivial to implement, but not quite. I&#8217;ll explain why it&#8217;s not quite as easy as it looks and how one might implement it in C and Python.
The function jinc(x) is defined as J1(x) / x, so if you have code to compute J1 then it [...]]]></description>
			<content:encoded><![CDATA[<p>The function <a href="http://www.johndcook.com/blog/2012/02/01/jinc-function/">jinc(x)</a> that I wrote about yesterday is almost trivial to implement, but not quite. I&#8217;ll explain why it&#8217;s not quite as easy as it looks and how one might implement it in C and Python.<span id="more-10607"></span></p>
<p>The function jinc(x) is defined as J<sub>1</sub>(x) / x, so if you have code to compute J<sub>1</sub> then it ought to be a no-brainer. For example, why not use the following C code?</p>
<pre class="brush: plain; title: ; notranslate">
#include &lt;math.h&gt;
double jinc(double x) {
    return j1(x) / x;
}
</pre>
<p>The problem is that if you pass in 0, the code will divide by 0 and return a NaN. The function jinc(x) is defined to be 1/2 at x = 0 because that&#8217;s the limit of J<sub>1</sub>(x) / x as x goes to 0. So we try again:</p>
<pre class="brush: plain; title: ; notranslate">
#include &lt;math.h&gt;
double jinc(double x) {
    return (x == 0.0) ? 0.5 : j1(x) / x;
}
</pre>
<p>Does that work? Technically, it could still fail — we&#8217;ll come back to that at the end — but we&#8217;ll assume for now that it&#8217;s OK.</p>
<p>We could write the analogous Python code, and it would be adequate as long as we&#8217;re only calling the function with scalars and not NumPy arrays.</p>
<pre class="brush: plain; title: ; notranslate">
from scipy.special import j1
def jinc(x):
    if x == 0.0:
        return 0.5
    return j1(x) / x
</pre>
<p>Now suppose you want to plot this function. You create an array of points, say</p>
<pre class="brush: plain; title: ; notranslate">import numpy as np
x = np.linspace(-1, 1, 25)</pre>
<p>and plot <code>jinc(x)</code>. You&#8217;ll get an error: &#8220;ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all().&#8221; Incidentally, if we called <code>linspace</code> with an even integer in the last argument, our array of points would avoid zero and the naive implementation <code>j1(x) / x</code>, without the test for zero, would work.</p>
<p>When Python tries to apply <code>jinc</code> to an array, the comparison <code>x == 0.0</code> produces an array of booleans, and <code>if</code> doesn&#8217;t know how to reduce that array to a single truth value. The error message is asking, in effect, &#8220;Do you mean if any component of x is 0? Or if all components of x are 0?&#8221; Neither option is what we want. We want to apply <code>jinc</code> as written to each element of x. We could do this by calling NumPy&#8217;s <code>vectorize</code> function.</p>
<pre class="brush: plain; title: ; notranslate">jinc = np.vectorize(jinc)</pre>
<p>This replaces our original <code>jinc</code> function with one that handles NumPy arrays correctly.</p>
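<p>Another option, sketched below under the assumption that NumPy and SciPy are available, is to work on the whole array at once rather than calling the Python function once per element, which is essentially what <code>vectorize</code> does. The name <code>jinc_array</code> is made up for this illustration. It mirrors the scalar code&#8217;s exact test for zero: <code>np.divide</code> with a <code>where</code> mask performs the division only at nonzero points and leaves the preset value 1/2 everywhere else, so 0/0 is never computed.</p>

```python
import numpy as np
from scipy.special import j1

def jinc_array(x):
    """jinc(x) = J1(x)/x elementwise, with jinc(0) = 1/2."""
    x = np.asarray(x, dtype=float)
    out = np.full(x.shape, 0.5)                   # the value at x == 0
    np.divide(j1(x), x, out=out, where=(x != 0))  # divide only where x is nonzero
    return out

x = np.linspace(-1, 1, 25)
y = jinc_array(x)
```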
<p>There is an extremely unlikely scenario in which the code above could fail. The value of J<sub>1</sub>(x) is approximately x/2 for small values of x. If the floating point value <code>x</code> is so small that <code>0.5*x</code> returns 0, our function will return 0, even though it should return 0.5. The C code above works for values of <code>x</code> as small as <code>DBL_MIN</code> and even values much smaller. (<code>DBL_MIN</code> is not the smallest value of a <code>double</code>, only the smallest <em>normalized</em> double.) But if you set</p>
<pre class="brush: plain; title: ; notranslate">x = DBL_MIN / pow(2.0, 52);</pre>
<p>then <code>jinc(x)</code> will return 0. If you want to be absolutely safe, you could change the implementation to</p>
<pre class="brush: plain; title: ; notranslate">
#include &lt;math.h&gt;
double jinc(double x) {
    return (fabs(x) &lt; 1e-8) ? 0.5 : j1(x) / x;
}
</pre>
<p>Why test for whether the absolute value is less than 10<sup>-8</sup> rather than a much smaller number? For small x, the error in approximating jinc(x) with 1/2 is on the order of x<sup>2</sup>/16. So for x as large as 10<sup>-8</sup>, the approximation error is below the resolution of a <code>double</code>. As a bonus, the function <code>jinc(x)</code> will be more efficient for |x| &lt; 10<sup>-8</sup> since it avoids a call to <code>j1</code>.</p>
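<p>The two floating point facts used above can be checked quickly in Python, assuming Python&#8217;s <code>float</code> is the same IEEE double as C&#8217;s <code>double</code>, as it is on common platforms. Python exposes <code>DBL_MIN</code> as <code>sys.float_info.min</code>. This checks the arithmetic only, not any particular implementation of <code>j1</code>.</p>

```python
import sys

dbl_min = sys.float_info.min     # DBL_MIN, the smallest normalized double
x = dbl_min / 2.0**52            # the smallest positive (subnormal) double, 2**-1074
print(x > 0.0)                   # True: x itself is representable
print(0.5 * x)                   # 0.0: half of x underflows to zero

# At x = 1e-8, dropping the x**2/16 term changes nothing in double precision.
print(0.5 - (1e-8)**2 / 16 == 0.5)   # True
```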
<p><strong>Related posts</strong>:</p>
<p><a href="http://www.johndcook.com/blog/2012/02/01/jinc-function/">Jinc function</a><br />
<a href="http://www.johndcook.com/blog/2010/07/27/sine-approximation-for-small-x/">Sine approximation for small angles</a><br />
<a href="http://www.johndcook.com/blog/2010/06/07/math-library-functions-that-seem-unnecessary/">Functions in math.h that seem unnecessary</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2012/02/02/how-to-compute-jincx/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>The Python ecosystem</title>
		<link>http://www.johndcook.com/blog/2011/12/07/python-ecosystem/</link>
		<comments>http://www.johndcook.com/blog/2011/12/07/python-ecosystem/#comments</comments>
		<pubDate>Thu, 08 Dec 2011 04:06:02 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=10145</guid>
		<description><![CDATA[The hard part about getting started with Python is not the language but the ecosystem. It&#8217;s easy to find good references on the Python language, but it&#8217;s harder to learn what packages are available, how to install them, etc. That was my experience, and Mir Nazim started with a similar observation in his article Python [...]]]></description>
			<content:encoded><![CDATA[<p>The hard part about getting started with Python is not the language but the ecosystem. It&#8217;s easy to find good references on the Python language, but it&#8217;s harder to learn what packages are available, how to install them, etc. That was my experience, and Mir Nazim started with a similar observation in his article <a href="http://mirnazim.org/writings/python-ecosystem-introduction/">Python Ecosystem: An Introduction</a>.</p>
<p>Maybe it&#8217;s always harder to learn a language&#8217;s ecosystem than the language itself. But I think this was the case for me with Python more than with other languages I&#8217;ve used. I wish I&#8217;d found Nazim&#8217;s article or something like it when I was learning Python.</p>
<p><strong>Related links</strong>:</p>
<p><a href="http://www.codeproject.com/KB/library/scipy.aspx">Getting started with SciPy</a><br />
<a href="http://www.johndcook.com/blog/2011/07/12/sage-beginners-guide/">Sage beginner&#8217;s guide</a><br />
<a href="http://www.johndcook.com/blog/2011/10/26/python-is-a-voluntary-language/">Python is a voluntary language</a><br />
<a href="https://twitter.com/#SciPyTip">SciPyTip on Twitter</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2011/12/07/python-ecosystem/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
		<item>
		<title>Python is a voluntary language</title>
		<link>http://www.johndcook.com/blog/2011/10/26/python-is-a-voluntary-language/</link>
		<comments>http://www.johndcook.com/blog/2011/10/26/python-is-a-voluntary-language/#comments</comments>
		<pubDate>Wed, 26 Oct 2011 22:01:41 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[Python]]></category>
		<category><![CDATA[Software development]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=9658</guid>
		<description><![CDATA[People who write Python choose to write Python.
I don&#8217;t hear people say &#8220;I use Python at work because I have to, but I&#8217;d rather be writing Java.&#8221; But often I do hear people say they&#8217;d like to use Python if their job would allow it. There must be someone out there writing Python who would [...]]]></description>
			<content:encoded><![CDATA[<p>People who write Python choose to write Python.</p>
<p>I don&#8217;t hear people say &#8220;I use Python at work because I have to, but I&#8217;d rather be writing Java.&#8221; But often I do hear people say they&#8217;d like to use Python if their job would allow it. There must be someone out there writing Python who would rather not, but I think that&#8217;s more common with other languages.</p>
<p>My point isn&#8217;t that everyone loves Python, but rather that those who don&#8217;t care for Python simply don&#8217;t write it.</p>
<p>Since Python isn&#8217;t a common choice for <a href="http://www.johndcook.com/blog/2008/02/14/enterprising-software/">enterprise software projects</a>, it can resist the pressure to be all things to all people. Having a &#8220;Benevolent Dictator for Life&#8221; also helps Python maintain <a href="http://www.johndcook.com/blog/2008/03/18/conceptual-integrity/">conceptual integrity</a>. Python is popular enough to have a critical mass of users, but not so popular that it is under pressure to lose its uniqueness.</p>
<p>I don&#8217;t know much about the Ruby world, but I wonder whether the increasing popularity of Ruby for web development has created pressure for Ruby to compromise its original philosophy. And I wonder whether Ruby&#8217;s creator Yukihiro Matsumoto has &#8220;dictatorial&#8221; control over his language analogous to the control Guido van Rossum has over Python.</p>
<p><strong>Related posts</strong>:</p>
<p><a href="http://www.johndcook.com/blog/2009/05/08/plain-python/">Plain Python</a><br />
<a href="http://www.johndcook.com/blog/2010/11/28/ruby-python-and-science/">Ruby, Python, and science</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2011/10/26/python-is-a-voluntary-language/feed/</wfw:commentRss>
		<slash:comments>28</slash:comments>
		</item>
		<item>
		<title>Leading digits of factorials</title>
		<link>http://www.johndcook.com/blog/2011/10/19/leading-digits-of-factorials/</link>
		<comments>http://www.johndcook.com/blog/2011/10/19/leading-digits-of-factorials/#comments</comments>
		<pubDate>Wed, 19 Oct 2011 14:04:24 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[Python]]></category>
		<category><![CDATA[Bayesian]]></category>
		<category><![CDATA[Math]]></category>
		<category><![CDATA[Number theory]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=9734</guid>
		<description><![CDATA[Suppose you take factorials of a lot of numbers and look at the leading digit of each result. You could argue that there&#8217;s no apparent reason that any digit would be more common than any other, so you&#8217;d expect each of the digits 1 through 9 would come up 1/9 of the time. Sounds plausible, [...]]]></description>
			<content:encoded><![CDATA[<p>Suppose you take factorials of a lot of numbers and look at the leading digit of each result. You could argue that there&#8217;s no apparent reason that any digit would be more common than any other, so you&#8217;d expect each of the digits 1 through 9 would come up 1/9 of the time. Sounds plausible, but it&#8217;s wrong.</p>
<p>The leading digits of factorials follow Benford&#8217;s law as described in the <a href="http://www.johndcook.com/blog/2011/10/19/benfords-law-and-scipy/">previous post</a>. In fact, factorials follow Benford&#8217;s law even better than physical constants do. Here&#8217;s a graph of the leading digits of the factorials of 1 through 500.</p>
<p style="text-align:center"><img src="http://www.johndcook.com/factorial_leading_digits.png" alt="" width="400" height="271" /></p>
<p>In the remainder of this post, I&#8217;ll explain why Benford&#8217;s law should apply to factorials, make an aside on statistics, and point out an interesting feature of the Python code used to generate the chart above.</p>
<p><strong>Why Benford&#8217;s law applies</strong></p>
<p>Here&#8217;s a hand-waving explanation. One way to justify Benford&#8217;s law is to say that physical constants are uniformly distributed, but on a logarithmic scale. The same is true for factorials, and it&#8217;s easier to see why.</p>
<p>The leading digit of a number depends only on the fractional part of its logarithm in base 10. The gamma function extends the factorial function, and it is log-convex. The logarithm of the gamma function is gently curving, nearly linear over short stretches (see plot <a href="http://www.johndcook.com/blog/2009/01/13/the-gamma-function/">here</a>), and so the fractional parts of the logarithms of the factorials are spread roughly uniformly. That is, the factorials are roughly uniformly distributed on a logarithmic scale. (I&#8217;ve mixed logs base 10 and natural logs here, but that doesn&#8217;t matter. All logarithms are the same up to a multiplicative constant. So if a plot is nearly linear on a log10 scale, it&#8217;s nearly linear on a natural log scale.)</p>
<p>Update: <a href="http://www.johndcook.com/blog/2011/10/19/leading-digits-of-factorials/#comment-108994">Graham</a> gives a <a href="http://web.williams.edu/go/math/sjmiller/public_html/BrownClasses/197/benford/Diaconis_DistrLeadingDigitsAndUnifDistrMod1.pdf">link</a> in the comments below to a paper proving that factorials satisfy Benford&#8217;s law exactly in the limit.</p>
<p><strong>Uniform on what scale?</strong></p>
<p>This example brings up an important principle in statistics. Some say that if you don&#8217;t have a reason to assume anything else, use a uniform distribution. For example, some say that a uniform prior is the ideal uninformative prior for Bayesian statistics. But you have to ask &#8220;Uniform on what scale?&#8221; It turns out that the leading digits of physical constants and factorials are indeed uniformly distributed, but on a logarithmic scale.</p>
<p><strong>Python integers and floating point</strong></p>
<p>I used nearly the same code to produce the chart above as I used in its counterpart in the previous post. However, one thing had to change: I couldn&#8217;t compute the leading digits of the factorials the same way. Python has arbitrary-precision integers, so I can compute 500! without overflowing. Using floating point numbers, I could only go up to 170!. But when I used my previous code to find the leading digit, it first tried to apply <code>log10</code> to an integer larger than the largest representable floating point number and failed. Converting numbers such as 500! to floating point numbers will overflow. (See <a href="http://www.johndcook.com/blog/2009/04/06/numbers-are-a-leaky-abstraction/">Floating point numbers are a leaky abstraction</a>.)</p>
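<p>The overflow is easy to see directly. A minimal sketch (the helper name <code>fits_in_float</code> is made up for this illustration): Python computes the factorials exactly as integers, but only values up to 170! survive conversion to a floating point number.</p>

```python
import math

def fits_in_float(n):
    """True if the integer n can be converted to a float without overflow."""
    try:
        float(n)
        return True
    except OverflowError:
        return False

print(fits_in_float(math.factorial(170)))   # True: about 7.3e306
print(fits_in_float(math.factorial(171)))   # False: larger than the largest double
print(fits_in_float(math.factorial(500)))   # False, though 500! itself is exact
```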
<p>The solution was to find the leading digit using only integer operations.</p>
<pre class="brush: python; title: ; notranslate">
def leading_digit_int(n):
    while n &gt; 9:
        n = n // 10
    return n
</pre>
<p>This code works fine for numbers like 500! or even larger.</p>
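<p>For completeness, here is one way the counts behind the chart might be tallied with the integer-only function. The post doesn&#8217;t show this part of its code, so the loop below is an assumption. It also compares the counts with Benford&#8217;s expected numbers, 500 log<sub>10</sub>(1 + 1/<em>d</em>). It uses <code>//</code>, which keeps <code>n</code> an integer in Python 3, where <code>/</code> always produces a float.</p>

```python
from math import log10

def leading_digit_int(n):
    # Floor division keeps n an integer, so no float conversion (or overflow) occurs.
    while n > 9:
        n //= 10
    return n

N = 500
counts = [0] * 10
f = 1
for k in range(1, N + 1):
    f *= k                               # f is now k!
    counts[leading_digit_int(f)] += 1

for d in range(1, 10):
    expected = N * log10(1 + 1.0 / d)    # Benford's law
    print(d, counts[d], round(expected, 1))
```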
<p><strong>Related posts</strong>:</p>
<p><a href="http://www.johndcook.com/blog/2011/10/19/benfords-law-and-scipy/">Benford&#8217;s law and SciPy</a><br />
<a href="http://www.johndcook.com/blog/2011/10/17/physical-constants-factorials/">Physical constants and factorials</a><br />
<a href="http://www.johndcook.com/blog/2009/04/06/anatomy-of-a-floating-point-number/">Anatomy of a floating point number</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2011/10/19/leading-digits-of-factorials/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
		<item>
		<title>Benford&#8217;s law and SciPy</title>
		<link>http://www.johndcook.com/blog/2011/10/19/benfords-law-and-scipy/</link>
		<comments>http://www.johndcook.com/blog/2011/10/19/benfords-law-and-scipy/#comments</comments>
		<pubDate>Wed, 19 Oct 2011 11:54:00 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[Python]]></category>
		<category><![CDATA[SciPy]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=9725</guid>
		<description><![CDATA[Imagine you picked up a dictionary and found that the pages with A&#8217;s were dirty and the Z&#8217;s were clean. In between there was a gradual transition with the pages becoming cleaner as you progressed through the alphabet. You might conclude that people have been looking up a lot of words that begin with letters [...]]]></description>
			<content:encoded><![CDATA[<p>Imagine you picked up a dictionary and found that the pages with A&#8217;s were dirty and the Z&#8217;s were clean. In between there was a gradual transition with the pages becoming cleaner as you progressed through the alphabet. You might conclude that people have been looking up a lot of words that begin with letters near the beginning of the alphabet and not many near the end.</p>
<p>That&#8217;s what Simon Newcomb did in 1881, only he was looking at tables of logarithms. He concluded that people were most interested in looking up the logarithms of numbers that began with 1 and progressively less interested in logarithms of numbers beginning with larger digits. This sounds absolutely bizarre, but he was right. The pattern he described has been repeatedly observed and is called <a href="http://en.wikipedia.org/wiki/Benford%27s_law">Benford&#8217;s law</a>. (Benford re-discovered the same principle in 1938, and per <a href="http://en.wikipedia.org/wiki/Stigler%27s_law_of_eponymy">Stigler&#8217;s law</a>, Newcomb&#8217;s observation was named after Benford.)</p>
<p>Benford&#8217;s law predicts that for data sets such as collections of physical constants, about 30% of the numbers will begin with 1 down to about 5% starting with 8 or 9. To be precise, it says the leading digit will be <em>d</em> with probability log<sub>10</sub>(1 + 1/d). For a good explanation of Benford&#8217;s law, see <a href="http://www.amazon.com/gp/product/0201896842/ref=as_li_ss_tl?ie=UTF8&amp;tag=theende-20&amp;linkCode=as2&amp;camp=217145&amp;creative=399369&amp;creativeASIN=0201896842">TAOCP volume 2</a>.</p>
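<p>The formula is easy to tabulate. In the short sketch below, the probabilities come straight from log<sub>10</sub>(1 + 1/<em>d</em>), and they add up to 1 because the sum telescopes: log<sub>10</sub>(2/1) + log<sub>10</sub>(3/2) + ... + log<sub>10</sub>(10/9) = log<sub>10</sub>(10) = 1.</p>

```python
from math import log10

# Benford's law: P(leading digit is d) = log10(1 + 1/d) for d = 1, ..., 9
benford = {d: log10(1 + 1.0 / d) for d in range(1, 10)}
for d, p in sorted(benford.items()):
    print(d, "%.3f" % p)        # 1 -> 0.301, ..., 8 -> 0.051, 9 -> 0.046

print(sum(benford.values()))    # 1, up to rounding
```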
<p><a href="http://www.johndcook.com/blog/2011/10/17/physical-constants-factorials/">A couple days ago</a> I blogged about using SciPy&#8217;s collection of physical constants to look for values that were approximately factorials. Let&#8217;s look at that set of constants again and see whether the most significant digits of these constants follow Benford&#8217;s law.</p>
<p>Here&#8217;s a bar chart comparing the actual number of constants starting with each digit to the results we would expect from Benford&#8217;s law.</p>
<p style="text-align:center"><img src="http://www.johndcook.com/benford_scipy.png" alt="" width="400" height="271" /></p>
<p>Here&#8217;s the code that was used to create the data for the chart.</p>
<pre class="brush: python; title: ; notranslate">
from math import log10, floor
from scipy.constants import codata

def most_significant_digit(x):
    # Leading decimal digit of a positive number x
    e = floor(log10(x))
    return int(x*10**-e)

# count[k] = number of constants whose leading digit is k
# (count[0] stays 0)
count = [0]*10
d = codata.physical_constants
for c in d:
    (value, unit, uncertainty) = d[c]
    x = abs(value)
    if x == 0:
        continue  # log10(0) is undefined
    count[ most_significant_digit(x) ] += 1
total = sum(count)

# expected number of each leading digit per Benford's law;
# benford[i-1] corresponds to leading digit i
benford = [total*log10(1 + 1./i) for i in range(1, 10)]
</pre>
<p>The chart itself was produced using <code>matplotlib</code>, starting with this <a href="http://matplotlib.sourceforge.net/users/screenshots.html#bar-charts">sample code</a>.</p>
<p>The actual counts we see in <code>scipy.constants</code> line up fairly well with the predictions from Benford&#8217;s law. The results are much closer to Benford&#8217;s prediction than to the uniform distribution that you might have expected before hearing of Benford&#8217;s law.</p>
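<p>&#8220;Fairly well&#8221; can be made a little more precise. One option, not used in the original post, is Pearson&#8217;s chi-square statistic, which sums (observed - expected)<sup>2</sup>/expected over the nine digits; a smaller value means a closer fit. The sketch below assumes you pass it <code>count[1:]</code> from the code above (so index 0 lines up with digit 1) and compares it against both the Benford and the uniform expectations. The helper names are hypothetical, and no result for SciPy&#8217;s constants is claimed here.</p>

```python
from math import log10

def chi_square(observed, expected):
    """Pearson's statistic: sum of (O - E)**2 / E over matching cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def benford_expected(total):
    """Expected counts for leading digits 1..9 under Benford's law."""
    return [total * log10(1 + 1 / d) for d in range(1, 10)]

def uniform_expected(total):
    """Expected counts if leading digits 1..9 were equally likely."""
    return [total / 9] * 9

# With count and total from the code above (hypothetical usage):
# observed = count[1:]
# print(chi_square(observed, benford_expected(total)))
# print(chi_square(observed, uniform_expected(total)))
```

<p>SciPy&#8217;s <code>scipy.stats.chisquare</code> computes the same statistic together with a p-value, if you want more than a side-by-side comparison.</p>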
<p><strong>Update</strong>: See the <a href="http://www.johndcook.com/blog/2011/10/19/leading-digits-of-factorials/">next post</a> for an explanation of why factorials also follow Benford&#8217;s law.</p>
<p><strong>Related posts</strong>:</p>
<p><a href="http://www.johndcook.com/blog/2011/10/17/physical-constants-factorials/">Physical constants and factorials</a><br />
<a href="http://www.johndcook.com/blog/2011/04/11/sliderules/">Slide rules</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2011/10/19/benfords-law-and-scipy/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Physical constants and factorials</title>
		<link>http://www.johndcook.com/blog/2011/10/17/physical-constants-factorials/</link>
		<comments>http://www.johndcook.com/blog/2011/10/17/physical-constants-factorials/#comments</comments>
		<pubDate>Mon, 17 Oct 2011 12:50:43 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[Python]]></category>
		<category><![CDATA[Science]]></category>
		<category><![CDATA[SciPy]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=9670</guid>
		<description><![CDATA[The previous post mentioned that Avogadro&#8217;s constant is approximately 24!. Are there other physical constants that are nearly factorials?

I searched SciPy&#8217;s collection of physical constants looking for values that are either nearly factorials or nearly reciprocals of factorials.
The best example is the &#8220;classical electron radius&#8221; r_e, which is 2.818 × 10^-15 m, and 1/17! = [...]]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://www.johndcook.com/blog/2011/10/17/avogadros-number/">previous post</a> mentioned that Avogadro&#8217;s constant is approximately 24!. Are there other physical constants that are nearly factorials?</p>
<p><span id="more-9670"></span></p>
<p>I searched SciPy&#8217;s collection of physical constants looking for values that are either nearly factorials or nearly reciprocals of factorials.</p>
<p>The best example is the &#8220;classical electron radius&#8221; <em>r</em><sub>e</sub> which is 2.818 × 10<sup>-15</sup> m and 1/17! = 2.811 × 10<sup>-15</sup>.</p>
<p>Also, the &#8220;Hartree-Hertz relationship&#8221; E<sub><em>h</em></sub>/<em>h</em> equals 6.58 × 10<sup>15</sup> Hz and 18! = 6.4 × 10<sup>15</sup>. (E<sub><em>h</em></sub> is the <a href="http://en.wikipedia.org/wiki/Hartree_energy">Hartree energy</a> and <em>h</em> is Planck&#8217;s constant.)</p>
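<p>The arithmetic behind these two matches is easy to check with Python&#8217;s standard library. The constants below are the rounded values quoted in the text, not fresh CODATA values.</p>

```python
from math import factorial

# Rounded values quoted in the text above.
classical_electron_radius = 2.818e-15  # r_e, meters
hartree_frequency = 6.58e15            # E_h / h, Hz

reciprocal_17 = 1 / factorial(17)      # about 2.811e-15
fact_18 = factorial(18)                # about 6.4e15

ratio_re = classical_electron_radius / reciprocal_17
ratio_hartree = hartree_frequency / fact_18

print(f"1/17! = {reciprocal_17:.4e}, r_e / (1/17!) = {ratio_re:.4f}")
print(f"18! = {fact_18:.4e}, (E_h/h) / 18! = {ratio_hartree:.4f}")
```

<p>The first ratio comes out near 1.002 (within about a quarter of a percent) and the second near 1.028 (within about 3%), which is why the electron radius is the better example.</p>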
<p>Here&#8217;s the Python code I used to discover these relationships.</p>
<pre class="brush: python; title: ; notranslate">
from scipy.special import gammaln
from math import log, factorial
from scipy.optimize import brenth
from scipy.constants import codata

def inverse_factorial(x):
    # Find r such that gammaln(r) = log(x)
    # So gamma(r) = x and (r-1)! = x
    r = brenth(lambda t: gammaln(t) - log(x), 1.0, 100.0)
    return r-1

d = codata.physical_constants
for c in d:
    (value, unit, uncertainty) = d[c]
    if value == 0:
        continue  # nothing to compare with a factorial
    x = abs(value)
    # For magnitudes below 1, compare the reciprocal with a factorial.
    small = x &lt; 1.0
    if small: x = 1.0/x
    r = inverse_factorial(x)
    n = int(round(r))
    # Use n &gt; 6 to weed out uninteresting values.
    if abs(r - n) &lt; 0.01 and n &gt; 6:
        fact = factorial(n)
        if small:
            fact = 1.0/fact
        print(c, n, value, fact)
</pre>
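<p>If SciPy isn&#8217;t available, the same search can be sketched with only the standard library: <code>math.lgamma</code> stands in for <code>scipy.special.gammaln</code>, and plain bisection stands in for <code>brenth</code>. Bisection is a swapped-in root finder, slower than Brent&#8217;s method but simple. The bracket [1, 100] mirrors the code above and assumes x &#8805; 1, which the loop guarantees by taking reciprocals.</p>

```python
from math import lgamma, log, factorial

def inverse_factorial(x, lo=1.0, hi=100.0, tol=1e-12):
    """Return r such that gamma(r + 1) = x, i.e. r! = x, for x >= 1.

    Bisection on f(t) = lgamma(t) - log(x). For x >= 1, f(1) <= 0,
    f stays negative on (1, 2), and f increases for t >= 2, so the
    sign change is unique once f(hi) > 0.
    """
    target = log(x)

    def f(t):
        return lgamma(t) - target

    if f(hi) <= 0:
        raise ValueError("x is too large for the bracket; increase hi")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    # The root t satisfies gamma(t) = x, and gamma(t) = (t - 1)!.
    return 0.5 * (lo + hi) - 1

print(inverse_factorial(factorial(17)))   # close to 17
print(inverse_factorial(1 / 2.818e-15))   # also close to 17
```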
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2011/10/17/physical-constants-factorials/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Code to slice open a Menger sponge</title>
		<link>http://www.johndcook.com/blog/2011/08/30/slice-a-menger-sponge/</link>
		<comments>http://www.johndcook.com/blog/2011/08/30/slice-a-menger-sponge/#comments</comments>
		<pubDate>Wed, 31 Aug 2011 01:18:36 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=9267</guid>
		<description><![CDATA[Last month the New York Times ran a story about a sculpture based on cutting open a &#8220;Menger sponge,&#8221; a shape formed by recursively cutting holes through a cube. All the holes are rectangular, but when you cut the sponge open at an angle, you see six-pointed stars.
Here are some better photos, including both a [...]]]></description>
			<content:encoded><![CDATA[<p>Last month the New York Times ran a <a href="http://www.nytimes.com/2011/06/28/science/28math-menger.html?_r=2&amp;src=tp">story</a> about a sculpture based on cutting open a &#8220;Menger sponge,&#8221; a shape formed by recursively cutting holes through a cube. All the holes are rectangular, but when you cut the sponge open at an angle, you see six-pointed stars.</p>
<p>Here are some <a href="http://www.georgehart.com/rp/half-menger-sponge.html">better photos</a>, including both a physical model and a computer animation. Thanks to <a href="http://www.walkingrandomly.com/">Mike Croucher</a> for the link.</p>
<p>I&#8217;ve written some Python code to take slices of a Menger sponge. Here&#8217;s a sample output.</p>
<p style="text-align:center"><img src="http://www.johndcook.com/menger1.png" alt="" width="193" height="213" /></p>
<p><span id="more-9267"></span></p>
<p>The Menger sponge starts with a unit cube, i.e. all coordinates are between 0 and 1. At the bottom of the code, you specify a plane by giving a point inside the cube and a vector normal to the plane. The picture above is a slice that goes through the center of the cube (0.5, 0.5, 0.5) with a normal vector running from the origin to the opposite corner (1, 1, 1).</p>
<pre class="brush: plain; title: ; notranslate">
from math import floor, sqrt
from numpy import empty, array
from matplotlib.pylab import imshow, cm, show

def outside_unit_cube(triple):
    x, y, z = triple
    if x &lt; 0 or y &lt; 0 or z &lt; 0:
        return 1
    if x &gt; 1 or y &gt; 1 or z &gt; 1:
        return 1
    return 0

def in_sponge( triple, level ):
    &quot;&quot;&quot;Determine whether a point lies inside the Menger sponge
    after the number of iterations given by 'level.' &quot;&quot;&quot;
    x, y, z = triple
    if outside_unit_cube(triple):
        return 0
    if x == 1 or y == 1 or z == 1:
        return 1
    for i in range(level):
        x *= 3
        y *= 3
        z *= 3

        # A point is removed if two of its coordinates
        # lie in middle thirds.
        count = 0
        if int(floor(x)) % 3 == 1:
            count += 1
        if int(floor(y)) % 3 == 1:
            count += 1
        if int(floor(z)) % 3 == 1:
            count += 1
        if count &gt;= 2:
            return 0

    return 1

def cross_product(v, w):
    v1, v2, v3 = v
    w1, w2, w3 = w
    return (v2*w3 - v3*w2, v3*w1 - v1*w3, v1*w2 - v2*w1)

def length(v):
    &quot;Euclidean length&quot;
    x, y, z = v
    return sqrt(x*x + y*y + z*z)

def plot_slice(normal, point, level, n):
    &quot;&quot;&quot;Plot a slice through the Menger sponge by
    a plane containing the specified point and having
    the specified normal vector. The view is from
    the direction normal to the given plane.&quot;&quot;&quot;

    # t is an arbitrary vector
    # not parallel to the normal direction.
    nx, ny, nz = normal
    if nx != 0:
        t = (0, 1, 1)
    elif ny != 0:
        t = (1, 0, 1)
    else:
        t = (1, 1, 0)

    # Use cross product to find vector orthogonal to normal
    cross = cross_product(normal, t)
    v = array(cross) / length(cross)

    # Use cross product to find vector orthogonal
    # to both v and the normal vector.
    cross = cross_product(normal, v)
    w = array(cross) / length(cross)

    m = empty( (n, n), dtype=int )
    h = 1.0 / (n - 1)
    k = 2.0*sqrt(3.0)

    for x in range(n):
        for y in range(n):
            pt = point + (h*x - 0.5)*k*v + (h*y - 0.5)*k*w
            m[x, y] = 1 - in_sponge(pt, level)
    imshow(m, cmap=cm.gray)
    show()

# Specify the normal vector of the plane
# cutting through the cube. (1, 1, 1) gives the
# slice pictured above; (1, 1, 0.5) gives another.
normal = (1, 1, 1)

# Specify a point on the plane.
point = (0.5, 0.5, 0.5)

level = 3
n = 500
plot_slice(normal, point, level, n)
</pre>
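<p>The rule in <code>in_sponge</code>, remove a point when two or more of its coordinates fall in middle thirds, removes 7 of the 27 subcubes at each step (the center cube and the 6 face centers), leaving 20. Repeating the rule leaves 20<sup><em>k</em></sup> subcubes of side 3<sup>-<em>k</em></sup> after <em>k</em> steps, so the remaining volume is (20/27)<sup><em>k</em></sup>, about 0.41 for <code>level = 3</code>. The brute-force sketch below (plain Python 3, separate from the plotting code; the function names are illustrative) checks the count by applying the same test digit by digit to base-3 subcube addresses.</p>

```python
from itertools import product

def survives(xs, ys, zs):
    """xs, ys, zs are the base-3 digits (0, 1, or 2) locating a subcube
    along each axis, one digit per level. Like in_sponge, a subcube is
    removed if at any level two or more of its digits equal 1."""
    for digits in zip(xs, ys, zs):
        if digits.count(1) >= 2:
            return False
    return True

def count_subcubes(level):
    """Brute-force count of the subcubes of side 3**-level that remain."""
    addresses = list(product(range(3), repeat=level))
    return sum(survives(xs, ys, zs)
               for xs in addresses for ys in addresses for zs in addresses)

for level in range(4):
    print(level, count_subcubes(level), 20**level, (20/27)**level)
```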
<p><strong>Related post</strong>:</p>
<p><a href="http://www.johndcook.com/blog/2011/07/27/slice-menger-sponge/">A chip off the old fractal block</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2011/08/30/slice-a-menger-sponge/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>A couple Python-like features in C++11</title>
		<link>http://www.johndcook.com/blog/2011/08/17/a-couple-python-like-features-in-c11/</link>
		<comments>http://www.johndcook.com/blog/2011/08/17/a-couple-python-like-features-in-c11/#comments</comments>
		<pubDate>Wed, 17 Aug 2011 12:57:53 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[Python]]></category>
		<category><![CDATA[C++]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=9184</guid>
		<description><![CDATA[The new C++ standard includes a couple Python-like features that I ran across recently. There are other Python-like features in the new standard, but here I&#8217;ll discuss range-based for-loops and raw strings.
In Python you loop over lists rather than incrementing a loop counter variable. For example,

    for p in [2, [...]]]></description>
			<content:encoded><![CDATA[<p>The new C++ standard includes a couple Python-like features that I ran across recently. There are other Python-like features in the new standard, but here I&#8217;ll discuss range-based for-loops and raw strings.</p>
<p><span id="more-9184"></span>In Python you loop over lists rather than rather than incrementing a loop counter variable. For example,</p>
<pre class="brush: python; title: ; notranslate">
    for p in [2, 3, 5, 7, 11]:
        print(p)
</pre>
<p>Range-based for loops now let you do something similar in C++11:</p>
<pre class="brush: plain; title: ; notranslate">
    int primes[5] = {2, 3, 5, 7, 11};
    for (int &amp;p : primes)
        cout &lt;&lt; p &lt;&lt; &quot;\n&quot;;
</pre>
<p>Also, Python has raw strings. If you prefix a quoted string with <code>R</code>, backslashes in the string are taken literally instead of starting escape sequences. For example,</p>
<pre class="brush: python; title: ; notranslate">
    print(&quot;Hello\nworld&quot;)
</pre>
<p>will produce</p>
<pre>Hello
world</pre>
<p>but</p>
<pre class="brush: python; title: ; notranslate">
    print(R&quot;Hello\nworld&quot;)
</pre>
<p>will produce</p>
<pre>Hello\nworld</pre>
<p>because the <code>\n</code> is no longer interpreted as a newline character but instead printed literally as two characters.</p>
<p>Raw strings in C++11 use <code>R</code> as well, but they also require parentheses just inside the quotation marks. For example,</p>
<pre class="brush: plain; title: ; notranslate">
    cout &lt;&lt; R&quot;(Hello\nworld)&quot;;
</pre>
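<p>The C++ raw string syntax is a little harder to read than its Python counterpart since it requires parentheses. The advantage, however, is that such strings can contain double quotes: a double quote alone does not terminate the string, only the sequence <code>)"</code> does. (If the text itself must contain <code>)"</code>, C++ lets you place a delimiter of up to 16 characters between the quote and the parenthesis, as in <code>R"xyz(...)xyz"</code>.) For example,</p>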
<pre class="brush: plain; title: ; notranslate">
    cout &lt;&lt; R&quot;(Hello &quot;world&quot;)&quot;;
</pre>
<p>would print</p>
<pre>Hello "world"</pre>
<p>In Python this is unnecessary since single and double quotes are interchangeable; if you wanted double quotes inside your string, you&#8217;d use single quotes on the outside.</p>
<p>Note that raw strings in C++ require a capital <code>R</code>, unlike Python, which allows <code>r</code> or <code>R</code>.</p>
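<p>The Python behavior described above is easy to confirm. The sketch below (Python 3 syntax) checks that <code>r</code> and <code>R</code> produce the same string, that a raw <code>\n</code> is two characters, and that single quotes let a string hold double quotes without escaping.</p>

```python
# r and R are interchangeable prefixes in Python.
assert r"Hello\nworld" == R"Hello\nworld"

plain = "Hello\nworld"   # \n is a single newline character
raw = r"Hello\nworld"    # backslash and n are two separate characters

print(len(plain), len(raw))   # 11 12
print(raw)                    # Hello\nworld

# Single quotes outside, double quotes inside: no escaping needed.
quoted = 'Hello "world"'
print(quoted)                 # Hello "world"
```

<p>One caveat: a Python raw string cannot end in an odd number of backslashes, so <code>r"C:\temp\"</code> is a syntax error, because the final backslash still keeps the closing quote from ending the string.</p>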
<p>The C++ features mentioned here are supported in gcc 4.6.0. The MinGW version of gcc for Windows is available <a href="http://nuwen.net/mingw.html">here</a>. To use C++11 features in gcc, you must add the parameter <code>-std=c++0x</code> to the <code>g++</code> command line. For example,</p>
<pre class="brush: plain; title: ; notranslate">
    g++ -std=c++0x hello.cpp
</pre>
<p>Visual Studio 2010 supports many of the new C++ features, but not the ones discussed here.</p>
<p><strong>Related links</strong>:</p>
<p><a href="http://gcc.gnu.org/projects/cxx0x.html">C++0x features in GCC</a><br />
<a href="http://blogs.msdn.com/b/vcblog/archive/2010/04/06/c-0x-core-language-features-in-vc10-the-table.aspx">C++0x core language features in Visual Studio 2010</a><br />
<a href="http://en.wikipedia.org/wiki/C%2B%2B0x">C++0x Wikipedia article</a><br />
<a href="http://scottmeyers.blogspot.com/2011/08/c11-feature-availability-spreadsheet.html">C++11 feature availability spreadsheet</a><br />
<a href="http://www.johndcook.com/cpp_TR1_random.html">C++ TR1 random number generation</a><br />
<a href="http://www.johndcook.com/cpp_regex.html">C++ TR1 regular expressions</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2011/08/17/a-couple-python-like-features-in-c11/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Sage Beginner&#8217;s Guide</title>
		<link>http://www.johndcook.com/blog/2011/07/12/sage-beginners-guide/</link>
		<comments>http://www.johndcook.com/blog/2011/07/12/sage-beginners-guide/#comments</comments>
		<pubDate>Wed, 13 Jul 2011 03:33:28 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[Python]]></category>
		<category><![CDATA[Books]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=8915</guid>
		<description><![CDATA[I like books. Given a choice, I&#8217;d much rather read a book than online documentation. Typically a book speaks with one voice and has been more carefully edited. Unfortunately, it can be hard to find books on specialized software. That&#8217;s why I was glad to hear there&#8217;s a book on Sage, a project that integrates [...]]]></description>
			<content:encoded><![CDATA[<p>I like books. Given a choice, I&#8217;d much rather read a book than online documentation. Typically a book speaks with one voice and has been more carefully edited. Unfortunately, it can be hard to find books on specialized software. That&#8217;s why I was glad to hear there&#8217;s a book on Sage, a project that integrates <a href="http://www.sagemath.org/links-components.html">many Python libraries</a> for mathematical computing into a single context.</p>
<p>Craig Finch&#8217;s book <a href="http://www.packtpub.com/sage-beginners-guide/book">Sage Beginner&#8217;s Guide</a> provides an easy-to-read overview of Sage. The book is filled with examples. In fact, every topic is introduced by an example. Explanations follow the examples in sections entitled &#8220;What just happened?&#8221;. Follow-up exercises are provided to solidify the material after the example and explanation.</p>
<p>Someone could begin using Sage without knowing Python. They could think of Sage as an open source Mathematica-like application. But one of the strengths of Sage is that its underlying language is Python. And the Sage Beginner&#8217;s Guide has two chapters devoted to the Python language, one basic and one advanced.</p>
<p>Finch&#8217;s book is primarily focused on Sage as a whole, not the Python components it integrates. However, it does point out which component library provides a particular feature when that library either conflicts with Sage or can be used to refine Sage&#8217;s functionality.</p>
<p>Sage can be challenging to install. It is not yet directly supported on Windows; the recommended way to use Sage on Windows is to download a Linux virtual machine with Sage installed. I was able to install Sage on Ubuntu but not yet on OS X. However, you can try out Sage without installing it by using Sage <a href="http://www.sagenb.org/">online notebooks</a>.</p>
<p>I don&#8217;t have as much experience with Sage as with some of its components. As far as I can tell, it&#8217;s easy to take your experience from component libraries &#8212; say NumPy &#8212; and bring it over to Sage. It would be harder to take functionality you discovered while using Sage and use it outside of Sage, though that is to be expected since part of the value Sage adds is smoothing over the peculiarities of each component library.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2011/07/12/sage-beginners-guide/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>How to fit an elephant</title>
		<link>http://www.johndcook.com/blog/2011/06/21/how-to-fit-an-elephant/</link>
		<comments>http://www.johndcook.com/blog/2011/06/21/how-to-fit-an-elephant/#comments</comments>
		<pubDate>Tue, 21 Jun 2011 12:00:47 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[Python]]></category>
		<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=8741</guid>
		<description><![CDATA[John von Neumann famously said
With four parameters I can fit an elephant, and with five I can make him wiggle his trunk.
By this he meant that one should not be impressed when a complex model fits a data set well. With enough parameters, you can fit any data set.
It turns out you can literally fit [...]]]></description>
			<content:encoded><![CDATA[<p>John von Neumann famously said</p>
<blockquote><p>With four parameters I can fit an elephant, and with five I can make him wiggle his trunk.</p></blockquote>
<p>By this he meant that one should not be impressed when a complex model fits a data set well. With enough parameters, you can fit any data set.</p>
<p>It turns out you can literally fit an elephant with four parameters if you allow the parameters to be complex numbers.</p>
<p style="text-align:center"><img src="http://www.johndcook.com/elephant.png" alt="" width="400" height="302" /></p>
<p>I mentioned von Neumann&#8217;s quote on <a href="http://twitter.com/statfact">StatFact</a> last week and <a href="http://twitter.com/#!/zolnie">Piotr Zolnierczuk</a> replied with a reference to a paper explaining how to fit an elephant:</p>
<blockquote><p>&#8220;Drawing an elephant with four complex parameters&#8221; by Jurgen Mayer, Khaled Khairy, and Jonathon Howard,  Am. J. Phys. 78, 648 (2010), DOI:10.1119/1.3254017.</p></blockquote>
<p>Piotr also sent me the following Python code he&#8217;d written to implement the method in the paper. This code produced the image above.</p>
<pre class="brush: plain; title: ; notranslate">
&quot;&quot;&quot;
Author: Piotr A. Zolnierczuk (zolnierczukp at ornl dot gov)

Based on a paper by:
Drawing an elephant with four complex parameters
Jurgen Mayer, Khaled Khairy, and Jonathon Howard,
Am. J. Phys. 78, 648 (2010), DOI:10.1119/1.3254017
&quot;&quot;&quot;
import numpy as np
import pylab

# elephant parameters
p1, p2, p3, p4 = (50 - 30j, 18 +  8j, 12 - 10j, -14 - 60j )
p5 = 40 + 20j # eyepiece

def fourier(t, C):
    f = np.zeros(t.shape)
    A, B = C.real, C.imag
    for k in range(len(C)):
        f = f + A[k]*np.cos(k*t) + B[k]*np.sin(k*t)
    return f

def elephant(t, p1, p2, p3, p4, p5):
    npar = 6
    Cx = np.zeros((npar,), dtype='complex')
    Cy = np.zeros((npar,), dtype='complex')

    Cx[1] = p1.real*1j
    Cx[2] = p2.real*1j
    Cx[3] = p3.real
    Cx[5] = p4.real

    Cy[1] = p4.imag + p1.imag*1j
    Cy[2] = p2.imag*1j
    Cy[3] = p3.imag*1j

    x = np.append(fourier(t,Cx), [-p5.imag])
    y = np.append(fourier(t,Cy), [p5.imag])

    return x,y

x, y = elephant(np.linspace(0,2*np.pi,1000), p1, p2, p3, p4, p5)
pylab.plot(y,-x,'.')
pylab.show()
</pre>
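<p>To see how the four complex parameters become a picture, it helps to write out the Fourier series that <code>elephant()</code> builds. In <code>fourier()</code>, a coefficient <code>A + Bj</code> at index <em>k</em> contributes A cos(<em>kt</em>) + B sin(<em>kt</em>), so reading off the nonzero entries of <code>Cx</code> and <code>Cy</code> gives the closed form in the sketch below (plain Python 3, no NumPy; <code>elephant_point</code> is an illustrative name, and the eye point from <code>p5</code> is left out).</p>

```python
from math import cos, sin, pi

def elephant_point(t):
    """(x, y) on the elephant outline at parameter t, as built by
    elephant() with p1 = 50-30j, p2 = 18+8j, p3 = 12-10j, p4 = -14-60j."""
    # x: Cx[1] = 50j, Cx[2] = 18j, Cx[3] = 12, Cx[5] = -14
    x = 50*sin(t) + 18*sin(2*t) + 12*cos(3*t) - 14*cos(5*t)
    # y: Cy[1] = -60 - 30j, Cy[2] = 8j, Cy[3] = -10j
    y = -60*cos(t) - 30*sin(t) + 8*sin(2*t) - 10*sin(3*t)
    return x, y

# Sample the closed curve at a few parameter values.
for t in (0, pi/2, pi, 3*pi/2):
    x, y = elephant_point(t)
    print(round(x, 6), round(y, 6))
```

<p>The plot then draws <code>(y, -x)</code>, which turns the curve on its side, and the eye is the single extra point (<code>-p5.imag</code>, <code>p5.imag</code>) = (-20, 20) before that transformation. The real part of <code>p5</code> is not used by this code.</p>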
<p><strong>Related posts</strong>:</p>
<p><a href="http://www.johndcook.com/blog/2011/05/25/crude-models/">Advantages of crude models</a><br />
<a href="http://www.johndcook.com/blog/2011/01/12/occams-razor-bayes-theorem/">Occam&#8217;s razor and Bayes theorem</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2011/06/21/how-to-fit-an-elephant/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Stand-alone scientific code</title>
		<link>http://www.johndcook.com/blog/2011/06/07/stand-alone-scientific-code/</link>
		<comments>http://www.johndcook.com/blog/2011/06/07/stand-alone-scientific-code/#comments</comments>
		<pubDate>Tue, 07 Jun 2011 12:54:38 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[Python]]></category>
		<category><![CDATA[CSharp]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=8670</guid>
		<description><![CDATA[Sometimes you need one or two scientific functions not included in your programming environment. For a number of possible reasons, you do not want to depend on an external library. For example, maybe you don&#8217;t want to take the time to evaluate libraries. Or maybe you want to give someone else a small amount of [...]]]></description>
			<content:encoded><![CDATA[<p>Sometimes you need one or two scientific functions not included in your programming environment. For a number of possible reasons, you do not want to depend on an external library. For example, maybe you don&#8217;t want to take the time to evaluate libraries. Or maybe you want to give someone else a small amount of self-contained code. Here is a collection of code for these situations.</p>
<p style="padding-left: 30px;"><a href="http://www.johndcook.com/stand_alone_code.html">Stand-alone code for numerical computing</a></p>
<p>This page contains C++, Python, and C# code for special functions and random number generation with no external dependencies. Do whatever you want with it, no strings attached. Use at your own risk. I recently added software for gamma and log gamma functions, as well as a few random number generators. (Why separate functions for the gamma function and its logarithm? See explanation <a href="http://www.johndcook.com/blog/2010/06/07/math-library-functions-that-seem-unnecessary/">here</a>.)</p>
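<p>One reason to keep both functions, sketched here with Python&#8217;s standard <code>math</code> module rather than the code on the linked page, is overflow: the gamma function outgrows the floating point range just past 171, while its logarithm stays modest. Ratios of gamma values, such as those in binomial coefficients, can then be computed as the exponential of a difference of log gammas. The linked post gives the fuller explanation; <code>gamma_ratio</code> is a hypothetical helper name.</p>

```python
import math

def gamma_ratio(a, b):
    """gamma(a) / gamma(b) for a, b > 0, computed as
    exp(lgamma(a) - lgamma(b)) so neither gamma value is formed directly."""
    return math.exp(math.lgamma(a) - math.lgamma(b))

try:
    math.gamma(172.0)              # 171! is about 1.2e309, past the float max
except OverflowError:
    print("math.gamma(172.0) overflows")

print(math.lgamma(172.0))          # log(171!), roughly 711.7
print(gamma_ratio(200.0, 198.0))   # 199 * 198 = 39402, up to rounding
```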
<p>I don&#8217;t recommend using this code as a way to avoid learning a good library. If you&#8217;re writing Python, for example, I&#8217;d recommend using <a href="http://www.scipy.org/">SciPy</a>. But there are times when the advantages of being self-contained outweigh the advantages of using high-quality libraries.</p>
<p><strong>Related posts</strong>:</p>
<p><a href="http://www.johndcook.com/blog/2010/06/07/math-library-functions-that-seem-unnecessary/">Mathematical functions that seem unnecessary</a><br />
<a href="https://twitter.com/#!/SciPyTip">SciPyTip: Daily tips on using scientific computing in Python</a><br />
<a href="http://www.johndcook.com/blog/2010/06/08/c-math-gotchas/">C# math gotchas</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2011/06/07/stand-alone-scientific-code/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Evaluating weather forecast accuracy: an interview with Eric Floehr</title>
		<link>http://www.johndcook.com/blog/2011/04/12/weather-forecast-accuracy/</link>
		<comments>http://www.johndcook.com/blog/2011/04/12/weather-forecast-accuracy/#comments</comments>
		<pubDate>Tue, 12 Apr 2011 12:00:48 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[Business]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Entrepreneurship]]></category>
		<category><![CDATA[Interview]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=8278</guid>
		<description><![CDATA[Eric Floehr is the owner of ForecastWatch, a company that evaluates the accuracy of weather forecasts. In this interview Eric explains what his business does, how he got started, and some of the technology he uses.

JC: Let&#8217;s talk about your business and how you got started.
EF: I&#8217;m a programmer by trade. I got a computer [...]]]></description>
			<content:encoded><![CDATA[<p>Eric Floehr is the owner of <a href="http://www.forecastwatch.com/accuracydefined/">ForecastWatch</a>, a company that evaluates the accuracy of weather forecasts. In this interview Eric explains what his business does, how he got started, and some of the technology he uses.</p>
<p><span id="more-8278"></span></p>
<p><strong>JC</strong>: Let&#8217;s talk about your business and how you got started.</p>
<p><strong>EF</strong>: I&#8217;m a programmer by trade. I got a computer science degree from Ohio State University and took a number of programming jobs, eventually ending up in management.</p>
<p>I&#8217;ve also always been interested in weather. A couple years ago my Mom showed me my baby book. At five years old it said &#8220;He&#8217;s interested in space, dinosaurs, and the weather.&#8221; I&#8217;m not as interested in dinosaurs now, but still interested in space and the weather.</p>
<p>When I was working as a programmer, and especially when I was a manager, I liked to do little programming projects to learn things. So when I ran across Python I thought about what I could write. I&#8217;d wondered whether there was any difference in the accuracy of various weather services &#8212; AccuWeather, Weather.gov, etc. Did they use different models, or did they all get their data from the National Weather Service and just package it up differently? So I wrote a little Python web scraper to pull forecasts from various places and compare them with observations. I kept doing that and realized there really were differences between the forecasters.</p>
<p>I didn&#8217;t start out for this to be a business. It just started out to satisfy personal curiosity. It just kept growing every year. In my last position before going out on my own I was CTO for a company that made a backup appliance. We got to the point where the product was mature and doing well. ForecastWatch was taking more and more of my time because I was getting more business from it, and so I decided to make the switch. That was March 2010. Revenue doubled over the next year and it looks like this year it will double again. Things are going well and I really enjoy it.</p>
<p><strong>JC</strong>: So you hadn&#8217;t been doing this that long when we met last year at SciPy in Austin.</p>
<p><strong>EF</strong>: No, I&#8217;d only been doing this full time for a few months. But I&#8217;d been doing this part-time since 2004.</p>
<p>I didn&#8217;t have full-time revenue when I was doing this part-time. But it&#8217;s amazing. Once you have the time to focus on something, the opportunities that you hadn&#8217;t had time to notice before suddenly open up. Just the act of making something your focus almost makes your goal come to fruition. For years you think &#8220;too risky, too risky&#8221; and then once you make that jump, things fall in place.</p>
<p><strong>JC</strong>: So what exactly is the product you sell?</p>
<p><strong>EF</strong>: There are two main components. There&#8217;s an online component that is subscription-based. It provides monthly aggregated statistics on forecasts versus actual observations. It has absolute errors, min and max errors, <a href="http://en.wikipedia.org/wiki/Brier_score">Brier score</a>, all kinds of statistics. It evaluates forecasts for precipitation, high and low temperature, opacity, wind speed and direction, etc. Meteorologists use those statistics to evaluate their forecasts and see how they&#8217;re doing relative to their peers.</p>
<p>The second component is research reports.  Sometimes meteorologists will commission a report to show how well they&#8217;re doing. These reports are based on standard, widely-accepted metrics and time-frames, so they can&#8217;t just cherry-pick criteria they happened to do well on.  But if they see there are statistics in ForecastWatch where they are doing really well, they might want to tell their customers.  I&#8217;ve also created reports for media companies, large internet service providers, energy trading companies and other companies who were evaluating weather forecast providers or want some other data analysis related to weather forecasts.</p>
<p>Something else, and I don&#8217;t know whether this will become a major component, but another area some people are interested in is historical forecasts. I have agreements with some of the weather forecasting companies to sell their forecasts that are no longer forecasts. Some people find this information valuable.  For example, a marketer with a major sports league wanted to know how weather forecasts affected attendance. Another example was an investment manager who was looking to invest in a business whose performance he believed had some correlation with weather forecasts. For example, a ski lodge might want to know how far out people base their decisions on forecasts.</p>
<p>I have this data back to 2004. It&#8217;s funny, but most weather forecasting companies historically have not kept their forecasts. Their bread-and-butter is the forecast in the future. Once that future becomes the past, they saw no value in that data until recently.</p>
<p>Incidentally, because I&#8217;m monitoring weather forecasters&#8217; web sites, I sometimes let them know about errors they were unaware of.</p>
<p><strong>JC</strong>: What volume of data are you dealing with?</p>
<p><strong>EF</strong>: I have about 200,000,000 forecast data points back to 2004. I&#8217;m adding about 130,000 data points a day. My database is something on the order of 70 GB. That&#8217;s observation data, hourly forecasts, metadata, etc. Right now I&#8217;m looking at data from about 850 locations in the US and about 50 in Canada. I&#8217;m looking to expand that both domestically and internationally.</p>
<p><strong>JC</strong>: So what kind of technology are you using?</p>
<p><strong>EF</strong>: I&#8217;m running a LAMP stack: Linux, Apache, MySQL, Python. Originally I was on Red Hat Linux but I&#8217;ve switched to Ubuntu server. I&#8217;m using Django for the web site. Everything is in Python: the scrapers are in Python, the web site is in Python, all the administrative back-end is in Python.</p>
<p>There are two web sites right now: <a href="http://www.forecastwatch.com/accuracydefined/">ForecastWatch.com</a>, which is the subscription, professional site, and a free consumer site <a href="http://forecastadvisor.com/">ForecastAdvisor.com</a>. The consumer site will give you a local forecast and a measure of the accuracy for various forecasters for your weather.</p>
<p><strong>JC</strong>: And who are your customers?</p>
<p><strong>EF</strong>: All the major weather forecast companies. Also some financial companies, logistics and transportation companies, etc. I&#8217;m just starting to expand more into serving companies that depend on meteorological forecasts, whereas in the past I&#8217;ve focused directly on meteorologists.</p>
<p><strong>JC</strong>: Let&#8217;s talk a little more about the entrepreneurial aspect of your business.</p>
<p><strong>EF</strong>: Well, for one thing, I don&#8217;t think I&#8217;d ever have done this if I&#8217;d thought about doing it to make money. There&#8217;s not an enormous market for this service, but in a way that&#8217;s good.  I came from a completely technical background. There&#8217;s not a marketing or sales gene in my body and I&#8217;ve had to learn a lot. ForecastWatch has given me a great opportunity to learn about those non-technical areas of a business that were so foreign to me before.</p>
<p>I got into this entirely for my own use. And I thought that maybe there was already something that did what I wanted, and in the process of trying to find what&#8217;s out there I discovered an unmet need. Even though all the major forecasters said that accuracy was the number one thing they were interested in, they weren&#8217;t effectively measuring their accuracy. I thought that if I&#8217;m interested in this, maybe other people are too.</p>
<p>At first pricing was a mystery to me. Maybe I needed a new laptop, so I&#8217;d charge someone the price of a laptop for some analysis. I had to learn the value of my time and my product.</p>
<p style="text-align: center;">***</p>
<p>Some talks by Eric:</p>
<p><a href="http://blip.tv/file/2518514">PyOhio 2009 talk about ForecastWatch</a><br />
<a href="http://python.mirocommunity.org/video/1805/pyohio-2010-python-and-entrepr">PyOhio 2010 panel on Python and entrepreneurship</a><br />
<a href=" http://www.archive.org/details/Scipy2010-EricFloehr-WeatherForecastAccuracyAnalysis">SciPy 2010 talk</a></p>
<p style="text-align: center;">***</p>
<p>More <a href="http://www.johndcook.com/blog/tag/interview/">interviews</a><br />
More on <a href="http://www.johndcook.com/blog/tag/entrepreneurship/">entrepreneurship</a><br />
More on <a href="http://www.johndcook.com/blog/tag/python/">Python</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2011/04/12/weather-forecast-accuracy/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>A knight&#8217;s tour magic square</title>
		<link>http://www.johndcook.com/blog/2011/04/06/a-knights-magic-square/</link>
		<comments>http://www.johndcook.com/blog/2011/04/06/a-knights-magic-square/#comments</comments>
		<pubDate>Wed, 06 Apr 2011 12:00:42 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[Math]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Chess]]></category>
		<category><![CDATA[Magic squares]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=8207</guid>
		<description><![CDATA[This magic square was created by Leonhard Euler (1707-1783). Each row and each column sum to 260. Each half-row and half-column sum to 130. The square is also a knight&#8217;s tour: a knight could visit each square on a chessboard exactly once by following the numbers in sequence.

Here is Python code to verify that the [...]]]></description>
			<content:encoded><![CDATA[<p>This magic square was created by Leonhard Euler (1707-1783). Each row and each column sum to 260. Each half-row and half-column sum to 130. The square is also a knight&#8217;s tour: a knight could visit each square on a chessboard exactly once by following the numbers in sequence.</p>
<p style="text-align:center"><img src="http://www.johndcook.com/knight_square.png" alt="" width="311" height="310" /></p>
<p>Here is <a href="http://www.johndcook.com/magic_knight.html">Python code</a> to verify that the square has the properties listed above.</p>
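<p>The linked code is not reproduced here, but a check of the same three properties takes only a few lines. The sketch below is a separate illustration, not that code. It assumes the square is Beverley&#8217;s tour as it is commonly reproduced. The numbers are typed in by hand rather than read from the image, which may show a rotation or reflection of the same square. The names <code>SQUARE</code> and <code>check</code> are hypothetical.</p>
<pre class="brush: python; title: ; notranslate">
# Assumed layout: Beverley's magic knight's tour as commonly reproduced.
SQUARE = [
    [ 1, 30, 47, 52,  5, 28, 43, 54],
    [48, 51,  2, 29, 44, 53,  6, 27],
    [31, 46, 49,  4, 25,  8, 55, 42],
    [50,  3, 32, 45, 56, 41, 26,  7],
    [33, 62, 15, 20,  9, 24, 39, 58],
    [16, 19, 34, 61, 40, 57, 10, 23],
    [63, 14, 17, 36, 21, 12, 59, 38],
    [18, 35, 64, 13, 60, 37, 22, 11],
]

def check(square):
    &quot;&quot;&quot;Return (rows and columns sum to 260,
               half-rows and half-columns sum to 130,
               numbers 1..64 form a knight's tour).&quot;&quot;&quot;
    lines = [list(row) for row in square] + [list(col) for col in zip(*square)]
    full = all(sum(line) == 260 for line in lines)
    halves = all(sum(line[:4]) == 130 and sum(line[4:]) == 130 for line in lines)

    # Map each number to its (row, column) position.
    where = {v: (r, c) for r, row in enumerate(square) for c, v in enumerate(row)}
    each_once = sorted(where) == list(range(1, 65))

    def knight_move(p, q):
        return sorted([abs(p[0] - q[0]), abs(p[1] - q[1])]) == [1, 2]

    tour = each_once and all(knight_move(where[k], where[k + 1]) for k in range(1, 64))
    return full, halves, tour

print(check(SQUARE))  # (True, True, True)
</pre>
<p>The diagonals are not checked because this square is only semi-magic. Its diagonals do not sum to 260.</p>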
<p><strong>Update</strong>: It seems the attribution to Euler is a persistent error. Euler did publish the first paper on knight&#8217;s tours, but the knight&#8217;s tour square above was published by William Beverley in 1848. Thanks to George Jelliss for the correction. See the comments below.</p>
<p><strong>Update 2</strong>: Notes from George Jelliss on <a href="http://www.mayhematics.com/t/leapers/9k.htm#%284%29">magic king and queen tours</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2011/04/06/a-knights-magic-square/feed/</wfw:commentRss>
		<slash:comments>22</slash:comments>
		</item>
		<item>
		<title>Python for high performance computing</title>
		<link>http://www.johndcook.com/blog/2011/03/21/python-hpc/</link>
		<comments>http://www.johndcook.com/blog/2011/03/21/python-hpc/#comments</comments>
		<pubDate>Mon, 21 Mar 2011 13:47:16 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[Python]]></category>
		<category><![CDATA[SciPy]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=8116</guid>
		<description><![CDATA[William Scullin&#8217;s talk from PyCon 2011: Python for high performance computing.

At least in our shop [Argonne National Laboratory] we have three accepted languages for scientific computing. In this order they are C/C++, Fortran in all its dialects, and Python. You&#8217;ll notice the absolute and total lack of Ruby, Perl, Java.


If you&#8217;re interested in Python and [...]]]></description>
			<content:encoded><![CDATA[<p>William Scullin&#8217;s talk from PyCon 2011: Python for high performance computing.</p>
<blockquote><p>
At least in our shop [Argonne National Laboratory] we have three accepted languages for scientific computing. In this order they are C/C++, Fortran in all its dialects, and Python. You&#8217;ll notice the absolute and total lack of Ruby, Perl, Java.
</p></blockquote>
<p><embed src="http://blip.tv/play/g4VigquDbwI%2Em4v" type="application/x-shockwave-flash" width="480" height="390" allowscriptaccess="always" allowfullscreen="true"></embed></p>
<p>If you&#8217;re interested in Python and HPC, check out <a href="http://twitter.com/scipytip">SciPyTip</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2011/03/21/python-hpc/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Digital workflow</title>
		<link>http://www.johndcook.com/blog/2011/03/13/digital-workflow/</link>
		<comments>http://www.johndcook.com/blog/2011/03/13/digital-workflow/#comments</comments>
		<pubDate>Sun, 13 Mar 2011 23:30:57 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[Python]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Productivity]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=8030</guid>
		<description><![CDATA[William Turkel has a nice four-part series of blog posts entitled A Workflow for Digital Research Using Off-the-Shelf Tools. His four points are

Start with a backup and versioning strategy.
Make everything digital.
Research 24/7 (using RSS feeds).
Make local copies of everything.

Also by William Turkel, The Programming Historian, &#8220;an open-access introduction to programming in Python, aimed at working [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://williamjturkel.net/">William Turkel</a> has a nice four-part series of blog posts entitled <a href="http://williamjturkel.net/how-to/">A Workflow for Digital Research Using Off-the-Shelf Tools</a>. His four points are</p>
<ol>
<li>Start with a backup and versioning strategy.</li>
<li>Make everything digital.</li>
<li>Research 24/7 (using RSS feeds).</li>
<li>Make local copies of everything.</li>
</ol>
<p>Also by William Turkel, <a href="http://niche-canada.org/programming-historian">The Programming Historian</a>, &#8220;an open-access introduction to programming in Python, aimed at working historians (and other humanists) with little previous experience.&#8221;</p>
<p><strong>Related post</strong>:</p>
<p><a href="http://www.johndcook.com/blog/2009/06/11/create-offline-analyze-online/">Create offline, analyze online</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2011/03/13/digital-workflow/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Scientific Python on Twitter</title>
		<link>http://www.johndcook.com/blog/2011/02/21/scientific-python-on-twitter/</link>
		<comments>http://www.johndcook.com/blog/2011/02/21/scientific-python-on-twitter/#comments</comments>
		<pubDate>Mon, 21 Feb 2011 20:14:08 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[Python]]></category>
		<category><![CDATA[SciPy]]></category>
		<category><![CDATA[Twitter]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=7874</guid>
		<description><![CDATA[Next week I&#8217;m starting a new daily tip Twitter account: @SciPyTip.
This account will post on things related to scientific computing in Python, including the SciPy library, related software, and scientific computing in general.

Full list of daily tip accounts
]]></description>
			<content:encoded><![CDATA[<p>Next week I&#8217;m starting a new daily tip Twitter account: <a href="http://twitter.com/SciPyTip">@SciPyTip</a>.</p>
<p>This account will post on things related to scientific computing in Python, including the SciPy library, related software, and scientific computing in general.</p>
<p><a href="http://twitter.com/SciPyTip"><img class="alignnone" src="http://www.johndcook.com/sp.png" alt="" width="81" height="77" /></a></p>
<p><a href="http://www.johndcook.com/twitter">Full list of daily tip accounts</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2011/02/21/scientific-python-on-twitter/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Python-based data/science environment from Microsoft</title>
		<link>http://www.johndcook.com/blog/2011/01/26/python-based-datascience-environment-from-microsoft/</link>
		<comments>http://www.johndcook.com/blog/2011/01/26/python-based-datascience-environment-from-microsoft/#comments</comments>
		<pubDate>Thu, 27 Jan 2011 00:08:34 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[Python]]></category>
		<category><![CDATA[IronPython]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=7663</guid>
		<description><![CDATA[See Microsoft Research&#8217;s announcement of the Sho project.
Sho is an interactive environment for data analysis and scientific  computing that lets you seamlessly connect scripts (in IronPython) with  compiled code (in .NET) to enable fast and flexible prototyping. The  environment includes powerful and efficient libraries for linear algebra  as well as [...]]]></description>
			<content:encoded><![CDATA[<p>See Microsoft Research&#8217;s <a href="http://blogs.msdn.com/b/the_blog_of_sho/archive/2011/01/26/introducing-sho.aspx">announcement</a> of the <a href="http://research.microsoft.com/en-us/projects/sho/">Sho</a> project.</p>
<blockquote><p>Sho is an interactive environment for data analysis and scientific  computing that lets you seamlessly connect scripts (in IronPython) with  compiled code (in .NET) to enable fast and flexible prototyping. The  environment includes powerful and efficient libraries for linear algebra  as well as data visualization that can be used from any .NET language,  as well as a feature-rich interactive shell for rapid development.</p></blockquote>
<p>Maybe this is why Microsoft contracted Enthought this summer to <a href="http://www.johndcook.com/blog/2010/07/01/scipy-and-numpy-for-net/">port NumPy and SciPy to .NET</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2011/01/26/python-based-datascience-environment-from-microsoft/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Ruby, Python, and Science</title>
		<link>http://www.johndcook.com/blog/2010/11/28/ruby-python-and-science/</link>
		<comments>http://www.johndcook.com/blog/2010/11/28/ruby-python-and-science/#comments</comments>
		<pubDate>Sun, 28 Nov 2010 22:50:34 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[Python]]></category>
		<category><![CDATA[SciPy]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=7177</guid>
		<description><![CDATA[David Jacobs has written a long blog post Ruby is beautiful (but I&#8217;m moving to Python). Here&#8217;s my summary.
Ruby is much better than Java, but the Ruby community is too focused on web development and the language has no scientific library. Python has a lot of the same advantages as Ruby, is used for more [...]]]></description>
			<content:encoded><![CDATA[<p>David Jacobs has written a long blog post <a href="http://allthingsprogress.com/posts/ruby-is-beautiful-but-im-moving-to-python">Ruby is beautiful (but I&#8217;m moving to Python)</a>. Here&#8217;s my summary.</p>
<blockquote><p>Ruby is much better than Java, but the Ruby community is too focused on web development and the language has no scientific library. Python has a lot of the same advantages as Ruby, is used for more than web programming, and has <a href="http://www.scipy.org/">SciPy</a>.</p></blockquote>
<p><strong>Update</strong>: There is now a fledgling <a href="https://github.com/SciRuby/sciruby">SciRuby project</a>.</p>
<p><strong>Further reading</strong>:</p>
<p><a href="http://www.johndcook.com/blog/2009/05/08/plain-python/">Plain Python</a><br />
<a href="http://www.codeproject.com/KB/library/scipy.aspx">Getting started with SciPy</a><br />
<a href="http://www.johndcook.com/blog/2010/07/09/replacing-mathematica-with-python/">Replacing Mathematica with Python</a><br />
<a href="http://www.johndcook.com/blog/2010/07/01/scipy-and-numpy-for-net/">SciPy and NumPy for .NET</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2010/11/28/ruby-python-and-science/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Best rational approximation</title>
		<link>http://www.johndcook.com/blog/2010/10/20/best-rational-approximation/</link>
		<comments>http://www.johndcook.com/blog/2010/10/20/best-rational-approximation/#comments</comments>
		<pubDate>Wed, 20 Oct 2010 19:25:06 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[Math]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Number theory]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=6840</guid>
		<description><![CDATA[Suppose you have a number x between 0 and 1. You want to find a rational approximation for x, but you only want to consider fractions with denominators below a given limit.
For example, suppose x = 1/e = 0.367879…  Rational approximations with powers of 10 in the denominator are trivial to find: 3/10, 36/100, 367/1000, [...]]]></description>
			<content:encoded><![CDATA[<p>Suppose you have a number <em>x</em> between 0 and 1. You want to find a rational approximation for <em>x</em>, but you only want to consider fractions with denominators no larger than a given limit.</p>
<p>For example, suppose <em>x</em> = 1/<em>e</em> = 0.367879…  Rational approximations with powers of 10 in the denominator are trivial to find: 3/10, 36/100, 367/1000, etc. But say you&#8217;re willing to have a denominator as large as 10. Could you do better than 3/10? Yes, 3/8 = 0.375 is a better approximation. What about denominators no larger than 100? Then 32/87 = 0.36781… is the best choice, much better than 36/100.</p>
<p>How do you find the best approximations? You could do a brute force search. For example, if the maximum denominator size is <em>N</em>, you could try all fractions with denominators less than or equal to <em>N</em>. But there&#8217;s a much more efficient algorithm. The algorithm is related to the Farey sequence named after John Farey, though I don&#8217;t know whether he invented the algorithm.</p>
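<p>For comparison, here is a minimal sketch of the brute-force search, assuming Python 3, where <code>round</code> returns an int. For each denominator <em>b</em>, the best numerator is the integer nearest <em>xb</em>. So the search needs only <em>N</em> candidates rather than all pairs of numerator and denominator. The function name is hypothetical. The standard library&#8217;s <code>Fraction.limit_denominator</code> returns the closest fraction with a bounded denominator, which makes it a handy cross-check.</p>
<pre class="brush: python; title: ; notranslate">
import math
from fractions import Fraction

def best_approx_brute_force(x, N):
    &quot;&quot;&quot;Closest fraction to x whose denominator is at most N.

    Tries every denominator b = 1, ..., N, pairing each with the
    nearest integer numerator round(x*b).
    &quot;&quot;&quot;
    candidates = (Fraction(round(x * b), b) for b in range(1, N + 1))
    # min keeps the first of any ties, i.e. the smallest denominator.
    return min(candidates, key=lambda f: abs(x - f))

print(best_approx_brute_force(1 / math.e, 10))     # 3/8
print(best_approx_brute_force(1 / math.e, 100))    # 32/87
print(Fraction(1 / math.e).limit_denominator(100)) # 32/87
</pre>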
<p>The idea is to start with two fractions, <em>a</em>/<em>b</em> = 0/1 and <em>c</em>/<em>d</em> = 1/1. We update either <em>a</em>/<em>b</em> or <em>c</em>/<em>d</em> at each step so that <em>a</em>/<em>b</em> will be the best lower bound of <em>x</em> with denominator no bigger than <em>b</em>, and <em>c</em>/<em>d</em> will be the best upper bound with denominator no bigger than <em>d</em>. At each step we do a sort of binary search by introducing the <em>mediant</em> of the upper and lower bounds. The mediant of <em>a</em>/<em>b</em> and <em>c</em>/<em>d</em> is the fraction (<em>a</em>+<em>c</em>)/(<em>b</em>+<em>d</em>) which always lies between <em>a</em>/<em>b</em> and <em>c</em>/<em>d</em>.</p>
<p>Here is an implementation of the algorithm in Python. The code takes a number x between 0 and 1 and a maximum denominator size <em>N</em>. It returns the numerator and denominator of the best rational approximation to <em>x</em> using denominators no larger than <em>N</em>.</p>
<pre class="brush: python; title: ; notranslate">
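def farey(x, N):
    a, b = 0, 1
    c, d = 1, 1
    # Invariant: a/b &lt;= x &lt;= c/d
    while b + d &lt;= N:
        mediant = float(a+c)/(b+d)
        if x == mediant:
            return a+c, b+d
        elif x &gt; mediant:
            a, b = a+c, b+d
        else:
            c, d = a+c, b+d

    # The next mediant's denominator would exceed N, so no fraction with
    # denominator at most N lies strictly between a/b and c/d.
    # Return whichever bound is nearer to x.
    if x - float(a)/b &lt;= float(c)/d - x:
        return a, b
    else:
        return c, d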
</pre>
<p>In Python 3, the calls to <code>float</code> could be removed, since the <code>/</code> operator performs floating point division on integers.</p>
<p>Read more about rational approximation in <a href="http://www.johndcook.com/blog/2009/05/19/golden-ratio-rational-approximation/">Breastfeeding, the golden ratio, and rational approximation</a>.</p>
<p>Here&#8217;s an example of a situation in which you might need rational approximations. Suppose you&#8217;re designing an experiment which will randomize subjects between two treatments A and B. You want to randomize in blocks of size no larger than <em>N</em> and you want the probability of assigning treatment A to be <em>p</em>. You could find the best rational approximation <em>a</em>/<em>b</em> to <em>p</em> with denominator <em>b</em> no larger than <em>N</em> and use the denominator as the block size. Each block would be a permutation of <em>a</em> A&#8217;s and <em>b</em>-<em>a</em> B&#8217;s.</p>
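<p>A minimal sketch of that block design in Python follows. The function name is hypothetical. It uses the standard library&#8217;s <code>Fraction.limit_denominator</code> to find <em>a</em>/<em>b</em>, but the <code>farey</code> function above could serve the same purpose.</p>
<pre class="brush: python; title: ; notranslate">
import random
from fractions import Fraction

def randomization_block(p, max_block_size, rng=random):
    &quot;&quot;&quot;Return one randomized block of 'A'/'B' treatment assignments.

    The block size b and the number of A's a come from the closest
    fraction a/b to p whose denominator is at most max_block_size.
    &quot;&quot;&quot;
    approx = Fraction(p).limit_denominator(max_block_size)
    a, b = approx.numerator, approx.denominator
    block = ['A'] * a + ['B'] * (b - a)
    rng.shuffle(block)  # a random permutation of a A's and b - a B's
    return block

print(randomization_block(0.3, 10))  # a shuffled list of 3 A's and 7 B's
</pre>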
<p><strong>Update 1</strong>: Here is a <a href="../../rational_approximation.html">form</a> that implements the algorithm above.</p>
<p><strong>Update 2</strong>: Eugene Wallingford wrote a <a href="http://www.cs.uni.edu/%7Ewallingf/blog/archives/monthly/2010-10.html#e2010-10-25T16_50_29.htm">blog post</a> about implementing the algorithm in Klein, a very restricted language used for teaching compiler writing.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2010/10/20/best-rational-approximation/feed/</wfw:commentRss>
		<slash:comments>17</slash:comments>
		</item>
		<item>
		<title>Bug in SciPy&#8217;s erf function</title>
		<link>http://www.johndcook.com/blog/2010/09/02/bug-in-scipys-erf-function/</link>
		<comments>http://www.johndcook.com/blog/2010/09/02/bug-in-scipys-erf-function/#comments</comments>
		<pubDate>Fri, 03 Sep 2010 00:19:56 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[Python]]></category>
		<category><![CDATA[Software development]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[SciPy]]></category>
		<category><![CDATA[Special functions]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=6328</guid>
		<description><![CDATA[Last night I produced the plot below and was very surprised at the jagged spike. I knew the curve should be smooth and strictly increasing.

My first thought was that there must be a numerical accuracy problem in my code, but it turns out there&#8217;s a bug in SciPy version 0.8.0b1. I started to report it, [...]]]></description>
			<content:encoded><![CDATA[<p>Last night I produced the plot below and was very surprised at the jagged spike. I knew the curve should be smooth and strictly increasing.</p>
<p><img src="http://www.johndcook.com/erfc_jagged.png" alt="" width="400" height="302" /></p>
<p>My first thought was that there must be a numerical accuracy problem in my code, but it turns out there&#8217;s a bug in SciPy version 0.8.0b1. I started to report it, but I saw there were similar bug reports and one such report was marked as closed, so presumably the fix will appear in the next release.</p>
<p>The problem is that SciPy&#8217;s <code>erf</code> function is inaccurate for arguments with imaginary part near 5.8.  For example, Mathematica computes erf(1.0 + 5.7i) as  -4.5717×10<sup>12</sup> + 1.04767×10<sup>12</sup> i. SciPy computes the same value as -4.4370×10<sup>12</sup> + 1.3652×10<sup>12</sup> i. The imaginary component is off by about 30%.</p>
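<p>One way to check a value like this without SciPy is to sum the Maclaurin series for erf directly in complex arithmetic. The sketch below is only for moderate |<em>z</em>|. It is not the algorithm SciPy or Mathematica uses, and <code>erf_series</code> is a hypothetical name. Near <em>z</em> = 1 + 5.7<em>i</em>, the series terms stay comparable in size to the result, so double precision holds up.</p>
<pre class="brush: python; title: ; notranslate">
import math

def erf_series(z, terms=200):
    &quot;&quot;&quot;Approximate erf(z) for complex z by its Maclaurin series,
    erf(z) = 2/sqrt(pi) * sum over n of (-1)**n z**(2n+1) / (n! (2n+1)).

    Terms are built recursively to avoid overflowing z**(2n+1).
    For large |z| the terms can cancel badly, so this is a sketch
    for moderate arguments only.
    &quot;&quot;&quot;
    z = complex(z)
    term = z    # (-1)**n z**(2n+1) / n!, starting at n = 0
    total = z   # running sum of term / (2n+1)
    for n in range(1, terms):
        term *= -z * z / n
        total += term / (2 * n + 1)
    return 2 / math.sqrt(math.pi) * total

w = erf_series(1.0 + 5.7j)
print(w)  # compare with the Mathematica and SciPy values quoted above
</pre>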
<p>Here is the code that produced the plot.</p>
<pre class="brush: python; title: ; notranslate">
from scipy.special import erf
from numpy import linspace, exp, sqrt
import matplotlib.pyplot as plt

def g(y):
    z = (1 + 1j*y) / sqrt(2)
    temp = exp(z*z)*(1 - erf(z))
    u, v = temp.real, temp.imag
    return -v / u

x = linspace(0, 10, 101)
plt.plot(x, g(x))
plt.show()
</pre>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2010/09/02/bug-in-scipys-erf-function/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>What does this code do?</title>
		<link>http://www.johndcook.com/blog/2010/07/21/what-does-this-code-do/</link>
		<comments>http://www.johndcook.com/blog/2010/07/21/what-does-this-code-do/#comments</comments>
		<pubDate>Wed, 21 Jul 2010 12:37:29 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[Python]]></category>
		<category><![CDATA[Software development]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=5745</guid>
		<description><![CDATA[At the SciPy 2010 conference, a speaker showed several short code samples and asked us what each sample did. The samples were clearly written, but we had no comments to provide context. This was the last sample.

def what( x, n ):
    if n &#60; 0:
        [...]]]></description>
			<content:encoded><![CDATA[<p>At the SciPy 2010 conference, a speaker showed several short code samples and asked us what each sample did. The samples were clearly written, but we had no comments to provide context. This was the last sample.</p>
<pre class="brush: python; title: ; notranslate">
def what( x, n ):
    if n &lt; 0:
        n = -n
        x = 1.0 / x
    z = 1.0
    while n &gt; 0:
        if n % 2 == 1:
            z *= x
        x *= x
        n //= 2
    return z
</pre>
<p>The quiz was at the end of the day and I was tired. I couldn&#8217;t tell what the code did. Then I found out, to my chagrin, that the sample above implements an algorithm I know well. I&#8217;ve written the same code and I&#8217;ve even blogged about it here.</p>
<p>This exercise changed my opinion of &#8220;self-documenting&#8221; code. Without some contextual clue, it is hard to understand the purpose of even a small piece of code. </p>
<p>Meaningful variable and function names would have helped, but a tiny comment might have helped even more. Not a redundant comment explaining that the line <code>x = 1.0 / x</code> takes a reciprocal, but one explaining the problem the code is trying to solve.</p>
<p>For another example, what do you think this code does?</p>
<pre class="brush: csharp; title: ; notranslate">
uint what()
{
    m_z = 36969 * (m_z &amp; 65535) + (m_z &gt;&gt; 16);
    m_w = 18000 * (m_w &amp; 65535) + (m_w &gt;&gt; 16);
    return (m_z &lt;&lt; 16) + (m_w &amp; 65535);
}
</pre>
<p>It&#8217;s clear enough what the code does at a low level &mdash; it&#8217;s just a few operations &mdash; but it&#8217;s not at all clear what it&#8217;s <em>for</em>.</p>
<p>Try to figure out what the code samples do before reading further. But if you give up, the first example is described <a href="http://www.johndcook.com/blog/2008/12/10/fast-exponentiation/">here</a> and the second example comes from <a href="http://www.codeproject.com/KB/recipes/SimpleRNG.aspx">here</a>.</p>
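<p>For comparison, here is the first sample rewritten with a descriptive name and a comment stating the problem it solves. The name <code>power</code> is hypothetical. The loop is the same repeated-squaring algorithm, but it uses integer division (<code>//</code>). Under Python 3, <code>n /= 2</code> would turn <code>n</code> into a float.</p>
<pre class="brush: python; title: ; notranslate">
def power(x, n):
    &quot;&quot;&quot;Compute x**n for an integer n by binary exponentiation
    (repeated squaring), using O(log |n|) multiplications.&quot;&quot;&quot;
    exponent = abs(n)
    base = x if exponent == n else 1.0 / x  # x**(-k) == (1/x)**k
    result = 1.0
    while exponent:
        if exponent % 2:      # low bit set: include this power of x
            result *= base
        base *= base          # x, x**2, x**4, x**8, ...
        exponent //= 2        # move on to the next bit
    return result

print(power(2, 10))  # 1024.0
</pre>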
<p>In an ordinary face-to-face conversation, more information is conveyed non-verbally than verbally. We may think that our literal words are most important, but so much is conveyed by voice inflection, facial expression, posture, etc. Something similar is going on with source code. When we read a piece of source code, we typically bring a huge amount of implicit knowledge with us. </p>
<p>Suppose a coworker Sam asks you to look at his code. The fact that the question came up at work provides a large amount of context; this isn&#8217;t just a random code fragment on the web. More specifically, you know what kinds of projects Sam works on. You know why Sam wants you to look at the code. He may be showing you something he&#8217;s proud of or he may be asking for help finding a bug. You know a lot about his code before you see it.</p>
<p>Now suppose you&#8217;re a contractor. Sam was <a href="http://www.johndcook.com/blog/2008/09/28/programmer-hit-by-a-bus/">hit by a bus</a> and you&#8217;ve been asked to work on his projects until he gets out of the hospital. You may complain to his office mate that Sam&#8217;s code is an awful mess, but she can&#8217;t understand what you&#8217;re talking about. She thinks his code is perfectly clear.</p>
<p>Now suppose you&#8217;re a contractor on the opposite side of the world from Sam. You have even less context than if you were in his office talking to his office mate. After a great deal of agony, you send your contribution back to Sam&#8217;s company. You comment your code beautifully, but Sam&#8217;s colleagues complain that your code is poorly written and that you didn&#8217;t solve the right problem. </p>
<p>Institutional memory is more valuable than source code comments. It costs a great deal to replace a programmer, even one who leaves behind well-commented code. </p>
<p><strong>Related posts</strong>:</p>
<p><a href="http://www.johndcook.com/blog/2009/04/22/being-indispensable/">Do you really want to be indispensable?</a><br />
<a href="http://reproducibleresearch.org/blog/2009/03/17/preserving-documents/">Preserving (the memory of) documents</a><br />
<a href="http://www.johndcook.com/blog/2009/03/19/the-buck-stops-with-the-programmer/">The buck stops with the programmer</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2010/07/21/what-does-this-code-do/feed/</wfw:commentRss>
		<slash:comments>30</slash:comments>
		</item>
		<item>
		<title>How many errors are left to find?</title>
		<link>http://www.johndcook.com/blog/2010/07/13/lincoln-index/</link>
		<comments>http://www.johndcook.com/blog/2010/07/13/lincoln-index/#comments</comments>
		<pubDate>Tue, 13 Jul 2010 11:59:19 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[Python]]></category>
		<category><![CDATA[Software development]]></category>
		<category><![CDATA[Probability and Statistics]]></category>
		<category><![CDATA[Quality]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=5857</guid>
		<description><![CDATA[There&#8217;s a simple statistic called the Lincoln Index that lets you estimate the total number of errors based on the number of errors found. I&#8217;ll explain what the Lincoln Index is, why it works, give some code for playing with it, and discuss how it applies to software testing.
What is the Lincoln Index?
Suppose you have [...]]]></description>
			<content:encoded><![CDATA[<p>There&#8217;s a simple statistic called the Lincoln Index that lets you estimate the total number of errors based on the number of errors found. I&#8217;ll explain what the Lincoln Index is, why it works, give some code for playing with it, and discuss how it applies to software testing.</p>
<p><strong>What is the Lincoln Index?</strong></p>
<p>Suppose you have a tester who finds 20 bugs in your program. You want to estimate how many bugs are really in the program. You know there are at least 20 bugs, and if you have supreme confidence in your tester, you may suppose there are around 20 bugs. But maybe your tester isn&#8217;t very good. Maybe there are hundreds of bugs. How can you have any idea how many bugs there are? There&#8217;s no way to know with one tester. But if you have two testers, you can get a good idea, even if you don&#8217;t know how skilled the testers are.</p>
<p>Suppose two testers independently search for bugs. Let E<sub>1</sub> be the number of errors the first tester finds and E<sub>2</sub> the number of errors the second tester finds. Let S be the number of errors both testers find. The Lincoln Index estimates the total number of errors as E<sub>1</sub> E<sub>2</sub>/S. You can find historical background on the Lincoln Index <a href="http://bit-player.org/2010/the-thrill-of-the-chase">here</a>.</p>
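<p>As a minimal sketch, assuming Python 3 division and that each tester reports bugs by some identifier (hypothetical labels such as issue numbers), the index can be computed from the two lists:</p>
<pre class="brush: python; title: ; notranslate">
def lincoln_index(found1, found2):
    &quot;&quot;&quot;Estimate the total number of bugs as E1*E2/S from two
    testers' independently found bug identifiers.&quot;&quot;&quot;
    e1, e2 = set(found1), set(found2)
    s = len(e1.intersection(e2))  # bugs both testers found
    if s == 0:
        raise ValueError('no bugs in common: the Lincoln Index is undefined')
    return len(e1) * len(e2) / s

# 15 bugs and 20 bugs, with 3 in common (the example discussed below)
print(lincoln_index(range(15), range(12, 32)))  # 100.0
</pre>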
<p><strong>How does the index work? </strong></p>
<p>Suppose there are n bugs and the two testers find bugs with probability p<sub>1</sub> and p<sub>2</sub> respectively. You&#8217;d expect the two testers to find around np<sub>1</sub> and np<sub>2</sub> bugs. If you assume the probabilities of each tester finding a bug are independent, you&#8217;d expect the testers to find around np<sub>1</sub> p<sub>2</sub> bugs in common. That means E<sub>1</sub>E<sub>2</sub>/S would be around</p>
<p style="padding-left: 30px;">(n<sup>2</sup> p<sub>1</sub> p<sub>2</sub>) / (n p<sub>1</sub> p<sub>2</sub>) = n.</p>
<p>The probabilities of each tester finding a bug cancel out leaving only n, the total number of bugs.</p>
<p><strong>Simulation code</strong></p>
<p>Here&#8217;s some Python code for simulating estimates using the Lincoln Index.</p>
<pre class="brush: python; title: ; notranslate">

from random import random

def find_error(p):
    &quot;Find an error with probability p&quot;
    if random() &lt; p:
        return 1
    return 0

def simulate(true_error_count, p1, p2, reps=10000):
    &quot;&quot;&quot;Simulate Lincoln's method for estimating errors
    given the true number of errors, each person's probability
    of finding an error, and the number of simulations to run.&quot;&quot;&quot;
    estimation_error_sum = 0
    for _ in range(reps):
        caught1 = 0
        caught2 = 0
        caught_both = 0
        for _ in range(true_error_count):
            found1 = find_error(p1)
            found2 = find_error(p2)
            caught1 += found1
            caught2 += found2
            caught_both += found1*found2  # 1 only if both testers found it
        # Lincoln Index estimate; raises ZeroDivisionError if caught_both == 0
        estimate = caught1*caught2 / float(caught_both)
        estimation_error_sum += abs(estimate - true_error_count)
    return estimation_error_sum / float(reps)
</pre>
<p>I used this to simulate the case of two testers, one with a 30% chance of finding a bug and the other with a 40% chance, and a total of 100 bugs. I simulated the Lincoln Index 1,000 times, keeping track of the absolute error in the estimates. The code to do this was <code>simulate(100, 0.30, 0.40, 1000)</code>. On average, the Lincoln Index over- or under-estimated the number of bugs by about 16. This is a good estimate considering each tester greatly under-estimated the number of bugs.</p>
<p>If you didn&#8217;t think about using something like the Lincoln Index, in the previous example one tester would find around 30 bugs and the other around 40. The two lists might have 10 bugs in common, so you&#8217;d estimate the total number at 60, far short of 100. But the Lincoln Index would often find estimates between 84 and 116.</p>
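<p>Note that it is possible that the testers won&#8217;t find any of the same bugs. In that case the Lincoln Index cannot be computed and the code will divide by zero. But this is unlikely unless np<sub>1</sub>p<sub>2</sub>, the expected number of bugs found by both testers, is small. Under the independence assumption, the chance of no overlap is (1 - p<sub>1</sub>p<sub>2</sub>)<sup>n</sup>, which is about 3 × 10<sup>-6</sup> for the example above.</p>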
<p><strong>Software testing</strong></p>
<p>Does the Lincoln Index actually provide a good bug count estimate? That depends on how well the assumptions are met. The index assumes all bugs are equally hard for a given tester to find. It does not assume that both testers are equally skilled, but it does assume that their chances of finding a bug are independent. In other words, tester A is no more or less likely to find a bug just because tester B found it.</p>
<p>The most questionable assumption is that all bugs are equally hard to find. That&#8217;s usually not true. But it may be true that all bugs <em>of a certain kind</em> are equally hard to find. For example, spelling errors may be easier to find than validation oversights, but the Lincoln Index might be good for estimating separately how many spelling errors or validation errors there are.</p>
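<p>To see why unequal difficulty matters, here is a rough sketch that plugs expected counts into E<sub>1</sub>E<sub>2</sub>/S for a mix of bug classes, assuming the testers are independent. It is a ratio-of-expectations approximation, not an exact expected value. The function name and the easy/hard split are hypothetical.</p>
<pre class="brush: python; title: ; notranslate">
def approx_lincoln(groups):
    &quot;&quot;&quot;Plug expected counts into the Lincoln Index E1*E2/S.

    groups: list of (count, p1, p2) tuples, one per class of bugs,
    where p1 and p2 are each tester's chance of finding a bug of
    that class.
    &quot;&quot;&quot;
    e1 = sum(n * p1 for n, p1, _ in groups)
    e2 = sum(n * p2 for n, _, p2 in groups)
    s = sum(n * p1 * p2 for n, p1, p2 in groups)
    return e1 * e2 / s

# One class of 100 equally hard bugs: the estimate is about 100.
print(approx_lincoln([(100, 0.3, 0.4)]))
# 50 easy bugs (p = 0.8) and 50 hard ones (p = 0.1): about 62, not 100.
print(approx_lincoln([(50, 0.8, 0.8), (50, 0.1, 0.1)]))
</pre>
<p>Both testers tend to find the same easy bugs, which inflates the overlap S and pulls the estimate well below the true total.</p>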
<p>The index might provide a rough rule of thumb even if the assumptions that go into it are violated. For example, suppose one tester found 15 bugs and another found 20, but only 3 of the bugs were the same. A naive estimate would say that since 32 unique bugs were found, there must be around that many in total. But the Lincoln Index would estimate 100 bugs (15 × 20 / 3). Maybe the Lincoln estimate is not at all accurate, but it does tell you to worry that there may be many more bugs to find, since the overlap between the two bug lists was so small.</p>
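<p>Here is the arithmetic in this example as a small Python 3 sketch. It works from counts rather than bug lists, and the function name is hypothetical. The last line expresses the same worry as a fraction: if the Lincoln estimate were right, about 68% of the bugs would still be unfound.</p>

```python
def naive_and_lincoln(found_a, found_b, common):
    """Return (distinct bugs found so far, Lincoln Index estimate) from counts.

    found_a, found_b: number of bugs each tester reported
    common: number of bugs reported by both testers (must be nonzero)
    """
    unique_found = found_a + found_b - common   # inclusion-exclusion
    estimate = found_a * found_b / common       # Lincoln Index
    return unique_found, estimate

found, estimate = naive_and_lincoln(15, 20, 3)
print(found, estimate)                 # 32 100.0
print(round(1 - found / estimate, 2))  # 0.68, estimated share still unfound
```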
<p><strong>Related post</strong>:</p>
<p><a href="http://www.johndcook.com/blog/2010/03/30/statistical-rule-of-three/">Estimating the chances of something that hasn&#8217;t happened yet</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2010/07/13/lincoln-index/feed/</wfw:commentRss>
		<slash:comments>19</slash:comments>
		</item>
		<item>
		<title>Replacing Mathematica with Python</title>
		<link>http://www.johndcook.com/blog/2010/07/09/replacing-mathematica-with-python/</link>
		<comments>http://www.johndcook.com/blog/2010/07/09/replacing-mathematica-with-python/#comments</comments>
		<pubDate>Fri, 09 Jul 2010 14:07:13 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[Python]]></category>
		<category><![CDATA[Mathematica]]></category>
		<category><![CDATA[SciPy]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=5827</guid>
		<description><![CDATA[Everything I do regularly in Mathematica can be done in Python. Even though Mathematica has a mind-boggling amount of functionality, I only use a tiny proportion of it. I skimmed through some of my Mathematica files to see what functions I use and then looked for Python counterparts. I found I use less of Mathematica [...]]]></description>
			<content:encoded><![CDATA[<p>Everything I do regularly in Mathematica can be done in Python. Even though Mathematica has a mind-boggling amount of functionality, I only use a tiny proportion of it. I skimmed through some of my Mathematica files to see what functions I use and then looked for Python counterparts. I found I use less of Mathematica than I imagined.</p>
<p>The core mathematical functions I need are in <a href="http://www.scipy.org/">SciPy</a>. The plotting features are in <a href="http://matplotlib.sourceforge.net/">matplotlib</a>. The <a href="http://code.google.com/p/sympy/">SymPy</a> library appears to have the symbolic functionality I need, though I&#8217;m not as sure about this one.</p>
<p>I don&#8217;t have much experience with the Python libraries listed above. I haven&#8217;t used SymPy at all; I&#8217;ve only browsed its web site. Maybe I&#8217;ll find I&#8217;d rather work in Mathematica, particularly when I&#8217;m just trying out ideas. But I want to experiment with using Python for more tasks.</p>
<p>As I&#8217;ve blogged about before, I&#8217;d like to consolidate my tools. I started using <a href="http://www.johndcook.com/blog/2010/04/01/giving-emacs-another-try/">Emacs</a> again because I was frustrated with using a different editor for every kind of file. One of the things I find promising about Python is that I may be able to do more in Python and reduce the number of programming languages I use regularly.</p>
<p><strong>Related posts</strong>:</p>
<p><a href="http://www.johndcook.com/blog/2008/07/01/languages-that-are-easy-to-pick-back-up/">Languages that are easy to pick back up</a><br />
<a href="http://www.johndcook.com/blog/2010/06/30/where-the-unix-philosophy-breaks-down/">Where the Unix philosophy breaks down</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2010/07/09/replacing-mathematica-with-python/feed/</wfw:commentRss>
		<slash:comments>17</slash:comments>
		</item>
		<item>
		<title>SciPy and NumPy for .NET</title>
		<link>http://www.johndcook.com/blog/2010/07/01/scipy-and-numpy-for-net/</link>
		<comments>http://www.johndcook.com/blog/2010/07/01/scipy-and-numpy-for-net/#comments</comments>
		<pubDate>Thu, 01 Jul 2010 16:00:44 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[Python]]></category>
		<category><![CDATA[IronPython]]></category>
		<category><![CDATA[SciPy]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=5759</guid>
		<description><![CDATA[Travis Oliphant announced this morning at the SciPy 2010 conference that Microsoft is partnering with Enthought to produce a version of NumPy and SciPy for .NET. NumPy and SciPy are Python libraries for scientific computing. Oliphant is the president of Enthought and the original developer of NumPy.
It is possible to call NumPy and SciPy from [...]]]></description>
			<content:encoded><![CDATA[<p>Travis Oliphant announced this morning at the <a href="http://conference.scipy.org/scipy2010/index.html">SciPy 2010 conference</a> that Microsoft is partnering with <a href="http://www.enthought.com/">Enthought</a> to produce a version of NumPy and SciPy for .NET. NumPy and SciPy are Python libraries for scientific computing. Oliphant is the president of Enthought and the original developer of NumPy.</p>
<p>It is possible to call NumPy and SciPy from IronPython now by using <a href="http://www.johndcook.com/blog/2009/03/19/ironclad-ironpytho/">IronClad</a>. However, going through IronClad can be inefficient.  The new libraries will enable efficient access to NumPy and SciPy from .NET languages and in particular from IronPython.</p>
<p>Here is the official <a href="http://www.enthought.com/media/SciPyNumPyDotNet.pdf">press release</a> from Enthought.</p>
<p><img class="alignnone" src="http://www.johndcook.com/announcement.jpg" alt="" width="234" height="189" /></p>
<p>Photo credit: <a href="http://www.flickr.com/photos/pivanov/4752318542/">Paul Ivanov</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2010/07/01/scipy-and-numpy-for-net/feed/</wfw:commentRss>
		<slash:comments>15</slash:comments>
		</item>
		<item>
		<title>Porting Python to C#</title>
		<link>http://www.johndcook.com/blog/2010/06/18/porting-python-to-c/</link>
		<comments>http://www.johndcook.com/blog/2010/06/18/porting-python-to-c/#comments</comments>
		<pubDate>Fri, 18 Jun 2010 15:59:29 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[Python]]></category>
		<category><![CDATA[Software development]]></category>
		<category><![CDATA[CSharp]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=5663</guid>
		<description><![CDATA[When people start programming in Python, they often mention having to type less: no braces, no semicolons, fewer type declarations etc.
The difference may be more obvious when you go in the other direction, moving from Python to another language. This morning I ported some Python code to C# and was a little surprised how much [...]]]></description>
			<content:encoded><![CDATA[<p>When people start programming in Python, they often mention having to type less: no braces, no semicolons, fewer type declarations etc.</p>
<p>The difference may be more obvious when you go in the other direction, moving from Python to another language. This morning I ported some <a href="http://www.codeproject.com/KB/recipes/ParameterPercentile.aspx">Python code</a> to C# and was a little surprised how much extra code I had to add. When I&#8217;ve ported C# to Python I wasn&#8217;t as aware of the language differences. I guess it is easier to go down a notch in ceremony than to go up a notch.</p>
<p><strong>Related post</strong>:</p>
<p><a href="http://www.johndcook.com/blog/2009/05/08/plain-python/">Plain Python</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2010/06/18/porting-python-to-c/feed/</wfw:commentRss>
		<slash:comments>23</slash:comments>
		</item>
		<item>
		<title>Dynamic typing and anti-lock brakes</title>
		<link>http://www.johndcook.com/blog/2010/06/09/dynamic-typing-and-risk-homeostasis/</link>
		<comments>http://www.johndcook.com/blog/2010/06/09/dynamic-typing-and-risk-homeostasis/#comments</comments>
		<pubDate>Wed, 09 Jun 2010 07:51:06 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[Python]]></category>
		<category><![CDATA[Software development]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Quality]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=4842</guid>
		<description><![CDATA[When we make one part of our lives safer, we tend to take more chances somewhere else. Psychologists call this tendency risk homeostasis.
One of the studies often cited to support the theory of risk homeostasis involved German cab drivers. Drivers in the experimental group were given cabs with anti-lock brakes while drivers in the control [...]]]></description>
			<content:encoded><![CDATA[<p>When we make one part of our lives safer, we tend to take more chances somewhere else. Psychologists call this tendency <a href="http://en.wikipedia.org/wiki/Risk_homeostasis">risk homeostasis</a>.</p>
<p>One of the studies often cited to support the theory of risk homeostasis involved German cab drivers. Drivers in the experimental group were given cabs with anti-lock brakes while drivers in the control group were given cabs with conventional brakes. There was no difference in the rate of crashes between the two groups. The drivers who had better brakes drove less carefully.</p>
<p>Risk homeostasis may explain why dynamic programming languages such as Python aren&#8217;t as dangerous as critics suppose.</p>
<p>Advocates of statically typed programming languages argue that it is safer to have static type checking than to not have it. Would you rather the computer catch some of your errors or not? I&#8217;d rather it catch some of my errors, thank you. But this argument assumes two things:</p>
<ol>
<li>static type checking comes at no cost, and</li>
<li>static type checking has no impact on programmer behavior.</li>
</ol>
<p>Advocates of dynamic programming languages have mostly focused on the first assumption. They argue that static typing requires so much extra programming effort that it is not worth the cost. I&#8217;d like to focus on the second assumption. Maybe the presence or absence of static typing changes programmer behavior.</p>
<p>Maybe a lack of static type checking scares dynamic language programmers into writing unit tests. Or to turn things around, perhaps static type checking lulls programmers into thinking they do not need unit tests. Maybe static type checking is like anti-lock brakes.</p>
<p>Nearly everyone would agree that static type checking does not eliminate the need for unit testing. Someone accustomed to working in a statically typed language might say &#8220;I know the compiler isn&#8217;t going to catch all my errors, but I&#8217;m glad that it catches some of them.&#8221; Static typing might not eliminate the <em>need</em> for unit testing, but it may diminish the <em>motivation</em> for unit testing. The lack of compile-time checking in dynamic languages may inspire developers to write more unit tests.</p>
<p>See Bruce Eckel&#8217;s article <a href="http://docs.google.com/View?id=dcsvntt2_25wpjvbbhk">Strong Typing vs. Strong Testing</a> for more discussion of static typing and unit testing.</p>
<p><strong>Update</strong>: I&#8217;m not knocking statically typed languages. I spend most of my coding time in such languages and I&#8217;m not advocating that we get rid of static typing in order to scare people into unit testing.</p>
<p>I wanted to address the question of what programmers <em>do</em>, not what they <em>should</em> do. In that sense, this post is more about psychology than software engineering. (Though I believe a large part of software engineering is in fact psychology as I&#8217;ve argued <a href="http://stackoverflow.com/questions/521810/theories-of-software-engineering/521826#521826">here</a>.) Do programmers who work in dynamic languages write more tests? If so, does risk homeostasis help explain why?</p>
<p>Finally, I appreciate the value of unit testing. I&#8217;ve spent most of the last couple days writing unit tests. But there is a limit to the kinds of bugs that unit tests can catch. Unit tests are good at catching errors in code that has been written, but most errors come from code that should have been written. See <a href="http://www.johndcook.com/blog/2010/01/12/software-sins-of-omission/">Software sins of omission</a>.</p>
<p><strong>Related posts</strong>:</p>
<p><a href="http://www.johndcook.com/blog/2009/05/11/all-languages-equally-complex/">All languages equally complex</a><br />
<a href="http://www.johndcook.com/blog/2009/08/11/reasoning-about-code/">Reasoning about code</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2010/06/09/dynamic-typing-and-risk-homeostasis/feed/</wfw:commentRss>
		<slash:comments>25</slash:comments>
		</item>
		<item>
		<title>Pure functions have side-effects</title>
		<link>http://www.johndcook.com/blog/2010/05/18/pure-functions-have-side-effects/</link>
		<comments>http://www.johndcook.com/blog/2010/05/18/pure-functions-have-side-effects/#comments</comments>
		<pubDate>Tue, 18 May 2010 14:11:19 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[Python]]></category>
		<category><![CDATA[Software development]]></category>
		<category><![CDATA[Functional programming]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=5381</guid>
		<description><![CDATA[Functional programming emphasizes &#8220;pure&#8221; functions, functions that have no side effects. When you call a pure function, all you need to know is the return value of the function. You can be confident that calling a function doesn&#8217;t leave any state changes that will affect future function calls.
But pure functions are only pure at a [...]]]></description>
			<content:encoded><![CDATA[<p>Functional programming emphasizes &#8220;pure&#8221; functions, functions that have no side effects. When you call a pure function, all you need to know is the return value of the function. You can be confident that calling a function doesn&#8217;t leave any state changes that will affect future function calls.</p>
<p>But pure functions are only pure <em>at a certain level of abstraction</em>. Every function has some side effect: it uses memory, it takes CPU time, etc. Harald Armin Massa makes this point in his PyCon 2010 talk &#8220;The real harm of functional programming.&#8221; (His talk is about eight minutes into the February 21, 2010 afternoon lightning talks:  <a href="http://pycon.blip.tv/file/3332814/">video</a>, <a href="http://blip.tv/file/get/Pycon-PyCon2010SundayAfternoonLightningTalks430.mp3">audio</a>.)</p>
<blockquote><p>Even pure functions in programming have side effects. They use memory. They use CPU. They take runtime. And if you look at those evil languages, they are quite fast at doing Fibonacci or something, but in bigger applications you get reports &#8220;Hmm, I have some runtime problems. I don&#8217;t know how to get it faster or what is going wrong.&#8221;</p></blockquote>
<p>Massa argues that the concept of an action without side effects is dangerous because it disassociates us from the real world. I disagree. I appreciate his warning that the &#8220;no side effect&#8221; abstraction may leak like any other abstraction. But pure functions are a useful abstraction.</p>
<p>You can&#8217;t avoid state, but you can partition the stateful and stateless parts of your code. 100% functional purity is impossible, but <a href="http://www.johndcook.com/blog/2010/04/15/85-functional-language-purity/">85% functional purity</a> may be very productive.</p>
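<p>Here is one minimal way to partition code like this in Python. A pure function does the computation, and a thin outer function does all the input and output. The function names are hypothetical and serve only as an illustration. Note that the pure part still uses memory and CPU time, as Massa points out. Keeping the reading and writing in one place makes the rest easy to test and reason about.</p>

```python
import io

def summarize(lines):
    """Pure core: the result depends only on the argument, and nothing
    outside the function is modified (it still uses memory and CPU time)."""
    word_count = sum(len(line.split()) for line in lines)
    return {"lines": len(lines), "words": word_count}

def report(stream, out):
    """Impure shell: all reading and writing is confined to this function."""
    stats = summarize(stream.read().splitlines())
    out.write("%(lines)d lines, %(words)d words\n" % stats)

# Usage with in-memory streams; a real program might pass files instead.
buf = io.StringIO()
report(io.StringIO("one two\nthree\n"), buf)
print(buf.getvalue(), end="")  # 2 lines, 3 words
```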
<p><strong>Related posts</strong>:</p>
<p><a href="http://www.johndcook.com/blog/2009/03/26/mit-replaces-scheme-with-python/">Using Python as a functional language</a><br />
<a href="http://www.johndcook.com/blog/2009/07/13/f-sharp/">F# may succeed where others have failed</a><br />
<a href="http://www.johndcook.com/blog/2009/03/23/functional-in-the-small-oo-in-the-large/">Functional in the small, OO in the large</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2010/05/18/pure-functions-have-side-effects/feed/</wfw:commentRss>
		<slash:comments>20</slash:comments>
<enclosure url="http://blip.tv/file/get/Pycon-PyCon2010SundayAfternoonLightningTalks430.mp3" length="32802584" type="audio/mpeg" />
		</item>
	</channel>
</rss>


