Using SciPy with IronPython

Three years ago I wrote a post about my disappointment using SciPy with IronPython. A lot has changed since then, so I thought I’d write a short follow-up post.

To install NumPy and SciPy for use with IronPython, follow the instructions here. [Update: no longer available.] After installation, NumPy works as expected.

There is one small gotcha with SciPy. To use SciPy with IronPython, start ipy with the command line argument -X:Frames. Then you can use SciPy as you would from CPython. For example.

c:> ipy -X:Frames
>>> import scipy as sp
>>> sp.pi
3.141592653589793

Without the -X:Frames option you’ll get an error when you try to import scipy.

AttributeError: 'module' object has no attribute '_getframe'

According to this page,

The issue is that SciPy makes use of the CPython API for inspecting the current stack frame which IronPython doesn’t enable by default because of a small runtime performance hit. You can turn on this functionality by passing the command line argument “-X:Frames” to on the command line.

* * *

For daily tips on Python and scientific computing, follow @SciPyTip on Twitter.

Scipytip twitter icon

Python-based data/science environment from Microsoft

See Microsoft Research’s announcement of the the Sho project.

Sho is an interactive environment for data analysis and scientific computing that lets you seamlessly connect scripts (in IronPython) with compiled code (in .NET) to enable fast and flexible prototyping. The environment includes powerful and efficient libraries for linear algebra as well as data visualization that can be used from any .NET language, as well as a feature-rich interactive shell for rapid development.

Maybe this is why Microsoft contracted Enthought this summer to port NumPy and SciPy to .NET.

SciPy and NumPy for .NET

Travis Oliphant announced this morning at the SciPy 2010 conference that Microsoft is partnering with Enthought to produce a version of NumPy and SciPy for .NET. NumPy and SciPy are Python libraries for scientific computing. Oliphant is the president of Enthought and the original developer of NumPy.

It is possible to call NumPy and SciPy from IronPython now by using IronClad. However, going through IronClad can be inefficient.  The new libraries will enable efficient access to NumPy and SciPy from .NET languages and in particular from IronPython.

Here is the official press release from Enthought. [Update: press release no longer available.]

 

Numerical computing in IronPython with Ironclad

In a previous post, I discuss my difficulties calling some Python modules from IronPython. In particular I wanted to call SciPy from IronPython and couldn’t. The discussion following that post brought up Ironclad as a possible solution. I wanted to learn more about Ironclad, and so I invited William Reade to write a guest post about the project. I want to thank William for responding to my request with a very helpful article. — John


Hi! My name’s William Reade, and I’ve spent the last year or so working on Ironclad, an open-source project which helps IronPython to inter-operate better with CPython. Michael Foord recently introduced  me to our host John, who kindly offered me the opportunity to write a bit about  my work and, er, how well it works. So, here I am.

To give you a little bit of context, I’ve been working at Resolver Systems for several years now; our main product, Resolver  One, is a spreadsheet with very tight IronPython integration. We like to describe  it as a “Pythonic spreadsheet”, and that’s clearly a concept that people like.  However, when people think of a “Pythonic spreadsheet”, they apparently expect it  to work with popular Python libraries — such as NumPy and SciPy — and we found that IronPython’s incompatibility put us at a serious disadvantage. And, for some reason, nobody seemed very keen to  solve the problem for us, so we had to do it ourselves.

The purpose of Ironclad is to allow you to use Python C extensions (of which there are many) from inside IronPython without recompiling anything. The secret purpose  has always been to get NumPy working in Resolver One, and in release 1.4 we finally  achieved this goal. Although the integration is still alpha level, you can import  and use NumPy inside the spreadsheet grid and user code: you can see a screencast  about the integration here.

However, while Resolver One is a great tool, you aren’t required to use it to get the benefits: Ironclad has been developed completely separately, has no external  dependencies, and is available under an open source license. If you consider  yourself adequately teased, keep reading for a discussion of what Ironclad actually  does, what it enables you to do, and where it’s headed.

As you may know, Python is written in C and IronPython is written in C#. While IronPython is an excellent implementation of Python, it works very differently  under the hood, and it certainly doesn’t have anything resembling Python’s API for  writing C extensions. However, Ironclad can work around this problem by loading a  stub DLL into an IronPython process which impersonates the real python25.dll, and  hence allows us to us intercept the CPython API calls. We can then ensure that the  appropriate things happen in response to those calls … except that we use  IronPython objects instead of CPython ones.

So long as we wrap IronPython objects for consumption by CPython, and vice versa, the two systems can coexist and inter-operate quite happily. Of course, the mix of  deterministic and non-deterministic garbage collection makes it a little tricky [1] to  ensure that unreferenced objects — and only unreferenced objects — die in a  timely manner, and there are a number of other dark corners, but I’ve done enough  work to confidently state that the problem is “just” complex and fiddly. While it’s  not the sort of project that will ever be finished, it hopefully is the sort that  can be useful without being perfect.

The upshot of my recent work is that you can now download Ironclad, type ‘import ironclad; import scipy‘ in an IronPython console, and it will Just Work [2]. I am  programmer, hear me roar!

Hundreds of tests now pass in both NumPy and SciPy, and I hope that some of you will be inspired to test it against your own requirements. For example, the Gaussian error function has been mentioned a few times on this blog (and, crucially, I have a  vague idea of what it actually is), and I can demonstrate that scipy.special.erf works perfectly under Ironclad:

C:devironclad-headbuild>ipy
IronPython 2.0 (2.0.0.0) on .NET 2.0.50727.3053
Type "help", "copyright", "credits" or "license" for more information.
>>> import ironclad
>>> from scipy.special import erf
Detected scipy import
  faking out numpy._import_tools.PackageLoader
Detected numpy import
  faking out modules: mmap, nosetester, parser
>>> erf(0)
0.0
>>> erf(0.1)
0.1124629160182849
>>> erf(1)
0.84270079294971478
>>> erf(10)
1.0

Numerical integration also seems to work pretty well, even for tricky cases (note that the quad function returns a tuple of (result, error)):

>>> from scipy.integrate import quad
>>> quad(erf, 0, 1)
(0.48606495811225592, 5.3964050795968879e-015)
>>> quad(erf, -1, 1)
(0.0, 1.0746071094349994e-014)
>>> from scipy import inf
>>> quad(erf, -inf, inf)
(0.0, 0.0)
>>> quad(erf, 0, inf) # ok, this one is probably more of a 'stupid' case
Warning: The integral is probably divergent, or slowly convergent.
(-1.564189583542768, 3.2898350710297564e-010)

And, while this exposes a little import-order wart, we can re-implement erf in terms of the normal CDF, and see that we get pretty similar results:

>>> from scipy import misc # shouldn't really be necessary - sorry :)
>>> from scipy.stats.distributions import norm
>>> import numpy as np
>>> def my_erf(x):
...   y = norm.cdf(x * np.sqrt(2))
...   return (2 * y) - 1
...
>>> my_erf(0.1)
0.11246291601828484
>>> my_erf(1)
0.84270079294971501
>>> quad(my_erf, 0, 1)
(0.48606495811225597, 5.3964050795968887e-015)
>>> quad(my_erf, -inf, inf)
(2.8756927650058737e-016, 6.1925307417506635e-016)

I also know that it’s possible to run through the whole Tentative NumPy Tutorial [3] with identical output on  CPython and IronPython [4], and the SciPy tutorial appears to work equally well in both  environments [5]. In short, if you’re trying to do scientific computing with  IronPython, Ironclad is now probably mature enough to let you get significant value  out of SciPy/NumPy.

However, I can’t claim that everything is rosy: Ironclad has a number of flaws which may impact you.

  • It won’t currently work outside Windows, and it won’t work in 64-bit processes.  However, NumPy itself doesn’t yet take advantage of 64-bit Windows. I’ll start work  on this  as soon as it’s practical; for now, it should be possible to run in 32-bit  mode without problems.
  • Performance is generally poor compared to CPython. In many places it’s only a matter of a few errant microseconds — and we’ve seen NumPy integration deliver some great performance benefits for Resolver One — but in pathological cases it’s  worse by many orders of magnitude. This is another area where I would really like  to hear back from users with examples of what needs to be faster.
  • Unicode data doesn’t work, and I don’t plan to work on this problem because it’ll disappear when IronPython catches up to Python 3000. At that point both systems will have Unicode strings only, instead of the current situation where I would have to map one string type on one side to two string types on the other.
  • NumPy’s distutils and f2py subpackages don’t currently work at all, and nor do memory-mapped files.
  • Plenty of other CPython extensions work, to a greater or lesser extent, but lots won’t even import.

However, just about every problem with Ironclad is fixable, at least in theory: if you need it to do something that it can’t, please talk to me about it (or even send me a patch!).

Click to find out more about consulting for numerical computing

 

Footnotes

[1] CPython uses reference counting to track objects’ states, and deletes them deterministically the moment they become unreferenced, while .NET uses a more advanced garbage collection strategy which unfortunately leads to non-deterministic finalization.

[2] Assuming you have the directories containing the ironclad, numpy and scipy packages already on your sys.path, at any rate. I personally just install  everything for Python 2.5, and have added the CPython install’s ‘Dlls‘ and ‘lib/site-packages‘ subdirectories to my IRONPYTHONPATH.

[3] Apart from the matplotlib/pylab bit, but even that should be workable with a little extra setup if you don’t mind using a non-interactive back-end.

[4] Modulo PRNG output, of course.

[5] That is to say, not very well at all, but at least they go wrong in similar ways.

IronPython article on CodeProject

It’s difficult to use SciPy from IronPython because much of SciPy is implemented in C rather than in Python itself. I wrote an article on CodeProject summarizing some things I’d learned about using Python modules with IronPython. (Many thanks to folks who left helpful comments here and answered my questions on StackOverflow.) The article gives stand-alone code for computing normal probabilities and explains why I wrote my own code rather than using SciPy.

Here’s the article: Computing Normal Probabilities in IronPython

Here’s some more stand-alone Python code for computing mathematical functions and generating random numbers. The code should work in Python 2.5, Python 3.0, and IronPython.


Related post
: IronPython is a one-way gate

IronPython is a one-way gate

IronPython opens up the world of .NET to Python programmers. It’s not as good yet at opening up the world of Python to .NET programmers.

It is easy to write .NET applications in IronPython. I typed in some sample code within a few minutes of installing IronPython and made a very simple Windows application. But I was also interested in going the other way around. I was hoping to use IronPython to expose Python library functionality (specifically SciPy) to C#. This may be possible, but it’s swimming upstream.

There are two issues. First, calling Python from C# is more complicated than I’d expected. In hindsight it makes sense that it should be easier to call statically-typed languages from dynamically-typed languages than the other way around. I wouldn’t be surprised if IronRuby has an analogous problem. Second, even if you’re only using IronPython, not calling it from another language, there are problems calling some Python modules.

I asked a question about SciPy and IronPython on StackOverflow and got two excellent answers. First, “NXC” explained that modules written in pure Python will work with IronPython, but modules written in C will not work directly.

Anything with components written in C (for example NumPy, which is a component of SciPy) will not work on IronPython as the external language interface works differently. Any C language component will probably not work unless it has been explicitly ported to work with IronPython.

That’s disappointing, but it makes sense.

Second, “wilberforce” pointed out an open source project, Ironclad, that might fill in the gap.

Some of my workmates are working on Ironclad, a project that will make extension modules for CPython work in IronPython. It’s still in development, but parts of numpy, scipy and some other modules already work. You should try it out to see whether the parts of scipy you need are supported.

Related links:

Getting started with IronPython

I’ve just started experimenting with IronPython, Microsoft’s implementation of Python built on .NET. You can download IronPython from CodePlex either as an MSI installer or a zip file of binaries. I installed it from the zip file on one computer and from the MSI on another. I highly recommend the latter.

Installing from the zip file

The CodePlex download page has three files:

  • IronPython.msi
  • IronPython-2.0.1-Bin.zip
  • IronPython-2.0.1-Src.zip

My first thought was that I wasn’t interested in compiling IronPython from source, so I’d just download the bin file since it was smaller. I downloaded it, unzipped it, and copied it over to my C:bin directory. (I have a habit of installing languages in C:bin to placate software that assumes paths don’t contain spaces.  For example, if you install R in the default C:Program Files location, some add-ons will break.) The typical command line “hello world” program worked just fine. The example from the readme file on how to pop up a window using WinForms worked fine as well. But my attempt to use a standard library by typing import urllib didn’t work. The standard Python modules are not in the search path by default. The tutorial that comes with IronPython explains how to fix this.  I added the following two lines to C:binIronPython-2.0.1Libsite.py and then was able to use standard modules like urllib .

import sys
sys.path.append(r"C:binPython25Lib")

I had a non-ferrous version of Python installed already in C:binPython25 so I just reused those files. The tutorial explains where to get the standard library files if IronPython is the first Python you install.

Installing from the MSI file

On a different computer, I downloaded the MSI file and ran it. This was a much nicer experience. The installer has a check box to run NGen on the .NET code in IronPython. I checked this box assuming it would make IronPython run faster in the future.

The standard modules worked immediately with no configuration on my part.The installer created a sophisticated site.py file that builds the path on start-up. Presumably this site.py file will add new modules to my path as I install things in the future.