Numerical computing in IronPython with Ironclad

In a previous post, I discussed my difficulties calling some Python modules from IronPython. In particular, I wanted to call SciPy from IronPython and couldn’t. The discussion following that post brought up Ironclad as a possible solution. I wanted to learn more about Ironclad, so I invited William Reade to write a guest post about the project. I want to thank William for responding to my request with a very helpful article. - John


Hi! My name’s William Reade, and I’ve spent the last year or so working on Ironclad, an open-source project which helps IronPython to inter-operate better with CPython. Michael Foord recently introduced me to our host John, who kindly offered me the opportunity to write a bit about my work and, er, how well it works. So, here I am.

To give you a little bit of context, I’ve been working at Resolver Systems for several years now; our main product, Resolver One, is a spreadsheet with very tight IronPython integration. We like to describe it as a “Pythonic spreadsheet”, and that’s clearly a concept that people like. However, when people think of a “Pythonic spreadsheet”, they apparently expect it to work with popular Python libraries, such as NumPy and SciPy, and we found that IronPython’s incompatibility put us at a serious disadvantage. And, for some reason, nobody seemed very keen to solve the problem for us, so we had to do it ourselves.

The purpose of Ironclad is to allow you to use Python C extensions (of which there are many) from inside IronPython without recompiling anything. The secret purpose has always been to get NumPy working in Resolver One, and in release 1.4 we finally achieved this goal. Although the integration is still alpha level, you can import and use NumPy inside the spreadsheet grid and user code: you can see a screencast about the integration here.

However, while Resolver One is a great tool, you aren’t required to use it to get the benefits: Ironclad has been developed completely separately, has no external dependencies, and is available under an open source license. If you consider yourself adequately teased, keep reading for a discussion of what Ironclad actually does, what it enables you to do, and where it’s headed.

As you may know, the standard Python implementation (CPython) is written in C and IronPython is written in C#. While IronPython is an excellent implementation of Python, it works very differently under the hood, and it certainly doesn’t have anything resembling CPython’s API for writing C extensions. However, Ironclad can work around this problem by loading into the IronPython process a stub DLL which impersonates the real python25.dll, and hence allows us to intercept the CPython API calls. We can then ensure that the appropriate things happen in response to those calls… except that we use IronPython objects instead of CPython ones.

So long as we wrap IronPython objects for consumption by CPython, and vice versa, the two systems can coexist and inter-operate quite happily. Of course, the mix of deterministic and non-deterministic garbage collection makes it a little tricky [1] to ensure that unreferenced objects (and only unreferenced objects) die in a timely manner, and there are a number of other dark corners, but I’ve done enough work to confidently state that the problem is “just” complex and fiddly. While it’s not the sort of project that will ever be finished, it hopefully is the sort that can be useful without being perfect.
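
To make the deterministic half of that mix concrete, here is a minimal sketch meant for plain CPython (it is not Ironclad code, and the `Tracked` class and its `log` list are invented purely for this illustration). It only shows that CPython runs a finalizer at the moment the last reference disappears; under a non-deterministic collector such as .NET’s, the second snapshot would carry no such guarantee.

```python
class Tracked(object):
    """Hypothetical helper: records its name when it is finalized."""
    log = []

    def __init__(self, name):
        self.name = name

    def __del__(self):
        Tracked.log.append(self.name)


first = Tracked("a")
alias = first            # a second reference to the same object

del first                # one reference is still alive, so no finalizer yet
still_alive = list(Tracked.log)

del alias                # the last reference is gone
after_last_del = list(Tracked.log)

# On CPython (reference counting) the finalizer has already run here.
print(still_alive)
print(after_last_del)
```

A bridge between the two runtimes has to keep both views consistent: the CPython side expects an object to vanish as soon as its count hits zero, while the managed side may hold a wrapper for an unpredictable time.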

The upshot of my recent work is that you can now download Ironclad, type ‘import ironclad; import scipy’ in an IronPython console, and it will Just Work [2]. I am programmer, hear me roar!
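
Footnote [2] describes the path setup this depends on. If you would rather not touch IRONPYTHONPATH, roughly the same effect can be sketched in code before the imports. Everything specific here is an assumption: `add_existing_dirs` is a made-up helper, and `C:\Python25` is only a guess at where a CPython 2.5 install might live on your machine.

```python
import os
import sys


def add_existing_dirs(candidates, path=None):
    """Append each directory that exists (and is not already listed) to path.

    Hypothetical helper for illustration; path defaults to sys.path.
    Returns the list of directories that were actually added.
    """
    if path is None:
        path = sys.path
    added = []
    for directory in candidates:
        if os.path.isdir(directory) and directory not in path:
            path.append(directory)
            added.append(directory)
    return added


# Assumed install location; adjust for your own machine.
CPYTHON_ROOT = r"C:\Python25"

added = add_existing_dirs([
    os.path.join(CPYTHON_ROOT, "Dlls"),
    os.path.join(CPYTHON_ROOT, "lib", "site-packages"),
])
print(added)  # an empty list means neither directory was found
```

Skipping directories that don’t exist keeps a wrong guess about the install location from silently cluttering sys.path.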

Hundreds of tests now pass in both NumPy and SciPy, and I hope that some of you will be inspired to test it against your own requirements. For example, the Gaussian error function has been mentioned a few times on this blog (and, crucially, I have a vague idea of what it actually is), and I can demonstrate that scipy.special.erf works perfectly under Ironclad:

C:\dev\ironclad-head\build>ipy
IronPython 2.0 (2.0.0.0) on .NET 2.0.50727.3053
Type "help", "copyright", "credits" or "license" for more information.
>>> import ironclad
>>> from scipy.special import erf
Detected scipy import
  faking out numpy._import_tools.PackageLoader
Detected numpy import
  faking out modules: mmap, nosetester, parser
>>> erf(0)
0.0
>>> erf(0.1)
0.1124629160182849
>>> erf(1)
0.84270079294971478
>>> erf(10)
1.0

Numerical integration also seems to work pretty well, even for tricky cases (note that the quad function returns a tuple of (result, estimated absolute error)):

>>> from scipy.integrate import quad
>>> quad(erf, 0, 1)
(0.48606495811225592, 5.3964050795968879e-015)
>>> quad(erf, -1, 1)
(0.0, 1.0746071094349994e-014)
>>> from scipy import inf
>>> quad(erf, -inf, inf)
(0.0, 0.0)
>>> quad(erf, 0, inf) # ok, this one is probably more of a 'stupid' case
Warning: The integral is probably divergent, or slowly convergent.
(-1.564189583542768, 3.2898350710297564e-010)
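
These numbers can be sanity-checked without SciPy, because erf has a closed-form antiderivative: F(x) = x·erf(x) + exp(-x²)/√π, which you can confirm by differentiating. The sketch below assumes a modern CPython where `math.erf` is available (it is not in IronPython 2.0), and the function names are illustrative only.

```python
import math


def erf_antiderivative(x):
    # F(x) = x*erf(x) + exp(-x^2)/sqrt(pi), chosen so that F'(x) = erf(x)
    return x * math.erf(x) + math.exp(-x * x) / math.sqrt(math.pi)


def integral_of_erf(a, b):
    """Exact integral of erf over [a, b] via the antiderivative."""
    return erf_antiderivative(b) - erf_antiderivative(a)


exact_0_1 = integral_of_erf(0.0, 1.0)     # compare with quad(erf, 0, 1)
exact_m1_1 = integral_of_erf(-1.0, 1.0)   # erf is odd, so this should vanish
grows_with_b = integral_of_erf(0.0, 1e6)  # roughly b - 1/sqrt(pi) for large b

print(exact_0_1)
print(exact_m1_1)
print(grows_with_b)
```

The first value should agree with the quad result to about the error it reports. The last one shows why the warning is justified: erf tends to 1, so the integral from 0 to b keeps growing like b, and the number quad printed for the infinite range is not meaningful.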

And, while this exposes a little import-order wart, we can re-implement erf in terms of the normal CDF, and see that we get pretty similar results:

>>> from scipy import misc # shouldn't really be necessary - sorry :)
>>> from scipy.stats.distributions import norm
>>> import numpy as np
>>> def my_erf(x):
...   y = norm.cdf(x * np.sqrt(2))
...   return (2 * y) - 1
...
>>> my_erf(0.1)
0.11246291601828484
>>> my_erf(1)
0.84270079294971501
>>> quad(my_erf, 0, 1)
(0.48606495811225597, 5.3964050795968887e-015)
>>> quad(my_erf, -inf, inf)
(2.8756927650058737e-016, 6.1925307417506635e-016)
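
The re-implementation rests on the identity Φ(z) = ½(1 + erf(z/√2)) for the standard normal CDF Φ; substituting z = x√2 gives erf(x) = 2Φ(x√2) − 1. As an independent cross-check that needs neither SciPy nor Ironclad, the sketch below builds Φ by integrating the normal density with Simpson’s rule. It assumes a modern CPython with `math.erf`, and `normal_pdf` and `normal_cdf` are stand-ins written for this example, not SciPy’s implementation.

```python
import math


def normal_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def normal_cdf(z, steps=2000):
    """Phi(z) = 1/2 + integral of the pdf from 0 to z.

    Uses composite Simpson's rule; steps must be even.
    """
    h = z / steps
    total = normal_pdf(0.0) + normal_pdf(z)
    for i in range(1, steps):
        weight = 4.0 if i % 2 else 2.0   # odd nodes get 4, even interior nodes get 2
        total += weight * normal_pdf(i * h)
    return 0.5 + total * h / 3.0


def my_erf(x):
    y = normal_cdf(x * math.sqrt(2.0))
    return 2.0 * y - 1.0


points = [-2.0, -1.0, -0.1, 0.0, 0.1, 1.0, 2.0]
max_gap = max(abs(my_erf(x) - math.erf(x)) for x in points)
print(max_gap)
```

The largest gap should be tiny, limited by the Simpson step size rather than by the identity itself.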

I also know that it’s possible to run through the whole Tentative NumPy Tutorial [3] with identical output on CPython and IronPython [4], and the SciPy tutorial appears to work equally well in both environments [5]. In short, if you’re trying to do scientific computing with IronPython, Ironclad is now probably mature enough to let you get significant value out of SciPy/NumPy.

However, I can’t claim that everything is rosy: Ironclad has a number of flaws which may impact you.

  • It won’t currently work outside Windows, and it won’t work in 64-bit processes. However, NumPy itself doesn’t yet take advantage of 64-bit Windows. I’ll start work on this as soon as it’s practical; for now, it should be possible to run in 32-bit mode without problems.
  • Performance is generally poor compared to CPython. In many places it’s only a matter of a few errant microseconds (and we’ve seen NumPy integration deliver some great performance benefits for Resolver One), but in pathological cases it’s worse by many orders of magnitude. This is another area where I would really like to hear back from users with examples of what needs to be faster.
  • Unicode data doesn’t work, and I don’t plan to work on this problem because it’ll disappear when IronPython catches up to Python 3000. At that point both systems will have Unicode strings only, instead of the current situation where I would have to map one string type on one side to two string types on the other.
  • NumPy’s distutils and f2py subpackages don’t currently work at all, and nor do memory-mapped files.
  • Plenty of other CPython extensions work, to a greater or lesser extent, but lots won’t even import.

However, just about every problem with Ironclad is fixable, at least in theory: if you need it to do something that it can’t, please talk to me about it (or even send me a patch!).

Footnotes

[1] CPython uses reference counting to track objects’ states, and deletes them deterministically the moment they become unreferenced, while .NET uses a more advanced garbage collection strategy which unfortunately leads to non-deterministic finalization.

[2] Assuming you have the directories containing the ironclad, numpy and scipy packages already on your sys.path, at any rate. I personally just install everything for Python 2.5, and have added the CPython install’s ‘Dlls’ and ‘lib/site-packages’ subdirectories to my IRONPYTHONPATH.

[3] Apart from the matplotlib/pylab bit, but even that should be workable with a little extra setup if you don’t mind using a non-interactive back-end.

[4] Modulo PRNG output, of course.

[5] That is to say, not very well at all, but at least they go wrong in similar ways.
