Stand-alone scientific code

Sometimes you need one or two scientific functions not included in your programming environment. For a number of possible reasons, you do not want to depend on an external library. For example, maybe you don’t want to take the time to evaluate libraries. Or maybe you want to give someone else a small amount of self-contained code. Here is a collection of code for these situations.

Stand-alone code for numerical computing

This page contains C++, Python, and C# code for special functions and random number generation with no external dependencies. Do whatever you want with it, no strings attached. Use at your own risk. I recently added software for gamma and log gamma functions, as well as a few random number generators. (Why separate functions for the gamma function and its logarithm? See explanation here.)
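One common reason for wanting a separate log gamma function is overflow, though the linked explanation may give others. The sketch below uses Python's standard `math` module rather than the stand-alone code on this page, and `gamma_ratio` is a name made up for illustration. It assumes IEEE 754 doubles and positive arguments.

```python
import math

def gamma_ratio(x, y):
    """Gamma(x) / Gamma(y) for positive x and y, computed through logarithms.

    Hypothetical helper for illustration; it is not part of the code on this page.
    """
    return math.exp(math.lgamma(x) - math.lgamma(y))

# Gamma(200) = 199! is far larger than the biggest double, so math.gamma overflows.
try:
    math.gamma(200.0)
    overflowed = False
except OverflowError:
    overflowed = True

# The logarithm of the same quantity is a modest number and causes no trouble.
log_gamma_200 = math.lgamma(200.0)

# The ratio Gamma(200) / Gamma(198) = 199 * 198 is small even though
# neither the numerator nor the denominator fits in a double.
ratio = gamma_ratio(200.0, 198.0)

print(overflowed, log_gamma_200, ratio)
```

Working with logarithms and exponentiating only at the end keeps intermediate values in range, which is why ratios of gamma functions are often computed this way.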

I don’t recommend using this code as a way to avoid learning a good library. If you’re writing Python, for example, I’d recommend using SciPy. But there are times when the advantages of being self-contained outweigh the advantages of using high-quality libraries.

Related posts:

Mathematical functions that seem unnecessary
SciPyTip: Daily tips on using scientific computing in Python
C# math gotchas


Porting Python to C#

When people start programming in Python, they often mention having to type less: no braces, no semicolons, fewer type declarations, etc.

The difference may be more obvious when you go in the other direction, moving from Python to another language. This morning I ported some Python code to C# and was a little surprised at how much extra code I had to add. When I’ve ported C# to Python, I haven’t been as aware of the language differences. I guess it is easier to go down a notch in ceremony than to go up a notch.

Related post:

Plain Python


C# math gotchas

C# has three mathematical constants that look like constants in the C header file float.h. Two of these are not what you might expect.

The constant double.MaxValue in C# looks like the constant DBL_MAX in C, and indeed it is. Both give the maximum finite value of a double, which is on the order of 10^308. This might lead you to believe that double.MinValue in C# is the same as DBL_MIN in C or that double.Epsilon in C# is the same as DBL_EPSILON. If so, you’re in for a surprise.

The constants DBL_MAX and double.MaxValue are the same because there is no ambiguity over what “max” means: the largest finite value of a double. But DBL_MIN and double.MinValue are different because they minimize over different ranges. The constant DBL_MIN is the smallest positive value of a normalized double. The constant double.MinValue in C# is the smallest (i.e. most negative) value of a double and is the negative of double.MaxValue. The difference between DBL_MIN and double.MinValue is approximately the difference between 10^-308 and -10^308, between a very small positive number and a very large negative number.

C has a constant DBL_EPSILON for the gap between 1 and the next larger double, which is roughly the smallest positive double precision number x such that 1 + x does not equal 1 in machine precision. Typically a double has about 15 figures of precision, and so DBL_EPSILON is on the order of 10^-16. (For a more precise description, see Anatomy of a floating point number.)

You might expect that double.Epsilon in C# corresponds to DBL_EPSILON in C. I did, until a unit test failed on some numerical code I was porting from C++ to C#. But in C# double.Epsilon is the smallest positive value a (denormalized) double can take. It is similar to DBL_MIN, except that double.Epsilon is the smallest possible positive value of a double, not requiring normalization. The constant DBL_MIN is on the order of 10^-308 while double.Epsilon is on the order of 10^-324 because it allows denormalized values. (See Anatomy of a floating point number for details of denormalized numbers.)

Incidentally, the C constants DBL_MAX, DBL_MIN, and DBL_EPSILON equal the return values of max, min, and epsilon for the C++ class numeric_limits<double>.

To summarize,

  • double.MaxValue in C# equals DBL_MAX in C.
  • double.MinValue in C# equals -DBL_MAX in C.
  • double.Epsilon is similar to DBL_MIN in C, but orders of magnitude smaller.
  • C# has no analog of DBL_EPSILON from C.
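The summary can be cross-checked from Python, whose `sys.float_info` exposes the same three values as the C constants. This is only a sketch: it assumes IEEE 754 double precision, and the `csharp_*` names are made up here to stand for the values the C# constants take, since Python has no such constants.

```python
import math
import sys

dbl_max = sys.float_info.max          # same value as DBL_MAX and C# double.MaxValue
dbl_min = sys.float_info.min          # same value as DBL_MIN: smallest positive normalized double
dbl_epsilon = sys.float_info.epsilon  # same value as DBL_EPSILON: gap between 1.0 and the next double

# Values of the C# constants, rebuilt by hand (hypothetical names, IEEE 754 assumed).
csharp_min_value = -dbl_max               # double.MinValue, the most negative finite double
csharp_epsilon = math.ldexp(1.0, -1074)   # double.Epsilon, the smallest positive denormalized double

print("DBL_MAX        ", dbl_max)
print("DBL_MIN        ", dbl_min)
print("DBL_EPSILON    ", dbl_epsilon)
print("double.MinValue", csharp_min_value)
print("double.Epsilon ", csharp_epsilon)

# DBL_EPSILON is about precision near 1; double.Epsilon is about range near 0.
print(1.0 + dbl_epsilon > 1.0)     # adding DBL_EPSILON to 1 changes it
print(1.0 + csharp_epsilon == 1.0) # adding double.Epsilon to 1 does nothing
```

The last two lines show why swapping one "epsilon" for the other breaks numerical code: a convergence test written against double.Epsilon asks for far more precision than a double can deliver.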

One could argue that the C# names are better than the C names. It makes sense for double.MinValue to be the negative of double.MaxValue. But the use of Epsilon was a mistake. The term “epsilon” in numeric computing has long been established and is not limited to C. It would have been better if Microsoft had used the name MinPositiveValue to be more explicit and not conflict with established terminology.

Related posts:

C# random number generation
Math library functions that seem unnecessary
Floating point numbers are a leaky abstraction


C# random number generation code

This weekend Code Project posted an updated version of my article Simple Random Number Generation. The article comes with C# source code for generating random samples from the following distributions.

  • Cauchy
  • Chi square
  • Exponential
  • Inverse gamma
  • Laplace (double exponential)
  • Normal
  • Student t
  • Uniform
  • Weibull

After I submitted the revised article I realized I could have easily included a beta distribution generator. To generate a sample from a beta(a, b) distribution, generate a sample u from gamma(a, 1) and a sample v from gamma(b, 1) and return u/(u+v). (See why this works here.)
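The recipe above can be sketched in a few lines. This is Python rather than the C# of the article, and it leans on the standard library's `random.gammavariate` for the gamma samples instead of the article's own gamma generator; `beta_sample` is a name invented for this sketch.

```python
import random

def beta_sample(a, b, rng=random):
    """Draw one sample from beta(a, b) as u / (u + v) with u ~ gamma(a, 1), v ~ gamma(b, 1).

    Hypothetical helper for illustration. Requires a > 0 and b > 0. For extremely
    small a and b both gamma samples can underflow to zero, which this sketch
    does not guard against.
    """
    u = rng.gammavariate(a, 1.0)  # shape a, scale 1
    v = rng.gammavariate(b, 1.0)  # shape b, scale 1
    return u / (u + v)

# Quick sanity check: the mean of beta(a, b) is a / (a + b).
rng = random.Random(20100501)  # arbitrary fixed seed so the run is repeatable
samples = [beta_sample(2.0, 3.0, rng) for _ in range(20000)]
sample_mean = sum(samples) / len(samples)
print(sample_mean, "should be near", 2.0 / (2.0 + 3.0))
```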

This isn’t the most efficient beta generator possible, especially for some parameters. But it’s not grossly inefficient either. Also, it’s very simple, and the code in that article emphasizes simplicity over efficiency.

The code doesn’t use advanced C# features; it could easily be translated to other languages.

Related links:

How to test a random number generator
Pitfalls in random number generation
Random number generation in C++
Probability distribution relationship chart


Visual Studio 2010 is a pig

Visual Studio 2010 has not made a good first impression.

It took about a day to install. I was using the Visual Studio Ultimate Web Installer and much of the time was spent downloading bits. I’m sure it would have been faster had I started with a DVD. Also, I wasn’t giving the install my full attention. I was doing my regular work on one machine while installing VS 2010 on a remote machine. I would connect to the remote machine now and then to check on the progress. I don’t know exactly how long it took, but it was the majority of a day.

When I first started Visual Studio 2010, it took about half an hour to write my first “hello world” example. When I fired up VS 2010, I spent several minutes staring at a dialog that said “Microsoft Visual Studio is loading user settings. This may take a few minutes.” Seven minutes after launching Visual Studio, the application went away and my machine rebooted. I started Visual Studio again, started a C# console application, inserted a WriteLine statement, and compiled. Total elapsed time: 27 minutes.

I closed Visual Studio and did some more work. Later I came back and opened Visual Studio to write “hello world” again. Time from starting Visual Studio to compiling: 2 minutes 50 seconds.

Now I realize that start-up time isn’t everything. Most users will start Visual Studio and keep it up for hours or days. And that’s who Visual Studio is intended to serve. It’s not meant to be something you fire up for quick jobs.

Visual Studio 2010 is huge. The installation DVD is 2.3 GB. The source code for VS 2010 contains about 1,500,000 files and takes Microsoft 61 hours to build according to Phil Haack. (He said he didn’t know how many machines the build process uses.) Phil Haack also said that the release of VS 2010 was delayed because the feedback from testers was that the product was too slow. If the released product is faster, the betas must have been intolerably slow.

Update: I installed the Express version of VS 2010 on another computer and have been using it regularly. It is much faster, and pleasant to use. Maybe there’s something about the Ultimate edition (TFS integration?) that slows it down.

Related posts:

Moore’s law and software bloat
Better tools, less productivity?
You do pay for what you don’t use


Solver Foundation optimization library

Microsoft’s Solver Foundation is a numerical optimization library capable of solving problems involving millions of variables and millions of constraints. When I listened to Scott Hanselman interview Nathan Brixius from Microsoft’s Solver Foundation team, I expected Brixius to say that Solver Foundation was written in C++ at its core and had a thin C# veneer to make it callable from .NET applications. Instead, he said that Solver Foundation is entirely written in managed code.

Even in heavy-duty numerical code the bottlenecks may not be numerical. The inner loops of the software would execute faster if they were written in C++, but Solver Foundation solves optimization problems about as quickly as other packages written in lower-level languages.


Free C# book

Charles Petzold is a highly respected author in Windows programming circles. For years, his book was THE reference for Win32 API programming. I knew he had since written several books on .NET programming but I didn’t realize until I listened to an interview with Petzold that he has a .NET book that he gives away on his web site.

.NET Book Zero: What the C or C++ Programmer Needs to Know About C# and the .NET Framework


C# verbatim strings vs. PowerShell here-strings

C# verbatim strings and PowerShell here-strings have just enough in common to be confusing. The differences are summarized here.

C# verbatim strings                     | PowerShell here-strings
May contain line breaks                 | Must contain line breaks
Only double quote variety               | Single and double quote varieties
Begins with @"                          | Begins with @" (or @') plus a line break
Ends with "                             | Ends with a line break followed by "@ (or '@)
Cannot contain un-escaped double quotes | May contain quotes
Turns off C# escape sequences           | @' turns off PowerShell escape sequences but @" does not
