Posts tagged as:

C++

Two perspectives on the design of C++

by John on March 17, 2009

Here are two complementary (but not entirely complimentary!) blog posts about C++.

Roshan James has a scathing article about C++. When asked to recommend books on C++, he replied that he doesn’t recommend C++. He explains that the best C++ books may be Scott Meyers’ series Effective C++ but argues that they should be called “Defective C++.” He isn’t criticizing Scott Meyers, only the aspects of the C++ language that made it necessary for Scott Meyers to write such books. Effective C++ explains how to get around problems that don’t exist in more recent languages.

Bruce Eckel’s article The Positive Legacy of C++ and Java focuses more on what C++ did well. C++ was designed to be backward compatible with C. Bjarne Stroustrup, the original author of C++, realized that the decision to be compatible with C would cause major difficulties, but he also thought (correctly) that without such compatibility no one would move over to the new language. Given this severe constraint, C++ has been remarkably well designed.

Update: Check out the C++ FQA site Alessandro Gentilini mentions in the comments. “FQA” is not a typo. It stands for Frequently Questioned Answers.

{ 8 comments }

Patrick Getzmann and I have been exchanging email about a problem he had using some sample code I’d written for working with regular expressions in C++. I wasn’t much help, but Patrick figured it out. I wanted to post his solution here in case someone else has the same problem.

His code would compile but not link. The linker gave the error message “regex.obj : error LNK2019 …” His solution follows.

I have German Visual Studio installed. The Feature Pack is bundled with SP1 in the German Version. My Installation order was:

1. Visual Studio 2008
2. Visual Studio 2008 SP1 (including the Feature Pack)
3. Windows SDK for Windows Server 2008 and .NET Framework 3.5

But it should be (If one needs/wants the Server 2008 SDK):

1. Visual Studio 2008
2. Windows SDK for Windows Server 2008 and .NET Framework 3.5
3. Visual Studio 2008 SP1 (including the Feature Pack)

Otherwise reinstallation of SP1 helps.

The same problem would show up if you were using C++ TR1 random number generators. In a nutshell, try reinstalling SP1.

{ 2 comments }

Here’s something I do all the time. I have a function of one variable and several parameters. I implement it as a function object in C++ so I can pass it on to code that does something with functions of one variable, such as integration or optimization. I’ll give a trivial example and then show the most recent real problem I’ve worked on.

Say I have a function f(x; a, b, c) = 1000a + 100b + 10c + x. In a sense this is simply a function of four variables. But the connotation of using a semicolon rather than a comma after the x is that I think of x as being a variable and I think of a, b, and c as parameters. So f is a function of one variable that depends on three constants. (A “parameter” is a “constant” that can change!)

I create a C++ function object with two methods. One is a constructor that takes the function parameters as arguments and saves them to member variables. The other is an overload of the function call operator, operator(). That’s what makes the class a function object: by overloading operator(), I can call an instance of the class as if it were a function. Here’s some code.

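class FunctionObject
{
public:
	// Save the parameters a, b, c; they stay fixed for the life of the object.
	FunctionObject(double a, double b, double c)
		: m_a(a), m_b(b), m_c(c)
	{
	}

	// Evaluate f(x; a, b, c). Overloading operator() is what lets an
	// instance be called like a function.
	double operator()(double x) const
	{
		return 1000*m_a + 100*m_b + 10*m_c + x;
	}

private:
	double m_a;
	double m_b;
	double m_c;
};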

So maybe I instantiate an instance of this function object and pass it to a function that finds the maximum value over an interval [a, b]. The code might look like this.

FunctionObject f(3, 1, 4);
double maximum = Maximize(f, a, b);

Here’s a more realistic example. A few days ago I needed to solve this problem. Given user input parameters λ, σ, n, and ξ, find b such that the following holds.

\int_0^1 \frac{1}{\sqrt{2}\nu} \Phi\left(\frac{\lambda \sqrt{2\nu n}}{\sqrt{\sigma^2(1 - 2\nu) + bn}}\right) \, d\nu = \xi

The function Φ above is the CDF of a standard normal random variable, defined here.

To solve this problem, I wrote a function object to evaluate the left side of the equation above. It takes λ, σ, and n as constructor arguments and takes b as an argument to operator(). Then I passed the function object to a root-finding method to solve for the value of b that makes the function value equal ξ. But my function is defined in terms of an integral, so I needed to write another function object first that returns the integrand. Then I pass that function object to this numerical integration routine. So I had to write two function objects to solve this problem.
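The two-level structure can be sketched as follows. This is not the code from the post: the integrand is a toy stand-in, exp(-bν), chosen because its integral over [0, 1] is known in closed form, and the Simpson's rule integrator and bisection root finder are simple substitutes for the routines the post links to. All names here are hypothetical.

```cpp
#include <cmath>

// Inner function object: a toy integrand g(v; b) = exp(-b v).
// This is a stand-in, NOT the integrand from the equation above.
class Integrand
{
public:
    explicit Integrand(double b) : m_b(b) {}
    double operator()(double v) const { return std::exp(-m_b * v); }
private:
    double m_b;
};

// Composite Simpson's rule on [a, b]; panels must be even.
template <typename F>
double Integrate(const F& f, double a, double b, int panels = 200)
{
    double h = (b - a) / panels;
    double sum = f(a) + f(b);
    for (int i = 1; i < panels; ++i)
        sum += f(a + i * h) * (i % 2 == 1 ? 4.0 : 2.0);
    return sum * h / 3.0;
}

// Outer function object: maps b to (integral of g(.; b) over [0, 1]) - target.
class IntegralMinusTarget
{
public:
    explicit IntegralMinusTarget(double target) : m_target(target) {}
    double operator()(double b) const
    {
        return Integrate(Integrand(b), 0.0, 1.0) - m_target;
    }
private:
    double m_target;
};

// Bisection: assumes f(lo) and f(hi) have opposite signs.
template <typename F>
double FindRoot(const F& f, double lo, double hi, double tolerance = 1e-10)
{
    double flo = f(lo);
    while (hi - lo > tolerance)
    {
        double mid = 0.5 * (lo + hi);
        double fmid = f(mid);
        if ((flo < 0) == (fmid < 0)) { lo = mid; flo = fmid; }
        else { hi = mid; }
    }
    return 0.5 * (lo + hi);
}
```

The outer object builds a fresh inner object for each trial value of b, so the root finder only ever sees a function of one variable.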

There are several advantages to function objects over functions. For example, I would typically do parameter validation in the constructor. Quite often I also do some expensive calculations in the constructor and cache the results so that each call to operator() is then more efficient. Maybe I want to keep track of how often the function is called, so I put in some sort of odometer method that increments a counter with each call.
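Here is a small sketch of those three ideas in one class. The example function (a normal density) and every name in it are my own choices for illustration, not code from a real project.

```cpp
#include <cmath>
#include <stdexcept>

// Hypothetical example: the density of a normal distribution with
// mean mu and standard deviation sigma, as a function of x.
class NormalDensity
{
public:
    NormalDensity(double mu, double sigma)
        : m_mu(mu), m_sigma(sigma), m_calls(0)
    {
        // Parameter validation happens once, at construction.
        if (!(sigma > 0.0))
            throw std::invalid_argument("sigma must be positive");

        // Cache the normalizing constant 1/(sigma*sqrt(2*pi)).
        m_norm = 1.0 / (sigma * std::sqrt(2.0 * 3.14159265358979323846));
    }

    double operator()(double x) const
    {
        ++m_calls; // odometer
        double z = (x - m_mu) / m_sigma;
        return m_norm * std::exp(-0.5 * z * z);
    }

    unsigned long CallCount() const { return m_calls; }

private:
    double m_mu;
    double m_sigma;
    double m_norm;
    mutable unsigned long m_calls; // mutable so operator() can stay const
};
```

Declaring the counter mutable is one design choice among several; it keeps operator() const so the object can still be passed by const reference to integration or optimization code.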

Unfortunately there’s a fair amount of code to write in order to implement even the simplest function. This effort hardly matters in production code; so many other things take more time. But it is annoying when doing some quick exploration. The next post shows how this can be done much more easily in Python. The Python approach is much easier for small problems, but it doesn’t have the advantages mentioned above, such as caching expensive calculations in a constructor.

{ 8 comments }

Computing the inverse of the normal CDF

by John on September 25, 2008

Someone asked me this week for C++ code to compute the inverse of the normal (Gaussian) distribution function. The code I usually use isn’t convenient to give away because it’s part of a large library, so I wrote a stand-alone function using an approximation out of Abramowitz and Stegun (A&S). There are a couple things A&S takes for granted, so I decided to write up the code in the spirit of a literate program to explain the details. The code is compact and portable. It isn’t as fast as possible nor as accurate as possible, but it’s good enough for many purposes.

A literate program to compute the inverse of the normal CDF
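For readers who want the flavor without following the link, here is a minimal sketch. It assumes the approximation in question is A&S formula 26.2.23, a rational approximation whose stated absolute error is below 4.5e-4. The function names are hypothetical, and the linked write-up may organize things differently.

```cpp
#include <cmath>
#include <stdexcept>

// Rational approximation from A&S formula 26.2.23 (assumed here).
// Takes t = sqrt(-2 ln q) for a tail probability 0 < q <= 0.5.
double RationalApproximation(double t)
{
    const double c0 = 2.515517, c1 = 0.802853, c2 = 0.010328;
    const double d1 = 1.432788, d2 = 0.189269, d3 = 0.001308;
    return t - ((c2 * t + c1) * t + c0) /
               (((d3 * t + d2) * t + d1) * t + 1.0);
}

// Inverse of the standard normal CDF for 0 < p < 1.
double NormalCDFInverse(double p)
{
    if (!(p > 0.0 && p < 1.0))
        throw std::invalid_argument("p must be strictly between 0 and 1");

    // The formula covers one tail; symmetry handles the other.
    if (p < 0.5)
        return -RationalApproximation(std::sqrt(-2.0 * std::log(p)));
    return RationalApproximation(std::sqrt(-2.0 * std::log(1.0 - p)));
}
```

Two of the things A&S leaves to the reader are visible here: the formula is stated for a tail probability of at most 0.5, and the symmetry of the normal distribution is what extends it to the whole interval (0, 1).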

{ 0 comments }

Five tips for floating point programming

by John on September 24, 2008

I have a new article on CodeProject:

Five tips for floating point programming

The article discusses how to avoid some of the traps that people often fall into when working with floating point numbers.

{ 3 comments }

Free C# book

by John on September 23, 2008

Charles Petzold is a highly respected author in Windows programming circles. For years, his book was THE reference for Win32 API programming. I knew he had since written several books on .NET programming but I didn’t realize until I listened to an interview with Petzold that he has a .NET book that he gives away on his web site.

.NET Book Zero: What the C or C++ Programmer Needs to Know About C# and the .NET Framework

{ 1 comment }

How to compute standard deviation accurately

by John on September 23, 2008

The most convenient way to compute sample variance by hand may not work in a program. Sample variance is given by

\sigma^2 = \frac{1}{ n(n-1)}\left(n \sum_{i=1}^n x_i^2 -\left(\sum_{i=1}^n x_i\right)^2\right)

If you compute the two summations and then carry out the subtraction above, you might be OK. Or you might have a large loss of precision. You might get a negative result even though in theory the quantity above cannot be negative. If you want the standard deviation rather than the variance, you may be in for an unpleasant surprise when you try to take your square root.

There is a simple but non-obvious way to compute sample variance that has excellent numerical properties. The algorithm was first published back in 1962 but is not as well known as it should be. Here are some notes explaining the algorithm and some C++ code for implementing the algorithm.

Accurately computing running variance

The algorithm has the added advantage that it keeps a running account of the mean and variance as data are entered sequentially.
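Here is a minimal sketch of a running-variance class. It assumes the 1962 algorithm referred to is the update usually attributed to Welford; the class and method names are hypothetical and the linked notes may differ in detail.

```cpp
#include <cmath>

class RunningStat
{
public:
    RunningStat() : m_n(0), m_mean(0.0), m_s(0.0) {}

    // Fold one more observation into the running mean and sum of squares.
    void Push(double x)
    {
        ++m_n;
        double delta = x - m_mean;
        m_mean += delta / m_n;
        m_s += delta * (x - m_mean); // second factor uses the updated mean
    }

    long Count() const { return m_n; }
    double Mean() const { return m_mean; }

    // Sample variance; defined as 0 when fewer than two values were pushed.
    double Variance() const { return m_n > 1 ? m_s / (m_n - 1) : 0.0; }
    double StandardDeviation() const { return std::sqrt(Variance()); }

private:
    long m_n;
    double m_mean;
    double m_s; // running sum of squared deviations from the mean
};
```

Because m_s only ever accumulates products of deviations with matching signs, it cannot go negative the way the subtraction in the textbook formula can, so the square root is safe.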

Update: Related posts

Comparing three methods of computing standard deviation
Theoretical explanation of numerical results
Comparing two ways to fit a line to data
How to calculate correlation accurately

{ 3 comments }

NaN, 1.#IND, 1.#INF, and all that

by John on August 28, 2008

If you’ve ever been surprised by a program that printed some cryptic letter combination when you were expecting a number, you’ve run into an arithmetic exception. This article explains what caused your problem and what you may be able to do to fix it.

IEEE floating-point exceptions

Here’s a teaser. If x is of type float or double, does the expression (x == x) always evaluate to true? Are you certain?
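For anyone who wants to try the teaser, here is a tiny sketch. It assumes IEEE 754 arithmetic and a compiler that has not been told to relax floating point rules (for example with a fast-math option); the function names are hypothetical.

```cpp
#include <limits>

// Returns true exactly when x is a NaN, relying on the IEEE 754 rule
// that a NaN compares unequal to everything, including itself.
bool IsNaNBySelfComparison(double x)
{
    return x != x;
}

// Produces a quiet NaN without performing an invalid operation.
double MakeQuietNaN()
{
    return std::numeric_limits<double>::quiet_NaN();
}
```

Under those assumptions, a variable holding MakeQuietNaN() makes (x == x) evaluate to false, while every ordinary number and both infinities make it true.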

{ 0 comments }

This morning I listened to a podcast interview with Kate Gregory. She used some terms I hadn’t heard in years: BSTR, OLE strings, etc.

Around a decade ago I was working with COM in C++ and had to deal with the menagerie of string types Kate Gregory mentioned. I wrote an article to get all the various types straight in my head: all the different memory allocation rules, conventions for use, conversions between types, etc. I never published the article. When I started my personal web site I thought about posting the article there, but then I thought that by now nobody cared about such things. But the interview I listened to this morning made me think more people might be interested than I’d thought. So I posted my article Unravelling Strings in Visual C++ in case someone finds it useful.

{ 0 comments }

Random number generation in C++ TR1

by John on July 17, 2008

The C++ Standard Library Technical Report 1 (TR1) includes a specification for random number generation classes.

The Boost library has supported TR1 for a while. Microsoft released a feature pack for Visual Studio 2008 in April that includes support for most of TR1. (They left out support for mathematical special functions.) Dinkumware sells a complete TR1 implementation. And gcc included support for TR1 in version 4.3 released in May. (According to the gcc status page the latest version supports most of TR1 except regular expressions. I’ve been able to get some TR1 features to work using gcc 4.3.1 but have not been able to get random number generation to work yet.)

I’ve posted a set of notes that explain how to use the C++ TR1 random number generation classes in Visual Studio 2008. The notes include sample code and point out a few gotchas. They also explain how to use the C++ TR1 classes to generate from distributions not directly supported by the TR1.
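As a rough illustration of the engine-plus-distribution design, here is a sketch written against the C++11 <random> header, which later standardized these facilities. With a TR1 implementation the classes live in namespace std::tr1, and some names and calling conventions differ, so treat this as an assumption-laden sketch rather than code for the Visual Studio 2008 feature pack. NormalSamples is a hypothetical helper.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Draws n samples from a normal(mean, sd) distribution using a
// Mersenne Twister engine seeded with the given value.
std::vector<double> NormalSamples(unsigned int seed, std::size_t n,
                                  double mean, double sd)
{
    std::mt19937 engine(seed);                         // source of random bits
    std::normal_distribution<double> normal(mean, sd); // shapes bits into samples
    std::vector<double> samples;
    samples.reserve(n);
    for (std::size_t i = 0; i < n; ++i)
        samples.push_back(normal(engine));
    return samples;
}
```

The separation matters: the engine can be swapped without touching the distribution, and the same seed reproduces the same stream on a given implementation.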

{ 0 comments }

One of the complaints about C++ templates is that they can cause code bloat. But Scott Meyers pointed out in an interview that some people are using templates in embedded systems applications because templates result in smaller code.

C++ compilers only generate code for template methods that are actually used in an application, so it’s possible that code using templates may result in a smaller executable than code that uses a more traditional object-oriented approach.
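A small sketch makes the rule concrete. The Holder class is hypothetical; it has one member function that could not even compile for int, yet the program builds because that member is never used with int.

```cpp
#include <cstddef>
#include <string>

template <typename T>
class Holder
{
public:
    explicit Holder(const T& value) : m_value(value) {}

    // Works for any T that supports operator+.
    T Doubled() const { return m_value + m_value; }

    // Only compiles for types with a size() member, such as std::string.
    std::size_t Length() const { return m_value.size(); }

private:
    T m_value;
};

// Holder<int>::Length() would not compile, but since it is never called
// it is never instantiated, and this function builds without error.
int DoubledInt(int x)
{
    Holder<int> h(x);
    return h.Doubled();
}
```

The same mechanism that lets this compile is what keeps unused template members out of the executable.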

{ 0 comments }

Programming language subsets

by John on June 11, 2008

I just found out that Douglas Crockford has written a book JavaScript: The Good Parts. I haven’t read the book, but I imagine it’s quite good based on having seen the author’s JavaScript videos.

Crockford says JavaScript is an elegant and powerful language at its core, but it suffers from numerous egregious flaws that have been impossible to correct due to its rapid adoption and standardization.

I like the idea of carving out a subset of a language, the good parts, but several difficulties come to mind.

  1. Although you may limit yourself to a certain language subset, your colleagues may choose a different subset. This is particularly a problem with an enormous language such as Perl. Coworkers may carve out nearly disjoint subsets for their own use.
  2. Features outside your intended subset may be just a typo away. You have to have at least some familiarity with the whole language because every feature is a feature you might accidentally use.
  3. The parts of the language you don’t want to use still take up space in your reference material and make it harder to find what you’re looking for.

One of the design principles of C++ is “you only pay for what you use.” I believe the primary intention was that you shouldn’t pay a performance penalty for language features you don’t use, and C++ delivers on that promise. But there’s a mental price to pay for language features you don’t use. As I’d commented about Perl before, you have to use the language several hours a week just to keep it loaded in your memory.

There’s an old saying that when you marry a girl you marry her family. A similar principle applies to programming languages. You may have a subset you love, but you’re going to have to live with the rest of the language.

{ 0 comments }

Porting Visual C++ code to Linux/gcc

by John on May 29, 2008

Here are a few lessons learned from porting a numerical library recently from Windows/Visual C++ to Linux/gcc.

Some of our code only runs on Windows, and only needs to run on Windows. Our first thought was to put #ifdef WIN32 directives around that code. Clift Norris came up with the clever idea of using #ifndef EXCLUDE_WINDOWS_ONLY_CODE instead. That way we could do a preliminary test of the portable subset of the code while still working on Windows where we’re more comfortable. Also, by not referring specifically to 32-bit Windows, we’re OK moving the code to 64-bit Windows.
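The guard might look something like this sketch. The function names are hypothetical, and the body of the Windows-only function is a placeholder for code that would really call the Windows API.

```cpp
#include <string>

#ifndef EXCLUDE_WINDOWS_ONLY_CODE
// Stand-in for code that depends on the Windows API. Hypothetical: a real
// version would include <windows.h> and call into it here.
std::string WindowsOnlyFeature()
{
    return "windows-only feature available";
}
#endif

// Portable code: compiled on every platform.
std::string DescribeBuild()
{
#ifndef EXCLUDE_WINDOWS_ONLY_CODE
    return WindowsOnlyFeature();
#else
    return "portable subset only";
#endif
}
```

Defining EXCLUDE_WINDOWS_ONLY_CODE on the compiler command line, on any platform, then builds only the portable subset.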

Visual C++ does not require source and header files to end with newline characters, but gcc does. We got hundreds of warnings of the form warning: no newline at end of file when we first attempted to compile our code on Linux. Apparently there’s no gcc switch to turn this off, and it may not be prudent to turn it off if you could. As I understand it, Visual Studio inserts a linebreak after including header files, but gcc may not and so gcc needs to issue a warning in this case while Visual Studio does not. We copy our source tree to a Linux box then run the following Python code on that box to insert the extra newline characters when needed.

import os

# Directories whose .h and .cpp files should end with a newline.
# The script must be run from the directory that contains them.
directories = ["Banana", "Apple", "Peach"]

for directory in directories:
    for filename in os.listdir(directory):
        if filename.endswith((".h", ".cpp")):
            path = os.path.join(directory, filename)
            with open(path, "r") as handle:
                slurp = handle.read()

            if not slurp.endswith("\n"):
                # Files may be read-only, so make them writable first.
                retcode = os.system("chmod +w " + path)
                if retcode != 0:
                    # retcode is an int, so convert it before concatenating.
                    print("chmod returned " + str(retcode) + " on " + path)
                else:
                    with open(path, "a") as handle:
                        handle.write("\n")

There were several places in our code where a variable was deliberately unused but retained in a function signature. Suppose a function has signature void foo(int a, int b) but b is unused. We had usually handled that by making b; the first line of the implementation. That would suppress unused variable warnings in Visual C++, but not in gcc. When we changed the function signature to void foo(int a, int /* b */), that made both compilers happy.
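In code, the pattern looks like this. ScaleFirst is a hypothetical function that must accept two ints to match some required signature even though it only needs the first.

```cpp
// The second parameter is kept in the signature but left unnamed,
// so neither compiler has a variable to warn about.
int ScaleFirst(int a, int /* b */)
{
    return 2 * a;
}
```

Keeping the old name in a comment preserves the documentation value of the parameter name.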

We started out using autoconf, but that was overkill for our project. Our build process became two orders of magnitude simpler when we switched over to a crude, old-fashioned make file. After porting the library to Linux, we built it on OS X without any issue.

This wasn’t an issue for us, but a potential problem when moving numerical code between Unix-like systems is that the function gamma computes different things on different systems. On Linux it computes the logarithm of the mathematical gamma function but on OS X it computes the gamma function itself. See the last two paragraphs of how to calculate binomial probabilities and Thomas Guest’s comment on that post for a full explanation.
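One way to sidestep the ambiguity, assuming your compiler provides the C99 or C++11 math library, is to call tgamma and lgamma, whose meanings are fixed by those standards. The wrapper names below are hypothetical.

```cpp
#include <cmath>

// The gamma function itself, Gamma(x).
double GammaFunction(double x)
{
    return std::tgamma(x);
}

// The natural logarithm of |Gamma(x)|.
double LogGammaFunction(double x)
{
    return std::lgamma(x);
}
```

For example, GammaFunction(5.0) is 4! = 24, and LogGammaFunction(5.0) is log(24), regardless of what the platform's gamma does.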

This library had been ported to Linux years ago, but nobody used it on Linux and so development continued only on Windows. When we first ported the code, gcc and Visual C++ seemed to have incompatible requirements, especially with templates. The more recent port described here was much easier now that both compilers are more compliant with the C++ standard.

{ 3 comments }

Regular expressions in C++ TR1

by John on May 7, 2008

Regular expressions are not a part of the C++ Standard Library quite yet, but there is a document (Technical Report 1, or TR1) that includes among other things a specification for regular expression support that will probably be added to the C++ standard eventually.

The Boost library has supported TR1 for a while. Microsoft just released a feature pack for Visual Studio 2008 a month ago that includes support for most of TR1. (They’ve left out support for mathematical special functions.) And Dinkumware sells a complete TR1 implementation.

I’ve added some notes to my web site for getting started with C++ TR1 regular expressions. I took my PowerShell regex notes as a starting point and implemented some of the same examples in C++. I changed the organization though, because the C++ implementation is fairly different from PowerShell.

Working with regular expressions is harder in C++ than in scripting languages such as Perl or Python, but not unnecessarily so. C++ is optimized for fine-grained control and efficiency rather than ease of use; that’s what C++ is for. The TR1 implementation is internally consistent and elegant in its own way.

It’s easy to find API-level documentation but harder to find examples for getting started. (I’ve heard good things about Pete Becker’s book The C++ Standard Library Extensions but I haven’t read it.) So I decided to keep some notes as I played with the Visual Studio implementation. I imagine most of the content applies to other implementations, but I’ve only tested the examples using Visual Studio.
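To give a sense of what getting started looks like, here is a minimal sketch of a search with a capture group. It is written against the C++11 <regex> header, which later standardized this design; with a TR1 implementation the same classes live in namespace std::tr1. The function name and the date pattern are my own illustration, not examples from the notes.

```cpp
#include <regex>
#include <string>

// Returns the four-digit year from the first date of the form
// YYYY-MM-DD found in text, or an empty string if there is none.
std::string FirstYear(const std::string& text)
{
    static const std::regex datePattern("(\\d{4})-(\\d{2})-(\\d{2})");
    std::smatch match;
    if (std::regex_search(text, match, datePattern))
        return match[1].str(); // match[0] is the whole match, match[1] the year
    return std::string();
}
```

Note the doubled backslashes: the pattern passes through the C++ string literal rules before the regex engine sees it, which is one of the ways this is more work than in a scripting language.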

Update: GCC just added support for C++ TR1 two days ago with their version 4.3 release. However, it appears support for regular expressions is not included.

{ 0 comments }

Random number generator controversy

by John on April 12, 2008

I submitted an article to Code Project yesterday, Simple Random Number Generation, describing a small C# class called SimpleRNG that uses George Marsaglia’s MWC (multiply-with-carry) algorithm. The article was posted around 5 PM (central US time) and comments started pouring in right away. I didn’t expect any feedback on a Friday afternoon or Saturday morning. But as I write this post, there have been 580 page views and 11 comments.

There have been three basic questions raised in the comments.

  1. Why not just use the random number generator that comes with .NET?
  2. Is this code suitable for cryptography?
  3. Is this code suitable for Monte Carlo applications?

Why not use the built-in generator? For many applications, the simplest thing would be to use the .NET random number generator. But there are instances where this might not be best. There are questions about the statistical quality of the .NET generator; I’ll get to that in a minute. The primary advantages I see to the SimpleRNG class are transparency and portability.

By transparency I mean that the internal state of the generator is simple and easy to access. When you’re trying to reproduce a result, say while debugging, it’s convenient to have full access to the internal state of the random generator. If you’re using your own generator, you can see everything. You can even temporarily change it: for debugging, it may be convenient to temporarily have the “random” generator return a very regular, predictable sequence.

By portability I do not necessarily mean moving the code between operating systems. The primary application I have in mind is moving the algorithm between languages. For example, in my work we often have prototype code written in R that needs to be rewritten in C++ for efficiency. If the code involves random number generation, the output of the prototype and the rewrite cannot be directly compared, only compared on average. Then you have to judge whether the differences are to be expected or whether they indicate a bug. But if both the R and the C++ code use the same RNG algorithm and the same seed, the results may be directly comparable. (They still may not be directly comparable due to other factors, but at least this way the results are often comparable.)
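To show how little state is involved, here is a C++ sketch in the spirit of Marsaglia’s multiply-with-carry generator. The multipliers 36969 and 18000 are the ones commonly quoted from Marsaglia’s description; that SimpleRNG uses exactly these constants, and the class and method names below, are assumptions on my part.

```cpp
#include <cstdint>

class SimpleMwc
{
public:
    // The entire internal state is two 32-bit words. Seeds should be nonzero.
    SimpleMwc(std::uint32_t w, std::uint32_t z) : m_w(w), m_z(z) {}

    // Advance both multiply-with-carry sequences and combine them.
    std::uint32_t NextUint()
    {
        m_z = 36969 * (m_z & 65535) + (m_z >> 16);
        m_w = 18000 * (m_w & 65535) + (m_w >> 16);
        return (m_z << 16) + m_w;
    }

    // Uniform value strictly inside the open interval (0, 1).
    double NextUniform()
    {
        return (NextUint() + 1.0) / 4294967298.0; // divisor is 2^32 + 2
    }

private:
    std::uint32_t m_w;
    std::uint32_t m_z;
};
```

Because the update uses only 32-bit unsigned arithmetic, the same few lines can be transcribed into another language and, given the same two seeds, should produce the same stream.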

As for cryptography, no, SimpleRNG is not appropriate for cryptography.

As for Monte Carlo applications, not all Monte Carlo applications are created equal. Some applications do not require high quality random number generators. Or more accurately, different applications require different kinds of quality. Some random number generators break down when used for high-dimensional integration. I suspect SimpleRNG is appropriate for moderate dimensions. I use the Mersenne Twister generator for numerical integration. However, SimpleRNG is faster and much simpler; the MT generator has a very large internal state.

Someone commented on the CodeProject article that the random number generator in .NET is not appropriate for Monte Carlo simulation because it does not pass Marsaglia’s DIEHARD tests while SimpleRNG does. I don’t know what algorithm the .NET generator uses, so I can’t comment on its quality. Before I’d use it in statistical applications, I’d want to find out.

{ 0 comments }