Posts tagged as:

C++

What most C++ programmers do

by John on August 31, 2011

“Nobody knows what most C++ programmers do.” — Bjarne Stroustrup

The quote above came up in a discussion of C++ by Scott Meyers, Andrei Alexandrescu, and Herb Sutter. They argue that C++ is used in so many diverse applications that if someone starts a sentence with “Most C++ programmers …” he probably doesn’t know what he’s talking about.

Related post:

The dark matter of programmers

{ 12 comments }

A couple Python-like features in C++11

by John on August 17, 2011

The new C++ standard includes a couple Python-like features that I ran across recently. There are other Python-like features in the new standard, but here I’ll discuss range-based for-loops and raw strings.

[click to continue...]

{ 9 comments }

A couple years ago I wrote a blog post Has C++ jumped the shark? I wondered how many people would care about the new C++ standard by the time it came out. I doubted that it would matter much to me personally.

… if something is hard to do in C++, I just don’t use C++. I don’t see a new version of C++ changing my decisions about what language to use for various tasks.

The new standard is out now, and I find it more interesting that I thought I would have. I’d still rather not use C++ for tasks that are easier in another language I know, but sometimes I don’t have that option.

When I heard about plans to add lambdas and closures to C++ I commented here that “it seems odd to add these features to C++.”  Now I think lambdas (anonymous functions) and closures may be the most important new features for my use.

One of my favorite features in Python is the ability to define functions on the fly. If I’m writing a function and need to create a new function to pass as an argument, then I can define that function on the spot. By contrast, C++ has not let you define a function inside a function. Now you can.

(You still cannot define a named function inside a C++ function. However, you can create anonymous functions inside another function. If you save that anonymous function to a variable, then you’ve effectively created a named function.)

Here are a couple examples of how it can be very convenient to define little functions on the fly. First, optimization software typically provides methods for minimizing functions. If you want to maximize a f(x), you define a new function g(x) = -f(x) and minimize g. Second, root-finding software typically solves equations of the form f(x) = 0. If you want to solve f(x) = c, you create a new function g(x) = f(x) – c and find where g is zero. In both cases, the function g is so trivial that there’s no need to give it a name. And more importantly, you’d rather define this auxiliary function right where you need it than interrupt your context by defining it outside.

Anonymous functions can also have associated data obtained from the context where they’re defined. That’s called a closure because it encloses some context with the function. This is very often necessary in mathematical software development. For example, say you have a function of two variables f(x, y) and you want to integrate f with respect to x, holding y constant. You might then think of f as a function of one variable with one constant, but the compiler sees it as a function of two variables and will not let you pass it to an integration routine expecting a function of one variable. One way to solve this problem is to use function objects. This isn’t difficult, but it requires a lot of ceremony compared to using closures.

For specifics of how to use lambdas and closures in C++, see Ajay Vijayvargiya’s article Explicating the new C++ standard (C++0x), and its implementation in VC10.

Related posts:

Why do C++ folks make things so complicated?
Random number generation in C++ TR1
Regular expressions in C++ TR1

{ 21 comments }

Calling C++ from R

by John on June 30, 2011

This post relates my experience with calling C++ from R by writing an R module from scratch and by the inline module.

[click to continue...]

{ 4 comments }

This morning Miroslav Bajtoš asked “Why do C++ folks make things so complicated?” in response to my article on regular expressions in C++. Other people asked similar questions yesterday.

My response has two parts:

  1. Why I believe C++ libraries are often complicated.
  2. Why I don’t think it has to be that way.

Why would someone be using C++ in the first place? Most likely because they need performance or fine-grained control that they cannot get somewhere else. A Ruby programmer, for example, can make a design decision that makes code 10% slower but much easier to use. “Hey, if you want the best performance possible, why are you using Ruby? Didn’t you come here because you wanted convenience?” But the C++ programmer can’t say that. It’s not turtles all the way down. Often C++ is the last human-generated language in a technology stack before you hit metal.

This weekend The Register quoted Herb Sutter saying “The world is built on C++.” The article goes on to mention some of the foundational software written in C++.

Apple’s Mac OS X, Adobe Illustrator, Facebook, Google’s Chrome browser, the Apache MapReduce clustered data-processing architecture, Microsoft Windows 7 and Internet Explorer, Firefox, and MySQL — to name just a handful — are written in part or in their entirety with C++.

Certainly there is a lot of software implemented in higher-level languages, but those high-level languages are almost always implemented in C or C++. When there’s no lower-level language to appeal to, you have to offer a lot of options, even if 90% of users won’t need those options.

On the other hand, that doesn’t mean all C++ libraries have to be complicated. The argument above says that the lowest layers have to be complicated and they’re written in C++.  But why couldn’t the next layer up also be written in C++?

Some time in the 90’s I ran across an article called “Top C++.” I’ve tried unsuccessfully since then to find a copy of  it. As I recall, the article proposed dividing C++ conceptually into two languages: Top C++ and Bottom C++. Explicit memory management, for example, would be in Bottom C++. Top C++ would be a higher-level language. You could, for example, tell a compiler that you intend to write Top C++ and it could warn you if you use features designated as Bottom C++.

Of course you could slip from Top C++ into Bottom C++ if you really needed to, and that’s the beauty of using one language for high-level and low-level code. Application code in C++ would use Top C++ features by default, but could deliberately drop down a level when necessary. A section of Bottom C++ could be marked with comments or compiler pragmas and justified in a code review. Instead of having to cross a language barrier, you simply cross a convention barrier.

I thought this was a great idea. I’ve followed this approach throughout my career, writing high-level C++ on top of low-level C++ libraries. To some extent I put on my Bottom C++ hat when I’m writing library code and my Top C++ hat when I’m writing applications.

But most of the world has decided that’s not the way to go. If you’re writing C++, it might as well be Bottom C++. Instead of Top C++, write VB or some other higher-level language. There’s very little interest in high-level C++.

I imagine this decision has more to do with management than technology. It’s easier to have people either write C++ or not. If you’re writing C++, then use any language feature any time. I suppose this is a stable equilibrium and that the Top/Bottom C++ partition is not.

Related posts:

Software architecture as a function of trust
Programming language subsets

{ 114 comments }

C# math gotchas

by John on June 8, 2010

C# has three mathematical constants that look like constants in the C header file float.h. Two of these are not what you might expect.

The constant double.MaxValue in C# looks like the constant DBL_MAX in C, and indeed it is. Both give the maximum finite value of a double, which is on the order of 10^308. This might lead you to believe that double.MinValue in C# is the same as DBL_MIN in C or that double.Epsilon in C# is the same as DBL_EPSILON. If so, you’re in for a surprise.

The constants DBL_MAX and double.MaxValue are the same because there is no ambiguity over what “max” means: the largest finite value of a double. But DBL_MIN and double.MinValue are different because they minimize over different ranges. The constant DBL_MIN is the smallest positive value of a normalized double. The constant double.MinValue in C# is the smallest (i.e. most negative) value of a double and is the negative of double.MaxValue. The difference between DBL_MIN and double.MinValue is approximately the difference between 10^-308 and -10^308, between a very small positive number and a very large negative number.

C has a constant DBL_EPSILON for the smallest positive double precision number x such that 1 + x does not equal 1 in machine precision. Typically a double has about 15 figures of precision, and so DBL_EPSILON is on the order of 10^-16. (For a more precise description, see Anatomy of a floating point number.)

You might expect double.Epsilon in C# corresponds to DBL_EPSILON in C. I did, until a unit test failed on some numerical code I was porting from C++ to C#. But in C# double.Epsilon is the smallest positive value a (denormalized) double can take. It is similar to DBL_MIN, except that double.Epsilon is the possible smallest value of a double, not requiring normalization. The constant DBL_MIN is on the order of 10^-308 while double.Epsilon is on the order of 10^-324 because it allows denormalized values. (See Anatomy of a floating point number for details of denormalized numbers.)

Incidentally, the C constants DBL_MAX, DBL_MIN, and DBL_EPSILON equal the return values of max, min, and epsilon for the C++ class numeric_limits<double>.

To summarize,

  • double.MaxValue in C# equals DBL_MAX in C.
  • double.MinValue in C# equals -DBL_MAX in C.
  • double.Epsilon is similar to DBL_MIN in C, but orders of magnitude smaller.
  • C# has no analog of DBL_EPSILON from C.

One could argue that the C# names are better than the C names. It makes sense for double.MinValue to be the negative of double.MaxValue. But the use of Epsilon was a mistake. The term “epsilon” in numeric computing has long been established and is not limited to C. It would have been better if Microsoft had used the name MinPositiveValue to be more explicit and not conflict with established terminology.

Related posts:

C# random number generation
Math library functions that seem unnecessary
Floating point numbers are a leaky abstraction

{ 10 comments }

This post will give several examples of functions include in the standard C math library that seem unnecessary at first glance.

Function log1p(x) = log(1 + x)

The function log1p computes log(1 + x).  How hard could this be to implement?

log(1 + x);

Done.

But wait. What if x is very small? If x is 10-16, for example, then on a typical system 1 + x = 1 because machine precision does not contain enough significant bits to distinguish 1 + x from 1. (For details, see Anatomy of a floating point number.) That means that the code log(1 + x) would first compute 1 + x, obtain 1, then compute log(1), and return 0. But log(1 + 10-16) ≈ 10-16. This means the absolute error is about 10-16 and the relative error is 100%. For values of x larger than 10-16 but still fairly small, the code log(1 + x) may not be completely inaccurate, but the relative error may still be unacceptably large.

Fortunately, this is an easy problem to fix. For small x, log(1 + x) is approximately x. So for very small arguments, just return x. For larger arguments, compute log(1 + x) directly. See details here.

Why does this matter? The absolute error is small, even if the code returns a zero for a non-zero answer. Isn’t that OK? Well, it might be. It depends on what you do next. If you add the result to a large number, then the relative error in the answer doesn’t matter. But if you multiply the result by a large number, your large relative error becomes a large absolute error as well.

Function expm1(x) = exp(x) – 1

Another function that may seem unnecessary is expm1. This function computes ex – 1. Why not just write this?

exp(x) - 1.0;

That’s fine for moderately large x. For very small values of x, exp(x) is close to 1, maybe so close to 1 that it actually equals 1 to machine precision. In that case, the code exp(x) - 1 will return 0 and result in 100% relative error. As before, for slightly larger values of x the naive code will not be entirely inaccurate, but it may be less accurate than needed. The solution is to compute exp(x) – 1 directly without first computing exp(x). The Taylor series for exp(x) is 1 + x + x2/2 … So for very small x, we could just return x for exp(x) – 1. Or for slightly larger x, we could return x + x2/2. (See details here.)

Functions erf(x) and erfc(x)

The C math library contains a pair of functions erf and erfc. The “c” stands for “complement” because erfc(x) = 1 – erf(x). The function erf(x) is known as the error function and is not trivial to implement. But why have a separate routine for erfc? Isn’t it trivial to implement once you have code for erf? For some values of x, yes, this is true. But if x is large, erf(x) is near 1, and the code 1 - erf(x) may return 0 when the result should be small but positive. As in the examples above, the naive implementation results in a complete loss of precision for some values and a partial loss of precision for other values.

Functions Γ(x) and log Γ(x)

The standard C math library has two functions related to the gamma function: tgamma that returns Γ(x) and lgamma that return log Γ(x). Why have both? Why can’t the latter just use the log of the former? The gamma function grows extremely quickly. For moderately large arguments, its value exceeds the capacity of a computer number. Sometimes you need these astronomically large numbers as intermediate results. Maybe you need a moderate-sized number that is the ratio of two very large numbers. (See an example here.) In such cases, you need to subtract lgamma values rather than take the ratio of tgamma values. (Why is the function called “tgamma” and not just “gamma”? See the last paragraph and first comment on this post.)

Conclusion

The standard C math library distills a lot of experience. Some of the functions may seem unnecessary, and so they are for some arguments. But for other arguments these functions are indispensable.

Related posts:

What’s so hard about finding a hypotenuse?
Floating point numbers are a leaky abstraction
Comparing math.h on POSIX and Windows systems

{ 21 comments }

The dark matter of programmers

by John on May 17, 2010

Kate Gregory said in an interview that she likes to call C++ programmers the dark matter of the software development world. They’re not visible — they don’t attend conferences and they don’t buy a lot of books — and yet there are a lot of them.

I’ve thought the same about those who write code for games and embedded systems. I’ve hardly met anyone who works in either area, but there is a huge amount of such software and someone has to be writing it.

Game developers and embedded systems folk often work in C++, so my observation would overlap with Kate Gregory’s.

Related posts:

Termites and programmers
Disagreeing with Torvalds on C++
Complementary perspectives on the design of C++

{ 9 comments }

C++ 0X overview

by John on April 7, 2010

Scott Meyers gives an overview of the new C++ standard in his interview on Software Engineering Radio.

On the one hand, some of the new features sound very nice. For example, C++ will gain

  • Type inference. This will make it easier to work with complex (i.e. template) type definitions.
  • Lambda expressions. This will make working with the standard library much easier.
  • Raw string literals. This will make regular expressions easier to read and write.
  • R-value references. This will make some code more efficient.

On the other hand, the new standard will be about twice the size of the previous standard. A complex language is about to become a lot more complex.

The show notes have several links to much more information on the new standard.

Related posts:

Has C++ jumped the shark?
Two perspectives on the design of C++

{ 2 comments }

Interview with Clojure author

by John on March 8, 2010

Simple-talk has an interview with Rich Hickey, author of the programming language Clojure (pronounced “closure”). Clojure is a dialect of Lisp designed to run on top of the Java Virtual Machine. The language is also being ported to the .NET framework as Clojure CLR.

Two things stood out to me in the interview: a comparison of Lisp with C++, and a discussion of complexity.

You’ll often hear a programmer argue that language X is better than language Y.  To support their argument, they’ll say they wrote a program in Y, then wrote it in X in less time. For example, someone might argue that Ruby is better than Python because they were able to rewrite their web site using Ruby in half the time it took to write the original Python version. Such arguments are weak because you can write anything faster the second time. The first implementation required analysis and design that the second implementation can reuse entirely or at least learn from.

Rich Hickey argues that he can develop programs in Lisp faster than in C++. He offers as support that he first wrote something in Lisp and then took three times longer to rewrite it in C++. This is just a personal anecdote, not a scientific study, but it carries more weight than the usual anecdote because he’s claiming the first language was more efficient than the second.

In his discussion of incidental complexity, complexity coming from ones tools rather than from the intrinsic complexity of the problem being solved, Hickey says

I think programmers have become inured to incidental complexity, in particular by confusing familiar or concise with simple. And when they encounter complexity, they consider it a challenge to overcome, rather than an obstacle to remove. Overcoming complexity isn’t work, it’s waste.

The phrase “confusing familiar or concise with simple” is insightful. I never appreciated the arguments about the complexity of C++ until I got a little distance from the language; C++ was so familiar I didn’t appreciate how complex it is until I had a break from writing it. Also, simple solutions are usually concise, but concise solutions may not be simple. I chuckle whenever I hear someone say a problem was simple to solve because they were able to solve it in one line — one long stream of entirely mysterious commands.

Thanks to Omar Gomez for pointing out the interview article.

Related posts:

A little simplicity goes a long way
I disagree with Torvalds about C++
Baklava code

{ 3 comments }

I disagree with Linus Torvalds about C++

by John on September 15, 2009

I heard about this note from Linus Torvalds from David Wolever yesterday. Here’s Torvald’s opinion of C++.

C++ is a horrible language. It’s made more horrible by the fact that a lot of substandard programmers use it, to the point where it’s much much easier to generate total and utter crap with it. Quite frankly, even if the choice of C were to do *nothing* but keep the C++ programmers out, that in itself would be a huge reason to use C.

Well, I’m nowhere near as talented a programmer as Linus Torvalds, but I totally disagree with him. If it’s easy to generate crap in a relatively high-level and type-safe language like C++, then it must be child’s play to generate crap in C. It’s not fair to compare world-class C programmers like Torvalds and his peers to average C++ programmers. Either compare the best with the best or compare the average with the average. Comparing the best with the best isn’t very interesting. I imagine gurus like Bjarne Stroustrup and Herb Sutter can write C++ as skillfully as Linus Torvalds writes C, though that is an almost pointless comparison. Comparing average programmers in each language is more important, and I don’t believe C would come out on top in such a comparison.

Torvalds talks about “STL and Boost and other total and utter crap.” A great deal of thought has gone into the STL and to Boost by some very smart people over the course of several years. Their work has been reviewed by countless peers. A typical C or C++ programmer simply will not write anything more efficient or more robust than the methods in these libraries if they decide to roll their own.

Torvalds goes on to say

In other words, the only way to do good, efficient, and system-level and portable C++ ends up to limit yourself to all the things that are basically available in C.

I’ve had the opposite experience. I’d say that anyone wanting to write a large C program ends up reinventing large parts of C++ and doing it poorly. The features added to C to form C++ were added for good reasons. For example, once you’ve allocated and de-allocated C structs a few times, you realize it would be good to have functions to do this allocation and de-allocation. You basically end up re-inventing C++ constructors and destructors. But you end up with something totally sui generis. There’s no compiler support for the conventions you’ve created. No one can read about your home-grown constructors and destructors in a book. And you probably have not thought about as many contingencies as the members of the C++ standards committee have thought of.

I disagree that writing projects in C keeps out inferior C++ programmers who are too lazy to write C. One could as easily argue the opposite, that C is for programmers too lazy to learn C++. Neither argument is fair, but I think there is at least as much validity to the latter as there is to the former. I think there may be a sort of bimodal distribution of C programmer talent: some of the best and some of the worst programmers use C but for different reasons.

I do not claim that C++ is perfect, but I’ve never had any desire to go back to C after I moved to C++ many years ago. I’ll grant that I’m not writing my own operating system, but neither are the vast majority of programmers. For my work, C++ is as low-level as I care to go.

Related posts:

Conservation of complexity
Two perspectives on the design of C++
C++ templates may reduce memory footprint
Porting Visual C++ code to Linux/gcc

{ 84 comments }

Has C++ jumped the shark?

by John on August 18, 2009

Bjarne Stroustrup, creator of C++, wrote an article for Dr. Dobbs recently lamenting the decision to cut “concepts” from the upcoming revision of the C++ standard. His article left me with the feeling that C++ had jumped the shark.

The upcoming standard has been called “C++0x” based on the assumption (or at least the hope) that the standard would come out in the first decade of this century. But there will be no C++0x; now it will have to be C++1x. Part of the reason for removing concepts was to avoid pushing the release of the the standard out even further. Stroustrup says he expects concepts will be added to the standard in five years. How many people will care by the time the standard is finished?

I’ve written C++ for a long time and I still use C++ for some problems. I like the language, but it has gone about as far as it can go. It’s safe to say the language has essentially stopped evolving if new standards are only going to come out every decade or two.

I have great respect for the people working on the C++ standard committee. The design of C++ is an engineering marvel given its constraints. But if it takes such Herculean efforts to make changes, maybe it’s time to call the language finished.

I’m content with the current version of C++. If I used C++ for everything I do, maybe I’d be anxious for new features. But if something is hard to do in C++, I just don’t use C++. I don’t see a new version of C++ changing my decisions about what language to use for various tasks. If something is easier to do in Python than in C++ now, for example, that will probably still be the case when the new standard is implemented.

Update: The ISO committee approved the final draft of the C++ 2011 standard 25 March 2011.

Related links:

Closures and lambdas in C++
Two perspectives on the design of C++
Getting started with C++ TR1 regular expressions
Random number generation in C++ TR1
Strings in Visual C++

{ 21 comments }

Two perspectives on the design of C++

by John on March 17, 2009

Here are two complementary (but not entirely complimentary!) blog posts about C++.

Roshan James has a scathing article about C++. When asked to recommend books on C++, he replied that he doesn’t recommend C++. He explains how the best C++ books may be Scott Meyer’s series Effective C++ but argues that they should be called “Defective C++. ” He isn’t criticizing Scott Meyers, only the aspects of the C++ language that made it necessary for Scott Meyers to write such books. Effective C++ explains how to get around problems that don’t exist in more recent languages.

Bruce Eckel’s article The Positive Legacy of C++ and Java focused more on what C++ did well. C++ was designed to be backwardly compatible with C. Bjarne Stroustrup, original author  of C++, realized that the decision to be compatible with C would cause major difficulties, but he also thought (correctly) that without such compatibility no one would move over to the new language. Given this severe constraint, C++ has been remarkably well designed.

Update: Check out the C++ FQA site Alessandro Gentilini mentions in the comments. “FQA” is not a typo. It stands for Frequently Questioned Answers.

{ 8 comments }

Patrick Getzmann and I have been exchanging email about a problem he had using some sample code I’d written for working with regular expressions in C++. I wasn’t much help, but Patrick figured it out. I wanted to post his solution here in case someone else has the same problem.

His code would compile but not link. The compiler gave the error message “regex.obj : error LNK2019 …” His solution follows.

I have German Visual Studio installed. The Feature Pack is bundled with SP1 in the German Version. My Installation order was:

1. Visual Studio 2008
2. Visual Studio 2008 SP1 (including the Feature Pack)
3. Windows SDK for Windows Server 2008 and .NET Framework 3.5

But it should be (If one needs/wants the Server 2008 SDK):

1. Visual Studio 2008
2. Windows SDK for Windows Server 2008 and .NET Framework 3.5
3. Visual Studio 2008 SP1 (including the Feature Pack)

Otherwise reinstallation of SP1 helps.

The same problem would show up if you were using C++ TR1 random number generators. In a nutshell, try reinstalling SP1.

{ 2 comments }

Here’s something I do all the time. I have a function of one variable and several parameters. I implement it as a function object in C++ so I can pass it on to code that does something with functions of one variable, such as integration or optimization. I’ll give a trivial example and then show the most recent real problem I’ve worked on.

Say I have a function f(x; a, b, c) = 1000a + 100b + 10c + x. In a sense this is simply a function of four variables. But the connotation of using a semicolon rather than a comma after the x is that I think of x as being a variable and I think of a, b, and c as parameters. So f is a function of one variable that depends on three constants. (A “parameter” is a “constant” that can change!)

I create a C++ function object with two methods. One method is a constructor that takes the function parameters as arguments and saves them to member variables. The other method is an overload of the parenthesis method. That’s what makes the class a function object. By overloading the parenthesis method, I can call an instance of the class as if it were a function. Here’s some code.

class FunctionObject
{
public:
	FunctionObject(double a, double b, double c)
	{
		m_a = a;
		m_b = b;
		m_c = c;
	}

	double operator()(double x) const
	{
		return 1000*m_a + 100*m_b + 10*m_c + x;
	}

private:
	double m_a;
	double m_b;
	double m_c;
};

So maybe I instantiate an instance of this function object and pass it to a function that finds the maximum value over an interval [a, b]. The code might look like this.

FunctionObject f(3, 1, 4);
double maximum = Maximize(f, a, b);

Here’s a more realistic example. A few days ago I needed to solve this problem. Given user input parameters λ, σ, n, and ξ, find b such that the following holds.

\int_0^1 \frac{1}{\sqrt{2}\nu} \Phi\left(\frac{\lambda \sqrt{2\nu n}}{\sqrt{\sigma^2(1 - 2\nu) + bn}}\right) \, d\nu = \xi

The function Φ above is the CDF of a standard normal random variable, defined here.

To solve this problem, I wrote a function object to evaluate the left side of the equation above. It takes λ, σ, and n as constructor arguments and takes b as an argument to operator(). Then I passed the function object to a root-finding method to solve for the value of b that makes the function value equal ξ. But my function is defined in terms of an integral, so I needed to write another function object first that returns the integrand. Then I pass that function object to this numerical integration routine.  So I had to write two function objects to solve this problem.

There are several advantages to function objects over functions. For example, I would typically do parameter validation in the constructor. Quite often I also do some expensive calculations in the constructor and cache the results so that each call to operator() is then more efficient. Maybe I want to keep track of how often the function is called, so I put in some sort of odometer method that increments a counter with each call.

Unfortunately there’s a fair amount of code to write in order to implement even the simplest function. This effort hardly matters in production code; so many other things take more time. But it is annoying when doing some quick exploration. The next post shows how this can be done much easier in Python. The Python approach would be much easier for small problems, but it doesn’t have the advantages mentioned above such as caching expensive calculations in a constructor.

{ 8 comments }