Initialize and print 128-bit integers in C

If you look very closely at my previous post, you’ll notice that I initialize a 128-bit integer with a 64-bit value. The 128-bit unsigned integer represents the internal state of a random number generator. Why not initialize it to a 128-bit value? I was trying to keep the code simple.

A surprising feature of C compilers, at least of GCC and Clang, is that you cannot initialize a 128-bit integer to a 128-bit integer literal. You can’t directly print a 128-bit integer either, since printf has no conversion specifier for one, which is why the previous post introduces a function print_u128.

The code

__uint128_t x = 0x00112233445566778899aabbccddeeff;

produces the following error message.

error: integer literal is too large to be represented in any integer type

The problem isn’t initializing a 128-bit number to a 128-bit value; the problem is that the compiler cannot parse the literal expression

0x00112233445566778899aabbccddeeff

One solution to the problem is to introduce the macro

#define U128(hi, lo) (((__uint128_t)(hi) << 64) | (lo))

and use it to initialize the variable.

__uint128_t x = U128(0x0011223344556677, 0x8899aabbccddeeff);
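
One caveat: the macro converts lo to 128 bits before the OR, so a negative signed value such as -1 would sign-extend into the upper half. Here is a minimal sketch of a more defensive variant, assuming GCC or Clang. The names U128_SAFE, u128_hi, and u128_lo are hypothetical and not from the previous post.

```c
#include <stdint.h>

/* Hypothetical variant of U128: truncate each half to 64 bits first,
   so a negative or oversized `lo` cannot spill into the upper half. */
#define U128_SAFE(hi, lo) \
    (((__uint128_t)(uint64_t)(hi) << 64) | (__uint128_t)(uint64_t)(lo))

/* Hypothetical helpers for inspecting the two halves. */
static uint64_t u128_hi(__uint128_t n) { return (uint64_t)(n >> 64); }
static uint64_t u128_lo(__uint128_t n) { return (uint64_t)n; }
```

With this variant, U128_SAFE(1, -1) has an upper half of 1 and a lower half of all ones, whereas the simpler macro would set every bit.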

You can verify that x has the intended state by calling print_u128 from the previous post.

void print_u128(__uint128_t n)
{
    printf("0x%016lx%016lx\n",
           (uint64_t)(n >> 64),      // upper 64 bits
           (uint64_t)n);             // lower 64 bits
}

Then

print_u128(x);

prints

0x00112233445566778899aabbccddeeff

Update. The code for print_u128 above compiles cleanly with gcc but clang gives the following warning.

warning: format specifies type 'unsigned long' but the argument has type 'uint64_t' (aka 'unsigned long long') [-Wformat]

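You can suppress the warning by including the inttypes header and modifying the print_u128 function to use the PRIx64 macro, which expands to the correct format specifier for uint64_t on each platform.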

Here’s the final code. It compiles cleanly under gcc and clang.

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
#define U128(hi, lo) (((__uint128_t)(hi) << 64) | (lo))

void print_u128(__uint128_t n)
{
    printf("0x%016" PRIx64 "%016" PRIx64 "\n",
           (uint64_t)(n >> 64),
           (uint64_t)n);
}

int main(void)
{
    __uint128_t x = U128(0x0011223344556677, 0x8899aabbccddeeff);
    print_u128(x);
    return 0;
}
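
If you would rather have the hex digits in a string than on stdout, say for logging or for a test, the same split into two 64-bit halves works with snprintf. This is only a sketch. The names format_u128 and U128_HEX_LEN are hypothetical.

```c
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

/* "0x" + 32 hex digits + terminating NUL */
#define U128_HEX_LEN 35

/* Write n as 0x followed by 32 hex digits into buf.
   Returns the number of characters written, as snprintf does. */
static int format_u128(char buf[U128_HEX_LEN], __uint128_t n)
{
    return snprintf(buf, U128_HEX_LEN,
                    "0x%016" PRIx64 "%016" PRIx64,
                    (uint64_t)(n >> 64),   /* upper 64 bits */
                    (uint64_t)n);          /* lower 64 bits */
}
```

The caller supplies a buffer of U128_HEX_LEN bytes and can then compare the result against an expected string with strcmp.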

Two cheers for ugly code

Ugly code may be very valuable, depending on why it’s ugly. I’m not saying that it’s good for code to be ugly, but that code that is already ugly may be valuable.

Some of the ugliest code was started by someone who knew the problem domain well but did not know how to write maintainable code. It may implicitly contain information that is not explicitly codified anywhere else. It may contain information the original programmer isn’t even consciously aware of. It’s often easier to clean up the code than to surface the information it contains using any other source.

Another way code gets ugly is undisciplined modification by multiple programmers over a long period of time. In that case the code has proved to be useful. It’s the opposite of Field of Dreams code that you hope will be used if you build it.

Working effectively with legacy code is hard. It may be easier, and certainly more pleasant, to start from scratch. Even so, there may be more to learn from the old code than is immediately obvious.

***

This post was motivated by looking to extend some code I use in my business. It wouldn’t win a beauty pageant, but it’s very useful.

Writing this post reminded me of a post Productive Productivity that I wrote a while back. From that post:

The scripts that have been most useful are of zero interest to anyone else because they are very specific to my work. I imagine that’s true of most scripts ever written.

Pioneering work is ugly

“A mathematician’s reputation rests on the number of bad proofs he has given. (Pioneer work is clumsy.)” — A. S. Besicovitch

I’m sure I’ve written about this quote somewhere, but I can’t find where. The quote comes from A Mathematician’s Miscellany by J. E. Littlewood, citing Besicovitch.

I’ve more often seen the quote concluding with “Pioneering work is ugly.” Maybe that’s what Besicovitch actually said, but I suspect it came from someone misremembering/improving Littlewood’s citation. Since the words are in parentheses, perhaps Besicovitch didn’t say them at all but Littlewood added them as commentary.

One way of interpreting the quote is to say it takes more creativity to produce a rough draft than to edit it.

The quote came to mind when I was talking to a colleague about the value of ugly code, code that is either used once or that serves as a prototype for something more lasting.

This is nearly the opposite of the attitude I had as a software developer and as a software team manager. But it all depends on context. Software techniques need to scale down as well as scale up. It doesn’t make sense to apply the same formality to disposable code and to enterprise software.

Yes, supposedly disposable code can become permanent. And as soon as someone realizes that disposable code isn’t being disposed of it should be tidied up. But to write every little one-liner as if it is going to be preserved for posterity is absurd.

Naming Awk

The Awk programming language was named after the initials of its creators. In the preface to a book that just came out, The AWK Programming Language, Second Edition, the authors give a little background on this.

Naming a language after its creators shows a certain paucity of imagination. In our defense, we didn’t have a better idea, and by coincidence, at some point in the process we were in three adjacent offices in the order Aho, Weinberger, and Kernighan.

By the way, here’s a nice line from near the end of the book.

Realistically, if you’re going to learn only one programming language, Python is the one. But for small programs typed at the command line, Awk is hard to beat.

A small programming language

Paul Graham said “Programming languages teach you not to want what they don’t provide.” He meant that as a negative: programmers using less expressive languages don’t know what they’re missing. But you could also take that as a positive: using a simple language can teach you that you don’t need features you thought you needed.

Awk

I read the original awk book recently, published in 1988. It’s a small book for a small language. The language has grown since 1988, especially the GNU implementation gawk, and yet from the beginning the language had a useful set of features. Most of what has been added since then is of no use to me.

How I use awk

It has been years since I’ve written an awk program that is more than one line. If something would require more than one line of awk, I probably wouldn’t use awk. I’m not morally opposed to writing longer awk programs, but awk’s sweet spot is very short programs typed at the command line.

At one point when I was saying how I like little awk programs, someone suggested I use Perl one-liners instead because then I’d have access to Perl’s much richer set of features, in particular Perl regular expressions. Along those lines, see these notes on how to write Perl one-liners to mimic sed, grep, and awk.

But when I was reading the awk book I thought about how I rarely need the features awk doesn’t have, not for the way I use awk. If I were writing a large program, not only would I want more features, I’d want a different language.

Now my response to the suggestion to use Perl one-liners would be that the simplicity of awk helps me focus by limiting my options. Awk is a jig. In Paul Graham’s terms, awk teaches me not to want what it doesn’t provide.

Regular expressions

At first I wished awk were more expressive in its regular expression implementation. But awk’s minimal regex syntax is consistent with the aesthetic of the rest of the language. Awk has managed to maintain its elegant simplicity by resisting calls to add minor conveniences that would complicate the language. The maintainers are right not to add the regex features I miss.

Awk does not support, for example, \d for digits. You have to type [0-9] instead. In exchange for such minor inconveniences you get a simple but adequate regular expression implementation that you could learn quickly. See notes on awk’s regex features here.
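
POSIX specifies that awk uses extended regular expressions, the same dialect that the C function regcomp accepts with the REG_EXTENDED flag. So here is a minimal sketch in C, assuming a POSIX system, of matching digits the awk way with [0-9]. The function name ere_match is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Return 1 if text matches the POSIX extended regex pattern,
   0 if it does not, and -1 if the pattern fails to compile. */
static int ere_match(const char *pattern, const char *text)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int status = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return status == 0;
}
```

For example, ere_match("^[0-9]+$", "2024") returns 1. The POSIX character class [[:digit:]] can be used in place of [0-9].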

The awk book describes regular expressions in four leisurely pages. Perl regular expressions are an order of magnitude more complex, but not an order of magnitude more useful.

Productive constraints

This post will discuss two scripting languages, but that’s not what the post is really about. It’s really about expressiveness and (or versus) productivity.

***

I was excited to discover the awk programming language sometime in college because I had not used a scripting language before. Compared to C, awk was high-level luxury.

Then a few weeks later someone said “Have you seen Perl? It can do everything awk can do and a lot more.” So I learned Perl. Was that a good idea or a bad idea? I’ve been wondering about that for years.

Awk versus Perl is a metaphor for a lot of other decisions.

***

Awk is a very small language, best suited for working with tabular data files. Awk implicitly loops over a file, repeating some code on every line of a file. This makes it possible to write very short programs, programs so short that they can be typed at the command line, for doing common tasks. I am continually impressed by bits of awk code I see here and there, where someone has found a short, elegant solution to a problem.
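
To see how much the implicit loop saves, here is a rough C sketch of what an awk one-liner along the lines of awk '{ s += $2 } END { print s }' does for you: loop over lines, split each into fields, and accumulate one column. The function name sum_column and the fixed line buffer are assumptions made for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sum the given 1-based whitespace-separated column over every line of fp.
   Lines longer than the buffer are not handled; this is only a sketch. */
static double sum_column(FILE *fp, int col)
{
    char line[4096];
    double sum = 0.0;

    while (fgets(line, sizeof line, fp) != NULL) {   /* awk's implicit loop */
        int field = 0;
        for (char *tok = strtok(line, " \t\n"); tok != NULL;
             tok = strtok(NULL, " \t\n")) {          /* awk's field splitting */
            if (++field == col) {
                sum += strtod(tok, NULL);            /* non-numeric fields count as 0 */
                break;
            }
        }
    }
    return sum;
}
```

Calling sum_column(fp, 2) on an open file plays the role of the one-liner, minus the printing.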

Because awk is small and specialized, it is also efficient at solving the problems it is designed to solve. The previous post gives an example.

The flip side of awk being small and specialized is that it can be awkward to use for problems that deviate from its sweet spot.

***

Perl is a very expressive programming language and is suitable for a larger class of problems than awk is. Awk was one of the influences in the design of Perl, and you can program in an awk-like subset of Perl. So why not give yourself more options and write Perl instead?

Expressiveness is mostly good. Nobody is forcing you to use any features you don’t want to use and it’s nice to have options. But expressiveness isn’t a free option. I’ll mention three costs.

  1. You might accidentally use a feature that you don’t intend to use, and rather than getting an error message you get unexpected behavior. This is not a hypothetical risk but a common experience.
  2. If you have more options, so does everyone writing code that you might need to debug or maintain. “Expressiveness for me but not for thee.”
  3. More options means more time spent debating options. Having too many options dissipates your energy.

***

You can mitigate #1 by turning on warnings available in later versions of Perl. And you can mitigate #2 and #3 by deciding which language features you (or your team) will use and which features you will avoid.

But if you use a less expressive language, these decisions have been made for you. No need to decide on and enforce rules on features to shun. Avoiding decision fatigue is great, if you can live with the decisions that have been made for you.

The Python community has flourished in part because the people who don’t like the language’s design choices would rather live with those choices than leave these issues permanently unsettled.

***

Bruce Lee famously said “I fear not the man who has practiced 10,000 kicks once, but I fear the man who has practiced one kick 10,000 times.” You could apply that aphorism by choosing to master a small language like awk, learning not just its syntax but its idioms, and learning it so well that you never need to consult documentation.

Some people have done this, mastering awk and a few other utilities. They write brief scripts that do tasks that seem like they would require far more code. I look at these scripts expecting to see utilities or features that I didn’t know about, but usually these people have combined familiar pieces in a clever way.

***

Some people like to write haiku and some free verse. Hedgehogs and foxes. Scheme and Common Lisp. Birds and Frogs. Awk and Perl. So the optimal size of your toolbox is to some extent a matter of personality. But it’s also a matter of tasks and circumstances. There are no solutions, only trade-offs.

Software and the Allee effect

The Allee effect is named after Warder Clyde Allee who added a term to the famous logistic equation. His added term is highlighted in red.

\frac{dN}{dt} = r N {\color{red}\left( \frac{N}{A} - 1 \right)} \left( 1 - \frac{N}{K} \right)

Here N is the population of a species over time, r is the intrinsic rate of increase, K is the carrying capacity, and A is the critical point.

If you remove Allee’s term, you get the logistic equation. Its factor rN says that the rate of growth of a population is proportional to the current population size, and so growth starts out exponential. Its factor (1 – N/K) says growth slows down as the population approaches its carrying capacity.

Allee’s term (N/A – 1) says that the rate of growth becomes negative when the population falls below some threshold A. When there are too few individuals, survival becomes more difficult.
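
The sign changes are easy to check numerically. Here is a minimal sketch that evaluates the right-hand side of the equation and steps it forward with Euler’s method. The function names, step size, and parameter values are illustrative assumptions, and forward Euler is crude but enough to show the qualitative behavior.

```c
/* Right-hand side of the logistic equation with Allee's term:
   dN/dt = r N (N/A - 1)(1 - N/K) */
static double allee_rate(double N, double r, double A, double K)
{
    return r * N * (N / A - 1.0) * (1.0 - N / K);
}

/* Advance the population from N0 with forward Euler steps of size dt. */
static double allee_simulate(double N0, double r, double A, double K,
                             double dt, int steps)
{
    double N = N0;
    for (int i = 0; i < steps; i++)
        N += dt * allee_rate(N, r, A, K);
    return N;
}
```

With r = 1, A = 10, and K = 100, a population starting at 5 dies out, while one starting at 20 grows toward the carrying capacity of 100.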

Software metaphor

I thought of the Allee effect as a metaphor for software technology after writing my previous post. In general, problems become easier to solve over time. Software development may become harder because the problems developers solve are changing, but solving old problems typically gets easier. Algorithms improve and get wrapped up for convenience. There’s something like logistic growth where tasks get easier to solve, but improvement slows down over time.

If a problem is specialized, it can run into something like the Allee effect. It becomes harder over time because fewer people are interested in it. Software isn’t maintained as fast as it degrades. Fewer people have experience with it. It’s harder to be a COBOL programmer, for example, than it used to be. But this can also apply to much more current problems. A problem that was hot five years ago can be harder to solve now than it was then, for reasons the previous post discusses.

Branch cuts for elementary functions

As far as I know, all contemporary math libraries use the same branch cuts when extending elementary functions to the complex plane. It seems that the current conventions date back to Kahan’s paper [1]. I imagine to some extent he codified existing practice, but he also settled some issues, particularly regarding floating point implementation.

I’ve verified that the following branch cuts are used by Mathematica, Common Lisp, and SciPy. If you know of any software that follows other conventions, please let me know in a comment.

The conventional branch cuts are as follows.

  • sqrt: [−∞, 0)
  • log: [−∞, 0]
  • arcsin: [−∞, −1] and [1, ∞]
  • arccos: [−∞, −1] and [1, ∞]
  • arctan: [−∞i, −i] and [i, ∞i]
  • arcsinh: [−∞i, −i] and [i, ∞i]
  • arccosh: [−∞, 1]
  • arctanh: [−∞, −1] and [1, ∞]

[1] W. Kahan. Branch Cuts for Complex Elementary Functions or Much Ado About Nothing’s Sign Bit. The State of the Art in Numerical Analysis. Clarendon Press (1987).

Code katas taken more literally

Karate class

Code katas are programming exercises intended to develop programming skills, analogous to the way katas develop martial art skills.

But literal katas are choreographed. They are rituals rather than problem-solving exercises. There may be an element of problem solving, such as figuring how to better execute the prescribed movements, but katas are rehearsal rather than improvisation.

CodeKata.com brings up the analogy to musical practice in the opening paragraph of the home page. But musical practice is also more ritual than problem-solving, at least for classical music. A musician might go through major and minor scales in all 12 keys, then maybe a chromatic scale over the range of the instrument, then two different whole-tone scales, etc.

A code kata would be more like a jazz musician improvising a different melody to the same chord changes every day. (Richie Cole would show off by improvising over the chord changes to Cherokee in all twelve keys. I don’t know whether this was a ritual for him or something he would pull out for performances.)

This brings up a couple of questions. What would a more literal analog of katas look like for programming? Would these be useful?

I could imagine someone going through a prescribed sequence of keystrokes that exercise a set of software features that they wanted to keep top of mind, sorta like practicing penmanship by writing out a pangram.

This is admittedly kind of an odd idea. It makes sense that the kinds of exercises programmers are interested in require problem solving rather than recall. But maybe it would appeal to some people.

***

Image “karate training” by Genista is licensed under CC BY-SA 2.0.

Visualizing C operator precedence

Here’s an idea for visualizing C operator precedence. You snake your way through the diagram, beginning by moving left to right.

Operators at the same precedence level are on the same horizontal level.

Following the arrows for changing directions, you move from left-to-right through the operators that associate left-to-right and you move right-to-left through the operators that associate right-to-left.

Although this diagram is specifically for C, many languages follow the same precedence with minor exceptions. For example, all operators that Perl shares with C follow the same precedence as C.
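
A few small functions show how precedence and associativity change what an expression means. This is only a sketch with hypothetical function names. Compilers may warn about the missing parentheses, which is rather the point.

```c
/* Each function returns the value C actually computes;
   the comments show how the expression groups. */

/* == binds tighter than &, so this is a & (b == c), not (a & b) == c. */
static int and_vs_equals(int a, int b, int c) { return a & b == c; }

/* Subtraction associates left to right: (a - b) - c. */
static int chained_minus(int a, int b, int c) { return a - b - c; }

/* Assignment associates right to left: a = (b = c). */
static int chained_assign(int c) { int a, b; a = b = c; return a + b; }

/* + binds tighter than <<, so this is 1 << (n + 1), not (1 << n) + 1. */
static int shift_vs_plus(int n) { return 1 << n + 1; }
```

For instance, and_vs_equals(6, 2, 2) is 0 because it computes 6 & 1, whereas (6 & 2) == 2 would be 1.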

visualization of C operator precedence
