Double words in technical writing

Doubled words such as “the the” are a common writing error. On the other hand, some doubled words are legitimate. You might, for example, find “had had” or “that that” in a grammatically correct sentence.

I’ve been looking through my website to purge erroneous double words, and found a few doubles that are correct in context but would probably be incorrect elsewhere.

In ordinary English prose, long long is probably not what the author intended. There should either be a comma between the two words or a different choice of words. But in C code snippets, you’ll see long long as an integer type. Also, in many programming languages it is common for a type and a variable to have names that differ only in capitalization, such as FILE file in C.

There are several pages on my site that refer to the Blum Blum Shub cryptographic random number generator. (The name of this algorithm always makes me think of a line from Night at the Museum.)

There are several pages on this site that use log log, always in the context of number theory. Logarithms of logarithms come up frequently in that context; for example, the sum of the reciprocals of the primes up to x grows like log log x.

I also refer to unknown unknowns. The press ridiculed Donald Rumsfeld mercilessly when he first used this expression, but now the phrase is commonly used because more people understand that it names an important concept. It comes up frequently in statistics because so much attention is focused on known unknowns, even though unknown unknowns are very often the weakest link.

***

By the way, if you’d like to make a list of doubled words in a file, you could run the following shell one-liner:

   grep -E -i -o '\<([a-z]+) \1\>' myfile | sort | uniq > doubles

I used something like this on a backup of my site to search for doubled words.
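Since some doubles are legitimate, one possible follow-up is to filter the output against a list of doubles you have already decided to accept. The sketch below is only an illustration, not what I ran: the function name find_suspect_doubles and the allowlist file are hypothetical, and it assumes GNU grep, since back-references and \< \> in extended regular expressions are extensions rather than POSIX features.

```shell
# find_suspect_doubles FILE ALLOWLIST
# Prints doubled words found in FILE, lowercased and deduplicated,
# omitting any that appear as a whole line in ALLOWLIST.
# Both the function name and the ALLOWLIST file are hypothetical.
find_suspect_doubles() {
    grep -E -i -o '\<([a-z]+) \1\>' "$1" |
        tr '[:upper:]' '[:lower:]' |   # fold "The the" and "the the" together
        sort -u |
        grep -v -x -F -f "$2"          # drop exact matches from the allowlist
}
```

With an allowlist containing lines such as "had had", "long long", and "log log" (all lowercase, one per line), running find_suspect_doubles myfile allowed would leave only the doubles that still need a human look. Note that a phrase like "unknown unknowns" never shows up in the first place, because the pattern requires the second word to end where the first one does.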

4 thoughts on “Double words”

Fatih Karakurt

30 June 2020 at 12:23

there is a `-u` switch for sort. `sort | uniq` == `sort -u`

Ben Bradley

30 June 2020 at 15:08

I learned C circa 1986 as ANSI C came out, but (especially coming from an embedded background where knowing the exact size of data is always important) I now consider named integer sizes such as long long to be obsolete. C99 came out with well-defined, exact integer size types such as int64_t, yet after two decades they’re not used often (enough, imho).

There are lin-lin and log-log (the dash shows they’re intentional and very unlikely to be a repetition mistake) to describe graph axes:
https://en.wikipedia.org/wiki/Logarithmic_scale

Iman

2 July 2020 at 06:49

How about legitimate triple words? Here is one example:
High high high alarm

John

2 July 2020 at 09:11

Thanks for the suggestion. I wrote a post to answer your question: https://www.johndcook.com/blog/2020/07/02/triple-words/
