Doubled words such as “the the” are a common writing error. On the other hand, some doubled words are legitimate. You might, for example, find “had had” or “that that” in a grammatically correct sentence.
I’ve been looking through my website to purge erroneous double words, and found a few doubles that are correct in context but would probably be incorrect elsewhere.
In ordinary English prose, long long is probably not what the author intended. There should either be a comma between the two words or a different choice of words. But in C code snippets, you’ll see long long as a type of integer. Also, it is common in many programming languages for a type and a variable to have the same name with varying capitalization, such as FILE file in C.
There are several pages on my site that refer to the Blum Blum Shub cryptographic random number generator. (The name of this algorithm always makes me think of a line from Night at the Museum.)
There are several pages on this site that use log log, always in the context of number theory, where logarithms of logarithms come up frequently.
I also refer to unknown unknowns. The press ridiculed Donald Rumsfeld mercilessly when he first used this expression, but now the phrase is commonly used because more people understand that it names an important concept. It comes up frequently in statistics because so much attention is focused on known unknowns, even though unknown unknowns are very often the weakest link.
By the way, if you’d like to make a list of doubled words in a file, you could run the following shell one-liner:
grep -E -i -o '\<([a-z]+) \1\>' myfile | sort | uniq > doubles
I used something like this on a backup of my site to search for doubled words.
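The one-liner above reports every doubled word, including the legitimate ones discussed earlier. Here is a rough sketch of a variation that filters those out. It assumes GNU grep, since the \< and \> word boundaries and the backreference in an extended regular expression are extensions. The allowlist contents and the names `allowed` and `find_doubles` are illustrative choices, not something from my original search.

```shell
#!/bin/sh
# Hypothetical allowlist: doubles that are fine in context, one per line, lowercase.
allowed='long long
log log
blum blum
file file
had had
that that'

# Read text on stdin and print each distinct doubled word (lowercased)
# that does not appear in the allowlist.
find_doubles() {
    grep -E -i -o '\<([a-z]+) \1\>' |   # extract doubled words, ignoring case
        tr '[:upper:]' '[:lower:]' |    # normalize case so duplicates collapse
        sort -u |                       # keep one copy of each double
        grep -F -x -v -e "$allowed"     # drop lines that exactly match an allowlist entry
}
```

You could run it as `find_doubles < myfile > doubles`. Note that the pattern only matches exact repeats on a single line, so a phrase like unknown unknowns is never reported, and a double split across a line break is missed.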