I started this post by wanting to look at the frequency of LaTeX commands, but then thought that some people might find the code to find the frequencies more interesting than the frequencies themselves.
So I’m splitting this into two posts. This post will look at the shell one-liner to find command frequencies, and the next post will look at the actual frequencies.
I want to explore LaTeX files, so I’ll start by using
find to find such files.
find . -name "*.tex"
This searches for files ending in
.tex, starting with the current directory (hence
.) and searching recursively into subdirectories. The
find command explores subdirectories by default; you have to tell it not to if that’s not what you want.
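Here's a small sketch of that recursive behavior, using a throwaway directory whose file names are made up for illustration. It assumes your find supports -maxdepth, which GNU and BSD versions do, though the option is not part of POSIX.

```shell
# Build a scratch tree: one .tex file at the top level, one in a subdirectory.
# All names here are hypothetical, chosen only for the demonstration.
tmp=$(mktemp -d)
mkdir -p "$tmp/chapters"
touch "$tmp/main.tex" "$tmp/chapters/intro.tex" "$tmp/notes.txt"

# Default behavior: find descends into subdirectories.
all=$(cd "$tmp" && find . -name "*.tex" | sort)

# Assumed GNU/BSD extension: stop at the starting directory.
top=$(cd "$tmp" && find . -maxdepth 1 -name "*.tex" | sort)

echo "recursive:"
echo "$all"
echo "top level only:"
echo "$top"
```

The first search reports both files and the second reports only the one in the starting directory.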
Next, I want to use
grep to search the LaTeX files. If I pipe the output of
find directly to
grep, it will search the file names, but I want it to search the file contents. The
xargs command takes care of this, receiving the file names and passing them along as command-line arguments, i.e. not as text input.
find . -name "*.tex" | xargs grep ...
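To make the difference concrete, here's a sketch with a single made-up file. It counts matching lines both ways: once with the file names piped straight into grep, and once with xargs in between.

```shell
# A scratch directory with one hypothetical LaTeX file containing "alpha".
tmp=$(mktemp -d)
printf '%s\n' 'Some text with \alpha in it.' > "$tmp/paper.tex"

# Piped directly: grep reads the file *names* as its text input.
# "./paper.tex" does not contain "alpha", so the count is 0.
# (grep exits nonzero when nothing matches, hence the "|| true".)
names=$(cd "$tmp" && find . -name "*.tex" | grep -c 'alpha' || true)

# Through xargs: grep gets the names as arguments and reads the file contents.
contents=$(cd "$tmp" && find . -name "*.tex" | xargs grep -c 'alpha')

echo "matches in file names:    $names"
echo "matches in file contents: $contents"
```

If your file names might contain spaces, the usual workaround on GNU and BSD systems is find's -print0 paired with xargs -0.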
LaTeX commands have the form of a backslash followed by letters, so the regular expression I’ll pass is
\\[a-z]+. This says to look for a literal backslash followed by one or more letters.
I’ll give grep four option flags. I’ll use
-i to ask it to use case-insensitive matching, because LaTeX commands can contain capital letters. I’ll use
-E to tell it I want to use extended regular expressions.
I’m after just the commands, not the lines containing commands, and so I use the
-o option to tell
grep to return just the commands, one per line. But that’s not enough. It would be enough if we were only searching one file, but since we’re searching multiple files, the default behavior is for
grep to return the file name as well. The
-h option tells it to only return the matches, no file names.
So now we’re up to this:
find . -name "*.tex" | xargs grep -oihE '\\[a-z]+'
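As a quick check of what the flags do, here's a sketch that runs grep directly on two made-up files, with and without -h. The file names and contents are purely illustrative.

```shell
# Two hypothetical LaTeX fragments in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' '\begin{equation} \alpha + \Gamma \end{equation}' > "$tmp/a.tex"
printf '%s\n' 'Let \alpha be small.' > "$tmp/b.tex"

# -o: print only the matched text, one match per line
# -i: case-insensitive, so \Gamma matches [a-z]+
# -h: suppress the file-name prefix
# -E: extended regular expressions
plain=$(cd "$tmp" && grep -oihE '\\[a-z]+' a.tex b.tex)

# Same search without -h: each match is prefixed with its file name.
named=$(cd "$tmp" && grep -oiE '\\[a-z]+' a.tex b.tex)

echo "with -h:"
echo "$plain"
echo "without -h:"
echo "$named"
```

With -h the output is just the five commands; without it, each line starts with a.tex: or b.tex:.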
Next I want to count how many times each command occurs, and I need to sort the output first so that
uniq will count correctly.
find . -name "*.tex" | xargs grep -oihE '\\[a-z]+' | sort | uniq -c
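The reason for sorting first is that uniq only merges adjacent duplicate lines. A minimal sketch with three made-up lines of input shows the difference:

```shell
# Without sorting, the two \alpha lines are not adjacent,
# so uniq -c reports three separate groups.
unsorted=$(printf '%s\n' '\alpha' '\beta' '\alpha' | uniq -c | wc -l | tr -d ' ')

# After sorting, the duplicates sit next to each other and collapse into one.
sorted=$(printf '%s\n' '\alpha' '\beta' '\alpha' | sort | uniq -c | wc -l | tr -d ' ')

echo "groups without sort: $unsorted"
echo "groups with sort:    $sorted"
```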
And finally I want to sort the output by frequency, in descending order. The
-n option tells
sort to sort numerically, and
-r says to sort in descending order rather than the default ascending order. This produces a lot of output, so I pipe everything to
less to view it one screen at a time.
find . -name "*.tex" | xargs grep -oihE '\\[a-z]+' | sort | uniq -c | sort -rn | less
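If you'd like to try the pipeline without pointing it at real files, here's a sketch that runs it on a tiny made-up directory tree. I've left off less so the result can be captured in a variable; the file names and contents are hypothetical.

```shell
# Scratch tree with two hypothetical .tex files, one in a subdirectory.
tmp=$(mktemp -d)
mkdir -p "$tmp/sub"
printf '%s\n' '\alpha \beta \alpha' > "$tmp/one.tex"
printf '%s\n' '\gamma \alpha \beta' > "$tmp/sub/two.tex"

# The one-liner, minus the pager.
counts=$(cd "$tmp" && find . -name "*.tex" \
    | xargs grep -oihE '\\[a-z]+' \
    | sort | uniq -c | sort -rn)

# Each line is a count followed by a command, most frequent first.
# The exact padding before the count varies between uniq implementations.
echo "$counts"
```

Here \alpha appears three times, \beta twice, and \gamma once, so they come out in that order.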
That’s my one-liner. In the next post I’ll look at the results.
I learned regular expressions from writing Perl long ago. What I think of as simply a regular expression is what
grep calls “extended” regular expressions, so adding the
-E option keeps me out of trouble in case I use a feature that
grep considers an extension. You could use
egrep instead, which is essentially the same as