Is fast grep faster?

The grep utility searches text files for regular expressions, but it can search for ordinary strings since these strings are a special case of regular expressions. However, if your regular expressions are in fact simply text strings, fgrep may be much faster than grep. Or so I’ve heard. I did some benchmarks to see.

Strictly speaking I used grep -F rather than fgrep. On Linux, if you ask for the man (manual) page for fgrep you’ll be taken to the man page for grep which says

In addition, the variant programs egrep, fgrep and rgrep are the same as grep -E, grep -F, and grep -r, respectively. These variants are deprecated, but are provided for backward compatibility.

I was working on a project for a client where I had to search for a long list of words in a long list of files [1]. This is the kind of task where fgrep (“fast grep”) is supposed to be much faster than grep. It was a tiny bit faster, not enough to notice. When I timed it the difference was on the order of 1%.

I ran an analogous search on my own computer with different data and got similar results [2]. There may be instances where fgrep is much faster than grep, but I haven’t seen one first hand.
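For anyone who wants to repeat this kind of comparison, here is a minimal sketch of the setup. It is not the script or data from either benchmark above: the file names, the 200 search terms, and the file sizes are made up for illustration, and real timings depend heavily on the data and the grep build.

```shell
#!/bin/sh
# Hypothetical setup: all file names and sizes here are illustrative assumptions.
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# A list of plain-string search terms, one per line ("record 1 end", ...).
seq 1 200 | sed 's/.*/record & end/' > "$tmp/terms.txt"

# Three text files to search, 20000 lines each.
for i in 1 2 3; do
    seq 1 20000 | sed 's/.*/record & end/' > "$tmp/data$i.txt"
done

# -f names the file of search terms; -F says the terms are fixed strings.
# -c prints a count of matching lines per file instead of the lines themselves.
plain=$(grep -c -f "$tmp/terms.txt" "$tmp"/data*.txt)
fixed=$(grep -c -F -f "$tmp/terms.txt" "$tmp"/data*.txt)

# Both searches should agree, since the terms contain no regex metacharacters.
if [ "$plain" = "$fixed" ]; then
    echo "same results with and without -F"
fi
```

To compare speed, prefix each grep command with your shell's `time` and run it several times so that the files are cached before you compare numbers.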

I suspect that the performance difference between fgrep and grep used to be larger, but the latter has gotten more efficient. Now grep is smart enough to search for strings quickly without having to be told explicitly via -F that the regular expressions are in fact strings. Maybe it scans the regular expression(s) before searching and effectively sets the -F flag itself if appropriate.


[1] I used the -f flag to tell grep the name of a file containing the terms to search for, not to be confused with the additional flag -F to tell grep that the search terms are simply strings.

[2] I got similar results when I was using Linux (WSL) on my computer. When I used grep from GOW the -F flag made the search 24 times faster. Because the GOW project provides lightweight ports of GNU tools to Windows, it’s understandable that it would not include some of the optimizations in GNU’s implementation of grep.

2 thoughts on “Is fast grep faster?”

  1. ripgrep is often advocated as being a faster version of grep, thanks to just-in-time compilation of the regular expression matcher and parallel execution. It also has some more user-friendly features, like defaulting to recursive search and skipping files listed in .gitignore. I never benchmarked it before, but for comparison:

    On a Linux workstation with 64 GiB RAM, searching about 42 GB of data across 284 files, all cached in memory, “grep ‘foo bar’ m*.20o” reported 0.76 seconds of user-space time, 3.68 seconds system time, and 4.448 seconds total time. (System time and wall-clock time were significantly higher the first few runs, when the data was being read from an SSD.) In contrast, “rg ‘foo bar’ m*.20o” reported 1.39 seconds of user-space time, 6.62 seconds system time, and 0.699 seconds total time: 1145% of a CPU core!

    On the other hand, “grep -i ‘foo bar’ m*.20o” and “rg -i ‘foo bar’ m*.20o” were astoundingly different: ripgrep reported 3.52 seconds user-space, 5.90 seconds system, and 0.816 seconds total. GNU grep took 27.63 seconds user space, 3.89 seconds system, and 31.544 seconds total. There is some seriously missed optimization in grep, because “grep -i ‘(foo bar)’ m*.20o” only takes 4.435 seconds; rg is insignificantly different with the parentheses.

    These searches do not match any text in the file, so output speed does not affect these results.

  2. I believe the “f” in “fgrep” stands for “fixed”, as in, fgrep can only be used to search for fixed strings, not regular expressions.

    I could look at some open source egrep source code to confirm this, but if I were to implement egrep, I’d first look at the search string to see if it was a regular expression or not; if not, I’d just do what “fgrep” does.

    The only reason I see to use “fgrep” is to avoid having to add escape characters if the search string contains characters that would otherwise be interpreted as regular expression characters; e.g., * . ^ $ ? | and others.
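    A small illustration of the escaping point, using a made-up two-line sample: only the first line contains the literal string a.b, but an unescaped dot in a regular expression matches any character.

```shell
# Made-up sample input: two lines, only the first contains a literal "a.b".
sample=$(printf 'a.b\naxb\n')

# As a regular expression, "." matches any character, so both lines match.
regex_hits=$(printf '%s\n' "$sample" | grep -c 'a.b')

# With -F the pattern is a fixed string, so only the literal "a.b" matches.
fixed_hits=$(printf '%s\n' "$sample" | grep -c -F 'a.b')

# Without -F, the dot has to be escaped to get the same result.
escaped_hits=$(printf '%s\n' "$sample" | grep -c 'a\.b')

echo "regex: $regex_hits, fixed: $fixed_hits, escaped: $escaped_hits"
# prints "regex: 2, fixed: 1, escaped: 1"
```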
