Recursive grep

The regular expression search utility grep has a recursive switch -R, but it may not work like you’d expect.

Suppose you want to find the names of all .org files in your current directory and below that contain the text “cheese.”

You have four files, two in the working directory and two below, that all contain the same string: “I like cheese.”

    $ ls -R
    .:
    rootfile.org  rootfile.txt  sub
 
    ./sub:
    subfile.org  subfile.txt

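It seems that grep -R can either search all files of the form *.org in the current directory, ignoring the -R switch, or search all files recursively if you don’t give it a file glob, but it can’t do both. The reason is that the shell expands *.org to the matching names in the current directory before grep runs, so grep is handed only rootfile.org and has no directory to recurse into.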

    $ grep -R -l cheese *.org
    rootfile.org
 
    $ grep -R -l cheese .
    ./rootfile.org
    ./rootfile.txt
    ./sub/subfile.org
    ./sub/subfile.txt
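
One way to see why the first command finds only one file is to print the command line after the shell has processed it. The sketch below rebuilds the example tree in a scratch directory (demo_dir and expanded are names chosen here for illustration) and uses echo to show the arguments grep would receive:

```shell
# Recreate the four example files in a throwaway directory.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
mkdir sub
for f in rootfile.org rootfile.txt sub/subfile.org sub/subfile.txt; do
    echo "I like cheese." > "$f"
done

# The shell replaces *.org with the matching names in the current
# directory before grep starts; echo prints the resulting command line.
expanded=$(echo grep -R -l cheese *.org)
echo "$expanded"
```

The printed line names rootfile.org explicitly, with no directory argument left for -R to descend into.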

One way to solve this is with find and xargs:

    $ find . -name '*.org' | xargs grep -l cheese 
    ./rootfile.org                                                           
    ./sub/subfile.org                                                        
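
By default xargs splits its input on whitespace, so a file name containing a space would reach grep as two separate arguments. Here is a minimal sketch of a common workaround, assuming a find and xargs that support -print0 and -0 (the GNU and BSD versions do). The scratch directory and the file name "my notes.org" are made up for illustration:

```shell
# Scratch directory with one .org file whose name contains a space.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
mkdir sub
echo "I like cheese." > rootfile.org
echo "I like cheese." > "sub/my notes.org"

# -print0 ends each name with a NUL byte and xargs -0 splits on NUL,
# so "my notes.org" is passed to grep as a single argument.
matches=$(find . -name '*.org' -print0 | xargs -0 grep -l cheese | sort)
echo "$matches"
```

The sort at the end is only there to make the order of the output predictable; find does not promise any particular order.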

I was discussing this with Chris Toomey and he suggested an alternative using a subshell that seems more natural:

    grep -l cheese $(find . -name '*.org')

Now the code reads more like an ordinary call to grep. From left to right, it essentially says “Search for ‘cheese’ in files ending in .org” whereas the version with find reads like “Find files whose names end in .org and search them for ‘cheese.’” It’s good to understand how both approaches work.

8 thoughts on “Recursive grep”

  1. Hi,

    You need to enable the globstar option.

    $ shopt globstar
    globstar off

    $ grep 404 **/*csv

    $ shopt -s globstar
    $ shopt globstar
    globstar on

    $ grep 404 **/*csv
    bucket/aws/fuzz-s3-params.csv:images,http://s3.bucket.htb/adserver/images/,,3,404,207,24,7,
    ….
    snip

    bucket/aws/fuzz-s3-params.csv:logout,http://s3.bucket.htb/adserver/logout/,,53,404,207,24,7,
    bucket/aws/fuzz-s3-params.csv:comment,http://s3.bucket.htb/adserver/comment/,,54,404,207,24,7,

    $ shopt -u globstar
    $ shopt globstar
    globstar off

  2. Another approach is to make the shell generate the list of files to search using a glob pattern. For example, in Bash you could write

    grep -l cheese *.org **/*.org

    In Zsh you only need to say

    grep -l cheese **/*.org

  3. This unfortunately has a few poor suggestions.

    First, if you use the glob pattern, you’re relying on shell globbing, which explains what you see (that -R has no effect).

    And then if you use find, the better practice is to use find … -exec grep … + for various reasons. Feeding find’s stdout to a subshell or piping it to xargs both have corner cases that are going to bite you.

  4. Lawrence Kesteloot

    The terse version at the end risks blowing the max size of the command line if there are too many .org files. The xargs program will invoke grep multiple times in that case.

    And if you’re sure there aren’t too many .org files, then most shells will accept this syntax:

    grep -l cheese **/*.org

  5. ripgrep is a 3rd party utility specifically designed to do this task very efficiently. It is available in most repos and on github.

    rg -torg cheese

    is the equivalent of the above

  6. I’d normally do this using find’s -exec argument:

    find . -name '*.org' -exec grep -l cheese {} +

    or

    find . -name '*.org' -exec grep -l cheese {} \;

  7. FWIW, GNU grep has a way to do this that doesn’t choke on filenames containing spaces, and also avoids the overhead of starting a new process for each file (this was more of a thing twenty-mumble years ago when I was a baby sysadmin, but it’s still relevant if you have a very large number of matching filenames).

    grep -r --include='*.org' cheese ./ # NB: both -r and -R work, see below

    Or, if you’re using org files, you’re probably in emacs. M-x rgrep explicitly prompts for a filename pattern to use when running, *and* gives you a nice result buffer that you can click/Enter on to go directly to the result in another emacs frame.

    -R vs -r: I’m not sure if you actually care about the difference between -R and -r; my guess is that you probably don’t: -R derefs symlinks while -r ignores them. I generally use -r because symlinks are often semantically “This doesn’t quite belong here” for me.

    Oops, if you’ll excuse me, there are some kids using superscalar multiprocessor RISC unix machines (iPhones) on my lawn that I need to go yell at.
