Perl as a better …

Today I ran across Minimal Perl: For UNIX and Linux People. The book was published a few years ago but I hadn’t heard of it because I haven’t kept up with the Perl world. The following chapters from the table of contents jumped out at me because I’ve been doing a fair amount of awk and sed lately.:


3. Perl as a (better) grep command
4. Perl as a (better) sed command
5. Perl as a (better) awk command
6. Perl as a (better) find command

These chapters can be read a couple ways. The most obvious reading would be “Learn a few features of Perl and use it as a replacement for a handful of separate tools.”

But if you find these tools familiar and are not looking to replace them, you could read the book as saying “Here’s an introduction to Perl that teaches you the language by comparing it to things you already know well.”

The book suggests learning one tool instead of several, and in the bargain getting more powerful features, such as more expressive pattern matching. It also suggests not necessarily committing to learn the entire enormous Perl language, and not necessarily committing to use Perl for every programming task.

Regarding Perl’s pattern matching, I could relate to the following quip from the book.

What’s the only thing worse than not having a particular metacharacter … in a pattern-matching utility? Thinking you do, when you don’t! Unfortunately, that’s a common problem when using Unix utilities for pattern matching.

That was my experience just yesterday. I wrote a regular expression containing \d for a digit and couldn’t understand why it wasn’t matching.

Most of the examples rely on giving Perl command line options such as -e so that it acts more like command line utility. The book gives numerous examples carrying out common tasks in grep etc. and with Perl one-liners. The latter tend to be a little more verbose. If a task falls in the sweet spot of a common tool, that tool’s syntax will be more succinct. But when a task falls outside that sweet spot, such as matching a pattern that cannot be easily expressed with traditional regular expressions, the Perl solution will be shorter.

More specifics

This is an update, written March 3, 2021.

If you’re going to use Perl as a replacement for command line tools, you’ll need to know about one-liners and quoting.

Here is a post that covers Perl as a better grep.

If your main use for sed is to run commands like s/foo/bar/g, you can do this in Perl with

    perl -ple 's/foo/bar/g'

I talk more about using Perl to replace sed here.

If you want to use Perl as a replacement for awk, the main thing you need to know about is the -a option. This populates an array @F which corresponds to $1, $2, $3, etc. in awk. Note however that Perl arrays are indexed from 0, so $F[0] corresponds to $1 etc. A few more correspondences between the languages are given in the table below.

    | awk | perl  |
    |-----+-------|
    | $0  | $_    |
    | $2  | $F[1] |
    | RS  | $/    |
    | ORS | $\    |
    | OFS | $,    |

Perl can have BEGIN and END blocks just like awk.

You can set the field separator in Perl with -F, such as -F: to make the field separator a colon. In newer versions of Perl 5 you don’t have to specify -a if you specify -F; it figures that if you’re setting the field separator, you must want an array of fields to play with.

Learn one Perl command

A while back I wrote a post Learn one sed command. In a nutshell, I said it’s worth learning sed just do commands of the form sed s/foo/bar/ to replace “foo” with “bar.”

Dan Haskin and Will Fitzgerald suggested in their comments that instead of sed use perl -pe with the same command. The advantage is that you could use Perl’s more powerful regular expression syntax. Will said he uses Perl like this:

    cat file | perl -lpe "s/old/new/g" > newfile

I think they’re right. Except for the simplest regular expressions, sed’s regular expression syntax is too restrictive. For example, I recently needed to remove commas that immediately follow a digit and this did the trick:

    cat file | perl -lpe "s/(?<=\d),//g" > newfile

Since sed does not have the look-behind feature or d for digits, the corresponding sed code would be more complicated.

I quit writing Perl years ago. I don’t miss Perl as a whole, but I do miss Perl’s regular expression support.

Learning Perl is a big commitment, but just learning Perl regular expressions is not. Perl is the leader in regular expression support, and many programming languages implement a subset of Perl’s regex features. You could just use a subset of Perl features you already know, but you’d have the option of using more features.

Sed one-liners

A few weeks ago I reviewed Peteris Krumins’ book Awk One-Liners Explained. This post looks at his sequel, Sed One-Liners Explained.

The format of both books is the same: one-line scripts followed by detailed commentary. However, the sed book takes more effort to read because the content is more subtle. The awk book covers the most basic features of awk, but the sed book goes into the more advanced features of sed.

Sed One-Liners Explained provides clear explanations of features I found hard to understand from reading the sed documentation. If you want to learn sed in depth, this is a great book. But you may not want to learn sed in depth; the oldest and simplest parts of sed offer the greatest return on time invested. Since the book is organized by task — line numbering, selective printing, etc — rather than by language feature, the advanced and basic features are mingled.

On the other hand, there are two appendices  organized by language feature. Depending on your learning style, you may want to read the appendices first or jump into the examples and refer to the appendices only as needed.

For a sample of the book, see the table of contents, preface, and first chapter here.

Related links

Learn one sed command

You may have seen sed programs even if you didn’t know that’s what they were. In online discussions it’s common to hear someone say

s/foo/bar/

as a shorthand to mean “replace foo with bar.” The line s/foo/bar/ is a complete sed program to do such a replacement.

sed comes with every Unix-like operating system and is available for Windows here. It has a range of features for editing files, but sed is worth using even if you only know how to do one thing with it:

sed "s/pattern1/pattern2/g" file.txt > newfile.txt

This will replace every instance of pattern1 with pattern2 in the file file.txt and will write the result to newfile.txt. The original file file.txt is unchanged.

I used to think there was no reason to use sed when other languages like Python will do everything sed does and much more. Suppose you agree with that. Now suppose you find you often have to make global search-and-replace operations and so you write a script to do this, say a Python script. You’ve got to call your script something, remember what you called it, and put it in your path. How about calling it sed? Or better, don’t write your script, but pretend that you did. If you’re on Linux, it’s already in your path. One advantage of the real sed over your script named sed is that the former can do a lot more, should you ever need it to.

Now for a few details regarding the sed command above. The “s” on the front stands for “substitute” and the “g” on the end stands for “global.” Without the “g” on the end, sed would only replace the first instance of the pattern on each line. If that’s what you want, then remove the “g.”

The patterns inside a sed command are regular expressions, so it’s best to get in the habit of always quoting sed commands. This isn’t necessary for simple string substitutions, but regular expressions often contain characters that you’ll need to prevent the shell from interpreting.

You may find the default regular expression support in sed odd or restrictive. If you’re used to regular expressions in Perl, Python, JavaScript, etc. and you’re using a Gnu implementation of sed, you can add the -r option for more familiar regular expression syntax.

I got the idea for this post from Greg Grouthaus’ post Why you should learn just a little Awk. He makes a good case that you can benefit from learning just a few commands of a language like Awk with no intention to learn more of the language.

More regular expression posts