Perl as a better ...

Today I ran across Minimal Perl: For UNIX and Linux People. The book was published a few years ago but I hadn’t heard of it because I haven’t kept up with the Perl world. The following chapters from the table of contents jumped out at me because I’ve been doing a fair amount of awk and sed lately:

…
3. Perl as a (better) grep command
4. Perl as a (better) sed command
5. Perl as a (better) awk command
6. Perl as a (better) find command
…

These chapters can be read a couple ways. The most obvious reading would be “Learn a few features of Perl and use it as a replacement for a handful of separate tools.”

But if you find these tools familiar and are not looking to replace them, you could read the book as saying “Here’s an introduction to Perl that teaches you the language by comparing it to things you already know well.”

The book suggests learning one tool instead of several, and in the bargain getting more powerful features, such as more expressive pattern matching. It also suggests not necessarily committing to learn the entire enormous Perl language, and not necessarily committing to use Perl for every programming task.

Regarding Perl’s pattern matching, I could relate to the following quip from the book.

What’s the only thing worse than not having a particular metacharacter … in a pattern-matching utility? Thinking you do, when you don’t! Unfortunately, that’s a common problem when using Unix utilities for pattern matching.

That was my experience just yesterday. I wrote a regular expression containing \d for a digit and couldn’t understand why it wasn’t matching.

Most of the examples rely on giving Perl command line options such as -e so that it acts more like a command line utility. The book gives numerous examples carrying out common tasks in grep etc. and with Perl one-liners. The latter tend to be a little more verbose. If a task falls in the sweet spot of a common tool, that tool’s syntax will be more succinct. But when a task falls outside that sweet spot, such as matching a pattern that cannot be easily expressed with traditional regular expressions, the Perl solution will be shorter.

More specifics

This is an update, written March 3, 2021.

If you’re going to use Perl as a replacement for command line tools, you’ll need to know about one-liners and quoting.

Here is a post that covers Perl as a better grep.

If your main use for sed is to run commands like s/foo/bar/g, you can do this in Perl with

    perl -ple 's/foo/bar/g'

I talk more about using Perl to replace sed here.

If you want to use Perl as a replacement for awk, the main thing you need to know about is the -a option. This populates an array @F which corresponds to $1, $2, $3, etc. in awk. Note however that Perl arrays are indexed from 0, so $F[0] corresponds to $1 etc. A few more correspondences between the languages are given in the table below.

    | awk | perl  |
    |-----+-------|
    | $0  | $_    |
    | $2  | $F[1] |
    | RS  | $/    |
    | ORS | $\    |
    | OFS | $,    |

Perl can have BEGIN and END blocks just like awk.

You can set the field separator in Perl with -F, such as -F: to make the field separator a colon. In newer versions of Perl 5 you don’t have to specify -a if you specify -F; it figures that if you’re setting the field separator, you must want an array of fields to play with.

19 thoughts on “Perl as a better …”

Charlie

20 August 2013 at 18:06

Hi John,

I use R for data analysis and I’ve been thinking of learning a language for text pre-processing, especially for large data sets.

sed and awk are appealing because they work line-by-line. Is this how Perl works?

For my relatively modest purposes (pre-processing) is learning sed and (more likely) awk a better bet than diving into Perl?

Thanks for your insights.

Charlie

John

20 August 2013 at 18:32

Perl can be used to apply the same commands line-by-line, like sed or awk, but it doesn’t have to. It’s a general programming language.

Perl is much larger and much more powerful than sed and awk, and essentially contains these languages as subsets. So if you want to learn the more powerful language, learn Perl. But because sed and awk are much smaller, they’re easier to learn. And if you learn sed and awk first, then decide to learn Perl, you’ll recognize a lot of Perl features as coming from sed and awk.

neizod

20 August 2013 at 18:52

Any elegant way to use perl as an in-place substitute script (like `sed -i`)? Redirect output to old file name doesn’t work here.

Dave Jacoby

20 August 2013 at 19:01

perl -e 'code' is a start. There are more flags, but I don’t do this enough to memorize them.

Gabor Szabo

20 August 2013 at 23:14

@neizod you probably want something like this:

perl -i -p -e 's/old/new/g' file1 file2

You can also check out the examples here: http://www.catonmat.net/blog/perl-book/

Michael

21 August 2013 at 04:57

Charlie, I use Python for all my scripting and I think it is the best scripting language available. Perl code easily becomes an intractable mess, while Python is based on a minimal set of language features yet is the most expressive language I know.

John

21 August 2013 at 06:44

Michael: I used to write Perl, and I now prefer Python, but I miss Perl’s pattern matching.

I agree that it’s easy for Perl to get messy in a large script, but in a one-liner I don’t think that’s a problem.

Quantum Mechanic

21 August 2013 at 06:54

@neizod, In place file munging:

perl -i -e 'code goes here' file1 file2

Quantum Mechanic

21 August 2013 at 07:11

@Charlie,

As a self-professed Perl weenie, take the following with a pinch of salt…

I’ve used Perl for very large file processing, line by line and other ways. While Perl is an interpreted language (it “compiles” to byte code), the IO and regex engine are very fast. Code changes can be implemented and tested quickly.

Perl can call into C code with “XS” modules for a speedup, and inline other languages for flexibility. There’s also CPAN, and the oft-repeated phrase “Whatever you’re doing, there’s probably a module on CPAN to do it already” applies.

I first got into Perl because I had a task a bit too complex for grep. It was an easy road to get on, and I’m still on it.

Mark Galassi

21 August 2013 at 08:59

I still remember Larry Wall’s first posting of perl to comp.sources.unix: “a replacement for awk and sed” was in his subject line. Many of my friends took to it strongly, but I always preferred keeping my scripting simpler to what could be done with UNIX pipe sequences.

neizod

21 August 2013 at 09:18

`perl -i -e 's/…/…/' file` doesn’t work.

`perl -p -i -e 's/…/…/' file` does work, however.

Thanks, @Gabor @Quantum.

Charlie

21 August 2013 at 09:48

Thanks for the information, everyone. I considered Python, but, as I mentioned, I’d like to learn something that lets me pre-process large files that don’t easily fit into memory. Can Python work effectively line-by-line? I thought that it holds everything in memory.

(I know that, in principle, many languages, including R, could work line-by-line by reading a file one line at a time, but I’ve found R to be super slow when used in this fashion and was thinking that Python would be the same way.)

Charlie

Hilmar

21 August 2013 at 09:51

I have dabbled a little in Perl but not enough to be able to read code written by others so I still fall back on shell scripting. Usually just enough to get data into R.

I have heard many positive things about Python and I am toying with the idea of learning it. I have yet to look at the language and I am wondering if it will add anything significant to what I can already do. I learned coding with Basic in the 1980s, learned it properly with C in the 1990s, then shifted to R as my go to language since then.

Anyone have thoughts on what python could add to this?

Joshua Drake

21 August 2013 at 11:47

From your article it appears that “(better)” in the book equates to either “more robust” or “more extensible”.

Michael

23 August 2013 at 04:45

Python processes files line by line by default:

    # Iterating over a file object reads one line at a time.
    with open('/tmp/test.txt') as f:
        for line in f:
            print("first column", line.split()[0])

Michael

23 August 2013 at 04:48

@Joshua
Python is a very good scripting language. You can implement complicated algorithms fast. It also has reasonable performance and scores of extremely high quality libraries which I could not find in any other language. Take a look at matplotlib for example.

John

30 August 2013 at 11:42

@Michael: why do people feel the need to denigrate perl in order to say that they like python? Both are fine languages. However, for the usage under discussion (sed/awk one-liners) perl has significant and obvious advantages over python and huge advantages over sed/awk.

John

30 August 2013 at 11:49

@John: Here’s a possible explanation of why people denigrate technologies they don’t use.
