Perl as a better …

Today I ran across Minimal Perl: For UNIX and Linux People. The book was published a few years ago but I hadn’t heard of it because I haven’t kept up with the Perl world. The following chapters from the table of contents jumped out at me because I’ve been doing a fair amount of awk and sed lately:


3. Perl as a (better) grep command
4. Perl as a (better) sed command
5. Perl as a (better) awk command
6. Perl as a (better) find command

These chapters can be read a couple of ways. The most obvious reading would be “Learn a few features of Perl and use it as a replacement for a handful of separate tools.”

But if you find these tools familiar and are not looking to replace them, you could read the book as saying “Here’s an introduction to Perl that teaches you the language by comparing it to things you already know well.”

The book suggests learning one tool instead of several, and in the bargain getting more powerful features, such as more expressive pattern matching. It also suggests not necessarily committing to learn the entire enormous Perl language, and not necessarily committing to use Perl for every programming task.

Regarding Perl’s pattern matching, I could relate to the following quip from the book.

What’s the only thing worse than not having a particular metacharacter … in a pattern-matching utility? Thinking you do, when you don’t! Unfortunately, that’s a common problem when using Unix utilities for pattern matching.

That was my experience just yesterday. I wrote a regular expression containing \d for a digit and couldn’t understand why it wasn’t matching.

Most of the examples rely on giving Perl command line options such as -e so that it acts more like a command line utility. The book gives numerous examples of carrying out common tasks with grep etc. and with Perl one-liners. The latter tend to be a little more verbose. If a task falls in the sweet spot of a common tool, that tool’s syntax will be more succinct. But when a task falls outside that sweet spot, such as matching a pattern that cannot be easily expressed with traditional regular expressions, the Perl solution will be shorter.

 

Related posts:

A little awk
Learn one sed command
Learn one Perl command

Posted in Software development
19 comments on “Perl as a better …”
  1. Charlie says:

    Hi John,

    I use R for data analysis and I’ve been thinking of learning a language for text pre-processing, especially for large data sets.

    sed and awk are appealing because they work line-by-line. Is this how Perl works?

    For my relatively modest purposes (pre-processing) is learning sed and (more likely) awk a better bet than diving into Perl?

    Thanks for your insights.

    Charlie

  2. John says:

    Perl can be used to apply the same commands line-by-line, like sed or awk, but it doesn’t have to. It’s a general programming language.

    Perl is much larger and much more powerful than sed and awk, and essentially contains these languages as subsets. So if you want to learn the more powerful language, learn Perl. But because sed and awk are much smaller, they’re easier to learn. And if you learn sed and awk first, then decide to learn Perl, you’ll recognize a lot of Perl features as coming from sed and awk.

  3. neizod says:

    Any elegant way to use perl as an in-place substitute script (like `sed -i`)? Redirect output to old file name doesn’t work here.

  4. Dave Jacoby says:

    perl -e "" is a start. There are more flags, but I don’t do this enough to memorize them.

  5. Dave Jacoby says:

    perl -e "code" is a start. There are more flags, but I don’t do this enough to memorize them.

  6. Gabor Szabo says:

    @neizod you probably want something like this:

    perl -i -s -e 's/old/new/g' file1 file2

    You can also check out the examples here: http://www.catonmat.net/blog/perl-book/

  7. Michael says:

    Charlie, I use Python for all my scripting and I think it is the best scripting language available. Perl code easily becomes an intractable mess, while Python is based on a minimal set of language features yet is the most expressive language I know.

  8. John says:

    Michael: I used to write Perl, and I now prefer Python, but I miss Perl’s pattern matching.

    I agree that it’s easy for Perl to get messy in a large script, but in a one-liner I don’t think that’s a problem.

  9. Quantum Mechanic says:

    @neizod, In place file munging:

    perl -i -e 'code goes here' file1 file2

  10. Quantum Mechanic says:

    @Charlie,

    As a self-professed Perl weenie, take the following with a pinch of salt…

    I’ve used Perl for very large file processing, line by line and other ways. While Perl is an interpreted language (it “compiles” to byte code), the IO and regex engine are very fast. Code changes can be implemented and tested quickly.

    Perl can call into C code with “XS” modules for a speedup, and inline other languages for flexibility. There’s also CPAN, and the oft-repeated phrase “Whatever you’re doing, there’s probably a module on CPAN to do it already” applies.

    I first got into Perl because I had a task a bit too complex for grep. It was an easy road to get on, and I’m still on it.

  11. Mark Galassi says:

    I still remember Larry Wall’s first posting of perl to comp.sources.unix: “a replacement for awk and sed” was in his subject line. Many of my friends took to it strongly, but I always preferred keeping my scripting simpler to what could be done with UNIX pipe sequences.

  12. neizod says:

    `perl -i -e 's/…/…/' file` doesn’t work.

    `perl -p -i -e 's/…/…/' file` does work, however.

    Thanks, @Gabor @Quantum.

  13. Charlie says:

    Thanks for the information, everyone. I considered Python, but, as I mentioned, I’d like to learn something that lets me pre-process large files that don’t easily fit into memory. Can Python work effectively line-by-line? I thought that it holds everything in memory.

    (I know that, in principle, many languages, including R, could work line-by-line by reading a file one line at a time, but I’ve found R to be super slow when used in this fashion and was thinking that Python would be the same way.)

    Charlie

  14. Hilmar says:

    I have dabbled a little in Perl but not enough to be able to read code written by others so I still fall back on shell scripting. Usually just enough to get data into R.

    I have heard many positive things about Python and I am toying with the idea of learning it. I have yet to look at the language and I am wondering if it will add anything significant to what I can already do. I learned coding with Basic in the 1980s, learned it properly with C in the 1990s, then shifted to R as my go to language since then.

    Anyone have thoughts on what python could add to this?

  15. Joshua Drake says:

    From your article it appears that “(better)” in the book equates to either “more robust” or “more extensible”.

  16. Michael says:

    Python processes files line by line by default:

    for line in open('/tmp/test.txt'):
        # Print the first whitespace-separated field of each line.
        print("first column", line.split()[0])

  17. Michael says:

    @Joshua
    Python is a very good scripting language. You can implement complicated algorithms quickly. It also has reasonable performance and scores of extremely high-quality libraries that I could not find in any other language. Take a look at matplotlib, for example.

  18. John says:

    @Michael: why do people feel the need to denigrate perl in order to say that they like python? Both are fine languages. However, for the usage under discussion (sed/awk one-liners) perl has significant and obvious advantages over python and huge advantages over sed/awk.

  19. John says:

    @John: Here’s a possible explanation on why people denigrate technologies they don’t use.