Perl as a better …

Today I ran across Minimal Perl: For UNIX and Linux People. The book was published a few years ago but I hadn’t heard of it because I haven’t kept up with the Perl world. The following chapters from the table of contents jumped out at me because I’ve been doing a fair amount of awk and sed lately.:


3. Perl as a (better) grep command
4. Perl as a (better) sed command
5. Perl as a (better) awk command
6. Perl as a (better) find command

These chapters can be read a couple ways. The most obvious reading would be “Learn a few features of Perl and use it as a replacement for a handful of separate tools.”

But if you find these tools familiar and are not looking to replace them, you could read the book as saying “Here’s an introduction to Perl that teaches you the language by comparing it to things you already know well.”

The book suggests learning one tool instead of several, and in the bargain getting more powerful features, such as more expressive pattern matching. It also suggests not necessarily committing to learn the entire enormous Perl language, and not necessarily committing to use Perl for every programming task.

Regarding Perl’s pattern matching, I could relate to the following quip from the book.

What the only thing worse than not having a particular metacharacter … in a pattern-matching utility? Thinking you do, when you don’t! Unfortunately, that’s a common problem when using Unix utilities for pattern matching.

That was my experience just yesterday. I wrote a regular expression containing \d for a digit and couldn’t understand why it wasn’t matching.

Most of the examples rely on giving Perl command line options such as -e so that it acts more like command line utility. The book gives numerous examples carrying out common tasks in grep etc. and with Perl one-liners. The latter tend to be a little more verbose. If a task falls in the sweet spot of a common tool, that tool’s syntax will be more succinct. But when a task falls outside that sweet spot, such as matching a pattern that cannot be easily expressed with traditional regular expressions, the Perl solution will be shorter.

 

Related posts:

Extreme syntax

In his book Let Over Lambda, Doug Hoyte says

Lisp is the result of taking syntax away, Perl is the result of taking syntax all the way.

Lisp practically has no syntax. It simply has parenthesized expressions. This makes it very easy to start using the language. And above all, it makes it easy to treat code as data. Lisp macros are very powerful, and these macros are made possible by the fact that the language is simple to parse.

Perl has complex syntax. Some people say it looks like line noise because its liberal use of non-alphanumeric characters as operators. Perl is not easy to parse — there’s a saying that only Perl can parse Perl — nor is it easy to start using. But the language was designed for regular users, not beginners, because you spend more time using a language than learning it.

There are reasons I no longer use Perl, but I don’t object to the rich syntax. Saying Perl is hard to use because of its symbols is like saying Greek is hard to learn because it has a different alphabet. It takes years to master Greek, but you can learn the alphabet in a day. The alphabet is not the hard part.

Symbols can make text more expressive. If you’ve ever tried to read mathematics from the 18th or 19th century, you’ll see what I mean. Before the 20th century, math publications were very verbose. It might take a paragraph to say what would now be said in a single equation. In part this is because notation has developed and standardized over time. Also, it is now much easier to typeset the symbols someone would use in handwriting. Perl’s repertoire of symbols is parsimonious compared to mathematics.

I imagine that programming languages will gradually expand their range of symbols.

People joke about how unreadable Perl code is, but I think a page of well-written Perl is easier to read than a page of well-written Lisp.  At least the Perl is easier to scan: Lisp’s typographical monotony makes it hard to skim for landmarks. One might argue that a page of Lisp can accomplish more than a page of Perl, and that may be true, but that’s another topic.

***

Any discussion of symbols and programming languages must mention APL. This language introduced a large number of new symbols and never gained wide acceptance. I don’t know that much about APL, but I’ll give my impression of why I don’t think APL’s failure is not proof that programmers won’t use more symbols.

APL required a special keyboard to input. That would no longer be necessary. APL also introduced a new programming model; the language would have been hard to adopt even without the special symbols. Finally, APL’s symbols were completely unfamiliar and introduced all at once, unlike math notation that developed world-wide over centuries.

***

What if programming notation were more like music notation? Music notation is predominately non-verbal, but people learn to read it fluently with a little training. And it expresses concurrency very easily. Or maybe programs could look more like choral music, a mixture of symbols and prose.

Learn one Perl command

A while back I wrote a post Learn one sed command. In a nutshell, I said it’s worth learning sed just do commands of the form sed s/foo/bar/ to replace “foo” with “bar.”

Dan Haskin and Will Fitzgerald suggested in their comments that instead of sed use perl -pe with the same command. The advantage is that you could use Perl’s more powerful regular expression syntax. Will said he uses Perl like this:

cat file | perl -pe "s/old/new/g" > newfile

I think they’re right. Except for the simplest regular expressions, sed’s regular expression syntax is too restrictive. For example, I recently needed to remove commas that immediately follow a digit and this did the trick:

cat file | perl -pe "s/(?<=d),//g" > newfile

Since sed does not have the look-behind feature or d for digits, the corresponding sed code would be more complicated.

I quit writing Perl years ago. I don’t miss Perl as a whole, but I do miss Perl’s regular expression support.

Learning Perl is a big commitment, but just learning Perl regular expressions is not. Perl is the leader in regular expression support, and many programming languages implement a subset of Perl’s regex features. You could just use a subset of Perl features you already know, but you’d have the option of using more features.

Related post: Perl 6 as the anti-JavaScript

The anti-JavaScript

The problems with JavaScript come from premature standardization. The language’s author Brendan Eich said

I had to be done in ten days or something worse than JS would have happened.

For a programming language designed in 10 days, he did an amazing job. Maybe he did too good a job: his first draft was good enough to use, and so he never got a chance to fix the language’s flaws.

The opposite of JavaScript may be Perl 6. The language has been in the works for 12 years and is still in development, though there are compilers you can use today. An awful lot of thought has gone into the language’s design. Importantly, some early design decisions were overturned after the community had time to think, a luxury JavaScript never had.

Perl 6 has gotten a lot of ridicule for being so slow to come out, but it may have the last laugh. Someone learning Perl 6 in the future will not care how long the language was in development, but they will appreciate that the language was very thoughtfully designed.

***

Another contrast between JavaScript and Perl 6 is their names. Netscape gave JavaScript a deliberately misleading name to imply a connection to the Java language. The Perl 6 name honestly positions the new language as a successor to Perl 5.

Perl 6 really is a new language, compatible in spirit with earlier versions of Perl though not always in syntax. Damian Conway has suggested that perhaps Perl 6 should have been developed under a completely different name. Then after it was completed, the developers could announce, “Oh, and by the way, this language is the upgrade path for Perl.”

If you think of Perl 6 as a new language, your expectations are quite different than if you think of it as an upgrade. If it’s a new language, it doesn’t matter so much how long it was in development. Perl programmers would be pleased with how similar the new language is to their familiar one, rather than upset about the differences. And people would evaluate the new language on its merits rather than being prejudiced by previous experience with Perl.

Related posts:

Perl One-Liners Explained

Peteris Krumins has a new book, Perl One-Liners Explained. His new book is in the same style as his previous books on awk and sed, reviewed here and here.

All the books in this series are organized by task. For each task, there is a one-line solution followed by detailed commentary. The explanations frequently offer alternate solutions with varying degrees of concision and clarity. Sections are seldom more than one page long, so the books are easy to read a little at a time.

Programmers who have written a lot of Perl may still learn a few things from Krumins. In particular, those who have primarily written Perl in script files may not be familiar with some of the tricks for writing succinct Perl on the command line.

Other Perl posts:

Conservation of complexity

Larry Wall said something one time to the effect that Scheme is beautiful and every Scheme program is ugly; Perl is ugly, but it lets you write beautiful programs. Of course it also lets you write ugly programs if you choose.

Scheme is an elegant, minimalist language. The syntax of the language is extremely simple; you could say it has no syntax. But this simplicity comes at a price. But because the language does so little for you, you have to write the code that might have been included in other languages. And because the language has no syntax, code written in Scheme is hard to read. As Larry Wall said

The reason many people hate programming in Lisp [the parent language of Scheme] is because every thing looks the same. I’ve said it before, and I’ll say it again: Lisp has all the visual appeal of oatmeal with fingernail clippings mixed in.

The complexity left out of Scheme is transferred to the code you write in Scheme. If you’re writing small programs, that’s fine. But if you write large programs in Scheme, you’ll either write a lot of code yourself or you’ll leverage a lot of code someone else has written in libraries.

Perl is the opposite of a minimalist language. There are shortcuts for everything. And if you master the language, you can write programs that are beautiful in that they are very concise. Perl programs can even be easy to read. Yes, Perl programs look like line noise to the uninitiated, but once you’ve learned Perl, the syntax can be helpful if used well. (I have my complaints about Perl, but I got over the syntax.)

Perl is a complicated language, but it works very well for some problems. Features that other languages would put in libraries (e.g. regular expressions, text munging) are baked directly into the Perl language. And if you depend on those features, it’s very handy to have direct support in the language.

The point of my discussion of Scheme and Perl is that the complexity has to go somewhere, either in the language, in libraries, or in application code. That doesn’t mean all languages are equal for all tasks. Some languages put the complexity where you don’t have to think about it. For example, Java simpler than C++, as long as you don’t have to understand the inner workings of the JVM. But if you do need to look inside the JVM, suddenly Java is more complex than C++. The total complexity hasn’t changed, but your subjective experience of the complexity increased.

Earlier this week I wrote a post about C and C++. My point there was similar. C is simpler than C++, but software written in C is often more complicated that software written in C++ when you compare code written by developers of similar talent. If you need the functionality of C++, and most large programs will, then you will have to write it yourself if you’re using C. And if you’re a superstar developer, that’s fine. If you’re less than a superstar, the people who inherit your code may wish that you had used a language that had this functionality built-in.

I understand the attraction to small programming languages. The ideal programming language has everything you need and nothing more. But that means the ideal language is a moving target, changing as your work changes. As your work becomes more complicated, you might be better off moving to a more complex language, pushing more of the complexity out of your application code and into the language and its environment. Or you may be able down-size your language because you no longer need the functionality of a more complex language.

Related posts:

All languages equally complex

This post compares complexity in spoken languages and programming languages.

There is a theory in linguistics that all human languages are equally complex. Languages may distribute their complexity in different ways, but the total complexity is roughly the same across all spoken languages. One language may be simpler in some aspect than another but more complicated in some other respect. For example, Chinese has simple grammar but a complex tonal system.

Even if all languages are equally complex, that doesn’t mean all languages are equally difficult to learn. An English speaker might find French easier to learn than Russian, not because French is simpler than Russian in some objective sense, but because French is more similar to English.

All spoken languages are supposed to be equally complex because languages reach an equilibrium between at least two forces. Skilled adult speakers tend to complicate languages by looking for ways to be more expressive. But children must be able to learn their language relatively quickly, and less skilled speakers need to be able to use the language as well.

I wonder what this says about programming languages. There are analogous dynamics. Programming languages can be relatively simpler in some way while being relatively complex in another way. And programming languages become more complex over time due to the demands of skilled users.

But there are several important differences. Programming languages are part of a complex system of language, standard libraries, idioms, tools, etc. It may make more sense to speak of a programming “system” to make better comparisons, taking into account the language and its environment.

I do not think that all programming systems are equally complex. Some are better designed than others. Some are more appropriate for a given task than others. Some programming systems achieve simplicity by sacrificing efficiency. Some abstractions leak less than others.

On the other hand, I imagine the levels of complexity are more similar when comparing programming systems rather than just comparing programming languages.  Larry Wall said something to the effect that Perl is ugly so you can write beautiful programs in it. I think there’s some truth to that. A language can always be small and elegant by simply not providing much functionality, forcing the user to implement that functionality in application code.

See Larry Wall’s article Natural Language Principles in Perl for more comparisons of spoken languages and programming languages.

Related posts:

Plain Python

Perl is cool, much more so than Python. But I prefer writing Python.

Perl is fun to read about. It has an endless stream of features to discover. Python by comparison is kinda dull. But the aspects of a language that make it fun to read about do not necessarily make it pleasant to use.

I wrote Perl off and on for several years before trying Python. People would tell me I should try Python and every six months or so I’d skim through a Python book. My impression was that Python was prosaic. It didn’t seem to offer any advantage over Perl, so I stuck with Perl. (Not that I was ever very good at Perl.)

Then I read an article by Bruce Eckel saying that he liked Python because he could remember the syntax. He said that despite teaching Java and writing books on Java, he could never remember the syntax for opening a file in Java, for example. But he could remember the corresponding Python syntax. I would never have picked up on that by skimming books. You’ve got to actually use a language a while to know how memorable the syntax is. But  I had used Perl enough to know that I could not remember its syntax without frequent use. Memorable syntax increases productivity. You don’t have to break your train of thought as often to reach for a reference book.

I stand by my initial impression that Python is plain, but I now think that’s a strength. It just gets out of my way and lets me get my work done. I’m sure  Perl gurus can be extremely productive in Perl. I tried being a Perl guru, and I never made it. I wouldn’t say I’m a Python guru, but I also don’t feel the need to be a guru to use Python.

Python code is not cool in a line-by-line sense, not in the way that an awesomely powerful Perl one-liner is cool. Python is cool in more subtle ways.

Related posts:

API symmetry

Symmetric APIs are easier to use. I was reminded of this when doing some regular expression programming in Python and comparing it to Perl. Perl’s regular expression operators for search and replace are symmetric in a way that their Python counterparts are not.

Perl uses m/pattern/ for matching and s/pattern/replacement/ for substitution. Both apply to the first instance of a pattern in a string by default. The g option following a match or substitute operator causes the command to apply to all instances of the pattern. The i option after either a match or substitute command causes the pattern to apply in a case-insensitive manner. Matching and substitution are symmetric.

Python uses re.search() for matching and re.sub() for substitution. The search function can only apply to the first instance of a pattern; to match all instances of a pattern, use re.findall(). The function re.sub() applies to all instances by default, but it has a max parameter that can be set to limit the number of instances it applies to. To make a search pattern case-insensitive, pass in re.IGNORECASE flag. To make a substitution case-insensitive, modify the regular expression itself by adding (?i).

In general, I find Python syntax much cleaner than Perl, but regular expressions are implemented more elegantly in Perl.

Languages that are easy to pick back up

Some programming languages are much easier to come back to than others. In my previous post I mentioned that Mathematica is easy to come back to, put Perl is not.

I found it easy to come back LaTeX after not using it for a while. It has a few quirks, but it’s basically consistent. The LaTeX commands for Greek letters are their names, lower case names for lower case letters, upper case names for upper case letters. The command for a mathematical symbol is usually the name a mathematician would give the symbol. Modes always begin with begin and end with end.

Python also has a consistent syntax that make it easier to come back to the language after a break. Someone has said that Python is similar to Perl, except that the word “except” does not appear nearly so often in the Python documentation.

It’s more important that a language be internally consistent than conventional. Each of the languages I mentioned have their peculiarities. Mathematica uses square brackets for function argument arguments. LaTeX uses percent signs for comments. Python uses indention to denote blocks. Each of these take a little getting used to, but each makes sense in its own context.

A special case of consistency is using full names for keywords. Mathematica always spells out words in full. For example, the gamma distribution object is named GammaDistribution. I don’t mind a little extra typing. I’d rather optimize for recall and readability than minimize keystrokes since I spend more time recalling and reading than typing. (One flaw in LaTeX is that it occasionally uses unnecessary abbreviations. For example, \infty for infinity. The corresponding Mathematica keyword is Infinity.)

Programming language subsets

I just found out that Douglas Crockford has written a book JavaScript: The Good Parts. I haven’t read the book, but I imagine it’s quite good based on having seen the author’s JavaScript videos.

Crockford says JavaScript is an elegant and powerful language at its core, but it suffers from numerous egregious flaws that have been impossible to correct due to its rapid adoption and standardization.

I like the idea of carving out a subset of a language, the good parts, but several difficulties come to mind.

  1. Although you may limit yourself to a certain language subset, your colleagues may choose a different subset. This is particularly a problem with an enormous language such as Perl. Coworkers may carve out nearly disjoint subsets for their own use.
  2. Features outside your intended subset may be just a typo away. You have to have at least some familiarity with the whole language because every feature is a feature you might accidentally use.
  3. The parts of the language you don’t want to use still take up space in your reference material and make it harder to find what you’re looking for.

One of the design principles of C++ is “you only pay for what you use.” I believe the primary intention was that you shouldn’t pay a performance penalty for language features you don’t use, and C++ delivers on that promise. But there’s a mental price to pay for language features you don’t use. As I’d commented about Perl before, you have to use the language several hours a week just to keep it loaded in your memory.

There’s an old saying that when you marry a girl you marry her family. A similar principle applies to programming languages. You may have a subset you love, but you’re going to have to live with the rest of the language.

Periodic table of Perl operators

Mark Lentczner has posted a periodic table of Perl operators. The table shows Perl 6 in all its Byzantine glory. If you work in the language constantly and enjoy the terse syntax optimized for experts, you’ll love Perl 6. But if you’re already having difficulty holding Perl in your head, this periodic table might be the straw that breaks the camel’s back.

 Periodic table of Perl 6 operators

Three-hour-a-week language

I listened to an interview last night with Perl guru Randal Schwartz. He said that Perl is meant for people who use the language at least two or three hours per week. If you’re not going to use it that often, you’re probably better off using something else. But if you use it full time, you can see huge productivity increases.

That matches my experience, at least the first part. I was attracted to Perl because I could imagine being very productive with it. At first I used the language infrequently. Every time I sat down to write Perl I had to work hard to reload the language into my brain. Then for a while I used it frequently enough to achieve some fluency. Then as I wrote Perl less often I could almost feel the language slipping away.

Of course you have to use any language, human or computer, to achieve and maintain fluency. But my sense is that Perl requires more frequent use than other programming languages in order to remain minimally competent, and it repays frequent use more than other languages. I imagine this is a consequence of the natural language principles baked into the language.

One of the things I prefer about Python is that you do not have to use it three hours a week to remain competent.

I no longer write Perl, but I still enjoy listening to thought-provoking Perl speakers like Randal Schwartz.

Related post: Plain Python