Here are two examples that persuaded me long ago that regular expressions could be powerful. Both come from The Unix Programming Environment by Kernighan and Pike (1984).
The first problem is to produce a list of all English words that contain all five vowels exactly once and in alphabetical order.
The book creates a regular expression, stored in a file named alphavowels,
^[^aeiou]*a[^aeiou]*e[^aeiou]*i[^aeiou]*o[^aeiou]*u[^aeiou]*$
then uses it to filter a dictionary file
egrep -f alphavowels /usr/dict/web2
This produced 16 words ranging from abstemious to majestious.
The second problem is to produce a list of all English words of at least six letters with letters appearing in increasing alphabetical order.
The book creates a regular expression, stored in a file named monotonic,
^a?b?c?d?e?f?g?h?i?j?k?l?m?n?o?p?q?r?s?t?u?v?w?x?y?z?$
then uses it to filter a dictionary file as before, except there is an additional filter stage.
egrep -f monotonic /usr/dict/web2 | grep '......'
This produced 17 words including common words such as almost and ghosty. Some of the more interesting results were bijoux, chintz, and egilops. Kernighan and Pike explain that egilops is a disease that attacks wheat.
The regular expressions above are fairly long, but shorter and more transparent than a procedural program to solve the same problem. The solutions may look mysterious at first sight, but they are entirely straightforward once you know the most basic features of regular expressions.
In the first problem, the pattern [^aeiou] says to look for anything that isn’t a vowel, i.e. is a consonant (assuming entries in the dictionary file contain only lowercase letters). So the regular expression says to start at the beginning of each line and look for zero or more consonants, followed by an ‘a’, followed by zero or more consonants, followed by an ‘e’, and so on down to a ‘u’ optionally followed by consonants at the end of the line.
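Here is a minimal sketch of the first pattern in action. It assumes a POSIX shell and uses grep -E, the modern spelling of egrep. The word list is a made-up stand-in for the dictionary file, not data from the book.

```shell
# Hypothetical sample words standing in for the dictionary file.
words='abstemious
facetious
education
sequoia
rhythm'

# Consonants may appear anywhere, but each vowel must appear
# exactly once, in the order a, e, i, o, u.
pattern='^[^aeiou]*a[^aeiou]*e[^aeiou]*i[^aeiou]*o[^aeiou]*u[^aeiou]*$'

# Keep only the lines that match the whole pattern.
matches=$(printf '%s\n' "$words" | grep -E "$pattern")
printf '%s\n' "$matches"
```

Only abstemious and facetious survive. The word education contains all five vowels but in the wrong order, and rhythm has no vowels at all.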
In the second problem, the question mark matches zero or one instances of the preceding character, i.e. the character is optional. The regular expression says to start at the beginning of each line, look for an optional ‘a’, followed by an optional ‘b’, and so forth through an optional ‘z’ at the end of the line. Because each letter can appear at most once, the letters in any match are in strictly increasing order. Then the output is filtered by another regular expression, ...... (six periods). Since a period matches any character and the pattern is not anchored to either end of the line, a sequence of six periods says to select only words that contain at least six characters.
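The second pipeline can be sketched the same way, again assuming a POSIX shell and grep -E in place of egrep. The word list is hypothetical, and building the pattern with printf is my own shortcut rather than anything from the book.

```shell
# Build ^a?b?c?...z?$ from the alphabet instead of typing it by hand.
# printf reuses the format '%s?' for each argument in turn.
pattern="^$(printf '%s?' a b c d e f g h i j k l m n o p q r s t u v w x y z)\$"

# Hypothetical sample words standing in for the dictionary file.
words='almost
chintz
billowy
banana
ace'

# First stage: letters in strictly increasing order.
# Second stage: at least six characters.
matches=$(printf '%s\n' "$words" | grep -E "$pattern" | grep '......')
printf '%s\n' "$matches"
```

Here almost and chintz pass both stages. The word ace passes the first stage but is too short for the second, and billowy fails the first stage because the repeated ‘l’ cannot match a pattern that allows each letter only once.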