Tips for learning regular expressions

Here are a few realizations that helped me the most when I was learning regular expressions.

1. Regular expressions aren’t trivial. If you think they’re trivial, but you can’t get them to work, then you feel stupid. They’re not trivial, but they’re not that hard either. They just take some study.

2. Regular expressions are not command line wild cards. They contain some of the same symbols but they don’t mean the same thing. They’re just similar enough to cause confusion.

3. Regular expressions are a little programming language. Regular expressions are usually contained inside another programming language, like JavaScript or PowerShell. Think of the expressions as little bits of a foreign language, like a French quotation inside English prose. Don’t expect rules from the outside language to have any relation to the rules inside, any more than you’d expect English grammar to apply inside that French quote.

4. Character classes are a little sub-language within regular expressions. Character classes are their own little world. Once you realize that and don’t expect the usual rules for regular expressions outside character classes to apply, you can see that they’re not very complicated, just different. Failure to realize that they are different is a major source of bugs.
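
To make points 2 through 4 concrete, here is a small sketch in Python. The choice of Python is an assumption for illustration (the post names JavaScript and PowerShell as hosts), and the standard `fnmatch` module stands in for command line wildcards. The file names and sample strings are made up.

```python
import fnmatch
import re

# Point 2: the same symbols mean different things in a wildcard and a regex.
# In a wildcard, "*" means "any run of characters" and "." is a literal dot.
glob_hit = fnmatch.fnmatch("notes.txt", "*.txt")

# In a regex, "." means "any one character" and "*" means "zero or more of
# the preceding item", so the wildcard's meaning is spelled ".*\.txt".
regex_hit = re.fullmatch(r".*\.txt", "notes.txt") is not None

# Copying the wildcard's dot unescaped accepts a name the wildcard rejects.
loose_hit = re.fullmatch(r".*.txt", "notes_txt") is not None
glob_miss = fnmatch.fnmatch("notes_txt", "*.txt")

# Point 3: the host language reads the string first. In an ordinary Python
# string "\b" is a backspace character; in a raw string it reaches the regex
# engine as the word-boundary escape.
host_escaped = re.search("\bcat\b", "a cat") is not None
raw_string = re.search(r"\bcat\b", "a cat") is not None

# Point 4: character classes have their own rules.
# Outside brackets "." matches any character; inside "[.]" it is only a dot.
outside = re.findall(r"a.c", "abc a.c")
inside = re.findall(r"a[.]c", "abc a.c")

# "^" anchors at the start outside a class, but negates the class when it
# comes first inside the brackets.
anchored = re.findall(r"^\d", "7 up 8")
negated = re.findall(r"[^\d ]", "7 up 8")

print(glob_hit, regex_hit, loose_hit, glob_miss)
print(host_escaped, raw_string)
print(outside, inside, anchored, negated)
```

The pairs are meant to be read side by side: each one uses the same symbol in two contexts and gets a different result.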

Once you’re ready to dive into regular expressions, read Jeffrey Friedl’s book, Mastering Regular Expressions. It’s by far the best book on the subject. Read the first few chapters carefully, but then flip the pages quickly when he goes off into NFA engines and all that.

7 comments on “Tips for learning regular expressions”
  1. Larry Singer says:

    Hi John:

    Nice blog. I need to get out my text books for some of the entries.

  2. Nick Dunn says:

    Great post. I especially like point #3; it’s an apt metaphor.

  3. James says:

    I remember reading Mastering Regular Expressions in undergrad and this may sound cliché, but it did change my perspective on computer programming and my approach to problem solving. I loved the chapters on NFA and DFA engines and eventually went on to craft my own regular expression engine in C.

  4. John says:

    I had a copy of Mastering Regular Expressions with me one time when someone said “There’s a whole book on regular expressions?! It’s just wildcards.” Regular expressions aren’t the highest achievement of computer science, but they’re not trivial either.

  5. Hello, Mister Cook!

    Many thank-you’s for your continuing effort in “regextip”, here, in medical research, and elsewhere. I appreciate your apparent integrity.
    I suspect, as one who teaches, it is your wont when presented with student questions over rudiments, as an elementary inquiry in regular expressions, you might reply– instead, with another question– a technique in educational practice which I tend to employ. As well, I might prefer a teacher should present the study in that very manner, topic depending.

    Having only followed for a few weeks, I have experienced some benefit from your mnemonics for regular expressions. As student, I present the following query on topic of your recent tips in “lookarounds”, where you stated [I paraphrase], “syntax for the ‘lookbehind’ is not unlike that of the ‘lookahead’: replace the ‘equal-sign’ of the ‘lookahead’ with a ‘greater-than’ symbol…”.

    Employing your mnemonic against an expression I’d engineered to find only /some/ semi-colons of so-called compressed-text, for the purpose of de-minifying a minified (aka. compressed) javascript.js file, I realized my syntax was incorrect, even though my text-processor did not indicate any error. (The expression, at time of this commentary, remains published, and requires my editing. I want to come up w/ the appropriate expression, however, before editing so I might offer a more illustrative text.)

    I realize my texts are lengthy. I apologize for that!

    Best wishes!
    -js / Author, NoviceNotes.Net

  6. This looks like a simple start list, but still pretty long.

    * What does /test/ match?
    * What does /Test/ match?
    * What does /Test/i match?
    * What does /test / match?
    * What does /test./ match?
    * What does /test \d/ match?
    * What does /test [a-z]/ match?
    * What does /a*/ match?
    * What does /test a*/ match?
    * What does /test \w/ match?

    First to learn would be, what does “match” mean.

    Stephan

  7. Rob Campbell says:

    My first rule of regular expressions is “know your data”. Anyone who sets out to write a regular expression without that is doomed to failure.
