Tips for learning regular expressions

Here are a few realizations that helped me the most when I was learning regular expressions.

1. Regular expressions aren’t trivial. If you think they’re trivial, but you can’t get them to work, then you feel stupid. They’re not trivial, but they’re not that hard either. They just take some study.

2. Regular expressions are not command line wild cards. They contain some of the same symbols but they don’t mean the same thing. They’re just similar enough to cause confusion.

3. Regular expressions are a little programming language. Regular expressions are usually contained inside another programming language, like JavaScript or PowerShell. Think of the expressions as little bits of a foreign language, like a French quotation inside English prose. Don’t expect rules from the outside language to have any relation to the rules inside, any more than you’d expect English grammar to apply inside that French quote.

4. Character classes are a little sub-language within regular expressions. Character classes are their own little world. Once you realize that and don’t expect the usual rules for regular expressions outside character classes to apply, you can see that they’re not very complicated, just different. Failure to realize that they are different is a major source of bugs.
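
To make points 2 and 4 concrete, here is a minimal sketch in Python. Python is just a convenient choice for the illustration, and the file names and sample strings are invented. The first half compares a shell-style wildcard with regular expressions that look similar; the second half shows a few of the rules that change inside a character class.

```python
import fnmatch
import re

# Hypothetical file names, invented for this example.
names = ["a.txt", "notes.txt", "atxt", "a.txt.bak"]

# As a shell-style wildcard, "*" means "any run of characters" and "." is literal.
glob_hits = [n for n in names if fnmatch.fnmatchcase(n, "*.txt")]
print(glob_hits)       # ['a.txt', 'notes.txt']

# A regex that does the same job: "any run" is spelled ".*" and the dot is escaped.
regex_hits = [n for n in names if re.fullmatch(r".*\.txt", n)]
print(regex_hits)      # ['a.txt', 'notes.txt']

# Similar symbols, different meaning: in a regex "a*" is "zero or more a's"
# and an unescaped "." is "any one character".
lookalike_hits = [n for n in names if re.fullmatch(r"a*.txt", n)]
print(lookalike_hits)  # ['a.txt', 'atxt']

# Inside [...] the rules change: ".", "*" and "+" are ordinary characters.
punct = re.findall(r"[.*+]", "a.b*c+d")
print(punct)           # ['.', '*', '+']

# "^" negates the class only when it comes first ...
non_digits = re.findall(r"[^0-9]", "a1b2")
print(non_digits)      # ['a', 'b']

# ... and is an ordinary character anywhere else in the class.
caret_or_digit = re.findall(r"[0-9^]", "2^10")
print(caret_or_digit)  # ['2', '^', '1', '0']

# "-" makes a range only between two characters; at the end it is literal.
dash_literal = re.findall(r"[a-]", "a-b")
print(dash_literal)    # ['a', '-']
```

Other regex flavors differ in details, so treat this as an illustration of the idea rather than a reference for any particular engine.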

Once you’re ready to dive into regular expressions, read Jeffrey Friedl’s book Mastering Regular Expressions (ISBN 0596528124). It’s by far the best book on the subject. Read the first few chapters carefully, but then flip the pages quickly when he goes off into NFA engines and all that.

***

For daily tips on regular expressions, follow @RegexTip on Twitter.

8 thoughts on “Tips for learning regular expressions”

  1. I remember reading Mastering Regular Expressions in undergrad and this may sound cliché, but it did change my perspective on computer programming and my approach to problem solving. I loved the chapters on NFA and DFA engines and eventually went on to craft my own regular expression engine in C.

  2. I had a copy of Mastering Regular Expressions with me one time when someone said “There’s a whole book on regular expressions?! It’s just wildcards.” Regular expressions aren’t the highest achievement of computer science, but they’re not trivial either.

  3. Hello, Mister Cook!

    Many thank-you’s for your continuing effort in “regextip”, here, in medical research, and elsewhere. I appreciate your apparent integrity.
    I suspect, as one who teaches, it is your wont when presented with student questions over rudiments, as an elementary inquiry in regular expressions, you might reply– instead, with another question– a technique in educational practice which I tend to employ. As well, I might prefer a teacher should present the study in that very manner, topic depending. Having only followed for a few weeks, I have experienced some benefit from your mnemonics for regular expressions. As student, I present the following query on topic of your recent tips in “lookarounds”, where you stated [I paraphrase], “syntax for the ‘lookbehind’ is not unlike that of the ‘lookahead’: replace the ‘equal-sign’ of the ‘lookahead’ with a ‘greater-than’ symbol…”. Employing your mnemonic against an expression I’d engineered to find only /some/ semi-colons of so-called compressed-text, for the purpose of de-minifying a minified (aka. compressed) javascript.js file, I realized my syntax was incorrect, even though my text-processor did not indicate any error. (The expression, at time of this commentary, remains published, and requires my editing. I want to come up w/ the appropriate expression, however, before editing so I might offer a more illustrative text.)
    I realize my texts are lengthy. I apologize for that!

    Best wishes!
    -js / Author, NoviceNotes.Net

  4. This looks like a simple start list, but still pretty long.

    * What does /test/ match
    * What does /Test/ match
    * What does /Test/i match
    * What does /test / match
    * What does /test./ match
    * What does /test \d/ match
    * What does /test [a-z]/ match
    * What does /a*/ match
    * What does /test a*/ match
    * What does /test \w/ match

    First to learn would be, what does “match” mean.

    Stephan

  5. Hello John,

    Firstly, many thanks for sharing your knowledge in small digestible chunks on Twitter and even on your website. With that said, I have a question for you, if you have the time then I would appreciate a response.

    In Bash regular expressions, when dealing with digits and letters, programmers can simply use the alnum class instead of enclosing the ranges within brackets, as we have to do in JavaScript. Since I have not explored the deeper, darker corners of JavaScript’s regular expression syntax, I was wondering whether JavaScript allows the use of such character classes within its regular expressions. Although I have not used Bash scripting extensively, in my experience using named character classes instead of spelling out ranges makes writing regular expressions slightly more straightforward.

    As I said in the opening paragraph, I would certainly appreciate your response.

    Irfan.
