Look-behind regex

Look-behind is one of those advanced/obscure regular expression features that I don’t use frequently enough to remember the syntax, but just frequently enough that I wish I could remember it.

Look-behind can be positive or negative. Look-behind says “match this position only if the preceding text matches (does not match) the following pattern.”

The syntax in Perl and similar regular expression implementations is (?<= … ) for positive look-behind and (?<! … ) for negative look-behind. For the longest time I couldn’t remember whether the next symbol after ? was the direction (i.e. < for behind) or the polarity (= for positive, ! for negative). Unless I’d used the syntax recently, I was more likely to guess wrong than right.
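Here is a small sketch of both polarities using Python's re module, which accepts the same (?<= … ) and (?<! … ) syntax as Perl. The sample string is made up for illustration. Note that the look-behind matches a position, so the "$" itself is not part of the match.

```python
import re

text = "cost $42, tax 7"

# Positive look-behind: digits only where the preceding character is "$".
positive = re.findall(r"(?<=\$)\d+", text)

# Negative look-behind: digits not preceded by "$" or by another digit
# (the digit check stops a match from starting partway through "42").
negative = re.findall(r"(?<![$\d])\d+", text)

print(positive)  # ['42']
print(negative)  # ['7']
```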

The reason I tended to get these wrong is that I thought “positive look-behind” and “negative look-behind.” That’s how these patterns are usually described. But then the words and the symbols come in different orders. If you think look-behind positive and look-behind negative then the words and the symbols come in the same order:

look (?
behind <
positive =
negative !
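The mnemonic can be spelled out as code. The helper `lookaround` below is hypothetical, written only for illustration: it glues the pieces together in the order look, direction, polarity, and the resulting look-behind is then used as a real pattern with Python's re module.

```python
import re

def lookaround(body: str, behind: bool, positive: bool) -> str:
    """Build a look-around group in mnemonic order: look, direction, polarity."""
    group = "(?"                       # look
    if behind:
        group += "<"                   # behind; look-ahead has no direction symbol
    group += "=" if positive else "!"  # positive or negative
    return group + body + ")"

for behind in (True, False):
    for positive in (True, False):
        print(lookaround("x", behind, positive))
# (?<=x)
# (?<!x)
# (?=x)
# (?!x)

# The assembled look-behind works as an ordinary pattern.
print(re.findall(lookaround("x", True, True) + r"\d", "x1 y2 x3"))  # ['1', '3']
```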

Maybe this syntax comes more naturally to people who speak French and other languages where adjectives follow the thing they describe. English word order was tripping me up.

By the way, the syntax for look-ahead patterns is simpler: just leave out the <. The default direction for look-around patterns is forward. You don’t have to remember whether the symbol for direction or polarity comes first because there is no symbol for direction.
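For comparison, here is the look-ahead counterpart in Python's re module, with the < simply dropped. As before, the input string is invented for the example, and the "%" is checked but not consumed.

```python
import re

text = "5% of 20"

# Positive look-ahead: digits followed by "%".
ahead_pos = re.findall(r"\d+(?=%)", text)

# Negative look-ahead: digits not followed by "%" or by another digit
# (the digit check stops "5" from matching on its own in a string like "50%").
ahead_neg = re.findall(r"\d+(?![%\d])", text)

print(ahead_pos)  # ['5']
print(ahead_neg)  # ['20']
```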

6 thoughts on “Look-behind regex”

  1. It sounds like a good way to remember the order of the look-behind operator is that this order doesn’t make (normal) look-ahead syntax ambiguous.
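The ambiguity this comment points to can be seen in Python's re module: (?=<) is already a valid pattern, a positive look-ahead for a literal "<". If look-behind were spelled polarity first, it would presumably start with (?=< and collide with that. Whether this was actually the designers' reason is an assumption, as the reply below also suggests.

```python
import re

# "(?=<)" is already legal: a positive look-ahead whose body is a literal "<".
ahead = re.compile(r"a(?=<)")    # "a", but only if the next character is "<"

# "(?<=a)" puts the direction symbol first, so it cannot be confused with that.
behind = re.compile(r"(?<=a)<")  # "<", but only if the previous character is "a"

print(ahead.search("a<b").group())   # a
print(behind.search("a<b").group())  # <
print(ahead.search("ab"))            # None
```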

  2. Michael: Good point. That’s probably why it was designed the way it was.

  3. Interesting — I thought the order of symbols was perfectly natural in English, because I think of it as a command:
    ( ? < {=,!}
    Match if previous {IS,ISN'T} ___

  4. From Larry Wall (he’s talking about Perl 6, actually):

    “Interestingly, there were no withdrawn RFCs for pattern matching. That means either that there were no cork-brained ideas proposed, or that regex culture is so cork-brained already that the cork-brained ideas blend right in. I know where my money is… :-)

    In fact, regular expression culture is a mess, and I share some of the blame for making it that way.”

  5. The rule stating that in French, adjectives follow the noun they describe is a myth.
    I grant you that the white horse becomes le cheval blanc. But the small horse becomes le petit cheval.
    Word order in French is a long story, a complicated story. C’est une longue histoire. Une histoire compliquée. Indeed.
