Regular expressions in Mathematica

Overview

This page is written for readers who are familiar with regular expressions but not with their use in Mathematica. Comparisons are made with Perl for those who know the language, though no knowledge of Perl is required.

Mathematica’s regular expression flavor

Mathematica uses essentially the same regular expression flavor as Perl 5. Specifically, Mathematica is compatible with PCRE. Note that the backslash is also the escape character inside Mathematica strings, so every backslash in a regular expression must be doubled. For example, the `\d` shortcut for matching any digit must be written as `\\d`.
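As a small illustration of the doubled backslash, the following sketch extracts runs of digits from a sample string. The input string and the variable name `digits` are made up for this example.

```mathematica
(* The Mathematica string "\\d+" holds the regular expression \d+ *)
digits = StringCases["Room 101, floor 3", RegularExpression["\\d+"]]
(* {"101", "3"} *)
```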

Matching

The Mathematica function `StringFreeQ` is analogous to the `m//` operator in Perl. However, the logic of `StringFreeQ` is inverted compared to `m//` because it returns whether a string is “free” of a pattern, i.e. it returns `True` if the string does not contain the pattern and `False` if it does.

The first argument to `StringFreeQ` is the string to search. The second argument can be a simple string or a regular expression.

Examples:

`StringFreeQ["Hello world", "ello"]`

returns `False` because the string “Hello world” does contain “ello”.

`StringFreeQ["Hello world", "el+o"]`

returns `True` because “Hello world” does not contain the literal string “el+o”. However

`StringFreeQ["Hello world", RegularExpression["el+o"]]`

returns `False` because “Hello world” does match the regular expression `el+o`.
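Because of the inverted logic, a Perl-style “does it match?” test is simply the negation of `StringFreeQ`. A minimal sketch, where `containsQ` is a hypothetical helper name chosen for this example and not a built-in function:

```mathematica
(* containsQ is a hypothetical helper, not part of Mathematica *)
containsQ[s_String, re_String] := !StringFreeQ[s, RegularExpression[re]]

hit = containsQ["Hello world", "el+o"]   (* True *)
miss = containsQ["Hello world", "xyz"]   (* False *)
```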

If you want to retrieve the text matched rather than simply ask whether there was a match, use `StringCases`. This function returns a list containing all matches. The list will be empty if there were no matches.
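For instance, the sketch below collects every run of lowercase letters from the sample string used earlier, then shows the empty list returned when nothing matches. The variable names are arbitrary.

```mathematica
(* Every run of lowercase letters; the capital H is skipped *)
words = StringCases["Hello world", RegularExpression["[a-z]+"]]
(* {"ello", "world"} *)

(* No digits in the string, so the result is the empty list *)
noDigits = StringCases["Hello world", RegularExpression["\\d+"]]
(* {} *)
```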

Replacing

The Mathematica function `StringReplace` is analogous to Perl’s `s///` operator. The first argument is the string to operate on. The second is a regular expression followed by `->` and a replacement string.

Example:

`StringReplace["Hello world", RegularExpression["world"] -> "planet"]`

returns “Hello planet”. Note that `StringReplace` does not modify its arguments; it returns a new string. If the replacement string needs to reference captured subexpressions, these can be accessed by `$1`, `$2`, etc. just as in Perl.

Example:

`StringReplace["Hello world", RegularExpression["(world)"] -> "planet $1"]`

returns “Hello planet world”.
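A case with two captured groups may make the numbering clearer. This sketch swaps two words; the input string and the variable name `swapped` are invented for illustration.

```mathematica
(* $1 is the first captured group, $2 the second *)
swapped = StringReplace["John Smith",
  RegularExpression["(\\w+) (\\w+)"] -> "$2, $1"]
(* "Smith, John" *)
```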

Note that `StringReplace` replaces all matches in a string by default, and so it is more precisely analogous to `s///g` than `s///`. To limit the number of replacements, add a third argument specifying the maximum number of replacements. For example, adding 1 as the last argument causes only the first instance to be replaced.

Example:

`StringReplace["Hello world", RegularExpression["o"] -> "x", 1]`

returns “Hellx world”. Without the final argument it would have returned “Hellx wxrld”.

Case-sensitivity

The `m//` and `s///` operators in Perl are case-sensitive by default. Perl has two ways of making these operators case-insensitive. One is to append an `i` modifier after the operator. The other is to add `(?i)` to the beginning of the regular expression.

Mathematica is also case-sensitive by default, and it also has two ways of changing the case-sensitivity. One is to pass the option `IgnoreCase -> True` to the string function. The other is to add `(?i)` to the regular expression as in Perl.
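The two approaches can be sketched side by side with `StringFreeQ`, reusing the sample string from the matching examples. The variable names are arbitrary, and the expected results assume the behavior described above.

```mathematica
(* Default: case-sensitive, so "HELLO" is not found *)
caseSensitive = StringFreeQ["Hello world", RegularExpression["HELLO"]]
(* True *)

(* Option form: IgnoreCase -> True applies to the whole call *)
viaOption = StringFreeQ["Hello world", RegularExpression["HELLO"],
  IgnoreCase -> True]
(* False *)

(* Inline form: (?i) at the start of the regular expression *)
viaInline = StringFreeQ["Hello world", RegularExpression["(?i)HELLO"]]
(* False *)
```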