Regular expressions in Mathematica

Outline:

Overview
Mathematica’s regular expression flavor
Matching
Replacing
Case-sensitivity
More about regular expressions

Overview

This page is written for someone who is familiar with regular expressions but not with how to use them in Mathematica. Comparisons are made with Perl for those who know that language, though no knowledge of Perl is required.

Mathematica’s regular expression flavor

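Mathematica uses essentially the same regular expression flavor as Perl 5. Specifically, Mathematica is compatible with PCRE. Note that a regular expression is written inside a Mathematica string, where the backslash is itself an escape character, so every backslash in the regular expression must be doubled. For example, the \d shortcut for matching any digit must be written as \\d.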
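
As a small illustration of the doubled backslash (the input string is made up for this example, and StringCases is described in the next section):

StringCases["abc123", RegularExpression["\\d+"]]

returns {"123"}, the one run of digits in the string.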

Matching

The Mathematica function StringFreeQ is analogous to the m// operator in Perl. However, the logic of StringFreeQ is inverted compared to m//: it reports whether a string is “free” of a pattern, i.e. it returns True if the string does not contain the pattern and False if it does.

The first argument to StringFreeQ is the string to search. The second argument can be a simple string or a regular expression.

Examples:

StringFreeQ["Hello world", "ello"]

returns False because the string “Hello world” does contain “ello”.

StringFreeQ["Hello world", "el+o"]

returns True because “Hello world” does not contain the literal string “el+o”. However,

StringFreeQ["Hello world", RegularExpression["el+o"]]

returns False because “Hello world” does match the regular expression el+o.
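
For a test that reads the same way as Perl’s m//, one option is to negate the result with Mathematica’s ! (Not) operator. This is only a sketch of one way to do it.

Example:

!StringFreeQ["Hello world", RegularExpression["el+o"]]

returns True because “Hello world” does match the regular expression el+o.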

If you want to retrieve the matched text rather than simply ask whether there was a match, use StringCases. This function returns a list containing all matches. The list is empty if there are no matches.
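
Examples (the patterns are chosen only for illustration):

StringCases["Hello world", RegularExpression["[a-z]+"]]

returns {"ello", "world"}, the two runs of lowercase letters.

StringCases["Hello world", RegularExpression["\\d+"]]

returns the empty list {} because “Hello world” contains no digits.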

Replacing

The Mathematica function StringReplace is analogous to Perl’s s/// operator. The first argument is the string to operate on. The second is a rule: a pattern (a plain string or a regular expression) followed by -> and a replacement string.

Example:

StringReplace["Hello world", RegularExpression["world"] -> "planet"]

returns “Hello planet”. Note that StringReplace does not modify its arguments; it returns a new string. If the replacement string needs to reference captured subexpressions, these can be accessed by $1, $2, etc. just as in Perl.

Example:

StringReplace["Hello world", RegularExpression["(world)"] -> "planet $1"]

returns “Hello planet world”.

Note that StringReplace replaces all matches in a string by default, and so it is more precisely analogous to s///g than s///. To limit the number of replacements, add a third argument specifying the maximum. For example, adding 1 as the last argument causes only the first match to be replaced.

Example:

StringReplace["Hello world", RegularExpression["o"] -> "x", 1]

returns “Hellx world”. Without the final argument it would have returned “Hellx wxrld”.

Case-sensitivity

The m// and s/// operators in Perl are case-sensitive by default. Perl has two ways of making these operators case-insensitive. One is to append the i modifier after the operator. The other is to add (?i) to the beginning of the regular expression.

Mathematica is also case-sensitive by default, and it also has two ways of changing the case-sensitivity. One is to pass the option IgnoreCase -> True to the string function. The other is to add (?i) to the regular expression as in Perl.
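
Examples, reusing the replacement from the previous section with an uppercase pattern chosen for illustration:

StringReplace["Hello world", RegularExpression["WORLD"] -> "planet", IgnoreCase -> True]

StringReplace["Hello world", RegularExpression["(?i)WORLD"] -> "planet"]

Both should return “Hello planet”. Without IgnoreCase -> True or (?i), the pattern WORLD would not match and the string would be returned unchanged.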

More about regular expressions

Notes on using regular expressions in other languages: C++, Python, R, PowerShell

Tips for getting started with regular expressions