Python supports essentially the same regular expression syntax as Perl, as far as the regular expressions themselves are concerned. However, the syntax for using regular expressions is substantially different.
Regular expression support is not available out of the box; you must import the re module.
Regular expression patterns are contained in strings, in contrast to Perl’s built-in // syntax. This means that some characters, backslashes in particular, need to be escaped in order to be passed on to the regular expression engine. To be safe, always use raw strings (r"") to contain patterns.
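As a small illustration of why raw strings matter, the sketch below compares the same pattern written as an ordinary string and as a raw string. The sample text and variable names are made up for this example.

```python
import re

# In an ordinary string, "\b" is a backspace character (ASCII 8),
# so the regex engine never sees a word-boundary assertion.
plain = "\bcat\b"

# In a raw string, the backslashes reach the regex engine untouched.
raw = r"\bcat\b"

text = "the cat sat"  # sample input, chosen for illustration

plain_hit = re.search(plain, text)  # None: the pattern looks for backspace characters
raw_hit = re.search(raw, text)      # match object for the word "cat"

print(plain_hit)
print(raw_hit.group())
```

The two patterns look identical in source code but have different lengths once Python has processed the escapes, which is why the first search fails.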
You might think that re.match() is the analog to Perl’s m// match operator. It’s not! The re.match() function matches regular expressions only at the beginning of a string. It behaves as if every pattern has ^ prepended. The function re.search() behaves like Perl’s m// and is probably what you want to use exclusively.
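A short sketch of the difference, using a made-up sample string: the same pattern fails with re.match() because it does not occur at position 0, but succeeds with re.search().

```python
import re

text = "say hello"  # sample input, chosen for illustration

at_start = re.match(r"hello", text)    # None: "hello" is not at the start of the string
anywhere = re.search(r"hello", text)   # match object: "hello" is found later in the string
anchored = re.search(r"^hello", text)  # None: an explicit ^ makes search behave like match here

print(at_start, anchored)
print(anywhere.group(), anywhere.start())
```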
Both re.match() and re.search() return None if no match is found and a match object otherwise. You can retrieve captured matches via the group method on the match object. The group method without any argument returns the entire match. The group method with a positive integer argument returns captured expressions: group(1) returns the first capture, analogous to $1 in Perl, group(2) returns the second, analogous to $2, and so on.
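The following sketch shows the usual pattern of checking for None before reading captures. The pattern and input text are invented for illustration.

```python
import re

# Two capture groups: the numbers on either side of a hyphen.
m = re.search(r"(\d+)-(\d+)", "pages 12-34")

if m is not None:
    whole = m.group()    # the entire match
    first = m.group(1)   # first capture, like $1 in Perl
    second = m.group(2)  # second capture, like $2 in Perl
    print(whole, first, second)

# A failed search returns None, so test the result before calling group().
missing = re.search(r"(\d+)-(\d+)", "no numbers here")
print(missing)
```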
Python doesn’t have a global modifier like Perl’s /g option. To find all matches to a pattern, use re.findall() rather than re.search(). The findall method returns a list of matches rather than a match object. If the pattern contains a single captured subexpression, findall returns a list of the captured strings; if it contains more than one, findall returns a list of tuples, the tuples being the captures.
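To make the return shapes concrete, here is a minimal sketch with zero, one, and two capture groups applied to the same made-up input.

```python
import re

text = "a=1, b=22, c=333"  # sample input, chosen for illustration

# No capture groups: a list of the full matches.
no_groups = re.findall(r"\d+", text)

# One capture group: a list of the captured strings.
one_group = re.findall(r"(\w)=\d+", text)

# Two capture groups: a list of tuples, one tuple per match.
two_groups = re.findall(r"(\w)=(\d+)", text)

print(no_groups)
print(one_group)
print(two_groups)
```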
To substitute for a pattern, analogous to Perl’s s// operator, use re.sub(). Strictly speaking, re.sub() is analogous to s//g since it replaces all instances of a pattern by default. To change this behavior, you can specify the maximum number of instances to replace using the count parameter to re.sub(). Setting this parameter to 1 causes only the first instance to be substituted, as in Perl’s s// without the /g modifier.
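A brief sketch of both behaviors, with an invented sample string. The count argument is passed by keyword to keep its meaning obvious.

```python
import re

text = "one fish two fish"  # sample input, chosen for illustration

# Default: every occurrence is replaced, like Perl's s//g.
all_replaced = re.sub(r"fish", "cat", text)

# count=1: only the first occurrence is replaced, like Perl's plain s//.
first_only = re.sub(r"fish", "cat", text, count=1)

print(all_replaced)
print(first_only)
```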
To make a regular expression case-insensitive, pass the argument re.I (or re.IGNORECASE) as the final argument to re.search(). The re.sub() function accepts the same flags, but pass them by keyword (flags=re.I), because its fourth positional argument is count, not flags. Alternatively, you can modify the regular expression itself by adding (?i) to the beginning of the expression. (Older versions of Python allowed the modifier (?i) to go anywhere, but recent versions require it at the beginning, which is also the most readable place for it.) Also, re.sub does not modify its argument but returns a new string, unlike Perl’s s// operator, which modifies its operand in place.
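The sketch below shows three ways to get case-insensitive behavior, assuming a reasonably current Python 3 in which re.sub() accepts a flags keyword. The sample text is made up for the example.

```python
import re

text = "Perl and PERL"  # sample input, chosen for illustration

# Flag passed to re.search(): matches regardless of case.
found = re.search(r"perl", text, re.IGNORECASE)

# Flag passed to re.sub() by keyword, so it is not mistaken for count.
replaced = re.sub(r"perl", "Python", text, flags=re.IGNORECASE)

# Inline modifier at the start of the pattern: no flag argument needed.
inline = re.sub(r"(?i)perl", "Python", text)

print(found.group())
print(replaced)
print(inline)
print(text)  # the original string is unchanged; re.sub returned new strings
```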