Getting started with C++ TR1 regular expressions
This article is written for the benefit of someone familiar with regular expressions but not with the use of regular expressions in C++ via the TR1 (C++ Standards Committee Technical Report 1) extensions. Comparisons will be made with Perl for those familiar with Perl, though no knowledge of Perl is required. The focus is not on the syntax of regular expressions per se but rather how to use regular expressions to search for patterns and make replacements.
The C++ TR1 regular expression specification has an intimidating array of options. This article is intended to get you started, not to explore every nook and cranny. Getting started is the harder part since it's easier to find API details than basic examples.
The examples below use fully qualified namespaces for clarity. You could make your code more succinct by adding a few using statements to eliminate namespace qualifiers.
The C++ TR1 regular expressions can follow the syntax of several regular expression environments depending on the optional flags sent to the regular expression class constructor. The six options given in the Microsoft implementation are basic, extended, ECMAScript, awk, grep, and egrep.
The default for the Microsoft implementation is ECMAScript, whose regular expression language is very similar to that of Perl 5.
The choice of flavors is extensible and implementation-specific. For example, the Boost implementation adds perl as an option, which presumably follows Perl 5 syntax more closely than the ECMAScript option does.
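As a rough illustration of how the flavor flag changes the meaning of a pattern, the sketch below compiles the same pattern under two grammars. It assumes a C++11 or later compiler, where the TR1 names live directly in std; on a TR1-only compiler, std:: would become std::tr1::. The helper name matches_with_flavor is hypothetical and not part of any library.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when the whole of `text` matches `pattern`
// under the grammar selected by `flavor`.
// Assumption: C++11 <regex>; with TR1, replace std:: by std::tr1::.
bool matches_with_flavor(const std::string& text,
                         const std::string& pattern,
                         std::regex_constants::syntax_option_type flavor)
{
    // The flavor flag is the optional second constructor argument.
    std::regex rx(pattern, flavor);
    return std::regex_match(text, rx);
}
```

Under ECMAScript the + in "a+b" is a quantifier, so the pattern matches "aaab". Under the POSIX basic grammar + is an ordinary character, so the same pattern matches only the literal text "a+b".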
For someone familiar with regular expressions the difficulty in using regular expressions in C++ TR1 is not in the syntax of regular expressions themselves, but rather in using regular expressions to do work.
The C++ regular expression functions are defined in the <regex> header and contained in the namespace std::tr1. Note that tr is lowercase in C++. In English prose “TR” is capitalized.
The first surprise you may run into with the C++ regular expression implementation is that regex_match does not "match" in the usual sense. It will return true only when the entire string matches the regular expression. The function regex_search works more like the match operator in other environments, such as the m// operator in Perl.
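To illustrate the difference between regex_match and regex_search, start with a C++ string
std::string str = "Hello world";
and construct a regular expression rx that matches only part of it, for example
std::tr1::regex rx("ello");
The expression
regex_match(str.begin(), str.end(), rx)
will return false because the string str contains more characters beyond the match of the regular expression rx. However, the expression
regex_search(str.begin(), str.end(), rx)
will return true because the regular expression matches a substring of str.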
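The contrast between the two functions can be sketched as a pair of small wrappers. This is a minimal sketch assuming a C++11 or later compiler, where the TR1 names are available as std::regex, std::regex_match, and std::regex_search; the wrapper names are hypothetical.

```cpp
#include <regex>
#include <string>

// Assumption: C++11 <regex>; with TR1, replace std:: by std::tr1::.

// True only if the regular expression accounts for the entire string.
bool whole_string_matches(const std::string& s, const std::regex& rx)
{
    return std::regex_match(s.begin(), s.end(), rx);
}

// True if the regular expression matches any substring, like Perl's m//.
bool some_substring_matches(const std::string& s, const std::regex& rx)
{
    return std::regex_search(s.begin(), s.end(), rx);
}
```

With a pattern such as "ello", the first wrapper rejects "Hello world" while the second accepts it. To get a whole-string match you would have to pad the pattern, for instance with .* on both sides.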
After performing a match in Perl, the captured matches are stored in the variables $1, $2, etc. Similarly, a C++ regex_search places matches in a match_results object. However, while Perl always creates the $n variables, C++ does not store matches unless you call an overloaded form of regex_search that takes a match_results object. The class match_results is a template; often people use the class cmatch defined by
typedef match_results<const char*> cmatch;
The following example shows how to retrieve captured matches.
std::tr1::cmatch res;
std::string str = "<h2>Egg prices</h2>";
std::tr1::regex rx("<h(.)>([^<]+)");
std::tr1::regex_search(str.c_str(), res, rx);
std::cout << res[1] << ". " << res[2] << "\n";
The code above will output
2. Egg prices
Note that res[1] corresponds to $1 in Perl, res[2] to $2, and so on.
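When the text being searched is a std::string rather than a C string, the smatch typedef (match_results over std::string iterators) avoids the call to c_str(). The sketch below assumes a C++11 or later compiler with the names in std; the helper heading_parts is hypothetical and exists only to package the example.

```cpp
#include <regex>
#include <string>
#include <utility>

// Hypothetical helper: returns (heading level, heading text) taken from the
// first "<hN>text" found in `html`, or two empty strings if nothing matches.
// Assumption: C++11 <regex>; with TR1, replace std:: by std::tr1::.
std::pair<std::string, std::string> heading_parts(const std::string& html)
{
    static const std::regex rx("<h(.)>([^<]+)");
    std::smatch res;  // match_results specialized for std::string iterators
    if (!std::regex_search(html, res, rx))
        return std::make_pair(std::string(), std::string());
    // res[0] is the whole match; res[1] and res[2] are the captured groups.
    return std::make_pair(res[1].str(), res[2].str());
}
```

The captured groups are copied out with str() before returning because a match_results object only refers to positions in the searched string and does not own the text.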
The following code will replace “world” in the string “Hello world”
with “planet”. The string
str2 will contain “Hello planet”
and the string
str will remain unchanged.
std::string str = "Hello world";
std::tr1::regex rx("world");
std::string replacement = "planet";
std::string str2 = std::tr1::regex_replace(str, rx, replacement);
Note that regex_replace does not change its arguments, unlike the Perl command s/world/planet/. Note also that the third argument to regex_replace must be a string class and not a string literal. You could, however, eliminate the temporary variable replacement by changing the call to regex_replace to pass a string literal converted to a std::string:
regex_replace(str, rx, std::string("planet"))
By default, all instances of the pattern that match the regular expression are replaced. In the example above, if str had been "Hello world world" the result would have been "Hello planet planet". To replace only the first instance (to produce "Hello planet world") you would need to add the flag std::tr1::regex_constants::format_first_only as the fourth argument to regex_replace.
Because the default behavior of regex_replace is a global replace, the function is analogous to the s///g operator in Perl. With the format_first_only flag the function is analogous to the unmodified s/// Perl operator.
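The two replacement modes can be put side by side as follows. This is a sketch assuming a C++11 or later compiler with the names in std rather than std::tr1; the function names replace_all and replace_first are hypothetical wrappers, not library functions.

```cpp
#include <regex>
#include <string>

// Assumption: C++11 <regex>; with TR1, replace std:: by std::tr1::.

// Global replace, analogous to Perl's s///g. The input is not modified.
std::string replace_all(const std::string& s,
                        const std::regex& rx,
                        const std::string& replacement)
{
    return std::regex_replace(s, rx, replacement);
}

// Replace only the first occurrence, analogous to Perl's plain s///.
std::string replace_first(const std::string& s,
                          const std::regex& rx,
                          const std::string& replacement)
{
    return std::regex_replace(s, rx, replacement,
                              std::regex_constants::format_first_only);
}
```

Both wrappers return a new string and leave their arguments untouched, which mirrors the behavior of regex_replace itself.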
Regular expression processing is not as convenient in C++ as it is in languages such as Perl that have built-in regular expression support. One reason is escape sequences. To send a backslash \ to the regular expression engine, you have to type \\ in the source code. For example, consider these definitions.
std::string str = "Hello\tworld";
std::tr1::regex rx("o\\tw");
The string str contains a tab character between the o and the w. The regular expression rx does not contain a tab character; it contains \t, the regular expression syntax for matching a tab character.
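The escaping rule can be checked with a short sketch. It assumes a C++11 or later compiler, which both places the regex names in std and offers raw string literals as a way to avoid doubling backslashes (raw strings are not available in the TR1-era compilers this article targets). The function names are hypothetical.

```cpp
#include <regex>
#include <string>

// Assumption: C++11 <regex>; with TR1, replace std:: by std::tr1::.

// The escaped literal "o\\tw" hands the four characters o \ t w to the
// regular expression engine, which reads \t as "match a tab".
bool has_o_tab_w(const std::string& s)
{
    static const std::regex rx("o\\tw");
    return std::regex_search(s, rx);
}

// The same pattern written as a C++11 raw string literal, so the backslash
// does not need to be doubled.
bool has_o_tab_w_raw(const std::string& s)
{
    static const std::regex rx(R"(o\tw)");
    return std::regex_search(s, rx);
}
```

Both functions accept a string containing "o", a tab, and "w" in sequence, and reject the same text with a space in place of the tab.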
C++ regular expressions are case-sensitive by default, as in Perl and many other environments. To specify that a regular expression is case-insensitive, add the flag std::tr1::regex_constants::icase as a second argument to the regex constructor. (The constructor flags can be combined with a bit-wise or. So if you're specifying a flag for the regular expression flavor, you can follow it with | std::tr1::regex_constants::icase to combine the two.)
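Combining a flavor flag with the case-insensitivity flag might look like the sketch below. It assumes a C++11 or later compiler with the names in std rather than std::tr1; the two helper functions are hypothetical and differ only in the flags passed to the constructor.

```cpp
#include <regex>
#include <string>

// Assumption: C++11 <regex>; with TR1, replace std:: by std::tr1::.

// Search with an explicit flavor flag combined with icase via bit-wise or.
bool contains_ignoring_case(const std::string& text, const std::string& pattern)
{
    std::regex rx(pattern,
                  std::regex_constants::ECMAScript | std::regex_constants::icase);
    return std::regex_search(text, rx);
}

// Search with the default flags, which are case-sensitive.
bool contains_exact_case(const std::string& text, const std::string& pattern)
{
    std::regex rx(pattern);
    return std::regex_search(text, rx);
}
```

Searching "HELLO world" for the pattern "hello" succeeds with the first function and fails with the second.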
Support for case-sensitivity highlights the differences between C++ and scripting languages. C++ allows more control over regular expressions but also requires more input. For example, Perl makes the m// and s/// operators case-insensitive by simply appending an i. While the regular expression syntax in C++ is more cluttered than that of scripting languages, people who use C++ are doing so because they value control over convenience.
If you have trouble linking with the regex library in Visual Studio 2008, this post may help.