Regular expressions in PowerShell and Perl
Overview
PowerShell's regular expression flavor
Matching and replacing
Case-sensitivity
Retrieving single matches
Global matching and replacing
Replacing with captures
Overview
This page is written for the benefit of someone familiar with regular expressions but not with the use of regular expressions in Microsoft's PowerShell. Comparisons will be made with Perl for those familiar with the language, though no knowledge of Perl is required. The focus is not on the syntax of regular expressions per se but rather how to use regular expressions to search for patterns and make replacements.
PowerShell's regular expression flavor
PowerShell is built upon Microsoft's .NET framework. In regular expressions, as in much else, PowerShell uses the .NET implementation. And .NET in turn essentially uses Perl 5's regular expression syntax, with a few added features such as named captures. For someone familiar with regular expressions, especially as they are implemented in .NET or Perl, the difficulty in using PowerShell is not in the syntax of regular expressions themselves, but rather in using regular expressions to do work. Because PowerShell is new, detailed documentation and examples are harder to find than for .NET or Perl.
Matching and replacing
PowerShell has -match
and -replace
operators
that are roughly analogous to the m//
and s///
operators in Perl. For example, start with the command
$greeting = "Hello world"
which would be legal PowerShell or Perl code. The PowerShell statement
$greeting -match "ello"
would return a Boolean true value just as the Perl statement
$greeting =~ m/ello/;
would return 1. Similarly, the PowerShell statement
$greeting -replace "world", "planet"
would return "Hello planet", as would the statement
$greeting =~ s/world/planet/;
in Perl. However, the PowerShell statement returns a new string, leaving
$greeting
unchanged, while the corresponding Perl
statement changes the string $greeting
in place.
Case-sensitivity
Perl modifies the behavior of the m//
and s///
operators by adding characters to indicate such things as case-sensitivity.
Both operators, like nearly everything else in Perl, are case-sensitive by
default. Appending an 'i
' causes them to be case-insensitive.
PowerShell's -match
and -replace
operators,
like nearly everything else in PowerShell, are case-insensitive by default.
PowerShell offers the highly recommended option of specifying -imatch
and -ireplace
to make case-insensitivity explicit. The
case-sensitive counterparts are -cmatch
and
-creplace
.
Retrieving single matches
After performing a match in Perl, the captured matches are stored in the
variables $1
, $2
, etc. Similarly, after a match
PowerShell creates an array
$matches
with $matches
[n] corresponding to
Perl's
$
n.
Global matching and replacing
Perl
Perl uses a 'g
' option to specify global matches and
replacements. The unmodified operator m//
returns a Boolean
value, but the modified operator
m//g
returns an array containing all substrings matching the
specified pattern. Both the s///
and s///g
operators modify their string in place. The unmodified s///
operator replaces only the first match in the string, while the modified
s///g
operator replaces all occurrences of the pattern.
For example, first set
$str = "Cookbook";
Then executing
$b = $str =~ m/(\woo\w)/;
sets $b
to 1 (true) and populates $1
with
"Cook". Executing
@a = $str =~ m/(\woo\w)/g;
sets @a
to the array containing "Cook" and "book".
The statement
$str =~ s/oo/u/;
would convert "Cookbook" into "Cukbook", while the statement
$str =~ s/oo/u/g;
would convert "Cookbook" into "Cukbuk".
PowerShell
PowerShell's -creplace
and -ireplace
work very
much like the
s///g
and
s///ig
operators in Perl: all matches are replaced. PowerShell
version 1.0 does not have a direct analog to Perl's s///
and
s///i
operators without the 'g
' option.
Neither does PowerShell 1.0 have an analog to the m//g
option in
Perl, though
Keith Hill has proposed that
Microsoft add a
-matches
operator in a future release of PowerShell analogous
to Perl's m//g
.
In order to have more fine control over match and replace operations in
PowerShell, one must use the [regex]
class rather than the
-match
and
-replace
operators. The following example shows how to do a
global match in PowerShell.
$str = "Cookbook"
[regex]::matches($str, "\woo\w")
The last command returns a compound structure containing more information
than just the matches. To fill an array $words
with just the
matches, use the command
$words = ([regex]::matches($str, "\woo\w") | %{$_.value})
which pipes the [regex]::matches
output through the
foreach
operator
%
and selects the matched text.
The [regex]
class in PowerShell is invoking the Regex
class in .NET, and so all the options of the .NET class are available. For
example, the .NET class library has a RegexOptions
enumeration
with useful values such as
IgnoreCase
, RightToLeft
, etc. However, it is not
obvious how one should translate from .NET to PowerShell syntax. You can't
write
RegexOptions.IgnoreCase
in PowerShell code the way you would in
C#. It turns out the thing to do is to quote the option as a string. For
example, the code
[regex]::matches($str, "[a-z]ook", "IgnoreCase")
will match "Cook" and "book", whereas without the
IgnoreCase
option only "book" would match.
Replacing with captures
Often you want to replace a pattern not with a constant string but with a string based on the text that matched. For example, suppose you want to replace italicized words with double words. If $str contained "<i>big</i>" then after executing the Perl statement
$str =~ s{<i>(\w+)</i>}{$1 $1};
$str
would contain "big big". (This illustrates an
alternative Perl syntax for s///
. The s{}{}
alternative prevents having to escape backslashes as in </i>
above.)
The equivalent PowerShell code would be
$str = [regex]::Replace($str, "<i>(\w+)</i>", '$1 $1');
Notice the single quotes around the replacement pattern. This is to keep PowerShell from interpreting "$1" before passing it to the Regex class. We could use double quotes if we also put back ticks in front of the dollar signs to escape them.
Further resources
Notes on using regular expression in other languages:
Daily tips on regular expressions
Other PowerShell articles: