=head1 NAME
perlre - Perl regular expressions
=head1 DESCRIPTION
This page describes the syntax of regular expressions in Perl.
If you haven't used regular expressions before, a quick-start
introduction is available in L<perlrequick>, and a longer tutorial
introduction is available in L<perlretut>.
For reference on how regular expressions are used in matching
operations, plus various examples of the same, see discussions of
C<m//>, C<s///>, C<qr//> and C<??> in
L<perlop/"Regexp Quote-Like Operators">.
=head2 Modifiers
Matching operations can have various modifiers. Modifiers
that relate to the interpretation of the regular expression inside
are listed below. Modifiers that alter the way a regular expression
is used by Perl are detailed in L<perlop/"Regexp Quote-Like Operators"> and
L<perlop/"Gory details of parsing quoted constructs">.
=over 4
=item m
Treat string as multiple lines. That is, change "^" and "$" from matching
the start or end of the string to matching the start or end of any
line anywhere within the string.
=item s
Treat string as single line. That is, change "." to match any character
whatsoever, even a newline, which normally it would not match.
Used together, as C</ms>, they let the "." match any character whatsoever,
while still allowing "^" and "$" to match, respectively, just after
and just before newlines within the string.
=item i
Do case-insensitive pattern matching.
If locale matching rules are in effect, the case map is taken from the
current locale for code points below 256, and from Unicode rules for larger
code points. However, matches that would cross the Unicode
rules/non-Unicode rules boundary (ords 255/256) will not succeed. See
L<perllocale>.
There are a number of Unicode characters that match multiple characters
under C</i>. For example, C<LATIN SMALL LIGATURE FI>
should match the sequence C<fi>. Perl is not
currently able to do this when the multiple characters are in the pattern and
are split between groupings, or when one or more are quantified. Thus

 "\N{LATIN SMALL LIGATURE FI}" =~ /fi/i;          # Matches
 "\N{LATIN SMALL LIGATURE FI}" =~ /[fi][fi]/i;    # Doesn't match!
 "\N{LATIN SMALL LIGATURE FI}" =~ /fi*/i;         # Doesn't match!

 # The below doesn't match, and it isn't clear what $1 and $2 would
 # be even if it did!!
 "\N{LATIN SMALL LIGATURE FI}" =~ /(f)(i)/i;      # Doesn't match!

Perl doesn't match multiple characters in an inverted bracketed
character class, which otherwise could be highly confusing. See
L<perlrecharclass/Negation>.
Also, Perl matching doesn't fully conform to the current Unicode C</i>
recommendations, which ask that the matching be made upon the NFD
(Normalization Form Decomposed) of the text. However, Unicode is
in the process of reconsidering and revising its recommendations.
=item x
Extend your pattern's legibility by permitting whitespace and comments.
Details in L</"/x">.
=item p
Preserve the string matched such that ${^PREMATCH}, ${^MATCH}, and
${^POSTMATCH} are available for use after matching.
=item g and c
Global matching, and keep the Current position after failed matching.
Unlike i, m, s and x, these two flags affect the way the regex is used
rather than the regex itself. See
L<perlretut/"Using regular expressions in Perl"> for further explanation
of the g and c modifiers.
=item a, d, l and u
These modifiers, new in 5.14, affect which character-set semantics
(Unicode, ASCII, etc.) are used, as described below in
L</Character set modifiers>.
=back
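As a brief illustration of the C<m>, C<s>, C<p>, C<g> and C<c> entries above, the following sketch runs each modifier against a small sample string. The variable names and sample text are invented for the example; it assumes Perl 5.10 or later for C</p>.

```perl
use strict;
use warnings;

my $text = "alpha\nbeta\ngamma";

# Without /m, ^ and $ anchor only at the start and end of the whole string.
my @plain = $text =~ /^(\w+)$/g;     # empty: the string spans three lines
# With /m, ^ and $ also match at the start and end of every line.
my @lines = $text =~ /^(\w+)$/mg;    # ("alpha", "beta", "gamma")

# Without /s, "." refuses to match a newline; with /s it matches one.
my $dot_plain  = $text =~ /alpha.beta/  ? 1 : 0;   # 0
my $dot_single = $text =~ /alpha.beta/s ? 1 : 0;   # 1

# /ms together: ^ matches just after a newline, and ".*" runs across lines.
my ($from_beta) = $text =~ /^(beta.*)/ms;          # "beta\ngamma"

# /p makes ${^PREMATCH}, ${^MATCH} and ${^POSTMATCH} available.
my ($pre, $match, $post);
if ("hello world" =~ /o w/p) {
    ($pre, $match, $post) = (${^PREMATCH}, ${^MATCH}, ${^POSTMATCH});
}

# /g in scalar context resumes at pos(); /c keeps pos() after a failure.
my $input = "12ab";
my @tokens;
$input =~ /\G(\d+)/gc and push @tokens, "num:$1";      # pos() is now 2
$input =~ /\G(\d+)/gc;                                 # fails, pos() stays 2
my $pos_after_fail = pos($input);
$input =~ /\G([a-z]+)/gc and push @tokens, "word:$1";  # continues from 2
```

Without C<c>, the failed match in the last group would have reset C<pos($input)> to undef, and the final match would have started again from the beginning of the string.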
These are usually written as "the C</x> modifier", even though the delimiter
in question might not really be a slash. The modifiers C</imsxadlup>
may also be embedded within the regular expression itself using
the C<(?...)> construct, see L</Extended Patterns> below.
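A minimal sketch of the embedded form follows. It assumes a pattern that arrives as a plain string, which cannot carry a trailing modifier but can embed one; the variable names are illustrative only.

```perl
use strict;
use warnings;

# A pattern held in a string can switch on case-insensitivity by itself.
my $pattern  = '(?i)perl';
my $embedded = "I like PERL" =~ /$pattern/ ? 1 : 0;    # 1

# The modifier can also be limited to a group: only "b" ignores case here.
my $scoped_hit  = "aBc" =~ /a(?i:b)c/ ? 1 : 0;         # 1
my $scoped_miss = "ABc" =~ /a(?i:b)c/ ? 1 : 0;         # 0
```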
The C</x>, C</l>, C</u>, C</a> and C</d> modifiers need a little more
explanation.
=head3 /x
C</x> tells
the regular expression parser to ignore most whitespace that is neither
backslashed nor within a character class. You can use this to break up
your regular expression into (slightly) more readable parts. The C<#>
character is also treated as a metacharacter introducing a comment,
just as in ordinary Perl code. This also means that if you want real
whitespace or C<#> characters in the pattern (outside a character
class, where they are unaffected by C</x>), then you'll either have to
escape them (using backslashes or C<\Q...\E>) or encode them using octal,
hex, or C<\N{}> escapes. Taken together, these features go a long way towards
making Perl's regular expressions more readable. Note that you have to
be careful not to include the pattern delimiter in the comment--perl has
no way of knowing you did not intend to close the pattern early. See
the C-comment deletion code in L<perlop>. Also note that anything inside
a C<\Q...\E> stays unaffected by C</x>. And note that C</x> doesn't affect
space interpretation within a single multi-character construct. For
example in C<\x{...}>, regardless of the C</x> modifier, there can be no
spaces. Same for a L<quantifier|/Quantifiers> such as C<{3}> or
C<{5,}>. Similarly, C<(?:...)> can't have a space between the C<?> and C<:>,
but can between the C<(> and C<?>. Within any delimiters for such a
construct, allowed spaces are not affected by C</x>, and depend on the
construct. For example, C<\x{...}> can't have spaces because hexadecimal
numbers don't have spaces in them. But, Unicode properties can have spaces, so
in C<\p{...}> there can be spaces that follow the Unicode rules, for which see
L<perluniprops/Properties accessible through \p{} and \P{}>.
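The rules above can be sketched with a small date pattern. The pattern, the sample string and the variable names are made up for the illustration; the point is where whitespace and C<#> are ignored and where they are not.

```perl
use strict;
use warnings;

my $date_re = qr/
    ^ (\d{4})     # year: the {4} quantifier itself must not contain spaces
    -  (\d\d)     # month
    -  (\d\d)     # day
    [ ]           # a literal space: inside a character class it is kept
    \# (\w+)      # a literal "#", backslashed so it does not start a comment
    $
/x;

my ($year, $month, $day, $tag) = "2011-05-14 #release" =~ $date_re;
# $year is "2011", $month is "05", $day is "14", $tag is "release"
```

None of the comments contains the pattern delimiter, since a slash inside a comment would end the pattern early.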
=head3 Character set modifiers
C</d>, C</u>, C</a>, and C</l>, available starting in 5.14, are called
the character set modifiers; they affect the character set semantics
used for the regular expression.
At any given time, exactly one of these modifiers is in effect. Once
compiled, the behavior doesn't change regardless of what rules are in
effect when the regular expression is executed. And if a regular
expression is interpolated into a larger one, the original's rules
continue to apply to it, and only it.
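As a sketch of the interpolation rule, the example below compiles one fragment with C</a> and embeds it in a larger pattern compiled without it. The names are illustrative, and the example assumes Perl 5.14 or later.

```perl
use strict;
use warnings;
use 5.014;    # the character set modifiers need Perl 5.14 or later

# Compiled with /a: this \d matches only the ASCII digits 0-9.
my $ascii_digit = qr/\d/a;

# The outer \d is compiled without /a; the interpolated part keeps its /a.
my $combined = qr/^\d$ascii_digit\z/;

my $arabic_three = "\x{663}";    # ARABIC-INDIC DIGIT THREE

# Outer \d accepts the Unicode digit, inner \d accepts the ASCII "7".
my $outer_ok  = "${arabic_three}7" =~ $combined ? 1 : 0;    # 1
# Outer \d accepts "7", but the inner /a \d rejects the Unicode digit.
my $inner_bad = "7${arabic_three}" =~ $combined ? 1 : 0;    # 0
```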
Note that the modifiers affect only pattern matching, and do not extend
to any replacement done. For example,

 s/foo/\Ubar/l

will uppercase "bar", but the C</l> does not affect how the C<\U>
operates. If C