Regular Expressions in IDE
Purpose
Regular expressions can be used to search for patterns
rather than literals. For example, it is possible to
search for variables in IDE property files,
which look like $(name), with the regular expression:
\$([a-z.]+)
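As a rough illustration outside the IDE, the same search can be sketched in Python. This assumes Python's re module, whose syntax differs from the one described here: round brackets are literal in this document's syntax but must be escaped in Python. The helper name find_variables and the sample property line are hypothetical.

```python
import re

# Python spelling of \$([a-z.]+) from the text: the literal round
# brackets have to be escaped because Python uses them for grouping.
VARIABLE = re.compile(r"\$\([a-z.]+\)")

def find_variables(text):
    """Return every $(name) reference in text, in order of appearance."""
    return VARIABLE.findall(text)

# Hypothetical property line, for illustration only.
sample = "command.go.*.py=$(python.command) $(file.name)"
print(find_variables(sample))
```

Note that the set [a-z.] only covers lowercase letters and dots, so a variable name containing uppercase letters would not be found by this pattern.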
Replacement with regular expressions allows complex
transformations with the use of tagged expressions.
For example, pairs of numbers separated by a ',' could
be reordered by replacing the regular expression:
\([0-9]+\),\([0-9]+\)
with:
\2,\1
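The reordering above can be tried out with a minimal Python sketch. It assumes Python's re module, where tagged expressions are written with plain round brackets rather than \( and \); the \1 and \2 references in the replacement work the same way. The function name swap_pairs is hypothetical.

```python
import re

def swap_pairs(text):
    """Swap each pair of numbers separated by a comma."""
    # ([0-9]+),([0-9]+) is the Python spelling of \([0-9]+\),\([0-9]+\)
    return re.sub(r"([0-9]+),([0-9]+)", r"\2,\1", text)

print(swap_pairs("10,20 3,4"))  # prints "20,10 4,3"
```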
Syntax
[1]  char    matches itself, unless it is a special character
             (metachar): . \ [ ] * + ^ $
[2]  .       matches any character.
[3]  \       matches the character following it, except
             when followed by a left or right round bracket,
             a digit 1 to 9, or a left or right angle bracket
             (see [7], [8] and [9]).
             It is used as an escape character for all
             other metacharacters, and for itself. When used
             in a set ([4]), it is treated as an ordinary
             character.
[4]  [set]   matches one of the characters in the set.
             If the first character in the set is "^",
             it matches a character NOT in the set, i.e.
             it complements the set. The shorthand S-E
             specifies the set of characters from S up to
             E, inclusive. The special characters "]" and
             "-" have no special meaning if they appear
             as the first chars in the set.
             examples:    match:
             [a-z]        any lowercase alpha
             [^]-]        any char except ] and -
             [^A-Z]       any char except uppercase alpha
             [a-zA-Z]     any alpha
[5]  *       any regular expression of form [1] to [4], followed by
             the closure char (*), matches zero or more matches of
             that form.
[6]  +       same as [5], except that it matches one or more.
[7]          a regular expression of form [1] to [10], enclosed
             as \(form\), matches what form matches. The enclosure
             creates a set of tags, used for [8] and for
             pattern substitution. The tagged forms are numbered
             starting from 1.
[8]          a \ followed by a digit 1 to 9 matches whatever a
             previously tagged regular expression ([7]) matched.
[9]  \<      a regular expression starting with a \< construct
     \>      and/or ending with a \> construct restricts the
             pattern matching to the beginning of a word and/or
             the end of a word. A word is defined to be a character
             string beginning and/or ending with the characters
             A-Z a-z 0-9 and _. It must also be preceded and/or
             followed by any character outside those mentioned.
[10]         a composite regular expression xy, where x and y
             are of form [1] to [10], matches the longest
             match of x followed by a match for y.
[11] ^       a regular expression starting with a ^ character
     $       and/or ending with a $ character restricts the
             pattern matching to the beginning of the line
             and/or the end of the line. [anchors] Elsewhere in the
             pattern, ^ and $ are treated as ordinary characters.
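Several of the forms above can be exercised with a short Python sketch. It assumes Python's re module, not the IDE's own matcher, so two spellings differ: tagged expressions use plain round brackets instead of \( and \), and \b stands in for both \< and \>. Sets, closures, back-references and line anchors are written the same way. All function names are hypothetical.

```python
import re

def runs_outside_brackets(text):
    # Set and one-or-more closure: runs of characters other than ] and -.
    # "]" and "-" are literal here because of where they sit in the set.
    return re.findall(r"[^]-]+", text)

def doubled_word(text):
    # Tag and back-reference: a lowercase word followed by a space
    # and the same word again.
    match = re.search(r"([a-z]+) \1", text)
    return match.group(0) if match else None

def whole_words(word, text):
    # Word boundaries: find word only where it is not part of a longer
    # run of A-Z a-z 0-9 _ characters (\b replaces \< and \>).
    return re.findall(r"\b" + re.escape(word) + r"\b", text)

def is_whole_line(pattern, line):
    # Anchors: the pattern must span the line from start to end.
    return re.search("^" + pattern + "$", line) is not None

print(runs_outside_brackets("a]b-c"))
print(doubled_word("it is is here"))
print(whole_words("cat", "cat concat cat_ cats"))
print(is_whole_line("ab*", "abbb"))
```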
Acknowledgments
Most of this documentation was originally written by Ozan S. Yigit.
Additions by Neil Hodgson.
All of this document is in the public domain.