A regular expression (regex or regexp) is a sequence of characters that defines a search pattern. Such patterns are usually used by string-searching algorithms for “find” or “find and replace” operations on strings, or for input validation.
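The three uses named above can be sketched with Python's standard `re` module. This is only an illustration: the sample text, the five-digit rule, and the helper name `is_five_digits` are assumptions made for the example, not part of the notes.

```python
import re

text = "The colour gray and the color grey."

# "Find": collect every spelling of the word in the text.
found = re.findall(r"gr[ae]y", text)

# "Find and replace": normalise both spellings to a single one.
replaced = re.sub(r"gr[ae]y", "gray", text)

# Input validation: accept only strings made of exactly five digits.
# (The five-digit rule is a hypothetical requirement for this sketch.)
def is_five_digits(value: str) -> bool:
    return re.fullmatch(r"[0-9]{5}", value) is not None
```

`re.fullmatch` is used for validation because the whole input has to fit the pattern, whereas `re.findall` and `re.sub` look for the pattern anywhere in the string.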
Regular Expressions - Relation to Finite-State Automata (FSA)
- a regular expression is one way of describing a finite-state automaton
- any regular expression can be implemented as a finite-state automaton
- any finite-state automaton can be described by a regular expression
- a regular expression is one way of characterizing a particular kind of formal language called a regular language
- both regular expressions and finite-state automata can be used to describe regular languages
- a regular grammar is another way of describing regular languages
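To make the equivalence above concrete, here is a minimal sketch of a hand-built finite-state automaton for the pattern gr(a|e)y, compared against Python's `re` module on a few strings. The state numbering, the transition table, and the sample strings are assumptions chosen for illustration; a regex engine need not build exactly this automaton internally.

```python
import re

# Hand-built deterministic finite-state automaton for gr(a|e)y.
# States are integers; 0 is the start state and 4 the only accepting state.
TRANSITIONS = {
    (0, "g"): 1,
    (1, "r"): 2,
    (2, "a"): 3,
    (2, "e"): 3,
    (3, "y"): 4,
}
ACCEPTING = {4}

def fsa_accepts(s: str) -> bool:
    """Run the automaton over s; reject as soon as no transition exists."""
    state = 0
    for ch in s:
        key = (state, ch)
        if key not in TRANSITIONS:
            return False
        state = TRANSITIONS[key]
    return state in ACCEPTING

def regex_accepts(s: str) -> bool:
    """Accept s only when the whole string matches the regular expression."""
    return re.fullmatch(r"gr(a|e)y", s) is not None

# Both descriptions should agree on every sample string.
SAMPLES = ["gray", "grey", "gry", "graey", "grays", ""]
AGREE = all(fsa_accepts(s) == regex_accepts(s) for s in SAMPLES)
```

Both functions describe the same regular language, which here contains exactly two strings: gray and grey.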
Regular Expressions - Operators
| syntax | description | example use | example matches |
|---|---|---|---|
| \| | boolean “or”; separates alternatives | gray\|grey | gray, grey |
| [] | character class; matches any one of the enclosed characters (another way of writing “or”) | gr[ae]y | gray, grey |
| [A-Z] | an uppercase letter | | |
| [a-z] | a lowercase letter | | |
| [0-9] | a single digit | | |
| [^ ] | negated character class; the caret means negation only when it comes first inside [] | | |
| () | grouping; defines the scope and precedence of the operators | gr(a\|e)y | gray, grey |
| ? | zero or one occurrence of the preceding element | colou?r | color, colour |
| \* | zero or more occurrences of the preceding element | ab\*c | ac, abc, abbc |
| + | one or more occurrences of the preceding element | ab+c | abc, abbc |
| {n} | the preceding item is matched exactly n times | | |
| {min,} | the preceding item is matched min or more times | | |
| {min,max} | the preceding item is matched at least min times, but not more than max times | | |
| . | wildcard; matches any single character (in many engines, any character except a newline) | a.b | aab, a-b |
| ^ | anchors the match to the beginning | | |
| $ | anchors the match to the end | | |