Regex
For Real
Examples
"^L\d{7}[A-Z]$"
".+lon=(?P<lng>\d+\.\d+)&.*lat=(?P<lat>\d+\.\d+)&.*"
'[\t !"#$%&\'()*\-/<=>?@\[\\\]^_`{|},.]+'
"addUrl.+(?P<url>http://[^']+)'"
".*\.(png|jpg|jpeg|gif)$|^https://graph.facebook.com/\d+/picture$"
Major features
character classes
\d : a digit, eg. 0, 1, 2, 3, 4
\s : whitespace, eg. space character, \t tab, \n newline
[a-z] : any character between 'a' and 'z' inclusive
[0-9] : any character between '0' and '9' inclusive
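A minimal sketch of these classes in Python, using `re.findall` on a made-up sample string (the string is an assumption for illustration only):

```python
import re

text = "Room 42\tis on floor 3\n"

digits = re.findall(r"\d", text)           # every single digit
spaces = re.findall(r"\s", text)           # spaces, the tab and the newline
lower_runs = re.findall(r"[a-z]+", text)   # runs of lowercase letters
number_runs = re.findall(r"[0-9]+", text)  # runs of digits

print(digits)       # ['4', '2', '3']
print(lower_runs)   # ['oom', 'is', 'on', 'floor']
print(number_runs)  # ['42', '3']
```

Note that 'R' is skipped by [a-z] because the range only covers lowercase letters.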
groups
".+lon=(?P<lng>\d+\.\d+)&.*lat=(?P<lat>\d+\.\d+)&.*"
Groups can be used to isolate certain sub-patterns within the overall string
Groups can be named: Python uses (?P<name>...), Javascript (ES2018 and later) uses (?<name>...)
modifiers
\ : escape the following character, eg. \$ or \\
. : any character, excluding a newline
^ : beginning of string
$ : end of string
| : or, matches patterns on either side
+ : one or more of the preceding character or set
* : zero or more of the preceding character or set
? : zero or one of the preceding character or set
{n} : exactly n occurrences of the preceding character or set
{m,n} : between m and n occurrences (inclusive) of the preceding character or set
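The modifiers above can be tried out one at a time. This is a small sketch in Python; the sample strings are invented for illustration, and `re.fullmatch` is used so the whole string has to match:

```python
import re

plus = re.fullmatch(r"ab+", "abbb") is not None          # True: one or more 'b'
star = re.fullmatch(r"ab*", "a") is not None             # True: zero 'b' is fine
optional = re.fullmatch(r"colou?r", "color") is not None # True: 'u' is optional
exact = re.fullmatch(r"\d{3}", "1234") is not None       # False: needs exactly 3 digits
ranged = re.fullmatch(r"\d{2,4}", "1234") is not None    # True: 2 to 4 digits

either = re.search(r"cat|dog", "hot dog").group()        # 'dog'
escaped = re.search(r"\$\d+", "cost: $15").group()       # '$15', \$ is a literal dollar
anchored = re.search(r"^\d+$", "abc 123")                # None: string is not digits only
```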
flags
I|i : case insensitive match
G|g : global match, ie. find all matches instead of stopping at the first
M|m : multiline search, ie. ^ and $ also match at the start and end of each line
Upper or lowercase is language dependent
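A rough sketch of the same flags in Python, where they are constants on the `re` module. Python has no 'g' flag; as an approximation, `re.search` returns the first match and `re.findall` returns all of them. The sample text is made up:

```python
import re

text = "Apple\nbanana\napple pie"

# Case insensitive: re.IGNORECASE (short form re.I)
first = re.search(r"apple", text, re.IGNORECASE).group()  # 'Apple'
all_matches = re.findall(r"apple", text, re.IGNORECASE)   # ['Apple', 'apple']

# Multiline: re.MULTILINE (short form re.M) lets ^ and $ match at each line
single_line = re.findall(r"^\w+$", text)                  # []
per_line = re.findall(r"^\w+$", text, re.MULTILINE)       # ['Apple', 'banana']
```

"apple pie" is not returned in the last call because \w does not match the space.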
Examples
Named groups
python
>>> import re
>>> s = "http://map.com/?lon=103.853177&lat=1.280676&zoom=12"
>>> m = re.match(r".+lon=(?P<lng>\d+\.\d+)&.*lat=(?P<lat>\d+\.\d+)&.*", s)
>>> m.groupdict()
{'lng': '103.853177', 'lat': '1.280676'}
Groups
Javascript
> var s = "http://map.com/?lon=103.853177&lat=1.280676&zoom=12";
> s.match(/.+lon=(\d+\.\d+)&.*lat=(\d+\.\d+).*/)
["http://map.com/?lon=103.853177&lat=1.280676&zoom=12", "103.853177", "1.280676"]
Character sets
\d : all digits, ie. [0-9]
\D : characters EXCLUDING [0-9]
\s : whitespace characters, eg. newline (\n, \r\n), tabs (\t), space
\S : characters EXCLUDING \s
[a-zA-Z] : all English letters in upper and lowercase
[!@#$%^&*()] : characters typed using Shift-<number>
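The negated sets are handy for stripping or splitting. A short sketch in Python, with invented inputs:

```python
import re

# \D matches anything that is not a digit, so removing it keeps digits only
digits_only = re.sub(r"\D", "", "+65 6123-4567")

# \S+ matches runs of non-whitespace, ie. the "words"
words = re.findall(r"\S+", "  two\twords \n")

# Inside [...] most special characters lose their meaning and match literally
shifted = re.findall(r"[!@#$%^&*()]", "p@ss(w0rd)!")

print(digits_only)  # 6561234567
print(words)        # ['two', 'words']
print(shifted)      # ['@', '(', ')', '!']
```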
Custom Character Sets
>>> a
'[{"two": "public", "one": "secret"}, {"two": "guava", "one": "banana"}]'
>>> re.findall("\"two\": \"(.+)\"", a)
['public", "one": "secret"}, {"two": "guava", "one": "banana']
>>> re.findall("\"two\": \"([^\"]+)\"", a)
['public', 'guava']
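Quantifiers like + and * are greedy by default: they match as much as possible, so .+ runs to the last quote while [^"]+ stops at the next one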
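Besides a custom character set, another common option is a non-greedy quantifier: adding ? after + or * makes it match as little as possible. A sketch on the same string:

```python
import re

a = '[{"two": "public", "one": "secret"}, {"two": "guava", "one": "banana"}]'

# Greedy: .+ grabs everything up to the LAST double quote
greedy = re.findall(r'"two": "(.+)"', a)

# Non-greedy: .+? stops at the FIRST double quote it can
lazy = re.findall(r'"two": "(.+?)"', a)

print(len(greedy))  # 1
print(lazy)         # ['public', 'guava']
```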
Custom Character Sets
/.+lon=(\d+\.\d+)&.*lat=(\d+\.\d+).*/
.+ : one or more non-newline characters
lon= : the literal string, "lon="
(\d+\.\d+): group, one or more digits, a literal dot, one or more digits
& : a literal '&' character
.* : zero or more non-newline characters
lat= : the literal string, 'lat='
(\d+\.\d+): group, one or more digits, a literal dot, one or more digits
.* : zero or more non-newline characters
^[RP]\d{6}[A-Z]$
^[RP] : string must begin with 'R' or 'P' (inside [...] a '|' would be a literal character, not OR)
\d{6} : 6 digits in a row
[A-Z] : one instance of any character in the range A-Z
$ : end of string
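One thing to watch for in the first part of this pattern: inside square brackets, | is a literal character and not OR. A quick sketch comparing [R|P] with [RP]; the sample IDs are made up:

```python
import re

with_pipe = re.compile(r"^[R|P]\d{6}[A-Z]$")  # also accepts a leading '|'
without_pipe = re.compile(r"^[RP]\d{6}[A-Z]$")

pipe_accepted = with_pipe.match("|123456A") is not None     # True
pipe_rejected = without_pipe.match("|123456A") is None      # True

valid_r = without_pipe.match("R123456A") is not None        # True
valid_p = without_pipe.match("P654321Z") is not None        # True
too_short = without_pipe.match("R12345A") is None           # True: only 5 digits
lowercase = without_pipe.match("R123456a") is None          # True: [A-Z] is uppercase only
```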
".*\.(png|jpg|jpeg|gif)$|^https://graph.facebook.com/\d+/picture$"
.* : zero or more non-newline characters
\. : literal '.' character
(png|jpg|jpeg|gif)$ : ends with one of 'png', 'jpg', 'jpeg', 'gif'
| : OR
^https://graph.face..: begins with Facebook Graph URL ...
\d+ : has a sequence of one or more digits
/picture$ : ends with '/picture'
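To check the breakdown, the pattern can be run against a few URLs. This is only a sketch: the example.com URLs and the numeric ID are hypothetical test inputs.

```python
import re

IMAGE_URL = re.compile(
    r".*\.(png|jpg|jpeg|gif)$|^https://graph.facebook.com/\d+/picture$"
)

def is_image_url(url):
    """Return True if either side of the | matches the URL."""
    return IMAGE_URL.match(url) is not None

png = is_image_url("http://example.com/cat.png")                  # True
graph = is_image_url("https://graph.facebook.com/12345/picture")  # True
html = is_image_url("http://example.com/page.html")               # False
plural = is_image_url("https://graph.facebook.com/12345/pictures")  # False
```

The dots in graph.facebook.com are not escaped, so strictly they match any character; writing them as \. would be tighter.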
A note on cases
Regex escape sequences are case sensitive, so
\d is different from \D
and
\s is different from \S
Regex
By Ruiwen Chua