Regex

For Real
Examples
"^L\d{7}[A-Z]$"
".+lon=(?P<lng>\d+\.\d+)&.*lat=(?P<lat>\d+\.\d+)&.*"
'[\t !"#$%&\'()*\-/<=>?@\[\\\]^_`{|},.]+'
"addUrl.+(?P<url>http://[^']+)'"
".*\.(png|jpg|jpeg|gif)$|^https://graph.facebook.com/\d+/picture$"
Major features

character classes
\d : a digit, eg. 0, 1, 2, 3, 4
\s : whitespace, eg. space character, \t tab, \n newline
[a-z] : any character between 'a' and 'z' inclusive
[0-9] : any character between '0' and '9' inclusive
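As a rough illustration of the classes above, here is a minimal Python sketch. The sample string `text` is made up for this example and is not from the slides.

```python
import re

# Hypothetical sample text containing digits, a tab and a newline.
text = "Room 42,\tlevel 7\nblock b"

# \d matches a single digit, so each digit comes back separately.
digits = re.findall(r"\d", text)

# \s matches a single whitespace character: space, tab or newline.
spaces = re.findall(r"\s", text)

# [a-z] matches one lowercase letter; + collects consecutive letters into runs.
lower_runs = re.findall(r"[a-z]+", text)

print(digits)      # ['4', '2', '7']
print(spaces)      # [' ', '\t', ' ', '\n', ' ']
print(lower_runs)  # ['oom', 'level', 'block', 'b']
```

Note that the uppercase 'R' in "Room" is skipped, because [a-z] only covers lowercase letters.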
groups
".+lon=(?P<lng>\d+\.\d+)&.*lat=(?P<lat>\d+\.\d+)&.*"
Groups can be used to isolate certain sub-patterns within the overall string
Groups can be named in Python with (?P<name>...); Javascript only supports named groups, written (?<name>...), from ES2018 onwards
modifiers
\ : escape the following character, eg. \$ or \\
. : any character, excluding a newline
^ : beginning of string
$ : end of string
| : or, matches the pattern on either side
+ : one or more of the preceding character or set
* : zero or more of the preceding character or set
? : zero or one of the preceding character or set
{n} : exactly n occurrences of the preceding character or set
{m,n} : between m and n occurrences of the preceding character or set
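The quantifiers and anchors listed here are easier to tell apart side by side. The sketch below is only an illustration in Python; the test strings ("a", "ab", "logo.gif" and so on) are invented for the example.

```python
import re

candidates = ("a", "ab", "abbb")

# + needs at least one 'b', * accepts none, ? accepts at most one.
plus = [bool(re.fullmatch(r"ab+", s)) for s in candidates]
star = [bool(re.fullmatch(r"ab*", s)) for s in candidates]
optional = [bool(re.fullmatch(r"ab?", s)) for s in candidates]

# {m,n} bounds the repeat count; ^ and $ pin the match to both ends of the string.
bounded = [bool(re.search(r"^\d{2,3}$", s)) for s in ("7", "42", "123", "1234")]

# | picks either alternative; \. escapes the dot so it matches a literal '.'.
ext = re.search(r"\.(png|gif)$", "logo.gif").group(1)

print(plus)      # [False, True, True]
print(star)      # [True, True, True]
print(optional)  # [True, True, False]
print(bounded)   # [False, True, True, False]
print(ext)       # gif
```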
flags
I|i : case insensitive match
G|g : global match, ie. find every match rather than only the first
M|m : multiline search, ie. ^ and $ also match at the start and end of each line
Upper or lowercase is language dependent
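In Python the flags are spelled out as constants on the `re` module, and there is no g flag: the choice between first match and all matches is made by picking `re.search` or `re.findall`. A small sketch, using a made-up two-line string:

```python
import re

# Hypothetical two-line sample.
text = "first line\nSecond line"

# Without flags, ^ only matches at the very start of the string.
no_flags = re.findall(r"^\w+", text)

# re.MULTILINE lets ^ match at the start of every line.
multiline = re.findall(r"^\w+", text, re.MULTILINE)

# re.IGNORECASE makes letters match regardless of case.
ignore_case = re.findall(r"second", text, re.IGNORECASE)

# re.search stops at the first match; re.findall returns every match.
first_only = re.search(r"line", text).group(0)
all_matches = re.findall(r"line", text)

print(no_flags)     # ['first']
print(multiline)    # ['first', 'Second']
print(ignore_case)  # ['Second']
print(all_matches)  # ['line', 'line']
```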
Examples
Named groups
python
>>> import re
>>> s = "http://map.com/?lon=103.853177&lat=1.280676&zoom=12"
>>> m = re.match(r".+lon=(?P<lng>\d+\.\d+)&.*lat=(?P<lat>\d+\.\d+)&.*", s)
>>> m.groupdict()
{'lng': '103.853177', 'lat': '1.280676'}
Groups
Javascript
> var s = "http://map.com/?lon=103.853177&lat=1.280676&zoom=12"
> s.match(/.+lon=(\d+\.\d+)&.*lat=(\d+\.\d+).*/)
["http://map.com/?lon=103.853177&lat=1.280676&zoom=12", "103.853177", "1.280676"]
Character sets
\d : all digits, ie. [0-9]
\D : characters EXCLUDING [0-9]
\s : whitespace characters, eg. newlines (\n, \r), tabs (\t), space
\S : characters EXCLUDING \s
[a-zA-Z] : all English letters in upper and lowercase
[!@#$%^&*()] : characters typed using Shift-<number>
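The uppercase forms are handy for stripping or splitting text. A minimal Python sketch, assuming an invented phone-number string:

```python
import re

# Hypothetical input; any mixed text would do.
text = "Tel: 6555 1234"

# \D matches everything that is NOT a digit, so removing it keeps only the digits.
digits_only = re.sub(r"\D", "", text)

# \S+ matches runs of non-whitespace characters, ie. the "words".
words = re.findall(r"\S+", text)

# [a-zA-Z]+ matches runs of English letters in either case.
letters = re.findall(r"[a-zA-Z]+", text)

print(digits_only)  # 65551234
print(words)        # ['Tel:', '6555', '1234']
print(letters)      # ['Tel']
```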
Custom Character Sets
>>> a
'[{"two": "public", "one": "secret"}, {"two": "guava", "one": "banana"}]'
>>> re.findall("\"two\": \"(.+)\"", a)
['public", "one": "secret"}, {"two": "guava", "one": "banana']
>>> re.findall("\"two\": \"([^\"]+)\"", a)
['public', 'guava']
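Python quantifiers such as + and * are greedy by default: .+ matches as much as it can, so it runs on to the last quote
The negated set [^"]+ stops at the first closing quote instead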
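Another way to rein in a greedy quantifier is to make it lazy by appending ?. The sketch below reuses the string `a` from this slide. The lazy variant is an alternative, not something the slides themselves show.

```python
import re

a = '[{"two": "public", "one": "secret"}, {"two": "guava", "one": "banana"}]'

# Greedy: .+ runs to the last double quote in the string.
greedy = re.findall(r'"two": "(.+)"', a)

# Lazy: .+? stops at the first double quote that lets the pattern succeed.
lazy = re.findall(r'"two": "(.+?)"', a)

print(len(greedy))  # 1
print(lazy)         # ['public', 'guava']
```

Both the lazy quantifier and the negated set give the same result here. The negated set states the intent ("anything but a quote") more explicitly.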
Custom Character Sets
/.+lon=(\d+\.\d+)&.*lat=(\d+\.\d+).*/
.+ : one or more non-newline characters
lon= : the literal string, "lon="
(\d+\.\d+) : group, one or more digits, a literal dot, one or more digits
& : a literal '&' character
.* : zero or more non-newline characters
lat= : the literal string, 'lat='
(\d+\.\d+) : group, one or more digits, a literal dot, one or more digits
.* : zero or more non-newline characters
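The same pattern can be tried in Python by dropping the Javascript slashes and passing it as a raw string. This is a sketch only, reusing the map URL from the earlier examples:

```python
import re

s = "http://map.com/?lon=103.853177&lat=1.280676&zoom=12"

# Unnamed groups are numbered from 1, left to right; group(0) is the whole match.
m = re.match(r".+lon=(\d+\.\d+)&.*lat=(\d+\.\d+).*", s)

whole = m.group(0)
lon, lat = m.groups()

print(lon)  # 103.853177
print(lat)  # 1.280676
```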
^[RP]\d{6}[A-Z]$
^[RP] : string must begin with 'R' or 'P' (inside [], | is a literal character, so [R|P] would also accept '|')
\d{6} : 6 digits in a row
[A-Z] : one instance of any character in the range A-Z
$ : end of string
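Inside square brackets, | is an ordinary character rather than "or", so [RP] and [R|P] behave differently. A quick Python check follows. The sample IDs are invented for illustration and the names `strict` and `loose` are hypothetical.

```python
import re

strict = re.compile(r"^[RP]\d{6}[A-Z]$")   # R or P only
loose = re.compile(r"^[R|P]\d{6}[A-Z]$")   # R, P, or a literal '|'

# Hypothetical IDs: two valid, one starting with '|', one with a wrong
# prefix, and one with only five digits.
samples = ["R123456A", "P654321Z", "|123456A", "X123456A", "R12345A"]

strict_ok = [s for s in samples if strict.match(s)]
loose_ok = [s for s in samples if loose.match(s)]

print(strict_ok)  # ['R123456A', 'P654321Z']
print(loose_ok)   # ['R123456A', 'P654321Z', '|123456A']
```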
".*\.(png|jpg|jpeg|gif)$|^https://graph.facebook.com/\d+/picture$"
.* : zero or more non-newline characters
\. : literal '.' character
(png|jpg|jpeg|gif)$ : ends with either of 'png', 'jpg', 'jpeg', 'gif'
| : OR
^https://graph.face.. : begins with Facebook Graph URL ...
\d+ : has a sequence of one or more digits
/picture$ : ends with '/picture'
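To see which URLs the pattern accepts, it can be run against a few candidates. This is a sketch in Python; the example.com URLs, the numeric ID and the name `IMAGE_URL` are made up for the test.

```python
import re

IMAGE_URL = re.compile(
    r".*\.(png|jpg|jpeg|gif)$|^https://graph.facebook.com/\d+/picture$"
)

# Hypothetical URLs covering both branches of the alternation.
urls = [
    "http://example.com/cat.png",                 # image extension at the end
    "https://graph.facebook.com/12345/picture",   # Graph picture URL
    "https://graph.facebook.com/12345/pictures",  # extra 's' after 'picture'
    "http://example.com/cat.png?size=large",      # extension not at the end
]

matches = [bool(IMAGE_URL.search(u)) for u in urls]
print(matches)  # [True, True, False, False]
```

The last case shows a limit of anchoring on $: a query string after the extension makes the URL fail the check.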
A note on cases
Regex operators are case sensitive, so
\d is different from \D
and
\s is different from \S
Regex
By Ruiwen Chua
