Regex

For Real

Examples

"^L\d{7}[A-Z]$"

".+lon=(?P<lng>\d+\.\d+)&.*lat=(?P<lat>\d+\.\d+)&.*"

'[\t !"#$%&\'()*\-/<=>?@\[\\\]^_`{|},.]+'

"addUrl.+(?P<url>http://[^']+)'"

".*\.(png|jpg|jpeg|gif)$|^https://graph.facebook.com/\d+/picture$"

Major features

character classes

 \d    :  a digit, e.g. 0, 1, 2, 3, 4
 \s    :  whitespace, e.g. space character, \t tab, \n newline
 [a-z] :  any character between 'a' and 'z' inclusive
 [0-9] :  any character between '0' and '9' inclusive
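A minimal sketch of these classes in Python, assuming the standard re module. The sample string and variable names are made up for illustration.

```python
import re

sample = "room 42\tfloor 7\n"  # hypothetical sample text

digits = re.findall(r"\d", sample)        # every single digit
spaces = re.findall(r"\s", sample)        # space, tab and newline characters
lower = re.findall(r"[a-z]+", sample)     # runs of lowercase letters
numbers = re.findall(r"[0-9]+", sample)   # runs of digits

print(digits, spaces, lower, numbers)
```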

groups

".+lon=(?P<lng>\d+\.\d+)&.*lat=(?P<lat>\d+\.\d+)&.*

Groups can be used to isolate certain sub-patterns within the overall string

Groups can be named in Python with (?P<name>...). JavaScript has supported named groups since ES2018, written (?<name>...) without the P
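A minimal sketch of reading groups by index and by name in Python, assuming the standard re module. The URL is the one used elsewhere in these slides; the shortened pattern and the variable names are illustrative only.

```python
import re

s = "http://map.com/?lon=103.853177&lat=1.280676&zoom=12"

# One named group (lng) followed by one unnamed group
m = re.search(r"lon=(?P<lng>\d+\.\d+)&lat=(\d+\.\d+)", s)

whole = m.group(0)            # the entire matched text
lng_by_name = m.group("lng")  # named groups can be read by name...
lng_by_index = m.group(1)     # ...and are still numbered from 1
lat_by_index = m.group(2)

print(whole, lng_by_name, lat_by_index)
```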

modifiers

 \     :  escape the following character, e.g. \$ or \\
 .     :  any character, excluding a newline
 ^     :  beginning of string
 $     :  end of string
 |     :  or, matches the pattern on either side
 +     :  one or more of the preceding character or set
 *     :  zero or more of the preceding character or set
 ?     :  zero or one of the preceding character or set
 {n}   :  n occurrences of the preceding character or set
 {m,n} :  m to n occurrences of the preceding character or set
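The operators above can be tried out with a small Python sketch. The helper name matches and the test strings are hypothetical; re.fullmatch is used so that each pattern has to cover the whole string.

```python
import re

def matches(pattern, text):
    """Return True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

checks = {
    "plus": matches(r"ab+", "abbb"),            # one or more 'b'
    "plus_none": matches(r"ab+", "a"),          # '+' needs at least one 'b'
    "star": matches(r"ab*", "a"),               # zero or more 'b'
    "optional": matches(r"colou?r", "color"),   # zero or one 'u'
    "exact": matches(r"\d{3}", "123"),          # exactly 3 digits
    "range": matches(r"\d{2,4}", "12345"),      # 5 digits is outside 2 to 4
    "escape": matches(r"\$\d+", "$25"),         # \$ is a literal dollar sign
    "either": matches(r"cat|dog", "dog"),       # alternation
}

print(checks)
```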

flags

 I|i  :  case-insensitive match
 G|g  :  global match, i.e. find every match instead of stopping at the first
 M|m  :  multiline mode, i.e. ^ and $ also match at the start and end of each line

Whether a flag is written in upper or lowercase depends on the language, e.g. re.I in Python and /i in JavaScript
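A minimal sketch of the case-insensitive and multiline flags in Python, assuming the standard re module and a made-up two-line input. Python has no g flag; re.findall returns every match, while re.search stops at the first.

```python
import re

text = "Lat=1.28\nlon=103.85"  # hypothetical two-line input

# Matching is case sensitive unless IGNORECASE is passed
no_flag = re.findall(r"lat", text)
ignore_case = re.findall(r"lat", text, re.IGNORECASE)

# Without MULTILINE, ^ only matches at the start of the whole string
single = re.findall(r"^\w+", text)
multi = re.findall(r"^\w+", text, re.MULTILINE)

print(no_flag, ignore_case, single, multi)
```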

Examples

Named groups

python

>>> import re
>>> s = "http://map.com/?lon=103.853177&lat=1.280676&zoom=12"
>>> m = re.match(r".+lon=(?P<lng>\d+\.\d+)&.*lat=(?P<lat>\d+\.\d+)&.*", s)
>>> m.groupdict()
{'lng': '103.853177', 'lat': '1.280676'}

Groups

Javascript

> var s = "http://map.com/?lon=103.853177&lat=1.280676&zoom=12"
> s.match(/.+lon=(\d+\.\d+)&.*lat=(\d+\.\d+).*/)
["http://map.com/?lon=103.853177&lat=1.280676&zoom=12", "103.853177", "1.280676"]

Character sets

 \d            :  all digits, i.e. [0-9]
 \D            :  characters EXCLUDING [0-9]
 \s            :  whitespace characters, e.g. newlines (\n, \r), tabs (\t), space
 \S            :  characters EXCLUDING \s
 [a-zA-Z]      :  all English letters in upper and lowercase
 [!@#$%^&*()]  :  characters typed using Shift-<number>

Custom Character Sets

>>> a = '[{"two": "public", "one": "secret"}, {"two": "guava", "one": "banana"}]'

>>> re.findall("\"two\": \"(.+)\"", a)
['public", "one": "secret"}, {"two": "guava", "one": "banana']

>>> re.findall("\"two\": \"([^\"]+)\"", a)
['public', 'guava']

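Quantifiers such as + and * are greedy by default in Python: .+ matches as much as it can, so it runs on to the last quote in the string. The negated set [^"]+ stops at the first closing quote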
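A short sketch comparing the greedy pattern with two ways of limiting it, assuming the standard re module. The lazy quantifier +? is not covered in these slides; it is shown here only as an alternative to the negated set.

```python
import re

a = '[{"two": "public", "one": "secret"}, {"two": "guava", "one": "banana"}]'

# Greedy: .+ grabs as much as possible, up to the last quote
greedy = re.findall(r'"two": "(.+)"', a)

# Lazy: .+? grabs as little as possible, up to the next quote
lazy = re.findall(r'"two": "(.+?)"', a)

# Negated set: [^"]+ cannot cross a quote at all
negated = re.findall(r'"two": "([^"]+)"', a)

print(greedy)
print(lazy)
print(negated)
```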

Patterns explained

/.+lon=(\d+\.\d+)&.*lat=(\d+\.\d+).*/

 .+         :  one or more non-newline characters
 lon=       :  the literal string 'lon='
 (\d+\.\d+) :  group: one or more digits, a literal dot, one or more digits
 &          :  a literal '&' character
 .*         :  zero or more non-newline characters
 lat=       :  the literal string 'lat='
 (\d+\.\d+) :  group: one or more digits, a literal dot, one or more digits
 .*         :  zero or more non-newline characters

^[RP]\d{6}[A-Z]$

 ^[RP]  :  string must begin with 'R' or 'P' (inside [], '|' is a literal character, so [R|P] would also accept '|')
 \d{6}  :  6 digits in a row
 [A-Z]  :  one instance of any character in the range A-Z
 $      :  end of string
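A minimal sketch showing how [RP] and [R|P] differ in practice, assuming the standard re module. The identifiers in the list are made up for illustration.

```python
import re

strict = re.compile(r"^[RP]\d{6}[A-Z]$")
loose = re.compile(r"^[R|P]\d{6}[A-Z]$")  # '|' inside [] is a literal character

# Hypothetical identifiers: two valid, one starting with '|',
# one with the wrong first letter, one with only five digits
ids = ["R123456A", "P000001Z", "|123456A", "X123456A", "R12345A"]

strict_ok = [i for i in ids if strict.match(i)]
loose_ok = [i for i in ids if loose.match(i)]

print(strict_ok)
print(loose_ok)
```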

".*\.(png|jpg|jpeg|gif)$|^https://graph.facebook.com/\d+/picture$"

 .*                    :  zero or more non-newline characters
 \.                    :  literal '.' character
 (png|jpg|jpeg|gif)$   :  ends with one of 'png', 'jpg', 'jpeg', 'gif'
 |                     :  OR
 ^https://graph.face.. :  begins with Facebook Graph URL ...
 \d+                   :  has a sequence of one or more digits
 /picture$             :  ends with '/picture'
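The pattern can be checked against a few URLs with a short Python sketch, assuming the standard re module. The example.com URLs and the numeric ID are invented for illustration.

```python
import re

image_url = re.compile(
    r".*\.(png|jpg|jpeg|gif)$|^https://graph.facebook.com/\d+/picture$"
)

urls = [
    "http://example.com/cat.png",                  # image extension
    "https://graph.facebook.com/12345/picture",    # Graph picture URL
    "https://graph.facebook.com/12345/pictures",   # trailing 's' breaks picture$
    "http://example.com/cat.pdf",                  # not an image extension
]

results = [bool(image_url.match(u)) for u in urls]
print(results)
```

The dots in graph.facebook.com are unescaped in the pattern, so they match any character; writing them as \. would make the match stricter.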

A note on cases

Regex operators are case sensitive, so

\d is different from \D

and

\s is different from \S
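A minimal sketch of the difference, assuming the standard re module and a made-up sample string: the uppercase form matches exactly what the lowercase form does not.

```python
import re

text = "Unit 12 B"  # hypothetical sample

digits = re.findall(r"\d", text)      # digits only
non_digits = re.findall(r"\D", text)  # everything except digits
spaces = re.findall(r"\s", text)      # whitespace only
words = re.findall(r"\S+", text)      # runs of non-whitespace

print(digits, non_digits, spaces, words)
```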