Regex

For Real

Examples

"^L\d{7}[A-Z]$"

".+lon=(?P<lng>\d+\.\d+)&.*lat=(?P<lat>\d+\.\d+)&.*"

'[\t !"#$%&\'()*\-/<=>?@\[\\\]^_`{|},.]+'

"addUrl.+(?P<url>http://[^']+)'"

".*\.(png|jpg|jpeg|gif)$|^https://graph.facebook.com/\d+/picture$"

Major features

character classes

 \d    :     a digit, e.g. 0, 1, 2, 3, 4
 \s    :     whitespace, e.g. space character, \t tab, \n newline
 [a-z] :     any character between 'a' and 'z' inclusive
 [0-9] :     any character between '0' and '9' inclusive
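A minimal Python sketch of the classes above. The sample text is made up for illustration.

```python
import re

# Hypothetical sample text containing spaces, a tab and a newline
text = "Room 42\tis on floor 3\nof block b"

# \d matches one digit, so \d+ matches a run of digits
digits = re.findall(r"\d+", text)

# \s matches one whitespace character: space, tab or newline here
whitespace = re.findall(r"\s", text)

# [a-z] matches one lowercase letter, so [a-z]+ matches a lowercase run
lowercase_runs = re.findall(r"[a-z]+", text)

print(digits)           # ['42', '3']
print(len(whitespace))  # 8
print(lowercase_runs)   # ['oom', 'is', 'on', 'floor', 'of', 'block', 'b']
```

Note that 'Room' only contributes 'oom', because the uppercase 'R' is outside [a-z].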

groups

".+lon=(?P<lng>\d+\.\d+)&.*lat=(?P<lat>\d+\.\d+)&.*"

Groups can be used to isolate certain sub-patterns within the overall string

Groups can be named in Python with (?P<name>...). JavaScript has supported named groups, written (?<name>...), since ES2018.
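A small Python sketch of reading groups by position and by name. It reuses the map URL from the examples in these slides; the variable names are only illustrative.

```python
import re

url = "http://map.com/?lon=103.853177&lat=1.280676&zoom=12"

# Numbered groups are read by position, counting opening parentheses from 1
numbered = re.search(r"lon=(\d+\.\d+)&.*lat=(\d+\.\d+)", url)
lon, lat = numbered.group(1), numbered.group(2)

# Named groups isolate the same sub-patterns but are read by name
named = re.search(r"lon=(?P<lng>\d+\.\d+)&.*lat=(?P<lat>\d+\.\d+)", url)
coords = (float(named.group("lat")), float(named.group("lng")))

print(lon, lat)  # 103.853177 1.280676
print(coords)    # (1.280676, 103.853177)
```

Names tend to be easier to maintain than positions, since adding another group earlier in the pattern renumbers everything after it.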

modifiers

 \     :    escape the following character, e.g. \$ or \\
 .     :    any character, excluding a newline
 ^     :    beginning of string
 $     :    end of string
 |     :    or, matches the pattern on either side
 +     :    one or more of the preceding character or set
 *     :    zero or more of the preceding character or set
 ?     :    zero or one of the preceding character or set
 {n}   :    exactly n occurrences of the preceding character or set
 {m,n} :    m to n occurrences of the preceding character or set
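One way to check these in Python is to pair each pattern with a string that should match and one that should not. The sample strings are invented for illustration, and re.fullmatch is used so the whole string has to match.

```python
import re

# (pattern, string that should match, string that should not)
cases = [
    (r"\$\d+",   "$15",   "15"),       # \ escapes the '$' that follows it
    (r"a.c",     "abc",   "a\nc"),     # . is any character except a newline
    (r"cat|dog", "dog",   "cow"),      # | accepts the pattern on either side
    (r"ab+",     "abbb",  "a"),        # + needs at least one 'b'
    (r"ab*",     "a",     "b"),        # * allows zero 'b's
    (r"colou?r", "color", "colouur"),  # ? allows zero or one 'u'
    (r"\d{3}",   "123",   "12"),       # {n} is exactly n digits
    (r"\d{2,4}", "1234",  "12345"),    # {m,n} is between m and n digits
]

results = [
    (bool(re.fullmatch(pattern, good)), bool(re.fullmatch(pattern, bad)))
    for pattern, good, bad in cases
]

# Every pair should come out as (True, False)
print(all(pair == (True, False) for pair in results))
```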

flags

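 I|i  :    case-insensitive match
 G|g  :    global match, i.e. find every match rather than only the first
 M|m  :    multiline match, i.e. ^ and $ also match at the start and end of each line

Upper or lowercase is language dependent, e.g. re.I in Python and /i in JavaScript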
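A hedged Python sketch of the flags. Python's re module has no 'g' flag, so re.findall stands in for a global match here and re.search for a first-match-only search. The sample text is made up.

```python
import re

text = "Apple\nbanana\napple pie"

# re.I (IGNORECASE): 'apple' also matches 'Apple'
insensitive = re.findall(r"apple", text, flags=re.I)

# re.M (MULTILINE): ^ matches at the start of every line
line_starts = re.findall(r"^\w+", text, flags=re.M)

# Without re.M, ^ only matches at the very start of the string
string_start = re.findall(r"^\w+", text)

# re.search stops at the first match; re.findall returns all of them
first_only = re.search(r"a\w+", text, flags=re.I).group()

print(insensitive)   # ['Apple', 'apple']
print(line_starts)   # ['Apple', 'banana', 'apple']
print(string_start)  # ['Apple']
print(first_only)    # Apple
```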

Examples

Named groups

Python

>>> import re
>>> s = "http://map.com/?lon=103.853177&lat=1.280676&zoom=12"
>>> m = re.match(r".+lon=(?P<lng>\d+\.\d+)&.*lat=(?P<lat>\d+\.\d+)&.*", s)
>>> m.groupdict()
{'lng': '103.853177', 'lat': '1.280676'}

Groups

JavaScript

> var s = "http://map.com/?lon=103.853177&lat=1.280676&zoom=12"
> s.match(/.+lon=(\d+\.\d+)&.*lat=(\d+\.\d+).*/)
["http://map.com/?lon=103.853177&lat=1.280676&zoom=12", "103.853177", "1.280676"]

Character sets

 \d  :  all digits, i.e. [0-9]
 \D  :  characters EXCLUDING [0-9]
 \s  :  whitespace characters, e.g. newline (\n), carriage return (\r), tab (\t), space
 \S  :  characters EXCLUDING \s

 [a-zA-Z]      :  all English letters in upper and lowercase
 [!@#$%^&*()]  :  characters typed using Shift-<number>
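A short Python sketch of the negated sets and the Shift-<number> set. The phone number and password strings are invented, and the symbol set assumes a US keyboard layout.

```python
import re

# Hypothetical phone number with formatting characters
phone = "+65 6123-4567"

# \D is every character that is NOT a digit, so this strips the formatting
digits_only = re.sub(r"\D", "", phone)

# \S+ is a run of non-whitespace, so this picks out whitespace-separated tokens
tokens = re.findall(r"\S+", "one\ttwo \n three")

# Inside [], these symbols are literal; ^ is only special as the first character
symbols = re.findall(r"[!@#$%^&*()]", "p@ss(w0rd)!")

print(digits_only)  # 6561234567
print(tokens)       # ['one', 'two', 'three']
print(symbols)      # ['@', '(', ')', '!']
```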

Custom Character Sets

>>> a = '[{"two": "public", "one": "secret"}, {"two": "guava", "one": "banana"}]'

>>> re.findall("\"two\": \"(.+)\"", a)
['public", "one": "secret"}, {"two": "guava", "one": "banana']

>>> re.findall("\"two\": \"([^\"]+)\"", a)
['public', 'guava']

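Python quantifiers are greedy by default: .+ matches as much as it can, so the first attempt runs all the way to the last closing quote. The custom set [^\"] cannot cross a quote, so each match stops at the first one.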
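For comparison, a sketch of a third option that the slides do not show: the lazy quantifier .+?, which matches as little as possible. It is placed here next to the greedy and custom-set versions on the same string.

```python
import re

a = '[{"two": "public", "one": "secret"}, {"two": "guava", "one": "banana"}]'

# Greedy: .+ runs to the LAST double quote in the string
greedy = re.findall(r'"two": "(.+)"', a)

# Lazy: .+? stops at the FIRST double quote that lets the match succeed
lazy = re.findall(r'"two": "(.+?)"', a)

# Custom set: [^"]+ can never cross a double quote
negated = re.findall(r'"two": "([^"]+)"', a)

print(len(greedy))  # 1
print(lazy)         # ['public', 'guava']
print(negated)      # ['public', 'guava']
```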

Custom Character Sets

/.+lon=(\d+\.\d+)&.*lat=(\d+\.\d+).*/

 .+        : one or more non-newline characters
 lon=      : the literal string "lon="
 (\d+\.\d+): group, one or more digits, a literal dot, one or more digits
 &         : a literal '&' character
 .*        : zero or more non-newline characters
 lat=      : the literal string "lat="
 (\d+\.\d+): group, one or more digits, a literal dot, one or more digits
 .*        : zero or more non-newline characters
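In Python, the same breakdown can live inside the pattern itself using re.VERBOSE, which ignores whitespace in the pattern and treats # as the start of a comment. A sketch, with LON_LAT as an illustrative name:

```python
import re

LON_LAT = re.compile(
    r"""
    .+              # one or more non-newline characters
    lon=            # the literal string "lon="
    (\d+\.\d+)      # group 1: digits, a literal dot, digits
    &               # a literal '&' character
    .*              # zero or more non-newline characters
    lat=            # the literal string "lat="
    (\d+\.\d+)      # group 2: digits, a literal dot, digits
    .*              # zero or more non-newline characters
    """,
    re.VERBOSE,
)

m = LON_LAT.match("http://map.com/?lon=103.853177&lat=1.280676&zoom=12")
print(m.groups())  # ('103.853177', '1.280676')
```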

^[RP]\d{6}[A-Z]$

  ^[RP]  :  string must begin with 'R' or 'P'
  \d{6}  :  6 digits in a row
  [A-Z]  :  one instance of any character in the range A-Z
  $      :  end of string

Inside square brackets, | is a literal character, so [R|P] would also accept a leading '|'.
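A sketch of using this pattern as a validator in Python. The character class is written as [RP] here, because inside square brackets | is a literal character rather than "or". The function name and the sample IDs are hypothetical.

```python
import re

# 'R' or 'P', then exactly 6 digits, then one uppercase letter
ID_PATTERN = re.compile(r"^[RP]\d{6}[A-Z]$")

# Same pattern with [R|P], kept only to show the difference
LOOSE_PATTERN = re.compile(r"^[R|P]\d{6}[A-Z]$")


def is_valid_id(value):
    """Return True if value has the R/P + 6 digits + letter shape."""
    return ID_PATTERN.match(value) is not None


print(is_valid_id("R123456A"))  # True
print(is_valid_id("X123456A"))  # False: wrong first letter
print(is_valid_id("R12345A"))   # False: only 5 digits
print(is_valid_id("|123456A"))  # False
print(LOOSE_PATTERN.match("|123456A") is not None)  # True: '|' slips through
```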

".*\.(png|jpg|jpeg|gif)$|^https://graph.facebook.com/\d+/picture$"

 .*                   :  zero or more non-newline characters
 \.                   :  literal '.' character
 (png|jpg|jpeg|gif)$  :  ends with one of 'png', 'jpg', 'jpeg', 'gif'
 |                    :  OR
 ^https://graph.face..:  begins with Facebook Graph URL ...
 \d+                  :  has a sequence of one or more digits
 /picture$            :  ends with '/picture'
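A sketch of the image pattern wrapped in a Python helper. The pattern is copied as shown; the function name and the example.com URLs are made up for illustration.

```python
import re

IMAGE_URL = re.compile(
    r".*\.(png|jpg|jpeg|gif)$|^https://graph.facebook.com/\d+/picture$"
)


def looks_like_image(url):
    """Return True if url ends in an image extension or is a Graph picture URL."""
    return IMAGE_URL.match(url) is not None


print(looks_like_image("http://example.com/cat.png"))                 # True
print(looks_like_image("https://graph.facebook.com/12345/picture"))   # True
print(looks_like_image("https://graph.facebook.com/12345/pictures"))  # False
print(looks_like_image("http://example.com/cat.pdf"))                 # False
print(looks_like_image("http://example.com/CAT.PNG"))                 # False
```

The last case fails because the match is case sensitive; passing re.I to re.compile would accept it. The dots in graph.facebook.com are also unescaped, so each one matches any character; writing them as \. would make the check stricter.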

A note on cases

Regex operators are case sensitive, so

\d is different from \D

and

\s is different from \S

Regex

By Ruiwen Chua
