Regular Expressions
What are they?
Regular expressions are patterns that can be
used to match against a string.
What are they good for?
- Matching - Does this string contain this pattern?
- Extracting data - What does a string with this pattern have as a value for this piece of the pattern?
This talk will mostly cover matching.
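As a rough Ruby sketch of the two uses listed above: the helper names and sample strings are made up for illustration, and the capture group in the second helper uses syntax that is covered later in the talk.

```ruby
# Matching: does the string contain the pattern?
# (contains_year? is a hypothetical helper name.)
def contains_year?(text)
  text.match?(/\d{4}/)
end

# Extracting: which value filled that piece of the pattern?
# Returns the captured text, or nil when nothing matched.
def extract_year(text)
  match = text.match(/(\d{4})/)
  match && match[1]
end

puts contains_year?("Released in 2013")  # true
p extract_year("Released in 2013")       # "2013"
p extract_year("No year here")           # nil
```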
What do they look like?
/pattern/modifiers
- Slashes (/) mark the beginning and end of the pattern.
- The pattern defines the set of strings this regex will match.
- Modifiers (flags) change attributes of how the pattern matches a string.
Project
Does a string contain "allen"?
Write a regular expression to check.
Literal Characters
/allen/
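A minimal Ruby sketch of this check, assuming a hypothetical helper name and made-up sample strings:

```ruby
# Returns true when the text contains the literal sequence "allen".
# contains_allen? is a hypothetical name used only for this sketch.
def contains_allen?(text)
  text.match?(/allen/)
end

puts contains_allen?("woody allen")  # true
puts contains_allen?("alan")         # false
```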
New Requirements
Match 'allan' as well.
Character Classes
/all[ea]n/
/all[^b-df-z]n/
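The two patterns above are not equivalent, which a short Ruby sketch can show. The helper names and inputs are illustrative assumptions.

```ruby
# True when the text contains "allen" or "allan".
def allen_or_allan?(text)
  text.match?(/all[ea]n/)
end

# The negated class accepts anything except b-d and f-z in that position,
# so it is much looser than [ea]: digits and uppercase letters also pass.
def loose_match?(text)
  text.match?(/all[^b-df-z]n/)
end

puts allen_or_allan?("allan")  # true
puts allen_or_allan?("allon")  # false
puts loose_match?("allan")     # true
puts loose_match?("all9n")     # true: 9 is not in b-d or f-z
```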
Ranges
/[a-z][A-Z][0-9]/
Negation
/[^a]/
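A small Ruby sketch of ranges and negation, with hypothetical helper names and inputs:

```ruby
# A lowercase letter, then an uppercase letter, then a digit.
def lower_upper_digit?(text)
  text.match?(/[a-z][A-Z][0-9]/)
end

# At least one character that is not a lowercase "a".
def has_non_a?(text)
  text.match?(/[^a]/)
end

puts lower_upper_digit?("xY7")  # true
puts lower_upper_digit?("xy7")  # false
puts has_non_a?("aaa")          # false
puts has_non_a?("aab")          # true
```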
i Modifier
i means case insensitive
/a/i # Matches a and A
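Applied to the earlier project, the i modifier might look like this in Ruby (the helper name is an assumption for illustration):

```ruby
# Case-insensitive check for "allen" or "allan".
def mentions_allen?(text)
  text.match?(/all[ea]n/i)
end

puts mentions_allen?("Woody ALLEN")  # true
puts mentions_allen?("Allan")        # true
puts mentions_allen?("Alan")         # false
```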
New Project
Does the string contain a number?
Shorthand Character Classes
/\d/
What's available?
/\w/ # Same as /[A-Za-z0-9_]/
/\W/ # Same as /[^A-Za-z0-9_]/
/\d/ # Same as /[0-9]/
/\D/ # Same as /[^0-9]/
/\s/ # Same as /[ \t\r\n\f\v]/ in Ruby
/\S/ # Same as /[^ \t\r\n\f\v]/ in Ruby
/./ # Same as /[^\n]/
Can be nested inside a character class
/[\D\S]/ # Anything that is not a digit or not whitespace
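A short Ruby sketch of the shorthand classes in use; the helper names and sample strings are hypothetical.

```ruby
# Does the string contain a number (at least one digit)?
def contains_digit?(text)
  text.match?(/\d/)
end

# String#scan returns every non-overlapping match as an array.
def words(text)
  text.scan(/\w+/)
end

puts contains_digit?("room 42")   # true
puts contains_digit?("no rooms")  # false
p words("snake_case and 2 more")  # ["snake_case", "and", "2", "more"]
```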
m modifier
/./ # Same as /[^\n]/
/./m # Matches every character, including \n
In Ruby, the m modifier makes the dot match newlines too. This behavior is Ruby only: in JavaScript, m changes how ^ and $ match instead.
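The effect of Ruby's m modifier on the dot can be checked with a small sketch like this one (the helper name and keyword argument are assumptions for illustration):

```ruby
# Tests "a, any one character, b" with or without Ruby's m modifier.
def a_dot_b?(text, multiline:)
  pattern = multiline ? /a.b/m : /a.b/
  text.match?(pattern)
end

puts a_dot_b?("a-b", multiline: false)   # true
puts a_dot_b?("a\nb", multiline: false)  # false: the dot skips newlines
puts a_dot_b?("a\nb", multiline: true)   # true: with m the dot matches \n
```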
Repetition
Let's make our number match more robust
/-?[1-9]\d*/ # Can match -1, 90, 7777, etc...
What's available?
/\d?/ # zero or one times aka optional
/\d+/ # one or more times
/\d*/ # zero or more times
/\d{4}/ # exactly four times
/\d{2,4}/ # two to four times (no space inside the braces)
/\d{2,}/ # at least two times
/\d{0,2}/ # at most two times
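A brief Ruby sketch of the quantifiers, using String#[] with a pattern to return the matched text. The helper names and inputs are hypothetical.

```ruby
# Returns the first number-like run, or nil if there is none.
def first_number(text)
  text[/-?[1-9]\d*/]
end

# Returns the first run of two to four digits, or nil.
def two_to_four_digits(text)
  text[/\d{2,4}/]
end

p first_number("balance: -42 dollars")  # "-42"
p first_number("no digits")             # nil
p two_to_four_digits("12345")           # "1234": stops after four digits
p two_to_four_digits("7")               # nil: one digit is too few
```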
Greedy vs Lazy
Let's do a match against this string:
bobbobbob
Greedy (default)
/bob.*bob/ # Matches bobbobbob
Lazy
/bob.*?bob/ # Matches bobbob
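The difference can be reproduced in Ruby with a sketch like the following (helper names are assumptions):

```ruby
# Greedy: .* takes as much as it can while still allowing a match.
def greedy_match(text)
  text[/bob.*bob/]
end

# Lazy: .*? takes as little as it can.
def lazy_match(text)
  text[/bob.*?bob/]
end

p greedy_match("bobbobbob")  # "bobbobbob"
p lazy_match("bobbobbob")    # "bobbob"
```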
Anchors
Let's check that our string only contains a number
/^-?[1-9]\d*$/ # Can match 1, -90, 7777, etc...
What's available?
/^\d/ # digit at the beginning of the line
/\d$/ # digit at the end of the line
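/\b\d/ # word boundary: one side matches \w, the other matches \W or the start/end of the string
/\B\d/ # not a word boundary: both sides match \w, or neither side does
/\A\d/ # digit at the beginning of the string (Ruby, not JavaScript)
/\d\Z/ # digit at the end of the string, ignoring one trailing newline (Ruby, not JavaScript)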
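In Ruby, ^ and $ anchor to lines while \A and \Z anchor to the whole string, which matters for multi-line input. A small sketch with hypothetical helper names and inputs:

```ruby
# True when any single line of the text is a number.
def number_on_some_line?(text)
  text.match?(/^-?[1-9]\d*$/)
end

# True when the whole string is a number (one trailing newline allowed).
def whole_string_number?(text)
  text.match?(/\A-?[1-9]\d*\Z/)
end

puts number_on_some_line?("abc\n42")  # true: the second line is a number
puts whole_string_number?("abc\n42")  # false
puts whole_string_number?("42\n")     # true: \Z allows one trailing newline

# Word boundaries: 9 follows a word character, 7 follows a space.
p "x9 7".scan(/\b\d/)  # ["7"]
p "x9 7".scan(/\B\d/)  # ["9"]
```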
Groups & Alternation
Let's improve our number matching
/^(-?[1-9]\d*(\.\d*[1-9])?|0)$/ # Matches 0, 42, -3.14, but not 007 or 3.10
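A Ruby sketch of this pattern, assuming the decimal group is meant to be optional (hence the ? after it) so that plain integers still match. The helper name is hypothetical.

```ruby
# Either a number with no leading zeros and an optional decimal part
# that does not end in zero, or a lone 0.
def number?(text)
  text.match?(/^(-?[1-9]\d*(\.\d*[1-9])?|0)$/)
end

puts number?("0")      # true
puts number?("42")     # true
puts number?("-3.14")  # true
puts number?("3.10")   # false: trailing zero in the decimal part
puts number?("007")    # false: leading zeros
```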
Backreferences
Lets you refer to groups that have already been matched.
A good use case is matching XML.
/<([\w\-]+)><\/\1>/i # Matches <regex></regex> (the / in the closing tag must be escaped)
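A minimal Ruby sketch of the backreference, with the forward slash of the closing tag escaped so it does not end the regex literal. The helper name is an assumption for illustration.

```ruby
# Returns the tag name of an empty element whose closing tag repeats the
# opening tag's name, or nil when the names differ.
def empty_element_name(text)
  match = text.match(/<([\w\-]+)><\/\1>/i)
  match && match[1]
end

p empty_element_name("<regex></regex>")  # "regex"
p empty_element_name("<regex></other>")  # nil
```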
Lookaround
Allows you to specify your own anchors. Ruby can look both ahead and behind. Older JavaScript engines can only look ahead; look behind was added in ES2018.
Look ahead
/bob(?=by)/ # Matches the bob in bobby, but not a bob on its own
Negative look ahead
/bob(?!by)/ # Matches bob in bober, but not bob in bobby
Look behind
/(?<=billy)bob/ # Matches bob in billybob, but not bob in jimmybob
Negative look behind
/(?<!billy)bob/ # Matches bob in jimmybob, but not bob in billybob
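The four lookarounds can be tried side by side in Ruby. In this sketch the LOOKAROUNDS table and the matched_text helper are hypothetical names; note that the text inside a lookaround is checked but never becomes part of the match.

```ruby
# The four patterns from the slides, keyed by an illustrative label.
LOOKAROUNDS = {
  lookahead: /bob(?=by)/,
  negative_lookahead: /bob(?!by)/,
  lookbehind: /(?<=billy)bob/,
  negative_lookbehind: /(?<!billy)bob/
}.freeze

# Returns the matched text for the chosen pattern, or nil.
def matched_text(kind, text)
  text[LOOKAROUNDS.fetch(kind)]
end

p matched_text(:lookahead, "bobby")               # "bob": "by" is not consumed
p matched_text(:lookahead, "bob")                 # nil
p matched_text(:negative_lookahead, "bober")      # "bob"
p matched_text(:negative_lookahead, "bobby")      # nil
p matched_text(:lookbehind, "billybob")           # "bob"
p matched_text(:lookbehind, "jimmybob")           # nil
p matched_text(:negative_lookbehind, "jimmybob")  # "bob"
p matched_text(:negative_lookbehind, "billybob")  # nil
```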
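Special Characters
These characters have special meaning in a pattern. Escape them with a backslash to match them literally.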
Bracket [
Backslash \
Forward slash /
Caret ^
Dollar $
Dot .
Pipe |
Question Mark ?
Star *
Plus +
Parens ( and )
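As a sketch of literal matching in Ruby: a backslash escapes a single special character by hand, and Regexp.escape escapes a whole string. The contains_literal? helper and the sample strings are assumptions for illustration.

```ruby
# True when the text contains the literal string, with every special
# character in it escaped by Regexp.escape.
def contains_literal?(text, literal)
  text.match?(Regexp.new(Regexp.escape(literal)))
end

puts "abc".match?(/a.c/)   # true: the dot matches any character
puts "abc".match?(/a\.c/)  # false: the escaped dot matches only "."

puts contains_literal?("total: $5.00", "$5.00")  # true
puts contains_literal?("total: $5x00", "$5.00")  # false
```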
Useful links
Comprehensive Regular Expression Info
http://www.regular-expressions.info/
Regular Expression Testers
http://rubular.com/
http://scriptular.com/
Regex State Machine
http://www.regexper.com/
Regex Crosswords
http://regexcrossword.com/
Regular Expressions
By blatyo