Regular Expressions

What are they?

Regular expressions are patterns that can be 
used to match against a string.

What are they good for?



  • Matching - Does this string contain this pattern?
  • Extracting data - Given a string that matches this pattern, what value does it have for a particular part of the pattern?

This talk will mostly cover matching.

What do they look like?



/pattern/modifiers

  • Slash (/) denotes the beginning and end of the pattern.
  • The pattern defines the set of strings this regex will match.
  • Modifiers (flags) change how the pattern matches a string.
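The talk uses Ruby's syntax, so here is a minimal Ruby sketch of the /pattern/modifiers form. It shows that a literal with the i modifier and a regex built at runtime with Regexp.new and the Regexp::IGNORECASE option carry the same pattern and the same flag. The variable names are only for illustration.

```ruby
# Regex literal: the pattern sits between the slashes, modifiers follow the closing slash.
literal = /allen/i

# The same regex built at runtime; the i modifier becomes Regexp::IGNORECASE.
built = Regexp.new("allen", Regexp::IGNORECASE)

p literal.source     # => "allen"
p literal.casefold?  # => true (the i modifier is set)
p built.source       # => "allen"
p built.casefold?    # => true
```

Regexp.new is useful when the pattern is only known at runtime. For fixed patterns, the literal form reads better.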

Project

Does a string contain "allen"?

Write a regular expression to check.

Literal Characters




 /allen/
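A minimal Ruby sketch of the project check follows. The helper name contains_allen? and the sample strings are hypothetical. String#match? (Ruby 2.4+) answers the yes/no question, and =~ returns the index where the match starts, or nil.

```ruby
# Hypothetical helper: true if the text contains the literal characters "allen".
def contains_allen?(text)
  text.match?(/allen/) # String#match? returns true or false
end

p contains_allen?("Say hello to allen")  # => true
p contains_allen?("Say hello to Allen")  # => false (literal matching is case-sensitive)

# =~ returns the index where the match starts, or nil if there is none.
p "Say hello to allen" =~ /allen/         # => 13
```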

New Requirements



Match 'allan' as well.

Character Classes


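 /all[ea]n/       # Matches allen or allan
 /all[^b-df-z]n/  # Any character except b-d and f-z between "all" and "n"
Ranges
 /[a-z][A-Z][0-9]/ # A lowercase letter, then an uppercase letter, then a digit
Negation
 /[^a]/ # Any character except a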
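This Ruby sketch tries the character classes above against a few made-up names. It also shows the catch in the negated version: [^b-df-z] excludes only those lowercase letters, so it also accepts digits and other characters.

```ruby
names = ["allen", "allan", "allin", "all1n"]

# [ea] accepts exactly one character, either e or a.
p names.select { |n| n.match?(/all[ea]n/) }        # => ["allen", "allan"]

# [^b-df-z] accepts any single character except b-d and f-z,
# so it also lets through digits, uppercase letters, punctuation, etc.
p names.select { |n| n.match?(/all[^b-df-z]n/) }   # => ["allen", "allan", "all1n"]

# Ranges: a lowercase letter, then an uppercase letter, then a digit.
p "aZ3".match?(/[a-z][A-Z][0-9]/)  # => true

# Negation: any character other than a.
p "aaa".match?(/[^a]/)  # => false
p "abc".match?(/[^a]/)  # => true
```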

i Modifier


i means case insensitive

 /a/i # Matches a and A
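A short Ruby check of the i modifier. The sample strings are illustrative.

```ruby
p "ALLEN".match?(/allen/)       # => false
p "ALLEN".match?(/allen/i)      # => true
p "Allan".match?(/all[ea]n/i)   # => true (i also applies inside character classes)
```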

New Project

Does the string contain a number?

Shorthand Character Classes


 /\d/
What's available?
/\w/ # Same as /[A-Za-z0-9_]/
/\W/ # Same as /[^A-Za-z0-9_]/
/\d/ # Same as /[0-9]/
/\D/ # Same as /[^0-9]/
/\s/ # Same as /[ \t\r\n\f\v]/ (Ruby)
/\S/ # Same as /[^ \t\r\n\f\v]/ (Ruby)
/./  # Same as /[^\n]/
Can be used inside a character class
 /[\D\S]/ # Not a digit or not whitespace, which is every character
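A minimal Ruby sketch of the number project using \d, plus a look at \w and at the [\D\S] class. The sample strings are made up. String#scan returns every non-overlapping match.

```ruby
text = "Order 66 shipped on day 3"

p text.match?(/\d/)             # => true (the string contains a number)
p "no digits here".match?(/\d/) # => false
p text.scan(/\d/)               # => ["6", "6", "3"] (every single digit)
p "a_1 b-2".scan(/\w/)          # => ["a", "_", "1", "b", "2"]

# [\D\S] means "not a digit OR not whitespace"; no character is both a digit
# and whitespace, so this class matches every character, newline included.
p "a 1\n".scan(/[\D\S]/).length # => 4
```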

m modifier


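 /./ # same as /[^\n]/ - except with /./m

Except when we use the m modifier: with it, . matches every character, including \n. This meaning of m is Ruby only. In JavaScript, m makes ^ and $ match at line breaks, and the s flag is what lets . match newlines.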
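A small Ruby sketch of the difference, using a made-up two-line string. String#[] with a regex returns the matched text, or nil when there is no match.

```ruby
text = "first\nsecond"

p text[/first.second/]    # => nil (. does not match \n by default)
p text[/first.second/m]   # => "first\nsecond" (Ruby's m lets . match \n)
```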

Repetition

Let's make our number match more robust
 /-?[1-9]\d*/ # Can match -1, 90, 7777, etc...

What's available?
/\d?/     # zero or one times, aka optional
/\d+/     # one or more times
/\d*/     # zero or more times
/\d{4}/   # exactly four times
/\d{2,4}/ # two to four times (no space after the comma)
/\d{2,}/  # at least two times
/\d{0,2}/ # at most two times
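A Ruby sketch that pulls matches out of sample strings to show what each quantifier accepts. The strings are illustrative only.

```ruby
number = /-?[1-9]\d*/

p "Temperature: -12 degrees"[number]  # => "-12"
p "Room 7777"[number]                  # => "7777"
p "12345"[/\d{2,4}/]   # => "1234" (greedy: takes as many digits as allowed)
p "7"[/\d{2,}/]        # => nil    (needs at least two digits)
p "x"[/\d*/]           # => ""     (zero repetitions is still a match)
p "x"[/\d+/]           # => nil    (+ needs at least one digit)
```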

Greedy vs Lazy

Let's match against this string:
bobbobbob

Greedy (default)
 /bob.*bob/ # Matches bobbobbob

Lazy
 /bob.*?bob/ # Matches bobbob
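The same comparison in Ruby. String#[] returns the text each pattern actually matched.

```ruby
text = "bobbobbob"

p text[/bob.*bob/]   # => "bobbobbob" (greedy: .* grabs as much as it can, then backs off)
p text[/bob.*?bob/]  # => "bobbob"    (lazy: .*? takes as little as it can)
```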


Anchors

Let's check that our string only contains a number
 /^-?[1-9]\d*$/ # Can match 1, -90, 7777, etc...

What's available?
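/^\d/    # digit at the beginning of a line
/\d$/    # digit at the end of a line
/\b\d/   # word boundary: one side matches \w, the other matches \W (or is the start/end of the string)
/\B\d/   # not a word boundary: both sides match \w, or neither does
/\A\d/   # digit at the beginning of the string (not in JavaScript)
/\d\z/   # digit at the very end of the string (not in JavaScript)
/\d\Z/   # digit at the end of the string, or before a final newline (not in JavaScript)

In Ruby, ^ and $ always match at line breaks; use \A and \z to anchor the whole string. In JavaScript, ^ and $ match only at the start and end of the string unless the m flag is set.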
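A Ruby sketch of why the anchor choice matters when checking that a string only contains a number, followed by \Z vs \z and \b vs \B. The input strings are made up for illustration.

```ruby
input = "42\nnot a number"

# In Ruby, ^ and $ match at line breaks, so a multi-line string can slip through.
p input.match?(/^-?[1-9]\d*$/)    # => true
# \A and \z anchor the whole string.
p input.match?(/\A-?[1-9]\d*\z/)  # => false
p "42".match?(/\A-?[1-9]\d*\z/)   # => true

# \Z also allows a single trailing newline; \z does not.
p "42\n".match?(/\d\Z/)  # => true
p "42\n".match?(/\d\z/)  # => false

# \b vs \B: where does "cat" start?
p "cat scatter" =~ /\bcat/  # => 0 (word boundary before the first cat)
p "cat scatter" =~ /\Bcat/  # => 5 (the cat inside scatter)
```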

Groups & Alternation


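Parentheses group part of a pattern so a quantifier or alternation applies to the whole group; | matches either the pattern on its left or the one on its right.

Let's improve our number matching
 /^(-?[1-9]\d*(\.\d*[1-9])?|0)$/ # Matches 0, 42, -7, 3.14; not 007 or 1.50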
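A Ruby sketch of the number pattern. This version makes the decimal group optional with ? so whole numbers match too. It also uses the groups for the "extracting data" case from the start of the talk: MatchData#[] returns what each numbered group captured. The test strings are illustrative.

```ruby
# The decimal group (\.\d*[1-9]) is followed by ? so whole numbers match too.
decimal = /^(-?[1-9]\d*(\.\d*[1-9])?|0)$/

%w[0 42 -7 3.14 007 1.50].each do |s|
  puts "#{s}: #{s.match?(decimal)}"
end
# Prints true for 0, 42, -7 and 3.14; false for 007 and 1.50

# Groups also capture what they matched, which is how regexes extract data.
m = "3.14".match(decimal)
p m[1]  # => "3.14" (group 1: the whole alternative)
p m[2]  # => ".14"  (group 2: the decimal part)
p "42".match(decimal)[2]  # => nil (the optional group did not take part)
```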

Backreferences

Lets you refer to groups that have already been matched: \1 must match the same text the first group matched.

A good use case is matching simple XML tag pairs.

 /<([\w\-]+)><\/\1>/i # Matches <regex></regex>
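A Ruby sketch of the tag-pair backreference, plus a second use: finding doubled letters. In the tag pattern, the slash of the closing tag is escaped as \/ because an unescaped / would end a Ruby regex literal. The sample strings are illustrative.

```ruby
# The / in the closing tag is escaped as \/ so it does not end the regex literal.
tag_pair = /<([\w\-]+)><\/\1>/i

p "<regex></regex>".match?(tag_pair)   # => true
p "<regex></regexp>".match?(tag_pair)  # => false (\1 must repeat "regex" exactly)
p "<b></b>"[tag_pair, 1]               # => "b" (the text captured by group 1)

# The same idea finds doubled letters.
p "bookkeeper".scan(/(\w)\1/)          # => [["o"], ["k"], ["e"]]
```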

Lookaround

Allows you to specify your own anchors: a lookaround checks what comes before or after a position without making it part of the match. JavaScript originally supported only lookahead (lookbehind arrived in ES2018); Ruby supports both.

      Look ahead
 /bob(?=by)/ # Matches the bob in bobby, but not a bob on its own
      Negative look ahead
 /bob(?!by)/ # Matches bob in bober, but not bob in bobby
      Look behind
 /(?<=billy)bob/ # Matches bob in billybob, but not bob in jimmybob
      Negative look behind
 /(?<!billy)bob/ # Matches bob in jimmybob, but not in billybob
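The four lookarounds in Ruby. String#[] shows that only "bob" is returned, never the looked-at text, and =~ gives the start index.

```ruby
# Lookahead: check what follows without including it in the match.
p "bobby"[/bob(?=by)/]   # => "bob"
p "bob"[/bob(?=by)/]     # => nil
p "bober"[/bob(?!by)/]   # => "bob"
p "bobby"[/bob(?!by)/]   # => nil

# Lookbehind: check what precedes; only "bob" is returned, not "billy".
p "billybob"[/(?<=billy)bob/]   # => "bob"
p "jimmybob"[/(?<=billy)bob/]   # => nil
p "jimmybob" =~ /(?<!billy)bob/ # => 5
p "billybob" =~ /(?<!billy)bob/ # => nil
```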

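Special Characters

These characters have special meaning in a pattern. Escape them with a backslash (for example \. or \$) to match them literally.

Bracket [
Brace {
Backslash \
Forward slash /
Caret ^
Dollar $
Dot .
Pipe |
Question Mark ?
Star *
Plus +
Parens ( and )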
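A Ruby sketch of escaping. It escapes a dot by hand, then uses Regexp.escape to build a pattern from a string full of special characters. The price string is a made-up example.

```ruby
# Escaping by hand: \. matches a literal dot, a bare . matches any character.
p "1x5".match?(/1.5/)   # => true
p "1x5".match?(/1\.5/)  # => false

# Regexp.escape backslash-escapes special characters in a string.
p Regexp.escape("1.5+2")  # => "1\\.5\\+2"

price = "$1.50 (approx.)"
pattern = Regexp.new(Regexp.escape(price))
p "Total: $1.50 (approx.)".match?(pattern)  # => true
```

Regexp.escape is handy whenever part of a pattern comes from user input, because it stops characters like . or ( from being read as regex syntax.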

Useful links

Comprehensive Regular Expression Info
http://www.regular-expressions.info/

Regular Expression Testers
http://rubular.com/
http://scriptular.com/

Regex Visualizer (railroad diagrams)
http://www.regexper.com/

Regex Crosswords
http://regexcrossword.com/

Regular Expressions

By blatyo