presented by Kris Koskelin
kkoskelin@gmail.com

A powerful tool for string manipulation


  • Identify substrings and patterns in a subject string
  • Perform simple or complex search & replace
  • Quick & dirty data transformation

If you're old school, some of these regex tools are probably familiar to you:
 grep, sed & AWK

But I still don't understand...


Some of my own (Perl)

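open(SRC, "< $src") or die "$!";   # $src: source file path, assumed set earlier
my @sources;
while (my $line = <SRC>) {
  my $original = $line;
  # trim leading/trailing whitespace, including the newline
  ($line = $original) =~ s/^\s*(.*?)\s*$/$1/g;
  # on lines with alt="...", keep only that text, minus trailing periods/commas
  $line =~ s/^.+alt="([^"]+?)[.,]*".*$/$1/sg;
  push @sources, $line;
}
close(SRC);

open(TTS, "< $tts") or die "$!";
my @meta;
{
  local $/ = undef;   # slurp mode: read the whole file in one go
  my $tmp = <TTS>;
  # every run of 3+ digits/dots that directly follows a "["
  @meta = ($tmp =~ m#\[([\d\.]{3,})#g);
}
close(TTS);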

... or is it more like this?




creating regexp objects

There's also more than one way to create a Regular Expression object in JavaScript.

RegExp constructor: runtime compilation (the pattern is a string, so backslashes must be doubled)
var myRe1 = new RegExp("reg(ular)?\\s*exp(ression)?");
RegExp literal: compilation on script load
var myRe2 = /reg(ular)?\s*exp(ression)?/;
This instantiates a new RegExp object, too! match() compiles a string argument into a RegExp:
if ( url.match("MadJS") ) ...
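A quick sketch of two gotchas around creating regexes (the sample strings are only illustrations): in the constructor form the pattern is an ordinary JavaScript string, so `\s` has to be written `\\s`; and a plain string handed to `match()` is compiled as a pattern, so characters such as `+` stop meaning themselves.

```javascript
// Constructor form: the pattern arrives as a string literal, so "\s" is
// just "s" and the whitespace class is silently lost.
const broken = new RegExp("reg(ular)?\s*exp(ression)?");
// Doubling the backslash keeps \s in the pattern text.
const fixed = new RegExp("reg(ular)?\\s*exp(ression)?", "i");
// Literal form: no string-escaping layer; flags go after the closing slash.
const literal = /reg(ular)?\s*exp(ression)?/i;

console.log(broken.test("regular expression")); // false
console.log(fixed.test("Regular Expression"));  // true
console.log(fixed.source === literal.source);   // true

// match() compiles a string argument into a RegExp,
// so "+" acts as a quantifier rather than a literal plus sign.
console.log("a+b".match("a+b"));   // null
console.log("a+b".indexOf("a+b")); // 0
```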

Some basic concepts

substring: /foo/
anchoring: /^foo$/
capturing: /foo(ba.)/
alternation: /find(this|that)/
character classes: /[0-9a-f]/
character class negation: /[^g-z]/
quantifiers: * ? + {n} {n,} {n,m}
special characters: . \b \d \w \s
And their inverses: \B \D \W \S
pattern modifiers: /regex/gimy
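Each concept in the list can be tried against a sample string with `test()`. The patterns and inputs below are illustrative choices, not taken from any real project:

```javascript
// Each entry pairs a pattern with a sample string; the comment gives test()'s result.
const examples = [
  [/foo/, "xfoox"],                // substring: true (matches anywhere)
  [/^foo$/, "foo"],                // anchoring: true (whole string only)
  [/find(this|that)/, "findthat"], // alternation: true
  [/[0-9a-f]/, "z9"],              // character class: true ("9")
  [/[^g-z]/, "ghi"],               // negated class: false (every char is in g-z)
  [/^\d{3}-\d{4}$/, "555-1212"],   // {n} quantifier: true
  [/\bcat\b/, "concatenate"],      // \b word boundary: false ("cat" is mid-word)
  [/FOO/i, "foo"],                 // i modifier: true
];

for (const [re, s] of examples) {
  console.log(String(re), JSON.stringify(s), re.test(s));
}

// Capturing: the group's text is available separately from the full match.
const m = /foo(ba.)/.exec("xfoobaz");
console.log(m[0], m[1]); // foobaz baz
```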

Advanced concepts

back-references 

 /<([^>]+)>foo<\/\1>/
would match  <span>foo</span> 
or <div>foo</div>
but not  <span>foo</div>
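A minimal check of the back-reference pattern, plus one case it misses. This demonstrates `\1`; it is not a way to parse HTML:

```javascript
// \1 must repeat exactly the text captured by the first group.
const sameTag = /<([^>]+)>foo<\/\1>/;

console.log(sameTag.test("<span>foo</span>")); // true
console.log(sameTag.test("<div>foo</div>"));   // true
console.log(sameTag.test("<span>foo</div>")); // false

// The group captures everything up to ">", attributes included,
// so the closing tag would have to repeat them too.
console.log(sameTag.test('<div class="x">foo</div>')); // false
```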

look-ahead, negative look-ahead assertions

 x(?=y)   x(?!y)
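A small sketch of both assertions; the inputs and the 8-character rule are hypothetical examples. The look-ahead only checks what follows and does not include it in the match:

```javascript
// x(?=y): digits that are followed by "px"; the "px" itself is not matched.
const pxNumber = /\d+(?=px)/;
console.log("width: 120px".match(pxNumber)[0]); // 120

// x(?!y): "foo" only when it is NOT followed by "bar".
const fooNotBar = /foo(?!bar)/g;
console.log("foobar foobaz".replace(fooNotBar, "FOO")); // foobar FOObaz

// Stacked look-aheads at the start: a hypothetical rule requiring at least
// one digit, at least one letter, and 8+ characters overall.
const hasDigitAndLetter = /^(?=.*\d)(?=.*[a-z]).{8,}$/i;
console.log(hasDigitAndLetter.test("abcdefg1")); // true
console.log(hasDigitAndLetter.test("abcdefgh")); // false
```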

Regular Expressions in Javascript

There's more than one way to skin a cat.

RegExp.prototype.test
RegExp.prototype.exec
String.prototype.search
String.prototype.match
String.prototype.replace
String.prototype.split

Visit now: http://jsperf.com/regexp-test-v-match

RegExp.prototype.test


var myRe = /foo(bar)?baz/i;

myRe.test("foobaz");    // true
myRe.test("foobarbaz"); // true
myRe.test("foobar");    // false (no "baz")
myRe.test("foobbaz");   // false

Cheap: it returns only true or false, so no match array is built.
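One caveat worth knowing about `test()`: with the `g` (or `y`) flag it is stateful, because each successful call records where it stopped in the regex's `lastIndex` property. A short demonstration with a made-up pattern:

```javascript
// With g, a successful test() leaves re.lastIndex just past the match,
// and the next call starts searching from there.
const globalRe = /foo/g;
console.log(globalRe.test("foo"), globalRe.lastIndex); // true 3
console.log(globalRe.test("foo"), globalRe.lastIndex); // false 0 (search began at 3, then reset)
console.log(globalRe.test("foo"), globalRe.lastIndex); // true 3

// For a plain yes/no check, leave the g flag off and test() keeps no state.
const plainRe = /foo/;
console.log(plainRe.test("foo"), plainRe.test("foo")); // true true
```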

RegExp.prototype.exec

var re = /\D(\d+)/g;
var phone = "call 800-555-1212";

re.exec(phone); // [" 800", "800"]
re.exec(phone); // ["-555", "555"]
re.exec(phone); // ["-1212", "1212"]
re.exec(phone); // null


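Each call to exec() returns the full match and the captured string; because of the g flag it resumes where the previous call stopped, and it returns null once no matches remain.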
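Calling `exec()` in a loop is the usual way to walk every match and its groups. A sketch, where `captureAll` is a hypothetical helper that collects only the first capture group:

```javascript
// Hypothetical helper: returns the first capture group of every match.
function captureAll(re, str) {
  // Without g, exec() ignores lastIndex and the loop would never end.
  if (!re.global) {
    throw new TypeError("captureAll expects a regex with the g flag");
  }
  const found = [];
  re.lastIndex = 0; // start at the beginning even if re was used before
  let m;
  while ((m = re.exec(str)) !== null) {
    found.push(m[1]); // m[0] is the full match, m[1] the first group
    // An empty match would not advance lastIndex; step past it manually.
    if (m[0] === "") {
      re.lastIndex++;
    }
  }
  return found;
}

console.log(captureAll(/\D(\d+)/g, "call 800-555-1212")); // ["800", "555", "1212"]
```

Engines with ES2020 support also offer `String.prototype.matchAll` for the same job, e.g. `Array.from(str.matchAll(re), m => m[1])`.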

String.prototype.search

var myRe = /(mad)\s\w+$/i;
"mad, mad world".search(myRe); 
Result: 5, the index of the first match (-1 if nothing matches).
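Two details of `search()` that are easy to trip over, sketched with the same pattern: it returns `-1` rather than `null` on failure, and it ignores the `g` flag, always reporting the first match. Since `0` is falsy and `-1` is truthy, compare against `-1` explicitly:

```javascript
const myRe = /(mad)\s\w+$/i;
console.log("mad, mad world".search(myRe)); // 5
console.log("a sane world".search(myRe));   // -1 (no match; not null)

// search() ignores g and lastIndex: every call reports the first match.
const globalMad = /mad/g;
console.log("mad, mad world".search(globalMad), "mad, mad world".search(globalMad)); // 0 0

// Pitfall: a match at index 0 is falsy in a condition.
const at = "mad world".search(/mad/);
console.log(at, Boolean(at), at !== -1); // 0 false true
```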

String.prototype.match

The grand-daddy of abused JS regex methods.
var content = "case-oriented methods compare cases and consider combinations or conjectures of causal conditions";
var re = /(c\w+)\s[^c]\w+/ig;
content.match(re);
// ["cases and", "combinations or", "conjectures of"]

Did you actually need to know what was matched?
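If the answer is "no", `test()` gives the same yes/no result without building an array of matches. A sketch reusing the sentence from this slide (the `g` flag is dropped for `test()` so no `lastIndex` state carries between calls):

```javascript
const content = "case-oriented methods compare cases and consider combinations or conjectures of causal conditions";

// match() with g builds an array of every match, only to be used as a boolean.
if (content.match(/(c\w+)\s[^c]\w+/ig)) {
  console.log("found via match()");
}

// test() answers the same yes/no question and returns a plain boolean.
const re = /(c\w+)\s[^c]\w+/i;
if (re.test(content)) {
  console.log("found via test()");
}
```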

String.prototype.replace

Regexes, not just strings!
rock.replace("Bon Scott", "Brian Johnson"); // string pattern: replaces only the first literal occurrence
rock.replace(/Bon Scott/gi, "Brian Johnson"); // regex pattern: flags such as g and i apply

Use back-references. Use pattern modifiers.
var rock = "Singers for AC/DC have included BON SCOTT and Brian Johnson.";
rock.replace(/(Bon|Brian)\s+(Scott|Johnson)/ig, "Mr. $2");
// "Singers for AC/DC have included Mr. SCOTT and Mr. Johnson."
var roll = "I enjoy many many kinds of of music.";
roll.replace(/(\w+)\s*\1/g, "!$1");
// "I enjoy !many kinds !of music."
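The doubled-word pattern also fires on doubled letters inside a single word, because `\s*` allows zero spaces. A sketch of a stricter variant using `\s+` and `\b`, plus a replacement function in place of the `"!$1"` string (the extra sample sentence is made up):

```javascript
// Original pattern: \s* allows zero spaces, so doubled letters inside one
// word also count as a "repeat".
console.log("a class act".replace(/(\w+)\s*\1/g, "!$1")); // a cla!s act

// Stricter variant: whole words (\b) separated by at least one space (\s+).
const repeatedWord = /\b(\w+)\s+\1\b/g;
const roll = "I enjoy many many kinds of of music.";
console.log(roll.replace(repeatedWord, "!$1"));          // I enjoy !many kinds !of music.
console.log("a class act".replace(repeatedWord, "!$1")); // a class act

// A function can stand in for the "$1" string; it receives the full match, then each group.
console.log(roll.replace(repeatedWord, (match, word) => word.toUpperCase()));
// I enjoy MANY kinds OF music.
```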

String.prototype.split

Once again, we're using regexes here, not just strings.
"Rage Against \n\n  The Machine".split(/\s+/); // ["Rage", "Against", "The", "Machine"]
"rage against the dying of the light".split(/g\w+/); // ["ra", " a", " the dying of the li", ""]

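Two more `split()` behaviors worth knowing, shown on made-up inputs: a capturing group in the separator puts the separators into the result, and the optional second argument limits how many pieces come back.

```javascript
// A capturing group in the separator keeps the separators in the output.
console.log("2014-03-18".split(/(-)/)); // ["2014", "-", "03", "-", "18"]

// Without a group, the separators are dropped.
console.log("2014-03-18".split(/-/)); // ["2014", "03", "18"]

// The optional limit argument caps the number of pieces returned.
console.log("Rage Against \n\n  The Machine".split(/\s+/, 2)); // ["Rage", "Against"]

// One pattern can cover several delimiters and the spaces around them.
console.log("a, b;c ;  d".split(/\s*[,;]\s*/)); // ["a", "b", "c", "d"]
```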
Coming soon in ES6

fewer reasons to use regexes
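// ES6 standardized these; the first shipped as includes (early drafts called it contains).
// Simplified shims for engines that lack them:
if (!String.prototype.includes) {
  String.prototype.includes = function(s) {
    return this.indexOf(s) !== -1;
  };
}
if (!String.prototype.startsWith) {
  String.prototype.startsWith = function(s) {
    return this.indexOf(s) === 0;
  };
}
if (!String.prototype.endsWith) {
  String.prototype.endsWith = function(s) {
    var t = String(s);
    var index = this.lastIndexOf(t);
    return index >= 0 && index === this.length - t.length;
  };
}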
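Once engines ship these natively, the regex versions of the same checks become unnecessary. A sketch comparing the two on an arbitrary URL (ES2015 or later assumed):

```javascript
const url = "https://developer.mozilla.org/en-US/docs/Web";

// ES2015 string methods: plain substring checks, nothing to escape.
console.log(url.includes("mozilla"));    // true
console.log(url.startsWith("https://")); // true
console.log(url.endsWith("/Web"));       // true

// Regex equivalents: "/" must be escaped inside a literal, and anchors do the work.
console.log(/mozilla/.test(url));     // true
console.log(/^https:\/\//.test(url)); // true
console.log(/\/Web$/.test(url));      // true

// startsWith() also takes an optional start position.
console.log(url.startsWith("developer", 8)); // true
```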

I came here for an argument,
but this is abuse.

From a project I'm currently working on...
9 sound instances of RegExp.prototype.test
3 sound instances of RegExp.prototype.exec
11 sound instances of String.prototype.match
1 String.prototype.search where String.prototype.indexOf would suffice
  e.g.   if (url.search(/something/) !== -1)
9 String.prototype.match where String.prototype.indexOf would suffice
  e.g.   if (url.match(/something/))
10 String.prototype.match where RegExp.prototype.test would suffice
  e.g.   if (url.match(/foo(bar|baz)/i)) 
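The refactors those counts point to are mechanical. A sketch on a made-up URL (example.com stands in for the project's real values):

```javascript
// Hypothetical URL, for illustration only.
const url = "http://example.com/MadJS/foobar?x=1";

// Before: if (url.search(/something/) !== -1) ...  or  if (url.match(/something/)) ...
// After: a fixed substring needs no regex at all.
const hasSomething = url.indexOf("something") !== -1; // or url.includes("something") in ES2015

// Before: if (url.match(/foo(bar|baz)/i)) ...
// After: a real pattern, but only a yes/no answer is needed, so use test().
const hasFooBarOrBaz = /foo(bar|baz)/i.test(url);

console.log(hasSomething, hasFooBarOrBaz); // false true
```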

Speaking of abuse... know your domain and its limits.

The Scunthorpe Problem

Homosexual eases into 100 final at Olympic Trials
(2008 headline about sprinter Tyson Gay, after an automated filter replaced "gay" with "homosexual")

clbuttic. buttbuttination.
classic. assassination.
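A minimal sketch of how a naive substring filter produces exactly these results, and why word boundaries only go part of the way. `naiveFilter` and `wholeWordFilter` are hypothetical helpers, and both assume the filtered word contains no regex special characters:

```javascript
// Hypothetical filter helpers, for illustration only.
// Assumes `word` has no regex special characters (it is not escaped).
function naiveFilter(text, word, replacement) {
  // Replaces the word wherever it appears, even inside other words.
  return text.replace(new RegExp(word, "gi"), replacement);
}

function wholeWordFilter(text, word, replacement) {
  // \b requires a word boundary on both sides of the match.
  return text.replace(new RegExp("\\b" + word + "\\b", "gi"), replacement);
}

console.log(naiveFilter("classic assassination", "ass", "butt"));
// clbuttic buttbuttination
console.log(wholeWordFilter("classic assassination", "ass", "butt"));
// classic assassination

// Word boundaries cannot tell a surname from the word it is spelled like:
console.log(wholeWordFilter("sprinter Tyson Gay", "gay", "homosexual"));
// sprinter Tyson homosexual
```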

the big takeaways

Only ask for as much information as you need (preferred first):
RegExp.prototype.test(String) >>
String.prototype.search(RegExp) >>
RegExp.prototype.exec(String) >>
String.prototype.match(RegExp)

DO NOT use RegExp if
 String.prototype.indexOf(String) 
is sufficient.
Also avoid hand-rolled regexes for validating HTML markup, numbers, email addresses, and URLs; use built-in tools and functions or established external libraries instead.
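For URLs and numbers, the platform already ships parsers. A sketch assuming an environment with the WHATWG `URL` class (modern browsers, Node 10+); `isHttpUrl` and `isNumeric` are hypothetical helper names, and what counts as "valid" here is an assumption to adjust:

```javascript
// Let the platform's URL parser decide instead of a hand-written pattern.
function isHttpUrl(s) {
  try {
    const u = new URL(s); // throws a TypeError on unparseable input
    return u.protocol === "http:" || u.protocol === "https:";
  } catch (e) {
    return false;
  }
}

// Let Number() do the parsing; reject blank strings, NaN, and Infinity.
function isNumeric(s) {
  return s.trim() !== "" && Number.isFinite(Number(s));
}

console.log(isHttpUrl("http://jsperf.com/regexp-test-v-match")); // true
console.log(isHttpUrl("not a url"));                             // false
console.log(isNumeric("-12.5e3"), isNumeric("12px"));            // true false
```

`Number()` also accepts forms such as `"0x1A"` and surrounding whitespace, which may or may not match your notion of a number.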

I need a hero!



Resources to Google

"MDN Regular Expressions"
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions
"Ben Nadel"
http://www.bennadel.com/
"perlre"
http://perldoc.perl.org/perlre.html
"The Scunthorpe Problem"
http://en.wikipedia.org/wiki/Scunthorpe_problem
"O'Reilly Regular Expressions"