presented by Kris Koskelin
kkoskelin@gmail.com
A powerful tool for string manipulation
- Identify substring/patterns in the subject
- Perform simple or complex search & replace
- Quick & dirty data transformation
If you're old school, some of these regex tools are probably familiar to you:
grep, sed & AWK
But I still don't understand...
Some of my own
while(my $line = <SRC>) {
    my $original = $line;
    ($line = $original) =~ s/^\s*(.*?)\s*$/$1/g;
    $line =~ s/^.+alt="([^"]+?)[.,]*".*$/$1/sg;
    push @sources, $line;
}
close(SRC);
open(TTS, "< $tts") or die "$!";
{
    local $/ = undef;
    my $tmp = <TTS>;
    @meta = ($tmp =~ m#\[([\d\.]{3,})#g);
}
close(TTS);
... or is it more like this?
creating regexp objects
There's also more than one way to create a Regular Expression object in JavaScript.
RegExp constructor: runtime compilation
var myRe1 = new RegExp("reg(ular)?\\s*exp(ression)?");
RegExp literal: compilation on script load
var myRe2 = /reg(ular)?\s*exp(ression)?/;
These instantiate new RegExp objects, too!
if ( url.match("MadJS") ) ...
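A minimal sketch of how the two forms compare, and of what happens when a plain string is handed to match(). The variable names are illustrative and not from the talk. The point to watch is the backslash: inside a string it has to be doubled to survive into the pattern.

```javascript
// In a string, a backslash must be doubled to reach the regex engine.
const fromConstructor = new RegExp("reg(ular)?\\s*exp(ression)?");
const fromLiteral = /reg(ular)?\s*exp(ression)?/;

// Both forms compile to the same pattern source.
const samePattern = fromConstructor.source === fromLiteral.source; // true

// A single backslash is swallowed by the string: "\s" is just "s".
const broken = new RegExp("reg(ular)?\s*exp(ression)?");
const brokenMatches = broken.test("regular expression"); // false

// match() converts a string argument with new RegExp(),
// so the "." below acts as a wildcard, not a literal dot.
const wildcard = "MadJS".match("Mad.S") !== null; // true
```

The constructor form earns its keep when the pattern is assembled at runtime; for a fixed pattern the literal avoids the double escaping.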
Some basic concepts
substring: /foo/
anchoring: /^foo$/
capturing: /foo(ba.)/
alternation: /find(this|that)/
character classes: /[0-9a-f]/
character class negation: /[^g-z]/
quantifiers: * ? + {n} {n,} {n,m}
special characters: . \b \d \w \s
And their inverses: \B \D \W \S
pattern modifiers: /regex/gimy
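The concepts above can be tried out one at a time. This is a small sketch with made-up sample strings; each property holds the result of one check.

```javascript
const checks = {
  // substring: matches anywhere in the subject
  substring: /foo/.test("seafood"),                 // true
  // anchoring: the whole subject must be exactly "foo"
  anchored: /^foo$/.test("seafood"),                // false
  // capturing: group 1 holds "ba" plus any one character
  captured: /foo(ba.)/.exec("foobaz")[1],           // "baz"
  // alternation: either branch may match
  alternation: /find(this|that)/.test("findthat"),  // true
  // character class: one hex digit
  hexDigit: /[0-9a-f]/.test("e"),                   // true
  // negated class: "q" falls inside g-z, so it is rejected
  negated: /[^g-z]/.test("q"),                      // false
  // quantifiers: exactly three digits, a dash, exactly four digits
  quantified: /^\d{3}-\d{4}$/.test("555-1212"),     // true
  // \b: "cat" inside "concatenate" has no word boundary around it
  wordBoundary: /\bcat\b/.test("concatenate"),      // false
  // i modifier: case-insensitive match
  ignoreCase: /madjs/i.test("MadJS"),               // true
};
```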
Advanced concepts
back-references
/<([^>]+)>foo<\/\1>/
would match <span>foo</span>
or <div>foo</div>
but not <span>foo</div>
look-ahead, negative look-ahead assertions
x(?=y) x(?!y)
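A short sketch of both ideas, using the tag pattern from the slide and a made-up q/u example for the assertions. Note that a look-ahead checks what follows without consuming it, so the matched text is still just "q".

```javascript
// Back-reference: \1 must repeat whatever group 1 captured.
const pairedTag = /<([^>]+)>foo<\/\1>/;
const spanOk = pairedTag.test("<span>foo</span>");    // true
const divOk = pairedTag.test("<div>foo</div>");       // true
const mismatched = pairedTag.test("<span>foo</div>"); // false

// Look-ahead: "q" only when followed by "u".
const qBeforeU = /q(?=u)/.test("quit");               // true
// The look-ahead is not part of the match itself.
const matchedText = "quit".match(/q(?=u)/)[0];        // "q"

// Negative look-ahead: "q" only when NOT followed by "u".
const qWithoutU = /q(?!u)/.test("Iraq");              // true
const qWithoutUInQuit = /q(?!u)/.test("quit");        // false
```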
Regular Expressions in Javascript
There's more than one way to skin a cat.
RegExp.prototype.test
RegExp.prototype.exec
String.prototype.search
String.prototype.match
String.prototype.replace
String.prototype.split
Visit now: http://jsperf.com/regexp-test-v-match
RegExp.prototype.test
myRe = /foo(bar)?baz/i
myRe.test("foobaz") - true
myRe.test("foobarbaz") - true
myRe.test("foobbaz") - false
Very computationally inexpensive.
RegExp.prototype.exec
re = /\D(\d+)/g; phone = "call 800-555-1212";
re.exec(phone); // [" 800", "800"]
re.exec(phone); // ["-555", "555"]
re.exec(phone); // ["-1212", "1212"]
re.exec(phone); // null
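With the g flag, each call to exec() resumes where the previous match ended and returns the full match plus any captured strings; it returns null once no match remains.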
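The repeated calls above are usually wrapped in a loop. A minimal sketch, reusing the pattern and phone string from the slide; the `parts` array and the loop variable `m` are illustrative additions.

```javascript
const re = /\D(\d+)/g;
const phone = "call 800-555-1212";

// Collect capture group 1 from every match.
const parts = [];
let m;
while ((m = re.exec(phone)) !== null) {
  parts.push(m[1]);
}
// parts is ["800", "555", "1212"]

// After exec() returns null, lastIndex is reset to 0,
// so the same regex object can be reused from the start.
const resetIndex = re.lastIndex; // 0
```

The loop depends on the g flag: without it, exec() would return the first match every time and the loop would never end.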
String.prototype.search
var myRe = /(mad)\s\w+$/i;
"mad, mad world".search(myRe);
result: 5 - index of first found match.
String.prototype.match
The grand-daddy of abused JS regex methods.
var content = "case-oriented methods compare cases and consider combinations or conjectures of causal conditions";
var re = /(c\w+)\s[^c]\w+/ig;
content.match(re);
["cases and", "combinations or", "conjectures of"]
Did you actually need to know what was matched?
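To make that question concrete, here is a sketch of the same pattern asked three ways, from least to most information returned. The variable names are illustrative.

```javascript
const content = "case-oriented methods compare cases and consider " +
  "combinations or conjectures of causal conditions";

// 1. Only a yes/no answer.
const hasPair = /(c\w+)\s[^c]\w+/i.test(content);     // true

// 2. The first match, with its capture group.
const firstPair = content.match(/(c\w+)\s[^c]\w+/i);
// firstPair[0] is "cases and", firstPair[1] is "cases"

// 3. Every full match (capture groups are dropped with the g flag).
const allPairs = content.match(/(c\w+)\s[^c]\w+/ig);
// ["cases and", "combinations or", "conjectures of"]
```

If the result only feeds an if statement, the first form is enough.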
String.prototype.replace
Regexes, not just strings!
rock.replace("Bon Scott", "Brian Johnson");
// for a pattern with no special characters, this behaves like...
rock.replace(new RegExp("Bon Scott"), "Brian Johnson");
Use back-references. Use pattern modifiers.
var rock = "Singers for AC/DC have included BON SCOTT and Brian Johnson.";
rock.replace(/(Bon|Brian)\s+(Scott|Johnson)/ig, "Mr. $2");
// "Singers for AC/DC have included Mr. SCOTT and Mr. Johnson."
var roll = "I enjoy many many kinds of of music.";
roll.replace(/(\w+)\s*\1/g, "!$1");
// "I enjoy !many kinds !of music."
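One caveat with the repeated-word pattern: because \s* allows zero spaces and nothing anchors the match to whole words, it also fires on doubled letters inside a word. A hedged sketch of a stricter variant using word boundaries; the sample sentence and the `strict` pattern are illustrative, not from the talk.

```javascript
const sample = "The class was was classic.";

// Pattern from the slide: also collapses the "ss" inside words.
const loose = /(\w+)\s*\1/g;
const looseResult = sample.replace(loose, "$1");
// "The clas was clasic."

// Stricter variant: whole words only, at least one space between them.
const strict = /\b(\w+)\s+\1\b/g;
const strictResult = sample.replace(strict, "$1");
// "The class was classic."
```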
String.prototype.split
Once again, we're using regexes here, not just strings.
"Rage Against \n\n The Machine".split(/\s+/)
// ["Rage", "Against", "The", "Machine"]
"rage against the dying of the light".split(/g\w+/);
// ["ra", " a", " the dying of the li", ""]
Coming soon in ES6
fewer reasons to use regexes
String.prototype.includes = function(s) { return this.indexOf(s) !== -1; };
String.prototype.startsWith = function(s) { return this.indexOf(s) === 0; };
String.prototype.endsWith = function(s) { var t = String(s); var index = this.lastIndexOf(t); return index >= 0 && index === this.length - t.length; };
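In the finalized ES2015 specification the containment check shipped under the name String.prototype.includes, while startsWith and endsWith kept their names. A short sketch of the native methods next to regex equivalents; the `pageUrl` value is made up for illustration.

```javascript
const pageUrl = "http://example.com/madjs/slides.html";

// Native string methods: no pattern to escape or compile.
const nativeChecks = [
  pageUrl.includes("madjs"),
  pageUrl.startsWith("http://"),
  pageUrl.endsWith(".html"),
];

// Regex equivalents: note the escaping each one needs.
const regexChecks = [
  /madjs/.test(pageUrl),
  /^http:\/\//.test(pageUrl),
  /\.html$/.test(pageUrl),
];
// Both arrays are [true, true, true]
```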
I came here for an argument,
but this is abuse.
From a project I'm currently working on...
9 sound instances of RegExp.prototype.test
3 sound instances of RegExp.prototype.exec
11 sound instances of String.prototype.match
1 String.prototype.search where String.prototype.indexOf would suffice e.g. if (url.search(/something/) !== -1)
9 String.prototype.match where String.prototype.indexOf would suffice e.g. if (url.match(/something/))
10 String.prototype.match where RegExp.prototype.test would suffice e.g. if (url.match(/foo(bar|baz)/i))
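The last two categories can be rewritten roughly as follows. This is a sketch with a made-up `url`; the "before" forms are the ones quoted in the list above.

```javascript
const url = "http://example.com/foobar?something=1";

// Before: if (url.match(/something/))
// A fixed substring needs no regex at all.
const hasSomething = url.indexOf("something") !== -1; // true

// Before: if (url.match(/foo(bar|baz)/i))
// A real pattern, but only a yes/no answer is needed.
const hasFoo = /foo(bar|baz)/i.test(url);             // true
```

Both rewrites return a boolean directly instead of building a match array that is immediately thrown away.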
Speaking of abuse... know your domain and its limits.
The Scunthorpe Problem
Homosexual eases into 100 final at Olympic Trials
2008 event featuring sprinter Tyson Gay
clbuttic. buttbuttination.
classic. assassination.
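The mangled words above are what a naive global substring replace produces. A minimal sketch, assuming a filter that swaps one word for another; word boundaries avoid this particular failure, though they do not make word filtering a solved problem.

```javascript
const headline = "A classic assassination plot";

// Naive filter: replaces the substring wherever it appears.
const naive = headline.replace(/ass/g, "butt");
// "A clbuttic buttbuttination plot"

// Bounded filter: only a standalone word is replaced.
const bounded = headline.replace(/\bass\b/g, "butt");
// "A classic assassination plot" (unchanged)
```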
the big takeaways
Only ask for as much information as you need:
RegExp.prototype.test(String) >>
String.prototype.search(RegExp) >>
RegExp.prototype.exec(String) >>
String.prototype.match(RegExp)
DO NOT use RegExp if
String.prototype.indexOf(String)
is sufficient.
Also skip hand-rolled regexes for validating HTML markup, numbers, email addresses, or URLs; use built-in tools, functions, or external libs instead.
i need a hero!
Resources to Google
"MDN Regular Expressions"
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions
"Ben Nadel"
http://www.bennadel.com/
"perlre"
http://perldoc.perl.org/perlre.html
"The Scunthorpe Problem"
http://en.wikipedia.org/wiki/Scunthorpe_problem
"O'Reilly Regular Expressions"
Regular Expressions in Javascript
By kkoskelin
Regular expressions are frequently seen as black magic. But they're more accessible than you think, and you're probably already using them. Learn how (and when) to use their full potential.