Everything you need to know about /REGEX/g
And about 20% more that you don't
RegEx
That weird thing that you have to look up on Stack Overflow once a year, but it's really useful when you do
This talk will:
- Start very theoretical
- End very practical
What is "language"?
"language" is the ability to use and acquire systems of communication
"a language" is an implementation of one of those systems
Grammar
Rules for the structure of the language, and how that translates to MEANING
Syntax
Rules for more nuanced differentiation in meaning:
active vs passive
declarative vs imperative
Grammar is the difference between "Dog bites man" and "Man bites dog."
subject verb object
When you come to donuts, have a donut.
Some rules that apply to this sentence:
- In a complex sentence, a dependent clause must follow or precede an independent clause
- (you) have a donut: an independent clause must have a subject, verb, and an optional object
Jan walks.
Jan walks the dog.
Rule: Independent clauses contain a subject, a verb, and an optional object
IC = S, V, *O
A combination of symbols that represents a grammar rule is an
expression
IC = SV*O
Note: informally, a regular language is a language that has a well-defined and strict grammar; formally, it is a language that a finite state machine can recognize
Many spoken languages are not regular, because humans.
So an expression that defines the grammar of a regular language is a
Regular Expression
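A minimal sketch of the rule IC = SV*O written as a JavaScript regular expression. It assumes each word has already been tagged with a one-letter role (S, V, O), which is a hypothetical simplification for illustration only. In JavaScript regex syntax the "optional" marker is a trailing ?, so the slide's *O is written here as O?.

```javascript
// Hypothetical encoding: one letter per word role
// (S = subject, V = verb, O = object).
const independentClause = /^SVO?$/;

// "Jan walks."         -> S V
// "Jan walks the dog." -> S V O
const janWalks = independentClause.test("SV");
const janWalksTheDog = independentClause.test("SVO");
const walksJan = independentClause.test("VS");

console.log(janWalks);       // true
console.log(janWalksTheDog); // true
console.log(walksJan);       // false
```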
Pivot
What is a grammar, in the context of a programming language?
programming languages as Languages
Sets of rules (grammars) that define which keywords, expressions, and characters can appear in what order, and what those things mean.
For example:
An expression is anything that can be on the right of an equals sign.
const a = b + 3
expression
Recursive Definition
Exp ::= Exp
Exp ::= Exp + Exp
Exp ::= Exp - Exp
Exp ::= (Exp)
Exp ::= Exp && Exp
Exp ::= Exp || Exp
Exp ::= ! Exp
State Machines
Any computer program can be modeled as a "state machine", shown as a graph with edges and nodes
State Machine Example
State Machines === Regular Expressions
ab
a*
a|b
abc+
an 'a', followed by a 'b', followed by one or more 'c's
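To make the link between state machines and regular expressions concrete, here is a rough sketch of a hand-written state machine for abc+ next to the equivalent regex. The function name and state names are hypothetical labels, and the regex is anchored with ^ and $ so that both versions check the whole string.

```javascript
// Hand-written state machine for the pattern abc+ (whole-string match).
// Each state is a node in the graph; each if-branch is an edge.
function matchesAbcPlus(input) {
  let state = "start";
  for (const ch of input) {
    if (state === "start" && ch === "a") state = "sawA";
    else if (state === "sawA" && ch === "b") state = "sawB";
    else if ((state === "sawB" || state === "accept") && ch === "c") state = "accept";
    else return false; // no edge for this character: reject
  }
  return state === "accept";
}

const abcPlus = /^abc+$/;

// Both approaches should agree on every input.
for (const s of ["abc", "abccc", "ab", "abcd", ""]) {
  console.log(JSON.stringify(s), matchesAbcPlus(s), abcPlus.test(s));
}
```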
Pivot
Epiphany: Hey, we can use this for pattern matching!
When you define a regular expression for a string, you are (behind the scenes) defining a grammar for a new language that the string is a part of
/Hello World/
is a regular expression for the grammar of a language in which "Hello World" is the only valid entry
So, when
"Hello World"
matches the regex
/Hello World/
what that really means is that it is a valid part of the new Hello World Language
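One practical wrinkle, sketched below: in JavaScript an unanchored pattern is searched for anywhere inside the string, so /Hello World/ also accepts longer strings that contain it. To get a language in which "Hello World" really is the only valid entry, the pattern needs the ^ and $ anchors. The variable names are just for illustration.

```javascript
const loose = /Hello World/;    // matches anywhere in the string
const strict = /^Hello World$/; // must match the whole string

console.log(loose.test("Hello World"));       // true
console.log(loose.test("Oh, Hello World!"));  // true
console.log(strict.test("Hello World"));      // true
console.log(strict.test("Oh, Hello World!")); // false
```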
Parts of a Regular Expression
/expression pattern/flags
/Hello World/i
Match "Hello World", ignoring case
/Hello/m
Match "Hello" in multiline mode, where ^ and $ match at the start and end of each line
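A small sketch of what the two flags change in practice. The i flag ignores case; the m flag only affects the ^ and $ anchors, letting them match at line breaks as well as at the ends of the whole string. The sample strings are made up for illustration.

```javascript
const withI = /Hello World/i.test("hello WORLD");   // true
const withoutI = /Hello World/.test("hello WORLD"); // false

const text = "Goodbye\nHello";
const withoutM = /^Hello/.test(text); // false: ^ only matches at the very start
const withM = /^Hello/m.test(text);   // true: ^ also matches after the line break

console.log(withI, withoutI, withoutM, withM);
```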
Some useful regex rules
ab an 'a' followed by a 'b'
a|b an 'a' or a 'b'
a* any number of 'a's (including 0)
a+ at least one 'a'
a? an optional single 'a'
\? an escaped '?'
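The rules above can be tried out directly. This is a quick sketch; each pattern is wrapped in ^ and $ so that it has to match the whole sample string, and the sample strings are arbitrary.

```javascript
// [pattern, input, expected result of pattern.test(input)]
const checks = [
  [/^ab$/, "ab", true],        // 'a' followed by 'b'
  [/^(a|b)$/, "b", true],      // 'a' or 'b'
  [/^a*$/, "", true],          // zero 'a's is fine for *
  [/^a+$/, "", false],         // + needs at least one 'a'
  [/^a+$/, "aaa", true],
  [/^ba?$/, "b", true],        // the 'a' is optional
  [/^ba?$/, "baa", false],     // ...but only one of them
  [/^why\?$/, "why?", true],   // escaped '?' matches a literal question mark
];

for (const [regex, input, expected] of checks) {
  console.log(String(regex), JSON.stringify(input), regex.test(input) === expected);
}
```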
Some useful regex rules
. any character (except a line break)
[a-z] any character in the range a->z
[A-Z] any character in the range capital A->Z
[0-9] any character in the range 0->9
\d any single digit
\w any single "word" character (alphanumeric or underscore)
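A short sketch of these character classes in action. Each pattern is anchored so it must match exactly one character; the property names are only labels for the printed results.

```javascript
const results = {
  dotLetter: /^.$/.test("x"),       // true
  dotNewline: /^.$/.test("\n"),     // false: . skips line breaks by default
  lowerQ: /^[a-z]$/.test("q"),      // true
  lowerUpperQ: /^[a-z]$/.test("Q"), // false: ranges are case sensitive
  upperQ: /^[A-Z]$/.test("Q"),      // true
  digitRange: /^[0-9]$/.test("7"),  // true
  twoDigits: /^\d$/.test("42"),     // false: \d is a single digit
  underscore: /^\w$/.test("_"),     // true: \w includes the underscore
  hyphen: /^\w$/.test("-"),         // false
};

console.log(results);
```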
Some useful regex rules
^ Beginning of a string or line
\b Word boundary
() Group, Capture Group
$ End of a string or line
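A brief sketch of anchors, word boundaries, and capture groups. The sample strings and variable names are made up for illustration.

```javascript
const startsWithHello = /^Hello/.test("Hello World"); // true
const endsWithWorld = /World$/.test("Hello World");   // true
const endsWithHello = /Hello$/.test("Hello World");   // false

// \b only matches between a word character and a non-word character.
const wholeWord = /\bcat\b/.test("a cat sat");    // true
const insideWord = /\bcat\b/.test("concatenate"); // false

// Parentheses capture what they matched; exec exposes it by position.
const groups = /(\w+) (\w+)/.exec("Hello World");
console.log(groups[1]); // Hello
console.log(groups[2]); // World

console.log(startsWithHello, endsWithWorld, endsWithHello, wholeWord, insideWord);
```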
Let's build one
Beginning of word
Group of 1 digit & optional 2nd digit with a slash
Occurs exactly twice
4 digits
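Putting those four steps together gives something like the pattern below. This is a reconstruction from the step descriptions, not necessarily the exact expression from the talk, and it only checks the shape of a date (it would happily accept 99/99/9999). The sample strings are hypothetical.

```javascript
// \b         beginning of a word
// (\d\d?\/)  group: 1 digit, optional 2nd digit, then a slash
// {2}        the group occurs exactly twice
// \d{4}      4 digits
const datePattern = /\b(\d\d?\/){2}\d{4}/;

console.log(datePattern.test("Due on 1/15/2024")); // true
console.log(datePattern.test("12/31/1999"));       // true
console.log(datePattern.test("1/2024"));           // false

const foundDate = "Due on 1/15/2024".match(datePattern)[0];
console.log(foundDate); // 1/15/2024
```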
Let's build one
Beginning of a word
A group of 1+ characters, digits, periods or plus signs
The string
"@ldschurch."
The string "org" or "net"
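Assembled from the steps above, the pattern might look like the sketch below. It is a reconstruction from the step descriptions; "characters" is read here as \w (letters, digits, underscore), which is an assumption, and the addresses are invented sample inputs.

```javascript
// \b            beginning of a word
// [\w.+]+       1 or more word characters, periods, or plus signs
// @ldschurch\.  the literal string "@ldschurch." (the period is escaped)
// (org|net)     the string "org" or "net"
const emailPattern = /\b[\w.+]+@ldschurch\.(org|net)/;

const sampleResults = [
  emailPattern.test("jane.doe+news@ldschurch.org"), // true
  emailPattern.test("jane@ldschurch.net"),          // true
  emailPattern.test("jane@ldschurch.com"),          // false
  emailPattern.test("jane@example.org"),            // false
];

console.log(sampleResults);
```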
Note: The actual rule to validate spec email addresses is ... longer
^(?:(?:[\w`~!#$%^&*\-=+;:{}'|,?\/]+(?:(?:\.(?:"(?:\\?[\w`~!#$%^&*\-=+;:{}'|,?\/\.()<>\[\] @]|\\"|\\\\)*"|[\w`~!#$%^&*\-=+;:{}'|,?\/]+))*\.[\w`~!#$%^&*\-=+;:{}'|,?\/]+)?)|(?:"(?:\\?[\w`~!#$%^&*\-=+;:{}'|,?\/\.()<>\[\] @]|\\"|\\\\)+"))@(?:[a-zA-Z\d\-]+(?:\.[a-zA-Z\d\-]+)*|\[\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\])$
Let's build one
A "$" and a "{"
1 or more "word" characters
A single "}"
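Combining these steps gives a pattern for template-style placeholders such as ${name}. This is a sketch reconstructed from the step descriptions; the $, {, and } are escaped so they are treated as literal characters. Note the sample text uses ordinary quotes, not backticks, so JavaScript does not interpolate it.

```javascript
// \$\{  a "$" and a "{"
// \w+   1 or more "word" characters
// \}    a single "}"
const placeholder = /\$\{\w+\}/;

const hasPlaceholder = placeholder.test("Hello ${name}!"); // true
const missingDollar = placeholder.test("Hello {name}!");   // false
const emptyBraces = placeholder.test("Hello ${}!");        // false: \w+ needs a character

const foundPlaceholder = "Hello ${name}!".match(placeholder)[0];
console.log(hasPlaceholder, missingDollar, emptyBraces);
console.log(foundPlaceholder); // ${name}
```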
Using
Regular Expressions
In JavaScript
Built-in Object Type
Regular expressions in JavaScript are built-in objects (RegExp) with their own literal syntax; they are not primitives.
You can save them to variables or call their methods directly
const exp = /Hello World/;
exp.test("Hello World"); //true
/Goodbye World/.test('Hello World'); //false
Methods that use RegEx
Regex Methods
RegEx.test(string) //boolean
RegEx.exec(string) //data
const isFound = /Hello World/.test("Hello World");
//true
const foundData = /Hello/.exec("Hello World");
//Array
// "Hello"
// index: 0
// input: "Hello World"
Methods that use RegEx
String Methods
String.match(RegEx) //array
String.search(RegEx) //index
String.replace(RegEx, replacement) //string
String.split(RegEx) //array
"Hello World".match(/World/); //Data
"Hello World".search(/World/); //6
"Hello World".replace(/World/, "Everybody!"); //Hello Everybody!
"Hello World".split(/\s/); //["Hello", "World"]
Live code a thing
/end|beginning/igm
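As a stand-in for the live demo, here is a small sketch of what this pattern does with String.match. The sample text is invented. With the g flag, match returns every match instead of just the first; the i flag ignores case; the m flag has no visible effect here because the pattern contains no ^ or $.

```javascript
const pattern = /end|beginning/igm;
const text = "The Beginning\nis not the END\nbut an end.";

// g: all matches, i: any casing, m: unused without ^ or $
const found = text.match(pattern);
console.log(found); // [ 'Beginning', 'END', 'end' ]
```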
Regular Expressions
By Michael Jasper