Everything you need to know about /REGEX/g

And about 20% more that you don't

RegEx

That weird thing that you have to look up on stackoverflow once a year, but it's really useful when you do

This talk will:

  • Start very theoretical
  • End very practical

What is "language"?

"language" is the ability to use and acquire systems of communication

"a language" is an implementation of one of those systems

Grammar

Rules for the structure of the language, and how that translates to MEANING

Syntax

Rules for more nuanced differentiation in meaning:

 

active vs passive

declarative vs imperative 

Grammar is the difference between "Dog bites man" and "Man bites dog."

 

subject verb object 

When you come to donuts, have a donut.

Some rules that apply to this sentence:

 

  • In a complex sentence, a dependent clause must follow or precede an independent clause

 

  • (you) have a donut: an independent clause must have a subject, verb, and an optional object

Jan walks.

Jan walks the dog.

Rule: Independent Clauses contain a Subject, verb, and optional object

IC = S, V, *O

A combination of symbols that represents a grammar rule is an

 expression

IC = SV*O
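As a rough sketch, the same rule can be written as a JavaScript regular expression. The one-letter encoding (S, V, O standing in for subject, verb, object) is made up for illustration. Note that regex syntax puts the "optional" marker after the symbol, as O?, instead of before it.

```javascript
// Hypothetical encoding: each part of a clause is tagged S, V, or O.
const independentClause = /^SVO?$/;

console.log(independentClause.test("SV"));  // "Jan walks." -> true
console.log(independentClause.test("SVO")); // "Jan walks the dog." -> true
console.log(independentClause.test("VS"));  // wrong order -> false
```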

Note: a regular language is a formal language whose grammar is simple and strict enough to be recognized by a finite state machine

Many spoken languages are not regular, because humans.

So an expression that defines the grammar of a regular language is a 

 

Regular Expression

Pivot

What is a grammar, in the context of a programming language?

Programming languages as Languages

Sets of rules (grammars) which define what keywords, expressions, and characters can appear in what order, and what those things mean.

For example:

An expression is anything that can be on the right of an equals sign.

 

const a = b + 3

expression

Recursive Definition

Exp ::= Exp

Exp ::= Exp + Exp

Exp ::= Exp - Exp

Exp ::= (Exp)

Exp ::= Exp && Exp

Exp ::= Exp || Exp

Exp ::= ! Exp

State Machines

Any computer program can be modeled as a "state machine", shown as a graph with edges and nodes

State Machine Example

Finite State Machines === Regular Expressions

ab
a*
a|b
abc+

an 'a', followed by a 'b', followed by one or more 'c's
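To make the equivalence concrete, here is a minimal sketch of abc+ written by hand as a state machine, next to the regex. The state names and the accepts function are invented for this example.

```javascript
// A hand-rolled state machine for the pattern abc+ (state names are made up).
// Each state maps an input character to the next state.
const transitions = {
  start: { a: "sawA" },
  sawA: { b: "sawB" },
  sawB: { c: "sawC" },
  sawC: { c: "sawC" }, // loop: any further 'c' stays in the accepting state
};
const acceptingStates = new Set(["sawC"]);

function accepts(input) {
  let state = "start";
  for (const char of input) {
    state = transitions[state][char];
    if (state === undefined) return false; // no edge for this character
  }
  return acceptingStates.has(state);
}

// The regex needs ^ and $ so it must match the whole string, like the machine.
const pattern = /^abc+$/;

for (const input of ["abc", "abccc", "ab", "abcd"]) {
  console.log(input, accepts(input), pattern.test(input));
}
```

Both columns should agree for every input: the regex is just a compact way of writing the graph.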

Pivot

Epiphany: Hey, we can use this for pattern matching!

When you define a regular expression for a string, you are (behind the scenes) defining a grammar for a new language that the string is a part of 

/Hello World/

 

is a regular expression for the grammar of a language in which "Hello World" is the only valid entry

So, when

"Hello World"

matches the regex

/Hello World/

what that really means is that it is a valid part of the new Hello World Language

Parts of a Regular Expression

/expression pattern/flags

/Hello World/i

 

match exactly "Hello World" case insensitive

 

 

/Hello/m

 

Match exactly "Hello"; with the m flag, ^ and $ also match at the start and end of each line
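A short sketch of what the two flags change. The sample strings are made up.

```javascript
const text = "hello world\nHello again";

// i: ignore case
console.log(/Hello World/i.test("hello world")); // true
console.log(/Hello World/.test("hello world"));  // false

// m: ^ and $ also match at the start and end of each line
console.log(/^Hello/.test(text));  // false, the string starts with lowercase "hello"
console.log(/^Hello/m.test(text)); // true, the second line starts with "Hello"
```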

Some useful regex rules

ab       an 'a' followed by a 'b'

a|b      an 'a' or a 'b'

a*       any number of 'a's (including 0)

a+       at least one 'a'

a?                 an optional single 'a'

 

\?                  an escaped '?'
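These rules can be tried directly in a console. In this sketch the ^ and $ anchors force the whole string to match, and a|b is wrapped in parentheses because | otherwise splits the entire pattern.

```javascript
console.log(/^ab$/.test("ab"));       // true: 'a' followed by 'b'
console.log(/^(a|b)$/.test("b"));     // true: 'a' or 'b'
console.log(/^a*$/.test(""));         // true: zero 'a's is allowed
console.log(/^a+$/.test(""));         // false: needs at least one 'a'
console.log(/^ba?$/.test("b"));       // true: the 'a' is optional
console.log(/^why\?$/.test("why?"));  // true: \? is a literal question mark
```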

 

Some useful regex rules

.        any character

[a-z]    any character in the range of a->z

[A-Z]    "   "                 capital A->Z

[0-9]    "   "                         0->9

 

\d                 any single digit

 

\w                any single "word" character (alphanumeric or underscore)
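A quick sketch of the character rules, each tested against a single character:

```javascript
console.log(/^.$/.test("x"));      // true: any character
console.log(/^.$/.test("\n"));     // false: by default . does not match a newline
console.log(/^[a-z]$/.test("q"));  // true
console.log(/^[a-z]$/.test("Q"));  // false: ranges are case sensitive
console.log(/^[A-Z]$/.test("Q"));  // true
console.log(/^[0-9]$/.test("7"));  // true
console.log(/^\d$/.test("7"));     // true
console.log(/^\w$/.test("_"));     // true: underscore counts as a word character
console.log(/^\w$/.test("-"));     // false
```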

Some useful regex rules

^        Beginning of a string or line

\b       Word boundary

()       Group, Capture Group

$        End of a string or line
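A small sketch of anchors, word boundaries, and capture groups in action. The sample strings are made up.

```javascript
console.log(/^Hello/.test("Hello World"));  // true: string starts with "Hello"
console.log(/World$/.test("Hello World"));  // true: string ends with "World"

// \b only matches where a word starts or ends
console.log(/\bcat\b/.test("the cat sat")); // true
console.log(/\bcat\b/.test("concatenate")); // false: "cat" is inside a longer word

// Each pair of parentheses captures what it matched
const match = /(\w+) (\w+)/.exec("Hello World");
console.log(match[1], match[2]); // Hello World
```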

Let's build one

Match some dates in this format

 

12/25/2017

 

http://regexr.com/

Beginning of word

Group of 1 digit & optional 2nd digit with a slash

Occurs exactly twice

4 digits
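Putting those four pieces together gives something like the pattern below. This is one possible reading of the steps, not necessarily the exact expression built live. It only checks the shape of the date, so 99/99/9999 would also match.

```javascript
// \b         beginning of a word
// (\d\d?\/)  a digit, an optional 2nd digit, and a slash
// {2}        the group occurs exactly twice
// \d{4}      4 digits
const datePattern = /\b(\d\d?\/){2}\d{4}/;

console.log(datePattern.test("12/25/2017")); // true
console.log(datePattern.test("1/5/2017"));   // true
console.log(datePattern.test("12-25-2017")); // false
console.log("Due 12/25/2017 at noon".match(datePattern)[0]); // 12/25/2017
```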

Let's build one

Match emails of a certain domain

 

mdjasper@ldschurch.org

 

http://regexr.com/

Beginning of a word

A group of 1+ characters, digits, periods or plus signs

The string

"@ldschurch."

The string "org" or "net"
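Assembled, those steps might look like the pattern below. This is a sketch of the described pieces, not a general-purpose email validator.

```javascript
// \b          beginning of a word
// [\w.+]+     1 or more word characters, periods, or plus signs
// @ldschurch\.  the literal string "@ldschurch."
// (org|net)   the string "org" or "net"
const emailPattern = /\b[\w.+]+@ldschurch\.(org|net)/;

console.log(emailPattern.test("mdjasper@ldschurch.org"));       // true
console.log(emailPattern.test("first.last+tag@ldschurch.net")); // true
console.log(emailPattern.test("mdjasper@example.com"));         // false
console.log(emailPattern.exec("mdjasper@ldschurch.net")[1]);    // net
```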

Note: The actual rule to validate spec-compliant email addresses is ... longer

^(?:(?:[\w`~!#$%^&*\-=+;:{}'|,?\/]+(?:(?:\.(?:"(?:\\?[\w`~!#$%^&*\-=+;:{}'|,?\/\.()<>\[\] @]|\\"|\\\\)*"|[\w`~!#$%^&*\-=+;:{}'|,?\/]+))*\.[\w`~!#$%^&*\-=+;:{}'|,?\/]+)?)|(?:"(?:\\?[\w`~!#$%^&*\-=+;:{}'|,?\/\.()<>\[\] @]|\\"|\\\\)+"))@(?:[a-zA-Z\d\-]+(?:\.[a-zA-Z\d\-]+)*|\[\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\])$

Let's build one

Find text between "${" and "}"

 

Hello, ${name}

 

http://regexr.com/

A "$" and a "{"

1 or more "word" characters

A single "}"
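Those three pieces can be sketched as the pattern below. The capture group around \w+ is an addition here, so the text between the braces can be pulled out on its own. The $, { and } are escaped because they have special meaning in a regex.

```javascript
// \$\{    a "$" and a "{"
// (\w+)   1 or more "word" characters, captured
// \}      a single "}"
const placeholderPattern = /\$\{(\w+)\}/;

// Double quotes keep this a plain string, not a template literal.
const found = "Hello, ${name}".match(placeholderPattern);

console.log(found[0]); // ${name}
console.log(found[1]); // name
```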

Using

Regular Expressions

In JavaScript

Built-in Object Type

Regular expressions in JavaScript are objects (instances of RegExp) with their own literal syntax.

 

You can save them to variables or call their methods directly

const exp = /Hello World/;

exp.test("Hello World"); //true
/Goodbye World/.test('Hello World'); //false

Methods that use RegEx

Regex Methods

RegEx.test(string) //boolean

 

 

 

RegEx.exec(string) //data

const isFound = /Hello World/.test("Hello World"); 
//true
const foundData = /Hello/.exec("Hello World");

//Array
//  "Hello"
//  index: 0
//  input: "Hello World"
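With the g flag, exec remembers where the last match ended and continues from there on the next call, so it can be used in a loop. A minimal sketch, with made-up variable names:

```javascript
const wordPattern = /\w+/g;
const text = "Hello World";
const found = [];

let match;
// exec returns null once there are no more matches
while ((match = wordPattern.exec(text)) !== null) {
  found.push({ word: match[0], index: match.index });
}

console.log(found);
// [ { word: 'Hello', index: 0 }, { word: 'World', index: 6 } ]
```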

Methods that use RegEx

String Methods

String.match(RegEx) //array

 

 

String.search(RegEx) //index

 

 

String.replace(RegEx, replacement) //string

 

 

String.split(RegEx) //array

"Hello World".match(/World/); //Data
"Hello World".search(/World/); //6
"Hello World".replace(/World/, "Everybody!"); //Hello Everybody!
"Hello World".split(/\s/); //["Hello", "World"]

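A few variations on the string methods worth knowing, sketched with made-up inputs: the g flag changes how many matches replace and match handle, and $1, $2 in the replacement string refer to capture groups.

```javascript
// Without g, only the first match is replaced
console.log("a-b-c".replace(/-/, "+"));  // a+b-c
console.log("a-b-c".replace(/-/g, "+")); // a+b+c

// $1 and $2 stand for the first and second capture groups
console.log("Hello World".replace(/(\w+) (\w+)/, "$2 $1")); // World Hello

// With g, match returns every matched substring
console.log("Hello World".match(/o/g)); // [ 'o', 'o' ]
```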
Live code a thing

/end|beginning/igm
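One way to try this pattern, using a made-up block of text. The i flag ignores case and g finds every match. The m flag has no visible effect here because the pattern contains no ^ or $.

```javascript
const text = "In the Beginning\nthere was no END\nbut every end is a beginning";

const matches = text.match(/end|beginning/igm);

console.log(matches); // [ 'Beginning', 'END', 'end', 'beginning' ]
```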

Regular Expressions

By Michael Jasper
