/(Every|some)thing you wanted to know about RegEx/
"Regular Expression"
"Regex"
"Regexp"
Rules for the structure of the language, and how that translates to MEANING
Rules for more nuanced differentiation in meaning:
active vs passive
declarative vs imperative
Grammar is the difference between "Dog bites man" and "Man bites dog."
subject verb object
I have.
IC = S, V
I have the high ground.
, O*
Many spoken languages are not regular, because humans.
Sets of rules (grammars) that define which keywords, expressions, and characters can appear in what order, and what those things mean.
An expression is anything that can be on the right of an equals sign.
const a = b + 3
expression
Exp ::= Exp
Exp ::= Exp + Exp
Exp ::= Exp - Exp
Exp ::= (Exp)
Exp ::= Exp && Exp
Exp ::= Exp || Exp
Exp ::= ! Exp
Any computer program can be modeled as a "state machine", shown as a graph with edges and nodes
Input to a state machine moves the current state from Start to End
ab
a*
a|b
"abc"
"abc...c"
"ab"
End
A regular expression is a definition of a language grammar
State machines validate input for a regular expression
Valid input == part of a language
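To make the state machine idea concrete, here is a minimal sketch of a machine for the language "ab followed by any number of c" (the strings "ab", "abc", "abc...c" from the diagram). The state names and the transitions table are assumptions made up for this illustration, not something from the slides.

```javascript
// Hypothetical state machine: each state maps an input character to the next state.
const transitions = {
  start: { a: "sawA" },
  sawA: { b: "end" },
  end: { c: "end" }, // looping on "c" stays in the accepting state
};
const acceptingStates = new Set(["end"]);

// Feed the input one character at a time; valid input finishes in an accepting state.
function accepts(input) {
  let state = "start";
  for (const ch of input) {
    const next = transitions[state][ch];
    if (next === undefined) return false; // no edge for this character
    state = next;
  }
  return acceptingStates.has(state);
}

console.log(accepts("ab"));    // true
console.log(accepts("abccc")); // true
console.log(accepts("a"));     // false
console.log(accepts("abd"));   // false

// The equivalent regular expression describes the same language.
const pattern = /^abc*$/;
console.log(pattern.test("abccc")); // true
```

A string is "part of the language" exactly when the machine ends in an accepting state after consuming all of it.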
Pattern:
a set of characters that describes possible matches (or rather, describes the grammar of a language that words may or may not be part of)
Flags:
describe the way that the pattern should be applied to possible matches
global
ignore case
multi-line
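A small sketch of the three flags in JavaScript, where they are written as g, i, and m after the closing slash. The sample text is an assumption chosen for illustration.

```javascript
const text = "No\nno\nNO";

// g (global): return every match instead of only the first.
const caseSensitive = text.match(/no/g);
console.log(caseSensitive); // [ 'no' ]

// i (ignore case): letters match regardless of case.
const anyCase = text.match(/no/gi);
console.log(anyCase); // [ 'No', 'no', 'NO' ]

// m (multi-line): ^ and $ match at the start and end of each line,
// not only at the start and end of the whole input.
const wholeInputOnly = /^no$/.test(text);
const perLine = /^no$/m.test(text);
console.log(wholeInputOnly); // false
console.log(perLine);        // true
```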
Characters:
character literals on which to match
Metacharacters:
provide instructions on how to interpret characters
"no"
/no/
"nooo"
/nooo/
"nooo"
/no{3}/
"no...o"
/no{3,10}/
"no..."
/no{3,}/
"no..."
/no{1,}/
"n..."
/no*/
quantifier
zero or more
"n..."
/no{0,}/
"no..."
/no+/
one or more
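The quantifier examples above can be checked directly. In this sketch the ^ and $ anchors are an addition (an assumption, so that each test covers the whole string rather than a substring). Note that the quantifier applies only to the "o" right before it, not to "no" as a whole.

```javascript
// [pattern, input, expected result]
const cases = [
  [/^no{3}$/, "nooo", true],      // exactly three o's
  [/^no{3}$/, "noooo", false],    // four is too many
  [/^no{3,10}$/, "nooooo", true], // between three and ten
  [/^no{3,}$/, "noo", false],     // three or more, two is too few
  [/^no*$/, "n", true],           // zero or more
  [/^no+$/, "n", false],          // one or more
  [/^no+$/, "nooo", true],
];

for (const [pattern, input, expected] of cases) {
  console.log(String(pattern), JSON.stringify(input), pattern.test(input) === expected);
}
```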
"abc"
/abc/
"abc"
/[abc]*/
"abcd"
/[a-d]*/
"abYZ"
/[a-zA-Z]*/
"aB123"
/[a-zA-Z0-9]*/
"aB123"
/\w*/
set
word character
set and range
"a3ç∂eƒ"
/[a-zA-Z0-9\W\s]*/
"a3ç∂eƒ"
/.*/
any character
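A short sketch of sets, ranges, and character classes using the strings from these slides. The ^ and $ anchors are an assumption added so that each check covers the whole string.

```javascript
// A set matches any ONE of its characters, so order does not matter.
const setMatchesAnyOrder = /^[abc]*$/.test("cab");       // true
const setRejectsOutsider = /^[abc]*$/.test("abcd");      // false, "d" is not in the set

// Ranges are shorthand for long sets.
const rangeAtoD = /^[a-d]*$/.test("abcd");               // true
const lettersOnly = /^[a-zA-Z]*$/.test("aB123");         // false, digits are not letters
const lettersAndDigits = /^[a-zA-Z0-9]*$/.test("aB123"); // true

// \w covers ASCII letters, digits, and underscore; it does not cover "ç" or "∂".
const wordChars = /^\w*$/.test("aB123");                 // true
const wordCharsNonAscii = /^\w*$/.test("a3ç∂eƒ");        // false

// . matches any character except a line break.
const anyCharacter = /^.*$/.test("a3ç∂eƒ");              // true

console.log(setMatchesAnyOrder, setRejectsOutsider, rangeAtoD, lettersOnly);
console.log(lettersAndDigits, wordChars, wordCharsNonAscii, anyCharacter);
```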
quantifiers
{n,n}
* zero or more
+ one or more
? zero or one
sets, ranges
[n-n]
\w word character (letter, digit, or underscore)
\d digit
\W non-word character
grouping
(nn)
(nn) capture
(?:nn) non-capture
^ start of input (or of a line, with the multi-line flag)
$ end of input (or of a line, with the multi-line flag)
| or
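A minimal sketch of grouping, alternation, and anchors in JavaScript. The sample name and the patterns are chosen for illustration only.

```javascript
const name = "Dexter Jettster";

// Capturing groups remember what they matched; they are numbered from 1.
const captured = name.match(/^(\w+) (\w+)$/);
console.log(captured[1]); // Dexter
console.log(captured[2]); // Jettster

// (?:...) groups the alternation without storing it,
// so the first stored capture is the last name.
const nonCapturing = name.match(/^(?:Dexter|Dex) (\w+)$/);
console.log(nonCapturing[1]); // Jettster

// ^ and $ pin the pattern to the start and end of the input.
const startsWithDexter = /^Dexter/.test(name);
const endsWithDexter = /Dexter$/.test(name);
console.log(startsWithDexter); // true
console.log(endsWithDexter);   // false
```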
"Dexter Jettster"
"Dexter"
By default, quantifiers are greedy: they match the longest string possible
"<p>TO͇̹̺ͅƝ̴ȳ̳ TH̘Ë͖́̉ ͠P̯͍̭O̚N̐Y̡</p>"
Lazy quantifiers (*?, +?) match as few characters as possible
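A sketch of the difference, using a plain two-paragraph snippet as a stand-in for the HTML on the slide (the input string is an assumption for illustration). Adding ? after a quantifier makes it lazy.

```javascript
const html = "<p>one</p><p>two</p>";

// Greedy: .* grabs as much as it can, so the match runs to the LAST </p>.
const greedy = html.match(/<p>.*<\/p>/)[0];
console.log(greedy); // <p>one</p><p>two</p>

// Lazy: .*? grabs as little as it can, so the match stops at the FIRST </p>.
const lazy = html.match(/<p>.*?<\/p>/)[0];
console.log(lazy); // <p>one</p>
```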
"<p>TO͇̹̺ͅƝ̴ȳ̳ TH̘Ë͖́̉ ͠P̯͍̭O̚N̐Y̡</p>"
Should validate these:
$1 $1.00 $1,000 $1,000,000.00
And invalidate these
$1, $1,00 $11,00
Version: 1
- A currency symbol
- One or more digits
- Zero or one:
- period followed by 2 digits
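One possible reading of Version 1 as a pattern, sketched under the assumption that the currency symbol is "$". Because it is not anchored and knows nothing about commas, it only picks up the part of each amount before the first comma.

```javascript
// Version 1: currency symbol, one or more digits, optional period plus 2 digits.
const version1 = /\$\d+(\.\d{2})?/;

// Return the part of the amount that the pattern actually matched.
function matchedPart(amount) {
  const match = amount.match(version1);
  return match ? match[0] : null;
}

console.log(matchedPart("$1"));        // $1
console.log(matchedPart("$1.00"));     // $1.00
console.log(matchedPart("$1,000"));    // $1
console.log(matchedPart("$1,000.00")); // $1
```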
$1
$1.00
$1,000
$1,000.00
$0.10
$1
$1.10
$1,000
$1,000.00
$100,000,000
$0 .10
$1
$1 .10
$1 ,000
$1 ,000 .00
$111 ,000,000
$1,
$1,00
1.00
$11,00
$01,000
$00,000
$1 ,
$1 ,00
1 .00
$11 ,00
$01 ,000
$000 ,000
Currency Symbol
1-3 digits
(no leading 0 unless it is the only character)
0 or more groups of a comma and 3 digits
0 or 1 of a period and 2 digits
Currency Symbol
1-3 digits (no leading zeros)
0 or more: groups of a comma and 3 digits
0 or 1 of: a period and 1 or 2 digits
\$(0|[1-9][0-9]{0,2})
(,\d{3})*
(\.\d{1,2})?
Version: 2
- implement 3 rules
- add start and end characters
$1
$1.00
$1,000
$1,000.00
$1,
$1,00
$01.00
$00,000.00
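A sketch of Version 2 that joins the three pattern fragments shown earlier and adds the start and end anchors. The variable and function names are assumptions for illustration.

```javascript
// The three rules, one fragment each.
const leadingDigits = /\$(0|[1-9][0-9]{0,2})/; // symbol, then 0 or 1-3 digits with no leading zero
const thousands = /(,\d{3})*/;                 // zero or more groups of a comma and 3 digits
const cents = /(\.\d{1,2})?/;                  // optional period and 1 or 2 digits

// Join the fragments and anchor them so the WHOLE string must match.
const usd = new RegExp(`^${leadingDigits.source}${thousands.source}${cents.source}$`);

function isValidUsd(amount) {
  return usd.test(amount);
}

for (const amount of ["$1", "$1.00", "$1,000", "$1,000,000.00", "$0.10"]) {
  console.log(amount, isValidUsd(amount)); // all true
}
for (const amount of ["$1,", "$1,00", "$11,00", "$01.00", "$00,000.00", "1.00"]) {
  console.log(amount, isValidUsd(amount)); // all false
}
```

Without the anchors, an input such as "$1,00" would still pass, because the pattern would be satisfied by the "$1" at its start.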
Version: 1
1 or 2 digits
a slash
1 or 2 digits
a slash
4 digits
allows:
1/1/2018
11/12/2018
99/99/9999
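Version 1 of the date pattern, written out as a sketch. It has no anchors and no range checks, which is why nonsense dates get through.

```javascript
// 1 or 2 digits, a slash, 1 or 2 digits, a slash, 4 digits.
const dateV1 = /\d{1,2}\/\d{1,2}\/\d{4}/;

console.log(dateV1.test("1/1/2018"));     // true
console.log(dateV1.test("11/12/2018"));   // true
console.log(dateV1.test("99/99/9999"));   // true, nothing limits the ranges
console.log(dateV1.test("211/12/20189")); // true, a substring is enough without anchors
```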
Version: 2
A group of
- an optional "0" and a single 1-9 digit
- OR a "1" and single 0-2 digit
A slash
A group of
- an optional "0" and a single 1-9 digit
- OR "12"
A slash
4 digits
allows:
1/1/2018
11/12/2018
Fails:
99/99/2018
11/12/0000
211/12/20189
Version: 3
Start of string
A group of
- an optional "0" and a single 1-9 digit
- OR a "1" and single 0-2 digit
A slash
A group of
- an optional "0" and a single 1-9 digit
- OR "12"
A slash
"19" or "20" and any 2 digits
End of string
allows:
1/1/2018
11/12/2018
Fails:
99/99/2018
11/12/0000
211/12/20189
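A sketch in the spirit of Version 3. Two details are assumptions rather than the slide's exact rules: the leading zero is treated as optional for the month (so that 1/1/2018 passes), and the day group is widened to cover 1 through 31.

```javascript
const dateV3 = new RegExp(
  "^" +
    "(0?[1-9]|1[0-2])" +        // month: 1-9 with optional leading zero, or 10-12
    "/" +
    "(0?[1-9]|[12]\\d|3[01])" + // day: 1-31 (assumed range, see lead-in)
    "/" +
    "(19|20)\\d{2}" +           // year: 19xx or 20xx
    "$"
);

console.log(dateV3.test("1/1/2018"));     // true
console.log(dateV3.test("11/12/2018"));   // true
console.log(dateV3.test("99/99/2018"));   // false
console.log(dateV3.test("11/12/0000"));   // false
console.log(dateV3.test("211/12/20189")); // false
console.log(dateV3.test("2/31/2018"));    // true, month lengths are still not checked
```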
^(((0[1-9]|[12][0-9]|3[01])[- /.](0[13578]|1[02])|(0[1-9]|[12][0-9]|30)[- /.](0[469]|11)|(0[1-9]|1\d|2[0-8])[- /.]02)[- /.]\d{4}|29[- /.]02[- /.](\d{2}(0[48]|[2468][048]|[13579][26])|([02468][048]|[1359][26])00))$
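For comparison, the long pattern above can be exercised as-is. It reads dates day first (dd mm yyyy), requires two digits for day and month, accepts "-", space, "/", or "." as separators, and checks month lengths and leap years. The sketch wraps it with String.raw so the backslashes survive unchanged; the variable name is an assumption.

```javascript
const fullDate = new RegExp(
  String.raw`^(((0[1-9]|[12][0-9]|3[01])[- /.](0[13578]|1[02])|(0[1-9]|[12][0-9]|30)[- /.](0[469]|11)|(0[1-9]|1\d|2[0-8])[- /.]02)[- /.]\d{4}|29[- /.]02[- /.](\d{2}(0[48]|[2468][048]|[13579][26])|([02468][048]|[1359][26])00))$`
);

console.log(fullDate.test("31/12/2018")); // true
console.log(fullDate.test("31/04/2018")); // false, April has 30 days
console.log(fullDate.test("29/02/2020")); // true, leap year
console.log(fullDate.test("29/02/2019")); // false, not a leap year
console.log(fullDate.test("29.02.2000")); // true, divisible by 400
console.log(fullDate.test("29/02/1900")); // false, century year not divisible by 400
console.log(fullDate.test("12/31/2018")); // false, month-first order is rejected
```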
A word character, hyphen, underscore, or period
an @ symbol
1 or more word characters
a period
between 2 and 5 word characters
international TLD
like .co.uk
an ip address instead of a domain
spaces inside quotation marks
comments inside parentheses (WHAT?)
international characters or 😀.com
Uhhh...
/.+@.+/
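A sketch comparing the two approaches. The strict pattern is one possible reading of the rules listed earlier (assuming "one or more" for the part before the @, and adding anchors); the loose pattern is the one on this slide. The sample addresses are made up.

```javascript
// One reading of the strict rules: local part, @, word characters, period, 2-5 word characters.
const strictEmail = /^[\w.-]+@\w+\.\w{2,5}$/;
// The loose check: something, an @, something.
const looseEmail = /.+@.+/;

const samples = [
  "user@example.com",
  "first.last@example.co.uk",   // multi-part TLD
  "user@192.168.0.1",           // IP address instead of a domain
  '"john smith"@example.com',   // spaces inside quotation marks
  "no-at-sign",
];

for (const address of samples) {
  console.log(address, strictEmail.test(address), looseEmail.test(address));
}
// user@example.com          strict: true   loose: true
// first.last@example.co.uk  strict: false  loose: true
// user@192.168.0.1          strict: false  loose: true
// "john smith"@example.com  strict: false  loose: true
// no-at-sign                strict: false  loose: false
```

The loose pattern accepts plenty of invalid addresses, but it does not reject the unusual valid ones.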
/b?eg?i?n?ni?n?g?d?/
"end"
"beginning"
usd currency parsing https://regexr.com/3ivk1
metacharacters https://help.relativity.com/9.0/Content/Relativity/Regular_expressions/Regular_expression_metacharacters.htm
RFC Email spec http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html
regular languages https://nikic.github.io/2012/06/15/The-true-power-of-regular-expressions.html
lea verou regex talk https://www.youtube.com/watch?v=EkluES9Rvak
greedy vs lazy https://www.regular-expressions.info/repeat.html
parsing regular expressions http://matt.might.net/articles/parsing-regex-with-recursive-descent/