Last time, we learned the minimum required to use regular expressions.
One thing I should have mentioned, in case you're still intimidated:
Remember the first time you looked at code? Did it look like a giant pile of meaningless text?
Well, you can learn to read and write regexes the same way you learned to code!
Regular expressions are not only used for matching; replacing is a very important feature too. Let's look at an example.
https://regex101.com/r/QyDVie/1
Rule of thumb: () creates a capturing group, and whatever it "captures" can be reused in a replacement expression by referencing it as $n, where n is the group number. Groups are numbered from 1, in the order of their opening parentheses.
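Here's a minimal JavaScript sketch of the same idea. It runs in Node or a browser console. The input "Ada Lovelace" is a made-up example, not the one from the link above. Two capturing groups are swapped using $2 and $1 in the replacement string.

```javascript
// Two capturing groups: group 1 = first name, group 2 = last name
const fullName = /([A-Za-z]+) ([A-Za-z]+)/;

// $1 and $2 in the replacement refer to what each group captured
const swapped = "Ada Lovelace".replace(fullName, "$2, $1");
console.log(swapped); // Lovelace, Ada
```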
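Named groups: (?<name>...) lets you refer to a capture by name instead of by number.
"Ignored" (non-capturing) groups: (?:...), for when you want to apply a quantifier to a group but don't need to reuse what it matched.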
https://regex101.com/r/NNf0Ns/1
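A small JavaScript sketch of both kinds of groups, on made-up inputs. Note that in a JavaScript replacement string a named group is referenced as $<name>, while Java uses ${name}.

```javascript
// Named groups: each capture can be read by name from match.groups
const isoDate = /(?<year>[0-9]{4})-(?<month>[0-9]{2})-(?<day>[0-9]{2})/;
const m = "Released on 2024-03-15".match(isoDate);
console.log(m.groups.year, m.groups.month, m.groups.day); // 2024 03 15

// In a JavaScript replacement string, a named group is referenced as $<name>
const european = "2024-03-15".replace(isoDate, "$<day>/$<month>/$<year>");
console.log(european); // 15/03/2024

// Non-capturing group: + repeats "ha" as a unit, but no group is created
const laugh = "hahaha!".match(/(?:ha)+!/);
console.log(laugh.length); // 1 (only the full match, no captures)

// Same pattern with a capturing group: group 1 keeps the last repetition
const laughCaptured = "hahaha!".match(/(ha)+!/);
console.log(laughCaptured.length, laughCaptured[1]); // 2 ha
```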
Reuse a capture in the pattern itself 🤯: a backreference such as \1 matches exactly the same text that group 1 captured.
https://regex101.com/r/MiW0fF/1
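A couple of backreference examples in JavaScript, with made-up inputs:

```javascript
// ([a-z])\1 : a letter immediately followed by the same letter again
const doubled = "bookkeeper".match(/([a-z])\1/g);
console.log(doubled.join(", ")); // oo, kk, ee

// A quoted string that closes with the same kind of quote it opened with
const quoted = /(["'])[^"']*\1/g;
const quotedMatches = `He said "hi" and 'bye'`.match(quoted);
console.log(quotedMatches.join(" | ")); // "hi" | 'bye'

// An opening " and a closing ' are different captures, so there is no match
const mismatched = /(["'])[^"']*\1/.test(`"oops'`);
console.log(mismatched); // false
```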
Last time we learnt about character classes ([...]), which match a single character from a set or range.
There are handy shortcuts for some common ones:
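\b matches a "word boundary": the position at the beginning or end of a word, similar to how ^ and $ match the beginning and end of a line
\s and \S match any whitespace character and any non-whitespace character, respectively
\d and \D match any digit and any non-digit, respectively
\w and \W match any "word" character and any non-word character, respectively, where a word character is [a-zA-Z0-9_] (note the underscore)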
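A quick JavaScript sketch of these shortcuts on made-up strings. The last two matches also show why \b matters when combined with a backreference.

```javascript
const order = "Order #42 shipped to user_7";

// \d+ : one or more digits
const digits = order.match(/\d+/g);
console.log(digits.join(", ")); // 42, 7

// \w+ : runs of word characters; the underscore counts as a word character
const words = order.match(/\w+/g);
console.log(words.join(", ")); // Order, 42, shipped, to, user_7

// \s+ : split on any run of whitespace (spaces, tabs, newlines)
const parts = "a  b\tc\nd".split(/\s+/);
console.log(parts.join(",")); // a,b,c,d

// \b : "cat" only as a whole word
const inSentence = /\bcat\b/.test("the cat sat");
const inWord = /\bcat\b/.test("concatenate");
console.log(inSentence, inWord); // true false

// \b with a backreference: without it, the "is" at the end of "this" is found first
const loose = /([a-z]+) \1/.exec("this is is a test");
const strict = /\b([a-z]+) \1\b/.exec("this is is a test");
console.log(loose.index, strict.index); // 2 5
```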
Quantifiers are the *, +, ? and {} operators we saw last session.
By default they are greedy: they match as many characters as possible, giving some back only if the rest of the pattern would otherwise fail.
Appending ? to a quantifier makes it non-greedy (lazy): it matches as few characters as possible.
Greedy (default): *  +  ?  {n,m}
Lazy (append ?): *?  +?  ??  {n,m}?
And a somewhat practical use case:
https://regex101.com/r/jFGSyi/3
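A JavaScript sketch of the difference, on a made-up HTML snippet:

```javascript
const html = "<b>bold</b> and <i>italic</i>";

// Greedy: .+ runs to the end of the string, then backtracks only to the LAST ">"
const greedy = html.match(/<.+>/)[0];
console.log(greedy); // <b>bold</b> and <i>italic</i>

// Lazy: .+? stops at the FIRST ">" that lets the match succeed
const lazy = html.match(/<.+?>/g);
console.log(lazy.join(" ")); // <b> </b> <i> </i>
```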
Let's see how each quantifier behaves with the lazy modifier:
https://regex101.com/r/LQ0tYW/1
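The same comparison as a JavaScript sketch: each quantifier and its lazy version applied to the made-up input "aaaa".

```javascript
const input = "aaaa";
const patterns = [/a*/, /a*?/, /a+/, /a+?/, /a?/, /a??/, /a{2,4}/, /a{2,4}?/];

// What each pattern matches at the start of "aaaa"
const results = patterns.map((re) => [re.source, input.match(re)[0]]);
for (const [source, matched] of results) {
  console.log(`${source} -> "${matched}"`);
}
// Results, paired greedy / lazy:
// a* -> "aaaa"      a*? -> ""
// a+ -> "aaaa"      a+? -> "a"
// a? -> "a"         a?? -> ""
// a{2,4} -> "aaaa"  a{2,4}? -> "aa"
```

Each lazy version takes the minimum its quantifier allows: zero for *? and ??, one for +?, and n for {n,m}?.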
Lookarounds let you match based on context without including that context in the match. This is particularly useful when you want to use the matched text, for example in a replacement.
Positive lookahead (?=...) means "that is followed by"
https://regex101.com/r/fid3Dr/4
Negative lookahead (?!...) means "that is NOT followed by"
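A JavaScript sketch of both lookaheads on a made-up string. It also shows a common pitfall with negative lookahead: the engine backtracks into the number until the lookahead succeeds.

```javascript
const prices = "100 USD 200 EUR";

// Positive lookahead: digits followed by " EUR"; " EUR" itself is not part of the match
const euros = prices.match(/\d+(?= EUR)/g);
console.log(euros.join(", ")); // 200

// Negative lookahead: digits NOT followed by " EUR"
// Pitfall: for "200", the engine backtracks to "20", which is followed by "0 EUR", not " EUR"
const naive = prices.match(/\d+(?! EUR)/g);
console.log(naive.join(", ")); // 100, 20

// Requiring a word boundary at the end of the number prevents that
const notEuros = prices.match(/\d+\b(?! EUR)/g);
console.log(notEuros.join(", ")); // 100
```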
Positive lookbehind (?<=...) means "that is preceded by"
https://regex101.com/r/VdAeWG/1
Negative lookbehind (?<!...) means "that is NOT preceded by"
Lookbehind can come with a performance cost, as the regex engine may need extra steps to check the text behind each candidate position.
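A JavaScript sketch of lookbehind on made-up strings. It assumes an engine implementing ES2018 or later, such as a recent Node or browser. The last line shows why lookarounds help with replacing: the "$" is checked but not consumed, so it survives the replacement.

```javascript
const priceText = "Price: $30, Qty: 5";

// Positive lookbehind: digits preceded by "$"; the "$" is not part of the match
const dollars = priceText.match(/(?<=\$)\d+/g);
console.log(dollars.join(", ")); // 30

// Negative lookbehind alone also matches the "0" of "30" (preceded by "3", not "$")
const naiveNonDollars = priceText.match(/(?<!\$)\d+/g);
console.log(naiveNonDollars.join(", ")); // 0, 5

// Adding \b makes the number start at a word boundary
const nonDollars = priceText.match(/(?<!\$)\b\d+/g);
console.log(nonDollars.join(", ")); // 5

// Only the digits are replaced; the "$" context stays in place
const masked = "$30 and $45".replace(/(?<=\$)\d+/g, "XX");
console.log(masked); // $XX and $XX
```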
Patterns can be configured with flags (also called modifiers): for example, to be case-insensitive, to let . match newlines, or to ignore whitespace in the pattern.
This varies a fair bit by language. Here's a reference for Java
And for JavaScript:
https://www.codeguage.com/courses/regexp/flags
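Here's how a few of these flags look in JavaScript, as a sketch on made-up strings (the s flag requires ES2018 or later):

```javascript
// i: case-insensitive
const caseInsensitive = /hello/i.test("HELLO there");
console.log(caseInsensitive); // true

// s ("dotAll"): lets . match newline characters too
const dotNoFlag = /start.end/.test("start\nend");
const dotAll = /start.end/s.test("start\nend");
console.log(dotNoFlag, dotAll); // false true

// m (multiline): ^ and $ match at the start/end of every line, not just the whole string
// g (global): return every match instead of only the first
const lines = "one\ntwo\nthree".match(/^\w+$/gm);
console.log(lines.join(", ")); // one, two, three

// Without m, ^ and $ only match at the very start and end of the string
const wholeString = "one\ntwo\nthree".match(/^\w+$/g);
console.log(wholeString); // null
```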
Note that JavaScript doesn't support a "comments" (free-spacing) flag like Java's Pattern.COMMENTS. Instead, you can build your regex from concatenated strings and put a comment next to each piece.
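A sketch of that workaround in JavaScript (the date pattern is only an illustration): each piece of the pattern gets its own comment, at the cost of doubling every backslash inside the string literals.

```javascript
// Backslashes must be doubled inside ordinary string literals ("\\d" for \d)
const isoDateStrict = new RegExp(
  "^" +
    "(\\d{4})" + // year: four digits
    "-(\\d{2})" + // month: two digits
    "-(\\d{2})" + // day: two digits
    "$"
);

console.log(isoDateStrict.source); // ^(\d{4})-(\d{2})-(\d{2})$
console.log(isoDateStrict.test("2024-03-15")); // true
console.log(isoDateStrict.test("15/03/2024")); // false
```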