Overview
- Present `RegExp.escape`
- Problem being solved.
- TC Decisions.
RegExp.escape
`RegExp.escape` takes a string and returns a copy of it "escaped" for use in a RegExp pattern.
This lets us build a regular expression out of a string without its special characters being treated as regular expression tokens.
let needle = input.value; // "Hello.Friend"
let re = new RegExp(needle); // problem: `.` matches any character
let result = re.exec("HellofFriend"); // matches, although the text contains no "."
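A minimal sketch of the problem and the fix. `escapeForRegExp` is a hypothetical stand-in for `RegExp.escape`, and it assumes only the SyntaxCharacter set is escaped; which set is escaped is still an open question in this proposal.

```javascript
// Hypothetical stand-in for RegExp.escape (an assumption, not the spec text):
// backslash-escapes the SyntaxCharacter set ^ $ \ . * + ? ( ) [ ] { } |
function escapeForRegExp(str) {
  return String(str).replace(/[\\^$.*+?()[\]{}|]/g, "\\$&");
}

const needle = "Hello.Friend"; // e.g. taken from input.value

// Unescaped: `.` matches any character, so "HellofFriend" matches.
const loose = new RegExp(needle).test("HellofFriend");

// Escaped: `.` is matched literally, so only the exact text matches.
const strict = new RegExp(escapeForRegExp(needle));
const falsePositive = strict.test("HellofFriend");
const exactMatch = strict.test("Hello.Friend");
```

Here `loose` and `exactMatch` are `true`, while `falsePositive` is `false`.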
RegExp.escape - examples
RegExp.escape("The Quick Brown Fox");
// "The Quick Brown Fox"
RegExp.escape("Buy it. use it. break it. fix it.");
// "Buy it\. use it\. break it\. fix it\."
RegExp.escape("(*.*)");
// "\(\*\.\*\)"
RegExp.escape("。^・ェ・^。");
// "。\^・ェ・\^。"
RegExp.escape("😊 *_* +_+ ... 👍");
// "😊 \*_\* \+_\+ \.\.\. 👍"
RegExp.escape(String.raw`\d \D (?:)`); // String.raw keeps the backslashes in the input
// "\\d \\D \(\?\:\)"
Treat a string literally when part of a RegExp.
Use Cases
- Primary: Building dynamic RegExp for text search using user input.
- Also: Client-side routers (push/popState) need this, and so do Node routers.
- More: https://esdiscuss.org/topic/regexp-escape
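The primary use case can be sketched as follows. Both `escapeForRegExp` (a stand-in for `RegExp.escape` that assumes only the SyntaxCharacter set is escaped) and `countOccurrences` are hypothetical names used for illustration.

```javascript
// Hypothetical stand-in for RegExp.escape (assumption: SyntaxCharacter set only).
function escapeForRegExp(str) {
  return String(str).replace(/[\\^$.*+?()[\]{}|]/g, "\\$&");
}

// Counts literal, case-insensitive occurrences of a user-supplied query.
function countOccurrences(text, query) {
  if (query === "") return 0; // an empty pattern would match at every position
  const re = new RegExp(escapeForRegExp(query), "gi");
  const matches = text.match(re);
  return matches === null ? 0 : matches.length;
}

const text = "Price: $5.00 (was $5.00). Not $5x00.";
const hits = countOccurrences(text, "$5.00"); // 2: "$5x00" is not counted
```

Without the escaping step, the `$` in the query would act as an end-of-input anchor and the search would find nothing.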
Elsewhere
- Other languages: Native in PHP, Perl, Python, C#/VB.NET, Java, Ruby. (Research here).
- Userland: lodash, and polyfill based on previous proposal. Both on board with the proposal.
The Proposal
- Add `escape` method to the RegExp object that escapes "special characters".
- More on which characters are special later.
- In spec language: http://benjamingr.github.io/RegExp.escape/
Where we stand
- TC needs to choose between escaped set alternatives.
- Otherwise - everything ready for stage advancement.
- Extensive discussion on the repo; experts from other languages (PHP internals and Python) and RegExp experts were invited and participated.
Other People involved
- Elad Kats and Uri Shaked - Cross language research.
- ljharb - Jordan Harband - maintains polyfill, discussion.
- Allen Wirfs-Brock - identifying cross cutting concerns and spec language help.
- Denis Pushkarev - core-js polyfill and discussion.
- Martijn Pieters - RegExp expert - proof reading and Python's new re perspective.
- Nikita Popov and Bob Weinand - PHP internals - proof reading and discussion.
- C. Scott Ananian - discussion and proof reading.
- John-David Dalton, Ryan O'Hara, André Bargull, Joshua Appelman, Mariusz Nowak - discussion of escaped set.
- Mathias Bynens - proof reading and escaped set discussion.
TC Input
- Choose between 3 alternative escape sets.
- Advance the proposal to the next stage.
Escape Set Options
- SyntaxCharacter Proposal
- Safe With Extra Escape Set Proposal
- Extended Safe
Full escape sets rationale explained here:
https://github.com/benjamingr/RegExp.escape/blob/master/EscapedChars.md
Proposal rationales explained here:
Proposals ruled out
- re`with ${str}` - Template tag (ruled out on esdiscuss, in an issue, and as a separate proposal as a worse primitive).
- RegExp.fromString - Ruled out in this issue as less pragmatic as it doesn't solve the primary use case well.
SyntaxCharacter proposal
- Escapes the string just enough for it to be interpreted literally when used in a RegExp.
- That is, `new RegExp(RegExp.escape(str))` matches `str`.
- Creates readable output.
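The round-trip property above can be checked with a small sketch. `escapeSyntaxCharacters` is a hypothetical helper that assumes the escaped set is exactly the grammar's SyntaxCharacter production; it is not the proposal's spec text.

```javascript
// Hypothetical helper (an assumption): escape exactly the SyntaxCharacter set
// ^ $ \ . * + ? ( ) [ ] { } | with a leading backslash.
function escapeSyntaxCharacters(str) {
  return String(str).replace(/[\\^$.*+?()[\]{}|]/g, "\\$&");
}

const samples = [
  "The Quick Brown Fox",
  "Buy it. use it. break it. fix it.",
  "(*.*)",
  "。^・ェ・^。",
  "😊 *_* +_+ ... 👍",
  String.raw`\d \D (?:)`,
];

// Each sample, once escaped and anchored, should match itself as a whole.
const roundTrips = samples.map((s) =>
  new RegExp("^" + escapeSyntaxCharacters(s) + "$").test(s)
);
```

Every entry of `roundTrips` is `true`, and strings without special characters come back unchanged, which is what keeps the output readable.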
Safe With Extra Escape Set
- Escapes everything the SyntaxCharacter proposal does.
- Additionally escapes `-`, which is context-sensitive inside a character class.
- Escapes hex numeric literals (0-9a-f) at the start of the string in order to avoid hitting matching groups and lookahead/lookbehind control characters.
- Less readable output but safer.
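A rough sketch of why the extra escapes matter when the escaped string is concatenated into a larger pattern. `escapeSafe` is a hypothetical helper; writing the leading hex character as `\xNN`, and treating A-F like a-f, are assumptions made for illustration, not something the proposal fixes here.

```javascript
// Hypothetical helper (assumptions: `-` gets a backslash, and a leading
// 0-9 / a-f / A-F character is rewritten as a \xNN escape).
function escapeSafe(str) {
  const escaped = String(str).replace(/[\\^$.*+?()[\]{}|-]/g, "\\$&");
  return escaped.replace(/^[0-9a-fA-F]/, (ch) =>
    "\\x" + ch.charCodeAt(0).toString(16).padStart(2, "0")
  );
}

// Inside a character class, a raw "-" turns "[a" + "-" + "z]" into a range.
const rangeBug = new RegExp("[a" + "-" + "z]").test("m");
const rangeFixed = new RegExp("[a" + escapeSafe("-") + "z]").test("m");

// After a back-reference, a raw leading digit changes the escape:
// "(a)\1" + "0" is parsed as "(a)\10", not as "\1" followed by "0".
const digitBug = new RegExp("(a)\\1" + "0").test("aa0");
const digitFixed = new RegExp("(a)\\1" + escapeSafe("0")).test("aa0");
```

In this sketch `rangeBug` is `true` and `rangeFixed` is `false`, while `digitBug` is `false` and `digitFixed` is `true`; the cost is output such as `\x30` in place of `0`.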
Extended "Safe" Proposal
- Escapes everything the previous proposals escape.
- Also escapes whitespace and `/`s so `eval` can be used on a `RegExp.escape`d string.
- Data indicates that `eval('/'+regexpStr+'/')` is not a pattern in real use. (It does not appear in the top 1M websites or in Node code bases, and exists in fewer than 5 repos on GitHub.)
- Safest proposal with least readable output.
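A sketch of the `eval` scenario this option is meant to cover. `escapeExtended` is a hypothetical helper; emitting whitespace and `/` as `\uNNNN` escapes is an assumption, and the `-` and leading-hex handling of the previous option is left out for brevity.

```javascript
// Hypothetical helper (assumption: whitespace and "/" become \uNNNN escapes
// after the SyntaxCharacter set has been backslash-escaped).
function escapeExtended(str) {
  return String(str)
    .replace(/[\\^$.*+?()[\]{}|]/g, "\\$&")
    .replace(/[\s\/]/g, (ch) =>
      "\\u" + ch.charCodeAt(0).toString(16).padStart(4, "0")
    );
}

const raw = "a/b\nc d"; // contains a slash, a newline and a space
const source = escapeExtended(raw);

// The escaped text contains no raw "/" or line break, so it can be
// placed between the slashes of a regular expression literal.
const viaEval = eval("/" + source + "/");
const matchesOriginal = viaEval.test(raw);
```

Here `source` contains no raw `/`, space or newline, and `matchesOriginal` is `true`; the unescaped text would not even parse as a regular expression literal.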
Which do we want?
Thank You
RegExp.escape
By Benjamin Gruenbaum