RegExp.escape

Domenic Denicola & Benjamin Gruenbaum

https://github.com/benjamingr/RegExp.escape

Overview

  • Present `RegExp.escape`
  • Problem being solved.
  • TC Decisions. 

RegExp.escape

 

RegExp.escape takes a String and returns it "escaped" for RegExp.

 

Lets us build a regular expression out of a string without treating special characters from the string as special regular expression tokens. 

 

let needle = input.value; // "Hello.Friend"

let re = new RegExp(needle); // problem: `.` matches any character

let match = re.exec("HellofFriend"); // matches, although the text contains no "."
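A minimal sketch of how escaping addresses the problem above. `escapeSyntaxCharacters` is a hypothetical stand-in for the proposed `RegExp.escape` (it is not the proposal's specified behaviour), and it assumes the simplest escape set: a backslash in front of each regular expression syntax character.

```javascript
// Hypothetical stand-in for the proposed RegExp.escape (an assumption for
// illustration). Escapes only the syntax characters:
// ^ $ \ . * + ? ( ) [ ] { } |
function escapeSyntaxCharacters(str) {
  return String(str).replace(/[\^$\\.*+?()[\]{}|]/g, "\\$&");
}

const needle = "Hello.Friend";

// Built directly from the string: `.` is a wildcard.
const unescaped = new RegExp(needle);
// Built from the escaped string: `.` only matches a literal dot.
const escaped = new RegExp(escapeSyntaxCharacters(needle));

console.log(unescaped.test("HellofFriend")); // true: `.` matched "f"
console.log(escaped.test("HellofFriend"));   // false
console.log(escaped.test("Hello.Friend"));   // true
```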

RegExp.escape - examples

RegExp.escape("The Quick Brown Fox"); 
// "The Quick Brown Fox"
RegExp.escape("Buy it. use it. break it. fix it.");
// "Buy it\. use it\. break it\. fix it\."
RegExp.escape("(*.*)");
// "\(\*\.\*\)"
RegExp.escape("。^・ェ・^。") 
// "。\^・ェ・\^。"
RegExp.escape("😊 *_* +_+ ... 👍"); 
// "😊 \*_\* \+_\+ \.\.\. 👍"
RegExp.escape("\\d \\D (?:)"); // string value: \d \D (?:)
// "\\d \\D \(\?\:\)"

Treat a string literally when part of a RegExp.

https://esdiscuss.org/topic/regexp-escape

Use Cases

Elsewhere

  • Other languages: Native in PHP, Perl, Python, C#/VB.NET, Java, Ruby. (Research here).
  • Userland: lodash, and polyfill based on previous proposal. Both on board with the proposal.

The Proposal

Where we stand

  • TC needs to choose between escaped set alternatives.
  • Otherwise - everything ready for stage advancement. 
  • Extensive discussion on repo, experts invited and participated from other languages (PHP internals and Python), RegExp experts invited and so on.

Other People involved

  • Elad Kats and Uri Shaked - Cross language research.
  • ljharb - Jordan Harband - maintains polyfill, discussion.
  • Allen Wirfs-Brock - identifying cross cutting concerns and spec language help.
  • Denis Pushkarev - core-js polyfill and discussion.
  • Martijn Pieters - RegExp expert - proof reading and Python's new re perspective.
  • Nikita Popov and Bob Weinand - PHP internals - proof reading and discussion.
  • C. Scott Ananian - discussion and proof reading.
  • John-David Dalton, Ryan O'Hara, André Bargull, Joshua Appelman, 
    Mariusz Nowak - discussion of escaped set.
  • Mathias Bynens - proof reading and escaped set discussion.

TC Input

  • Choose between 3 alternative escape sets.
  • Progress proposal stage.

Escape Set Options

  • SyntaxCharacter Proposal
  • Safe With Extra Escape Set Proposal
  • Extended Safe

Full escape sets rationale explained here: 

https://github.com/benjamingr/RegExp.escape/blob/master/EscapedChars.md

Proposal rationales explained here:

https://github.com/benjamingr/RegExp.escape/issues/29

Proposals ruled out

  • re`with ${str}` - Template tag (ruled out on esdiscuss, in an issue, and as a separate proposal, as a worse primitive).
  • RegExp.fromString - Ruled out in this issue as less pragmatic, since it doesn't solve the primary use case well.

SyntaxCharacter proposal

  • Escapes a string just enough for it to be interpreted literally when used as a RegExp.
  • That is, `new RegExp(RegExp.escape(str))` matches `str`.
  • Creates readable output.

Safe With Extra Escape Set

  • Escapes everything the SyntaxCharacter proposal does.

  • Additionally escapes `-`, which is context sensitive: inside a character class it forms a range.

  • Escapes hexadecimal digits (0-9a-f) at the start of the string, so that they are not read as part of matching groups or lookahead/lookbehind control characters.

  • Less readable output but safer.
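The two extra cases can be sketched as follows. Both helpers are hypothetical, and the `\xHH` form used for a leading hex digit is an assumption made for illustration; the slides do not say which escaped form the proposal would emit.

```javascript
// Hypothetical: SyntaxCharacter-only escaping, for comparison.
const escapeSyntaxOnly = (s) => s.replace(/[\^$\\.*+?()[\]{}|]/g, "\\$&");

// Hypothetical sketch of the "safe with extra escape set" idea.
function escapeSafe(str) {
  return String(str)
    // Syntax characters plus `-`.
    .replace(/[\^$\\.*+?()[\]{}|-]/g, "\\$&")
    // A leading hex digit (0-9a-f) becomes \xHH (assumed form).
    .replace(/^[0-9a-f]/, (ch) => "\\x" + ch.charCodeAt(0).toString(16));
}

// 1. Inside a character class, an unescaped `-` forms a range.
const loose = new RegExp("[" + escapeSyntaxOnly("a-z") + "]"); // [a-z]
const strict = new RegExp("[" + escapeSafe("a-z") + "]");      // [\x61\-z]
console.log(loose.test("m"));  // true: "m" falls inside the range a-z
console.log(strict.test("m")); // false: only "a", "-" and "z" match

// 2. A leading digit can be absorbed into a preceding `\1` escape.
const prefix = "(a)\\1"; // pattern source: (a)\1
const looseRef = new RegExp(prefix + escapeSyntaxOnly("0")); // (a)\10
const strictRef = new RegExp(prefix + escapeSafe("0"));      // (a)\1\x30
console.log(looseRef.test("aa0"));  // false: `\10` is no longer `\1` then "0"
console.log(strictRef.test("aa0")); // true
```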

Extended "Safe" Proposal

  • Escapes everything the previous proposals escape. 
  • Also escapes whitespace and `/`s so `eval` can be used on a `RegExp.escape`d string.
  • Data indicates that `eval('/'+regexpStr+'/')` is not a pattern used in practice. (It does not appear in the top 1M websites or in Node code bases, and exists in fewer than 5 repos on GitHub.)
  • Safest proposal with least readable output.
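A rough sketch of what the extended set buys when a regular expression literal is assembled for `eval`. `escapeExtended` is hypothetical, and the `\uXXXX` form for whitespace and the `\xHH` form for a leading hex digit are assumptions made for illustration.

```javascript
// Hypothetical sketch of the "extended safe" idea.
function escapeExtended(str) {
  return String(str)
    // Syntax characters plus `-` and `/`.
    .replace(/[\^$\\.*+?()[\]{}|\/-]/g, "\\$&")
    // Whitespace (including line terminators) becomes \uXXXX (assumed form).
    .replace(/\s/g, (ch) => "\\u" + ch.charCodeAt(0).toString(16).padStart(4, "0"))
    // A leading hex digit becomes \xHH (assumed form).
    .replace(/^[0-9a-f]/, (ch) => "\\x" + ch.charCodeAt(0).toString(16));
}

const raw = "a/b\nc";

// Unescaped: the `/` ends the literal early, so evaluation fails.
let unescapedFailed = false;
try {
  eval("/" + raw + "/");
} catch (err) {
  unescapedFailed = err instanceof SyntaxError;
}
console.log(unescapedFailed); // true

// Escaped: the literal is well formed and matches the original string.
const re = eval("/" + escapeExtended(raw) + "/");
console.log(re.test(raw)); // true
```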

Which do we want?

Thank You
