Trojan Source Code

Can we trust open-source anymore?

(From a Python perspective)

What's wrong?

hello = "Hello world"
world = hello
print(world)
hello = "Hello world"
world = hell‮o
print(world)
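The second snippet differs from the first only by a character you cannot see. One quick way to expose it is to replace every non-ASCII character with its `\N{...}` name using the standard-library `unicodedata` module. The helper name `reveal` is hypothetical. The sketch also shows that CPython rejects this control character inside an identifier, so the second assignment fails with a `SyntaxError` instead of running.

```python
import unicodedata


def reveal(text: str) -> str:
    """Replace each non-ASCII character with a visible \\N{NAME} escape."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
        else:
            # Fall back to the numeric code point if the character has no name.
            name = unicodedata.name(ch, f"U+{ord(ch):04X}")
            out.append("\\N{" + name + "}")
    return "".join(out)


# The second assignment from the slide, with the hidden character escaped.
suspicious = "world = hell\u202eo"
print(reveal(suspicious))
# world = hell\N{RIGHT-TO-LEFT OVERRIDE}o

# CPython does not accept this control character inside an identifier.
try:
    compile(suspicious, "<slide>", "exec")
except SyntaxError as exc:
    print("SyntaxError:", exc.msg)
```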

Unicode is weird

...and it makes good lightning talk material...

import pandas as 🐼🐼
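The slide is a joke: CPython only accepts letter-like characters in identifiers (PEP 3131), and emoji are not among them, so this import would raise a `SyntaxError`. `str.isidentifier()` applies the same rule without running anything:

```python
# Emoji are not valid identifier characters, so `import pandas as 🐼🐼`
# cannot actually be used; isidentifier() checks the same rule.
print("pd".isidentifier())    # True
print("🐼🐼".isidentifier())  # False
```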

... but it's more than just fun

  • helps handle the world's writing systems
  • includes functional characters, not just visible ones
  • the first version dates back to 1991
  • the most popular encoding is UTF-8
  • many modern emojis were added from 2010 onward

👍🏻
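The thumbs-up above shows functional characters at work: it is two code points, a base emoji followed by a skin-tone modifier, which the renderer combines into a single glyph. A small standard-library sketch to inspect it, assuming the slide's emoji is the light-skin-tone variant:

```python
import unicodedata

thumbs_up = "\U0001F44D\U0001F3FB"  # 👍🏻 written as escapes

for ch in thumbs_up:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
# U+1F44D  THUMBS UP SIGN
# U+1F3FB  EMOJI MODIFIER FITZPATRICK TYPE-1-2

print(len(thumbs_up))                  # 2 code points
print(len(thumbs_up.encode("utf-8")))  # 8 bytes in UTF-8 (4 per code point)
```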

Bidirectional text

Some languages, such as Arabic and Hebrew, are written from right to left

 

So Unicode has to address this

It defines control characters that set the direction in which text flows
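The explicit directional formatting characters of the Unicode Bidirectional Algorithm are the embeddings, overrides, isolates and their terminators. A short sketch listing them with their names and bidi classes via `unicodedata`:

```python
import unicodedata

# Explicit directional formatting characters: embeddings (LRE, RLE),
# overrides (LRO, RLO), isolates (LRI, RLI, FSI) and terminators (PDF, PDI).
BIDI_CONTROLS = [
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",
    "\u2066", "\u2067", "\u2068", "\u2069",
]

for ch in BIDI_CONTROLS:
    # e.g. U+202E  RLO  RIGHT-TO-LEFT OVERRIDE
    print(f"U+{ord(ch):04X}  {unicodedata.bidirectional(ch):<3}  {unicodedata.name(ch)}")
```

All nine are invisible "format" characters (general category `Cf`), which is why they are easy to miss in a code review.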

Result?

 

Code can be interpreted or compiled differently from how it appears on screen

If we use them in code...

Why this creates a problem

Thus... attackers can create Trojan Source malware

Types of Trojan Source Code

  • Early Returns
  • Commenting-Out
  • Stretched Strings

Early Returns: force a function to terminate early

What it looks like

What it actually is
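A hedged sketch of an early-return trojan in Python. The trojan source is built as a string, with the control character written as an escape (U+2067 RIGHT-TO-LEFT ISOLATE) so this file stays readable, and then run with `exec()`. The RLI is meant to make `;return` render as if it were part of the docstring; however it renders, Python sees the docstring end and `return` execute, so the subtraction never runs. The names `bank` and `subtract_funds` are illustrative.

```python
# The RLI (\u2067) sits inside the docstring; the ";return" after the
# closing quotes is real code that ends the function immediately.
SOURCE = (
    "bank = {'alice': 100}\n"
    "\n"
    "def subtract_funds(account, amount):\n"
    "    ''' Subtract funds from bank account then \u2067''' ;return\n"
    "    bank[account] -= amount\n"
    "    return\n"
    "\n"
    "subtract_funds('alice', 50)\n"
)

namespace = {}
exec(SOURCE, namespace)
print(namespace["bank"])  # {'alice': 100}: the subtraction never ran
```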

Commenting-Out: code that appears to be executed is secretly commented out

What it looks like

What it actually is
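A similar sketch for commenting-out. Python has no block comments, so this hypothetical example hides the "live" assignment inside a triple-quoted string instead. The RLO and LRI/PDI controls aim to make `is_admin = True` render after the closing quotes; exact rendering depends on the editor, but the interpreter only ever sees a string literal.

```python
# \u202e = RIGHT-TO-LEFT OVERRIDE, \u2066 = LEFT-TO-RIGHT ISOLATE,
# \u2069 = POP DIRECTIONAL ISOLATE. The whole second line is one string.
SOURCE = (
    "is_admin = False\n"
    "'''\u202e \u2066is_admin = True\u2069 \u2066 admins only'''\n"
)

ns = {}
exec(SOURCE, ns)
print(ns["is_admin"])  # False: the "assignment" was never executed
```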

Stretched Strings: text that appears to be outside a string is actually part of it

What it looks like

What it actually is
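A sketch of a stretched string, under the same assumptions as the sketches above. The controls are intended to make the line read like `if access_level != 'user':` followed by a comment, but the "comment" is really inside the string literal. The comparison string is therefore never equal to `'user'`, and the admin branch always runs.

```python
# The "# Check if admin" text and the bidi controls are all inside the
# quoted string, so the comparison is against a much longer string.
SOURCE = (
    "access_level = 'user'\n"
    "if access_level != 'user\u202e \u2066# Check if admin\u2069 \u2066':\n"
    "    message = 'You are an admin.'\n"
    "else:\n"
    "    message = 'Access denied.'\n"
)

ns = {}
exec(SOURCE, ns)
print(ns["message"])  # You are an admin.
```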

Similar exploits

  • Invisible Character
  • Homoglyphs

What works in Python

Drum roll...

Early Return

As shown before

Commenting-Out

  • Confirmed working on Python 3.9.5 (macOS, built with Clang 12.0.0)
  • Confirmed working on Python 3.7.10 (Ubuntu, built with GCC 7.5.0)

Invisible Function

Does not work on Python 3.9.5 (macOS, built with Clang 12.0.0): raises a SyntaxError
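The invisible-character variant hides a zero-width character inside a name. Consistent with the result above, CPython rejects such characters in identifiers. The quick check below assumes U+200B ZERO WIDTH SPACE as the invisible character (the talk does not say which one was used), and the function name is hypothetical.

```python
# Two definitions that would look identical on screen; the second hides
# U+200B ZERO WIDTH SPACE inside the function name.
plain = "def is_admin(): return False"
hidden = "def is_\u200badmin(): return True"

for label, src in [("plain", plain), ("hidden", hidden)]:
    try:
        compile(src, "<demo>", "exec")
        print(label, "compiles")
    except SyntaxError as exc:
        print(label, "SyntaxError:", exc.msg)
```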

Homoglyph Function

  • Confirmed working on Python 3.9.5 (macOS, built with Clang 12.0.0)
  • Confirmed working on Python 3.7.10 (Ubuntu, built with GCC 7.5.0)
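Homoglyphs, by contrast, are ordinary letters, so Python accepts them. Identifiers are normalized to NFKC (PEP 3131), but that does not merge letters from different scripts. A sketch using CYRILLIC SMALL LETTER IE (U+0435) in place of the Latin "e"; the function names are illustrative.

```python
# Two functions whose names look the same; the second name uses the
# Cyrillic "е" (U+0435), so it is a different identifier.
SOURCE = (
    "def say_hello():\n"
    "    return 'genuine'\n"
    "\n"
    "def say_h\u0435llo():\n"
    "    return 'impostor'\n"
    "\n"
    "result = say_h\u0435llo()\n"
)

ns = {}
exec(SOURCE, ns)
print(ns["result"])                     # impostor
print("say_hello" == "say_h\u0435llo")  # False: different identifiers
```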

What should we do?

Ban the use of text directionality control characters?

Use a text editor with good syntax highlighting?

Scan for invisible characters and directionality control characters?
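Scanning is easy to sketch with the standard library. The hypothetical helper `find_suspicious` below flags every character in general category `Cf`, which covers all the bidi controls as well as zero-width spaces and joiners. Treating every `Cf` character as suspicious is an assumption: it also flags legitimate uses such as a byte-order mark or joiners in emoji sequences, so hits are things to review, not proof of an attack.

```python
import unicodedata


def find_suspicious(source: str):
    """Return (line, column, code point, name) for each format (Cf) character."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            # Bidi controls, zero-width spaces/joiners and the BOM are all Cf.
            if unicodedata.category(ch) == "Cf":
                hits.append((lineno, col, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNNAMED")))
    return hits


# Usage: the stretched-string trojan from earlier, as plain text.
example = (
    "access_level = 'user'\n"
    "if access_level != 'user\u202e \u2066# Check if admin\u2069 \u2066':\n"
)
for hit in find_suspicious(example):
    print(hit)
```

To scan real files, read each one with `open(path, encoding="utf-8")` and pass its text to the helper.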

Look before running the code!!!

Use a good linter, e.g. Pylint

We ❤️

Unicode and emojis

But we should also be careful with code from unknown or untrusted sources

Before we go...

Trojan Source

By Cheuk Ting Ho
