Trojan Source Code
Can we trust open-source anymore?
Cheuk Ting Ho
(From a Python perspective)
What's wrong?
hello = "Hello world"
world = hello
print(world)
hello = "Hello world"
world = hello
print(world)
Unicode is weird
...and it provides good lightning talk material...
import pandas as 🐼🐼
... but it's more than just fun
- helps handle the world's writing systems
- uses functional characters
- first version dates back to 1991
- the most popular encoding is UTF-8
- modern emojis mainly added in 2014
👍🏻
Bidirectional text
Some languages are not written from left to right
So Unicode has to address this
Characters to format the direction of the flow of text
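A minimal sketch of which characters these are, assuming the nine explicit directional formatting characters from the Unicode Bidirectional Algorithm are the ones of interest. The names `BIDI_CONTROLS` and `describe` are made up for this illustration; the characters are written as escapes so nothing invisible hides in the snippet itself.

```python
import unicodedata

# Explicit directional formatting characters (embeddings, overrides, isolates),
# written as escapes so that this listing contains no invisible characters.
BIDI_CONTROLS = "\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069"

def describe(chars):
    """Return (code point, official Unicode name) pairs for each character."""
    return [(f"U+{ord(c):04X}", unicodedata.name(c)) for c in chars]

for code_point, name in describe(BIDI_CONTROLS):
    print(code_point, name)
```

None of these characters has a visible glyph, which is why an editor can show reordered text without any hint that something was inserted.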
Result?
Code can be interpreted or compiled differently from how it appears on screen
If we use them in code...
Why it creates a problem
Thus... they can be used to create Trojan Source malware
Types of Trojan Source Code
- Early Returns
- Commenting-Out
- Stretched Strings
Early Returns:
force a function to terminate early
What it looks like
What it actually is
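A rough sketch of an early return in Python, modeled on the publicly documented Trojan Source technique. The names `bank` and `subtract_funds` are hypothetical, and the source is built from escapes and run with `exec` only so that the hidden character stays visible in this listing. In an editor that applies bidirectional rendering, the `;return` can appear to be part of the docstring.

```python
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE, written as an escape on purpose

# Hypothetical source file content. The docstring really ends before ";return",
# so the function returns before the balance is ever touched.
source = (
    "bank = {'alice': 100}\n"
    "def subtract_funds(account, amount):\n"
    "    ''' Subtract funds from bank account then " + RLI + "''' ;return\n"
    "    bank[account] -= amount\n"
    "    return\n"
    "subtract_funds('alice', 50)\n"
)

namespace = {}
exec(source, namespace)
print(namespace["bank"]["alice"])  # 100: the subtraction never ran
```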
Commenting-Out: code that appears to be executed is secretly commented out
What it looks like
What it actually is
Stretched Strings: text that appears to be outside the string is actually part of it
What it looks like
What it actually is
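A small sketch of what a stretched string holds once it is loaded, assuming a hypothetical access-level value whose literal was tampered with. `stretched` and `visible_escapes` are names invented for this example; the control characters are written as escapes.

```python
RLO, LRI, PDI = "\u202e", "\u2066", "\u2069"

# Hypothetical value: what looks like a trailing comment in an editor
# is actually stored inside the string.
stretched = "user" + RLO + LRI + " # check if admin" + PDI + LRI

def visible_escapes(text):
    """Show every non-ASCII character as a backslash escape."""
    return text.encode("unicode_escape").decode("ascii")

print(stretched == "user")         # False, so a comparison with "user" fails
print(visible_escapes(stretched))  # the hidden characters become visible
```

Printing the escaped form is one simple way to look at what a string actually contains, rather than what the terminal or editor chooses to display.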
Similar exploits
- Invisible Characters
- Homoglyphs
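To illustrate homoglyphs, here is a minimal sketch assuming a Cyrillic letter stands in for a Latin one. The helper `non_ascii_chars` is hypothetical, not part of any library.

```python
import unicodedata

latin = "hello"
lookalike = "h\u0435llo"  # second letter is Cyrillic, not Latin "e"

def non_ascii_chars(identifier):
    """List (index, code point, Unicode name) for each non-ASCII character."""
    return [
        (i, f"U+{ord(c):04X}", unicodedata.name(c, "UNKNOWN"))
        for i, c in enumerate(identifier)
        if ord(c) > 127
    ]

print(latin == lookalike)          # False: two different names
print(non_ascii_chars(lookalike))
```

Both spellings are valid Python identifiers, so two functions or variables can look the same on screen while being distinct objects.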
What works in Python
Drum roll...
Early Return
As shown before
Commenting-Out
- Confirmed working on Python 3.9.5 (on macOS via clang 12.0.0)
- Confirmed working on Python 3.7.10 (on Ubuntu via GCC 7.5.0)
Invisible Function
Does not work on Python 3.9.5 (on macOS via clang 12.0.0): raises a SyntaxError
Homoglyph Function
- Confirmed working on Python 3.9.5 (on macOS via clang 12.0.0)
- Confirmed working on Python 3.7.10 (on Ubuntu via GCC 7.5.0)
What should we do?
Ban the use of text directionality control characters?
Use a text editor with good syntax highlighting?
Scan for invisible characters and directionality control characters?
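One possible way to scan, sketched under the assumption that flagging every Unicode "format" character (general category `Cf`) is an acceptable first pass. The function name `find_suspicious` is hypothetical. Some `Cf` characters are legitimate, for example the zero-width joiner inside emoji sequences, so the findings need a human look.

```python
import unicodedata

def find_suspicious(source):
    """Return (line, column, code point, name) for each format character.

    Category "Cf" covers the directionality controls as well as
    zero-width characters such as ZERO WIDTH SPACE.
    """
    findings = []
    for lineno, line in enumerate(source.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if unicodedata.category(ch) == "Cf":
                name = unicodedata.name(ch, "UNNAMED")
                findings.append((lineno, col, f"U+{ord(ch):04X}", name))
    return findings

sample = "a = 1\nb = 'x\u202ey'\n"  # made-up input with one hidden override
for finding in find_suspicious(sample):
    print(finding)
```

To check a real file, read it with `open(path, encoding="utf-8")` and pass the text to the function.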
Look before running the code!!!
Good linter - e.g. Pylint
We ❤️
Unicode and emojis
But we will also be careful with code from unidentifiable sources
Dive deeper at https://trojansource.codes/