SMoTherSpectre

Exploiting Speculative Execution through Port Contention

The 2018 Shock

Meltdown Leaks Kernel Memory

Spectre leaks arbitrary memory past bounds checks and even out of sandboxes

Detour to a quick crash course on Meltdown and Spectre

Genesis

There are two key ingredients in Meltdown & Spectre
- Source of the attack (out-of-order and speculative execution)
- Medium of the leak (predominantly caches)
Many works focus on sealing the medium of leakage (caches, BPU, etc.)
But how do we know there is only a finite number of mediums? And does sealing one medium kill the attack?
This work shows that there is yet another medium of leakage: execution port contention
(What does it tell us?) - It might be more worthwhile to go after the source than after the medium

Another disadvantage with Cache side channels

Consider Spectre V2 - Branch Target Injection (BTI)
Pollute the BTB - the victim speculatively branches to gadget code and executes it (gadget code with loads from critical memory)
But finding gadget code that brings the data into the cache is very hard, except in a few cases such as eBPF (extended Berkeley Packet Filter)
Hence, costly solutions like Retpoline (indirect branches don't speculate) are not applied widely
However, the current work claims to make V2 more practical, since we are not limited to load gadgets alone!
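The BTB pollution step can be illustrated with a toy model. This is only a sketch under strong assumptions: the `ToyBTB` class, its direct-mapped layout, and the 8-bit index are hypothetical, and real branch predictors use undocumented and far more complex indexing. It shows only the aliasing idea: an attacker branch that maps to the same entry as the victim's indirect branch decides where the victim speculates.

```python
class ToyBTB:
    """Hypothetical direct-mapped branch target buffer.

    Entries are indexed by the low bits of the branch address and carry
    no tag, so two branches with equal low bits share one entry.
    """

    def __init__(self, index_bits=8):
        self.mask = (1 << index_bits) - 1
        self.entries = {}

    def update(self, branch_addr, target):
        # Record the most recent target seen for this index.
        self.entries[branch_addr & self.mask] = target

    def predict(self, branch_addr):
        # Return the predicted target, or None if the entry is empty.
        return self.entries.get(branch_addr & self.mask)


btb = ToyBTB()
GADGET = 0x401000               # hypothetical gadget address
attacker_branch = 0x7F0000001234
victim_branch = 0x400000001234  # same low 8 bits as the attacker's branch

btb.update(attacker_branch, GADGET)          # attacker trains the entry
speculated = btb.predict(victim_branch)      # victim is steered to GADGET
```

In this model the victim's prediction is `GADGET` even though the victim never jumped there itself.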

Idea

SMoTher

The attack is based on the following observation
- Two SMT threads on the same core share execution units
- Instructions that are scheduled to execute on the same port will contend

SMoTher Differentiability

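Let one SMT thread (the victim) run one of a set of instruction sequences \( V = \{V_0, V_1, V_2, \ldots\} \)
The attacker runs an instruction sequence \( A \) on the sibling SMT thread
If the attacker can infer which \( V_i \in V \) ran by timing \( A \), the sequences in \( V \) are said to be SMoTher differentiable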
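The definition can be made concrete with a toy contention model. This is a sketch under simplifying assumptions: each port retires one instruction per cycle, both threads issue equal-length runs of a single instruction, and the port numbers in `PORT_OF` are the ones quoted in the slide examples rather than measured values. The function names are hypothetical.

```python
# Port assignments as quoted in the slide examples; illustrative only.
PORT_OF = {"popcnt": 1, "ror": 6, "bts": 6, "cmovz": 6}


def attacker_cycles(attacker_insn, victim_insn, n=1000):
    """Cycles the attacker needs for n copies of attacker_insn while the
    sibling thread issues n copies of victim_insn (toy model)."""
    if PORT_OF[attacker_insn] == PORT_OF[victim_insn]:
        return 2 * n  # both threads take turns on the same port
    return n          # no shared port, no slowdown


def infer_victim(attacker_insn, candidates, observed_cycles, n=1000):
    """Return the candidate victim instructions consistent with the timing."""
    return [v for v in candidates
            if attacker_cycles(attacker_insn, v, n) == observed_cycles]


# The attacker contends on port 1 with popcnt and observes no slowdown,
# so under this model the victim must have run ror.
guess = infer_victim("popcnt", ["popcnt", "ror"], observed_cycles=1000)
```

If `infer_victim` always narrows the candidates to a single sequence, the set is SMoTher differentiable in the sense defined above.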

Victim: Either popcnt (port 1) or ror (port 6)

Attacker: Contending and timing port 1

Victim: Either cmovz (port 6) or popcnt (port 1)

Attacker: Contending and timing port 6 using bts
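In practice the attacker's timings are noisy, so a decision rule is needed to turn samples into a guess. The sketch below assumes the attacker has calibration samples for the contended and uncontended cases and uses a simple midpoint threshold on the mean. The function names and the cycle counts are hypothetical, and a real attack would likely need many samples and a more careful statistical test.

```python
from statistics import mean


def calibrate_threshold(contended, uncontended):
    """Midpoint between the mean contended and mean uncontended timings."""
    return (mean(contended) + mean(uncontended)) / 2


def classify(samples, threshold):
    """Label a batch of attacker timings by comparing its mean to the threshold."""
    return "contended" if mean(samples) > threshold else "uncontended"


# Hypothetical cycle counts, for illustration only.
threshold = calibrate_threshold(contended=[200, 210, 190],
                                uncontended=[100, 110, 90])
label = classify([180, 170, 195], threshold)
```

With these made-up numbers the threshold is 150.0 and the batch is labelled `contended`.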

Pitfalls

Synchronization of attacker and victim is crucial. If the code sequence is short, this is hard to achieve
Pipeline bottlenecks other than port contention can overshadow the side channel (e.g. RAW hazards)
The CPU may eliminate the execution of some instructions (zero idioms). This removes contention
Some instructions (e.g. SSE and AVX) are subject to aggressive power-saving features on modern CPUs. This makes measuring very difficult

SMoTherSpectre

Leverages BTI to poison the BPU so that an indirect jump in the victim SMT thread speculatively lands on SMoTher differentiable code
Unlike cache side channels, no lasting change in micro-architectural state is needed here; contention is observed while it happens
SMoTher differentiable gadgets are reportedly easy to find, unlike cache load gadgets
libcrypto in OpenSSL has 12,000 such gadgets readily available

Key Assumptions

Gadgets

Two types of gadgets are required to make SMoTherSpectre successful: a BTI gadget and a SMoTher gadget

BTI Gadgets

Pass a secret through a register to arbitrary code in the same process
Ideal BTI targets are C++ virtual function calls, which go through a vtable, and calls to dynamically linked functions in ELF binaries, which go through the GOT
If we evict the vtable and GOT entries from the cache, the speculation window is around 200 cycles, enough to reliably mount the attack

SMoTher Gadgets

The gadget is either part of the victim or an additional attack vector introduced by the attacker
The gadget consists of:
- An instruction comparing the secret register to a known value
- Followed by a conditional control-flow transfer depending on the comparison
- The instruction sequences on the two branch paths should have distinct port fingerprints
  - Port fingerprinting is the task of ranking two instruction sequences based on their port utilization
- The instructions should not make other memory references, since cache effects would inject noise
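The gadget criteria listed above can be sketched as a simple check over already-disassembled code. Everything here is an assumption for illustration: instructions are `(mnemonic, operands)` tuples in Intel syntax, a `[` in the operands marks a memory reference, the `PORTS` table reuses the port numbers from the slide examples, and `is_smother_gadget` is a hypothetical helper, not the authors' gadget finder.

```python
from collections import Counter

# Port usage as quoted in the slide examples; illustrative, not measured.
PORTS = {"popcnt": (1,), "ror": (6,), "bts": (6,), "cmovz": (6,)}
COMPARES = {"cmp", "test"}
COND_JUMPS = {"je", "jne", "jz", "jnz", "ja", "jb", "jg", "jl"}


def fingerprint(seq):
    """Count how many instructions in seq are assumed to use each port."""
    counts = Counter()
    for mnemonic, _operands in seq:
        for port in PORTS.get(mnemonic, ()):
            counts[port] += 1
    return counts


def touches_memory(seq):
    """Assume Intel syntax, where memory operands are written in brackets."""
    return any("[" in operands for _mnemonic, operands in seq)


def is_smother_gadget(compare, jump, taken, fallthrough, secret_reg):
    """Check the listed criteria for one compare-and-branch candidate."""
    cmp_mnemonic, cmp_operands = compare
    return (cmp_mnemonic in COMPARES
            and secret_reg in cmp_operands       # compares the secret register
            and jump[0] in COND_JUMPS            # conditional control transfer
            and not touches_memory(taken)        # no cache noise on either path
            and not touches_memory(fallthrough)
            and fingerprint(taken) != fingerprint(fallthrough))


candidate_ok = is_smother_gadget(
    compare=("cmp", "rdi, 0x80"),
    jump=("je", "0x401020"),
    taken=[("popcnt", "rax, rbx"), ("popcnt", "rcx, rdx")],
    fallthrough=[("ror", "rax, 1"), ("ror", "rcx, 1")],
    secret_reg="rdi",
)
```

The check compares exact per-port counts, which is stricter than needed in one direction and too lenient in another: two paths with slightly different counts may still be indistinguishable in practice.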

Real World OpenSSH attack

This calls the bzero function through a volatile pointer

Chosen BTI gadget

This instruction checks whether the 8th bit is set at the memory location pointed to by rdi (the secret)

This code times the addl and checks for port contention with sub

Secret

Mitigations

Disable SMT entirely (15% overhead on Intel machines)
The OS can employ a side-channel-aware scheduling strategy, i.e. colocate only threads from the same user on the SMT threads of a core
Explore coarse-grained or interleaved multithreading as viable alternatives to SMT
Find a solution for BTI (such as Retpoline)
- But it should have very low overhead (lower than disabling SMT, obviously)
- However, SMoTherSpectre can also employ other Spectre variants (e.g. RSB overflow) to mount the attack
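The same-user colocation strategy listed above can be sketched as a placement policy. This is a minimal illustration, assuming threads are given as hypothetical `(thread_id, user)` pairs and each core has `smt_width` hardware threads; it is not how any particular OS scheduler implements the idea.

```python
from collections import defaultdict


def assign_cores(threads, smt_width=2):
    """Group threads onto cores so that no core mixes users.

    threads: iterable of (thread_id, user) pairs.
    Returns a list of cores, each a list of thread ids from a single user.
    """
    by_user = defaultdict(list)
    for thread_id, user in threads:
        by_user[user].append(thread_id)

    cores = []
    for thread_ids in by_user.values():
        # Fill one core at a time with threads of the same user; a core
        # may stay partly idle rather than host a second user.
        for i in range(0, len(thread_ids), smt_width):
            cores.append(thread_ids[i:i + smt_width])
    return cores


placement = assign_cores([(1, "alice"), (2, "bob"), (3, "alice")])
```

Here `placement` is `[[1, 3], [2]]`: bob's thread gets a core to itself, trading some utilization for isolation.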