ENPM809V

Sandboxing

How did it all start?

Before operating systems, all software ran on bare metal
Operating systems came along
- Separated "User" and "Kernel" space
  - Kernel code and process code run at different privilege levels
  - User space is "sandboxed" from accessing the hardware
  - Significantly safer

Further Separation

Virtual Memory
- A hardware-backed mechanism that gives each process its own memory space
In-process separation
- The separation between the interpreter and the interpreted code

Then Browser Hacking...

Vulnerabilities in the browser could wreak havoc on a victim's machine
- 2000s: Adobe Flash, ActiveX, Java applets
- 2010s: JavaScript engine, media codec, and imaging library vulnerabilities

Caused the rise of sandboxing!

Anything untrusted should live in a process with zero privileges
- JavaScript, PNGs, PDFs, etc.
How does this work?
- Browser creates "privileged" parent process
- Parent creates "sandboxed" child process for untrusted code
- When a child process needs to perform a "privileged" task, it asks the parent to do it.
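
The parent/child split above can be sketched in a few lines. This is only an illustration under stated assumptions: the names `broker_handle` and `brokered_read` and the "allowed directory" policy are hypothetical, and the child here is not actually stripped of privileges (a real browser would drop them right after the fork).

```python
import os
import socket
import tempfile


def broker_handle(request: bytes, allowed_dir: str) -> bytes:
    """Privileged side: validate the request before acting on it."""
    root = os.path.realpath(allowed_dir)
    path = os.path.realpath(request.decode())
    if os.path.commonpath([root, path]) != root:
        return b"DENIED"                      # outside the allowed directory
    try:
        with open(path, "rb") as f:
            return b"OK " + f.read(4096)
    except OSError:
        return b"DENIED"


def brokered_read(path: str, allowed_dir: str) -> bytes:
    """Fork a child that asks the parent to read `path`; return what the child saw."""
    parent_sock, child_sock = socket.socketpair()
    pid = os.fork()
    if pid == 0:
        # Child: a real sandbox would drop privileges here (hypothetical step).
        try:
            parent_sock.close()
            child_sock.sendall(path.encode())     # ask the parent to do the work
            reply = child_sock.recv(8192)
            child_sock.sendall(reply)             # echo back so the demo can show it
        finally:
            os._exit(0)
    child_sock.close()
    request = parent_sock.recv(4096)
    parent_sock.sendall(broker_handle(request, allowed_dir))
    seen_by_child = parent_sock.recv(8192)
    os.waitpid(pid, 0)
    parent_sock.close()
    return seen_by_child


def demo():
    allowed = tempfile.mkdtemp()
    page = os.path.join(allowed, "page.txt")
    with open(page, "wb") as f:
        f.write(b"hello")
    return brokered_read(page, allowed), brokered_read("/etc/passwd", allowed)
```

The point of the design is that the policy check lives in the parent, so a compromised child can only ask for what the broker is willing to grant.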

Chroot

One of the First Sandboxing Utilities

First appeared in Unix in 1979
Changes the meaning of the root directory for a process
Formerly the de facto sandboxing utility
Example:
- chroot("/tmp/jail") is meant to keep a process's path lookups from reaching anything outside /tmp/jail

It does not filter syscalls or do any other isolation

chroot("/tmp/jail") does

Changes "/" to "/tmp/jail" for the process
Path lookups cannot go any higher than "/tmp/jail"
- From the jail root, "cd .." leaves the process in "/tmp/jail"

chroot("/tmp/jail") does not

Close resources that reside outside of the jail
Change the current working directory into the jail
Do anything else!
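
Because chroot leaves the working directory alone, a careful caller moves into the jail itself. A minimal sketch, assuming a Linux host; `enter_jail` and `jailed_listing` are hypothetical helper names, the work happens in a forked child so the calling process is never jailed, and `None` is returned when the process lacks the privilege to chroot.

```python
import os


def enter_jail(jail: str) -> None:
    os.chdir(jail)    # step 1: put the working directory inside the jail
    os.chroot(".")    # step 2: make that directory the new root
    os.chdir("/")     # step 3: normalise cwd to the new root


def jailed_listing(jail: str):
    """List "/" as seen from inside the jail, or None if chroot is not permitted."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        payload = b"ERR"
        try:
            enter_jail(jail)
            payload = ("OK\n" + "\n".join(sorted(os.listdir("/")))).encode()
        except OSError:
            pass                      # typically EPERM when not root
        finally:
            os.write(w, payload)
            os._exit(0)
    os.close(w)
    with os.fdopen(r, "rb") as f:
        data = f.read().decode()
    os.waitpid(pid, 0)
    if not data.startswith("OK"):
        return None
    return [name for name in data.split("\n")[1:] if name]
```

Directory descriptors opened before the call would still need to be closed separately; this sketch only handles the working directory.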

Chroot Pitfalls

Previously Open Resources

Can be abused if not closed or invalidated
How can they be abused?
- int openat(int dirfd, const char *pathname, int flags, ...)
- int execveat(int dirfd, const char *pathname, char *const argv[], char *const envp[], int flags)
dirfd can be a file descriptor of a previously opened directory or the special value AT_FDCWD (the current working directory).
- Remember: chroot does not change the current working directory!
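
To make the dirfd mechanism concrete, here is a small sketch that needs no privileges. It only shows that a lookup relative to a directory descriptor ignores where the process currently "is"; the assumption is that the same holds for a descriptor opened before a chroot, which is what makes leftover descriptors dangerous. Function and file names are illustrative.

```python
import os
import tempfile


def read_relative_to(dirfd: int, name: str) -> bytes:
    # os.open(..., dir_fd=...) is Python's wrapper around openat(dirfd, name, flags)
    fd = os.open(name, os.O_RDONLY, dir_fd=dirfd)
    try:
        return os.read(fd, 4096)
    finally:
        os.close(fd)


def demo() -> bytes:
    outside = tempfile.mkdtemp()                  # stands in for a directory outside the jail
    with open(os.path.join(outside, "secret.txt"), "wb") as f:
        f.write(b"outside the jail")
    dirfd = os.open(outside, os.O_RDONLY | os.O_DIRECTORY)   # opened "before" confinement
    elsewhere = tempfile.mkdtemp()                # stands in for the jail
    old_cwd = os.getcwd()
    try:
        os.chdir(elsewhere)
        # The path is resolved from dirfd, not from the current directory.
        return read_relative_to(dirfd, "secret.txt")
    finally:
        os.chdir(old_cwd)
        os.close(dirfd)
```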

User Error

What happens if you chroot again?
- The kernel has no memory of previous chroots for a process!
Giving root permissions to a process inside a chroot
- A root process inside the jail can simply call chroot() again
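
The "chroot again" problem can be sketched as follows. This is a hedged illustration of the well-known technique, not a recipe tied to any particular target: it assumes Linux, needs root (CAP_SYS_CHROOT), runs entirely in a forked child, and returns `None` when the privilege is missing. The helper name and the `escape` directory are hypothetical.

```python
import os


def escape_by_second_chroot(jail: str):
    """True if a root process jailed in `jail` gets back out, None if chroot is not permitted."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        verdict = b"E"
        try:
            os.chdir(jail)
            os.chroot(".")
            os.chdir("/")                         # properly jailed at this point
            os.makedirs("escape", exist_ok=True)
            os.chroot("escape")                   # new root is BELOW cwd; cwd is not moved
            for _ in range(64):
                os.chdir("..")                    # cwd is outside the root, so ".." is not clamped
            os.chroot(".")                        # adopt the real root again
            verdict = b"Y" if os.path.isdir(jail) else b"N"
        except OSError:
            pass                                  # typically EPERM when not root
        finally:
            os.write(w, verdict)
            os._exit(0)
    os.close(w)
    with os.fdopen(r, "rb") as f:
        verdict = f.read()
    os.waitpid(pid, 0)
    return {b"Y": True, b"N": False}.get(verdict)
```

The jail's absolute path does not exist inside the jail, so seeing it again after the second chroot is the evidence that the process is back at the original root.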

Missing Other Forms of Isolation

PID
Network
IPC

Replacements

cgroups: a Linux kernel feature that limits, accounts for, and isolates resource usage
- Including memory, CPU, network and IO
namespaces: a Linux kernel feature that partitions kernel resources such that only certain processes can see them
seccomp: system call filtering, covered later in this lecture

Namespaces

What are they?

A feature of the Linux kernel that partitions resources so that a process sees only one set of them.
- Processes in a different namespace see a different set of resources
Used by Docker, Linux Containers (LXC), and cloud systems under the hood.

How do they work?

Isolate by a number of factors
- cgroup - virtualizes a process's view of the cgroup hierarchy (cgroups, mentioned earlier, can also be used standalone)
- IPC - isolate which System V IPC objects and POSIX message queues processes can use
- Network - isolate network devices, stacks, ports
- Mount - isolate the set of mount points
- PID - isolate the process ID number space, meaning different PID namespaces can each have a process with the same PID
- Time - isolate the boot and monotonic clocks
- User - isolate user IDs and group IDs
- UTS - isolate the hostname and the NIS domain name

How do they work?

Isolate by a number of factors (each has its own man page)
- cgroup - man cgroup_namespaces
- IPC - man ipc_namespaces
- Network - man network_namespaces
- Mount - man mount_namespaces
- PID - man pid_namespaces
- Time - man time_namespaces
- User - man user_namespaces
- UTS - man uts_namespaces

Namespace System Calls

clone() - creates a new process
- Has a flags argument; each CLONE_NEW* flag that is set creates a new namespace of that type for the child
setns() - allows the calling process to join an existing namespace
unshare() - moves the calling process to a new namespace
- One new namespace is created for each CLONE_NEW* flag that is set
ioctl() - can be used to discover information about namespaces (see man ioctl_ns)

*Note: some of these system calls are used for more than just namespaces*
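
As a rough sketch of unshare() in action, the snippet below calls it through ctypes in a forked child and checks whether the child's UTS namespace identifier changed. Assumptions: a Linux host with /proc mounted and unprivileged user namespaces enabled (otherwise the function returns `None`); the flag values are the ones from the kernel headers; the function name is hypothetical.

```python
import ctypes
import os

CLONE_NEWUTS = 0x04000000    # new hostname/domain name namespace
CLONE_NEWUSER = 0x10000000   # new user namespace (lets an unprivileged process proceed)


def uts_changes_after_unshare():
    """True if unshare() moved a child into a new UTS namespace, None if not permitted."""
    libc = ctypes.CDLL(None, use_errno=True)
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        verdict = b"E"
        try:
            before = os.readlink("/proc/self/ns/uts")
            if libc.unshare(CLONE_NEWUSER | CLONE_NEWUTS) == 0:
                after = os.readlink("/proc/self/ns/uts")
                verdict = b"Y" if after != before else b"N"
        except OSError:
            pass
        finally:
            os.write(w, verdict)
            os._exit(0)
    os.close(w)
    with os.fdopen(r, "rb") as f:
        verdict = f.read()
    os.waitpid(pid, 0)
    return {b"Y": True, b"N": False}.get(verdict)
```

The parent's namespaces are untouched, since only the child called unshare().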

/proc/<pid>/ns directory

Each process has a /proc/[pid]/ns/ subdirectory containing one entry for each namespace that supports being manipulated by setns(2):

$ ls -l /proc/$$/ns
total 0
lrwxrwxrwx. 1 mtk mtk 0 Apr 28 12:46 cgroup -> cgroup:[4026531835]
lrwxrwxrwx. 1 mtk mtk 0 Apr 28 12:46 ipc -> ipc:[4026531839]
lrwxrwxrwx. 1 mtk mtk 0 Apr 28 12:46 mnt -> mnt:[4026531840]
lrwxrwxrwx. 1 mtk mtk 0 Apr 28 12:46 net -> net:[4026531969]
lrwxrwxrwx. 1 mtk mtk 0 Apr 28 12:46 pid -> pid:[4026531836]
lrwxrwxrwx. 1 mtk mtk 0 Apr 28 12:46 pid_for_children -> pid:[4026531834]
lrwxrwxrwx. 1 mtk mtk 0 Apr 28 12:46 user -> user:[4026531837]
lrwxrwxrwx. 1 mtk mtk 0 Apr 28 12:46 uts -> uts:[4026531838]

from man namespaces
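
The same information can be read programmatically. A minimal sketch, assuming Linux with /proc mounted; `namespace_ids` and `share_namespace` are hypothetical helper names. Two processes are in the same namespace of a given type when their links carry the same identifier.

```python
import os


def namespace_ids(pid="self"):
    """Map each entry in /proc/<pid>/ns to its link target, e.g. 'uts' -> 'uts:[4026531838]'."""
    ns_dir = f"/proc/{pid}/ns"
    if not os.path.isdir(ns_dir):
        return {}
    ids = {}
    for name in sorted(os.listdir(ns_dir)):
        try:
            ids[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue                  # e.g. no permission to inspect another user's process
    return ids


def share_namespace(pid_a, pid_b, kind: str) -> bool:
    """True if both processes report the same identifier for namespace type `kind`."""
    a = namespace_ids(pid_a).get(kind)
    return a is not None and a == namespace_ids(pid_b).get(kind)


if __name__ == "__main__":
    for name, target in namespace_ids().items():
        print(f"{name:20s} {target}")
```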

/proc/<pid>/ns directory

These files were visible as hard links in Linux 3.7 and earlier
They appear as symbolic links since Linux 3.8
- Entries were added over several releases (for example, cgroup in Linux 4.6 and pid_for_children in Linux 4.12)

Global/User Namespace Config

The /proc/sys/user directory (Linux 4.9 and up) exposes limits on the number of namespaces of each type that can be created
- It includes files like max_ipc_namespaces or max_cgroup_namespaces.
The values are modifiable by privileged processes (root processes)
Limits are per-user and apply to all users (even the root user)
- They apply in addition to the per-namespace limits
Upon hitting a limit, the relevant system call fails with the error ENOSPC
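
These limits are plain text files, so they are easy to inspect. A small sketch, assuming a Linux host; the function name is hypothetical and the result is simply empty on systems that lack the directory.

```python
import glob
import os


def namespace_limits(base="/proc/sys/user"):
    """Return {file name: limit} for every max_*_namespaces file under `base`."""
    limits = {}
    for path in sorted(glob.glob(os.path.join(base, "max_*_namespaces"))):
        try:
            with open(path) as f:
                limits[os.path.basename(path)] = int(f.read().strip())
        except (OSError, ValueError):
            continue                  # unreadable or unexpected content: skip it
    return limits


if __name__ == "__main__":
    for name, value in namespace_limits().items():
        print(f"{name}: {value}")
```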

from man namespaces

Namespace Lifecycle

Barring any other factors, namespaces are automatically torn down when no more processes exist in that namespace
- Can be from termination or from leaving the namespace
Factors that prevent this
- An open file descriptor or a bind mount exists for one of the files in /proc/<pid>/ns
- The namespace is hierarchical (PID or user) and has a child namespace in use
- A bunch of others - see the man pages for more details

from man namespaces

Resources

https://man7.org/linux/man-pages/man7/namespaces.7.html

Seccomp

What is it?

A security feature of the Linux kernel that allows a process to make a one-way transition into a restricted state in which it can execute only a limited set of syscalls.
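
The one-way transition is easiest to see with seccomp's original strict mode, which permits only read(), write(), _exit() and sigreturn(). The sketch below is an illustration under assumptions: Linux, libc reachable through ctypes, and a hypothetical function name. A forked child enters strict mode and then makes any other system call, and the parent observes that the kernel killed it. `None` is returned where seccomp cannot be enabled.

```python
import ctypes
import os
import signal

PR_SET_SECCOMP = 22          # prctl option, from <linux/prctl.h>
SECCOMP_MODE_STRICT = 1      # from <linux/seccomp.h>


def strict_mode_kills_child():
    """Return the signal that killed the child (expected: SIGKILL), or None."""
    libc = ctypes.CDLL(None, use_errno=True)
    if not hasattr(libc, "prctl"):
        return None
    libc.prctl.argtypes = [ctypes.c_int] + [ctypes.c_ulong] * 4
    libc.prctl.restype = ctypes.c_int
    pid = os.fork()
    if pid == 0:
        try:
            if libc.prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT, 0, 0, 0) == 0:
                os.getpid()          # not on the strict-mode list: the kernel sends SIGKILL
        finally:
            os._exit(0)              # only reached normally if seccomp was not enabled
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return os.WTERMSIG(status)
    return None


if __name__ == "__main__":
    print(strict_mode_kills_child() == signal.SIGKILL)
```

There is no call that takes the child back out of strict mode, which is what "one-way" means here.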

What can it do?

Allow/disallow system calls
Filter allowed/disallowed system calls based on arguments
Child processes inherit the same rules.

How does it work?

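Create a filter as a BPF program
- seccomp filters use classic BPF (cBPF), the predecessor of eBPF
- BPF programs run in a small virtual machine in the kernel; eBPF is also used for tracing
- Will get into this more later in the course
Enable seccomp by applying the filter
Also has its own API (the seccomp() system call, or prctl() with PR_SET_SECCOMP)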

Can we escape seccomp?

If everything is done correctly, no
- Generally, user error allows us to escape
  - The syscall filter is too permissive
  - Syscall confusion
- Otherwise, a kernel vulnerability must be present
  - Many (tens of) Chrome sandbox escapes have worked this way
  - The kernel functions that certain system calls execute could be vulnerable

Example of Permissive Policies

Developers avoid breaking functionality by allowing too many system calls
- ptrace() and write() are common system calls that are abused
- sendmsg() can transfer file descriptors between processes
- process_vm_writev() allows direct access to another process's memory
- prctl() - has many options with far-reaching effects, like ptrace()
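
To show why an allowed sendmsg() matters, the sketch below passes an open file descriptor across a Unix socket pair using Python's wrappers around sendmsg()/recvmsg() with SCM_RIGHTS (`socket.send_fds` and `socket.recv_fds`, available from Python 3.9). Both ends live in one process here for brevity; the assumption is that the receiving end would normally be a sandboxed process that could not have opened the file itself. Function names and file contents are illustrative.

```python
import os
import socket
import tempfile


def pass_fd_over_socket(fd: int) -> int:
    """Send `fd` through a Unix socket pair and return the descriptor received."""
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        socket.send_fds(sender, [b"x"], [fd])             # sendmsg() with SCM_RIGHTS
        _data, fds, _flags, _addr = socket.recv_fds(receiver, 1, 1)
        return fds[0]
    finally:
        sender.close()
        receiver.close()


def demo() -> bytes:
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"descriptor passed")
        tmp.flush()
        received = pass_fd_over_socket(tmp.fileno())
        try:
            # The received descriptor refers to the same open file.
            return os.pread(received, 100, 0)
        finally:
            os.close(received)
```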

Example of Permissive Policies

Syscall confusion
- Many 64-bit architectures are backward compatible with their 32-bit predecessors
  - A process can even switch between 32-bit and 64-bit modes
- System call numbers differ between 32-bit and 64-bit architectures.
How to mitigate syscall confusion
- Avoid policies that allow both 32-bit and 64-bit system calls
- Ensure that the policies are extremely strict.
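
A toy model of the confusion, using a handful of real x86 syscall numbers: a denylist keyed only on the x86_64 number for execve (59) says nothing about the i386 number for execve (11), which on x86_64 means munmap. The two check functions are hypothetical stand-ins for filter logic, not a real seccomp API.

```python
AUDIT_ARCH_X86_64 = 0xC000003E
AUDIT_ARCH_I386 = 0x40000003

# A few syscall numbers per architecture (subset, for illustration only).
SYSCALL_NAMES = {
    AUDIT_ARCH_X86_64: {1: "write", 11: "munmap", 59: "execve", 60: "exit"},
    AUDIT_ARCH_I386: {1: "exit", 4: "write", 11: "execve"},
}

BLOCKED_X86_64_NUMBERS = {59}   # policy author only thought about 64-bit execve


def naive_allows(nr: int) -> bool:
    """Looks only at the number, as a filter without an architecture check would."""
    return nr not in BLOCKED_X86_64_NUMBERS


def arch_aware_allows(arch: int, nr: int) -> bool:
    """Rejects every architecture the policy was not written for."""
    return arch == AUDIT_ARCH_X86_64 and nr not in BLOCKED_X86_64_NUMBERS


if __name__ == "__main__":
    nr = 11
    print(SYSCALL_NAMES[AUDIT_ARCH_I386][nr], "allowed by naive check:", naive_allows(nr))
    print("allowed by arch-aware check:", arch_aware_allows(AUDIT_ARCH_I386, nr))
```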

Quirky Ideas

Let's say your goal is data exfiltration (because it often is)
- We can get more information about the process through various system calls/program behavior
  - Runtime of a process (ex. using sleep())
  - Clean termination or a crash?
  - The return value of the program (ex. using exit())
- When does seccomp get enabled?
  - If the parent enables seccomp only after calling fork(), the child may not have a seccomp filter.
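
The exit-status channel can be sketched without any sandbox at all. In this illustration the child stands in for code that can do nothing except terminate, and the parent stands in for the observer; one byte leaks per run. The function names are hypothetical.

```python
import os


def leak_byte_via_exit_status(secret: bytes, index: int) -> int:
    """Child exits with one byte of the secret as its status; the parent reads it back."""
    pid = os.fork()
    if pid == 0:
        os._exit(secret[index])          # the only "output" is the 8-bit exit status
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


def recover(secret: bytes) -> bytes:
    """Rebuild the secret one byte per child process."""
    return bytes(leak_byte_via_exit_status(secret, i) for i in range(len(secret)))
```

The same idea works with coarser signals such as run time or crash versus clean exit, only with fewer bits per run.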

Demo

References

Class Assignment and Homework

Class Assignment

Classwork 1: You are given a program that is sandboxed by chroot. You will need to break out of the chroot jail.

Classwork 2: You are given a program that is sandboxed via seccomp. Figure out a way to add 3-4 rules, block all rules, or whatever. See how it affects getting /flag

Additional Resources

https://itnext.io/chroot-cgroups-and-namespaces-an-overview-37124d995e3d - differences between cgroups, chroot, and namespaces
https://btholt.github.io/complete-intro-to-containers/namespaces - how to build your own containers
https://pwn.college/system-security/sandboxing - much of where today's resources came from, but also has some challenges you can start practicing.
https://www.kernel.org/doc/html/v4.19/userspace-api/seccomp_filter.html - Resources I used to learn about seccomp

ENPM809V - Sandboxing

By Ragnar Security
