ENPM809V
Sandboxing
How did it all start?
- Before operating systems, all software ran on bare metal
- Operating systems came along
- Separated "User" and "Kernel" space
- System (kernel) code and process (user) code run separately
- User space is "sandboxed" from accessing the hardware
- Significantly safer
Further Separation
- Virtual Memory
- A hardware mechanism for separate memory space for various processes
- In-process separation
- The separation between the interpreter and the interpreted code
Then Browser Hacking...
- Vulnerabilities in the browser could wreak havoc on a victim's machine
- 2000s: Adobe Flash, ActiveX, Java Applets
- 2010s: JavaScript Engine, media codec, imaging library vulnerabilities
Caused the rise of sandboxing!
- Anything untrusted should live in a process with zero privileges
- JavaScript, PNGs, PDFs, etc.
- How does this work?
- Browser creates "privileged" parent process
- Parent creates "sandboxed" child process for untrusted code
- When a child process needs to perform a "privileged" task, it asks the parent to do it.
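The parent/child split above can be sketched in a few lines. This is only an illustration, not how any real browser implements it: the names broker_demo and PRIVILEGED_OPS are hypothetical, and the child here is not actually confined (a real sandbox would drop privileges or install a filter before handling untrusted data).

```python
import os
import socket

# Hypothetical allow-list: the only "privileged" tasks the parent will perform.
PRIVILEGED_OPS = {
    "parent_pid": lambda: str(os.getpid()),
}

def broker_demo(requests):
    """Fork a worker that asks the parent to perform each named task.

    Returns the parent's log of (request, reply) pairs.
    """
    parent_sock, child_sock = socket.socketpair(socket.AF_UNIX, socket.SOCK_SEQPACKET)
    pid = os.fork()
    if pid == 0:
        # Child: stands in for the sandboxed process (not really confined here).
        parent_sock.close()
        try:
            for name in requests:
                child_sock.send(name.encode())
                child_sock.recv(4096)  # wait for the parent's answer
        finally:
            os._exit(0)
    # Parent: the privileged side, validating every request.
    child_sock.close()
    log = []
    while True:
        request = parent_sock.recv(4096)
        if not request:  # empty read: the child closed its end
            break
        name = request.decode()
        handler = PRIVILEGED_OPS.get(name)
        reply = handler() if handler else "DENIED"
        parent_sock.send(reply.encode())
        log.append((name, reply))
    parent_sock.close()
    os.waitpid(pid, 0)
    return log
```

The design point is that the parent decides what is allowed; anything not on the allow-list is refused no matter what the child sends.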
Chroot
One of the First Sandboxing Utilities
- First appeared in Unix in 1979
- Changes the meaning of the root directory for a process
- Former de facto sandboxing utility
- Example:
- chroot("/tmp/jail") will prevent a process from getting out of /tmp/jail
- It does not filter syscalls or do any other isolation
chroot("/tmp/jail") does
- Changes "/" to "/tmp/jail" for the process
- Cannot go any directory higher than "/tmp/jail"
- "cd .." at the jail root sends someone back to "/tmp/jail"
chroot("/tmp/jail") does not
- Close resources that reside outside of the jail
- cd into the jail
- Do anything else!
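Since chroot() neither changes the working directory nor closes anything, a careful caller does the chdir itself. The sketch below is an assumption-laden illustration: the helper names are hypothetical, it needs root (CAP_SYS_CHROOT) to succeed, and it reports None when chroot is not permitted.

```python
import os
import tempfile

def list_root_inside_jail(jail_dir):
    """Fork a child, chroot it into jail_dir, and return what it sees at "/".

    Returns None if the child could not enter the jail (e.g. not root).
    """
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        code = 1
        try:
            os.close(read_end)
            os.chdir(jail_dir)  # chroot() alone would leave the cwd outside the jail
            os.chroot(".")
            names = sorted(os.listdir("/"))
            os.write(write_end, "\n".join(names).encode())
            code = 0
        except OSError:
            code = 1
        finally:
            os._exit(code)
    os.close(write_end)
    with os.fdopen(read_end, "rb") as pipe:
        data = pipe.read()
    _, status = os.waitpid(pid, 0)
    if not os.WIFEXITED(status) or os.WEXITSTATUS(status) != 0:
        return None
    return data.decode().split("\n") if data else []

def demo():
    """Build a throwaway jail with one file and list "/" from inside it."""
    jail = tempfile.mkdtemp()
    with open(os.path.join(jail, "inside.txt"), "w"):
        pass
    return list_root_inside_jail(jail)
```

When run as root, demo() should show only the file placed in the jail; the rest of the host filesystem is not reachable by path.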
Chroot Pitfalls
Previously Open Resources
- Can be abused if not closed or invalidated
- How can it be abused:
- int openat(int dirfd, char *pathname, int flags)
- int execveat(int dirfd, char *pathname, char **argv, char **envp, int flags)
- dirfd can be a file descriptor of a previously opened directory or the special value AT_FDCWD.
- Remember: chroot does not change the current working directory!
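To see why a leftover directory descriptor matters, here is a small sketch of a dirfd-relative open. It does not call chroot at all (so it runs unprivileged); it only shows that openat-style lookups start from the descriptor, not from the current directory or root. The function names are hypothetical; Python's os.open(..., dir_fd=...) is used as the stand-in for openat().

```python
import os
import tempfile

def read_via_dirfd(dir_fd, name):
    """Open `name` relative to an already-open directory descriptor."""
    fd = os.open(name, os.O_RDONLY, dir_fd=dir_fd)
    try:
        return os.read(fd, 4096)
    finally:
        os.close(fd)

def demo():
    """Open a directory, move elsewhere, and still reach its files."""
    outside = tempfile.mkdtemp()   # stands in for a directory outside the jail
    elsewhere = tempfile.mkdtemp() # stands in for the jail
    with open(os.path.join(outside, "secret.txt"), "wb") as handle:
        handle.write(b"secret")
    dir_fd = os.open(outside, os.O_RDONLY | os.O_DIRECTORY)  # opened "before the jail"
    old_cwd = os.getcwd()
    try:
        os.chdir(elsewhere)
        # The lookup starts at dir_fd, so the current directory is irrelevant.
        return read_via_dirfd(dir_fd, "secret.txt")
    finally:
        os.chdir(old_cwd)
        os.close(dir_fd)
```

The same reasoning applies after a chroot: a descriptor opened beforehand still refers to the directory outside the jail.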
User Error
- What happens if you chroot again?
- Kernel has no memory of previous chroots for a process!
- Giving root permissions inside a chroot process
Missing Other Forms of Isolation
- PID
- Network
- IPC
Replacements
- cgroups: a Linux kernel feature that limits, accounts for, and isolates resource usage
- Including memory, CPU, network and IO
- namespaces: Linux kernel feature that partitions kernel resources such that only certain processes can see them
- seccomp: covered after namespaces
Namespaces
What are they?
- Feature of the Linux kernel that partitions resources so that a process only sees one set of resources.
- Processes in a different namespace have a different set of resources
- Used by Docker, Linux containers, and cloud systems under the hood.
How do they work?
- Isolate by a number of factors
- cgroup - virtualizes a process's view of the cgroup hierarchy (cgroups were mentioned earlier)
- IPC - isolates System V IPC objects and POSIX message queues
- Network - isolates network devices, stacks, and ports
- Mount - isolates the set of mount points a process sees
- PID - isolates the process ID number space, so processes in different PID namespaces can have the same PID
- Time - isolates the boot and monotonic clocks
- User - isolates user IDs and group IDs
- UTS - isolates the hostname and the NIS domain name
How do they work?
- Each namespace type has its own man page
- cgroup - man cgroup_namespaces
- IPC - man ipc_namespaces
- Network - man network_namespaces
- Mount - man mount_namespaces
- PID - man pid_namespaces
- Time - man time_namespaces
- User - man user_namespaces
- UTS - man uts_namespaces
Namespace System Calls
- clone() - creates a new process
- Takes a flags argument; each CLONE_NEW* flag that is set creates a new namespace of that type for the child
- setns() - allows the calling process to join an existing namespace
- unshare() - moves the calling process to a new namespace
- Creates one new namespace for each CLONE_NEW* flag that is set
- ioctl() - Can potentially be used to discover information about namespaces
*Note: some of these system calls are used for more than just namespaces*
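A quick way to try unshare() is through ctypes. The sketch below is hedged: the function name is hypothetical, the flag values are the Linux constants for CLONE_NEWUTS and CLONE_NEWUSER, and whether the call is permitted depends on the system (unprivileged user namespaces may be disabled, and container runtimes often block it).

```python
import ctypes
import os

CLONE_NEWUTS = 0x04000000   # new UTS (hostname) namespace
CLONE_NEWUSER = 0x10000000  # new user namespace, lets an unprivileged process try this

def try_unshare_uts():
    """Fork a child that unshares its UTS namespace and report what happened.

    Returns "changed" (child moved to a new namespace), "denied" (unshare
    failed), or "unavailable" (no /proc/self/ns/uts or an unexpected result).
    """
    libc = ctypes.CDLL(None, use_errno=True)
    pid = os.fork()
    if pid == 0:
        code = 2
        try:
            before = os.readlink("/proc/self/ns/uts")
            if libc.unshare(CLONE_NEWUSER | CLONE_NEWUTS) != 0:
                code = 1
            elif os.readlink("/proc/self/ns/uts") != before:
                code = 0
        except OSError:
            code = 2
        finally:
            os._exit(code)
    _, status = os.waitpid(pid, 0)
    outcomes = {0: "changed", 1: "denied"}
    if not os.WIFEXITED(status):
        return "unavailable"
    return outcomes.get(os.WEXITSTATUS(status), "unavailable")
```

The work happens in a forked child so that the calling process keeps its original namespaces.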
/proc/<pid>/ns directory
- Each process has a /proc/[pid]/ns/ subdirectory containing one entry for each namespace that supports being manipulated by setns(2):
$ ls -l /proc/$$/ns
total 0
lrwxrwxrwx. 1 mtk mtk 0 Apr 28 12:46 cgroup -> cgroup:[4026531835]
lrwxrwxrwx. 1 mtk mtk 0 Apr 28 12:46 ipc -> ipc:[4026531839]
lrwxrwxrwx. 1 mtk mtk 0 Apr 28 12:46 mnt -> mnt:[4026531840]
lrwxrwxrwx. 1 mtk mtk 0 Apr 28 12:46 net -> net:[4026531969]
lrwxrwxrwx. 1 mtk mtk 0 Apr 28 12:46 pid -> pid:[4026531836]
lrwxrwxrwx. 1 mtk mtk 0 Apr 28 12:46 pid_for_children -> pid:[4026531834]
lrwxrwxrwx. 1 mtk mtk 0 Apr 28 12:46 user -> user:[4026531837]
lrwxrwxrwx. 1 mtk mtk 0 Apr 28 12:46 uts -> uts:[4026531838]
from man namespaces
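The same listing can be read programmatically. A minimal sketch, assuming a Linux /proc filesystem (the function name is hypothetical):

```python
import os

def list_namespaces(pid="self"):
    """Return {namespace name: link target} for /proc/<pid>/ns."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    for name in sorted(os.listdir(ns_dir)):
        # Each entry is a symbolic link such as "uts:[4026531838]".
        result[name] = os.readlink(os.path.join(ns_dir, name))
    return result
```

Two processes are in the same namespace when the link targets for that entry match.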
/proc/<pid>/ns directory
- These entries were originally hard links in Linux 3.7 and below
- Progressively became symbolic links starting in Linux 3.8
- All of the entries were symbolic links by 4.12
from man namespaces
Global/User Namespace Config
- /proc/sys/user directory (Linux 4.9 and up) exposes limits on the number of namespaces of various types that can be created
- They include files like max_ipc_namespaces or max_cgroup_namespaces.
- The values are modifiable by privileged processes (root processes)
- Limits are per-user and apply to all users (even the root user)
- Apply in addition to the per-namespace limits
- Upon encountering the limits, the system calls will fail with the error ENOSPC
from man namespaces
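These limits are plain text files, so they are easy to inspect. A small sketch (the function name is hypothetical, and it returns an empty result on systems without the directory):

```python
import os

def read_namespace_limits(base="/proc/sys/user"):
    """Return {file name: limit} for every max_*_namespaces file under base."""
    limits = {}
    try:
        names = sorted(os.listdir(base))
    except OSError:
        return limits  # directory missing, e.g. kernel older than 4.9
    for name in names:
        if name.startswith("max_") and name.endswith("_namespaces"):
            try:
                with open(os.path.join(base, name)) as handle:
                    limits[name] = int(handle.read())
            except (OSError, ValueError):
                continue  # unreadable or non-numeric entry: skip it
    return limits
```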
Namespace Lifecycle
- Barring any other factors, namespaces are automatically torn down when no more processes exist in that namespace
- Can be from termination or from leaving the namespace
- Factors that prevent this
- An open file descriptor or a bind mount exists for any file inside of /proc/<pid>/ns
- The namespace is hierarchical (e.g. PID or user) and has a child namespace in use
- A bunch of others - see man pages for more details
from man namespaces
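The open-file-descriptor case can be sketched as follows. The helper names are hypothetical; the only assumption about the kernel is the documented behavior that the number in the link text is the inode number of the namespace.

```python
import os

def namespace_inode(pid, ns_type):
    """Parse the inode number out of a link like "uts:[4026531838]"."""
    target = os.readlink(f"/proc/{pid}/ns/{ns_type}")
    return int(target[target.index("[") + 1:-1])

def pin_namespace(pid, ns_type):
    """Open the namespace file; the namespace stays alive while this fd is open."""
    return os.open(f"/proc/{pid}/ns/{ns_type}", os.O_RDONLY)
```

A descriptor returned by pin_namespace() is also what setns() expects, which is how a process can later join a namespace that no longer has any members.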
Resources
Seccomp
What is it?
A computer security feature in the Linux kernel that allows a process to make a one-way transition into a secured state, restricting which system calls it can execute.
What can it do?
- Allow/disallow system calls
- Filter allowed/disallowed system calls based on arguments
- Child processes inherit the same rules.
How does it work?
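- Create a filter using BPF
- A virtual machine in the kernel, originally for packet filtering; seccomp filters use classic BPF, while eBPF (used for tracing) is its extended successor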
- Will get into this more later in the course
- Enable seccomp by applying the filter
- Also has its own API
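The one-way transition can be demonstrated without writing a BPF program by using seccomp's older strict mode, which is a different mechanism from the filter described above: after the switch, only read, write, exit, and sigreturn are allowed, and anything else kills the process. The sketch below is an assumption-heavy illustration (hypothetical function name, prctl called through ctypes, constants taken from the Linux headers) and reports "unsupported" where the kernel or container refuses the call.

```python
import ctypes
import os
import signal

PR_SET_SECCOMP = 22      # prctl option, from <linux/prctl.h>
SECCOMP_MODE_STRICT = 1  # from <linux/seccomp.h>

def strict_mode_demo():
    """Fork a child, put it in strict mode, and watch a forbidden syscall kill it.

    Returns (message written by the child, True if it died from SIGKILL).
    """
    libc = ctypes.CDLL(None, use_errno=True)
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            mode = ctypes.c_ulong(SECCOMP_MODE_STRICT)
            if libc.prctl(PR_SET_SECCOMP, mode, 0, 0, 0) != 0:
                os.write(write_end, b"unsupported")
            else:
                os.write(write_end, b"ok")  # write() is still allowed
                os.getppid()                # any other syscall: SIGKILL
        finally:
            os._exit(0)
    os.close(write_end)
    message = os.read(read_end, 64)
    os.close(read_end)
    _, status = os.waitpid(pid, 0)
    killed = os.WIFSIGNALED(status) and os.WTERMSIG(status) == signal.SIGKILL
    return message, killed
```

There is no call to leave strict mode; the child cannot undo the restriction, which is the "one-way" property.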
Can we escape seccomp?
- If everything is done correctly, no
- Generally, user error allows us to escape
- The syscall filter is too permissive
- Syscall confusion
- Otherwise, a kernel vulnerability must be present
- Many (tens of) Chrome sandbox escapes
- The kernel functions that certain system calls execute could be vulnerable
Example of Permissive Policies
- Developers avoid breaking functionality by allowing too many system calls
- ptrace() and write() are commonly abused system calls
- sendmsg() can transfer file descriptors between processes
- process_vm_writev() allows direct access to another process's memory
- prctl() - has many possible effects, much like ptrace()
Example of Permissive Policies
- Syscall confusion
- Many 64-bit architectures are backward compatible
- Can even switch between 32-bit and 64-bit modes
- System call numbers differ between 32-bit and 64-bit architectures.
- How to mitigate syscall confusion
- Avoid policies that allow both 32-bit and 64-bit system calls
- Ensure that the policies are extremely strict.
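A small simulation makes the numbering problem concrete. This is not a real seccomp filter, just a model of one: the syscall numbers and AUDIT_ARCH_* values are the real x86 ones, while the filter functions and the allow-list are hypothetical.

```python
AUDIT_ARCH_X86_64 = 0xC000003E
AUDIT_ARCH_I386 = 0x40000003

# A few real syscall numbers for each ABI.
SYSCALL_NAMES = {
    AUDIT_ARCH_X86_64: {0: "read", 1: "write", 11: "munmap", 59: "execve", 60: "exit"},
    AUDIT_ARCH_I386: {1: "exit", 3: "read", 4: "write", 11: "execve", 91: "munmap"},
}

# Hypothetical policy written with x86_64 numbers in mind: read, write, munmap, exit.
ALLOWED_NUMBERS = {0, 1, 11, 60}

def naive_filter(arch, number):
    """Checks only the syscall number, ignoring which ABI made the call."""
    return number in ALLOWED_NUMBERS

def arch_checked_filter(arch, number):
    """Rejects anything that is not a native x86_64 call before checking the number."""
    return arch == AUDIT_ARCH_X86_64 and number in ALLOWED_NUMBERS
```

Under the naive policy, number 11 was meant to allow munmap, but a 32-bit call with number 11 is execve, so it slips through. Checking the architecture first closes that gap.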
Quirky Ideas
- Let's say your goal is data exfiltration (because it often is)
- We can get more information about the process through various system calls/program behavior
- Runtime of a process (ex. using sleep())
- Clean termination or a crash?
- The return value of the program (ex. using exit())
- When does seccomp get enabled?
- If the parent enables after calling fork(), the child may not have a seccomp filter.
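The exit-value idea can be sketched as a one-byte-at-a-time side channel. This is a toy model under stated assumptions: the function name is hypothetical, and the forked child merely stands in for a sandboxed process that can do nothing except exit.

```python
import os

def leak_via_exit_status(secret):
    """Recover `secret` (bytes) using only child exit statuses.

    Each child exits with one byte of the secret as its status; the
    observer needs nothing but waitpid() to read it.
    """
    leaked = bytearray()
    for index in range(len(secret)):
        pid = os.fork()
        if pid == 0:
            # "Sandboxed" side: no write(), no network, just exit(byte).
            os._exit(secret[index])
        _, status = os.waitpid(pid, 0)
        leaked.append(os.WEXITSTATUS(status))
    return bytes(leaked)
```

Timing (sleeping for a data-dependent duration) or crash-versus-clean-exit can carry information in the same way, just at a lower bit rate.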
Demo
References
Class Assignment and Homework
Class Assignment
Classwork 1: You are given a program that is sandboxed by chroot. You will need to break out of the chroot jail.
Classwork 2: You are given a program sandboxed via seccomp. Try adding 3-4 rules, blocking everything, or other variations, and see how each affects getting /flag
Additional Resources
- https://itnext.io/chroot-cgroups-and-namespaces-an-overview-37124d995e3d - differences between cgroups, chroot, and namespaces
- https://btholt.github.io/complete-intro-to-containers/namespaces - how to build your own containers
- https://pwn.college/system-security/sandboxing - much of where today's resources came from, but also has some challenges you can start practicing.
- https://www.kernel.org/doc/html/v4.19/userspace-api/seccomp_filter.html - Resources I used to learn about seccomp
ENPM809V - Sandboxing
By Ragnar Security