ENPM809V

Sandboxing

How did it all start?

  • Before operating systems, all software ran on bare metal
  • Operating systems came along
    • Separated "User" and "Kernel" space
      • Kernel (system) code and process code run at different privilege levels
      • User space is "sandboxed" from accessing the hardware
      • Significantly safer

Further Separation

  • Virtual Memory
    • A hardware-backed mechanism that gives each process its own memory address space
  • In-process separation
    • The separation between the interpreter and the interpreted code

Then Browser Hacking...

  • Vulnerabilities in the browser could wreak havoc on a victim's machine
    • 2000s: Adobe Flash, ActiveX, Java applets
    • 2010s: JavaScript engine, media codec, and imaging library vulnerabilities

Caused the rise of sandboxing!

  • Anything untrusted should live in a process with zero privileges
    • JavaScript, PNGs, PDFs, etc.
  • How does this work?
    • Browser creates "privileged" parent process
    • Parent creates "sandboxed" child process for untrusted code
    • When a child process needs to perform a "privileged" task, it asks the parent to do it. 
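
The parent/child split above can be sketched roughly as follows. This is a minimal illustration, not how any real browser implements it: the names `handle_request` and `broker_demo`, the "OK "/"DENIED" reply format, and the allowed-directory policy are all hypothetical, and the child here is not actually sandboxed (a real one would drop privileges right after the fork).

```python
import os
import socket
import tempfile


def handle_request(path, allowed_dir):
    """Privileged side: serve a file only if it resolves inside allowed_dir."""
    real = os.path.realpath(path)
    root = os.path.realpath(allowed_dir)
    if os.path.commonpath([real, root]) != root:
        return b"DENIED"
    try:
        with open(real, "rb") as f:
            return b"OK " + f.read(1024)
    except OSError:
        return b"DENIED"


def broker_demo(path, allowed_dir):
    """Fork a child that asks the parent to read `path` on its behalf.

    Returns the child's exit status: 0 if the parent granted the request,
    1 if it was denied.
    """
    parent_end, child_end = socket.socketpair()
    pid = os.fork()
    if pid == 0:
        # Child: a real sandbox would drop all privileges here (assumption).
        code = 1
        try:
            parent_end.close()
            child_end.sendall(path.encode())
            reply = child_end.recv(2048)
            code = 0 if reply.startswith(b"OK ") else 1
        finally:
            os._exit(code)
    # Parent: validate the request before doing the privileged work.
    child_end.close()
    request = parent_end.recv(4096).decode()
    parent_end.sendall(handle_request(request, allowed_dir))
    parent_end.close()
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


if __name__ == "__main__":
    shared = tempfile.mkdtemp()
    note = os.path.join(shared, "note.txt")
    with open(note, "w") as f:
        f.write("hello")
    print(broker_demo(note, shared))
    print(broker_demo("/etc/passwd", shared))
```

The point of the design is that the policy check lives entirely in the parent, so a compromised child can only ask, never act.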

Chroot

One of the First Sandboxing Utilities

  • First appeared in Unix in 1979
  • Changes the meaning of the root directory for a process
  • Former de facto sandboxing utility
  • Example:
    • chroot("/tmp/jail") is meant to keep a process from getting out of /tmp/jail

It does not filter syscalls or do any other isolation

chroot("/tmp/jail") does

  • Changes "/" to "/tmp/jail" for the process
  • Path lookups cannot go any higher than "/tmp/jail"
    • "cd /.." inside the jail leaves the process at "/tmp/jail"

chroot("/tmp/jail") does not

  • Close resources that reside outside of the jail
  • Change the working directory into the jail
  • Do anything else!
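
Because chroot() neither changes the working directory nor closes anything, the caller has to do that itself. A minimal sketch, assuming the process runs as root (chroot needs CAP_SYS_CHROOT); `enter_jail` and `try_jail` are hypothetical helper names, and the empty temporary directory stands in for a real jail:

```python
import os
import tempfile


def enter_jail(jail_dir):
    """Change the root AND the working directory. Requires root."""
    os.chroot(jail_dir)
    os.chdir("/")  # chroot() alone leaves the old working directory in place


def try_jail():
    """Attempt the jail in a child process and report what happened."""
    jail_dir = tempfile.mkdtemp(prefix="jail-")
    pid = os.fork()
    if pid == 0:
        code = 2
        try:
            enter_jail(jail_dir)
            # Inside the jail the (empty) directory now looks like "/".
            if os.getcwd() == "/" and os.listdir("/") == []:
                code = 0
        except PermissionError:
            code = 1  # not root, or CAP_SYS_CHROOT is missing
        finally:
            os._exit(code)
    _, status = os.waitpid(pid, 0)
    os.rmdir(jail_dir)
    return {0: "jailed", 1: "not-permitted"}.get(os.WEXITSTATUS(status), "error")


if __name__ == "__main__":
    print(try_jail())
```

This sketch still does not close file descriptors that were opened before the chroot; that remains the caller's job.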

Chroot Pitfalls

Previously Open Resources

  • Can be abused if not closed or invalidated
  • How can it be abused: 
    • int openat(int dirfd, char *pathname, int flags)
    • int execveat(int dirfd, char *pathname, char **argv, char **envp, int flags)
  • dirfd can be a file descriptor of a previously opened directory or the special value AT_FDCWD.
    • Remember: chroot does not change the current working directory!
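
The dirfd mechanism can be seen without any chroot at all. The sketch below uses Python's `dir_fd` parameter to `os.open`, which maps onto openat(); `read_via_dirfd` is a hypothetical helper name. The lookup is resolved relative to the directory descriptor, not the current working directory or a path string, which is why a descriptor opened before a chroot() is dangerous to leave open.

```python
import os
import tempfile


def read_via_dirfd(dir_fd, name):
    """Open `name` relative to an already-open directory descriptor."""
    fd = os.open(name, os.O_RDONLY, dir_fd=dir_fd)  # openat(dir_fd, name, ...)
    try:
        return os.read(fd, 4096)
    finally:
        os.close(fd)


if __name__ == "__main__":
    outside = tempfile.mkdtemp()
    with open(os.path.join(outside, "secret.txt"), "wb") as f:
        f.write(b"outside the jail")
    # Imagine this descriptor was opened before the process was jailed.
    leaked_fd = os.open(outside, os.O_RDONLY | os.O_DIRECTORY)
    print(read_via_dirfd(leaked_fd, "secret.txt"))
    os.close(leaked_fd)
```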

User Error

  • What happens if you chroot again?
    • Kernel has no memory of previous chroots for a process!
  • Giving root privileges to a process inside a chroot

Missing Other Forms of Isolation

  • PID
  • Network
  • IPC 

Replacements

  • cgroups: a Linux kernel feature that limits, accounts for, and isolates resource usage
    • Including memory, CPU, network and IO
  • namespaces: Linux kernel feature that partitions kernel resources such that only certain processes can see them
  • seccomp: what we are going to get into next

Namespaces

What are they?

  • A feature of the Linux kernel that partitions resources so that a process sees only one set of them.
    • Processes in a different namespace see a different set of resources
  • Used under the hood by Docker, Linux Containers (LXC), and cloud systems.

How do they work?

  • Isolate by a number of factors
    • cgroup - virtualizes a process's view of the cgroup hierarchy; cgroups (mentioned earlier) can also be used on their own, without namespaces
    • IPC - isolates System V IPC objects and POSIX message queues
    • Network - isolates network devices, stacks, and ports
    • Mount - isolates the set of mount points a process sees
    • PID - isolates the process ID number space, so processes in different PID namespaces can have the same PID
    • Time - isolates the boot and monotonic clocks
    • User - isolates user IDs and group IDs
    • UTS - isolates the hostname and the NIS domain name

How do they work?

  • Isolate by a number of factors (see the man page for each)
    • cgroup - man cgroup_namespaces
    • IPC - man ipc_namespaces
    • Network - man network_namespaces
    • Mount - man mount_namespaces
    • PID - man pid_namespaces
    • Time - man time_namespaces
    • User - man user_namespaces
    • UTS - man uts_namespaces

Namespace Systemcalls

  • clone() - creates a new process
    • Takes a flags argument; each CLONE_NEW* flag that is set puts the child in a new namespace of that type
  • setns() - allows the calling process to join an existing namespace
  • unshare() - moves the calling process to a new namespace
    • One new namespace is created for each CLONE_NEW* flag that is set
  • ioctl() - can be used to discover information about namespaces

 

*Note some of these system calls might be used for more than just namespaces*
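
As a rough illustration of unshare(), the sketch below calls it through ctypes in a forked child and checks whether the child's UTS namespace link changed. The flag values come from the Linux headers; `unshare_demo` and its result strings are hypothetical. It assumes unprivileged user namespaces are enabled; where they are not (or off Linux), it simply reports "unavailable".

```python
import ctypes
import os

CLONE_NEWUTS = 0x04000000   # new hostname/domain name namespace
CLONE_NEWUSER = 0x10000000  # new user namespace (lets an unprivileged user try this)


def unshare_demo():
    """Try to move a child process into new user + UTS namespaces."""
    pid = os.fork()
    if pid == 0:
        code = 1
        try:
            libc = ctypes.CDLL(None, use_errno=True)
            before = os.readlink("/proc/self/ns/uts")
            if libc.unshare(CLONE_NEWUSER | CLONE_NEWUTS) == 0:
                after = os.readlink("/proc/self/ns/uts")
                code = 0 if after != before else 2
        finally:
            os._exit(code)
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return {0: "new-namespace", 2: "unchanged"}.get(
            os.WEXITSTATUS(status), "unavailable")
    return "unavailable"


if __name__ == "__main__":
    print(unshare_demo())
```

Running the call in a child keeps the calling process in its original namespaces, since unshare() is not reversible for the process that makes it.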

/proc/<pid>/ns directory

  • Each process has a /proc/[pid]/ns/ subdirectory containing one entry for each namespace that supports being manipulated by setns(2):

 

$ ls -l /proc/$$/ns
total 0
lrwxrwxrwx. 1 mtk mtk 0 Apr 28 12:46 cgroup -> cgroup:[4026531835]
lrwxrwxrwx. 1 mtk mtk 0 Apr 28 12:46 ipc -> ipc:[4026531839]
lrwxrwxrwx. 1 mtk mtk 0 Apr 28 12:46 mnt -> mnt:[4026531840]
lrwxrwxrwx. 1 mtk mtk 0 Apr 28 12:46 net -> net:[4026531969]
lrwxrwxrwx. 1 mtk mtk 0 Apr 28 12:46 pid -> pid:[4026531836]
lrwxrwxrwx. 1 mtk mtk 0 Apr 28 12:46 pid_for_children -> pid:[4026531834]
lrwxrwxrwx. 1 mtk mtk 0 Apr 28 12:46 user -> user:[4026531837]
lrwxrwxrwx. 1 mtk mtk 0 Apr 28 12:46 uts -> uts:[4026531838]

from man namespaces
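
The same information can be read programmatically. A small sketch, assuming a Linux /proc filesystem; `namespace_ids` and `same_namespace` are hypothetical helper names. Two processes are in the same namespace of a given type when their links point to the same target.

```python
import os


def namespace_ids(pid="self"):
    """Map namespace type to link target for one process, e.g. 'uts:[...]'."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    try:
        names = os.listdir(ns_dir)
    except OSError:
        return result  # no /proc, or the process is not accessible
    for name in names:
        try:
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            pass
    return result


def same_namespace(pid_a, pid_b, ns_type):
    """True if both processes expose the same link target for ns_type."""
    a = namespace_ids(pid_a).get(ns_type)
    b = namespace_ids(pid_b).get(ns_type)
    return a is not None and a == b


if __name__ == "__main__":
    for ns_type, target in sorted(namespace_ids().items()):
        print(ns_type, "->", target)
```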

/proc/<pid>/ns directory

  • These files were hard links in Linux 3.7 and earlier
  • Progressively became symbolic links starting in Linux 3.8
    • All of the entries were symbolic links by 4.12

from man namespaces

Global/User Namespace Config

  • The /proc/sys/user directory (Linux 4.9 and up) exposes limits on the number of namespaces of each type that can be created
    • It includes files like max_ipc_namespaces and max_cgroup_namespaces
  • The values are modifiable by privileged processes (root processes)
  • Limits are per-user and apply to all users (even the root user)
    • They apply in addition to the per-namespace limits
  • Upon hitting a limit, the system call fails with the error ENOSPC

from man namespaces
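
To inspect these limits on a given machine, something like the following sketch works. `namespace_limits` is a hypothetical helper; the `base` parameter exists only so the function can be pointed at a different directory for testing.

```python
import os


def namespace_limits(base="/proc/sys/user"):
    """Return {file name: limit} for every max_*_namespaces file under base."""
    limits = {}
    try:
        names = os.listdir(base)
    except OSError:
        return limits  # directory missing (older kernel or not Linux)
    for name in sorted(names):
        if name.startswith("max_") and name.endswith("_namespaces"):
            try:
                with open(os.path.join(base, name)) as f:
                    limits[name] = int(f.read().strip())
            except (OSError, ValueError):
                continue
    return limits


if __name__ == "__main__":
    for name, value in namespace_limits().items():
        print(f"{name}: {value}")
```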

Namespace Lifecycle

  • Barring any other factors, namespaces are automatically torn down when no more processes exist in that namespace
    • Can be from termination or from leaving the namespace
  • Factors that prevent this
    • An open file descriptor or a bind mount exists for any file inside of /proc/<pid>/ns
    • The namespace is hierarchical and has a child namespace in use
    • A bunch of others - see man pages for more details

from man namespaces

Resources

Seccomp

What is it?

A computer security feature in the Linux kernel that allows a process to make a one-way transition to a secured state, restricting which syscalls it can execute.

What can it do?

  • Allow/disallow system calls
  • Filter allowed/disallowed system calls based on arguments
  • Apply the same rules to all child processes (filters are inherited).

How does it work?

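  • Create a filter using BPF (seccomp filters are written in classic BPF, not eBPF)
    • A small virtual machine in the kernel, originally built for packet filtering, that runs the filter on every system call
    • Will get into this more later in the course
  • Enable seccomp by applying the filter
  • Also has its own API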
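
The steps above can be sketched by hand-assembling a tiny classic BPF program and installing it with prctl() through ctypes. This is an illustration under stated assumptions, not a production filter: the constants are taken from the Linux headers, the `ARCH_TABLE` (audit architecture value and getcwd syscall number for x86_64 and aarch64) is an assumed lookup that would need extending for other machines, and `build_filter`, `install_filter`, and `seccomp_demo` are hypothetical names. The filter makes getcwd fail with EPERM and allows everything else, which is a denylist chosen only because it is short.

```python
import ctypes
import errno
import os
import platform

# machine -> (AUDIT_ARCH_* value, getcwd syscall number); assumed table
ARCH_TABLE = {"x86_64": (0xC000003E, 79), "aarch64": (0xC00000B7, 17)}

PR_SET_SECCOMP, PR_SET_NO_NEW_PRIVS, SECCOMP_MODE_FILTER = 22, 38, 2
SECCOMP_RET_KILL_PROCESS = 0x80000000
SECCOMP_RET_ERRNO = 0x00050000
SECCOMP_RET_ALLOW = 0x7FFF0000
LD_W_ABS, JEQ_K, RET_K = 0x20, 0x15, 0x06  # classic BPF opcodes


class SockFilter(ctypes.Structure):  # struct sock_filter
    _fields_ = [("code", ctypes.c_ushort), ("jt", ctypes.c_ubyte),
                ("jf", ctypes.c_ubyte), ("k", ctypes.c_uint)]


class SockFprog(ctypes.Structure):  # struct sock_fprog
    _fields_ = [("len", ctypes.c_ushort),
                ("filter", ctypes.POINTER(SockFilter))]


def build_filter(audit_arch, blocked_nr):
    insns = [
        (LD_W_ABS, 0, 0, 4),           # load seccomp_data.arch
        (JEQ_K, 1, 0, audit_arch),     # expected arch? skip the kill
        (RET_K, 0, 0, SECCOMP_RET_KILL_PROCESS),
        (LD_W_ABS, 0, 0, 0),           # load seccomp_data.nr
        (JEQ_K, 0, 1, blocked_nr),     # blocked syscall? fall through
        (RET_K, 0, 0, SECCOMP_RET_ERRNO | errno.EPERM),
        (RET_K, 0, 0, SECCOMP_RET_ALLOW),
    ]
    return (SockFilter * len(insns))(*[SockFilter(*i) for i in insns])


def install_filter(program):
    libc = ctypes.CDLL(None, use_errno=True)
    prog = SockFprog(len(program),
                     ctypes.cast(program, ctypes.POINTER(SockFilter)))
    ul = ctypes.c_ulong
    # Unprivileged processes must set no_new_privs before adding a filter.
    if libc.prctl(PR_SET_NO_NEW_PRIVS, ul(1), ul(0), ul(0), ul(0)) != 0:
        return False
    return libc.prctl(PR_SET_SECCOMP, ul(SECCOMP_MODE_FILTER),
                      ul(ctypes.addressof(prog)), ul(0), ul(0)) == 0


def seccomp_demo():
    """Install the filter in a child and check that getcwd is refused."""
    entry = ARCH_TABLE.get(platform.machine())
    if entry is None:
        return "unsupported-arch"
    pid = os.fork()
    if pid == 0:
        code = 1
        try:
            program = build_filter(*entry)
            if install_filter(program):
                try:
                    os.getcwd()
                    code = 2
                except PermissionError:
                    code = 0
        finally:
            os._exit(code)
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return {0: "blocked", 1: "setup-failed", 2: "not-blocked"}.get(
            os.WEXITSTATUS(status), "unknown")
    return "killed"


if __name__ == "__main__":
    print(seccomp_demo())
```

The filter is installed in a forked child because the transition is one-way: once applied, it cannot be removed from that process. In practice a library such as libseccomp is a more comfortable way to build the same kind of program.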

Can we escape seccomp?

  • If everything is done correctly, no
    • Generally, user error allows us to escape
      • The syscall filter is too permissive
      • Syscall confusion
    • Otherwise, a kernel vulnerability must be present
      • Many (tens of) Chrome sandbox escapes have relied on this
      • The kernel functions that certain system calls execute could be vulnerable

Example of Permissive Policies

  • Developers avoid breaking functionality by allowing too many system calls
    • ptrace() and write() are commonly abused system calls
    • sendmsg() can transfer file descriptors between processes
    • process_vm_writev() allows direct access to another process's memory
    • prctl() - like ptrace(), a single call with many possible operations

Example of Permissive Policies

  • Syscall confusion
    • Many 64-bit architectures are backward compatible with 32-bit code
      • A process can even switch between 32-bit and 64-bit modes
    • System call numbers differ between 32-bit and 64-bit architectures.
  • How to mitigate syscall confusion
    • Avoid policies that allow both 32-bit and 64-bit system calls
    • Ensure that the policies are extremely strict.

Quirky Ideas

  • Let's say your goal is data exfiltration (because it often is)
    • We can get more information about the process through various system calls/program behavior
      • Runtime of a process (ex. using sleep())
      • Clean termination or a crash?
      • The return value of the program (ex. using exit())
    • When does seccomp get enabled?
      • If the parent enables seccomp after calling fork(), the child it already created does not get the filter.
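
The exit-status idea can be made concrete with a short sketch: even a process that can do nothing but exit still hands one byte per run to whoever is waiting on it. `leak_via_exit_status` is a hypothetical name, and the child here is an ordinary forked process standing in for sandboxed code that already holds the secret.

```python
import os


def leak_via_exit_status(secret):
    """Recover `secret` one byte at a time from child exit statuses."""
    recovered = bytearray()
    for index in range(len(secret)):
        pid = os.fork()
        if pid == 0:
            # "Sandboxed" side: the only output channel used is exit().
            os._exit(secret[index])
        _, status = os.waitpid(pid, 0)
        recovered.append(os.WEXITSTATUS(status))  # observer side
    return bytes(recovered)


if __name__ == "__main__":
    print(leak_via_exit_status(b"flag{demo}"))
```

Runtime and crash-versus-clean-exit work the same way, only with less bandwidth per run.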

Demo

References

Class Assignment and Homework

Class Assignment

Classwork 1: You are given a program that is sandboxed by chroot. You will need to break out of the chroot jail.

 

Classwork 2: You are given a program sandboxed via seccomp. Experiment with its filter: add 3-4 rules, block everything, and so on. See how each change affects getting /flag.

Additional Resources