saschagrunert
mail@saschagrunert.de
This first talk is scoped to Linux kernel-related topics
Containers are isolated groups of processes running on a single host that fulfill a set of “common” features.
chroot changes the root directory of the currently running process (and its children)
requires the CAP_SYS_CHROOT capability
pivot_root is preferred to chroot nowadays
it moves the old mounts into a dedicated directory
a root filesystem (rootfs) is needed for more useful jails
the rootfs can be extracted from an existing container image
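A minimal Python sketch of such a jail, assuming Linux, CAP_SYS_CHROOT and an already extracted rootfs directory. It uses plain chroot because Python's standard library has no pivot_root wrapper; the helper names and the `./rootfs` path in the usage note are hypothetical:

```python
import os


def missing_rootfs_dirs(rootfs: str, required: tuple[str, ...] = ("bin", "etc", "usr")) -> list[str]:
    """Return the expected top-level directories that rootfs lacks (a rough sanity check)."""
    return [d for d in required if not os.path.isdir(os.path.join(rootfs, d))]


def run_in_chroot(rootfs: str, argv: list[str]) -> int:
    """Run argv with rootfs as its root directory and return the exit code.

    Needs CAP_SYS_CHROOT. The chroot happens in a forked child, so the caller
    keeps its own root while the child (and its children) see rootfs as "/".
    """
    pid = os.fork()
    if pid == 0:
        try:
            os.chroot(rootfs)        # change the root directory of this process
            os.chdir("/")            # do not keep a working directory outside the jail
            os.execv(argv[0], argv)  # argv[0] is resolved inside the new root
        finally:
            os._exit(127)            # only reached if chroot or exec failed
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)
```

For example, `run_in_chroot("./rootfs", ["/bin/sh", "-c", "ls /"])` would list the contents of the rootfs rather than the host's root directory.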
Namespaces wrap certain global system resources in an abstraction layer
Linux version 3.8 in 2013 made namespaces “container ready”
Seven distinct namespaces are available: mnt, pid, net, ipc, uts, user and cgroup (Linux 5.6 later added an eighth, the time namespace)
three main system calls: clone(2), unshare(2) and setns(2)
/proc/$PID/ns contains symbolic links to the namespaces of a process
util-linux (https://github.com/karelzak/util-linux) contains dedicated wrapper programs for these syscalls, plus tools like lsns to list namespaces
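To make the /proc/$PID/ns entries concrete, here is a small Python sketch (Linux only; `os.setns` needs Python 3.12+). The function names are hypothetical; `join_namespace` does roughly what util-linux's nsenter does with setns(2), without any of its options:

```python
import os
import re

# Targets of the /proc/<pid>/ns/* symlinks look like "mnt:[4026531841]".
NS_LINK = re.compile(r"(\w+):\[(\d+)\]")


def parse_ns_link(target: str) -> tuple[str, int]:
    """Split a namespace link target into (namespace type, inode number)."""
    m = NS_LINK.fullmatch(target)
    if m is None:
        raise ValueError(f"unexpected namespace link: {target!r}")
    return m.group(1), int(m.group(2))


def list_namespaces(pid: str = "self") -> dict[str, int]:
    """Map each entry of /proc/<pid>/ns to the inode number of its namespace."""
    ns_dir = f"/proc/{pid}/ns"
    return {
        name: parse_ns_link(os.readlink(os.path.join(ns_dir, name)))[1]
        for name in sorted(os.listdir(ns_dir))
    }


def join_namespace(pid: int, ns: str) -> None:
    """Move the calling thread into a namespace of another process via setns(2)."""
    fd = os.open(f"/proc/{pid}/ns/{ns}", os.O_RDONLY)
    try:
        os.setns(fd, 0)  # nstype 0: accept any namespace type (Python 3.12+)
    finally:
        os.close(fd)
```

In practice, two processes whose entries show the same inode number for a namespace type are members of the same namespace, which is the comparison lsns relies on.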
The mount (mnt) namespace isolates the set of mount points seen by a group of processes; created with CLONE_NEWNS
the memory resides in the Virtual File System (VFS)
if the namespace gets destroyed, that memory is irrecoverably lost
to prevent this, keep a file handle open on /proc/$PID/ns/mnt
use case: creating flexible container filesystem trees
Great read: the shared subtree documentation of the Linux kernel, https://www.kernel.org/doc/Documentation/filesystems/sharedsubtree.txt
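A hedged sketch of creating a mount namespace from Python, assuming Linux, glibc (mount(2) is called through ctypes because the `os` module has no mount wrapper), Python 3.12+ and CAP_SYS_ADMIN. The MS_* values are the propagation flags from <sys/mount.h>; making / recursively private mirrors `mount --make-rprivate /`, so later mounts do not propagate back to the host (see the shared subtree document above). Function names are hypothetical:

```python
import ctypes
import os

# Mount propagation flags from <sys/mount.h>.
MS_REC = 16384
MS_UNBINDABLE = 1 << 17
MS_PRIVATE = 1 << 18
MS_SLAVE = 1 << 19
MS_SHARED = 1 << 20

PROPAGATION = {
    "unbindable": MS_UNBINDABLE,
    "private": MS_PRIVATE,
    "slave": MS_SLAVE,
    "shared": MS_SHARED,
}


def propagation_flags(kind: str, recursive: bool = True) -> int:
    """Return the mount(2) flags that change the propagation type of a mount (tree)."""
    flags = PROPAGATION[kind]
    return (flags | MS_REC) if recursive else flags


def enter_private_mount_namespace() -> int:
    """Move the caller into a new mount namespace with private propagation.

    Returns an open fd on /proc/self/ns/mnt; an open namespace file keeps the
    namespace alive, and the fd can later be handed to os.setns().
    """
    libc = ctypes.CDLL("libc.so.6", use_errno=True)  # assumes glibc
    libc.mount.argtypes = [ctypes.c_char_p, ctypes.c_char_p, ctypes.c_char_p,
                           ctypes.c_ulong, ctypes.c_void_p]
    os.unshare(os.CLONE_NEWNS)  # Python 3.12+, Linux only
    # Only the flags matter for a propagation change; source and type are ignored.
    if libc.mount(b"none", b"/", None, propagation_flags("private"), None) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return os.open("/proc/self/ns/mnt", os.O_RDONLY)
```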
The UTS namespace unshares the host name and domain name from the host system; created with CLONE_NEWUTS
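A minimal sketch of the UTS namespace in Python, assuming Linux, Python 3.12+ and CAP_SYS_ADMIN. The hostname is changed in a forked child so that the parent can compare both views; the function names are hypothetical:

```python
import os
import socket

HOST_NAME_MAX = 64  # Linux limit for the UTS host name, in bytes


def is_valid_hostname(name: str) -> bool:
    """Rough check that name fits into the UTS namespace's hostname field."""
    return 0 < len(name.encode()) <= HOST_NAME_MAX


def hostname_in_new_uts_namespace(name: str) -> tuple[str, str]:
    """Set a hostname inside a fresh UTS namespace; return (inside, outside)."""
    if not is_valid_hostname(name):
        raise ValueError(f"invalid hostname: {name!r}")
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:  # child: isolate, rename, report back through the pipe
        try:
            os.close(r)
            os.unshare(os.CLONE_NEWUTS)
            socket.sethostname(name)  # only affects the new namespace
            os.write(w, socket.gethostname().encode())
        finally:
            os._exit(0)
    os.close(w)
    inside = os.read(r, 256).decode()
    os.close(r)
    os.waitpid(pid, 0)
    if not inside:
        raise PermissionError("child could not create a UTS namespace")
    return inside, socket.gethostname()
```

Calling `hostname_in_new_uts_namespace("container")` should report the new name for the inside view while the outside view keeps the host's name.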
The IPC namespace isolates interprocess communication resources: System V IPC objects and POSIX message queues; created with CLONE_NEWIPC
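Python's standard library has no System V IPC bindings, so this hedged sketch only observes the isolation: /proc/sysvipc/shm lists the shared memory segments of the reader's IPC namespace, and a child that has just called unshare(CLONE_NEWIPC) is expected to see none. It assumes Linux with System V IPC enabled, Python 3.12+ and CAP_SYS_ADMIN; the names are hypothetical:

```python
import os


def count_sysv_objects(proc_text: str) -> int:
    """Count entries in a /proc/sysvipc/{msg,sem,shm} listing (first line is a header)."""
    lines = [line for line in proc_text.splitlines() if line.strip()]
    return max(len(lines) - 1, 0)


def shm_segments_inside_and_outside() -> tuple[int, int]:
    """Return (segments seen in a fresh IPC namespace, segments seen by the caller)."""
    with open("/proc/sysvipc/shm") as f:
        outside = count_sysv_objects(f.read())
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(r)
            os.unshare(os.CLONE_NEWIPC)
            # Opened after unshare, so the listing reflects the new namespace.
            with open("/proc/sysvipc/shm") as f:
                os.write(w, str(count_sysv_objects(f.read())).encode())
        finally:
            os._exit(0)
    os.close(w)
    data = os.read(r, 32)
    os.close(r)
    os.waitpid(pid, 0)
    if not data:
        raise PermissionError("child could not create an IPC namespace")
    return int(data), outside
```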
The PID namespace gives processes an independent set of process identifiers (PIDs); created with CLONE_NEWPID
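A sketch of the PID namespace in Python, assuming Linux 4.1+ (for the NSpid field in /proc/<pid>/status), Python 3.12+ and CAP_SYS_ADMIN. Because unshare(CLONE_NEWPID) only affects the caller's future children, an intermediate child unshares and its own child becomes PID 1; the function names are hypothetical:

```python
import os


def parse_nspid(status_text: str) -> list[int]:
    """Return the NSpid field of /proc/<pid>/status.

    It lists the process's PID in each nested PID namespace, outermost first.
    """
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            return [int(field) for field in line.split()[1:]]
    raise ValueError("no NSpid line found")


def nspid_in_new_pid_namespace() -> list[int]:
    """Start a process in a fresh PID namespace and return its NSpid list."""
    r, w = os.pipe()
    outer = os.fork()
    if outer == 0:
        try:
            os.close(r)
            os.unshare(os.CLONE_NEWPID)  # the caller itself stays where it is
            inner = os.fork()            # first child lands in the new namespace
            if inner == 0:
                with open("/proc/self/status") as f:
                    nspid = parse_nspid(f.read())
                os.write(w, " ".join(map(str, nspid)).encode())
                os._exit(0)
            os.waitpid(inner, 0)
        finally:
            os._exit(0)
    os.close(w)
    data = os.read(r, 256).decode()
    os.close(r)
    os.waitpid(outer, 0)
    if not data:
        raise PermissionError("could not create a PID namespace")
    return [int(x) for x in data.split()]
```

The last element of the returned list is the PID inside the new namespace, expected to be 1, while the first is the same process's PID as seen by the host.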
The network (net) namespace virtualizes the network stack; created with CLONE_NEWNET
each namespace has its own resource properties within /proc/net
a freshly created namespace contains only a loopback interface
interfaces can be moved between namespaces
each namespace has a private set of IP addresses, its own routing table, socket listing, connection tracking table, firewall, …
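A hedged Python sketch that checks the “only a loopback interface” point by reading /proc/self/net/dev, which reflects the network namespace of the reading process. It assumes Linux, Python 3.12+ and CAP_SYS_ADMIN; the names are hypothetical:

```python
import os


def interfaces_from_proc_net_dev(text: str) -> list[str]:
    """Return interface names from /proc/net/dev (two header lines, then 'name: counters')."""
    names = []
    for line in text.splitlines()[2:]:
        name, sep, _ = line.partition(":")
        if sep:
            names.append(name.strip())
    return names


def interfaces_in_new_net_namespace() -> list[str]:
    """List the interfaces visible right after unshare(CLONE_NEWNET)."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(r)
            os.unshare(os.CLONE_NEWNET)
            with open("/proc/self/net/dev") as f:
                names = interfaces_from_proc_net_dev(f.read())
            os.write(w, " ".join(names).encode())
        finally:
            os._exit(0)
    os.close(w)
    data = os.read(r, 4096).decode()
    os.close(r)
    os.waitpid(pid, 0)
    if not data:
        raise PermissionError("child could not create a network namespace")
    return data.split()
```

On such a system, `interfaces_in_new_net_namespace()` is expected to return just `['lo']`.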
The user namespace isolates user and group IDs; created with CLONE_NEWUSER
since Linux 3.8, creating one no longer requires full privileges
use case: a process that is unprivileged outside a namespace can be fully privileged inside it
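/proc/$PID/{u,g}id_map expose the mappings for user and group IDs, one range per line:
inside  outside  length
0       1000     1
(here ID 0 inside the namespace maps to ID 1000 outside, for a range of one ID)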
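Since user namespaces can be created without privileges, this sketch may run as a normal user on kernels that permit unprivileged user namespaces (some distributions restrict them). It writes a single `0 <uid> 1` line like the table above, so the caller's own ID shows up as 0 inside. Since Linux 3.19, an unprivileged process must write `deny` to /proc/self/setgroups before it may write gid_map. It assumes Linux and Python 3.12+; the function names are hypothetical:

```python
import os


def format_id_map(inside: int, outside: int, length: int = 1) -> str:
    """One line of /proc/<pid>/{u,g}id_map: 'ID-inside ID-outside length'."""
    return f"{inside} {outside} {length}\n"


def parse_id_map(text: str) -> list[tuple[int, ...]]:
    """Parse a uid_map/gid_map file into (inside, outside, length) tuples."""
    return [tuple(int(f) for f in line.split()) for line in text.splitlines() if line.strip()]


def uid_in_new_user_namespace() -> int:
    """Become UID 0 inside a new user namespace, mapped to the caller's own IDs."""
    uid, gid = os.getuid(), os.getgid()  # capture before unsharing
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(r)
            os.unshare(os.CLONE_NEWUSER)  # requires a single-threaded caller
            with open("/proc/self/uid_map", "w") as f:
                f.write(format_id_map(0, uid))
            with open("/proc/self/setgroups", "w") as f:
                f.write("deny")  # required before an unprivileged gid_map write
            with open("/proc/self/gid_map", "w") as f:
                f.write(format_id_map(0, gid))
            os.write(w, str(os.getuid()).encode())
        finally:
            os._exit(0)
    os.close(w)
    data = os.read(r, 32).decode()
    os.close(r)
    os.waitpid(pid, 0)
    if not data:
        raise PermissionError("could not create a user namespace")
    return int(data)
```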
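Control groups (cgroups): resource limiting, prioritization, accounting and control
initial implementation in 2008
a major redesign (cgroup v2) started in 2013
the cgroup namespace (CLONE_NEWCGROUP) virtualizes a process's view of its cgroups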
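A hedged sketch of cgroup v2 resource limiting and of the cgroup namespace view in Python. It assumes Linux with the unified hierarchy mounted at /sys/fs/cgroup, write access to it, the memory controller enabled in the parent's cgroup.subtree_control, and Python 3.12+ plus CAP_SYS_ADMIN for the namespace part; `CGROUP_ROOT`, the `"demo"` cgroup name and the function names are hypothetical:

```python
import os

CGROUP_ROOT = "/sys/fs/cgroup"  # assumed mount point of the cgroup v2 hierarchy


def parse_proc_cgroup(text: str) -> list[tuple[int, str, str]]:
    """Parse /proc/<pid>/cgroup lines of the form 'hierarchy-ID:controllers:path'."""
    entries = []
    for line in text.splitlines():
        if line:
            hid, controllers, path = line.split(":", 2)
            entries.append((int(hid), controllers, path))
    return entries


def limit_memory(name: str, max_bytes: int, pid: int) -> str:
    """Create a child cgroup, cap its memory and move pid into it."""
    path = os.path.join(CGROUP_ROOT, name)
    os.makedirs(path, exist_ok=True)
    with open(os.path.join(path, "memory.max"), "w") as f:
        f.write(str(max_bytes))
    with open(os.path.join(path, "cgroup.procs"), "w") as f:
        f.write(str(pid))
    return path


def cgroup_path_in_new_namespace() -> str:
    """Return the cgroup v2 path a child sees right after unshare(CLONE_NEWCGROUP)."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(r)
            os.unshare(os.CLONE_NEWCGROUP)
            with open("/proc/self/cgroup") as f:
                for hid, _, path in parse_proc_cgroup(f.read()):
                    if hid == 0:  # hierarchy ID 0 is the unified (v2) hierarchy
                        os.write(w, path.encode())
        finally:
            os._exit(0)
    os.close(w)
    data = os.read(r, 4096).decode()
    os.close(r)
    os.waitpid(pid, 0)
    if not data:
        raise RuntimeError("no cgroup v2 entry seen (missing privileges or v1 only?)")
    return data
```

For example, `limit_memory("demo", 64 * 1024 * 1024, os.getpid())` would cap the current process at 64 MiB, and after unshare(CLONE_NEWCGROUP) a process is expected to see its own cgroup as `/`.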
makes “containers” possible
use runc (https://github.com/opencontainers/runc) to run a container from the extracted rootfs
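A hedged sketch of driving runc from Python, assuming runc is installed on PATH, the caller is root, and an OCI bundle directory whose `rootfs` subdirectory holds the extracted image. `runc spec` writes a default config.json into the bundle and `runc run <id>` starts a container from it; the helper names, bundle path and container ID are hypothetical:

```python
import json
import os
import shutil
import subprocess


def with_command(spec: dict, argv: list[str]) -> dict:
    """Return a copy of an OCI runtime spec that runs argv without a terminal."""
    new = json.loads(json.dumps(spec))  # cheap deep copy of plain JSON data
    new["process"]["args"] = list(argv)
    new["process"]["terminal"] = False
    return new


def run_bundle(bundle: str, container_id: str, argv: list[str]) -> int:
    """Run argv in a container built from bundle/rootfs and return its exit code."""
    if shutil.which("runc") is None:
        raise FileNotFoundError("runc not found on PATH")
    if not os.path.isdir(os.path.join(bundle, "rootfs")):
        raise FileNotFoundError(f"{bundle}/rootfs is missing")
    config = os.path.join(bundle, "config.json")
    if not os.path.exists(config):
        # Generates a default spec whose root.path is "rootfs".
        subprocess.run(["runc", "spec"], cwd=bundle, check=True)
    with open(config) as f:
        spec = with_command(json.load(f), argv)
    with open(config, "w") as f:
        json.dump(spec, f, indent=2)
    # runc uses the current directory as the bundle by default.
    return subprocess.run(["runc", "run", container_id], cwd=bundle).returncode
```

For instance, `run_bundle("mycontainer", "demo", ["ls", "/"])` would run `ls /` inside the container and return its exit status.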
Linux has powerful isolation techniques built in, and a container runtime combines all of these isolation features
NAMESPACES(7): http://man7.org/linux/man-pages/man7/namespaces.7.html
Future topics: runtimes, security, images and orchestration
https://github.com/saschagrunert/demystifying-containers