ENPM809V

Linux Kernel Internals - Part 1

Some Resources to Look At

Bootlin Elixir - Contains the source code
- Will be referenced in the slides
Userspace Documentation
- Some concepts are very similar (especially during synchronization)

What we will be learning

Linux Kernel Fundamentals
Linux Kernel Modules
System Calls in the Kernel
Interrupt Handling
Kernel Threads

Linux Kernel Fundamentals

What is the Kernel?

Code in the operating system that interfaces between hardware and higher-level applications.
The Linux kernel is the free, open-source operating system kernel at the core of Linux distributions
- Modular, monolithic, multitasking, Unix-Like

Application

System Call Interface/Interrupt Handling

Kernel Subsystem

Device Drivers

x86 Protection Rings

A protection mechanism in x86_64 CPUs to prevent unauthorized access to the kernel.
4 protection rings, 0 through 3 (but mostly only levels 0 and 3 are used)
- Level 0 = Kernel and Drivers
- Level 3 = Applications

x86 Protection Rings

At ring 3, the CPU can
- Use most x86 instructions
- Access unprivileged memory
At ring 0, the CPU can
- Do almost everything ring 3 can do
- Access Privileged memory
- Use Special instructions

Switching Protection Rings

Userspace programs can ask the kernel to execute something through a few vectors:
- System calls - invoked directly by userspace applications
- Interrupts - occur indirectly, e.g. through instructions that cause exceptional conditions

System Calls

A userspace program executes the syscall instruction
How does this happen?
1. The address of the instruction following the syscall is placed into RCX
2. RIP is set to the kernel's system call handler
  - Provided by the OS at boot time
  - Stored in the LSTAR model-specific register on x86_64 machines
3. Ring level (CPL) is set to 0
After the kernel finishes, RIP is set to whatever is in RCX and execution returns to ring 3
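
The userspace side of this transition can be sketched with the C library's generic syscall() wrapper, which on x86_64 Linux ends up executing the syscall instruction. This is a minimal sketch that assumes a Linux system; raw_getpid is a hypothetical helper name, not something from the slides.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>   /* SYS_getpid */
#include <unistd.h>        /* syscall(), getpid() */

/*
 * Ask the kernel for our PID through the raw system call interface.
 * raw_getpid is a hypothetical helper used only for illustration.
 * The ring 3 -> ring 0 -> ring 3 round trip happens inside syscall().
 */
static long raw_getpid(void)
{
    return syscall(SYS_getpid);
}
```

The value should match what the normal getpid() wrapper returns, since both end up in the same kernel handler.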

Privileged Instructions

Ring 0 Code has access to privileged instructions
- Control how the system handles interrupts/exceptions and descriptor tables
  - LIDT - Load Interrupt Descriptor Table Register
  - LLDT - Load Local Descriptor Table
  - LGDT - Load Global Descriptor Table Register
  - LTR - Load Task Register
- Reading/Writing model-specific registers (MSRs)
  - RDMSR, WRMSR
- Virtual machine opcodes
  - VMCALL, VMLAUNCH, VMRESUME, VMXON, VMXOFF
- Others too...

Kernel Data Structures

Many Many Structures

Structures hold the majority of kernel data
- Tasks
- Kthreads
- Audit
- Files

Many Many Structures

Tend to be generalized so that they can be applied anywhere without sacrificing performance
- Linked lists - /include/linux/list.h
- Queues - /include/linux/kfifo.h
- Hash maps - /include/linux/hashtable.h
- Radix trees - /include/linux/generic-radix-tree.h
- RB trees - /include/linux/rbtree.h

Slightly Different Than Traditional Datastructures

DATA

struct list_head
{
    struct list_head *next;
    struct list_head *prev;
};

struct some_other_struct
{
    char *data1;
    int data2;
    struct list_head list;  /* the list node lives inside the data, not the other way around */
};

https://www.oreilly.com/library/view/linux-device-drivers/0596000081/ch10s05.html

Embedding Structures

struct example_struct
{
    struct example_struct *prev;
    struct example_struct *next;
};

struct some_other_struct
{
    char *data1;
    int data2;
    struct example_struct *head;
};

Embedding structures is quite common in the Linux Kernel
- task -> file
- task -> audit
- task -> another task
Structures can also be randomized
- Security Feature __randomize_layout

Structure Randomization

struct example_struct
{
    struct example_struct *prev;
    struct example_struct *next;
};

struct some_other_struct
{
    char *data1;
    int data2;
    struct example_struct *head;
} __randomize_layout;

Many structures are randomized at compile time
- Makes attacks that rely on fixed member offsets difficult
- This is where macros such as offsetof and container_of come in

offsetof()

Finds the byte offset of a member within a structure type
offsetof is part of standard C (<stddef.h>); the kernel provides its own definition, conceptually:

#define offsetof(type, member) ((size_t)&(((type *)0)->member))

container_of

#define container_of(ptr, type, member) ({ \
    const typeof( ((type *)0)->member ) *__mptr = (ptr); \
    (type *)( (char *)__mptr - offsetof(type,member) ); })

Kernel macro that recovers the parent (containing) structure from a pointer to one of its members
Takes a pointer to the member (child) structure
- Subtracts the member's offset within the parent type from the member's address
- End result = address of the parent
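
To see container_of in action outside the kernel, here is a minimal userspace sketch. It assumes the list node is embedded in the payload structure. struct item, list_init, list_add_tail and sum_items are illustrative stand-ins that mimic kernel naming but are written from scratch, and this container_of is a simplified portable variant without the typeof type check.

```c
#include <stddef.h>  /* offsetof */

/* Userspace stand-in for the kernel's circular doubly linked list node. */
struct list_head {
    struct list_head *next, *prev;
};

/* Simplified container_of: member address minus member offset. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical payload type with the list node embedded in it. */
struct item {
    int value;
    struct list_head node;
};

/* An empty list is a head that points at itself. */
static void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Link new_node in just before head, i.e. at the tail of the list. */
static void list_add_tail(struct list_head *new_node, struct list_head *head)
{
    new_node->prev = head->prev;
    new_node->next = head;
    head->prev->next = new_node;
    head->prev = new_node;
}

/* Walk the bare nodes and recover each enclosing item via container_of. */
static int sum_items(struct list_head *head)
{
    int sum = 0;
    for (struct list_head *pos = head->next; pos != head; pos = pos->next)
        sum += container_of(pos, struct item, node)->value;
    return sum;
}
```

The list code never needs to know about struct item; only the caller that walks the list converts nodes back to their parents. That is why one list implementation can serve every structure in the kernel.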

task_struct

The task_struct is used to manage tasks
- A task is the kernel's way of managing processes/execution contexts
Contains MANY data fields, ranging from memory and CPU usage to other data structures (such as the audit_context)
Also contains credential information like UID and EUID
Access the task_struct of the currently running process by using the macro current
Located in /include/linux/sched.h

Scheduler Classes

What is it?

A way for the Linux kernel to manage task execution
Modular - allows different scheduling algorithms to coexist
Each scheduler class runs a different type of process/task
Base implementation: /kernel/sched/core.c
Tracked per task in the task_struct - sched_class and sched_entity (se) fields

Completely Fair Scheduler

Responsible for scheduling processes of normal priority
Provides processes with a proportion of CPU time
Aims to maximize overall CPU utilization
Implemented based on per-CPU run queues
- Nodes are ordered in a time-based manner
- Kept sorted by red-black trees

Red-Black Trees in Completely Fair Scheduler

Objective: Keep track of how long a process has been running (part of the completely-fair algorithm)
- Red-black trees are self-balancing binary search trees (approximately, not perfectly, balanced)
- Runtime is tracked in nanoseconds by the vruntime field
How it is performed
- Insert tasks into the tree based on vruntime
- Pick the one with the smallest vruntime
- During context switching, update the vruntime (increasing it by the time elapsed)
  - Put it back into the tree

https://www.geeksforgeeks.org/introduction-to-red-black-tree/

Red-Black Trees

How is vruntime calculated?

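New tasks - newvruntime = minimum_vruntime (the smallest vruntime currently in the run queue)
After execution
- newvruntime = oldvruntime + time_elapsed * weight factor
- The weight factor is derived from niceness, which reflects priority (higher-priority tasks accumulate vruntime more slowly)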
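
The bookkeeping can be modeled in a few lines of userspace C. This is a simplified sketch, not the kernel's code: it assumes vruntime grows by the elapsed time scaled by NICE_0_WEIGHT / weight, uses a linear scan where the kernel would take the leftmost red-black tree node, and all names (toy_task, pick_next, account) and weights are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define NICE_0_WEIGHT 1024u  /* assumed weight of a default-priority task */

/* Hypothetical, stripped-down stand-in for a schedulable entity. */
struct toy_task {
    const char *name;
    uint64_t vruntime;    /* virtual runtime in nanoseconds */
    unsigned int weight;  /* larger weight = higher priority */
};

/* Pick the task with the smallest vruntime (ties go to the first one). */
static struct toy_task *pick_next(struct toy_task *tasks, size_t n)
{
    struct toy_task *best = NULL;
    for (size_t i = 0; i < n; i++)
        if (!best || tasks[i].vruntime < best->vruntime)
            best = &tasks[i];
    return best;
}

/* Charge ran_ns of real CPU time, scaled down for heavier tasks. */
static void account(struct toy_task *t, uint64_t ran_ns)
{
    t->vruntime += ran_ns * NICE_0_WEIGHT / t->weight;
}
```

With two tasks of weight 1024 and 2048 that each run 1000 ns slices, the heavier task is charged only 500 ns of vruntime per slice, so it gets picked about twice as often.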

How is it invoked?

/kernel/sched/core.c
schedule - the main function
- Chooses what task to run and performs context switching
- Also updates vruntime
Can be invoked in a few ways
- update_process_times
- Kernel Threads/Drivers calling the schedule function
- Preemptively by the kernel
- Being called explicitly

Kernel Threads

What are they?

Kernel threads are tasks. As such they run in their own context
API can be found in /include/linux/kthread.h
- Has functions like kthread_create
Kernel threads can only be created by other kernel threads
We can track kernel threads through the task_struct
- Can you figure out how/why?

API Calls

kthread_create - creates a new kernel thread (it does not run until woken)
wake_up_process - start a kernel thread (or other task)
do_exit - terminate a kernel thread
kthread_stop - flag the kernel thread that it should stop
- It will wake up a sleeping kthread if necessary to set the flag
kthread_should_stop - check to see if the kernel thread should stop
allow_signal - indicates that the particular kthread can receive the indicated signal
set_current_state - sets the task state (e.g. TASK_INTERRUPTIBLE makes it interruptible)
schedule/ssleep - give up the CPU

Synchronization

What the Kernel Provides

Wait Queues - FIFO based on sleep
Completion Variables - Sleep until a certain condition is met
Spinlocks - Very similar to POSIX Spinlocks
- If you don't know what these are, see man pthread_spin_lock
Semaphores - Similar to POSIX Semaphores
- man sem_overview
Atomic Operations
Mutexes - Similar to POSIX Mutexes

What the Kernel Provides

Wait Queues - /include/linux/wait.h
Completion Variables - /include/linux/completion.h
Spinlocks - /include/linux/spinlock.h
Semaphores - /include/linux/semaphore.h
Atomic Operations
- /include/linux/types.h (for types)
- /include/asm-generic/atomic-instrumented.h (operations)
Mutexes - /include/linux/mutex.h
We are not going to go over these in depth; you need to do your homework on this.

Interrupts

x86 Interrupt Handling

Interrupt: A "signal" that stops the current process where it is and makes the CPU do something else.
- Identified by an interrupt vector number (0 through 255)
- Can be software or hardware based
Hardware interrupts are managed by the Advanced Programmable Interrupt Controller (APIC)
- A family of programmable interrupt controllers developed by Intel
- Receives a signal from a hardware device saying that something needs to be done
- Redirects it to the correct system interrupt (the programmable piece)

The Basics

Asynchronous/hardware interrupts
- CPU Timer Expires
- User presses key on keyboard
- Network Card Receives data
Synchronous/software interrupts
- Errors (Divide By Zero, etc).
- Page Faults
- Interrupt instruction (like int 3)
  - What is int 3?

Types of Synchronous Interrupts

What kind of interrupt is an int 3 instruction?
Trap - Reported right after the trapping instruction executes.
- Preserves program continuity: execution resumes at the next instruction (e.g. a breakpoint)
Fault - An error happens, but can possibly be corrected
- State is saved so that, once the interrupt handler fixes the problem, the processor can restore it and re-execute the faulting instruction
Abort - Unrecoverable error - the program exits after the interrupt handler runs.

x86 Interrupt Handling

Once the APIC receives the signal, it raises the interrupt line for a CPU
- This CPU must not be masking the interrupt
The CPU then stops what it is doing and handles the interrupt
- Checks the interrupt vector number
- Executes the interrupt handling code based on the interrupt descriptor table
- After execution is completed, it signals end-of-interrupt to the interrupt controller (an out instruction for the legacy PIC, a register write for the APIC)

x86 Interrupt Handling

Some things to note:

When an interrupt occurs, the CPU saves the state of the running program on the stack
It then sets RIP to the handler address found in the interrupt descriptor table (looked up by interrupt vector number)

What is an Interrupt Descriptor Table?

A table of gate descriptors that point to the code handling each interrupt
- Indexed by interrupt vector number
Set at kernel boot time via the lidt instruction.
- Takes one operand: a structure containing the size and starting address of the IDT
- Informs the CPU how big the IDT is and where it is located
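
The lidt operand can be pictured as a small packed structure. The sketch below assumes 64-bit mode, where the operand is a 2-byte limit followed by an 8-byte base, and a full table of 256 entries of 16 bytes each. struct idt_pointer and make_idt_pointer are illustrative names, not the kernel's, and nothing here actually executes lidt (that requires ring 0).

```c
#include <stdint.h>

#define IDT_ENTRIES 256  /* one entry per interrupt vector */
#define GATE_SIZE   16   /* bytes per gate descriptor in 64-bit mode */

/* Operand of lidt in 64-bit mode: 2-byte limit followed by 8-byte base. */
struct idt_pointer {
    uint16_t limit;  /* size of the table in bytes, minus one */
    uint64_t base;   /* linear address of the first entry */
} __attribute__((packed));

/* Build the operand describing a full 256-entry table at table_addr. */
static struct idt_pointer make_idt_pointer(uint64_t table_addr)
{
    struct idt_pointer p;
    p.limit = IDT_ENTRIES * GATE_SIZE - 1;
    p.base = table_addr;
    return p;
}
```

The packed attribute matters: without it the compiler would pad the structure and the CPU would read the base address from the wrong bytes.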

Linux Interrupt Handling

On bootup, the kernel initializes a global variable called idt_table with the proper gates
During cpu_init, the kernel calls load_current_idt, which calls load_idt, which in turn executes the lidt instruction
When the kernel's interrupt handlers are invoked they run in the ring level specified in the given interrupt entry in the IDT
After an interrupt handler runs, it terminates with an iret instruction, which restores the state of the code that was interrupted
Will continue a little more later...

Into the weeds of Interrupts

Interrupt Descriptor Table

CPU reads an interrupt descriptor table to determine how to handle interrupts
Reference: /arch/x86/kernel/idt.c
- Look at def_idts, apic_idts, idt_table
- Entries are of type idt_data - Not what the CPU Uses
Linux converts idt_data into correct format for the CPU
- idt_init_desc converts a single idt_data to a gate_desc
- gate_desc is the format x86 CPU Wants
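
To make the conversion concrete, here is a userspace sketch of packing a handler address into the 16-byte gate layout that x86-64 CPUs expect. The field layout follows the architecture's 64-bit interrupt gate format. struct toy_gate, pack_gate and gate_handler are hypothetical names and do not reproduce the kernel's gate_desc definition or idt_init_desc.

```c
#include <stdint.h>

#define GATE_TYPE_INTERRUPT 0xE  /* 64-bit interrupt gate type */

/* 16-byte interrupt gate as laid out by the x86-64 architecture. */
struct toy_gate {
    uint16_t offset_low;     /* handler address bits 0..15 */
    uint16_t segment;        /* code segment selector */
    uint8_t  ist;            /* bits 0..2: interrupt stack table index */
    uint8_t  type_attr;      /* bits 0..3 type, bits 5..6 DPL, bit 7 present */
    uint16_t offset_middle;  /* handler address bits 16..31 */
    uint32_t offset_high;    /* handler address bits 32..63 */
    uint32_t reserved;
} __attribute__((packed));

/* Split a 64-bit handler address across the three offset fields. */
static struct toy_gate pack_gate(uint64_t handler, uint16_t segment, uint8_t dpl)
{
    struct toy_gate g = {0};
    g.offset_low    = (uint16_t)(handler & 0xFFFF);
    g.segment       = segment;
    g.type_attr     = (uint8_t)(0x80 | ((dpl & 3) << 5) | GATE_TYPE_INTERRUPT);
    g.offset_middle = (uint16_t)((handler >> 16) & 0xFFFF);
    g.offset_high   = (uint32_t)(handler >> 32);
    return g;
}

/* Reassemble the handler address the way the CPU would read it. */
static uint64_t gate_handler(const struct toy_gate *g)
{
    return (uint64_t)g->offset_low |
           ((uint64_t)g->offset_middle << 16) |
           ((uint64_t)g->offset_high << 32);
}
```

The scattered offset fields are a leftover from the 16-bit and 32-bit descriptor formats, which is one reason the kernel keeps a friendlier internal representation and converts it only when filling the table.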

Interrupt Descriptor Table

First 32 entries are reserved for exceptions
The other interrupt vectors are usable by external IRQs
- Can be mapped to any interrupt vector greater than 31

Interrupt Handlers

Often known as Interrupt Service Routines (ISRs)
Functions invoked from receiving an interrupt
- Perform any computation or processing needed to handle the interrupt
- Can you think of any examples?
  - Handling a keystroke
- They shouldn't block or do a lot of processing

Interrupt Handlers

There is a common handler called common_interrupt
- Shared by all IRQ interrupts
common_interrupt calls do_IRQ, which finds the right interrupt handler for the vector and calls it
Some important notes:
- Interrupt vectors 0-31 share some macro code
  - All are distinct handlers
- Actual interrupt handlers referenced in the IDT are defined in /arch/x86/entry/entry_64.S

How do interrupts work?

The task switches context to the interrupt context
- This is where interrupts and their respective handlers can operate
- In the interrupt context, other interrupts are still enabled (a second interrupt can arrive while the first is being handled).
  - This can be disabled by the programmer
Interrupt handlers operate in their own context
- They have their own stack (very small - one page)

How do interrupts work?

For asynchronous interrupts: the device sends a signal to the interrupt controller
- This is an electrical signal on an interrupt line, not a POSIX signal from the Linux manual
- This is called an Interrupt Request (IRQ)
The interrupt controller (programmed by the kernel) monitors IRQ lines
The interrupt controller sends a signal to the processor

How do interrupts work?

Based on the IRQ, runs the kernel function defined in the IDT
The interrupt handler routine (function) runs
Handler exits, kernel resumes normal execution
- Also executes ret_from_intr

Programming the APIC

The APIC routes IRQs to vectors
- Helps to tell the CPU which vector to run
- The APIC must be told which CPU each interrupt request is routed to
APIC is programmed at boot time
- Done by reading/writing various memory-mapped registers
References:
- /arch/x86-/apic/io_apic.c
- /arch/x86/include/asm/io_apic.h (structure sent to APIC)
  - struct IO_APIC_route_entry

Programming the APIC

Some IRQ numbers are legacy or from standards
- IRQ 1 is the keyboard

Let's See This Visually

User Presses Key
Raise IRQ Line (Raise an interrupt)
Map IRQ to interrupt Vector
Send vector to local APIC
Save State, Switch stacks, put interrupt vector on stack
Call the interrupt handler
Kernel tells APIC that interrupt is handled

IO APIC

CPU/IDT

Registering and Handling an interrupt

Think of it in two phases - Top half and Bottom Half
Top half refers to the handler & APIC - and it cannot block.
- Must execute briefly so that it doesn't stall the CPU
Any more processing must be deferred to the bottom half
- These are scheduled

Registering and Handling an interrupt

What is the bottom half?
- It is where deferred work is handled
Three ways this is handled
- Softirq
- Tasklets
- Workqueues

Softirq

Determined statically at compile-time - kernel/softirq.c
An array that contains NR_SOFTIRQS (10) softirqs, each with a particular action
- Can also be observed via /proc/softirqs
- Softirqs need to be re-entrant

enum
{
        HI_SOFTIRQ=0,
        TIMER_SOFTIRQ,
        NET_TX_SOFTIRQ,
        NET_RX_SOFTIRQ,
        BLOCK_SOFTIRQ,
        BLOCK_IOPOLL_SOFTIRQ,
        TASKLET_SOFTIRQ,
        SCHED_SOFTIRQ,
        HRTIMER_SOFTIRQ,
        RCU_SOFTIRQ,
        NR_SOFTIRQS
};

Softirq

Can only be executed if raised - raise_softirq(TIMER_SOFTIRQ)
When is it executed?
- Returning from a hardware interrupt
- Explicitly called by some subsystem or kernel thread
Used for extremely time-sensitive processing

From: https://www.oreilly.com/library/view/understanding-the-linux/0596005652/ch04s07.html
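
The raise-then-run pattern can be modeled in userspace with a pending bitmask and a table of actions. This is a toy sketch under that assumption, not the kernel implementation: every toy_ name, the three-entry enum and the run_log array are invented for illustration.

```c
#include <stdint.h>

/* Toy softirq numbers; lower numbers are serviced first. */
enum { TOY_HI, TOY_TIMER, TOY_NET_TX, TOY_NR_SOFTIRQS };

typedef void (*softirq_action_fn)(void);

static softirq_action_fn softirq_vec[TOY_NR_SOFTIRQS]; /* one action per softirq */
static uint32_t pending;                                /* one bit per softirq */
static int run_log[16];                                 /* order the actions ran in */
static int run_count;

static void log_run(int nr)
{
    if (run_count < 16)
        run_log[run_count++] = nr;
}

static void hi_action(void)    { log_run(TOY_HI); }
static void timer_action(void) { log_run(TOY_TIMER); }

/* Register the action for a softirq number. */
static void toy_open_softirq(int nr, softirq_action_fn fn)
{
    softirq_vec[nr] = fn;
}

/* Mark a softirq as pending; nothing runs yet. */
static void toy_raise_softirq(int nr)
{
    pending |= 1u << nr;
}

/* Run every pending softirq once, lowest number first, then clear them. */
static void toy_do_softirq(void)
{
    uint32_t todo = pending;

    pending = 0;
    for (int nr = 0; nr < TOY_NR_SOFTIRQS; nr++)
        if ((todo & (1u << nr)) && softirq_vec[nr])
            softirq_vec[nr]();
}
```

Raising is cheap (one bit set), which is what lets a top half request follow-up work without doing it on the spot. The actions run later at a well-defined point, in index order, regardless of the order they were raised in.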

Tasklet

An implementation on top of softirq (particularly HI_SOFTIRQ and TASKLET_SOFTIRQ)
Operate on a list of tasklets that are initialized and allocated at runtime
A given tasklet runs on only one CPU at a time
Important functions:
- tasklet_schedule and tasklet_hi_schedule
- tasklet_init - initialize a tasklet
- tasklet_disable - disables a tasklet
- tasklet_enable - enables a tasklet
- tasklet_kill - deletes a tasklet from the queue
- Tasklet handler definition: void tasklet_handler(unsigned long data);

Work Queues

Defer interrupt work to a kernel thread by operating in process context
- Can handle synchronization better than softirq and tasklet (waiting on a semaphore, block I/O, etc.)
To handle work queues, you can create your own worker thread or use the generic worker threads the kernel has already created
- worker_thread function is used for the kernel worker thread
  - Puts a thread to sleep until it is woken up to perform work
  - Operate on a linked list of work_struct

Work Queues Functions

See /include/linux/workqueue.h
DECLARE_WORK or INIT_WORK - initialize a work item (work_struct)
schedule_work - queue the work item on the kernel's shared workqueue
flush_scheduled_work - wait for queued work to finish
Others....

The Interrupt Handler Interface

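request_irq/free_irq - register and unregister an interrupt handler
- Flags used to register a handler
  - IRQF_DISABLED - Disable all interrupts when this handler executes (legacy; removed from recent kernels)
  - IRQF_SAMPLE_RANDOM - Use this handler as an entropy source (legacy; removed from recent kernels)
  - IRQF_TIMER - Processes system timer interrupts
  - IRQF_SHARED - The IRQ line can be shared by multiple handlers
local_irq_disable/local_irq_enable - disable/enable interrupts on the current CPU
in_interrupt - check whether the code is running in interrupt context
in_irq - check whether the code is running in a hardware interrupt handler
local_irq_save/local_irq_restore - disable interrupts while saving the previous state, then restore it
See more details in /include/linux/interrupt.h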

Working with Linux Kernel

What do you need?

Linux Kernel Source
- Obtainable from apt
- Get it from kernel.org

Compiling the Kernel

Why would one want to compile the kernel themselves?
- Enabling debugging Features
- Add functionality
- Change Functionality
- Building for a new architecture
The virtual machine has a customized kernel
- We will compile it once, but not more than that, because it takes a long time

Creating our debugging Environment

We will be spending some time creating our debugging environment
- We will be using VMware Workstation/Fusion to do this
- We will create a virtual serial port to communicate over
- We will also use dmesg for print statements (quickest way to debug)
- You might need to continue this at home
Alternatively, you can use pwn.college in practice mode

Creating our debugging Environment

The quick and dirty way of doing it - just use dmesg and printk
The not-so-quick way - compiling a kernel and enabling kernel gdb

Kernel Modules

The primary way for extending kernel functionality
Allows for various different functionality within the Linux kernel
- Support a new filesystem
- Implement a device/driver
- Implement a new protocol
- New Scheduling algorithm

Kernel Modules

#include <linux/module.h>

static int __init start(void)
{
    printk(KERN_INFO "Hello World!\n");
    return 0; 
}

static void __exit mod_stop(void)
{
    printk(KERN_INFO "Goodbye World\n");
    return;
}

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Michael Wittner");
MODULE_DESCRIPTION("Simple Demo.");
module_init(start);
module_exit(mod_stop);

Defines which functions are called on load/removal of a kernel module

Macros for licensing and defining init and exit

Where can you find printk messages?

Kernel Modules

# Basic Makefile for Kernel Modules - Kernel module with one C file

# The object name should match your C file (example.c -> example.o)
obj-m := example.o

all:
	$(MAKE) -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules

clean:
	$(MAKE) -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean

# Inserting kernel modules

insmod example.ko optparam1="param" optparam2=2

#Removing modules

rmmod example

#If on pwn.college practice mode, do this instead
vm build /path/to/.c/file
vm start
vm connect
#Look at vm --help and vm <command> --help for more details

When a kernel module is inserted...

The system call sys_init_module is invoked
The code is copied into memory
The license is checked
The symbols used by the module are checked in the kernel symbol table (and resolved if found)
- The symbol must be exported; you can look it up in /proc/kallsyms
The module's init function is invoked

Note: to export symbols, use macro EXPORT_SYMBOL

Character Devices

A kernel I/O method that uses a stream of data
All operations (reading, writing, etc.) are performed on a per-byte/character basis.
Accessed through the Linux file system (e.g. /dev/ttyXX)
Acts like a file (the driver has to implement open, read, write for interaction)
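
From userspace, a character device is used with the ordinary file API. The sketch below assumes a Linux system where /dev/zero exists (a standard character device that returns zero bytes). is_char_device and read_zeros are hypothetical helper names.

```c
#include <fcntl.h>     /* open, O_RDONLY */
#include <sys/stat.h>  /* stat, S_ISCHR */
#include <unistd.h>    /* read, close */

/* Returns 1 if path is a character device, 0 if not, -1 on error. */
static int is_char_device(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return S_ISCHR(st.st_mode) ? 1 : 0;
}

/*
 * Read up to len bytes from /dev/zero into buf.
 * Returns the number of bytes read, or -1 on error.
 * Each call below (open, read, close) ends up in the matching
 * operation implemented by the device's driver.
 */
static long read_zeros(unsigned char *buf, size_t len)
{
    int fd = open("/dev/zero", O_RDONLY);
    ssize_t n;

    if (fd < 0)
        return -1;
    n = read(fd, buf, len);
    close(fd);
    return (long)n;
}
```

The program cannot tell, and does not need to know, that no real file backs the descriptor. That is the point of the "acts like a file" design.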

Block Device

Similar to a character device, but performs operations on chunks of data
- Typically powers of two (128, 256, etc.)
Linux allows block devices to be accessed as a stream of bytes by applications; thus, very similar to character devices
- The kernel-side interface still transfers whole blocks
Accessed through /dev (e.g. /dev/sda)
Examples: Disk drive

Network Devices

Not accessible through the file system - provides interfaces to various networks instead
Facilitates the transmission and reception of data packets
Implement a backend for kernel requests for sending and receiving data

Time to build your own Kernel Module!

Homework

Kernel Internals Homework 1

For this homework, you will be creating a kernel module that implements an interrupt handler.

You will need to create an interrupt handler for an IRQ number and share it with another handler. Every time it gets interrupted, a kernel thread should be created where it increases a counter by 5. After completing it, it should print out the value using a deferred work mechanism.

Things you need to keep in mind for this homework:

How many times it is counting (make sure to remember to add necessary protections)
Print out the current value of the counter after deferring work.
How do you see which interrupt handlers are already registered?

ENPM809V - Kernel Internals Part 1

By Ragnar Security
