CS110 Lecture 5: File Descriptors, System Calls and Multiprocessing

CS110: Principles of Computer Systems

Winter 2021-2022

Stanford University

Instructors: Nick Troccoli and Jerry Cain

The Stanford University logo

CS110 Topic 1: How can we design filesystems to store and manipulate files on disk, and how can we interact with the filesystem in our programs?

Learning About Filesystems

Unix v6 Filesystem design, part 1 (files)

Unix v6 Filesystem design, part 2 (large files + directories)

Interacting with the filesystem from our programs

Lecture 2

Lecture 3

This Lecture

assign2: implement portions of a filesystem!

Learning Goals

  • Understand the use and versatility of file descriptors
  • Learn how file descriptors are used by the operating system to manage open files
  • Learn how system calls are made while preserving privacy and security
  • Become familiar with how to write a program that spawns another program

Lecture Plan

  • Recap: File descriptors, open(), close(), read() and write()
  • Operating system data structures
  • How are system calls made?
  • Introduction to multiprocessing

Lecture Plan

  • Recap: File descriptors, open(), close(), read() and write()
  • Operating system data structures
  • How are system calls made?
  • Introduction to multiprocessing

System Calls

  • Functions to interact with the operating system are part of a group of functions called system calls.
  • A system call is a public function provided by the operating system.  They are tasks the operating system can do for us that we can't do ourselves.
  • open(), close(), read() and write() are 4 system calls we use to interact with files.

open()

A function that a program can call to open a file, and potentially create a file:

// if opening an existing file
int open(const char *pathname, int flags);

// if there's potential to create a new file
int open(const char *pathname, int flags, mode_t mode);
  • pathname: the path to the file you wish to open
  • flags: a bitwise OR of options specifying the behavior for opening the file
  • mode (if applicable): the permissions to attempt to set for a created file
  • the return value is a file descriptor representing the opened file, or -1 on error

 

Many possible flags (see man page).   You must include exactly one of O_RDONLY, O_WRONLY, O_RDWR.

O_TRUNC: if the file exists already, clear it ("truncate it").

O_CREAT: if the file doesn't exist, create it

O_EXCL: the file must be created from scratch, fail if already exists

File Descriptors

  • A file descriptor is like a "ticket number" representing your currently-open file.
  • It is a unique number assigned by the operating system to refer to that file
  • Each program has its own file descriptors
  • When you wish to refer to the file (e.g. read from it, write to it) you must provide the file descriptor.
  • [NEW] file descriptors are assigned in ascending order (next FD is lowest unused)

close()

A function that a program can call to close a file when done with it.

int close(int fd);

It's important to close files when you are done with them to preserve system resources.

  • fd: the file descriptor you'd like to close.

 

You can use valgrind to check if you forgot to close any files.

read() and  write()

// read bytes from an open file
ssize_t read(int fd, void *buf, size_t count);
  • fd: the file descriptor for the file you'd like to read from
  • buf: the memory location where the read-in bytes should be put
  • count: the number of bytes you wish to read
  • The function returns -1 on error, 0 if at end of file, or nonzero if bytes were read (may not read all bytes you ask it to!)
// write bytes to an open file
ssize_t write(int fd, const void *buf, size_t count);

Same as read(), except the function writes the count bytes in buf to the file, and returns the number of bytes written.

Example: Copy

The copy program emulates cp​; it copies the contents of a source file to a specified destination.

copy-soln.c and copy-soln-full.c (with error checking)
void copyContents(int sourceFD, int destinationFD) {
  char buffer[kCopyIncrement];
  while (true) {
    ssize_t bytesRead = read(sourceFD, buffer, sizeof(buffer));
    if (bytesRead == 0) break;

    size_t bytesWritten = 0;
    while (bytesWritten < bytesRead) {
      ssize_t count = write(destinationFD, buffer + bytesWritten, bytesRead - bytesWritten);
      bytesWritten += count;
    }
  }
}

int main(int argc, char *argv[]) {
  int sourceFD = open(argv[1], O_RDONLY);
  int destinationFD = open(argv[2], O_WRONLY | O_CREAT | O_EXCL, kDefaultPermissions);

  copyContents(sourceFD, destinationFD);

  close(sourceFD);
  close(destinationFD);
  return 0;
}

File Descriptors

File descriptors are just integers - for that reason, we can store and access them just like integers.

  • If you're interacting with many files, it may be helpful to have an array of file descriptors

 

There are 3 special file descriptors provided by default to each program:

  • 0: standard input (user input from the terminal) - STDIN_FILENO
  • 1: standard output (output to the terminal) - STDOUT_FILENO
  • 2: standard error (error output to the terminal) - STDERR_FILENO

[NEW] Programs always assume that 0,1,2 represent STDIN/STDOUT/STDERR.  Even if we change them!  (eg. we close FD 1, then open a new file).  (this is how 

cat in.txt > out.txt works)

Example: Copy Extended

The copy-extended program emulates tee​; it copies the contents of a source file to specified destination(s), and also outputs it to the terminal.

// difference #1: an array of destination file descriptors
int destinationFDs[argc - 1];

// Include the terminal (STDOUT) as the first "file" so it's also printed
destinationFDs[0] = STDOUT_FILENO;

for (size_t i = 2; i < argc; i++) {
  destinationFDs[i - 1] = open(argv[i], O_WRONLY | O_CREAT | O_EXCL, kDefaultPermissions);
}
...
// difference #2: we write each chunk to every destination
for (size_t i = 0; i < numDestinationFDs; i++) {
  size_t bytesWritten = 0;
  while (bytesWritten < bytesRead) {
    ssize_t count = write(destinationFDs[i], buffer + bytesWritten, bytesRead - bytesWritten);
    bytesWritten += count;
  }
}
...

File descriptors are a powerful abstraction for working with files and other resources.  They are used for files, networking and user input/output!

Lecture Plan

  • Recap: File descriptors, open(), close(), read() and write()
  • Operating system data structures
  • How are system calls made?
  • Introduction to multiprocessing

What is a file descriptor really mapping to behind the scenes?  How does the operating system manage open files?

Our programs (e.g. FDs)

Filesystem data (e.g. inodes)

???

Filesystem Interaction: Layers

Operating System Data Structures

  • A process is a single instance of a program running
  • For each process, Linux maintains a process control block - a set of relevant information about its execution (user who launched it, CPU state, etc.).  These blocks live in the process table.
  • A process control block also stores a file descriptor table.  This is a list of info about open files/resources for this process.
  • Key idea: a file descriptor is just an index into a file descriptor table!
This diagram shows the "Process Control Blocks", handled by the OS. Specifically, it shows the "descriptor table for process ID 1000", "descriptor table for process ID 1001", and "descriptor table for process ID 1002". Each sub-diagram has an array from 0 to 9.

Operating System Data Structures

  • An entry in the file descriptor table is really a pointer to an entry in another table, the open file table.
  • The open file table is one array of information about open files across all processes.
  • Multiple file descriptor entries (even across processes!) can point to the same open file table entry.
  • An open file table entry stores changing info like "cursor" (how far into file are we?)
Almost the same as the previous diagram, except that there are many descriptor table entries. The idea is that file descriptior 0 from each descriptor table points to the same open file table entry, corresponding to STDIN. Likewise for descriptor 1 (all point to the same open file table entry, for STDOUT, which has a reference count of 3, because there are three files that have STDOUT open), and for descriptor 2 (STDERR). Each descriptor table also has some entries (above 2) that point to open files that are not shared between other processes.

Operating System Data Structures

  • This structure allows the OS to share resources across processes.
  • Also explains common behavior: e.g. multiple programs interleaving terminal output.  This is because all FDs 0,1,2 usually point to the same open file table entries!
  • Problem: we could have multiple open file table entries referring to the same file.  It seems wasteful to store that file's static info many times....
Almost the same as the previous diagram, except that there are many descriptor table entries. The idea is that file descriptior 0 from each descriptor table points to the same open file table entry, corresponding to STDIN. Likewise for descriptor 1 (all point to the same open file table entry, for STDOUT, which has a reference count of 3, because there are three files that have STDOUT open), and for descriptor 2 (STDERR). Each descriptor table also has some entries (above 2) that point to open files that are not shared between other processes.

Operating System Data Structures

  • Each open file table entry has a vnode field; a pointer to the file's static information.
  • vnodes live in the vnode table; a single table referenced by all open file table entries.  A vnode is an abstraction of a file; it includes information on what kind of file it is, how many file table entries reference it, and function pointers for performing operations.  Also cache of inode (if exists).
  • These resources are all freed over time:
    • Free a file table entry when the last file descriptor closes it
    • Free a vnode when the last file table entry is freed
    • Free a file when its reference count is 0 and there is no vnode

Operating System Data Structures

Almost the same as the last diagram, except now many of the open file table vnode entries are expanded for the vnode table.

FD table(s)

Open file table

Vnode table

Operating System Data Structures

All of these data structures are private to the operating system.  They are layered on top of the filesystem data itself.

Lecture Plan

  • Recap: File descriptors, open(), close(), read() and write()
  • Operating system data structures
  • How are system calls made?
  • Introduction to multiprocessing

System Call Execution

  • Key idea: the OS performs private, privileged tasks that regular user programs cannot do, with data that user programs cannot access.
  • Problem: because of this, we can't have system calls behave like regular function calls - there are security risks to having OS data in user-accessible memory!

E.g. loadFiles can poke around in main's stack frame, or main can poke around in the values left behind by loadFiles after it finishes.  

Functions are supposed to be modular, but the function call and return protocol's support for modularity and privacy is pretty soft.

Refresher: Function Call Semantics

  • Refresher: for a normal function call, the stack grows downwards to add a new stack frame.  Parameters are passed in registers like %rdi and %rsi, and the return value (if any) is put in %rax.
  • This means stack frames are adjacent, and can in theory be manipulated via pointer arithmetic when they're not supposed to
  • Solution: a range of addresses will be reserved as "kernel space"; user programs cannot access this memory.  Instead of using the user stack and memory space, system calls will use kernel space and execute in a "privileged mode".  But this means function calls must work differently!

System Call Semantics

New approach for calling functions if they are system calls:

  • put the system call "opcode" in %rax (e.g. 0 for read, 1 for write, 2 for open, 3 for close, and so forth).  Each has its own unique opcode.
  • put up to 6 parameters in normal registers except for %rcx (use %r10 instead)
  • store the address of the next user program instruction in %rcx instead of %rip
  • The syscall assembly instruction triggers a software interrupt that switches execution over to "superuser" mode.
  • The system call executes in privileged mode and puts its return value in %rax, and returns (using iretq, "interrupt" version of retq)
  • If %rax is negative, the global errno is set to abs(%rax), and %rax is changed to -1.
  • The system transfers control back to the user program.

CS110 Topic 1: How can we design filesystems to store and manipulate files on disk, and how can we interact with the filesystem in our programs?

CS110 Filesystems Recap

  • User programs interact with the filesystem via file descriptors and the system calls open, close, read and write
  • The operating system stores a per-process file descriptor table with pointers to open file table entries containing info about the open files
  • The open file table entries point to vnodes, which cache inodes
  • inodes are file system representations of files/directories. We can look at an inode to find the blocks containing the file/directory data.
  • Inodes can use indirect addressing to support large files/directories
  • Key principles: abstraction, layers, naming
Same diagram as earlier: This diagram shows a vertical list of connections from userspace processes down to harder. Specifically: userspace processes<--system calls-->operating system<--vnode interface-->file system (ext4fs, s6fs, btfs)<--block interface-->device driver (SATA, iSCSI)<--hardare interface-->hardware (PCI-E, DMA, RDMA, etc.)

Lecture Plan

  • Recap: File descriptors, open(), close(), read() and write()
  • Operating system data structures
  • How are system calls made?
  • Introduction to multiprocessing

CS110 Topic 2: How can our program create and interact with other programs?

Program: code you write to execute tasks

Process: an instance of your program running; consists of program and execution state.

 

Key idea: multiple processes can run the same program

Multiprocessing Terminology

int main(int argc, char *argv[]) {
    printf("Hello, world!\n");
    printf("Goodbye!\n");
    return 0;
}

Process 5621

Your computer runs many processes simultaneously - even with just 1 processor core (how?)

  • "simultaneously" = switch between them so fast humans don't notice
  • Your program thinks it's the only thing running
  • OS schedules processes - who gets to run when
  • Each process gets a little time, then has to wait
  • Many times, waiting is good!  E.g. waiting for key press, waiting for disk
  • Caveat: multicore computers can truly multitask

Multiprocessing

Playing With Processes

When you run a program from the terminal, it runs in a new process.

  • The OS gives each process a unique "process ID" number (PID)
  • PIDs are useful once we start managing multiple processes
  • getpid() returns the PID of the current process
// getpid.c
#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
    pid_t myPid = getpid();
    printf("My process ID is %d\n", myPid);
    return 0;
}
$ ./getpid 
My process ID is 18814

$ ./getpid 
My process ID is 18831
$ ./myprogram 

fork()

fork() creates a second process that is a clone of the first:

pid_t fork();
int main(int argc, char *argv[]) {
    printf("Hello, world!\n");
    fork();
    printf("Goodbye!\n");
    return 0;
}

Process A

$ ./myprogram 
Hello, world!

fork()

fork() creates a second process that is a clone of the first:

pid_t fork();
int main(int argc, char *argv[]) {
    printf("Hello, world!\n");
    fork();
    printf("Goodbye!\n");
    return 0;
}

Process A

int main(int argc, char *argv[]) {
    printf("Hello, world!\n");
    fork();
    printf("Goodbye!\n");
    return 0;
}

Process A

$ ./myprogram 
Hello, world!

fork()

fork() creates a second process that is a clone of the first:

pid_t fork();
int main(int argc, char *argv[]) {
    printf("Hello, world!\n");
    fork();
    printf("Goodbye!\n");
    return 0;
}

Process A

Process A

int main(int argc, char *argv[]) {
    printf("Hello, world!\n");
    fork();
    printf("Goodbye!\n");
    return 0;
}

Process B

$ ./myprogram 
Hello, world!
Goodbye!
Goodbye!

fork()

fork() creates a second process that is a clone of the first:

pid_t fork();
int main(int argc, char *argv[]) {
    printf("Hello, world!\n");
    fork();
    printf("Goodbye!\n");
    return 0;
}

Process A

int main(int argc, char *argv[]) {
    printf("Hello, world!\n");
    fork();
    printf("Goodbye!\n");
    return 0;
}

Process A

int main(int argc, char *argv[]) {
    printf("Hello, world!\n");
    fork();
    printf("Goodbye!\n");
    return 0;
}

Process B

$ ./myprogram2

fork()

fork() creates a second process that is a clone of the first:

Process A

pid_t fork();
int main(int argc, char *argv[]) {
    int x = 2;
    printf("Hello, world!\n");
    fork();
    printf("Goodbye, %d!\n", x);
    return 0;
}
$ ./myprogram2 
Hello, world!

fork()

fork() creates a second process that is a clone of the first:

Process A

pid_t fork();
int main(int argc, char *argv[]) {
    int x = 2;
    printf("Hello, world!\n");
    fork();
    printf("Goodbye, %d!\n", x);
    return 0;
}
$ ./myprogram2
Hello, world!

fork()

fork() creates a second process that is a clone of the first:

pid_t fork();

Process B

int main(int argc, char *argv[]) {
    int x = 2;
    printf("Hello, world!\n");
    fork();
    printf("Goodbye, %d!\n", x);
    return 0;
}

Process A

int main(int argc, char *argv[]) {
    int x = 2;
    printf("Hello, world!\n");
    fork();
    printf("Goodbye, %d!\n", x);
    return 0;
}
$ ./myprogram2
Hello, world!
Goodbye, 2!
Goodbye, 2!

fork()

fork() creates a second process that is a clone of the first:

pid_t fork();

Process B

int main(int argc, char *argv[]) {
    int x = 2;
    printf("Hello, world!\n");
    fork();
    printf("Goodbye, %d!\n", x);
    return 0;
}

Process A

int main(int argc, char *argv[]) {
    int x = 2;
    printf("Hello, world!\n");
    fork();
    printf("Goodbye, %d!\n", x);
    return 0;
}

fork()

fork() creates a second process that is a clone of the first:

  • parent (original) process forks off a child (new) process
  • The child starts execution on the next program instruction.  The parent continues execution with the next program instruction.  The order from now on is up to the OS!
  • fork() is called once, but returns twice (why?)
  • Everything is duplicated in the child process
    • File descriptor table (increasing reference counts on open file table entries)
    • Mapped memory regions (the address space)
    • Regions like stack, heap, etc. are copied
pid_t fork();

Illustration courtesy of Roz Cyrus.

Process Clones

The parent process’ file descriptor table is cloned on fork and the reference counts within the relevant open file table entries are incremented.  This explains how the child can still output to the same terminal!

Illustration courtesy of Roz Cyrus.

fork()

Process B

int main(int argc, char *argv[]) {
    int x = 2;
    printf("Hello, world!\n");
    fork();
    printf("Goodbye, %d!\n", x);
    return 0;
}

Process A

int main(int argc, char *argv[]) {
    int x = 2;
    printf("Hello, world!\n");
    fork();
    printf("Goodbye, %d!\n", x);
    return 0;
}

(Am I the parent or the child?)

Is there a way for the processes to tell which is the parent and which is the child?

Key Idea: the return value of fork() is different in the parent and the child.

fork()

fork() creates a second process that is a clone of the first:

pid_t fork();
  • parent (original) process forks off a child (new) process
  • In the parent, fork() will return the PID of the child (only way for parent to get child's PID)
  • In the child, fork() will return 0 (this is not the child's PID, it's just 0)
$ ./myprogram

fork()

int main(int argc, char *argv[]) {
    printf("Hello, world!\n");
    pid_t pidOrZero = fork();
    printf("fork returned %d\n", pidOrZero);
    return 0;
}

Process 110

  • In the parent, fork() will return the PID of the child (only way for parent to get child's PID)
  • In the child, fork() will return 0 (this is not the child's PID, it's just 0)
$ ./myprogram2
Hello, world!

fork()

int main(int argc, char *argv[]) {
    printf("Hello, world!\n");
    pid_t pidOrZero = fork();
    printf("fork returned %d\n", pidOrZero);
    return 0;
}

Process 110

  • In the parent, fork() will return the PID of the child (only way for parent to get child's PID)
  • In the child, fork() will return 0 (this is not the child's PID, it's just 0)
$ ./myprogram2
Hello, world!

fork()

int main(int argc, char *argv[]) {
    printf("Hello, world!\n");
    pid_t pidOrZero = fork(); // 111
    printf("fork returned %d\n", pidOrZero);
    return 0;
}

Process 110

  • In the parent, fork() will return the PID of the child (only way for parent to get child's PID)
  • In the child, fork() will return 0 (this is not the child's PID, it's just 0)
int main(int argc, char *argv[]) {
    printf("Hello, world!\n");
    pid_t pidOrZero = fork(); // 0
    printf("fork returned %d\n", pidOrZero);
    return 0;
}

Process 111

$ ./myprogram
Hello, world!
fork returned 111
fork returned 0

fork()

int main(int argc, char *argv[]) {
    printf("Hello, world!\n");
    pid_t pidOrZero = fork(); // 111
    printf("fork returned %d\n", pidOrZero);
    return 0;
}

Process 110

  • In the parent, fork() will return the PID of the child (only way for parent to get child's PID)
  • In the child, fork() will return 0 (this is not the child's PID, it's just 0)
int main(int argc, char *argv[]) {
    printf("Hello, world!\n");
    pid_t pidOrZero = fork(); // 0
    printf("fork returned %d\n", pidOrZero);
    return 0;
}

Process 111

$ ./myprogram
Hello, world!
fork returned 111
fork returned 0

fork()

int main(int argc, char *argv[]) {
    printf("Hello, world!\n");
    pid_t pidOrZero = fork(); // 111
    printf("fork returned %d\n", pidOrZero);
    return 0;
}

Process 110

  • In the parent, fork() will return the PID of the child (only way for parent to get child's PID)
  • In the child, fork() will return 0 (this is not the child's PID, it's just 0)
int main(int argc, char *argv[]) {
    printf("Hello, world!\n");
    pid_t pidOrZero = fork(); // 0
    printf("fork returned %d\n", pidOrZero);
    return 0;
}

Process 111

$ ./myprogram
Hello, world!
fork returned 0
fork returned 111

OR

$ ./myprogram
Hello, world!
fork returned 111
fork returned 0

fork()

int main(int argc, char *argv[]) {
    printf("Hello, world!\n");
    pid_t pidOrZero = fork(); // 111
    printf("fork returned %d\n", pidOrZero);
    return 0;
}

Process 110

  • In the parent, fork() will return the PID of the child (only way for parent to get child's PID)
  • In the child, fork() will return 0 (this is not the child's PID, it's just 0)
int main(int argc, char *argv[]) {
    printf("Hello, world!\n");
    pid_t pidOrZero = fork(); // 0
    printf("fork returned %d\n", pidOrZero);
    return 0;
}

Process 111

$ ./myprogram
Hello, world!
fork returned 0
fork returned 111

OR

We can no longer assume the order in which our program will execute!  The OS decides the order.

fork()

  • In the parent, fork() will return the PID of the child (only way for parent to get child's PID)
  • In the child, fork() will return 0 (this is not the child's PID, it's just 0)
  • A process can use getppid() to get the PID of its parent
  • if fork() returns < 0, that means an error occurred
// basic-fork.c
int main(int argc, char *argv[]) {
    printf("Greetings from process %d! (parent %d)\n", getpid(), getppid());
    pid_t pidOrZero = fork();
    assert(pidOrZero >= 0);
    printf("Bye-bye from process %d! (parent %d)\n", getpid(), getppid());
    return 0;
}
$ ./basic-fork 
Greetings from process 29686! (parent 29351)
Bye-bye from process 29686! (parent 29351)
Bye-bye from process 29687! (parent 29686)

$ ./basic-fork 
Greetings from process 29688! (parent 29351)
Bye-bye from process 29689! (parent 29688
Bye-bye from process 29688! (parent 29351)
  • The parent of the original process is the shell - the program that you run in the terminal.
  • The ordering of the parent and child output is nondeterministic. Sometimes the parent prints first, and sometimes the child prints first!

Recap

  • Recap: File descriptors, open(), close(), read() and write()
  • Operating system data structures
  • How are system calls made?
  • Introduction to multiprocessing

 

 

Next time: more multiprocessing