CS110: Principles of Computer Systems
Winter 2021-2022
Stanford University
Instructors: Nick Troccoli and Jerry Cain
Unix v6 Filesystem design, part 1 (files)
Unix v6 Filesystem design, part 2 (large files + directories)
Interacting with the filesystem from our programs
assign2: implement portions of a filesystem!
Today's Ed Thread: https://edstem.org/us/courses/16701/discussion/996212
open()
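A function that a program can call to open a file, and potentially create it. It returns a file descriptor, a small integer that identifies the open file in later calls, or -1 on error: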
// if opening an existing file
int open(const char *pathname, int flags);
// if there's potential to create a new file
int open(const char *pathname, int flags, mode_t mode);
Many possible flags (see man page). You must include exactly one of O_RDONLY, O_WRONLY, O_RDWR.
O_TRUNC: if the file exists already, clear it ("truncate it").
O_CREAT: if the file doesn't exist, create it
O_EXCL: the file must be created from scratch, fail if already exists
close()
A function that a program can call to close a file when done with it.
int close(int fd);
It's important to close files when you are done with them to preserve system resources.
You can use valgrind to check if you forgot to close any files.
read() and write()
// read bytes from an open file
ssize_t read(int fd, void *buf, size_t count);
// write bytes to an open file
ssize_t write(int fd, const void *buf, size_t count);
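read() reads up to count bytes from the file into buf and returns the number of bytes actually read: 0 at end of file, -1 on error. write() is the mirror image: it writes up to count bytes from buf to the file and returns the number of bytes actually written, which may be fewer than count.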
The copy program emulates cp; it copies the contents of a source file to a specified destination.
copy-soln.c and copy-soln-full.c (with error checking)
void copyContents(int sourceFD, int destinationFD) {
  char buffer[kCopyIncrement];
  while (true) {
    // read() returns 0 once we reach the end of the source file
    ssize_t bytesRead = read(sourceFD, buffer, sizeof(buffer));
    if (bytesRead == 0) break;
    // write() may write fewer bytes than requested, so loop until the
    // whole chunk has been written
    size_t bytesWritten = 0;
    while (bytesWritten < bytesRead) {
      ssize_t count = write(destinationFD, buffer + bytesWritten, bytesRead - bytesWritten);
      bytesWritten += count;
    }
  }
}

int main(int argc, char *argv[]) {
  int sourceFD = open(argv[1], O_RDONLY);
  // O_EXCL: refuse to overwrite a destination file that already exists
  int destinationFD = open(argv[2], O_WRONLY | O_CREAT | O_EXCL, kDefaultPermissions);
  copyContents(sourceFD, destinationFD);
  close(sourceFD);
  close(destinationFD);
  return 0;
}
File descriptors are just integers - for that reason, we can store and access them just like integers.
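There are 3 special file descriptors provided by default to each program:
0: standard input (STDIN_FILENO)
1: standard output (STDOUT_FILENO)
2: standard error (STDERR_FILENO)
Programs always assume that 0, 1 and 2 represent STDIN, STDOUT and STDERR, even if we change what they refer to (e.g. we close FD 1, then open a new file, which takes over FD 1). This is how cat in.txt > out.txt works: cat still writes to FD 1, unaware that FD 1 now refers to out.txt.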
copy-extended-soln.c and copy-extended-soln-full.c (with error checking)
The copy-extended program emulates tee; it copies the contents of a source file to specified destination(s), and also outputs it to the terminal.
// difference #1: an array of destination file descriptors
int destinationFDs[argc - 1];
// Include the terminal (STDOUT) as the first "file" so it's also printed
destinationFDs[0] = STDOUT_FILENO;
for (size_t i = 2; i < argc; i++) {
  destinationFDs[i - 1] = open(argv[i], O_WRONLY | O_CREAT | O_EXCL, kDefaultPermissions);
}
...
// difference #2: we write each chunk to every destination
for (size_t i = 0; i < numDestinationFDs; i++) {
  size_t bytesWritten = 0;
  while (bytesWritten < bytesRead) {
    ssize_t count = write(destinationFDs[i], buffer + bytesWritten, bytesRead - bytesWritten);
    bytesWritten += count;
  }
}
...
[Diagram: the layers that sit between our programs (e.g. FDs) and the filesystem data (e.g. inodes): FD table(s), open file table, vnode table.]
All of these data structures are private to the operating system. They are layered on top of the filesystem data itself.
E.g. loadFiles can poke around in main's stack frame, or main can poke around in the values left behind by loadFiles after it finishes.
Functions are supposed to be modular, but the function call and return protocol's support for modularity and privacy is pretty soft.
New approach for calling functions if they are system calls: each system call has its own unique opcode (on x86-64 Linux, 0 for read, 1 for write, 2 for open, 3 for close, and so forth), and the caller identifies the system call it wants by that opcode.
Program: code you write to execute tasks
Process: an instance of your program running; consists of program and execution state.
Key idea: multiple processes can run the same program
int main(int argc, char *argv[]) {
  printf("Hello, world!\n");
  printf("Goodbye!\n");
  return 0;
}
Process 5621
Your computer runs many processes simultaneously - even with just 1 processor core (how?)
When you run a program from the terminal, it runs in a new process.
// getpid.c
#include <stdio.h>
#include <unistd.h>
int main(int argc, char *argv[]) {
  pid_t myPid = getpid();
  printf("My process ID is %d\n", myPid);
  return 0;
}
$ ./getpid
My process ID is 18814
$ ./getpid
My process ID is 18831
fork()
fork() creates a second process that is a clone of the first:
pid_t fork();
int main(int argc, char *argv[]) {
  printf("Hello, world!\n");
  fork();
  printf("Goodbye!\n");
  return 0;
}
Process A
$ ./myprogram
Hello, world!
Goodbye!
Goodbye!
After the fork() call, Process A (the original) and Process B (the clone) each continue executing the same code from the line after fork(), so each prints "Goodbye!" once.
The clone also gets its own copy of the first process's variables. In myprogram2, x is set before the fork:
int main(int argc, char *argv[]) {
  int x = 2;
  printf("Hello, world!\n");
  fork();
  printf("Goodbye, %d!\n", x);
  return 0;
}
$ ./myprogram2
Hello, world!
Goodbye, 2!
Goodbye, 2!
Illustration courtesy of Roz Cyrus.
The parent process’ file descriptor table is cloned on fork and the reference counts within the relevant open file table entries are incremented. This explains how the child can still output to the same terminal!
(Am I the parent or the child?)
Is there a way for the processes to tell which is the parent and which is the child?
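Key Idea: the return value of fork() is different in the parent and the child. In the parent, fork() returns the process ID of the new child; in the child, it returns 0. (If no child could be created, fork() returns -1 in the parent.)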
fork()
int main(int argc, char *argv[]) {
  printf("Hello, world!\n");
  pid_t pidOrZero = fork(); // 111
  printf("fork returned %d\n", pidOrZero);
  return 0;
}
Process 110 (parent)
int main(int argc, char *argv[]) {
  printf("Hello, world!\n");
  pid_t pidOrZero = fork(); // 0
  printf("fork returned %d\n", pidOrZero);
  return 0;
}
Process 111 (child)
$ ./myprogram
Hello, world!
fork returned 111
fork returned 0
$ ./myprogram
Hello, world!
fork returned 0
fork returned 111
OR
$ ./myprogram
Hello, world!
fork returned 111
fork returned 0
We can no longer assume the order in which our program will execute! The OS decides the order.
fork()
// basic-fork.c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>
int main(int argc, char *argv[]) {
  printf("Greetings from process %d! (parent %d)\n", getpid(), getppid());
  pid_t pidOrZero = fork();
  assert(pidOrZero >= 0);  // fork() returns -1 on failure
  printf("Bye-bye from process %d! (parent %d)\n", getpid(), getppid());
  return 0;
}
$ ./basic-fork
Greetings from process 29686! (parent 29351)
Bye-bye from process 29686! (parent 29351)
Bye-bye from process 29687! (parent 29686)
$ ./basic-fork
Greetings from process 29688! (parent 29351)
Bye-bye from process 29689! (parent 29688)
Bye-bye from process 29688! (parent 29351)
Next time: more multiprocessing