Principles of Computer Systems
Autumn 2019
Stanford University
Computer Science Department
Lecturer: Philip Levis
process control blocks
, and they are stored in the process table
file descriptor table
read, write,
and close
)$ cat in.txt > out.txt
works)bash
shell calls make
, which itself calls g++
, each of them inserts text into the same terminal window.open
, read, write, close, stat, and lstat. We'll see many others in the coming weeks.%rip
register, and that address is typically drawn from the range of addresses managed by the code segment.malloc
, realloc
, free
, and their C++ equivalents. It's initially very small, but grows as needed for processes requiring a good amount of dynamically allocated memory.rodata
segment also stores global variables, but only those which are immutable—i.e. constants. As the runtime isn't supposed to change anything read-only, the segment housing constants can be protected so any attempts to modify it are blocked by the OS.libc
and libstdc++
with code for routines like C's printf
, C's malloc
, or C++'s getline
. Shared libraries get their own segment so all processes can trampoline through some glue code—that is, the minimum amount of code necessary—to jump into the one copy of the library code that exists for all processes.%rsp
to track the address boundary between the in-use portion of the user stack and the portion that's on deck to be used should the currently executing function invoke a subroutine.callq
and retq
instructions for user function call and return.%rdi
, %rsi
, %rdx
, %rcx
, %r8
, and %r9
. The stack frame is used as general storage for partial results that need to be stored somewhere other than a register (e.g. a seventh incoming parameter)loadFiles
as per the diagram below. Because loadFiles
's stack frame is directly below that of its caller, it can use pointer arithmetic to advance beyond its frame and examine—or even update—the stack frame above it.loadFiles
returns, main
could use pointer arithmetic to descend into the ghost of loadFiles
's stack frame and access data loadFiles
never intended to expose.open
and stat
need access to OS implementation detail that should not be exposed or otherwise accessible to the user program.kernel space
, and none of it is visible to traditional user code.callq
is used for user function call, but callq
would dereference a function pointer we're not permitted to dereference, since it resides in kernel space.callq
.%rax
. Each system call has its own opcode (e.g. 0 for read
, 1 for write
, 2 for open
, 3 for close
, 4 for stat
, and so forth).%rdi
, %rsi
, %rdx
, %r10
, %r8
, and %r9
. Note the fourth parameter is %r10
, not %rcx
.syscall
, which prompts an interrupt handler to execute in superuser mode.%rax
, and then executes iretq
to return from the interrupt handler, revert from superuser mode, and execute the instruction following the syscall
.%rax
is negative, errno
is set to abs(%rax
) and %rax
is updated to contain a -1
. If %rax
is nonnegative, it's left as is. The value in %rax
is then extracted by the caller as any return value would be.
Until now, we have been studying how programs interact with hardware, and now we are going to start investigating how programs interact with the operating system.
In the CS curriculum so far, your programs have operated in a single process, meaning, basically, that one program was running your code, line-for-line. The operating system made it look like your program was the only thing running, and that was that.
Now, we are going to move into the realm of multiprocessing, where you control more than one process at a time with your programs. You will tell the OS, “do these things concurrently”, and it will.
// file: getpidEx.c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h> // getpid

int main(int argc, char **argv)
{
  pid_t pid = getpid();
  printf("My process id: %d\n", pid);
  return 0;
}
cgregg@myth57$ ./getpidEx
My process id: 7526
fork
fork
system call.fork()
does exactly this:
fork
call returns a pid_t
(an integer) to both processes. Neither is the actual pid
of the process that receives it:
getpid
itself to retrieve it.fork
fork
is twofold:
fork
, and it is useful for a process to know whether it is the parent or the child.getppid
)fork
, getpid
, and getppid
. The full program can be viewed right here.
int main(int argc, char *argv[]) {
  printf("Greetings from process %d! (parent %d)\n", getpid(), getppid());
  pid_t pid = fork();
  assert(pid >= 0);
  printf("Bye-bye from process %d! (parent %d)\n", getpid(), getppid());
  return 0;
}
myth60$ ./basic-fork
Greetings from process 29686! (parent 29351)
Bye-bye from process 29686! (parent 29351)
Bye-bye from process 29687! (parent 29686)
myth60$ ./basic-fork
Greetings from process 29688! (parent 29351)
Bye-bye from process 29688! (parent 29351)
Bye-bye from process 29689! (parent 29688)
fork
is called once, but it returns twice.
fork
knows how to clone the calling process, synthesize a virtually identical copy of it, and schedule the copy as if it were running all along.
getpid
and getppid
return the process id of the caller and the process id of the caller's parent, respectively.gdb
has built-in support for debugging multiple processes, as follows:
set detach-on-fork off
gdb
to capture any fork
'd processes, though it pauses them upon the fork
.
info inferiors
gdb
has captured.inferior X
detach inferior X
gdb
to stop watching the process, and continue it
basic-fork
program right here.fork
and child generated by it:
fork
's return value in the two processes
fork
returns in the parent process, it returns the pid of the new childfork
returns in the child process, it returns 0. Again, that isn't to say the child's pid is 0, but rather that fork
elects to return a 0 as a way of allowing the child process to easily self-identify as the child process.fork
, there is virtually no difference in the two processes, and they both continue after fork
as if they were the original process.wait
(more below) for child processes to complete.fork
calls (you will not be responsible for shared memory in this course)fork
calls
fork
this way, it's instructive to trace through a short program where spawned processes themselves call fork
. The full program can be viewed right here.
static const char * const kTrail = "abcd";
int main(int argc, char *argv[]) {
  size_t trailLength = strlen(kTrail);
  for (size_t i = 0; i < trailLength; i++) {
    printf("%c\n", kTrail[i]);
    pid_t pid = fork();
    assert(pid >= 0);
  }
  return 0;
}
fork
calls
a
is printed by the soon-to-be-great-granddaddy process.fork
and continue running in mirror processes, each with their own copy of the global "abcd"
string, and each advancing to the i++
line within a loop that promotes a 0 to 1. It's hopefully clear now that two b
's will be printed.b
's always consecutive?c
's get printed?d
's get printed?
myth60$ ./fork-puzzle
a
b
c
b
d
c
d
c
c
d
d
d
d
d
d
myth60$
myth60$ ./fork-puzzle
a
b
b
c
d
c
d
c
d
d
c
d
myth60$ d
d
d
waitpid
waitpid
can be used to temporarily block one process until a child process exits.waitpid
can return.NULL
if we don't care for the information).waitpid
should only return when a process in the supplied wait set exits.waitpid
was called and there were no child processes in the supplied wait set.
pid_t waitpid(pid_t pid, int *status, int options);
waitpid
fork
really gets used in practice (full program, with error checking, is right here):
int main(int argc, char *argv[]) {
  printf("Before.\n");
  pid_t pid = fork();
  printf("After.\n");
  if (pid == 0) {
    printf("I am the child, and the parent will wait up for me.\n");
    return 110; // contrived exit status
  } else {
    int status;
    waitpid(pid, &status, 0);
    if (WIFEXITED(status)) {
      printf("Child exited with status %d.\n", WEXITSTATUS(status));
    } else {
      printf("Child terminated abnormally.\n");
    }
    return 0;
  }
}
waitpid
waitpid
.waitpid
call, and uses the WIFEXITED
macro to confirm the child exited normally and the WEXITSTATUS
macro to extract the child's exit value, i.e. the low-order eight bits of what the child returned (which we can see is, and should be, 110).
waitpid
call also donates child process-oriented resources back to the system.
myth60$ ./separate
Before.
After.
After.
I am the child, and the parent will wait up for me.
Child exited with status 110.
myth60$
waitpid
fork
really is (full program, with more error checking, is right here).printf
gets executed twice. The child is always the first to execute it, because the parent is blocked in its waitpid
call until the child executes everything.
int main(int argc, char *argv[]) {
  printf("I'm unique and just get printed once.\n");
  pid_t pid = fork(); // keep the pid so the parent can wait on this child
  bool parent = pid != 0;
  if ((random() % 2 == 0) == parent) sleep(1); // force exactly one of the two to sleep
  if (parent) waitpid(pid, NULL, 0); // parent shouldn't exit until child has finished
  printf("I get printed twice (this one is being printed from the %s).\n",
         parent ? "parent" : "child");
  return 0;
}
fork
multiple times, provided it reaps the child processes (via waitpid
) once they exit. If we want to reap processes as they exit without concern for the order they were spawned, then this does the trick (full program, with error checking, is right here):
int main(int argc, char *argv[]) {
  for (size_t i = 0; i < 8; i++) {
    if (fork() == 0) exit(110 + i);
  }
  while (true) {
    int status;
    pid_t pid = waitpid(-1, &status, 0);
    if (pid == -1) { assert(errno == ECHILD); break; }
    if (WIFEXITED(status)) {
      printf("Child %d exited: status %d\n", pid, WEXITSTATUS(status));
    } else {
      printf("Child %d exited abnormally.\n", pid);
    }
  }
  return 0;
}
waitpid
. That -1 states we want to hear about any child as it exits, and pids are returned in the order their processes finish.waitpid
correctly returns -1 to signal there are no more processes under the parent's jurisdiction.waitpid
returns -1, it sets a global variable called errno
to the constant ECHILD
to signal waitpid
returned -1 because all child processes have terminated. That's the "error" we want.
myth60$ ./reap-as-they-exit
Child 1209 exited: status 110
Child 1210 exited: status 111
Child 1211 exited: status 112
Child 1216 exited: status 117
Child 1212 exited: status 113
Child 1213 exited: status 114
Child 1214 exited: status 115
Child 1215 exited: status 116
myth60$
myth60$ ./reap-as-they-exit
Child 1453 exited: status 115
Child 1449 exited: status 111
Child 1448 exited: status 110
Child 1450 exited: status 112
Child 1451 exited: status 113
Child 1452 exited: status 114
Child 1455 exited: status 117
Child 1454 exited: status 116
myth60$
int main(int argc, char *argv[]) {
  pid_t children[8];
  for (size_t i = 0; i < 8; i++) {
    if ((children[i] = fork()) == 0) exit(110 + i);
  }
  for (size_t i = 0; i < 8; i++) {
    int status;
    pid_t pid = waitpid(children[i], &status, 0);
    assert(pid == children[i]);
    assert(WIFEXITED(status) && (WEXITSTATUS(status) == (110 + i)));
    printf("Child with pid %d accounted for (return status of %d).\n",
           children[i], WEXITSTATUS(status));
  }
  return 0;
}
reap-in-fork-order
executable. The pids change between runs, but they are always published in fork order, which is typically (though not strictly guaranteed to be) increasing.
myth60$ ./reap-in-fork-order
Child with pid 4689 accounted for (return status of 110).
Child with pid 4690 accounted for (return status of 111).
Child with pid 4691 accounted for (return status of 112).
Child with pid 4692 accounted for (return status of 113).
Child with pid 4693 accounted for (return status of 114).
Child with pid 4694 accounted for (return status of 115).
Child with pid 4695 accounted for (return status of 116).
Child with pid 4696 accounted for (return status of 117).
myth60$