Principles of Computer Systems
Autumn 2019
Stanford University
Computer Science Department
Lecturer: Philip Levis
process control blocks, and they are stored in the process table
file descriptor table
read, write, and close)
$ cat in.txt > out.txt works)
bash shell calls make, which itself calls g++, each of them inserts text into the same terminal window.
open, read, write, close, stat, and lstat. We'll see many others in the coming weeks.
%rip register, and that address is typically drawn from the range of addresses managed by the code segment.
malloc, realloc, free, and their C++ equivalents. It's initially very small, but grows as needed for processes requiring a good amount of dynamically allocated memory.
rodata segment also stores global variables, but only those which are immutable, i.e. constants. As the runtime isn't supposed to change anything read-only, the segment housing constants can be protected so any attempts to modify it are blocked by the OS.
libc and libstdc++ with code for routines like C's printf, C's malloc, or C++'s getline. Shared libraries get their own segment so all processes can trampoline through some glue code (that is, the minimum amount of code necessary) to jump into the one copy of the library code that exists for all processes.
%rsp to track the address boundary between the in-use portion of the user stack and the portion that's on deck to be used should the currently executing function invoke a subroutine.
callq and retq instructions for user function call and return.
%rdi, %rsi, %rdx, %rcx, %r8, and %r9. The stack frame is used as general storage for partial results that need to be stored somewhere other than a register (e.g. a seventh incoming parameter).
loadFiles as per the diagram below.
Because loadFiles's stack frame is directly below that of its caller, it can use pointer arithmetic to advance beyond its frame and examine, or even update, the stack frame above it.
loadFiles returns, main could use pointer arithmetic to descend into the ghost of loadFiles's stack frame and access data loadFiles never intended to expose.
open and stat need access to OS implementation detail that should not be exposed or otherwise accessible to the user program.
kernel space, and none of it is visible to traditional user code.
callq is used for user function call, but callq would dereference a function pointer we're not permitted to dereference, since it resides in kernel space.
callq.
%rax. Each system call has its own opcode (e.g. 0 for read, 1 for write, 2 for open, 3 for close, 4 for stat, and so forth).
%rdi, %rsi, %rdx, %r10, %r8, and %r9. Note the fourth parameter is %r10, not %rcx.
syscall, which prompts an interrupt handler to execute in superuser mode.
%rax, and then executes iretq to return from the interrupt handler, revert from superuser mode, and execute the instruction following the syscall.
%rax is negative, errno is set to abs(%rax) and %rax is updated to contain a -1. If %rax is nonnegative, it's left as is. The value in %rax is then extracted by the caller as any return value would be.

Until now, we have been studying how programs interact with hardware, and now we are going to start investigating how programs interact with the operating system.
In the CS curriculum so far, your programs have operated in a single process, meaning, basically, that one program was running your code, line-for-line. The operating system made it look like your program was the only thing running, and that was that.
Now, we are going to move into the realm of multiprocessing, where you control more than one process at a time with your programs. You will tell the OS, “do these things concurrently”, and it will.
// file: getpidEx.c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h> // getpid

int main(int argc, char **argv)
{
    pid_t pid = getpid();
    printf("My process id: %d\n", pid);
    return 0;
}

cgregg@myth57$ ./getpidEx
My process id: 7526

fork
fork system call.
fork() does exactly this:
fork call returns a pid_t (an integer) to both processes. Neither is the actual pid of the process that receives it:
getpid itself to retrieve it.
fork is twofold:
fork, and it is useful for a process to know whether it is the parent or the child.
getppid)
fork, getpid, and getppid. The full program can be viewed right here.

#include <assert.h>    // assert
#include <stdio.h>     // printf
#include <sys/types.h> // pid_t
#include <unistd.h>    // fork, getpid, getppid

int main(int argc, char *argv[]) {
    printf("Greetings from process %d! (parent %d)\n", getpid(), getppid());
    pid_t pid = fork(); // from here on, two processes execute the lines below
    assert(pid >= 0);
    printf("Bye-bye from process %d! (parent %d)\n", getpid(), getppid());
    return 0;
}

myth60$ ./basic-fork
Greetings from process 29686! (parent 29351)
Bye-bye from process 29686! (parent 29351)
Bye-bye from process 29687! (parent 29686)
myth60$ ./basic-fork
Greetings from process 29688! (parent 29351)
Bye-bye from process 29688! (parent 29351)
Bye-bye from process 29689! (parent 29688)

fork is called once, but it returns twice.
fork knows how to clone the calling process, synthesize a virtually identical copy of it, and schedule the copy as if it were running all along.
getpid and getppid return the process id of the caller and the process id of the caller's parent, respectively.

gdb has built-in support for debugging multiple processes, as follows:
set detach-on-fork off
    This tells gdb to capture any fork'd processes, though it pauses them upon the fork.
info inferiors
    This lists the processes that gdb has captured.
inferior X
    This switches gdb's focus to captured process X.
detach inferior X
    This tells gdb to stop watching the process, and continue it.
basic-fork program right here.
fork and child generated by it:
fork's return value in the two processes
When fork returns in the parent process, it returns the pid of the new child.
When fork returns in the child process, it returns 0. Again, that isn't to say the child's pid is 0, but rather that fork elects to return a 0 as a way of allowing the child process to easily self-identify as the child process.
fork, there is virtually no difference in the two processes, and they both continue after fork as if they were the original process.
wait (more below) for child processes to complete.
fork calls (you will not be responsible for shared memory in this course)

fork calls
fork this way, it's instructive to trace through a short program where spawned processes themselves call fork. The full program can be viewed right here.

#include <assert.h> // assert
#include <stdio.h>  // printf
#include <string.h> // strlen
#include <unistd.h> // fork

static const char *const kTrail = "abcd";

int main(int argc, char *argv[]) {
    size_t trailLength = strlen(kTrail);
    for (size_t i = 0; i < trailLength; i++) {
        printf("%c\n", kTrail[i]);
        pid_t pid = fork(); // every live process forks, doubling the process count
        assert(pid >= 0);
    }
    return 0;
}

a is printed by the soon-to-be-great-granddaddy process.
fork and continue running in mirror processes, each with their own copy of the global "abcd" string, and each advancing to the i++ line within a loop that promotes a 0 to 1. It's hopefully clear now that two b's will be printed.
b's always consecutive?
c's get printed?
d's get printed?

myth60$ ./fork-puzzle
a
b
c
b
d
c
d
c
c
d
d
d
d
d
d
myth60$
myth60$ ./fork-puzzle
a
b
b
c
d
c
d
c
d
d
c
d
myth60$ d
d
d

waitpid
waitpid can be used to temporarily block one process until a child process exits.
waitpid can return.
NULL if we don't care for the information).
waitpid should only return when a process in the supplied wait set exits.
waitpid was called and there were no child processes in the supplied wait set.

pid_t waitpid(pid_t pid, int *status, int options);
fork really gets used in practice (full program, with error checking, is right here):

int main(int argc, char *argv[]) {
    printf("Before.\n");
    pid_t pid = fork();
    printf("After.\n"); // printed by both the parent and the child
    if (pid == 0) {
        printf("I am the child, and the parent will wait up for me.\n");
        return 110; // contrived exit status
    } else {
        int status;
        waitpid(pid, &status, 0); // block until the child exits
        if (WIFEXITED(status)) {
            printf("Child exited with status %d.\n", WEXITSTATUS(status));
        } else {
            printf("Child terminated abnormally.\n");
        }
        return 0;
    }
}

waitpid.
waitpid call, and uses the WIFEXITED
WEXITSTATUS macro to extract the lower eight bits of its argument to produce the child return value (which we can see is, and should be, 110).
waitpid call also donates child process-oriented resources back to the system.

myth60$ ./separate
Before.
After.
After.
I am the child, and the parent will wait up for me.
Child exited with status 110.
myth60$
fork really is (full program, with more error checking, is right here).
printf gets executed twice. The child is always the first to execute it, because the parent is blocked in its waitpid call until the child executes everything.

int main(int argc, char *argv[]) {
    printf("I'm unique and just get printed once.\n");
    pid_t pid = fork(); // 0 in the child, the child's pid in the parent
    bool parent = pid != 0;
    if ((random() % 2 == 0) == parent) sleep(1); // force exactly one of the two to sleep
    if (parent) waitpid(pid, NULL, 0); // parent shouldn't exit until child has finished
    printf("I get printed twice (this one is being printed from the %s).\n",
           parent ? "parent" : "child");
    return 0;
}

fork multiple times, provided it reaps the child processes (via waitpid) once they exit. If we want to reap processes as they exit without concern for the order they were spawned, then this does the trick (full program, with error checking, is right here):

int main(int argc, char *argv[]) {
    for (size_t i = 0; i < 8; i++) {
        if (fork() == 0) exit(110 + i); // each child exits at once with a distinct status
    }
    while (true) {
        int status;
        pid_t pid = waitpid(-1, &status, 0); // -1: wait for any child
        if (pid == -1) { assert(errno == ECHILD); break; } // no children left to reap
        if (WIFEXITED(status)) {
            printf("Child %d exited: status %d\n", pid, WEXITSTATUS(status));
        } else {
            printf("Child %d exited abnormally.\n", pid);
        }
    }
    return 0;
}

waitpid. That -1 states we want to hear about any child as it exits, and pids are returned in the order their processes finish.
waitpid correctly returns -1 to signal there are no more processes under the parent's jurisdiction.
waitpid returns -1, it sets a global variable called errno to the constant ECHILD to signal waitpid returned -1 because all child processes have terminated. That's the "error" we want.

myth60$ ./reap-as-they-exit
Child 1209 exited: status 110
Child 1210 exited: status 111
Child 1211 exited: status 112
Child 1216 exited: status 117
Child 1212 exited: status 113
Child 1213 exited: status 114
Child 1214 exited: status 115
Child 1215 exited: status 116
myth60$
myth60$ ./reap-as-they-exit
Child 1453 exited: status 115
Child 1449 exited: status 111
Child 1448 exited: status 110
Child 1450 exited: status 112
Child 1451 exited: status 113
Child 1452 exited: status 114
Child 1455 exited: status 117
Child 1454 exited: status 116
myth60$

int main(int argc, char *argv[]) {
    pid_t children[8];
    for (size_t i = 0; i < 8; i++) {
        if ((children[i] = fork()) == 0) exit(110 + i);
    }
    for (size_t i = 0; i < 8; i++) {
        int status;
        pid_t pid = waitpid(children[i], &status, 0); // wait for this specific child
        assert(pid == children[i]);
        assert(WIFEXITED(status) && (WEXITSTATUS(status) == (110 + i)));
        printf("Child with pid %d accounted for (return status of %d).\n",
               children[i], WEXITSTATUS(status));
    }
    return 0;
}

reap-in-fork-order executable. The pids change between runs, but they are always published in the order the children were forked (increasing order, barring pid wraparound).

myth60$ ./reap-in-fork-order
Child with pid 4689 accounted for (return status of 110).
Child with pid 4690 accounted for (return status of 111).
Child with pid 4691 accounted for (return status of 112).
Child with pid 4692 accounted for (return status of 113).
Child with pid 4693 accounted for (return status of 114).
Child with pid 4694 accounted for (return status of 115).
Child with pid 4695 accounted for (return status of 116).
Child with pid 4696 accounted for (return status of 117).
myth60$