Principles of Computer Systems
Spring 2019
Stanford University
Computer Science Department
Instructors: Chris Gregg and Phil Levis
"The barman asks what the first one wants, two race conditions walk into a bar."
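You've probably already triggered a signal without calling it that, by dereferencing a NULL pointer.
When that happens, the kernel delivers a signal of type SIGSEGV, informally known as a segmentation fault (or a SEGmentation Violation, or SIGSEGV, for short).
By default, a SIGSEGV terminates the program and generates a core dump.
Each signal type (e.g. SIGSEGV) is represented internally by some number (e.g. 11). In fact, on Linux, C #defines SIGSEGV to be the number 11.
When a process divides an integer by zero, the kernel issues a SIGFPE signal to the offending process. By default, the program handles the SIGFPE by printing an error message announcing the zero denominator and generating a core dump.
When you type ctrl-c, the kernel sends a SIGINT to the foreground process (and by default, that foreground process is terminated).
When you type ctrl-z, the kernel sends a SIGTSTP to the foreground process (and by default, the foreground process is halted until a subsequent SIGCONT signal instructs it to continue).
When a process writes to a pipe whose read end has been closed, the kernel delivers a SIGPIPE to the offending process. The default SIGPIPE handler prints a message identifying the pipe error and terminates the program.
Whenever a child process changes state (that is, it exits, crashes, stops, or resumes from a stopped state), the kernel sends a SIGCHLD signal to the process's parent.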
A parent process shouldn't have to block in a waitpid call just to learn that a child has finished.
Instead, we can install a SIGCHLD handler to be asynchronously invoked whenever a child process changes state.
SIGCHLD handlers almost always include calls to waitpid, which can be used to surface the pids of child processes that've changed state. If the child process of interest actually terminated, either normally or abnormally, the waitpid also culls the zombie the relevant child process has become.
Here's an example that implements and installs a SIGCHLD handler.

static const size_t kNumChildren = 5;
static size_t numDone = 0;
static void reapChild(int unused); // defined after main

int main(int argc, char *argv[]) {
    printf("Let my five children play while I take a nap.\n");
    signal(SIGCHLD, reapChild);
    for (size_t kid = 1; kid <= kNumChildren; kid++) {
        if (fork() == 0) {
            sleep(3 * kid); // sleep emulates "play" time
            printf("Child #%zu tired... returns to dad.\n", kid);
            return 0;
        }
    }

reapChild, of course, handles each of the SIGCHLD signals delivered as each child process exits.
The signal prototype doesn't allow for state to be shared via parameters, so we have no choice but to use global variables.

    // code below is a continuation of main, presented above
    while (numDone < kNumChildren) {
        printf("At least one child still playing, so dad nods off.\n");
        snooze(5); // our implementation -- does not wake up upon signal
        printf("Dad wakes up! ");
    }
    printf("All children accounted for. Good job, dad!\n");
    return 0;
}

static void reapChild(int unused) {
    waitpid(-1, NULL, 0);
    numDone++;
}

The SIGCHLD handler is invoked 5 times, each in response to some child process finishing up.

cgregg@myth60$ ./five-children
Let my five children play while I take a nap.
At least one child still playing, so dad nods off.
Child #1 tired... returns to dad.
Dad wakes up! At least one child still playing, so dad nods off.
Child #2 tired... returns to dad.
Child #3 tired... returns to dad.
Dad wakes up! At least one child still playing, so dad nods off.
Child #4 tired... returns to dad.
Child #5 tired... returns to dad.
Dad wakes up! All children accounted for. Good job, dad!
cgregg@myth60$

Now suppose all five children play for the same amount of time: sleep(3 * kid) is now sleep(3), so all five children flashmob dad when they're all done.

cgregg@myth60$ ./broken-pentuplets
Let my five children play while I take a nap.
At least one child still playing, so dad nods off.
Kid #1 done playing... runs back to dad.
Kid #2 done playing... runs back to dad.
Kid #3 done playing... runs back to dad.
Kid #4 done playing... runs back to dad.
Kid #5 done playing... runs back to dad.
Dad wakes up! At least one child still playing, so dad nods off.
Dad wakes up! At least one child still playing, so dad nods off.
Dad wakes up! At least one child still playing, so dad nods off.
Dad wakes up! At least one child still playing, so dad nods off.
^C # I needed to hit ctrl-c to kill the program that loops forever!
cgregg@myth60$

If several SIGCHLD signals are delivered while dad is off the processor, the operating system only records the fact that one or more SIGCHLDs came in.
When the parent is forced to execute its SIGCHLD handler, it must do so on behalf of the one or more signals that may have been delivered since the last time it was on the processor.
That means our SIGCHLD handler needs to call waitpid in a loop, as with:

static void reapChild(int unused) {
    while (true) {
        pid_t pid = waitpid(-1, NULL, 0);
        if (pid < 0) break;
        numDone++;
    }
}

This reapChild implementation seemingly fixes the pentuplets program, but it changes the behavior of the first five-children program.
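In that program, when the first child exits, the SIGCHLD handler will call waitpid once, and it will return the pid of the first child.
The SIGCHLD handler will then loop around and call waitpid a second time.
That second call blocks until the next child exits, so dad is stuck inside the handler instead of returning to his nap.
We need to instruct waitpid to only reap children that have exited but to return without blocking, even if there are more children still running. We use WNOHANG for this, as with:

static void reapChild(int unused) {
    while (true) {
        pid_t pid = waitpid(-1, NULL, WNOHANG);
        if (pid <= 0) break; // note the < is now a <=
        numDone++;
    }
}

SIGCHLD handlers generally have this while loop structure.
Note that we changed the if (pid < 0) test to if (pid <= 0).
A return value of 0 means there are child processes still running but none has changed state, and waitpid is returning instead of blocking because of the WNOHANG being passed in as the third argument.
The third argument supplied to waitpid can include several flags bitwise-or'ed together.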
WUNTRACED informs waitpid to block until some child process has either ended or been stopped.
WCONTINUED informs waitpid to block until some child process has either ended or resumed from a stopped state.
WUNTRACED | WCONTINUED | WNOHANG asks that waitpid return information about a child process that has changed state (i.e. exited, crashed, stopped, or continued) but to do so without blocking.

When you introduce multiprocessing (as you do with fork) and asynchronous signal handling (as you do with signal), concurrency issues and race conditions will creep in unless you code very, very carefully.
Consider the program below, which pretends to maintain a job list: it contains printf statements stating where pids would be added to and removed from the job list data structure instead of actually doing it.

// job-list-broken.c
static void reapProcesses(int sig) {
    while (true) {
        pid_t pid = waitpid(-1, NULL, WNOHANG);
        if (pid <= 0) break;
        printf("Job %d removed from job list.\n", pid);
    }
}

char * const kArguments[] = {"date", NULL};
int main(int argc, char *argv[]) {
    signal(SIGCHLD, reapProcesses);
    for (size_t i = 0; i < 3; i++) {
        pid_t pid = fork();
        if (pid == 0) execvp(kArguments[0], kArguments);
        sleep(1); // force parent off CPU
        printf("Job %d added to job list.\n", pid);
    }
    return 0;
}

myth60$ ./job-list-broken
Sun Jan 27 03:57:30 PDT 2019
Job 27981 removed from job list.
Job 27981 added to job list.
Sun Jan 27 03:57:31 PDT 2019
Job 27982 removed from job list.
Job 27982 added to job list.
Sun Jan 27 03:57:32 PDT 2019
Job 27985 removed from job list.
Job 27985 added to job list.
myth60$ ./job-list-broken
Sun Jan 27 03:59:33 PDT 2019
Job 28380 removed from job list.
Job 28380 added to job list.
Sun Jan 27 03:59:34 PDT 2019
Job 28381 removed from job list.
Job 28381 added to job list.
Sun Jan 27 03:59:35 PDT 2019
Job 28382 removed from job list.
Job 28382 added to job list.
myth60$

The output shows each process being removed from the job list before it's added.
It's true that we're artificially pushing the parent off the CPU with that sleep(1) call, which allows the child process to churn through its date program and print the date and time to stdout.
Even if the sleep(1) is removed, it's possible that the child executes date, exits, and forces the parent to execute its SIGCHLD handler before the parent gets to its own printf. The fact that it's possible means we have a concurrency issue.
We need some way to block reapProcesses from running until it's safe or sensible to do so. Restated, we'd like to postpone reapProcesses from executing until the parent's printf has returned.

The sigset_t type is a small type that's used as a bit vector with one bit per signal number, so the presence or absence of signums can be captured via an ordered collection of 0's and 1's. It's often pictured as a 32-bit, unsigned integer, since there are just under 32 standard signal types, though the exact representation is up to the implementation.
sigemptyset is used to initialize the sigset_t at the supplied address to be the empty set of signals. We generally ignore the return value.
sigaddset is used to ensure the supplied signal number, if not already present, gets added to the set addressed by additions. Again, we generally ignore the return value.
sigprocmask adds (if op is set to SIG_BLOCK) or removes (if op is set to SIG_UNBLOCK) the signals reachable from delta to/from the set of signals being blocked at the moment. A blocked signal isn't discarded: its delivery is postponed until the block is lifted. The third argument is the location of a sigset_t that can be updated with the set of signals being blocked at the time of the call. Again, we generally ignore the return value.

int sigemptyset(sigset_t *set);
int sigaddset(sigset_t *additions, int signum);
int sigprocmask(int op, const sigset_t *delta, sigset_t *existing);

Here's a function that imposes a block on SIGCHLDs, followed by one that lifts the block on a supplied collection of signals.
Note that NULL is passed as the third argument to both sigprocmask calls. That just means that I don't care to hear about what signals were being blocked before the call.

static void imposeSIGCHLDBlock() {
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGCHLD);
    sigprocmask(SIG_BLOCK, &set, NULL);
}

static void liftSignalBlocks(const vector<int>& signums) {
    sigset_t set;
    sigemptyset(&set);
    for (int signum: signums) sigaddset(&set, signum);
    sigprocmask(SIG_UNBLOCK, &set, NULL);
}

// job-list-fixed.c
char * const kArguments[] = {"date", NULL};
int main(int argc, char *argv[]) {
    signal(SIGCHLD, reapProcesses);
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGCHLD);
    for (size_t i = 0; i < 3; i++) {
        sigprocmask(SIG_BLOCK, &set, NULL);
        pid_t pid = fork();
        if (pid == 0) {
            sigprocmask(SIG_UNBLOCK, &set, NULL);
            execvp(kArguments[0], kArguments);
        }
        sleep(1); // force parent off CPU
        printf("Job %d added to job list.\n", pid);
        sigprocmask(SIG_UNBLOCK, &set, NULL);
    }
    return 0;
}

myth60$ ./job-list-fixed
Sun Jan 27 05:16:54 PDT 2019
Job 3522 added to job list.
Job 3522 removed from job list.
Sun Jan 27 05:16:55 PDT 2019
Job 3524 added to job list.
Job 3524 removed from job list.
Sun Jan 27 05:16:56 PDT 2019
Job 3527 added to job list.
Job 3527 removed from job list.
myth60$ ./job-list-fixed
Sun Jan 27 05:17:15 PDT 2018
Job 4677 added to job list.
Job 4677 removed from job list.
Sun Jan 27 05:17:16 PDT 2018
Job 4691 added to job list.
Job 4691 removed from job list.
Sun Jan 27 05:17:17 PDT 2018
Job 4692 added to job list.
Job 4692 removed from job list.
myth60$

The implementation of reapProcesses is the same as before, so I didn't reproduce it.
The parent now defers handling SIGCHLD until it returns from its printf (that is, until it's added the pid to the job list).
A forked process inherits blocked signal sets, so it needs to lift the block via its own call to sigprocmask(SIG_UNBLOCK, ...). While it doesn't matter for this example (date almost certainly doesn't spawn its own children or rely on SIGCHLD signals), other executables may very well rely on SIGCHLD, as signal blocks are retained even across execvp boundaries.

int kill(pid_t pid, int signum);
int raise(int signum); // equivalent to kill(getpid(), signum);

Processes can send signals to other processes using the kill system call. And processes can even send themselves signals using raise.
The kill system call is analogous to the /bin/kill shell command.
The name is unfortunate, since kill implies SIGKILL implies death, yet kill can send any signal.
We generally ignore the return value of kill and raise. Just make sure you call it properly.
The pid parameter is overloaded to provide more flexible signaling.
When pid is a positive number, the target is the process with that pid.
When pid is a negative number less than -1, the targets are all processes within the process group abs(pid). We'll rely on this in Assignment 4.
pid can also be 0 or -1, but we don't need to worry about those. See the man page for kill if you're curious.