Principles of Computer Systems
Winter 2021
Stanford University
Computer Science Department
Instructors: Chris Gregg and Nick Troccoli
Creating processes and running other programs
Inter-process communication
Signals
Race Conditions
A signal is a way to notify a process that an event has occurred
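Here are some examples of signals: SIGSTOP, SIGCONT, SIGSEGV, SIGKILL, and SIGCHLD. Several of them change the state of the receiving process, which is one of the following: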
Running - a process is either executing or waiting to execute
Stopped - a process is suspended due to receiving a SIGSTOP or similar signal. A process will resume if it receives a SIGCONT signal.
Terminated - a process is permanently stopped, either due to finishing, or receiving a signal such as SIGSEGV or SIGKILL whose default behavior is to terminate the process.
When a child changes state, the kernel sends a SIGCHLD signal to its parent.
The operating system sends many signals, but we can also send signals manually.
int kill(pid_t pid, int signum);
// same as kill(getpid(), signum)
int raise(int signum);
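As a minimal sketch of sending a signal manually, the helper below forks a child that does nothing but wait, terminates it with kill, and then asks waitpid which signal ended it. The function name terminate_child_with is made up for this illustration and is not part of the lecture code.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: forks a child that just waits, sends it signum with
// kill(), and returns the number of the signal that terminated the child
// (or -1 if something went wrong or the child was not killed by a signal).
int terminate_child_with(int signum) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        for (;;) pause();                     // child: sleep until a signal arrives
    }
    if (kill(pid, signum) == -1) return -1;   // parent: send the signal
    int status;
    if (waitpid(pid, &status, 0) != pid) return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

Calling terminate_child_with(SIGKILL) should report SIGKILL, since that signal cannot be caught or ignored by the child.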
waitpid()
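waitpid can be used to wait for children to terminate or change state:
pid_t waitpid(pid_t pid, int *status, int options);
The default behavior is to wait for the specified child process to exit. options lets us customize this further (flags can be combined using |): WUNTRACED also reports children that have stopped, WCONTINUED also reports children that have been continued, and WNOHANG makes the call return immediately instead of blocking.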
We can have a function of our choice execute when a certain signal is received.
typedef void (*sighandler_t)(int);
...
sighandler_t signal(int signum, sighandler_t handler);
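A small sketch of installing a handler with signal: the process sends SIGUSR1 to itself with raise and checks that the handler ran. The names gotSignal, onUserSignal, and handler_runs_on_raise are made up for this illustration.

```c
#include <signal.h>

// Written by the handler and read by normal code. volatile sig_atomic_t is
// the type the C standard guarantees is safe for this purpose.
static volatile sig_atomic_t gotSignal = 0;

static void onUserSignal(int sig) {
    (void)sig;          // the signal number is not needed here
    gotSignal = 1;
}

// Hypothetical demo: installs the handler, sends SIGUSR1 to ourselves, and
// returns 1 if the handler ran (or -1 if the handler could not be installed).
int handler_runs_on_raise(void) {
    gotSignal = 0;
    if (signal(SIGUSR1, onUserSignal) == SIG_ERR) return -1;
    raise(SIGUSR1);                 // returns only after the handler has run
    signal(SIGUSR1, SIG_DFL);       // restore the default behavior
    return gotSignal;
}
```

Keeping the handler down to a single flag assignment is a deliberate choice: handlers can interrupt the program at any point, so the less they do, the fewer surprises.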
// five-children.c
static const size_t kNumChildren = 5;
static size_t numChildrenDonePlaying = 0;

static void reapChild(int sig) {
    waitpid(-1, NULL, 0);
    numChildrenDonePlaying++;
}

int main(int argc, char *argv[]) {
    printf("Let my five children play while I take a nap.\n");
    signal(SIGCHLD, reapChild);
    for (size_t kid = 1; kid <= kNumChildren; kid++) {
        if (fork() == 0) {
            sleep(3 * kid); // sleep emulates "play" time
            printf("Child #%zu tired... returns to parent.\n", kid);
            return 0;
        }
    }
    while (numChildrenDonePlaying < kNumChildren) {
        printf("At least one child still playing, so parent nods off.\n");
        snooze(5); // custom fn to sleep uninterrupted
        printf("Parent wakes up! ");
    }
    printf("All children accounted for. Good job, parent!\n");
    return 0;
}
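The snooze helper used above is not shown in these notes. One hypothetical way to write it is sketched below, assuming POSIX nanosleep: a plain sleep returns early when a signal handler runs, so the sketch keeps going back to sleep until the full time has passed. This is an illustration, not the course's actual implementation.

```c
#include <errno.h>
#include <time.h>

// Hypothetical stand-in for the lecture's snooze(): sleeps for the full
// duration even if signal handlers interrupt the sleep along the way.
void snooze(unsigned int seconds) {
    struct timespec remaining = { .tv_sec = seconds, .tv_nsec = 0 };
    // When a handler interrupts nanosleep, it returns -1 with errno set to
    // EINTR and stores the time still left to sleep in its second argument.
    while (nanosleep(&remaining, &remaining) == -1 && errno == EINTR) {
        // interrupted: loop around and sleep for whatever time is left
    }
}
```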
A signal can be received at any time, and a signal handler can execute at any time.
// five-children.c
static const size_t kNumChildren = 5;
static size_t numChildrenDonePlaying = 0;

static void reapChild(int sig) {
    waitpid(-1, NULL, 0);
    numChildrenDonePlaying++;
}

int main(int argc, char *argv[]) {
    printf("Let my five children play while I take a nap.\n");
    signal(SIGCHLD, reapChild);
    for (size_t kid = 1; kid <= kNumChildren; kid++) {
        if (fork() == 0) {
            sleep(3); // every child now "plays" for the same amount of time
            printf("Child #%zu tired... returns to parent.\n", kid);
            return 0;
        }
    }
    while (numChildrenDonePlaying < kNumChildren) {
        printf("At least one child still playing, so parent nods off.\n");
        snooze(5); // custom fn to sleep uninterrupted
        printf("Parent wakes up! ");
    }
    printf("All children accounted for. Good job, parent!\n");
    return 0;
}
What happens if all children sleep for the same amount of time? (E.g. change the child's sleep(3 * kid) call to sleep(3), as in the version above.)
Problem: a signal handler is called once if one or more signals of the same type are sent, so a single SIGCHLD may stand for several terminated children.
Solution: the signal handler should clean up as many children as possible.
static void reapChild(int sig) {
    waitpid(-1, NULL, 0);
    numChildrenDonePlaying++;
}
Let's add a loop to reap as many children as possible.
static void reapChild(int sig) {
    while (true) {
        pid_t pid = waitpid(-1, NULL, 0);
        if (pid < 0) break;
        numChildrenDonePlaying++;
    }
}
Problem: this may block if other children are taking longer! We only want to clean up children that are done now. Others will signal later.
Solution: pass WNOHANG, which tells waitpid not to block. It returns 0 if there are children we would otherwise have waited on but none has changed state yet. A return value of -1 typically means there are no children left.
static void reapChild(int sig) {
    while (true) {
        pid_t pid = waitpid(-1, NULL, WNOHANG);
        if (pid <= 0) break;   // 0: nothing to reap right now; -1: no children left
        numChildrenDonePlaying++;
    }
}
Note: while a signal handler is running, the kernel blocks additional signals of that type. They are held pending and delivered after the handler returns.
Concurrency means performing multiple actions at the same time.
With multiprocessing (e.g. fork) and asynchronous signal handling (e.g. signal), it's possible to have concurrency issues. These are tricky!
Consider the following program, which is similar to Assignment 4. (The full program, with error checking, is job-list-broken.c.)
It does not maintain a real job list; it uses printf statements to mark where pids would be added to and removed from the job list data structure.
// job-list-broken.c
static void reapProcesses(int sig) {
    while (true) {
        pid_t pid = waitpid(-1, NULL, WNOHANG);
        if (pid <= 0) break;
        printf("Job %d removed from job list.\n", pid);
    }
}

char * const kArguments[] = {"date", NULL};
int main(int argc, char *argv[]) {
    signal(SIGCHLD, reapProcesses);
    for (size_t i = 0; i < 3; i++) {
        pid_t pid = fork();
        if (pid == 0) execvp(kArguments[0], kArguments);
        sleep(1); // force parent off CPU
        printf("Job %d added to job list.\n", pid);
    }
    return 0;
}
myth60$ ./job-list-broken
Sun Jan 27 03:57:30 PDT 2019
Job 27981 removed from job list.
Job 27981 added to job list.
Sun Jan 27 03:57:31 PDT 2019
Job 27982 removed from job list.
Job 27982 added to job list.
Sun Jan 27 03:57:32 PDT 2019
Job 27985 removed from job list.
Job 27985 added to job list.
myth60$ ./job-list-broken
Sun Jan 27 03:59:33 PDT 2019
Job 28380 removed from job list.
Job 28380 added to job list.
Sun Jan 27 03:59:34 PDT 2019
Job 28381 removed from job list.
Job 28381 added to job list.
Sun Jan 27 03:59:35 PDT 2019
Job 28382 removed from job list.
Job 28382 added to job list.
myth60$
Symptom: it looks like jobs are being removed from the list before being added! How is this possible?
Cause: there is a race condition with the signal handler. It is possible for the child to execute and terminate before the parent adds the job to the job list.
Therefore, the signal handler will be called to remove the job before the parent adds the job!
It would be nice if there were a "do not disturb" mode for signals so that we could temporarily block SIGCHLD in the parent while adding a new job.
The sigprocmask function lets us temporarily block signals of the specified types. Blocked signals are not lost; they are held pending and delivered when the block is removed. (For ordinary signals such as SIGCHLD, several pending signals of the same type collapse into one.)
int sigprocmask(int how, const sigset_t *set, sigset_t *oldset);
sigset_t is a special type that represents a set of signals, conceptually a bit vector with one bit per signal (its exact size depends on the system). It must be created and initialized using special functions (we generally ignore the return values).
// Initialize to the empty set of signals
int sigemptyset(sigset_t *set);
// Set to contain all signals
int sigfillset(sigset_t *set);
// Add the specified signal
int sigaddset(sigset_t *set, int signum);
// Remove the specified signal
int sigdelset(sigset_t *set, int signum);
static void imposeSIGCHLDBlock() {
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGCHLD);
    sigprocmask(SIG_BLOCK, &set, NULL);
}

static void unblockSignals(int *signals, int numSignals) {
    sigset_t set;
    sigemptyset(&set);
    for (int i = 0; i < numSignals; i++) {
        sigaddset(&set, signals[i]);
    }
    sigprocmask(SIG_UNBLOCK, &set, NULL);
}
Where should we block and unblock SIGCHLD signals in the parent to fix the race condition?
Goal: we want to block signals for as little time as possible to maximize performance.
Block just before the call to fork().
Unblock just after the printf that adds the job to the job list.
Answer: block SIGCHLD just before creating a child with fork(). Unblock SIGCHLD once we've added the job to the list.
(The full program is job-list-fixed.c.)
// job-list-fixed.c
char * const kArguments[] = {"date", NULL};
int main(int argc, char *argv[]) {
    signal(SIGCHLD, reapProcesses);
    // Create set with just SIGCHLD
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGCHLD);
    for (size_t i = 0; i < 3; i++) {
        sigprocmask(SIG_BLOCK, &set, NULL);        // block before the child exists
        pid_t pid = fork();
        if (pid == 0) {
            sigprocmask(SIG_UNBLOCK, &set, NULL);  // child inherits the block; lift it
            execvp(kArguments[0], kArguments);
            exit(1);                               // reached only if execvp fails
        }
        sleep(1); // force parent off CPU
        printf("Job %d added to job list.\n", pid);
        sigprocmask(SIG_UNBLOCK, &set, NULL);      // handler may now remove the job
    }
    return 0;
}
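Side note: forked children inherit blocked signals, so we must remove the block in the child (the sigprocmask(SIG_UNBLOCK, ...) call just before execvp). The set of blocked signals also survives execvp, so without this call the new program would start with SIGCHLD blocked.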
Last lecture we implemented a more advanced shell that could run commands in the foreground, or background (with &). Let's see a quick demo (second-shell-soln.c)
There's one core problem with this implementation, that we can now fix with our knowledge of signals and signal handlers. What is it?
If a process runs in the background, it's not cleaned up!
Fix: let's add a SIGCHLD handler to clean them up.
static void executeCommand(char *command, bool inBackground) {
    pid_t pidOrZero = fork();
    if (pidOrZero == 0) {
        char *arguments[] = {"/bin/sh", "-c", command, NULL};
        execvp(arguments[0], arguments);
        exit(1);
    }
    // If we are the parent, either wait or return immediately
    if (inBackground) {
        printf("%d %s\n", pidOrZero, command);
    } else {
        waitpid(pidOrZero, NULL, 0);
    }
}

static void reapProcesses(int signum) {
    while (true) {
        pid_t result = waitpid(-1, NULL, WNOHANG);
        if (result <= 0) break;
    }
}

int main(int argc, char *argv[]) {
    signal(SIGCHLD, reapProcesses);
    ...
}
Now we have a handler that cleans up terminated children.
Issue: this handler will be called to clean up all children, even foreground commands.
Therefore, the waitpid call in the foreground branch of executeCommand may return -1, because the handler can reap the child first. Can we get rid of it?
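A small sketch of what waitpid reports when the child it is asked about has already been cleaned up: the call fails with -1 and sets errno to ECHILD. Here the "other" reaper is simply an earlier waitpid call rather than a signal handler, which keeps the example free of timing issues. The function name errno_of_second_wait is made up.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical demo: reaps a child once, then tries to wait on it again.
// Returns the errno from the second waitpid, 0 if that call unexpectedly
// succeeded, or -1 if the setup itself failed.
int errno_of_second_wait(void) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) _exit(0);                         // child exits right away

    if (waitpid(pid, NULL, 0) != pid) return -1;    // first wait reaps the child
    errno = 0;
    if (waitpid(pid, NULL, 0) == -1) return errno;  // nothing left to wait for
    return 0;
}
```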
The foreground waitpid call is redundant and replicates functionality better managed in the SIGCHLD handler.
We should call waitpid in one place: the SIGCHLD handler. This will be critical when we implement shells (e.g. Assignment 4's stsh) where multiple processes are running in the foreground as part of a pipeline (e.g. more words.txt | tee copy.txt | sort | uniq).
Let's update the code to call waitpid from only one place.
// The currently-running foreground command PID
// volatile: the handler changes this while the loop below keeps re-reading it
static volatile pid_t foregroundPID = 0;

static void waitForForegroundCommand(pid_t pid) {
    foregroundPID = pid;
    while (foregroundPID == pid) {;}   // spin until reapProcesses clears it
}

static void executeCommand(char *command, bool inBackground) {
    // ...(omitted for brevity)...
    if (inBackground) {
        printf("%d %s\n", pidOrZero, command);
    } else {
        waitForForegroundCommand(pidOrZero);
    }
}

static void reapProcesses(int signum) {
    while (true) {
        pid_t result = waitpid(-1, NULL, WNOHANG);
        if (result <= 0) break;
        if (result == foregroundPID) foregroundPID = 0;
    }
}
To share information with reapProcesses, we must make fgpid a global (it is named foregroundPID in the code above).
When a foreground process is created, fgpid is set to hold that process's pid. The shell then blocks by spinning in place until fgpid is cleared by reapProcesses.
This arrangement allows all waitpid code to reside in the handler and nowhere else.
However, the busy loop while (fgpid == pid) {;} is not good. It lets the shell spin on the CPU even when it can't do any meaningful work.
If our program can do no meaningful work until we receive a signal, we should tell the operating system so that it can take us off the CPU for now.
This comes up in farm and your Assignment 4 stsh.
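// simplesh-all-better.c
static void waitForForegroundProcess(pid_t pid) {
    fgpid = pid;
    sigset_t empty;
    sigemptyset(&empty);
    while (fgpid == pid) {
        // Sleep until a signal arrives. While asleep, the signal mask is
        // replaced by the empty set, so SIGCHLD can be delivered.
        sigsuspend(&empty);
    }
}

// Atomically replaces the signal mask with mask and sleeps until a signal
// is delivered, then restores the previous mask.
int sigsuspend(const sigset_t *mask);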