CS110: Principles of Computer Systems
Winter 2021-2022
Stanford University
Instructors: Nick Troccoli and Jerry Cain
Illustration courtesy of Roz Cyrus.
Creating processes and running other programs
Inter-process communication and Pipes
Signals
Race Conditions
assign3: implement multiprocessing programs like "trace" (to trace another program's behavior) and "farm" (parallelize tasks)
assign4: implement your own shell!
Here's an example program showing how pipe works across processes (full program link at bottom).
static const char *kPipeMessage = "Hello, this message is coming through a pipe.";
int main(int argc, char *argv[]) {
    int fds[2];
    pipe(fds);
    size_t bytesSent = strlen(kPipeMessage) + 1;
    pid_t pidOrZero = fork();
    if (pidOrZero == 0) {
        // In the child, we only read from the pipe
        close(fds[1]);
        char buffer[bytesSent];
        read(fds[0], buffer, sizeof(buffer));
        close(fds[0]);
        printf("Message from parent: %s\n", buffer);
        return 0;
    }
    // In the parent, we only write to the pipe (assume everything is written)
    close(fds[0]);
    write(fds[1], kPipeMessage, bytesSent);
    close(fds[1]);
    waitpid(pidOrZero, NULL, 0);
    return 0;
}
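The parent's comment "assume everything is written" matters because write may transfer fewer bytes than requested. As a minimal sketch of how that assumption could be dropped, the helper below keeps writing until every byte has been sent. The name writeAll is hypothetical and is not part of the lecture code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

// Hypothetical helper (not from the lecture code): keeps calling write
// until all `count` bytes of `buffer` have been sent to `fd`.
// Returns true on success, false if write reports an error.
bool writeAll(int fd, const void *buffer, size_t count) {
    const char *next = buffer;
    size_t remaining = count;
    while (remaining > 0) {
        ssize_t written = write(fd, next, remaining);
        if (written == -1) {
            if (errno == EINTR) continue;  // interrupted by a signal: retry
            return false;
        }
        next += written;                   // skip past the bytes already sent
        remaining -= (size_t) written;
    }
    return true;
}
```

In the example above, the parent could call writeAll(fds[1], kPipeMessage, bytesSent) in place of the single write call.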
Illustrations courtesy of Roz Cyrus.
This method of communication between processes relies on the fact that file descriptors are duplicated when forking.
This is the core idea behind how a shell can support piping between processes (e.g. cat file.txt | uniq | sort).
Idea: what happens if we change FD 1 to point somewhere else?
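[Diagram: two file descriptor tables. In one, entries 0, 1, and 2 refer to the terminal and entry 3 refers to a file; in the other, only entries 0, 1, and 2 exist, all referring to the terminal.]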
int main() {
    printf("This will print to the terminal\n");
    close(STDOUT_FILENO);
    // fd will always be 1
    int fd = open("myfile.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    printf("This will print to myfile.txt!\n");
    close(fd);
    return 0;
}
[Diagram: after close(STDOUT_FILENO) and the call to open, entry 1 of the file descriptor table refers to myfile.txt, while entries 0 and 2 still refer to the terminal.]
Idea: what happens if we change a special FD to point somewhere else?
Could we do this with a pipe?
[Diagram: Process 1's STDOUT refers to the write end of a pipe, and Process 2's STDIN refers to the read end of the same pipe.]
Why would this be useful?
I/O redirection and pipes allow us to handle piping in our shell: e.g. cat file.txt | sort
[Diagram: cat's STDOUT refers to the write end of a pipe and sort's STDIN refers to the read end; their remaining standard descriptors refer to the terminal.]
This allows the shell to link together two distinct executables without them knowing. (How?)
Stepping stone: our first goal is to write a program that spawns another program and sends data to its STDIN.
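[Diagram: our program keeps the pipe's write end in its file descriptor table (shown as entry 4), while sort's STDIN refers to the pipe's read end; the remaining standard descriptors refer to the terminal.]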
The sort executable has no idea its input is not coming from terminal entry!
"Wait a minute...I thought execvp consumed the process? How do the file descriptors stick around?"
New insight: execvp consumes the process, but leaves the file descriptor table intact!
One issue: how do we "connect" our pipe FDs to STDIN/STDOUT?
dup2 makes a copy of a file descriptor entry and puts it in another file descriptor index. If the second parameter is an already-open file descriptor, it is closed before being used.
int dup2(int oldfd, int newfd);
Example: we can use dup2 to copy the pipe read file descriptor into standard input!
dup2(fds[0], STDIN_FILENO);
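To see dup2 in isolation, here is a minimal sketch that points STDOUT at a file and later points it back at wherever it referred before. The helper names redirectStdout and restoreStdout are hypothetical. The sketch also relies on dup, which copies a descriptor into the lowest unused entry of the table.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

// Hypothetical helper: points STDOUT_FILENO at `path` and returns a saved
// copy of the original stdout descriptor, or -1 on failure.
int redirectStdout(const char *path) {
    fflush(stdout);                    // buffered text should not follow the redirect
    int saved = dup(STDOUT_FILENO);    // remember where stdout pointed
    if (saved == -1) return -1;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) {
        close(saved);
        return -1;
    }
    if (dup2(fd, STDOUT_FILENO) == -1) {
        close(fd);
        close(saved);
        return -1;
    }
    close(fd);                         // entry 1 now refers to the file; fd is redundant
    return saved;
}

// Hypothetical helper: undoes redirectStdout using the saved descriptor.
void restoreStdout(int saved) {
    fflush(stdout);                    // push pending text into the file first
    dup2(saved, STDOUT_FILENO);        // entry 1 refers to the original target again
    close(saved);
}
```

Any printf between the two calls lands in the file. Closing the extra descriptor after dup2 is the same pattern the pipe examples use: once the copy is in place, the original entry is no longer needed.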
To practice this piping technique, let's implement a custom function called subprocess.
subprocess_t subprocess(char *command);
subprocess is the same as mysystem, except it also sets up a pipe we can use to write to the child process's STDIN.
It returns a struct containing:
the PID of the child process
a file descriptor we can use to write to the child's STDIN
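Putting pipe, fork, dup2, and execvp together, a minimal sketch of subprocess might look like the following. The struct layout and the field names pid and supplyfd are assumptions made for illustration, and the sketch calls _exit where course code would report the execvp failure.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>   // waitpid, which callers use to reap the child
#include <unistd.h>

// Assumed layout: the field names are illustrative.
typedef struct {
    pid_t pid;      // PID of the child process
    int supplyfd;   // write to this descriptor to feed the child's STDIN
} subprocess_t;

subprocess_t subprocess(char *command) {
    int fds[2];
    if (pipe(fds) == -1) return (subprocess_t) { -1, -1 };
    pid_t pidOrZero = fork();
    if (pidOrZero == -1) {
        close(fds[0]);
        close(fds[1]);
        return (subprocess_t) { -1, -1 };
    }
    if (pidOrZero == 0) {
        close(fds[1]);                 // the child only reads from the pipe
        dup2(fds[0], STDIN_FILENO);    // STDIN now refers to the pipe's read end
        close(fds[0]);                 // the original entry is no longer needed
        char *arguments[] = {"/bin/sh", "-c", command, NULL};
        execvp(arguments[0], arguments);
        _exit(127);                    // reached only if execvp fails
    }
    close(fds[0]);                     // the parent only writes to the pipe
    return (subprocess_t) { pidOrZero, fds[1] };
}
```

A caller would write its data to supplyfd, close supplyfd so the child sees end-of-file on its STDIN, and then call waitpid on pid.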
Final task: write a program that spawns two child processes and connects the first child's STDOUT to the second child's STDIN. Both children should run in parallel.
Let's implement a custom function called pipeline.
void pipeline(char *argv1[], char *argv2[], pid_t pids[]);
pipeline is similar to subprocess, except it also spawns a second child and directs its STDOUT to write to the pipe. Both children should run in parallel.
It doesn't return anything, but it writes the two children's PIDs to the specified pids array.
There were a lot of close() calls! Is there a way for any of them to be done automatically?
int pipe2(int fds[], int flags);
pipe2 is the same as pipe except it lets you customize the pipe with some optional flags.
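The flag of interest here is O_CLOEXEC, which marks both pipe descriptors close-on-exec. As a rough sketch of what that mark means, the code below builds the same effect from pipe and fcntl and provides a way to inspect the mark. Both helper names are hypothetical. Unlike pipe2, this stand-in takes two steps, so it is not atomic. It is used here because pipe2 is not available everywhere (on Linux with glibc it requires _GNU_SOURCE to be defined).

```c
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

// Hypothetical helper: reports whether `fd` is marked close-on-exec.
bool closesOnExec(int fd) {
    int flags = fcntl(fd, F_GETFD);
    return flags != -1 && (flags & FD_CLOEXEC) != 0;
}

// Hypothetical stand-in for pipe2(fds, O_CLOEXEC): creates a pipe, then
// marks both ends close-on-exec. Returns 0 on success, -1 on failure.
int pipeCloexec(int fds[2]) {
    if (pipe(fds) == -1) return -1;
    for (int i = 0; i < 2; i++) {
        int flags = fcntl(fds[i], F_GETFD);
        if (flags == -1 || fcntl(fds[i], F_SETFD, flags | FD_CLOEXEC) == -1) {
            close(fds[0]);
            close(fds[1]);
            return -1;
        }
    }
    return 0;
}
```

Copies made with dup or dup2 do not carry the close-on-exec mark, so closesOnExec returns false for them. That is why a pipe end copied into STDIN or STDOUT stays open across execvp while the original pipe descriptors are closed.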
void pipeline(char *argv1[], char *argv2[], pid_t pids[]) {
    int fds[2];
    pipe(fds);
    pids[0] = fork();
    if (pids[0] == 0) {
        close(fds[0]);                // not needed with pipe2 and O_CLOEXEC
        dup2(fds[1], STDOUT_FILENO);
        close(fds[1]);                // not needed with pipe2 and O_CLOEXEC
        execvp(argv1[0], argv1);
    }
    close(fds[1]);
    pids[1] = fork();
    if (pids[1] == 0) {
        dup2(fds[0], STDIN_FILENO);
        close(fds[0]);                // not needed with pipe2 and O_CLOEXEC
        execvp(argv2[0], argv2);
    }
    close(fds[0]);
}
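The close() calls inside the two child branches would no longer be necessary if we used pipe2 with O_CLOEXEC, because each child goes on to call execvp, which then closes the pipe descriptors automatically. The copies made by dup2 are not marked close-on-exec, so the redirected STDIN and STDOUT survive.
Note that the parent must still close its copies because it never calls execvp.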
void pipeline(char *argv1[], char *argv2[], pid_t pids[]) {
    int fds[2];
    pipe2(fds, O_CLOEXEC);
    pids[0] = fork();
    if (pids[0] == 0) {
        dup2(fds[1], STDOUT_FILENO);
        execvp(argv1[0], argv1);
    }
    close(fds[1]);
    pids[1] = fork();
    if (pids[1] == 0) {
        dup2(fds[0], STDIN_FILENO);
        execvp(argv2[0], argv2);
    }
    close(fds[0]);
}
This version of pipeline uses pipe2 with O_CLOEXEC.
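For a sense of how a caller might use pipeline, here is a minimal sketch that runs two commands and waits for both children. It repeats the pipe-based pipeline so the block stands alone, and adds an _exit after each execvp so a child whose exec fails cannot continue into the parent's code. The name runPipeline is hypothetical, and error handling for pipe and fork is omitted, as in the lecture code.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Same structure as the pipe-based pipeline, plus _exit after each execvp.
void pipeline(char *argv1[], char *argv2[], pid_t pids[]) {
    int fds[2];
    pipe(fds);
    pids[0] = fork();
    if (pids[0] == 0) {
        close(fds[0]);
        dup2(fds[1], STDOUT_FILENO);   // first child writes into the pipe
        close(fds[1]);
        execvp(argv1[0], argv1);
        _exit(127);                    // reached only if execvp fails
    }
    close(fds[1]);
    pids[1] = fork();
    if (pids[1] == 0) {
        dup2(fds[0], STDIN_FILENO);    // second child reads from the pipe
        close(fds[0]);
        execvp(argv2[0], argv2);
        _exit(127);                    // reached only if execvp fails
    }
    close(fds[0]);
}

// Hypothetical caller: runs argv1 | argv2, waits for both children, and
// returns the exit status of the second one (-1 if it did not exit normally).
int runPipeline(char *argv1[], char *argv2[]) {
    pid_t pids[2];
    pipeline(argv1, argv2, pids);
    int status = 0;
    waitpid(pids[0], NULL, 0);
    if (waitpid(pids[1], &status, 0) == -1) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because pipeline returns as soon as both children are spawned, the two commands run in parallel, and the waiting happens only in the caller.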
Next time: signals (another form of interprocess communication)
The program below takes an arbitrary number of filenames as arguments and attempts to publish the date and time to each. The desired behavior is shown after the code:
static void publish(const char *name) {
    printf("Publishing date and time to file named \"%s\".\n", name);
    int outfile = open(name, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    dup2(outfile, STDOUT_FILENO);
    close(outfile);
    if (fork() > 0) return;
    char *argv[] = { "date", NULL };
    execvp(argv[0], argv);
}
int main(int argc, char *argv[]) {
    for (size_t i = 1; i < argc; i++) publish(argv[i]);
    return 0;
}
myth62:~$ ./publish one two three four
Publishing date and time to file named "one".
Publishing date and time to file named "two".
Publishing date and time to file named "three".
Publishing date and time to file named "four".
However, the program is buggy!
What text is actually printed to standard output?
What does each of the four files contain?
How can we fix the issue?
Because the child processes (and only the child processes) should be redirecting, we should open, dup2, and close in child-specific code. A happy side effect is that the parent's STDOUT_FILENO is never touched, so every "Publishing" line still reaches the terminal. Solution:
static void publish(const char *name) {
    printf("Publishing date and time to file named \"%s\".\n", name);
    if (fork() > 0) return;
    int outfile = open(name, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    dup2(outfile, STDOUT_FILENO);
    close(outfile);
    char *argv[] = { "date", NULL };
    execvp(argv[0], argv);
}
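The solution leaves two loose ends: the parent never waits for its children, and a child whose execvp fails would return into the caller. A hedged sketch of one way to tidy both is shown below. The changed return type of publish and the driver publishAll are assumptions made for illustration, not part of the lecture code.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Variant of publish that returns the child's PID (or -1 if fork fails).
pid_t publish(const char *name) {
    printf("Publishing date and time to file named \"%s\".\n", name);
    fflush(stdout);              // keep buffered text out of the child's copy of stdout
    pid_t pid = fork();
    if (pid != 0) return pid;    // parent (or fork failure): nothing is redirected
    int outfile = open(name, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (outfile == -1) _exit(1);
    dup2(outfile, STDOUT_FILENO);
    close(outfile);
    char *argv[] = { "date", NULL };
    execvp(argv[0], argv);
    _exit(127);                  // reached only if execvp fails
}

// Hypothetical driver: publishes to every name, then reaps every child.
// Returns the number of children that exited with status 0.
int publishAll(int count, char *names[]) {
    int launched = 0;
    for (int i = 0; i < count; i++) {
        if (publish(names[i]) > 0) launched++;
    }
    int succeeded = 0;
    for (int i = 0; i < launched; i++) {
        int status = 0;
        if (wait(&status) > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0) {
            succeeded++;
        }
    }
    return succeeded;
}
```

A main function could then call publishAll(argc - 1, argv + 1). All children are launched before any is reaped, so the date commands still run in parallel.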
Let's implement a custom function called captureProcess, like subprocess except instead of setting up a pipe to write to the child's STDIN, it's a pipe to read from its STDOUT.
subprocess_t captureProcess(char *command);
It returns a struct containing:
the PID of the child process
a file descriptor we can use to read from the child's STDOUT
subprocess_t captureProcess(char *command) {
    int fds[2];
    pipe(fds);
    pid_t pidOrZero = fork();
    if (pidOrZero == 0) {
        // We are not reading from the pipe, only writing to it
        close(fds[0]);
        // Duplicate the write end of the pipe into STDOUT
        dup2(fds[1], STDOUT_FILENO);
        close(fds[1]);
        char *arguments[] = {"/bin/sh", "-c", command, NULL};
        execvp(arguments[0], arguments);
        exitIf(true, kExecFailed, stderr, "execvp failed to invoke this: %s.\n", command);
    }
    close(fds[1]);
    return (subprocess_t) { pidOrZero, fds[0] };
}
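The descriptor returned by captureProcess is read like any other pipe: a single read may return only part of the child's output, and end-of-file arrives once every write end has been closed. A minimal sketch of a helper that gathers everything into a buffer follows. The name readUntilEOF is hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

// Hypothetical helper: reads from `fd` until end-of-file or until
// `capacity - 1` bytes are stored, then null-terminates the buffer.
// Returns the number of bytes stored, or -1 on error.
ssize_t readUntilEOF(int fd, char *buffer, size_t capacity) {
    if (capacity == 0) return -1;
    size_t total = 0;
    while (total < capacity - 1) {
        ssize_t count = read(fd, buffer + total, capacity - 1 - total);
        if (count == 0) break;             // EOF: every write end has been closed
        if (count == -1) {
            if (errno == EINTR) continue;  // interrupted by a signal: retry
            return -1;
        }
        total += (size_t) count;
    }
    buffer[total] = '\0';
    return (ssize_t) total;
}
```

A caller would pass the descriptor from the returned struct, close it afterwards, and then call waitpid on the child's PID. This is also why captureProcess closes fds[1] in the parent: if the parent kept a write end open, the loop would never see end-of-file.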