CS110 Lecture 9: Pipes and Interprocess Communication, Part 2

CS110: Principles of Computer Systems

Winter 2021-2022

Stanford University

Instructors: Nick Troccoli and Jerry Cain

Illustration courtesy of Roz Cyrus.

CS110 Topic 2: How can our program create and interact with other programs?

Learning About Processes

Creating processes and running other programs (Lecture 6/7)
Inter-process communication and Pipes (this lecture)
Signals (Lecture 10/11)
Race Conditions (Lecture 11)

assign3: implement multiprocessing programs like "trace" (to trace another program's behavior) and "farm" (parallelize tasks)

assign4: implement your own shell!

Learning Goals

Get more practice creating and using pipes
Learn about dup2 to create and manipulate file descriptors
Use pipes to redirect process input and output

Lecture Plan

Review: pipes
Redirecting process I/O
Practice: Implementing subprocess
Practice: Implementing pipeline

Pipes

A pipe is a set of two file descriptors representing a "virtual file" that can be written to and read from
It's not actually a physical file on disk - we are just using files as an abstraction
Any data you write to the write FD can be read from the read FD
Because file descriptors are duplicated on fork(), we can create pipes that are shared across processes!

Key Idea: because the pipe file descriptors are duplicated in the child, we need to close the 2 pipe ends in both the parent and the child.

Here's an example program showing how pipe works across processes (full program link at bottom).

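#include <stdio.h>     // printf
#include <string.h>    // strlen
#include <sys/wait.h>  // waitpid
#include <unistd.h>    // pipe, fork, read, write, close

static const char * kPipeMessage = "Hello, this message is coming through a pipe.";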
int main(int argc, char *argv[]) {
    int fds[2];
    pipe(fds);
    size_t bytesSent = strlen(kPipeMessage) + 1;

    pid_t pidOrZero = fork();
    if (pidOrZero == 0) {
        // In the child, we only read from the pipe
        close(fds[1]);
        char buffer[bytesSent];
        read(fds[0], buffer, sizeof(buffer));
        close(fds[0]);
        printf("Message from parent: %s\n", buffer);
        return 0;
    }

    // In the parent, we only write to the pipe (assume everything is written)
    close(fds[0]);
    write(fds[1], kPipeMessage, bytesSent);
    close(fds[1]);
    waitpid(pidOrZero, NULL, 0);
    return 0;
}

parent-child-pipe.c

Parent-Child Communication

This method of communication between processes relies on the fact that file descriptors are duplicated when forking.

Each process has its own copy of both file descriptors for the pipe.
Both processes could read from or write to the pipe if they wanted.
Each process must therefore close both of its file descriptors for the pipe when it is finished with them.

This is the core idea behind how a shell can support piping between processes (e.g. cat file.txt | uniq | sort).
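
Closing unused ends matters for more than tidiness: a reader only sees end-of-file (read returning 0) once every copy of the write end has been closed. The sketch below illustrates that under simple assumptions. The function name read_all_from_child and the three-line message are made up for illustration and are not part of the lecture code.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: forks a child that writes three short lines into a
// pipe, then reads in the parent until end-of-file.
// Returns the number of bytes stored in buf, or -1 on error.
ssize_t read_all_from_child(char *buf, size_t cap) {
    int fds[2];
    if (pipe(fds) == -1) return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        // Child: writer only, so it closes its copy of the read end.
        close(fds[0]);
        const char *lines[] = { "one\n", "two\n", "three\n" };
        for (size_t i = 0; i < 3; i++) {
            if (write(fds[1], lines[i], strlen(lines[i])) == -1) _exit(1);
        }
        close(fds[1]);
        _exit(0);
    }

    // Parent: reader only. Without this close, the parent's own copy of the
    // write end would stay open and the loop below would never see EOF.
    close(fds[1]);

    size_t total = 0;
    ssize_t n = 0;
    while (total < cap && (n = read(fds[0], buf + total, cap - total)) > 0) {
        total += (size_t) n;
    }
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return n < 0 ? -1 : (ssize_t) total;
}
```

Calling read_all_from_child with a 64-byte buffer should yield the three lines back to back. Commenting out the parent's close(fds[1]) is a quick way to see the read loop hang.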

Redirecting Process I/O

Each process has the special file descriptors STDIN (0), STDOUT (1) and STDERR (2)
Processes assume these indexes are for these methods of communication (e.g. printf always outputs to file descriptor 1, STDOUT).

Idea: what happens if we change FD 1 to point somewhere else?

[Diagram: a file descriptor table with entries 0 through 3, alongside the terminal and a file that the entries can point to.]

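#include <fcntl.h>   // open, O_WRONLY, O_CREAT, O_TRUNC
#include <stdio.h>   // printf, fflush
#include <unistd.h>  // close, STDOUT_FILENO

int main() {
    printf("This will print to the terminal\n");
    close(STDOUT_FILENO);
    
    // open returns the lowest unused descriptor, so fd is 1 here
    // (FD 0 is still open and FD 1 was just closed)
    int fd = open("myfile.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
	
    printf("This will print to myfile.txt!\n");
    fflush(stdout);  // make sure buffered text reaches the file before closing
    close(fd);
    return 0;
}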

Idea: what happens if we change a special FD to point somewhere else?

Could we do this with a pipe?

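[Diagram: Process 1 and Process 2 connected by a pipe, with the pipe's write end and read end standing in for entries in the file descriptor table (0, 1, 2).]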

Why would this be useful?

I/O redirection and pipes allow us to handle piping in our shell: e.g. cat file.txt | sort

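[Diagram: cat and sort each have file descriptors 0, 1, and 2; the two processes are joined by a pipe (pipe WRITE on the cat side, pipe READ on the sort side), and the remaining descriptors stay attached to the terminal.]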

This allows the shell to link together two distinct executables without them knowing. (How?)

Stepping stone: our first goal is to write a program that spawns another program and sends data to its STDIN.

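[Diagram: our program keeps descriptors 0, 1, and 2 on the terminal and holds the pipe's write end as an additional descriptor; sort's STDIN is the pipe's read end.]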

The sort executable has no idea its input is not coming from terminal entry!

Our first goal is to write a program that spawns another program and sends data to its STDIN.

Our program creates a pipe
Our program spawns a child process
That child process changes its STDIN to be the pipe read end (how?)
That child process calls execvp to run the specified command
The parent writes to the write end of the pipe, which appears to the child as its STDIN

"Wait a minute...I thought execvp consumed the process? How do the file descriptors stick around?"

New insight: execvp consumes the process, but leaves the file descriptor table intact!
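
One way to convince yourself of this is to hand a freshly exec'ed program a descriptor it never opened. In the sketch below (the helper name read_from_inherited_fd and the choice of descriptor 3 are illustrative assumptions), the child places a pipe's write end at descriptor 3 and then execs a shell whose command writes to ">&3". The parent can only receive the text if descriptor 3 survived the execvp.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: returns the number of bytes the parent read from the
// pipe (stored in buf), or -1 on error.
ssize_t read_from_inherited_fd(char *buf, size_t cap) {
    int fds[2];
    if (pipe(fds) == -1) return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);
        // Make sure the write end lives at descriptor 3 before the exec.
        if (fds[1] != 3) {
            dup2(fds[1], 3);
            close(fds[1]);
        }
        // The shell never opens descriptor 3 itself; it simply inherits it.
        char *arguments[] = { "/bin/sh", "-c", "echo inherited >&3", NULL };
        execvp(arguments[0], arguments);
        _exit(127);  // only reached if execvp fails
    }

    close(fds[1]);
    size_t total = 0;
    ssize_t n = 0;
    while (total < cap && (n = read(fds[0], buf + total, cap - total)) > 0) {
        total += (size_t) n;
    }
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return n < 0 ? -1 : (ssize_t) total;
}
```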

One issue: how do we "connect" our pipe FDs to STDIN/STDOUT?

Redirecting Process I/O

dup2 makes a copy of a file descriptor entry and puts it in another file descriptor index. If the second parameter is an already-open file descriptor, it is closed before being used.

int dup2(int oldfd, int newfd);

Example: we can use dup2 to copy the pipe read file descriptor into standard input!

dup2(fds[0], STDIN_FILENO);
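
The same call works for any descriptor, not just pipe ends. As a small sketch (the helper name print_line_to_file is hypothetical, and the related POSIX call dup is used here only to remember the original STDOUT), the function below points STDOUT at a file, prints one line, and then points STDOUT back where it was.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

// Hypothetical helper: prints one line to `path` by temporarily pointing
// STDOUT at that file. Returns 0 on success, -1 on error.
int print_line_to_file(const char *path, const char *line) {
    fflush(stdout);                      // flush text meant for the old target
    int saved = dup(STDOUT_FILENO);      // keep a copy so we can restore it
    if (saved == -1) return -1;

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) {
        close(saved);
        return -1;
    }
    if (dup2(fd, STDOUT_FILENO) == -1) { // FD 1 now refers to the file
        close(fd);
        close(saved);
        return -1;
    }
    close(fd);                           // FD 1 still keeps the file open

    printf("%s\n", line);
    fflush(stdout);                      // push the text out before switching back

    int result = dup2(saved, STDOUT_FILENO) == -1 ? -1 : 0;
    close(saved);
    return result;
}
```

Note the close(fd) right after dup2: once the entry has been copied into slot 1, the original descriptor is redundant, which is the same pattern the pipe examples in this lecture follow.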

subprocess

To practice this piping technique, let's implement a custom function called subprocess.

subprocess_t subprocess(char *command);

subprocess is the same as mysystem, except it also sets up a pipe we can use to write to the child process's STDIN.

It returns a struct containing:

the PID of the child process
a file descriptor we can use to write to the child's STDIN

Demo: subprocess

subprocess-soln.c
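
For reference while following the demo, here is one possible shape for subprocess. It is a sketch rather than the contents of subprocess-soln.c: the struct field names pid and fd are assumptions, error handling is minimal, and the command is run through the same /bin/sh -c invocation that captureProcess uses later in these notes.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Assumed layout: the lecture only says the struct holds a PID and a
// descriptor for writing to the child's STDIN.
typedef struct {
    pid_t pid;
    int fd;
} subprocess_t;

subprocess_t subprocess(char *command) {
    int fds[2];
    if (pipe(fds) == -1) return (subprocess_t) { -1, -1 };

    pid_t pidOrZero = fork();
    if (pidOrZero == -1) {
        close(fds[0]);
        close(fds[1]);
        return (subprocess_t) { -1, -1 };
    }
    if (pidOrZero == 0) {
        // Child: it only reads, so drop the write end first.
        close(fds[1]);

        // Copy the read end into STDIN, then drop the now-redundant original.
        dup2(fds[0], STDIN_FILENO);
        close(fds[0]);

        char *arguments[] = { "/bin/sh", "-c", command, NULL };
        execvp(arguments[0], arguments);
        _exit(127);  // only reached if execvp fails
    }

    // Parent: it only writes, so drop the read end and hand back the write end.
    close(fds[0]);
    return (subprocess_t) { pidOrZero, fds[1] };
}
```

A caller writes to the returned descriptor, closes it so the child sees end-of-file on its STDIN, and then calls waitpid on the returned PID.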

Pipeline

Final task: write a program that spawns two child processes and connects the first child's STDOUT to the second child's STDIN.

Our final goal is to write a program that spawns two other processes where one's output is the other's input. Both processes should run in parallel.

Our program creates a pipe
Our program spawns a child process
That child process changes its STDOUT to be the pipe write end
That child process calls execvp to run the first specified command
Our program spawns another child process
That child process changes its STDIN to be the pipe read end
That child process calls execvp to run the second specified command

pipeline

Let's implement a custom function called pipeline.

void pipeline(char *argv1[], char *argv2[], pid_t pids[]);

pipeline is similar to subprocess, except that the data written into the pipe comes from another child process, whose STDOUT is redirected to the pipe's write end, rather than from the parent. Both children should run in parallel.

It doesn't return anything, but it writes the two children's PIDs to the specified pids array.

Demo: pipeline

pipeline-soln.c

pipe2

There were a lot of close() calls! Is there a way for any of them to be done automatically?

int pipe2(int fds[], int flags);

pipe2 is the same as pipe except it lets you customize the pipe with some optional flags.

if flags is 0, it's the same as pipe
if flags is O_CLOEXEC, the pipe FDs will be automatically closed when the surrounding process calls execvp.
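
It can help to see what the flag amounts to. pipe2 is not available on every system, and a rough stand-in is to call pipe and then set the close-on-exec flag on each end with fcntl. The helper below (pipe_cloexec is a made-up name) is a sketch of that idea. Unlike pipe2 it is not atomic, so in a multithreaded program another thread could fork and exec between the two steps.

```c
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: creates a pipe and marks both ends close-on-exec.
// Returns 0 on success, -1 on error.
int pipe_cloexec(int fds[2]) {
    if (pipe(fds) == -1) return -1;
    for (int i = 0; i < 2; i++) {
        int flags = fcntl(fds[i], F_GETFD);
        if (flags == -1 || fcntl(fds[i], F_SETFD, flags | FD_CLOEXEC) == -1) {
            close(fds[0]);
            close(fds[1]);
            return -1;
        }
    }
    return 0;
}
```

The flag belongs to the descriptor, not to the pipe. A copy made with dup2 onto STDIN or STDOUT does not carry the flag, so the redirected descriptor survives execvp while the original pipe descriptors are closed.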

pipeline

void pipeline(char *argv1[], char *argv2[], pid_t pids[]) {
  int fds[2];
  pipe(fds);

  pids[0] = fork();
  if (pids[0] == 0) {
    close(fds[0]);
    dup2(fds[1], STDOUT_FILENO);
    close(fds[1]);
    execvp(argv1[0], argv1);
  }

  close(fds[1]);

  pids[1] = fork();  
  if (pids[1] == 0) {
    dup2(fds[0], STDIN_FILENO);
    close(fds[0]);
    execvp(argv2[0], argv2);
  }

  close(fds[0]);
}

The calls to close() inside the two child branches (the ones made just before each execvp) would no longer be necessary if we used pipe2 with O_CLOEXEC, because each of those child processes goes on to call execvp.

Note that the parent must still close them because it doesn't call execvp.
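
pipeline only launches the children; the caller is still responsible for waiting on both PIDs. The sketch below shows one way a caller might do that. So that the block compiles on its own, it repeats the pipe version of pipeline with two small additions that are not in the lecture code (a check on pipe and an _exit after each execvp). The helper run_pipeline_and_wait is hypothetical.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Same structure as the lecture's pipeline, lightly hardened.
void pipeline(char *argv1[], char *argv2[], pid_t pids[]) {
    int fds[2];
    if (pipe(fds) == -1) {
        pids[0] = pids[1] = -1;
        return;
    }

    pids[0] = fork();
    if (pids[0] == 0) {
        close(fds[0]);
        dup2(fds[1], STDOUT_FILENO);
        close(fds[1]);
        execvp(argv1[0], argv1);
        _exit(127);  // only reached if execvp fails
    }
    close(fds[1]);

    pids[1] = fork();
    if (pids[1] == 0) {
        dup2(fds[0], STDIN_FILENO);
        close(fds[0]);
        execvp(argv2[0], argv2);
        _exit(127);  // only reached if execvp fails
    }
    close(fds[0]);
}

// Hypothetical caller: runs argv1 | argv2, waits for both children, and
// returns the exit status of the second command (or -1 on failure).
int run_pipeline_and_wait(char *argv1[], char *argv2[]) {
    pid_t pids[2];
    pipeline(argv1, argv2, pids);

    int status1 = 0, status2 = 0;
    if (pids[0] > 0) waitpid(pids[0], &status1, 0);
    if (pids[1] <= 0 || waitpid(pids[1], &status2, 0) == -1) return -1;
    return WIFEXITED(status2) ? WEXITSTATUS(status2) : -1;
}
```

For example, passing { "echo", "hello", NULL } and { "grep", "-q", "hello", NULL } behaves like echo hello | grep -q hello in a shell.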

pipeline with pipe2

void pipeline(char *argv1[], char *argv2[], pid_t pids[]) {
  int fds[2];
  pipe2(fds, O_CLOEXEC);

  pids[0] = fork();
  if (pids[0] == 0) {
    dup2(fds[1], STDOUT_FILENO);
    execvp(argv1[0], argv1);
  }

  close(fds[1]);

  pids[1] = fork();
  if (pids[1] == 0) {
    dup2(fds[0], STDIN_FILENO);
    execvp(argv2[0], argv2);
  }

  close(fds[0]);
}

This version of pipeline uses pipe2 with O_CLOEXEC.

Pipes and I/O Redirection: Key Takeaways

Pipes are sets of file descriptors that allow us to communicate across processes.
Processes can share these file descriptors because they are copied on fork()
File descriptors 0, 1, and 2 are special and assumed to represent STDIN, STDOUT, and STDERR
If we change those file descriptors to point to other resources, we can redirect STDIN/STDOUT/STDERR to be something else without the program knowing!
Pipes and file descriptor redirection are how shells implement piping and redirection (command1 | command2 and command1 > file.txt)!

Lecture Recap

Review: pipes
Redirecting process I/O
Practice: Implementing subprocess
Practice: Implementing pipeline

Next time: signals (another form of interprocess communication)

Practice Problems

A Publishing Error

The program below takes an arbitrary number of filenames as arguments and attempts to publish the date and time to each one. The desired behavior is shown after the code:

static void publish(const char *name) {
    printf("Publishing date and time to file named \"%s\".\n", name);
    int outfile = open(name, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    dup2(outfile, STDOUT_FILENO);
    close(outfile);
    if (fork() > 0) return;
    char *argv[] = { "date", NULL };
    execvp(argv[0], argv);
}
 
int main(int argc, char *argv[]) {
    for (int i = 1; i < argc; i++) publish(argv[i]);
    return 0;
}

publish.c

myth62:~$ ./publish one two three four
Publishing date and time to file named "one".
Publishing date and time to file named "two".
Publishing date and time to file named "three".
Publishing date and time to file named "four".

However, the program is buggy!

What text is actually printed to standard output?

What does each of the four files contain?

How can we fix the issue?

Because the child processes (and only the child processes) should be redirecting, we should open, dup2, and close in child-specific code. A happy side effect of the change is that we never muck with STDOUT_FILENO in the parent if we confine the redirection code to the child. Solution:

static void publish(const char *name) {
    printf("Publishing date and time to file named \"%s\".\n", name); 
    if (fork() > 0) return;
    int outfile = open(name, O_WRONLY | O_CREAT | O_TRUNC, 0644); 
    dup2(outfile, STDOUT_FILENO);
    close(outfile);
    char *argv[] = { "date", NULL };
    execvp(argv[0], argv);
}

captureProcess

Let's implement a custom function called captureProcess, like subprocess except instead of setting up a pipe to write to the child's STDIN, it's a pipe to read from its STDOUT.

subprocess_t captureProcess(char *command);

It returns a struct containing:

the PID of the child process

a file descriptor we can use to read from the child's STDOUT

subprocess_t captureProcess(char *command) {
    int fds[2];
    pipe(fds);
    
    pid_t pidOrZero = fork();
    if (pidOrZero == 0) {
        // We are not reading from the pipe, only writing to it
        close(fds[0]);

        // Duplicate the write end of the pipe into STDOUT
        dup2(fds[1], STDOUT_FILENO);
        close(fds[1]);

        char *arguments[] = {"/bin/sh", "-c", command, NULL};
        execvp(arguments[0], arguments);
        exitIf(true, kExecFailed, stderr, "execvp failed to invoke this: %s.\n", command);
    }

    close(fds[1]);
    return (subprocess_t) { pidOrZero, fds[0] };
}

captureProcess.c
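
A caller of captureProcess typically reads from the returned descriptor until end-of-file and then waits for the child. The sketch below shows that pattern. To keep the block compilable on its own it repeats captureProcess with the course's exitIf helper swapped for a plain _exit, and it assumes the struct fields are named pid and fd. The helper capture_output is hypothetical.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Assumed layout: a PID followed by a descriptor, matching the order used
// in the return statement of captureProcess.
typedef struct {
    pid_t pid;
    int fd;
} subprocess_t;

subprocess_t captureProcess(char *command) {
    int fds[2];
    if (pipe(fds) == -1) return (subprocess_t) { -1, -1 };

    pid_t pidOrZero = fork();
    if (pidOrZero == -1) {
        close(fds[0]);
        close(fds[1]);
        return (subprocess_t) { -1, -1 };
    }
    if (pidOrZero == 0) {
        close(fds[0]);
        dup2(fds[1], STDOUT_FILENO);
        close(fds[1]);
        char *arguments[] = { "/bin/sh", "-c", command, NULL };
        execvp(arguments[0], arguments);
        _exit(127);  // stand-in for the exitIf call in captureProcess.c
    }

    close(fds[1]);
    return (subprocess_t) { pidOrZero, fds[0] };
}

// Hypothetical caller: stores up to cap bytes of the command's output in buf.
// Returns the number of bytes stored, or -1 on error.
ssize_t capture_output(char *command, char *buf, size_t cap) {
    subprocess_t sp = captureProcess(command);
    if (sp.pid <= 0) return -1;

    size_t total = 0;
    ssize_t n = 0;
    while (total < cap && (n = read(sp.fd, buf + total, cap - total)) > 0) {
        total += (size_t) n;
    }
    close(sp.fd);
    waitpid(sp.pid, NULL, 0);
    return n < 0 ? -1 : (ssize_t) total;
}
```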