CS110 Lecture 8: Pipes and Interprocess Communication, Part 1

CS110: Principles of Computer Systems

Winter 2021-2022

Stanford University

Instructors: Nick Troccoli and Jerry Cain

The Stanford University logo

Illustration courtesy of Roz Cyrus.

CS110 Topic 2: How can our program create and interact with other programs?

Learning About Processes

Creating processes and running other programs

Inter-process communication and Pipes

Signals

Race Conditions

Lecture 6/7

This/next lecture

Lecture 10/11

Lecture 11

assign3: implement multiprocessing programs like "trace" (to trace another program's behavior) and "farm" (parallelize tasks)

assign4: implement your own shell!

Learning Goals

  • Get more practice with using fork() and execvp
  • Learn about pipe and dup2 to create and manipulate file descriptors
  • Use pipes to redirect process input and output

Lecture Plan

  • Review: our first shell
  • Running in the background
  • Introducing Pipes
    • ​​What are pipes?
    • Pipes between processes

Lecture Plan

  • Review: our first shell
  • Running in the background
  • Introducing Pipes
    • ​​What are pipes?
    • Pipes between processes

fork()

  • A system call that creates a new child process
  • The "parent" is the process that creates the other "child" process
  • From then on, both processes are running the code after the fork
  • The child process is identical to the parent, except:
    • it has a new Process ID (PID)
    • for the parent, fork() returns the PID of the child; for the child, fork() returns 0
    • fork() is called once, but returns twice
pid_t pidOrZero = fork();
// both parent and child run code here onwards
printf("This is printed by two processes.\n");

waitpid()

A function that a parent can call to wait for its child to exit:

pid_t waitpid(pid_t pid, int *status, int options);
  • pid: the PID of the child to wait on, or -1 to wait on any of our children
  • status: where to put info about the child's termination (or NULL)
  • options: optional flags to customize behavior (always 0 for now)

 

The function returns when the specified child process exits.

  • the return value is the PID of the child that exited, or -1 on error (e.g. no child to wait on)
  • If the child process has already exited, this returns immediately - otherwise, it blocks
  • It's important to wait on all children to clean up system resources

execvp is a function that lets us run another program in the current process.

int execvp(const char *path, char *argv[]);

execvp()

It runs the executable at the specified path, completely cannibalizing the current process.

  • If successful, execvp never returns in the calling process
  • If unsuccessful, execvp returns -1

To run another executable, we must specify the (NULL-terminated) arguments to be passed into its main function, via the argv parameter.

  • For our programs, path and argv[0] will be the same

execvp has many variants (execle, execlp, and so forth. Type man execvp for more). We rely on execvp in CS110.

Revisiting mysystem

mysystem is our own version of the built-in function system.

  • It takes in a terminal command (e.g. "ls -l /usr/class/cs110"), executes it in a separate process, and returns when that process is finished.
    • We can use fork to create the child process
    • We can use execvp in that child process to execute the terminal command
    • We can use waitpid in the parent process to wait for the child to terminate

Revisiting first-shell

int main(int argc, char *argv[]) {
    char command[kMaxLineLength];
    while (true) {
        printf("> ");
        fgets(command, sizeof(command), stdin);
    
        // If the user entered Ctl-d, stop
        if (feof(stdin)) {
            break;
        }
    
        // Remove the \n that fgets puts at the end
        command[strlen(command) - 1] = '\0';

        int commandReturnCode = mysystem(command);
        printf("return code = %d\n", commandReturnCode);
    }
  
    printf("\n");
    return 0;
}

Our first-shell program is a loop in main that parses the user input and passes it to mysystem.

static int mysystem(char *command) {
    pid_t pidOrZero = fork();
    if (pidOrZero == 0) {
        char *arguments[] = {"/bin/sh", "-c", command, NULL};
        execvp(arguments[0], arguments);
        // If the child gets here, there was an error
        exitIf(true, kExecFailed, stderr, "execvp failed to invoke this: %s.\n", command);
    }

    // If we are the parent, wait for the child
    int status;
    waitpid(pidOrZero, &status, 0);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -WTERMSIG(status);
}

Revisiting first-shell

first-shell Takeaways

  • A shell is a program that repeats: read command from the user, execute that command
  • In order to execute a program and continue running the shell afterwards, we fork off another process and run the program in that process
  • We rely on fork, execvp, and waitpid to do this!
  • Real shells have more advanced functionality that we will add going forward.
  • For your fourth assignment, you'll build on this with your own shell, stsh ("Stanford shell") with much of the functionality of real Unix shells.

Lecture Plan

  • Review: our first shell
  • Running in the background
  • Introducing Pipes
    • ​​What are pipes?
    • Pipes between processes​

Supporting Background Execution

Shells usually also let you run a command in the background by adding "&" at the end:

  • e.g. sort myfile.txt &  - create a sort process and run it in the background
  • only difference is specifying & with command
  • shell immediately re-prompts the user
  • process doesn't know "foreground" vs. "background"; the "&" just specifies whether or not the shell waits

Supporting Background Execution

Let's make an updated version of mysystem called executeCommand.

  • Takes an additional parameter bool inBackground
    • If false, same behavior as mysystem (spawn child, execvp, wait for child)
    • If true, spawn child, execvp, but don't wait for child

Supporting Background Execution

static void executeCommand(char *command, bool inBackground) {
    pid_t pidOrZero = fork();
    if (pidOrZero == 0) {
        // If we are the child, execute the shell command
        char *arguments[] = {"/bin/sh", "-c", command, NULL};
        execvp(arguments[0], arguments);
        // If the child gets here, there was an error
        exitIf(true, kExecFailed, stderr, "execvp failed to invoke this: %s.\n", command);
    }

    // If we are the parent, either wait or return immediately
    if (inBackground) {
        printf("%d %s\n", pidOrZero, command);
    } else {
        waitpid(pidOrZero, NULL, 0);
    }
}

Supporting Background Execution

static void executeCommand(char *command, bool inBackground) {
    pid_t pidOrZero = fork();
    if (pidOrZero == 0) {
        // If we are the child, execute the shell command
        char *arguments[] = {"/bin/sh", "-c", command, NULL};
        execvp(arguments[0], arguments);
        // If the child gets here, there was an error
        exitIf(true, kExecFailed, stderr, "execvp failed to invoke this: %s.\n", command);
    }

    // If we are the parent, either wait or return immediately
    if (inBackground) {
        printf("%d %s\n", pidOrZero, command);
    } else {
        waitpid(pidOrZero, NULL, 0);
    }
}

Line 1: Now, the caller can optionally run the command in the background.

Supporting Background Execution

static void executeCommand(char *command, bool inBackground) {
    pid_t pidOrZero = fork();
    if (pidOrZero == 0) {
        // If we are the child, execute the shell command
        char *arguments[] = {"/bin/sh", "-c", command, NULL};
        execvp(arguments[0], arguments);
        // If the child gets here, there was an error
        exitIf(true, kExecFailed, stderr, "execvp failed to invoke this: %s.\n", command);
    }

    // If we are the parent, either wait or return immediately
    if (inBackground) {
        printf("%d %s\n", pidOrZero, command);
    } else {
        waitpid(pidOrZero, NULL, 0);
    }
}

Lines 11-16: The parent waits on a foreground child, but not a background child.

Supporting Background Execution

int main(int argc, char *argv[]) {
    char command[kMaxLineLength];
    while (true) {
        printf("> ");
        fgets(command, sizeof(command), stdin);
    
        // If the user entered Ctl-d, stop
        if (feof(stdin)) {
            break;
        }
    
        // Remove the \n that fgets puts at the end
        command[strlen(command) - 1] = '\0';

        if (strcmp(command, "quit") == 0) break;

        bool isbg = command[strlen(command) - 1] == '&';
        if (isbg) {
            command[strlen(command) - 1] = '\0';
        }

        executeCommand(command, isbg);
    }
  
    printf("\n");
    return 0;
}

In main, we add two additional things:

  • Check for the "quit" command to exit
  • Allow the user to add "&" at the end of a command to run that command in the background

Note that a background child isn't reaped! This is a problem - one we'll learn how to fix soon.

Lecture Plan

  • Review: our first shell
  • Running in the background
  • Introducing Pipes
    • ​​What are pipes?
    • Pipes between processes

Is there a way that the parent and child processes can communicate?

Interprocess Communication

  • It's useful for a parent process to communicate with its child (and vice versa)
  • There are two key ways we will learn to do this: pipes and signals
    • Pipes let two processes send and receive arbitrary data
    • Signals let two processes send and receive certain "signals" that indicate something special has happened. 

Interprocess Communication

  • It's useful for a parent process to communicate with its child (and vice versa)
  • There are two key ways we will learn to do this: pipes and signals
    • Pipes let two processes send and receive arbitrary data
    • Signals let two processes send and receive certain "signals" that indicate something special has happened. 

Pipes

  • How can we let two processes send arbitrary data back and forth?
  • A core Unix principle is modeling things as files.  Could we use a "file"?
  • Idea: a file that one process could write, and another process could read?
  • Problem: we don't want to clutter the filesystem with actual files every time two processes want to communicate.
  • Solution: have the operating system set this up for us.  
    • It will give us two new file descriptors - one for writing, another for reading.
    • If someone writes data to the write FD, it can be read from the read FD.
    • It's not actually a physical file on disk - we are just using files as an abstraction

The pipe system call populates the 2-element array fds with two file descriptors such that everything written to fds[1]can be read from fds[0].  Returns 0 on success, or -1 on error.

int pipe(int fds[]);

pipe()

static const char * kPipeMessage = "Hello, this message is coming through a pipe.";
int main(int argc, char *argv[]) {
    int fds[2];
    int result = pipe(fds);

    // Write message to pipe (assuming here all bytes written immediately)
    write(fds[1], kPipeMessage, strlen(kPipeMessage) + 1);
    close(fds[1]);

    // Read message from pipe
    char receivedMessage[strlen(kPipeMessage) + 1];
    read(fds[0], receivedMessage, sizeof(receivedMessage));
    close(fds[0]);
    printf("Message read: %s\n", receivedMessage);
  
    return 0;
}

Tip: you learn to read before you learn to write (read = fds[0], write = fds[1]).

$ ./pipe-demo
Message read: Hello, this message is coming through a pipe.

Tip: you learn to read before you learn to write (read = fds[0], write = fds[1]).

Lecture Plan

  • Review: fork() and execvp()
  • Running in the background
  • Introducing Pipes
    • What are pipes?
    • Pipes between processes

pipe()

pipe can allow processes to communicate!

  • The parent's file descriptor table is replicated in the child - both have pipe access (increasing reference counts in open file table)
  • E.g. the parent can write to the "write" end and the child can read from the "read" end
  • Because they're file descriptors, there's no global name for the pipe (another process can't "connect" to the pipe).
  • Each pipe is uni-directional (one end is read, the other write)

Illustration courtesy of Roz Cyrus.

Let's write a program where the parent writes something to the pipe, and the child reads that from the pipe.

Illustration courtesy of Roz Cyrus.

Key Idea: because the pipe file descriptors are duplicated in the child, we need to close the 2 pipe ends in both the parent and the child.

Here's an example program showing how pipe works across processes (full program link at bottom).

static const char * kPipeMessage = "Hello, this message is coming through a pipe.";
int main(int argc, char *argv[]) {
    int fds[2];
    pipe(fds);
    size_t bytesSent = strlen(kPipeMessage) + 1;

    pid_t pidOrZero = fork();
    if (pidOrZero == 0) {
        // In the child, we only read from the pipe
        close(fds[1]);
        char buffer[bytesSent];
        read(fds[0], buffer, sizeof(buffer));
        close(fds[0]);
        printf("Message from parent: %s\n", buffer);
        return 0;
    }

    // In the parent, we only write to the pipe (assume everything is written)
    close(fds[0]);
    write(fds[1], kPipeMessage, bytesSent);
    close(fds[1]);
    waitpid(pidOrZero, NULL, 0);
    return 0;
}

Parent-Child Communication

Make a pipe just like before.

static const char * kPipeMessage = "Hello, this message is coming through a pipe.";
int main(int argc, char *argv[]) {
    int fds[2];
    pipe(fds);
    size_t bytesSent = strlen(kPipeMessage) + 1;

    pid_t pidOrZero = fork();
    if (pidOrZero == 0) {
        // In the child, we only read from the pipe
        close(fds[1]);
        char buffer[bytesSent];
        read(fds[0], buffer, sizeof(buffer));
        close(fds[0]);
        printf("Message from parent: %s\n", buffer);
        return 0;
    }

    // In the parent, we only write to the pipe (assume everything is written)
    close(fds[0]);
    write(fds[1], kPipeMessage, bytesSent);
    close(fds[1]);
    waitpid(pidOrZero, NULL, 0);
    return 0;
}

Parent-Child Communication

The parent must close all its open FDs.  It never uses the Read FD so we can close it here.

static const char * kPipeMessage = "Hello, this message is coming through a pipe.";
int main(int argc, char *argv[]) {
    int fds[2];
    pipe(fds);
    size_t bytesSent = strlen(kPipeMessage) + 1;

    pid_t pidOrZero = fork();
    if (pidOrZero == 0) {
        // In the child, we only read from the pipe
        close(fds[1]);
        char buffer[bytesSent];
        read(fds[0], buffer, sizeof(buffer));
        close(fds[0]);
        printf("Message from parent: %s\n", buffer);
        return 0;
    }

    // In the parent, we only write to the pipe (assume everything is written)
    close(fds[0]);
    write(fds[1], kPipeMessage, bytesSent);
    close(fds[1]);
    waitpid(pidOrZero, NULL, 0);
    return 0;
}

Parent-Child Communication

Write to the Write FD to send a message to the child.

static const char * kPipeMessage = "Hello, this message is coming through a pipe.";
int main(int argc, char *argv[]) {
    int fds[2];
    pipe(fds);
    size_t bytesSent = strlen(kPipeMessage) + 1;

    pid_t pidOrZero = fork();
    if (pidOrZero == 0) {
        // In the child, we only read from the pipe
        close(fds[1]);
        char buffer[bytesSent];
        read(fds[0], buffer, sizeof(buffer));
        close(fds[0]);
        printf("Message from parent: %s\n", buffer);
        return 0;
    }

    // In the parent, we only write to the pipe (assume everything is written)
    close(fds[0]);
    write(fds[1], kPipeMessage, bytesSent);
    close(fds[1]);
    waitpid(pidOrZero, NULL, 0);
    return 0;
}

Parent-Child Communication

We are now done with the Write FD so we can close it here.

static const char * kPipeMessage = "Hello, this message is coming through a pipe.";
int main(int argc, char *argv[]) {
    int fds[2];
    pipe(fds);
    size_t bytesSent = strlen(kPipeMessage) + 1;

    pid_t pidOrZero = fork();
    if (pidOrZero == 0) {
        // In the child, we only read from the pipe
        close(fds[1]);
        char buffer[bytesSent];
        read(fds[0], buffer, sizeof(buffer));
        close(fds[0]);
        printf("Message from parent: %s\n", buffer);
        return 0;
    }

    // In the parent, we only write to the pipe (assume everything is written)
    close(fds[0]);
    write(fds[1], kPipeMessage, bytesSent);
    close(fds[1]);
    waitpid(pidOrZero, NULL, 0);
    return 0;
}

Parent-Child Communication

We wait for the child to terminate.

static const char * kPipeMessage = "Hello, this message is coming through a pipe.";
int main(int argc, char *argv[]) {
    int fds[2];
    pipe(fds);
    size_t bytesSent = strlen(kPipeMessage) + 1;

    pid_t pidOrZero = fork();
    if (pidOrZero == 0) {
        // In the child, we only read from the pipe
        close(fds[1]);
        char buffer[bytesSent];
        read(fds[0], buffer, sizeof(buffer));
        close(fds[0]);
        printf("Message from parent: %s\n", buffer);
        return 0;
    }

    // In the parent, we only write to the pipe (assume everything is written)
    close(fds[0]);
    write(fds[1], kPipeMessage, bytesSent);
    close(fds[1]);
    waitpid(pidOrZero, NULL, 0);
    return 0;
}

Parent-Child Communication

static const char * kPipeMessage = "Hello, this message is coming through a pipe.";
int main(int argc, char *argv[]) {
    int fds[2];
    pipe(fds);
    size_t bytesSent = strlen(kPipeMessage) + 1;

    pid_t pidOrZero = fork();
    if (pidOrZero == 0) {
        // In the child, we only read from the pipe
        close(fds[1]);
        char buffer[bytesSent];
        read(fds[0], buffer, sizeof(buffer));
        close(fds[0]);
        printf("Message from parent: %s\n", buffer);
        return 0;
    }

    // In the parent, we only write to the pipe (assume everything is written)
    close(fds[0]);
    write(fds[1], kPipeMessage, bytesSent);
    close(fds[1]);
    waitpid(pidOrZero, NULL, 0);
    return 0;
}

Key Idea: when we call fork, the child gets a copy of the parent's file descriptor table.  Any open FDs in the parent at the time fork is called must be closed in both the parent and the child.

Parent-Child Communication

This duplication means the child's file descriptor table entries point to the same open file table entries as the parent.  Thus, the open file table entries for the two pipe FDs both have reference counts of 2.

static const char * kPipeMessage = "Hello, this message is coming through a pipe.";
int main(int argc, char *argv[]) {
    int fds[2];
    pipe(fds);
    size_t bytesSent = strlen(kPipeMessage) + 1;

    pid_t pidOrZero = fork();
    if (pidOrZero == 0) {
        // In the child, we only read from the pipe
        close(fds[1]);
        char buffer[bytesSent];
        read(fds[0], buffer, sizeof(buffer));
        close(fds[0]);
        printf("Message from parent: %s\n", buffer);
        return 0;
    }

    // In the parent, we only write to the pipe (assume everything is written)
    close(fds[0]);
    write(fds[1], kPipeMessage, bytesSent);
    close(fds[1]);
    waitpid(pidOrZero, NULL, 0);
    return 0;
}

The child must close all its open FDs.  It never uses the Write FD so we can close it here.

Parent-Child Communication

static const char * kPipeMessage = "Hello, this message is coming through a pipe.";
int main(int argc, char *argv[]) {
    int fds[2];
    pipe(fds);
    size_t bytesSent = strlen(kPipeMessage) + 1;

    pid_t pidOrZero = fork();
    if (pidOrZero == 0) {
        // In the child, we only read from the pipe
        close(fds[1]);
        char buffer[bytesSent];
        read(fds[0], buffer, sizeof(buffer));
        close(fds[0]);
        printf("Message from parent: %s\n", buffer);
        return 0;
    }

    // In the parent, we only write to the pipe (assume everything is written)
    close(fds[0]);
    write(fds[1], kPipeMessage, bytesSent);
    close(fds[1]);
    waitpid(pidOrZero, NULL, 0);
    return 0;
}

Read from the Read FD to read the message from the parent.

Parent-Child Communication

Key Idea: read() blocks until the bytes are available or there is no more to read (e.g. end of file or pipe write end closed).  If the parent hasn't written yet, the child's call to read() will wait.

static const char * kPipeMessage = "Hello, this message is coming through a pipe.";
int main(int argc, char *argv[]) {
    int fds[2];
    pipe(fds);
    size_t bytesSent = strlen(kPipeMessage) + 1;

    pid_t pidOrZero = fork();
    if (pidOrZero == 0) {
        // In the child, we only read from the pipe
        close(fds[1]);
        char buffer[bytesSent];
        read(fds[0], buffer, sizeof(buffer));
        close(fds[0]);
        printf("Message from parent: %s\n", buffer);
        return 0;
    }

    // In the parent, we only write to the pipe (assume everything is written)
    close(fds[0]);
    write(fds[1], kPipeMessage, bytesSent);
    close(fds[1]);
    waitpid(pidOrZero, NULL, 0);
    return 0;
}

We are now done with the Read FD so we can close it here.  Also print the received message.

Parent-Child Communication

static const char * kPipeMessage = "Hello, this message is coming through a pipe.";
int main(int argc, char *argv[]) {
    int fds[2];
    pipe(fds);
    size_t bytesSent = strlen(kPipeMessage) + 1;

    pid_t pidOrZero = fork();
    if (pidOrZero == 0) {
        // In the child, we only read from the pipe
        close(fds[1]);
        char buffer[bytesSent];
        read(fds[0], buffer, sizeof(buffer));
        close(fds[0]);
        printf("Message from parent: %s\n", buffer);
        return 0;
    }

    // In the parent, we only write to the pipe (assume everything is written)
    close(fds[0]);
    write(fds[1], kPipeMessage, bytesSent);
    close(fds[1]);
    waitpid(pidOrZero, NULL, 0);
    return 0;
}

Key Idea: the child gets a copy of the parent's file descriptor table.  Any open FDs in the parent at the time fork is called must be closed in both the parent and the child.

Here, right before the fork call, the parent has 2 open file descriptors (besides 0-2): the pipe Read FD and Write FD.

Parent-Child Communication

static const char * kPipeMessage = "Hello, this message is coming through a pipe.";
int main(int argc, char *argv[]) {
    int fds[2];
    pipe(fds);
    size_t bytesSent = strlen(kPipeMessage) + 1;

    pid_t pidOrZero = fork();
    if (pidOrZero == 0) {
        // In the child, we only read from the pipe
        close(fds[1]);
        char buffer[bytesSent];
        read(fds[0], buffer, sizeof(buffer));
        close(fds[0]);
        printf("Message from parent: %s\n", buffer);
        return 0;
    }

    // In the parent, we only write to the pipe (assume everything is written)
    close(fds[0]);
    write(fds[1], kPipeMessage, bytesSent);
    close(fds[1]);
    waitpid(pidOrZero, NULL, 0);
    return 0;
}

Key Idea: the child gets a copy of the parent's file descriptor table.  Any open FDs in the parent at the time fork is called must be closed in both the parent and the child.

Therefore, when the child is spawned, it also has the same 2 open file descriptors (besides 0-2): the pipe Read FD and Write FD.

Parent-Child Communication

static const char * kPipeMessage = "Hello, this message is coming through a pipe.";
int main(int argc, char *argv[]) {
    int fds[2];
    pipe(fds);
    size_t bytesSent = strlen(kPipeMessage) + 1;

    pid_t pidOrZero = fork();
    if (pidOrZero == 0) {
        // In the child, we only read from the pipe
        close(fds[1]);
        char buffer[bytesSent];
        read(fds[0], buffer, sizeof(buffer));
        close(fds[0]);
        printf("Message from parent: %s\n", buffer);
        return 0;
    }

    // In the parent, we only write to the pipe (assume everything is written)
    close(fds[0]);
    write(fds[1], kPipeMessage, bytesSent);
    close(fds[1]);
    waitpid(pidOrZero, NULL, 0);
    return 0;
}

Key Idea: the child gets a copy of the parent's file descriptor table.  Any open FDs in the parent at the time fork is called must be closed in both the parent and the child.

We should close FDs when we are done with them.  The parent closes them here.

Parent-Child Communication

static const char * kPipeMessage = "Hello, this message is coming through a pipe.";
int main(int argc, char *argv[]) {
    int fds[2];
    pipe(fds);
    size_t bytesSent = strlen(kPipeMessage) + 1;

    pid_t pidOrZero = fork();
    if (pidOrZero == 0) {
        // In the child, we only read from the pipe
        close(fds[1]);
        char buffer[bytesSent];
        read(fds[0], buffer, sizeof(buffer));
        close(fds[0]);
        printf("Message from parent: %s\n", buffer);
        return 0;
    }

    // In the parent, we only write to the pipe (assume everything is written)
    close(fds[0]);
    write(fds[1], kPipeMessage, bytesSent);
    close(fds[1]);
    waitpid(pidOrZero, NULL, 0);
    return 0;
}

Key Idea: the child gets a copy of the parent's file descriptor table.  Any open FDs in the parent at the time fork is called must be closed in both the parent and the child.

We should close FDs when we are done with them.  The child closes them here.

Parent-Child Communication

Illustrations courtesy of Roz Cyrus.

continued...

Illustrations courtesy of Roz Cyrus.

continued...

Illustrations courtesy of Roz Cyrus.

continued...

DEMO: Parent-Child Communication

This method of communication between processes relies on the fact that file descriptors are duplicated when forking.

  • each process has its own copy of both file descriptors for the pipe
  • both processes could read or write to the pipe if they wanted.
  • each process must therefore close both file descriptors for the pipe when finished

 

This is the core idea behind how a shell can support piping between processes
(e.g. cat file.txt | uniq | sort).  Let's see how this works in a shell.

 

Pipes

Lecture Recap

  • Review: our first shell
  • Running in the background
  • Introducing Pipes
    • ​​What are pipes?
    • Pipes between processes

 

Next time: more practice with pipes