CS110: Principles of Computer Systems
Autumn 2021
Jerry Cain
Lecture 07: Process Transformation
- System Call Introduced Last Time
  - execvp effectively reboots a process to run a different program from scratch.
    - path is the relative or absolute pathname of the executable to be invoked.
    - argv is the argument vector that should be funneled through to the new executable's main function.
    - path and argv[0] generally end up being the exact same string.
    - If execvp fails to cannibalize the process and install a new executable image within it, it returns -1 to express failure.
    - If execvp succeeds, it 😱 never returns 😱.
  - execvp has many variants (execle, execlp, and so forth; type man execvp to see all of them). We typically rely on execvp in this course.
  - Our first example was included in last Friday's slide deck, and we'll be working through that first.
int execvp(const char *path, char *argv[]);
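The failure path can be sketched in a few lines of C. This is a minimal illustration, not lecture code: the helper name try_exec is hypothetical. Because execvp returns only when it fails, everything after the call is error handling by definition.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

// Hypothetical helper: tries to transform the calling process into argv[0].
// On success this function never returns, because the process image is replaced.
// On failure execvp returns -1 and sets errno, which we report and preserve.
int try_exec(char *argv[]) {
  int result = execvp(argv[0], argv);
  int saved = errno;  // fprintf may overwrite errno, so stash it first
  fprintf(stderr, "execvp(%s) failed: %s\n", argv[0], strerror(saved));
  errno = saved;
  return result;  // always -1 if we get here
}
```

Calling try_exec with a path that doesn't exist returns -1 and leaves errno describing the reason. Calling it with a valid program never returns to the caller at all.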
Lecture 07: Process Transformation
- This mysystem function is just the first example where fork, execvp, and waitpid all work together to do something genuinely useful.
  - The test harness we used to exercise mysystem is operationally a miniature shell.
  - We need to continue implementing a few additional mini-shells to fully demonstrate how fork, waitpid, and execvp work in practice.
  - All of this is paying it forward to your fourth assignment, where you'll implement your own shell (we call it stsh, for Stanford shell) to imitate the functionality of the shell you've been using since you started using Unix. The C shell (csh), Bash (bash), the Z shell (zsh), and the TC shell (tcsh) are all different shell implementations.
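The mysystem code from the earlier slide deck isn't reproduced here, so the following is only a sketch of how fork, execvp, and waitpid can cooperate in a system-like function. It assumes the command is handed to /bin/sh -c; the name mysystem_sketch and the choice of 127 as the exit status when execvp fails are illustrative choices, not the lecture's.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Sketch of a system-like function: run `command` via /bin/sh -c and wait for it.
// Returns the child's exit status, the negated signal number if a signal
// terminated it, or -1 if fork or waitpid failed.
int mysystem_sketch(char *command) {
  pid_t pid = fork();
  if (pid == -1) return -1;  // no child was created
  if (pid == 0) {
    char *arguments[] = {"/bin/sh", "-c", command, NULL};
    execvp(arguments[0], arguments);
    _exit(127);  // reached only if execvp failed
  }
  int status;
  if (waitpid(pid, &status, 0) == -1) return -1;
  return WIFEXITED(status) ? WEXITSTATUS(status) : -WTERMSIG(status);
}
```

The child never returns from this function: it either becomes /bin/sh or exits. Only the parent reaches waitpid, which is what lets the caller block until the command has finished.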
Lecture 07: Process Transformation
- Let's work through the implementation of a more sophisticated shell: simplesh.
  - This is the best introductory example of fork, waitpid, and execvp that I can think of: a miniature shell not unlike those you've been using since the first time you logged into a myth.
  - simplesh operates as a read-eval-print loop (often called a REPL), which responds to the many things we type in, typically by forking off child processes.
    - Each child process is initially a deep clone of the simplesh process.
    - Each child proceeds to replace its own image with the new one we specify, e.g. ls, cp, find, make, or even emacs.
    - As with traditional shells, a trailing ampersand (e.g. emacs &) is an instruction to execute the new process in the background without forcing the shell to wait for it to finish. That means we can launch other programs from the foreground before that background process finishes.
  - Our implementation of simplesh is presented on the next slide. Where helper functions don't rely on CS110 concepts, I omit their implementations (but describe them in adequate detail in lecture).
Lecture 07: Process Transformation
- Here's the core implementation of simplesh (the original slide links to the full implementation):
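#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

// readCommand, parseCommandLine, kMaxCommandLength, and kMaxArgumentCount
// come from the full implementation and are omitted here.
int main(int argc, char *argv[]) {
  while (true) {
    char command[kMaxCommandLength + 1]; // room for \0 as well
    readCommand(command, kMaxCommandLength);
    char *arguments[kMaxArgumentCount + 1];
    int count = parseCommandLine(command, arguments, kMaxArgumentCount);
    if (count == 0) continue;
    if (strcmp(arguments[0], "quit") == 0) break; // hardcoded builtin to exit shell
    bool isbg = strcmp(arguments[count - 1], "&") == 0;
    if (isbg) arguments[--count] = NULL; // overwrite "&"
    pid_t pid = fork();
    if (pid == 0) {
      execvp(arguments[0], arguments);
      // execvp returns only on failure; exit so the child doesn't become a second shell
      fprintf(stderr, "%s: command not found\n", arguments[0]);
      exit(1);
    }
    if (isbg) { // background process, don't wait for child to finish
      printf("%d %s\n", pid, command);
    } else { // otherwise block until child process is complete
      waitpid(pid, NULL, 0);
    }
  }
  printf("\n");
  return 0;
}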
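The slide omits parseCommandLine, so here is one hypothetical way such a helper could be written in C. It is an assumption-laden sketch rather than the course's implementation: it splits on whitespace only, ignores quoting, and uses the invented name parseCommandLineSketch.

```c
#include <string.h>

// Hypothetical stand-in for the omitted parseCommandLine helper.
// Splits `command` in place on spaces, tabs, and newlines, stores up to
// maxArguments token pointers in `arguments`, NULL-terminates the array,
// and returns the number of tokens stored. The array therefore needs
// room for maxArguments + 1 pointers.
int parseCommandLineSketch(char *command, char *arguments[], int maxArguments) {
  int count = 0;
  for (char *token = strtok(command, " \t\n");
       token != NULL && count < maxArguments;
       token = strtok(NULL, " \t\n")) {
    arguments[count++] = token;
  }
  arguments[count] = NULL;  // execvp expects a NULL-terminated vector
  return count;
}
```

Because the tokens point into the original buffer, a trailing "&" shows up as the last entry of the array, which is exactly where the main loop looks for it.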
Lecture 07: Process Transformation without fork!
- xargs (type man xargs for the full read) is useful when one program is needed to programmatically generate the argument vector for a second.
  - xargs reads tokens from standard input (delimited by spaces and newlines).
  - xargs then appends those tokens to the end of its original argument list and executes the full list of arguments (original plus those read from standard input) as if we typed them all in by hand.
  - To illustrate the basic idea, consider the factor program, which prints out the prime factorizations of all of its numeric arguments, as with:
poohbear@myth62:~$ factor 720
720: 2 2 2 2 3 3 5
poohbear@myth62:~$ factor 9 16 2047 870037764750
9: 3 3
16: 2 2 2 2
2047: 23 89
870037764750: 2 3 3 5 5 5 7 7 7 7 11 11 11 11 11
poohbear@myth62:~$ printf "720" | ./xargs factor
720: 2 2 2 2 3 3 5
poohbear@myth62:~$ printf "2047 1000\n870037764750" | ./xargs factor 9 16
9: 3 3
16: 2 2 2 2
2047: 23 89
1000: 2 2 2 5 5 5
870037764750: 2 3 3 5 5 5 7 7 7 7 11 11 11 11 11
poohbear@myth62:~$
Lecture 07: Process Transformation without fork!
- Note that the first process in the pipeline, the printf, is a brute force representative of an executable capable of supplying or extending the argument vector of a second executable (in this case, factor) through xargs.
  - Of course, the two executables needn't be printf or factor; they can be anything that works.
  - If, for example, I'm interested in exposing how much code I wrote for my own assign2 solution, I might use xargs to do this:
poohbear@myth62:~$ ls /usr/class/cs110/staff/master_repos/assign2/*.c | ./xargs wc
78 1792 90 /usr/class/cs110/staff/master_repos/assign2/chksumfile.c
35 1178 121 /usr/class/cs110/staff/master_repos/assign2/directory.c
266 8015 111 /usr/class/cs110/staff/master_repos/assign2/diskimageaccess.c
31 731 86 /usr/class/cs110/staff/master_repos/assign2/diskimg.c
35 1193 144 /usr/class/cs110/staff/master_repos/assign2/file.c
72 2751 134 /usr/class/cs110/staff/master_repos/assign2/inode.c
33 987 152 /usr/class/cs110/staff/master_repos/assign2/pathname.c
45 1287 91 /usr/class/cs110/staff/master_repos/assign2/unixfilesystem.c
595 17934 152 total
- For simplicity, we'll assume a working pullAllTokens function, which exhaustively pulls all content from the provided istream, tokenizes around newlines and whitespace, and populates the referenced vector with all tokens, in sequence.
static void pullAllTokens(istream& in, vector<string>& tokens);
Lecture 07: Process Transformation without fork!
- Here's our implementation of xargs.cc. Note that we're coding in C++, because the string processing is farcically easy compared to C.
  - This is a rare example of a program that calls execvp without calling fork first.
  - The real program to be executed is supplied via argv[1], and that's ultimately the executable we really want xargs to become.
  - The code preceding execvp is little more than argument vector construction.
int main(int argc, char *argv[]) {
  vector<string> tokens;
  pullAllTokens(cin, tokens); // everything on standard input, split into tokens
  char *xargsv[argc + tokens.size()]; // argc - 1 original arguments + tokens + NULL
  for (size_t i = 0; i < argc - 1; i++)
    xargsv[i] = argv[i + 1]; // skip argv[0], which is xargs itself
  for (size_t i = 0; i < tokens.size(); i++)
    xargsv[argc - 1 + i] = (char *) tokens[i].c_str();
  xargsv[argc + tokens.size() - 1] = NULL;
  execvp(xargsv[0], xargsv);
  // reached only if execvp fails
  cerr << xargsv[0] << ": command not found, so xargs can't do its job!" << endl;
  return 0;
}
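To isolate the argument vector construction, here is the same indexing written as a standalone C function. The name buildArgumentVector is hypothetical, and it takes the tokens as a plain array of C strings, which is an assumption made so the sketch doesn't depend on pullAllTokens.

```c
#include <stdlib.h>

// Hypothetical helper: builds the vector that xargs would hand to execvp.
// Copies argv[1..argc-1], appends the ntokens entries of tokens, and
// NULL-terminates the result. The strings are borrowed, not copied, so the
// caller frees only the returned array.
char **buildArgumentVector(int argc, char *argv[], char *tokens[], size_t ntokens) {
  size_t fixed = (size_t)(argc - 1);  // everything after the program name
  char **vector = malloc((fixed + ntokens + 1) * sizeof(char *));
  if (vector == NULL) return NULL;
  for (size_t i = 0; i < fixed; i++) vector[i] = argv[i + 1];
  for (size_t i = 0; i < ntokens; i++) vector[fixed + i] = tokens[i];
  vector[fixed + ntokens] = NULL;
  return vector;
}
```

For the earlier ./xargs factor 9 16 example with the tokens 2047, 1000, and 870037764750 arriving on standard input, the result holds factor, 9, 16, 2047, 1000, 870037764750, and a terminating NULL.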
Lecture 07: Interprocess Communication
- Introducing the pipe system call.
  - The pipe system call takes an uninitialized array of two integers (we'll call it fds) and populates it with two file descriptors such that everything written to fds[1] can be read from fds[0].
  - Here's the prototype:
      int pipe(int fds[]);
  - pipe is particularly useful for allowing parent processes to communicate with spawned child processes.
    - Recall that the file descriptor table of the parent is cloned across fork boundaries and preserved by execvp calls.
    - That means open file table entries referenced by the parent's pipe endpoints are also referenced by the child's copies of them. Neat!
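Before any second process is involved, the write-to-fds[1], read-from-fds[0] relationship can be seen within a single process. The sketch below uses a hypothetical helper name, pipeRoundTrip, and assumes the message is small enough to fit in the pipe's internal buffer so that the write cannot block.

```c
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

// Writes `message` (including its '\0') into a fresh pipe and reads it back
// within the same process. Returns the number of bytes read, or -1 on error.
ssize_t pipeRoundTrip(const char *message, char *buffer, size_t capacity) {
  int fds[2];
  if (pipe(fds) == -1) return -1;
  ssize_t written = write(fds[1], message, strlen(message) + 1);
  close(fds[1]);  // nothing more will be written
  ssize_t count = written == -1 ? -1 : read(fds[0], buffer, capacity);
  close(fds[0]);
  return count;
}
```

Passing "hello" and a 16-byte buffer should hand back six bytes: the five letters plus the terminating '\0'.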
Lecture 07: Interprocess Communication
- How does pipe work?
  - To illustrate how pipe works and how messages can be passed from one process to a second, let's consider the following program (the original slide links to a runnable copy):
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
  int fds[2];
  pipe(fds);
  pid_t pid = fork();
  if (pid == 0) { // child: reads from the pipe
    close(fds[1]);
    char buffer[6];
    read(fds[0], buffer, sizeof(buffer)); // assume one call is enough
    printf("Read from pipe bridging processes: %s.\n", buffer);
    close(fds[0]);
    return 0;
  }
  close(fds[0]); // parent: writes to the pipe
  write(fds[1], "hello", 6);
  close(fds[1]);
  waitpid(pid, NULL, 0);
  return 0;
}
Lecture 07: Interprocess Communication
- How do pipe and fork work together in this example?
  - The base address of a small integer array called fds is shared with the call to pipe.
  - pipe allocates two descriptors, setting the first to read from a resource and the second to write to that same resource. Think of this resource as an unnamed file that only the OS and its support for pipe know about.
  - pipe then plants copies of those two descriptors into indices 0 and 1 of the supplied array before it returns.
  - The fork call creates a child process, which itself inherits a shallow copy of the parent's fds array.
    - The reference count in each of the two open file entries is promoted from 1 to 2 to reflect the fact that two descriptors (one in the parent, and a second in the child) reference each of them.
    - Immediately after the fork call, anything printed to fds[1] is readable from the parent's fds[0] and the child's fds[0].
    - Similarly, both the parent and child are capable of publishing text to the same resource via their copies of fds[1].
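Since both processes hold both endpoints after fork, the roles can just as easily be reversed. The sketch below has the child write and the parent read, the opposite of the lecture's program. The helper name receiveFromChild is hypothetical, and like the lecture code it assumes a single read call collects the whole message.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Forks a child that writes `message` (with its '\0') into the pipe, then
// reads it in the parent. Returns the number of bytes read, or -1 on error.
ssize_t receiveFromChild(const char *message, char *buffer, size_t capacity) {
  int fds[2];
  if (pipe(fds) == -1) return -1;
  pid_t pid = fork();
  if (pid == -1) {
    close(fds[0]);
    close(fds[1]);
    return -1;
  }
  if (pid == 0) {
    close(fds[0]);  // the child only writes
    if (write(fds[1], message, strlen(message) + 1) == -1) _exit(1);
    close(fds[1]);
    _exit(0);
  }
  close(fds[1]);  // the parent only reads
  ssize_t count = read(fds[0], buffer, capacity);  // assume one call is enough
  close(fds[0]);
  waitpid(pid, NULL, 0);
  return count;
}
```

Nothing about the pipe itself changed. Only the choice of which descriptor each process closes determines the direction of the conversation.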
Lecture 07: Interprocess Communication
- How do pipe and fork work together in this example?
  - The parent closes fds[0] before it writes anything to fds[1] to emphasize the fact that the parent has no need to read anything from the pipe.
  - Similarly, the child closes fds[1] before it reads from fds[0] to emphasize the fact that it has zero interest in publishing anything to the pipe. It's imperative that all write endpoints of the pipe be closed if not being used, else the read end will never know if more text is to come or not.
  - For simplicity, I assume the one call to write in the parent presses all six bytes of "hello" ('\0' included) in a single call. Similarly, I assume the one call to read pulls in those same six bytes into its local buffer with just the one call.
  - I make a concerted effort to donate all resources back to the system before I exit. That's why I include as many close calls as I do in both the child and the parent before allowing them to exit.
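When the one-call assumption is dropped, the reader has to loop until read returns 0, and that end-of-file only arrives once every write endpoint is closed. The following C sketch shows that loop; the helper name countBytesInChild and the trick of reporting the byte count through the child's exit status are illustrative assumptions, and the latter limits the sketch to messages shorter than 256 bytes.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// The parent sends `message` through a pipe; the child reads until end-of-file,
// counts the bytes it saw, and reports the count as its exit status.
// Returns that count, or -1 on error.
int countBytesInChild(const char *message) {
  int fds[2];
  if (pipe(fds) == -1) return -1;
  pid_t pid = fork();
  if (pid == -1) {
    close(fds[0]);
    close(fds[1]);
    return -1;
  }
  if (pid == 0) {
    close(fds[1]);   // otherwise the child's own write end keeps the pipe open forever
    char buffer[4];  // deliberately tiny so that several read calls are needed
    int total = 0;
    ssize_t count;
    while ((count = read(fds[0], buffer, sizeof(buffer))) > 0) total += (int) count;
    close(fds[0]);
    _exit(total);
  }
  close(fds[0]);
  ssize_t written = write(fds[1], message, strlen(message));
  close(fds[1]);  // the child sees end-of-file once the data is drained
  int status;
  if (waitpid(pid, &status, 0) == -1 || written == -1 || !WIFEXITED(status)) return -1;
  return WEXITSTATUS(status);
}
```

If the child skipped its close(fds[1]) call, its read loop would block forever after the last byte, and the parent's waitpid would never return.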