CS110: Principles of Computer Systems

Autumn 2021
Jerry Cain
Lecture 07: Process Transformation
- System Call Introduced Last Time
  - execvp effectively reboots a process to run a different program from scratch.
      int execvp(const char *path, char *argv[]);
    - path is the relative or absolute pathname of the executable to be invoked.
    - argv is the argument vector that should be funneled through to the new executable's main function.
    - path and argv[0] generally end up being the same exact string.
    - If execvp fails to cannibalize the process and install a new executable image within it, it returns -1 to express failure.
    - If execvp succeeds, it 😱 never returns 😱.
  - execvp has many variants (execle, execlp, and so forth; type man execvp to see all of them). We typically rely on execvp in this course.
  - Our first example was included in last Friday's slide deck, and we'll be working through that first.
- This mysystem function is just the first example where fork, execvp, and waitpid all work together to do something genuinely useful.
  - The test harness we used to exercise mysystem is operationally a miniature shell.
  - We need to continue implementing a few additional mini-shells to fully demonstrate how fork, waitpid, and execvp work in practice.
  - All of this is paying it forward to your fourth assignment, where you'll implement your own shell (we call it stsh, for Stanford shell) to imitate the functionality of the shell you've been using since you started using Unix. The C shell (csh), the Bash shell (bash), the Z shell (zsh), and the TC shell (tcsh) are all different shell implementations.
- Let's work through the implementation of a more sophisticated shell: the simplesh.
  - This is the best introductory example of fork, waitpid, and execvp that I can think of: a miniature shell not unlike those you've been using since the first time you logged into a myth.
  - simplesh operates as a read-eval-print loop (often called a REPL), which itself responds to the many things we type in, typically by forking off child processes.
    - Each child process is initially a deep clone of the simplesh process.
    - Each child proceeds to replace its own image with the new one we specify, e.g. ls, cp, find, make, or even emacs.
    - As with traditional shells, a trailing ampersand (e.g. as with emacs &) is an instruction to execute the new process in the background without forcing the shell to wait for it to finish. That means we can launch other programs from the foreground before that background process finishes.
  - Our implementation of simplesh is presented on the next slide. Where helper functions don't rely on CS110 concepts, I omit their implementations (but describe them in adequate detail in lecture).
- Here's the core implementation of simplesh (full implementation is right here):
int main(int argc, char *argv[]) {
  while (true) {
    char command[kMaxCommandLength + 1]; // room for \0 as well
    readCommand(command, kMaxCommandLength);
    char *arguments[kMaxArgumentCount + 1];
    int count = parseCommandLine(command, arguments, kMaxArgumentCount);
    if (count == 0) continue;
    if (strcmp(arguments[0], "quit") == 0) break; // hardcoded builtin to exit shell
    bool isbg = strcmp(arguments[count - 1], "&") == 0;
    if (isbg) arguments[--count] = NULL; // overwrite "&"
    pid_t pid = fork();
    if (pid == 0) {
      execvp(arguments[0], arguments);
      exit(1); // only reached if execvp failed; keeps the child out of the shell loop
    }
    if (isbg) { // background process, don't wait for child to finish
      printf("%d %s\n", pid, command);
    } else { // otherwise block until child process is complete
      waitpid(pid, NULL, 0);
    }
  }
  printf("\n");
  return 0;
}
Lecture 07: Process Transformation without fork!
- xargs (type man xargs for the full read) is useful when one program is needed to programmatically generate the argument vector for a second.
  - xargs reads tokens from standard input (delimited by spaces and newlines).
  - xargs then appends those tokens to the end of its original argument list and executes the full list of arguments (original plus those read from standard input) as if we typed them all in by hand.
  - To illustrate the basic idea, consider the factor program, which prints out the prime factorizations of all of its numeric arguments, as with:
poohbear@myth62:~$ factor 720
720: 2 2 2 2 3 3 5
poohbear@myth62:~$ factor 9 16 2047 870037764750
9: 3 3
16: 2 2 2 2
2047: 23 89
870037764750: 2 3 3 5 5 5 7 7 7 7 11 11 11 11 11
poohbear@myth62:~$ printf "720" | ./xargs factor
720: 2 2 2 2 3 3 5
poohbear@myth62:~$ printf "2047 1000\n870037764750" | ./xargs factor 9 16
9: 3 3
16: 2 2 2 2
2047: 23 89
1000: 2 2 2 5 5 5
870037764750: 2 3 3 5 5 5 7 7 7 7 11 11 11 11 11
poohbear@myth62:~$
- Note that the first process in the pipeline (the printf) is a brute force representative of an executable capable of supplying or extending the argument vector of a second executable (in this case, factor) through xargs.
  - Of course, the two executables needn't be printf or factor; they can be anything that works.
  - If, for example, I'm interested in exposing how much code I wrote for my own assign2 solution, I might use xargs to do this:
poohbear@myth62:~$ ls /usr/class/cs110/staff/master_repos/assign2/*.c | ./xargs wc
78 1792 90 /usr/class/cs110/staff/master_repos/assign2/chksumfile.c
35 1178 121 /usr/class/cs110/staff/master_repos/assign2/directory.c
266 8015 111 /usr/class/cs110/staff/master_repos/assign2/diskimageaccess.c
31 731 86 /usr/class/cs110/staff/master_repos/assign2/diskimg.c
35 1193 144 /usr/class/cs110/staff/master_repos/assign2/file.c
72 2751 134 /usr/class/cs110/staff/master_repos/assign2/inode.c
33 987 152 /usr/class/cs110/staff/master_repos/assign2/pathname.c
45 1287 91 /usr/class/cs110/staff/master_repos/assign2/unixfilesystem.c
595 17934 152 total
- For simplicity, we'll assume a working pullAllTokens function, which exhaustively pulls all content from the provided istream, tokenizes around newlines and whitespace, and populates the referenced vector with all tokens, in sequence.
static void pullAllTokens(istream& in, vector<string>& tokens);
- Here's our implementation of xargs.cc. Note that we're coding in C++, because the string processing is farcically easy compared to C.
  - This is a rare example of a program that calls execvp without calling fork first.
  - The real program to be executed is supplied via argv[1], and that's ultimately the executable we really want xargs to become.
  - The code preceding execvp is little more than argument vector construction.
int main(int argc, char *argv[]) {
  vector<string> tokens;
  pullAllTokens(cin, tokens);
  char *xargsv[argc + tokens.size()];
  for (size_t i = 0; i < argc - 1; i++)
    xargsv[i] = argv[i + 1];
  for (size_t i = 0; i < tokens.size(); i++)
    xargsv[argc - 1 + i] = (char *) tokens[i].c_str();
  xargsv[argc + tokens.size() - 1] = NULL;
  execvp(xargsv[0], xargsv);
  cerr << xargsv[0] << ": command not found, so xargs can't do its job!" << endl;
  return 0;
}
Lecture 07: Interprocess Communication
- Introducing the pipe system call.
  - The pipe system call takes an uninitialized array of two integers (we'll call it fds) and populates it with two file descriptors such that everything written to fds[1] can be read from fds[0].
  - Here's the prototype:
      int pipe(int fds[]);
  - pipe is particularly useful for allowing parent processes to communicate with spawned child processes.
    - Recall that the file descriptor table of the parent is cloned across fork boundaries and preserved by execvp calls.
    - That means open file table entries referenced by the parent's pipe endpoints are also referenced by the child's copies of them. Neat!
- How does pipe work?
  - To illustrate how pipe works and how messages can be passed from one process to a second, let's consider the following program (available for play right here):
int main(int argc, char *argv[]) {
  int fds[2];
  pipe(fds);
  pid_t pid = fork();
  if (pid == 0) {
    close(fds[1]);
    char buffer[6];
    read(fds[0], buffer, sizeof(buffer)); // assume one call is enough
    printf("Read from pipe bridging processes: %s.\n", buffer);
    close(fds[0]);
    return 0;
  }
  close(fds[0]);
  write(fds[1], "hello", 6);
  close(fds[1]);
  waitpid(pid, NULL, 0);
  return 0;
}
- How do pipe and fork work together in this example?
  - The base address of a small integer array called fds is shared with the call to pipe.
  - pipe allocates two descriptors, setting the first to read from a resource and the second to write to that same resource. Think of this resource as an unnamed file that only the OS and its support for pipe know about.
  - pipe then plants copies of those two descriptors into indices 0 and 1 of the supplied array before it returns.
  - The fork call creates a child process, which itself inherits a shallow copy of the parent's fds array.
    - The reference counts in each of the two open file entries are promoted from 1 to 2 to reflect the fact that two descriptors (one in the parent, and a second in the child) reference each of them.
    - Immediately after the fork call, anything printed to fds[1] is readable from the parent's fds[0] and the child's fds[0].
    - Similarly, both the parent and child are capable of publishing text to the same resource via their copies of fds[1].
- How do pipe and fork work together in this example?
  - The parent closes fds[0] before it writes anything to fds[1] to emphasize the fact that the parent has no need to read anything from the pipe.
  - Similarly, the child closes fds[1] before it reads from fds[0] to emphasize the fact that it has zero interest in publishing anything to the pipe. It's imperative that all write endpoints of the pipe be closed if not being used, else the read end will never know if more text is to come or not.
  - For simplicity, I assume the one call to write in the parent presses all six bytes of "hello" ('\0' included) in a single call. Similarly, I assume the one call to read pulls in those same six bytes into its local buffer with just the one call.
  - I make a concerted effort to donate all resources back to the system before I exit. That's why I include as many close calls as I do in both the child and the parent before allowing them to exit.
Lecture 07: Understanding execvp
By Jerry Cain