CS110: Principles of Computer Systems
Winter 2021-2022
Stanford University
Instructors: Nick Troccoli and Jerry Cain
Task: we want to count the frequency of words in a document.
Possible Approach: program that reads document and builds a word -> frequency map
How can we parallelize this?
Idea: split document into pieces, count words in each piece concurrently
Problem: what if a word appears in multiple pieces? We need to then merge the counts.
Idea: combine all the output, sort it, split it into pieces, and sum the counts in each piece concurrently
Idea: split document into pieces, count words in each piece concurrently. Then, combine all the text output, sort it, split into pieces, sum each one concurrently.
Example: "the very very quick fox greeted the brown fox"
Pieces: "the very very" | "quick fox greeted" | "the brown fox"
Counts for piece 1: the, 1 | very, 2
Counts for piece 2: quick, 1 | fox, 1 | greeted, 1
Counts for piece 3: the, 1 | brown, 1 | fox, 1
Combined: the, 1 | very, 2 | quick, 1 | fox, 1 | greeted, 1 | the, 1 | brown, 1 | fox, 1
Sorted: brown, 1 | fox, 1 | fox, 1 | greeted, 1 | quick, 1 | the, 1 | the, 1 | very, 2
Summed (sorted output split into pieces, each summed concurrently): brown, 1 | fox, 2 | greeted, 1 | quick, 1 | the, 2 | very, 2
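The two-step idea above can be sketched in Python. This is a minimal illustration under stated assumptions, not the course's implementation: the helper names `count_piece` and `sum_sorted` are hypothetical, threads stand in for concurrent workers, and the final summing is done in one pass rather than split across workers.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby
from operator import itemgetter

def count_piece(piece):
    # Step 1: count the words within a single piece (hypothetical helper).
    return list(Counter(piece.split()).items())

def sum_sorted(pairs):
    # Step 2: pairs are sorted by word, so equal words are adjacent
    # and can be summed group by group.
    return [(word, sum(n for _, n in group))
            for word, group in groupby(pairs, key=itemgetter(0))]

pieces = ["the very very", "quick fox greeted", "the brown fox"]

# Count each piece concurrently; pool.map keeps the input order.
with ThreadPoolExecutor() as pool:
    per_piece = list(pool.map(count_piece, pieces))

# Combine all (word, count) pairs and sort them by word.
combined = sorted(pair for counts in per_piece for pair in counts)
totals = sum_sorted(combined)
print(totals)
```

Because the combined list is sorted before summing, both occurrences of "fox" and "the" end up next to each other, which is what lets the second step merge them.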
2 "phases" where we parallelize work
The first phase focuses on finding, and the second phase focuses on summing. So the first phase should only output 1s, and leave the summing for later.
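Example: "the very very quick fox greeted the brown fox"
Pieces: "the very very" | "quick fox greeted" | "the brown fox"
Before (counts summed within each piece): the, 1 | very, 2 | quick, 1 | fox, 1 | greeted, 1 | the, 1 | brown, 1 | fox, 1
After (one pair per occurrence): the, 1 | very, 1 | very, 1 | ...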
Pieces: "the very very" | "quick fox greeted" | "the brown fox"
Phase 1 output for piece 1: the, 1 | very, 1 | very, 1
Phase 1 output for piece 2: quick, 1 | fox, 1 | greeted, 1
Phase 1 output for piece 3: the, 1 | brown, 1 | fox, 1
Combined: the, 1 | very, 1 | very, 1 | quick, 1 | fox, 1 | greeted, 1 | the, 1 | brown, 1 | fox, 1
Sorted: brown, 1 | fox, 1 | fox, 1 | greeted, 1 | quick, 1 | the, 1 | the, 1 | very, 1 | very, 1
Phase 2 output (sorted pairs split into pieces, each summed concurrently): brown, 1 | fox, 2 | greeted, 1 | quick, 1 | the, 2 | very, 2
Question: the combine and sort step between the two phases is not parallelized. Is there a way to parallelize it as well?
Idea: have each map task separate its data in advance for each reduce task. Then each reduce task can combine and sort its own data.
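Pieces: "the very very" | "quick fox greeted" | "the brown fox"
Map output for piece 1: the, 1 | very, 1 | very, 1 (written to buckets 2 and 3)
Map output for piece 2: fox, 1 | greeted, 1 | quick, 1 (written to buckets 1 and 2)
Map output for piece 3: brown, 1 | fox, 1 | the, 1 (written to buckets 1 and 2)
bucket # = hash(key) % R, where R = # reduce tasks (3)
Reduce input (each reduce task combines and sorts its own bucket): brown, 1 | fox, 1 | fox, 1 | greeted, 1 | quick, 1 | the, 1 | the, 1 | very, 1 | very, 1
Reduce output: brown, 1 | fox, 2 | greeted, 1 | quick, 1 | the, 2 | very, 2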
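The bucket rule can be sketched as follows. This is an illustration under assumptions: `bucket_for` and `partition` are hypothetical names, `zlib.crc32` is used only because it gives the same value on every run (Python's built-in `hash()` for strings is randomized per process), and buckets are numbered from 0 here, so the bucket numbers will not match the ones in the diagram.

```python
import zlib

R = 3  # number of reduce tasks, as in the example

def bucket_for(key, num_buckets=R):
    # Stable hash of the key, reduced modulo the number of reduce tasks.
    return zlib.crc32(key.encode("utf-8")) % num_buckets

def partition(pairs, num_buckets=R):
    # Split one map task's output into one list per reduce task.
    buckets = [[] for _ in range(num_buckets)]
    for key, value in pairs:
        buckets[bucket_for(key, num_buckets)].append((key, value))
    return buckets

map_outputs = [
    [("the", 1), ("very", 1), ("very", 1)],
    [("quick", 1), ("fox", 1), ("greeted", 1)],
    [("the", 1), ("brown", 1), ("fox", 1)],
]

# Each map task partitions its own output independently of the others.
partitioned = [partition(output) for output in map_outputs]

for task, buckets in enumerate(partitioned):
    print("map task", task, buckets)
```

The property that matters is that the bucket depends only on the key, so every occurrence of "fox" lands in the same bucket no matter which map task emitted it. That is what allows each reduce task to combine and sort its own data without talking to the others.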
Input Files -> Map Phase -> Intermediate Files -> Reduce Phase -> Output Files
Task: we have webpages, and want to make a list of what webpages link to a given URL.
Possible Approach: program that reads webpages and builds a URL -> list(webpage) map
How can we parallelize this?
Idea: split webpages into groups, find URLs in each group concurrently
Problem: what if a URL appears in multiple groups? We need to then merge the lists.
Idea: use hashing to split the intermediate output by reduce task, and have each reduce task merge, sort and reduce concurrently.
Idea: split webpages into groups, find URLs in each group concurrently. Then, use hashing to split the intermediate output, and reduce each piece concurrently.
Example: 3 webpages (1 per group): a.com, b.com, c.com
a.com: Visit d.com for more! Also see e.com.
b.com: Visit a.com for more! Also see e.com.
c.com: Visit a.com for more! Also see d.com.
Map output for a.com: d.com, a.com | e.com, a.com (written to buckets 2 and 3)
Map output for b.com: a.com, b.com | e.com, b.com (written to buckets 1 and 3)
Map output for c.com: d.com, c.com | a.com, c.com (written to buckets 1 and 2)
Sorted intermediate pairs: a.com, b.com | a.com, c.com | d.com, a.com | d.com, c.com | e.com, a.com | e.com, b.com
Reduce output: a.com, [b.com, c.com] | d.com, [a.com, c.com] | e.com, [a.com, b.com]
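The inverted index example can be sketched in the same style. This is a sequential illustration under assumptions: `map_links` and `reduce_links` are hypothetical names, the regular expression only recognizes the toy `.com` names used in this example, and the hashing into buckets is left out so the map and reduce logic stays visible.

```python
import re
from collections import defaultdict

def map_links(page, contents):
    # Emit (linked URL, linking page) for every URL mentioned on the page.
    return [(url, page) for url in re.findall(r"\b[a-z]+\.com\b", contents)]

def reduce_links(url, linking_pages):
    # Merge every page that links to this URL into one sorted list.
    return (url, sorted(linking_pages))

pages = {
    "a.com": "Visit d.com for more! Also see e.com.",
    "b.com": "Visit a.com for more! Also see e.com.",
    "c.com": "Visit a.com for more! Also see d.com.",
}

# Map step: one call per webpage (these calls could run concurrently).
intermediate = []
for page, contents in pages.items():
    intermediate.extend(map_links(page, contents))

# Sort, then group the pairs by the URL being linked to.
grouped = defaultdict(list)
for url, page in sorted(intermediate):
    grouped[url].append(page)

# Reduce step: one call per distinct URL.
index = [reduce_links(url, linking) for url, linking in grouped.items()]
print(index)
```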
2 "phases" where we parallelize work
Case Study: Counting Word Frequencies
Standard Approach: program that reads document and builds a word -> frequency map
Parallel Approach: split document into pieces, count words in each piece concurrently, partitioning output. Then, sort and reduce each chunk concurrently.
Case Study: Inverted Web Index
Standard Approach: program that reads webpages and builds a URL -> list(webpage) map
Parallel Approach: split webpages into groups, find URLs in each group concurrently, partitioning output. Then, sort and reduce each chunk concurrently.
Word frequencies: split document into pieces, count words in each piece concurrently, partitioning output. Then, sort and reduce each chunk concurrently.
Inverted web index: split webpages into groups, find URLs in each group concurrently, partitioning output. Then, sort and reduce each chunk concurrently.
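We expressed both problems in the same two-step structure: a map step that produces intermediate key/value pairs, followed by a reduce step that combines the values for each key.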
Not all problems can be expressed in this structure. But if we can express it in this structure, we can parallelize it!
MapReduce was published by Google in 2004 ("MapReduce: Simplified Data Processing on Large Clusters", Dean and Ghemawat, OSDI 2004).
Programmer must implement map and reduce steps:
map(k1, v1) -> list(k2, v2)
reduce(k2, list(v2)) -> list(v2)
Here's pseudocode for the word counting example:
map(String key, String value):
    // key: document name
    // value: document contents
    for word w in value:
        EmitIntermediate(w, "1")

reduce(String key, List values):
    // key: a word
    // values: a list of counts
    int result = 0
    for v in values:
        result += ParseInt(v)
    Emit(AsString(result))
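For readers who want to run it, here is one possible Python rendering of the pseudocode above. It is a sketch, not the framework's API: `map_fn`, `reduce_fn`, and the single-process driver `run_job` are hypothetical names, and returning a list stands in for `EmitIntermediate` and `Emit`. Values stay as strings to mirror the pseudocode.

```python
from itertools import groupby
from operator import itemgetter

def map_fn(key, value):
    # key: document name, value: document contents
    # map(k1, v1) -> list(k2, v2)
    return [(word, "1") for word in value.split()]

def reduce_fn(key, values):
    # key: a word, values: a list of counts (as strings)
    # reduce(k2, list(v2)) -> list(v2)
    return [str(sum(int(v) for v in values))]

def run_job(inputs, map_fn, reduce_fn):
    # Hypothetical driver: runs everything in one process, in order.
    intermediate = []
    for k1, v1 in inputs:
        intermediate.extend(map_fn(k1, v1))
    intermediate.sort(key=itemgetter(0))  # equal keys become adjacent
    output = []
    for k2, group in groupby(intermediate, key=itemgetter(0)):
        values = [v for _, v in group]
        output.append((k2, reduce_fn(k2, values)))
    return output

inputs = [
    ("doc1", "the very very"),
    ("doc2", "quick fox greeted"),
    ("doc3", "the brown fox"),
]
result = run_job(inputs, map_fn, reduce_fn)
print(result)
```

The programmer only writes `map_fn` and `reduce_fn`; everything in `run_job` is the part a MapReduce framework would take over and distribute across machines.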
2. MapReduce partitions the input data into M (= 3) pieces and starts the program on the cluster machines: one becomes the leader, and the rest become workers.
3. Leader assigns tasks (map or reduce) to idle workers until job is done.
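One way to picture task assignment is a shared queue that idle workers pull from. The sketch below is an assumption-heavy simplification: `run_tasks` is a hypothetical helper, threads in one process stand in for worker machines, workers pull tasks instead of being contacted by a leader over the network, and all map tasks finish before any reduce task starts.

```python
import queue
import threading

def run_tasks(tasks, num_workers):
    """Give each (name, function) task to whichever worker is idle."""
    pending = queue.Queue()
    for task in tasks:
        pending.put(task)
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                name, fn = pending.get_nowait()
            except queue.Empty:
                return  # no tasks left for this worker
            value = fn()
            with lock:
                results[name] = value

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

pieces = ["the very very", "quick fox greeted", "the brown fox"]

# Three map tasks shared among two workers.
map_tasks = [(f"map-{i}", lambda p=p: p.split()) for i, p in enumerate(pieces)]
map_results = run_tasks(map_tasks, num_workers=2)

# A single toy reduce task that runs once every map task is done.
reduce_tasks = [("reduce-0", lambda: sum(len(w) for w in map_results.values()))]
reduce_results = run_tasks(reduce_tasks, num_workers=2)
print(map_results, reduce_results)
```

With more tasks than workers, a worker that finishes early simply picks up the next pending task, which is the load-balancing effect of handing work to idle workers.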
Map task: a worker reads its slice of the input data and calls map(). The output is partitioned into R (= 3) partitions on disk using hash(key) % R, and the leader is given the location of these partitions.
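A map task along these lines might look like the sketch below. The function name `run_map_task`, the file naming scheme, the one-pair-per-line text format, and the use of `zlib.crc32` as the hash are all assumptions made for illustration; the returned list of paths stands in for what the worker reports to the leader.

```python
import os
import tempfile
import zlib

def run_map_task(task_id, piece, num_partitions, out_dir):
    """Run map on one input slice and write one file per partition."""
    buckets = [[] for _ in range(num_partitions)]
    for word in piece.split():  # the map() step: emit (word, 1)
        b = zlib.crc32(word.encode("utf-8")) % num_partitions
        buckets[b].append(f"{word} 1\n")
    paths = []
    for b, lines in enumerate(buckets):
        # Hypothetical naming scheme: map-<task>.bucket-<partition>
        path = os.path.join(out_dir, f"map-{task_id}.bucket-{b}")
        with open(path, "w") as f:
            f.writelines(lines)
        paths.append(path)
    return paths  # locations the worker would report to the leader

with tempfile.TemporaryDirectory() as out_dir:
    locations = run_map_task(0, "the very very", 3, out_dir)
    contents = []
    for path in locations:
        with open(path) as f:
            contents.append(f.read())

print([os.path.basename(p) for p in locations])
```

Every map task writes exactly R files, even if some are empty, so a reduce task always knows which file to ask for from each map task.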
Reduce task: the leader tells a reducer where its relevant partitions are, and the reducer reads them and sorts them by intermediate key. For each intermediate key and its set of intermediate values, the reducer calls reduce() and appends the output to the output file.
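A reduce task could be sketched like this. The function name `run_reduce_task`, the file names, and the sample partition contents are made up for illustration; the one-pair-per-line text format is likewise an assumption.

```python
import os
import tempfile
from itertools import groupby

def run_reduce_task(partition_paths, output_path):
    """Read this reducer's partitions, sort by key, reduce, append output."""
    pairs = []
    for path in partition_paths:
        with open(path) as f:
            pairs.extend(line.split() for line in f if line.strip())
    pairs.sort(key=lambda kv: kv[0])  # equal keys become adjacent
    with open(output_path, "a") as out:
        for word, group in groupby(pairs, key=lambda kv: kv[0]):
            total = sum(int(count) for _, count in group)  # the reduce() step
            out.write(f"{word} {total}\n")

with tempfile.TemporaryDirectory() as work_dir:
    # Illustrative partition files, one per map task, for a single bucket.
    sample = {
        "map-0.bucket-1": "the 1\n",
        "map-1.bucket-1": "greeted 1\nquick 1\n",
        "map-2.bucket-1": "the 1\n",
    }
    paths = []
    for name, text in sample.items():
        path = os.path.join(work_dir, name)
        with open(path, "w") as f:
            f.write(text)
        paths.append(path)
    output_path = os.path.join(work_dir, "output-1")
    run_reduce_task(paths, output_path)
    with open(output_path) as f:
        result = f.read()

print(result)
```

Sorting is what turns the scattered pairs from several map tasks into runs of equal keys, so reduce() sees all values for a key in one call.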
MapReduce was one framework invented to parallelize certain kinds of problems. There are many other ways to process large datasets, and MapReduce must make tradeoffs.
"Execution Templates: Caching Control Plane Decisions for Strong Scaling of Data Analytics"
Omid Mashayekhi, Hang Qu, Chinmayee Shah, Philip Levis
In Proceedings of 2017 USENIX Annual Technical Conference (USENIX ATC '17)
Idea: what about caching?
./mr_soln --mapper ./mrm_soln --reducer ./mrr_soln --config odyssey-full.cfg
Illustration courtesy of Ecy King, CS110 Champion, Spring 2021
Key Question: How can we design filesystems to store and manipulate files on disk, and how can we interact with the filesystem in our programs?
Unix Filesystem Inode Design [source]
Key Question: How can our program create and interact with other programs?
Chrome Site Isolation [source]
Key Question: How can we have concurrency within a single process?
Nginx and Thread Pools [source]
Key Question: How can we write programs that communicate over a network with other programs?
Key Question: How can we parallelize data processing across many machines?
Dataflow for Spotify Wrapped [source]
Key Idea: separating behavior from implementation (use vs. implementation)
Key Idea: organization of modules that interact hierarchically, layering on top of each other
Key Idea: names provide a way to refer to system resources, and name resolution converts between human-readable names and machine-friendly names
Key Idea: remember recently-generated results so that future requests for the same data can be faster.
Key Idea: abstraction mechanism used to make one resource look like many, or many resources look like one.
Key Idea: running multiple tasks in parallel and coordinating between them.
multiprocessing
multithreading
single-core vs. multi-core CPUs
How can languages better support concurrent programming?
Key Idea: organizing functionality into modules with well-defined communication protocols and responsibilities.
networking client/server
function call and return
system call execution
CS110 enables you to take some awesome next classes if you want to explore the topics further. What are some options?
We hope you can take the time to fill out the end-of-quarter CS110 course evaluation. We sincerely appreciate any feedback you have about the course, and we read every piece of feedback we receive. We are always looking for ways to improve!
What questions do you have about CS110, life after CS110, or anything else?