CS110: Principles of Computer Systems
Winter 2021-2022
Stanford University
Instructors: Nick Troccoli and Jerry Cain
Introduction to Threads
Mutexes and Condition Variables
Semaphores
Multithreading Patterns
assign5: implement your own multithreaded news aggregator to quickly fetch news from the web!
Binary lock (mutex) - e.g. dining philosophers' forks
Generalized wait (condition variable) - e.g. waiting for complex condition
Permits (semaphore) - e.g. dining philosophers permits for eating
Binary coordination (semaphore) - e.g. writer telling reader there is new content
Generalized coordination (semaphore) - e.g. thread waits for N others to finish a task
Layered Construction (combo) - combine multiple patterns
Let's implement a program that requires thread coordination with semaphores. First, we'll look at a version without semaphores to see why they are necessary.
The reader-writer pattern/program spawns 2 threads: one writer (publishes content to a shared buffer) and one reader (reads from shared buffer when content is available)
Common pattern! E.g. web server publishes content over a dedicated communication channel, and the web browser consumes that content.
Optionally consider a more complex version: multiple readers, similar to how a web server handles many incoming requests (puts request in buffer, readers each read and process requests)
int main(int argc, const char *argv[]) {
  // Create an empty buffer
  char buffer[kNumBufferSlots];
  memset(buffer, ' ', sizeof(buffer));
  thread writer(writeToBuffer, buffer, sizeof(buffer), kNumIterations);
  thread reader(readFromBuffer, buffer, sizeof(buffer), kNumIterations);
  writer.join();
  reader.join();
  return 0;
}
Both threads share the same buffer, so they agree where content is stored (think of buffer like state for a pipe or a connection between client and server)
static void readFromBuffer(char buffer[], size_t bufferSize, size_t iterations) {
  cout << oslock << "Reader: ready to read." << endl << osunlock;
  for (size_t i = 0; i < iterations * bufferSize; i++) {
    // Read and process the data
    char ch = buffer[i % bufferSize];
    processData(ch); // sleep to simulate work
    buffer[i % bufferSize] = ' ';
    cout << oslock << "Reader: consumed data packet "
         << "with character '" << ch << "'.\t\t" << osunlock;
    printBuffer(buffer, bufferSize);
  }
}
The reader consumes the content as it's written. Each thread cycles through the buffer the same number of times, and they both agree that i % bufferSize (here, i % 8 for an 8-slot buffer) identifies the next slot of interest.
static void writeToBuffer(char buffer[], size_t bufferSize, size_t iterations) {
  cout << oslock << "Writer: ready to write." << endl << osunlock;
  for (size_t i = 0; i < iterations * bufferSize; i++) {
    char ch = prepareData();
    buffer[i % bufferSize] = ch;
    cout << oslock << "Writer: published data packet with character '"
         << ch << "'.\t\t" << osunlock;
    printBuffer(buffer, bufferSize);
  }
}
The writer publishes content to the circular buffer. Each thread cycles through the buffer the same number of times, and they both agree that i % bufferSize (here, i % 8 for an 8-slot buffer) identifies the next slot of interest.
Problem: each thread runs independently, without knowing how much progress the other has made.
Goal: we must encode constraints into our program.
What constraint(s) should we add to our program?
A reader should not read until something is available to read
A writer should not write until there is space available to write
How can we model these constraint(s)?
One semaphore to manage empty slots
One semaphore to manage full slots
What might this look like in code?
Could we do this with one semaphore instead of 2? Unfortunately, no: the reader and writer wait on different conditions ("at least one full slot" vs. "at least one empty slot"), so a single count can't model both.
static void readFromBuffer(char buffer[], size_t bufferSize, size_t iterations,
                           semaphore& fullBufferSlots, semaphore& emptyBufferSlots) {
  cout << oslock << "Reader: ready to read." << endl << osunlock;
  for (size_t i = 0; i < iterations * bufferSize; i++) {
    fullBufferSlots.wait();
    char ch = buffer[i % bufferSize];
    processData(ch); // sleep to simulate work
    buffer[i % bufferSize] = ' ';
    emptyBufferSlots.signal();
    cout << oslock << "Reader: consumed data packet with character '" << ch << "'.\t\t" << osunlock;
    printBuffer(buffer, bufferSize);
  }
}
The reader consumes the content as it's written. Before reading, it waits for a slot to be full. After reading, it indicates that a new slot is empty. It is "tracing the steps" of the writer because it cycles through the same indexes that the writer does.
static void writeToBuffer(char buffer[], size_t bufferSize, size_t iterations,
                          semaphore& fullBufferSlots, semaphore& emptyBufferSlots) {
  cout << oslock << "Writer: ready to write." << endl << osunlock;
  for (size_t i = 0; i < iterations * bufferSize; i++) {
    char ch = prepareData();
    emptyBufferSlots.wait();
    buffer[i % bufferSize] = ch;
    fullBufferSlots.signal();
    cout << oslock << "Writer: published data packet with character '" << ch << "'.\t\t" << osunlock;
    printBuffer(buffer, bufferSize);
  }
}
The writer publishes content to the circular buffer. Before writing, it waits for a slot to be empty. After writing, it indicates that a new slot is full. It is "leading the way" for the reader because it cycles through the same indexes always ahead of the reader.
Let's implement a program called myth-buster that prints out how many CS110 student processes are running on each myth machine right now.
representative of load balancers (e.g. myth.stanford.edu or www.netflix.com) determining which internal server your request should be forwarded to.
myth51 has this many CS110-student processes: 59
myth52 has this many CS110-student processes: 135
myth53 has this many CS110-student processes: 112
myth54 has this many CS110-student processes: 89
myth55 has this many CS110-student processes: 107
myth56 has this many CS110-student processes: 58
myth57 has this many CS110-student processes: 70
myth58 has this many CS110-student processes: 93
myth59 has this many CS110-student processes: 107
myth60 has this many CS110-student processes: 145
myth61 has this many CS110-student processes: 105
myth62 has this many CS110-student processes: 126
myth63 has this many CS110-student processes: 314
myth64 has this many CS110-student processes: 119
myth65 has this many CS110-student processes: 156
myth66 has this many CS110-student processes: 144
Machine least loaded by CS110 students: myth56
Number of CS110 processes on least loaded machine: 58
int getNumProcesses(int mythNum, const std::unordered_set<std::string>& sunetIDs);
We'll use the following pre-implemented function that does some networking to fetch process counts. It connects to the specified myth machine, blocks until done, and returns the process count (or a negative value if the connection fails).
int main(int argc, char *argv[]) {
  // Create a set of student SUNETs
  unordered_set<string> cs110SUNETs;
  readStudentSUNETsFile(cs110SUNETs, kCS110StudentIDsFile);
  // Create a map from myth number -> CS110 process count and print its info
  map<int, int> processCountMap;
  createCS110ProcessCountMap(cs110SUNETs, processCountMap);
  printMythMachineWithFewestProcesses(processCountMap);
  return 0;
}
We'll implement createCS110ProcessCountMap sequentially and concurrently.
static void createCS110ProcessCountMap(const unordered_set<string>& sunetIDs,
                                       map<int, int>& processCountMap) {
  for (int mythNum = kMinMythMachine; mythNum <= kMaxMythMachine; mythNum++) {
    int numProcesses = getNumProcesses(mythNum, sunetIDs);
    // If successful, add to the map and print out
    if (numProcesses >= 0) {
      processCountMap[mythNum] = numProcesses;
      cout << "myth" << mythNum << " has this many CS110-student processes: " << numProcesses << endl;
    }
  }
}
This implementation fetches the count for each myth machine one after the other. This means we have to wait for 16 sequential connections to be started and completed.
Depending on a program's tasks, it may see different benefits from multithreading.
Thought experiment: each of us is a single-core CPU! (yes, it's true)
So how is it possible for us to multitask if we can only do one thing at a time?
Key Idea: not everything needs our constant, undivided attention:
These tasks are primarily time where we can do other things, and come back to them when they do need our attention. Even without our attention, they make progress. We can thus alternate between them to "multitask"!
These are I/O-bound tasks: the time to complete them is dictated by how long it takes for some external mechanism to complete its work (laundry machine, shipping, oven).
Key Idea: some things do need our constant, undivided attention:
These tasks are primarily time where we must devote our attention in order to make progress. We'll probably see limited gains by alternating between them, and may even see "context switch" penalties.
These are CPU-bound tasks: the time to complete them is dictated by how long it takes us to do the CPU computation (solve homework, read chapters).
CPU-bound tasks: the time to complete them is dictated by how long it takes us to do the CPU computation.
I/O-bound tasks: the time to complete them is dictated by how long it takes for some external mechanism to complete its work.
Even a single-core CPU can see performance improvements by parallelizing I/O-bound tasks. But parallelizing CPU-bound tasks will likely show minimal gains unless we have a multi-core CPU.
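The distinction is easy to see with a toy experiment (not from lecture): simulate an I/O-bound "fetch" with a sleep — the thread just blocks, as if waiting on the network — and compare sequential vs. threaded wall-clock time:

```cpp
#include <chrono>
#include <thread>
#include <vector>

using namespace std::chrono;

// Simulated I/O-bound task: the thread just blocks, as if waiting on the network.
static void simulatedFetch() {
    std::this_thread::sleep_for(milliseconds(100));
}

// 8 fetches back to back: roughly 800ms of wall-clock time.
static long long runSequential() {
    auto start = steady_clock::now();
    for (int i = 0; i < 8; i++) simulatedFetch();
    return duration_cast<milliseconds>(steady_clock::now() - start).count();
}

// 8 fetches overlapped across threads: roughly 100ms, even on a single core,
// because blocked threads consume no CPU while they wait.
static long long runConcurrent() {
    auto start = steady_clock::now();
    std::vector<std::thread> threads;
    for (int i = 0; i < 8; i++) threads.push_back(std::thread(simulatedFetch));
    for (std::thread& t : threads) t.join();
    return duration_cast<milliseconds>(steady_clock::now() - start).count();
}
```

The threaded version is nearly 8x faster here regardless of core count; the same experiment with a CPU-bound loop in place of the sleep would show gains only up to the number of cores.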
For mythbuster, the primary task is fetching the number of running CS110 processes over the network. Is this an I/O-bound or CPU-bound task?
I/O-bound!
This means we should see large gains from multithreading, even on a single-core machine.
Why is this implementation slow?
We wait 16 times, because we idle while waiting for each connection to come back.
How can we improve its performance?
Each call to getNumProcesses is independent. We should call it multiple times concurrently to overlap this "dead time".
Implementation: spawn multiple threads, each responsible for connecting to a different myth machine and updating the map.
What might this look like in code?
When spawning threads, we don't want to spawn too many, because we might overwhelm the OS and diminish the performance gains of our multithreaded implementation.
A common approach is to limit the number of simultaneous threads with a cap. E.g. we can only have 16 spawned threads at a time. Once one finishes, then we can spawn another.
static void createCS110ProcessCountMap(const unordered_set<string>& sunetIDs, map<int, int>& processCountMap) {
  vector<thread> threads;
  mutex processCountMapLock;
  semaphore permits(kMaxNumSimultaneousThreads);
  for (int mythNum = kMinMythMachine; mythNum <= kMaxMythMachine; mythNum++) {
    permits.wait();
    threads.push_back(thread(countCS110ProcessesForMyth, mythNum, ref(sunetIDs),
                             ref(processCountMap), ref(processCountMapLock), ref(permits)));
  }
  for (thread& threadToJoin : threads) threadToJoin.join();
}
static void countCS110ProcessesForMyth(int mythNum, const unordered_set<string>& sunetIDs,
                                       map<int, int>& processCountMap, mutex& processCountMapLock,
                                       semaphore& permits) {
  int numProcesses = getNumProcesses(mythNum, sunetIDs);
  if (numProcesses >= 0) {
    processCountMapLock.lock();
    processCountMap[mythNum] = numProcesses;
    processCountMapLock.unlock();
    cout << "myth" << mythNum << " has this many CS110-student processes: " << numProcesses << endl;
  }
  permits.signal();
}
(Nagging voice) hey, technically isn't it possible for more than the permitted number of threads to be alive, if a new thread is spawned after a finishing thread calls permits.signal() but before it has fully exited?
static void countCS110ProcessesForMyth(int mythNum, const unordered_set<string>& sunetIDs,
                                       map<int, int>& processCountMap, mutex& processCountMapLock,
                                       semaphore& permits) {
  // Defer the signal until this thread has fully exited, so a replacement
  // thread can't be spawned while this one is still winding down.
  permits.signal(on_thread_exit);
  int numProcesses = getNumProcesses(mythNum, sunetIDs);
  if (numProcesses >= 0) {
    processCountMapLock.lock();
    processCountMap[mythNum] = numProcesses;
    processCountMapLock.unlock();
    cout << "myth" << mythNum << " has this many CS110-student processes: " << numProcesses << endl;
  }
}
Even though we are limiting the number of simultaneous threads, we still spawn that many in total. It would be nice if we could use the same threads to complete all the tasks.
A common approach is to use a thread pool: an object that maintains a pool of worker threads that can complete assigned tasks.
class ThreadPool {
public:
  ThreadPool(size_t numThreads);
  void schedule(const std::function<void(void)>& thunk);
  void wait();
  ~ThreadPool();
};
Next time: multithreading wrap-up and introduction to networking