CS110: Principles of Computer Systems
Winter 2021-2022
Stanford University
Instructors: Nick Troccoli and Jerry Cain
Introduction to Threads
Mutexes and Condition Variables
Semaphores
Multithreading Patterns
assign5: implement your own multithreaded news aggregator to quickly fetch news from the web!
Let's implement a program called myth-buster that prints out how many CS110 student processes are running on each myth machine right now.
This is representative of how load balancers (e.g. for myth.stanford.edu or www.netflix.com) determine which internal server to forward your request to.
myth51 has this many CS110-student processes: 59
myth52 has this many CS110-student processes: 135
myth53 has this many CS110-student processes: 112
myth54 has this many CS110-student processes: 89
myth55 has this many CS110-student processes: 107
myth56 has this many CS110-student processes: 58
myth57 has this many CS110-student processes: 70
myth58 has this many CS110-student processes: 93
myth59 has this many CS110-student processes: 107
myth60 has this many CS110-student processes: 145
myth61 has this many CS110-student processes: 105
myth62 has this many CS110-student processes: 126
myth63 has this many CS110-student processes: 314
myth64 has this many CS110-student processes: 119
myth65 has this many CS110-student processes: 156
myth66 has this many CS110-student processes: 144
Machine least loaded by CS110 students: myth56
Number of CS110 processes on least loaded machine: 58
CPU-bound tasks: the time to complete them is dictated by how long the CPU computation takes.
I/O-bound tasks: the time to complete them is dictated by how long some external mechanism (e.g. disk, network) takes to complete its work.
Even a single-core CPU can see performance improvements by parallelizing I/O-bound tasks. But parallelizing CPU-bound tasks will likely show minimal gains unless we have a multi-core CPU.
For myth-buster, the primary task is fetching the number of running CS110 processes over the network. Is this an I/O-bound or CPU-bound task?
I/O-bound!
This means we should see large gains from multithreading, even on a single-core machine.
Implementation: spawn multiple threads, each responsible for connecting to a different myth machine and updating the map.
static void countCS110ProcessesForMyth(int mythNum, const unordered_set<string>& sunetIDs,
                                       map<int, int>& processCountMap, mutex& processCountMapLock) {
  int numProcesses = getNumProcesses(mythNum, sunetIDs);
  // If successful, add to the map and print out
  if (numProcesses >= 0) {
    processCountMapLock.lock();
    processCountMap[mythNum] = numProcesses;
    processCountMapLock.unlock();
    cout << oslock << "myth" << mythNum << " has this many CS110-student processes: "
         << numProcesses << endl << osunlock;
  }
}
When spawning threads, we don't want to spawn too many, because we might overwhelm the OS and diminish the performance gains of our multithreaded implementation.
A common approach is to cap the number of simultaneous threads: e.g. allow at most 16 spawned threads at a time. Once one finishes, we can spawn another.
static void createCS110ProcessCountMap(const unordered_set<string>& sunetIDs, map<int, int>& processCountMap) {
  vector<thread> threads;
  mutex processCountMapLock;
  semaphore permits(kMaxNumSimultaneousThreads);
  for (int mythNum = kMinMythMachine; mythNum <= kMaxMythMachine; mythNum++) {
    permits.wait();
    threads.push_back(thread(countCS110ProcessesForMyth, mythNum, ref(sunetIDs),
                             ref(processCountMap), ref(processCountMapLock), ref(permits)));
  }
  for (thread& threadToJoin : threads) threadToJoin.join();
}
static void countCS110ProcessesForMyth(int mythNum, const unordered_set<string>& sunetIDs,
                                       map<int, int>& processCountMap, mutex& processCountMapLock,
                                       semaphore& permits) {
  permits.signal(on_thread_exit);
  int numProcesses = getNumProcesses(mythNum, sunetIDs);
  if (numProcesses >= 0) {
    processCountMapLock.lock();
    processCountMap[mythNum] = numProcesses;
    processCountMapLock.unlock();
    cout << oslock << "myth" << mythNum << " has this many CS110-student processes: "
         << numProcesses << endl << osunlock;
  }
}
Even though we are limiting the number of simultaneous threads, we still spawn that many in total. It would be nice if we could use the same threads to complete all the tasks.
A common approach is to use a thread pool: a class that maintains a pool of worker threads standing by to execute assigned tasks.
class ThreadPool {
 public:
  ThreadPool(size_t numThreads);
  void schedule(const std::function<void(void)>& thunk);
  void wait();
  ~ThreadPool();
};
What might this look like in code?
We can only schedule a task represented by a function with no parameters and no return value (a "thunk"). Therefore, to give a task access to external data, we must capture that data in a lambda function.
static void createCS110ProcessCountMap(const unordered_set<string>& sunetIDs,
                                       map<int, int>& processCountMap) {
  ThreadPool pool(kMaxNumSimultaneousThreads);
  mutex processCountMapLock;
  for (int mythNum = kMinMythMachine; mythNum <= kMaxMythMachine; mythNum++) {
    pool.schedule([mythNum, &sunetIDs, &processCountMap, &processCountMapLock]() {
      countCS110ProcessesForMyth(mythNum, sunetIDs, processCountMap, processCountMapLock);
    });
  }
  pool.wait();
}
Thread Pools are very useful abstractions that let a client spread tasks across several threads without having to deal with the complexities of threads.
static mutex rgenLock;
static RandomGenerator rgen;
...
void browse() {
  cout << oslock << "Customer starts to kill time." << endl << osunlock;
  size_t browseTimeMS = getBrowseTimeMS();
  sleep_for(browseTimeMS);
  cout << oslock << "Customer just killed " << double(browseTimeMS) / 1000
       << " seconds." << endl << osunlock;
}

void makeCone(size_t coneID, size_t customerID) {
  cout << oslock << "  Clerk starts to make ice cream cone #" << coneID
       << " for customer #" << customerID << "." << endl << osunlock;
  size_t prepTimeMS = getPrepTimeMS();
  sleep_for(prepTimeMS);
  cout << oslock << "  Clerk just spent " << double(prepTimeMS) / 1000
       << " seconds making ice cream cone #" << coneID
       << " for customer #" << customerID << "." << endl << osunlock;
}
...
To model a "real" ice cream store, we want to randomize different occurrences throughout the program. Helper functions like getBrowseTimeMS and getPrepTimeMS (backed by the shared RandomGenerator, guarded by rgenLock) do that.
int main(int argc, const char *argv[]) {
  // Make an array of customer threads, and add up how many cones they order
  size_t totalConesOrdered = 0;
  thread customers[kNumCustomers];
  /* The structs to package up variables needed for cone inspection and
   * customer checkout */
  inspection_t inspection;
  checkout_t checkout;
  for (size_t i = 0; i < kNumCustomers; i++) {
    // utility function, random (see ice-cream-store-utils.h)
    size_t numConesWanted = getNumCones();
    customers[i] = thread(customer, i, numConesWanted,
                          ref(inspection), ref(checkout));
    totalConesOrdered += numConesWanted;
  }
  /* Make the manager and cashier threads to approve cones / check out customers.
   * Tell the manager how many cones will be ordered in total. */
  thread managerThread(manager, totalConesOrdered, ref(inspection));
  thread cashierThread(cashier, ref(checkout));
  for (thread& customer: customers) customer.join();
  cashierThread.join();
  managerThread.join();
  return 0;
}
In main, we spawn all of the customers, the manager (telling it the total number of cones ordered), and the cashier. Why not clerks? Each customer spawns its own clerks.
Then, we wait for the threads to finish.
A customer does the following:
struct checkout_t {
  atomic<size_t> nextPlaceInLine{0};
  semaphore customers[kNumCustomers];
  semaphore waitingCustomers;
};
Struct passed by reference to all customers and the cashier.
static void customer(size_t id, size_t numConesWanted,
                     inspection_t& inspection, checkout_t& checkout) {
  // Make a vector of clerk threads, one per cone to be ordered
  vector<thread> clerks(numConesWanted);
  for (size_t i = 0; i < clerks.size(); i++) {
    clerks[i] = thread(clerk, i, id, ref(inspection));
  }
  // The customer browses for some amount of time, then joins the clerks.
  browse();
  for (thread& clerk: clerks) clerk.join();
  size_t place = checkout.nextPlaceInLine++;
  cout << oslock << "Customer " << id << " assumes position #" << place
       << " at the checkout counter." << endl << osunlock;
  // Tell the cashier that we are ready to check out
  checkout.waitingCustomers.signal();
  // Wait on our unique semaphore so we know when it is our turn
  checkout.customers[place].wait();
  cout << oslock << "Customer " << id
       << " has checked out and leaves the ice cream store."
       << endl << osunlock;
}
A customer does the following:
1) spawns a clerk for each cone
2) browses and waits for clerks
3) gets its place in checkout line
4) tells cashier it's there
5) waits for cashier to ring it up
A clerk does the following:
struct inspection_t {
  mutex available;
  semaphore requested;
  semaphore finished;
  bool passed;
};
Struct passed by reference to all clerks and the manager.
static void clerk(size_t coneID, size_t customerID, inspection_t& inspection) {
  bool success = false;
  while (!success) {
    makeCone(coneID, customerID);
    // We must be the only one requesting approval
    inspection.available.lock();
    // Let the manager know we are requesting approval
    inspection.requested.signal();
    // Wait for the manager to finish
    inspection.finished.wait();
    /* If the manager is finished, it has put
     * its approval decision into "passed" */
    success = inspection.passed;
    // We're done requesting approval, so unlock for someone else
    inspection.available.unlock();
  }
}
The single manager does the following while there are more cones needed:
static void manager(size_t numConesNeeded, inspection_t& inspection) {
  size_t numConesAttempted = 0;
  size_t numConesApproved = 0;
  while (numConesApproved < numConesNeeded) {
    // Wait for someone to request an inspection
    inspection.requested.wait();
    inspection.passed = inspectCone();
    // Let them know we have finished inspecting
    inspection.finished.signal();
    numConesAttempted++;
    if (inspection.passed) numConesApproved++;
  }
  cout << oslock << "  Manager inspected a total of " << numConesAttempted
       << " ice cream cones before approving a total of " << numConesNeeded
       << "." << endl << "  Manager leaves the ice cream store."
       << endl << osunlock;
}
The single cashier does the following while there are more customers to ring up:
static void cashier(checkout_t& checkout) {
  cout << oslock << "  Cashier is ready to help customers check out."
       << endl << osunlock;
  // We check out all customers
  for (size_t i = 0; i < kNumCustomers; i++) {
    // Wait for someone to let us know they are ready to check out
    checkout.waitingCustomers.wait();
    cout << oslock << "  Cashier rings up customer " << i << "."
         << endl << osunlock;
    // Let the ith customer know that they can leave.
    checkout.customers[i].signal();
  }
  cout << oslock << "  Cashier is all done and can go home."
       << endl << osunlock;
}
There's a lot going on in this simulation!
Managing all of the threads, locking, waiting, etc., takes planning and foresight.
This isn't the only way to model the ice cream store.
How would you modify the model?
What would we have to do if we wanted more than one manager?
Could we create multiple clerks in main, as well? (sure)
Example of different threads doing different tasks
Layered construction - combination of multithreading patterns
Role playing helps to visualize!
Next time: Introduction to networking