CS110: Principles of Computer Systems
Winter 2021-2022
Stanford University
Instructors: Nick Troccoli and Jerry Cain
Introduction to Threads
Mutexes and Condition Variables
Semaphores
Multithreading Patterns
assign5: implement your own multithreaded news aggregator to quickly fetch news from the web!
Let's implement a program called myth-buster that prints out how many CS110 student processes are running on each myth machine right now.
This is representative of how load balancers (e.g. for myth.stanford.edu or www.netflix.com) determine which internal server to forward your request to.
myth51 has this many CS110-student processes: 59
myth52 has this many CS110-student processes: 135
myth53 has this many CS110-student processes: 112
myth54 has this many CS110-student processes: 89
myth55 has this many CS110-student processes: 107
myth56 has this many CS110-student processes: 58
myth57 has this many CS110-student processes: 70
myth58 has this many CS110-student processes: 93
myth59 has this many CS110-student processes: 107
myth60 has this many CS110-student processes: 145
myth61 has this many CS110-student processes: 105
myth62 has this many CS110-student processes: 126
myth63 has this many CS110-student processes: 314
myth64 has this many CS110-student processes: 119
myth65 has this many CS110-student processes: 156
myth66 has this many CS110-student processes: 144
Machine least loaded by CS110 students: myth56
Number of CS110 processes on least loaded machine: 58
CPU-bound tasks: the time to complete them is dictated by how long the CPU computation takes.
I/O-bound tasks: the time to complete them is dictated by how long some external mechanism (e.g. disk, network) takes to complete its work.
Even a single-core CPU can see performance improvements by parallelizing I/O-bound tasks. But parallelizing CPU-bound tasks will likely show minimal gains unless we have a multi-core CPU.
For myth-buster, the primary task is fetching the number of running CS110 processes over the network. Is this an I/O-bound or CPU-bound task?
I/O-bound!
This means we should see large gains from multithreading, even on a single-core machine.
Implementation: spawn multiple threads, each responsible for connecting to a different myth machine and updating the map.
static void countCS110ProcessesForMyth(int mythNum, const unordered_set<string>& sunetIDs,
                                       map<int, int>& processCountMap, mutex& processCountMapLock) {
  int numProcesses = getNumProcesses(mythNum, sunetIDs);
  // If successful, add to the map and print out
  if (numProcesses >= 0) {
    processCountMapLock.lock();
    processCountMap[mythNum] = numProcesses;
    processCountMapLock.unlock();
    cout << oslock << "myth" << mythNum << " has this many CS110-student processes: "
         << numProcesses << endl << osunlock;
  }
}
When spawning threads, we don't want to spawn too many, because we might overwhelm the OS and diminish the performance gains of our multithreaded implementation.
A common approach is to cap the number of simultaneous threads: e.g. allow at most 16 spawned threads at a time. Once one finishes, we can spawn another.
static void createCS110ProcessCountMap(const unordered_set<string>& sunetIDs, map<int, int>& processCountMap) {
  vector<thread> threads;
  mutex processCountMapLock;
  semaphore permits(kMaxNumSimultaneousThreads);
  for (int mythNum = kMinMythMachine; mythNum <= kMaxMythMachine; mythNum++) {
    permits.wait();
    threads.push_back(thread(countCS110ProcessesForMyth, mythNum, ref(sunetIDs),
                             ref(processCountMap), ref(processCountMapLock), ref(permits)));
  }
  for (thread& threadToJoin : threads) threadToJoin.join();
}
static void countCS110ProcessesForMyth(int mythNum, const unordered_set<string>& sunetIDs,
                                       map<int, int>& processCountMap, mutex& processCountMapLock,
                                       semaphore& permits) {
  permits.signal(on_thread_exit);
  int numProcesses = getNumProcesses(mythNum, sunetIDs);
  if (numProcesses >= 0) {
    processCountMapLock.lock();
    processCountMap[mythNum] = numProcesses;
    processCountMapLock.unlock();
    cout << oslock << "myth" << mythNum << " has this many CS110-student processes: "
         << numProcesses << endl << osunlock;
  }
}
Even though we are limiting the number of simultaneous threads, we still spawn that many in total. It would be nice if we could use the same threads to complete all the tasks.
A common approach is to use a thread pool: a class that maintains a pool of worker threads standing by to execute assigned tasks.
class ThreadPool {
 public:
  ThreadPool(size_t numThreads);
  void schedule(const std::function<void(void)>& thunk);
  void wait();
  ~ThreadPool();
};
What might this look like in code?
We can only schedule a task represented by a function with no parameters and no return value (a "thunk"). Therefore, to give a task access to external data, we must capture that data in a lambda function.
static void createCS110ProcessCountMap(const unordered_set<string>& sunetIDs,
                                       map<int, int>& processCountMap) {
  ThreadPool pool(kMaxNumSimultaneousThreads);
  mutex processCountMapLock;
  for (int mythNum = kMinMythMachine; mythNum <= kMaxMythMachine; mythNum++) {
    pool.schedule([mythNum, &sunetIDs, &processCountMap, &processCountMapLock]() {
      countCS110ProcessesForMyth(mythNum, sunetIDs, processCountMap, processCountMapLock);
    });
  }
  pool.wait();
}
Thread Pools are very useful abstractions that let a client spread tasks across several threads without having to deal with the complexities of threads.
static mutex rgenLock;
static RandomGenerator rgen;
...
void browse() {
  cout << oslock << "Customer starts to kill time." << endl << osunlock;
  size_t browseTimeMS = getBrowseTimeMS();
  sleep_for(browseTimeMS);
  cout << oslock << "Customer just killed " << double(browseTimeMS) / 1000
       << " seconds." << endl << osunlock;
}

void makeCone(size_t coneID, size_t customerID) {
  cout << oslock << "  Clerk starts to make ice cream cone #" << coneID
       << " for customer #" << customerID << "." << endl << osunlock;
  size_t prepTimeMS = getPrepTimeMS();
  sleep_for(prepTimeMS);
  cout << oslock << "  Clerk just spent " << double(prepTimeMS) / 1000
       << " seconds making ice cream cone #" << coneID
       << " for customer #" << customerID << "." << endl << osunlock;
}
...
To model a "real" ice cream store, we want to randomize different occurrences throughout the program. Helper functions like getBrowseTimeMS and getPrepTimeMS (backed by the shared RandomGenerator, guarded by rgenLock) do that.
int main(int argc, const char *argv[]) {
  // Make an array of customer threads, and add up how many cones they order
  size_t totalConesOrdered = 0;
  thread customers[kNumCustomers];
  /* The structs to package up variables needed for cone inspection and
   * customer checkout */
  inspection_t inspection;
  checkout_t checkout;
  for (size_t i = 0; i < kNumCustomers; i++) {
    // utility function, random (see ice-cream-store-utils.h)
    size_t numConesWanted = getNumCones();
    customers[i] = thread(customer, i, numConesWanted,
                          ref(inspection), ref(checkout));
    totalConesOrdered += numConesWanted;
  }
  /* Make the manager and cashier threads to approve cones / check out customers.
   * Tell the manager how many cones will be ordered in total. */
  thread managerThread(manager, totalConesOrdered, ref(inspection));
  thread cashierThread(cashier, ref(checkout));
  for (thread& customer: customers) customer.join();
  cashierThread.join();
  managerThread.join();
  return 0;
}
In main, we spawn all of the customers, the manager (telling it the total number of cones ordered), and the cashier. Why not clerks? Each customer spawns its own clerks.
Then, we wait for the threads to finish.
A customer does the following:
struct checkout_t {
  atomic<size_t> nextPlaceInLine{0};
  semaphore customers[kNumCustomers];
  semaphore waitingCustomers;
};
Struct passed by reference to all customers and the cashier.
static void customer(size_t id, size_t numConesWanted,
                     inspection_t& inspection, checkout_t& checkout) {
  // Make a vector of clerk threads, one per cone to be ordered
  vector<thread> clerks(numConesWanted);
  for (size_t i = 0; i < clerks.size(); i++) {
    clerks[i] = thread(clerk, i, id, ref(inspection));
  }
  // The customer browses for some amount of time, then joins the clerks.
  browse();
  for (thread& clerk: clerks) clerk.join();
  size_t place = checkout.nextPlaceInLine++;
  cout << oslock << "Customer " << id << " assumes position #" << place
       << " at the checkout counter." << endl << osunlock;
  // Tell the cashier that we are ready to check out
  checkout.waitingCustomers.signal();
  // Wait on our unique semaphore so we know when it is our turn
  checkout.customers[place].wait();
  cout << oslock << "Customer " << id
       << " has checked out and leaves the ice cream store."
       << endl << osunlock;
}
A customer does the following:
1) spawns a clerk for each cone
2) browses and waits for clerks
3) gets its place in checkout line
4) tells cashier it's there
5) waits for cashier to ring it up
A clerk does the following:
struct inspection_t {
  mutex available;
  semaphore requested;
  semaphore finished;
  bool passed;
};
Struct passed by reference to all clerks and the manager.
static void clerk(size_t coneID, size_t customerID, inspection_t& inspection) {
  bool success = false;
  while (!success) {
    makeCone(coneID, customerID);
    // We must be the only one requesting approval
    inspection.available.lock();
    // Let the manager know we are requesting approval
    inspection.requested.signal();
    // Wait for the manager to finish
    inspection.finished.wait();
    /* If the manager is finished, it has put
     * its approval decision into "passed" */
    success = inspection.passed;
    // We're done requesting approval, so unlock for someone else
    inspection.available.unlock();
  }
}
The single manager does the following while there are more cones needed:
static void manager(size_t numConesNeeded, inspection_t& inspection) {
  size_t numConesAttempted = 0;
  size_t numConesApproved = 0;
  while (numConesApproved < numConesNeeded) {
    // Wait for someone to request an inspection
    inspection.requested.wait();
    inspection.passed = inspectCone();
    // Let them know we have finished inspecting
    inspection.finished.signal();
    numConesAttempted++;
    if (inspection.passed) numConesApproved++;
  }
  cout << oslock << "  Manager inspected a total of " << numConesAttempted
       << " ice cream cones before approving a total of " << numConesNeeded
       << "." << endl << "  Manager leaves the ice cream store."
       << endl << osunlock;
}
The single cashier does the following while there are more customers to ring up:
static void cashier(checkout_t& checkout) {
  cout << oslock << "  Cashier is ready to help customers check out."
       << endl << osunlock;
  // We check out all customers
  for (size_t i = 0; i < kNumCustomers; i++) {
    // Wait for someone to let us know they are ready to check out
    checkout.waitingCustomers.wait();
    cout << oslock << "  Cashier rings up customer " << i << "."
         << endl << osunlock;
    // Let the ith customer know that they can leave.
    checkout.customers[i].signal();
  }
  cout << oslock << "  Cashier is all done and can go home."
       << endl << osunlock;
}
There's a lot going on in this simulation!
Managing all of the threads, locking, waiting, etc., takes planning and foresight.
This isn't the only way to model the ice cream store.
How would you modify the model?
What would we have to do if we wanted more than one manager?
Could we create multiple clerks in main, as well? (sure)
Example of different threads doing different tasks
Layered construction - combination of multithreading patterns
Role playing helps to visualize!
Next time: Introduction to networking