CS110 Lecture 11: Semaphores and Multithreading Patterns

Principles of Computer Systems

Winter 2021

Stanford University

Computer Science Department

Instructors: Chris Gregg and Nick Troccoli

CS110 Topic 3: How can we have concurrency within a single process?

Learning About Threads

Introduction to Threads (Lecture 9)
Threads and Mutexes (Lecture 9-10)
Condition Variables and Semaphores (Lecture 10)
Multithreading Patterns (Lecture 11)

Learning Goals

Learn how a semaphore generalizes the "permits pattern" we previously saw
Learn how to apply semaphores to coordinate threads in different ways

Lecture Plan

Recap: Dining With Philosophers
Semaphores and Thread Coordination
Example: Reader-Writer
Example: Mythbusters

This is a canonical multithreading example of the potential for deadlock and how to avoid it.
Five philosophers sit around a circular table, eating spaghetti
There is one fork for each of them
Each philosopher thinks, then eats, and repeats this three times for their three daily meals.
To eat, a philosopher must grab the fork on their left and the fork on their right. With two forks in hand, they chow on spaghetti to nourish their big, philosophizing brain. When they're full, they put down the forks in the same order they picked them up and return to thinking for a while.
To think, a philosopher keeps to themselves for some amount of time. Sometimes they think for a long time, and sometimes they barely think at all.
The full program is right here.

Dining Philosophers Problem

https://commons.wikimedia.org/wiki/File:An_illustration_of_the_dining_philosophers_problem.png

When coding with threads, you need to ensure that:

there are never any race conditions
there's zero chance of deadlock; otherwise some subset of threads is starved forever
Race conditions can generally be solved with mutexes.
- We use them to mark the boundaries of critical regions and limit the number of threads present within them to be at most one.
Deadlock can be programmatically prevented by implanting directives to limit the number of threads competing for a shared resource.
Our general goal is to determine what constraints must be added to eliminate race conditions and deadlock.

Race Conditions and Deadlock

Goal: we must encode constraints into our program.

Example: how many philosophers can hold a given fork at the same time? One.

How can we encode this into our program? Let's make a mutex for each fork.

Each philosopher either holds a fork or doesn't.
A philosopher grabs a fork by locking that mutex. If the fork is available, the philosopher continues. Otherwise, it blocks until the fork becomes available and it can have it.
A philosopher puts down a fork by unlocking that mutex.

Constraints: Forks

Goal: we must encode constraints into our program.

Example: how many philosophers can try to eat at the same time? Four.

Alternative: how many philosophers can eat at the same time? Two.
Why might the first one be better? It imposes less bottlenecking while still preventing deadlock.

How can we encode this into our program?

Let's keep a count of available "permits" or "tickets".
In order to try to eat (i.e. to grab any forks at all), a philosopher must get a permit.
Once done eating, a philosopher must return their permit.

What does this look like in code?

Use a semaphore initialized with the number of permits we want
Before grabbing forks, get a permit
When done eating, return a permit.
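The two constraints above (one mutex per fork, four permits) can be sketched as follows. This is a minimal illustration, not the lecture's full program: the Semaphore class is a small stand-in for the course's semaphore, runDinner and the meal counter are hypothetical names added so the result can be checked, and the thinking and eating delays are omitted.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Minimal stand-in for the course's semaphore class (name and body are ours).
class Semaphore {
 public:
  explicit Semaphore(int value = 0) : value(value) {}
  void wait() {
    std::lock_guard<std::mutex> lg(m);
    cv.wait(m, [this] { return value > 0; });  // sleep until a permit exists
    value--;
  }
  void signal() {
    std::lock_guard<std::mutex> lg(m);
    value++;
    cv.notify_all();
  }
 private:
  int value;
  std::mutex m;
  std::condition_variable_any cv;
};

constexpr int kNumPhilosophers = 5;
constexpr int kNumMeals = 3;

// Runs the whole dinner and returns the total number of meals eaten.
int runDinner() {
  std::mutex forks[kNumPhilosophers];       // one mutex per fork
  Semaphore permits(kNumPhilosophers - 1);  // at most four may try to eat
  std::mutex countLock;
  int mealsEaten = 0;

  auto philosopher = [&](int id) {
    std::mutex& left = forks[id];
    std::mutex& right = forks[(id + 1) % kNumPhilosophers];
    for (int meal = 0; meal < kNumMeals; meal++) {
      permits.wait();   // get a permit before grabbing any fork
      left.lock();
      right.lock();
      {
        std::lock_guard<std::mutex> lg(countLock);
        mealsEaten++;   // "eat"
      }
      left.unlock();    // put forks down in the order they were picked up
      right.unlock();
      permits.signal(); // return the permit
    }
  };

  std::vector<std::thread> threads;
  for (int id = 0; id < kNumPhilosophers; id++) threads.emplace_back(philosopher, id);
  for (std::thread& t : threads) t.join();
  return mealsEaten;
}
```

Calling runDinner() from main should always return 15 (five philosophers, three meals each). With only four permits, at least one philosopher holding a permit can always obtain both forks, so the threads cannot all block on each other.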

Constraints: Permits

More on Semaphores

A semaphore is a variable type that represents a count of finite resources.

"Permits" pattern with a counter, mutex and condition_variable_any
Thread-safe way to grant permission and to wait for permission (aka sleep)

class semaphore {
 public:
  semaphore(int value = 0);
  void wait();
  void signal();
  
 private:
  int value;
  std::mutex m;
  std::condition_variable_any cv;
};
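One way the declaration above could be implemented is sketched below. This is an assumption about the implementation, not the course library's actual source: the class is named Semaphore (capitalized) to make that distinction clear, and count() is a demo-only helper that is not part of the interface shown in lecture.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Sketch of the "permits pattern": a counter guarded by a mutex, plus a
// condition variable that waiting threads sleep on.
class Semaphore {
 public:
  explicit Semaphore(int value = 0) : value(value) {}

  // Take a permit, sleeping until the count is positive.
  void wait() {
    std::lock_guard<std::mutex> lg(m);
    // condition_variable_any can wait directly on the mutex: it unlocks m
    // while sleeping and relocks it before the predicate is rechecked.
    cv.wait(m, [this] { return value > 0; });
    value--;
  }

  // Return a permit and wake waiters if the count just became positive.
  void signal() {
    std::lock_guard<std::mutex> lg(m);
    value++;
    if (value == 1) cv.notify_all();
  }

  // Demo-only helper (not in the lecture's interface): snapshot of the count.
  int count() {
    std::lock_guard<std::mutex> lg(m);
    return value;
  }

 private:
  int value;
  std::mutex m;
  std::condition_variable_any cv;
};
```

Because wait() only proceeds when the count is positive, a semaphore constructed with a negative value needs enough signal() calls to bring the count up to 1 before the first wait() returns.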

What does a semaphore initialized with a positive number mean?

semaphore permits(3);

Positive Semaphores

We start with a fixed number of permits.
Once those permits are taken, further threads must wait for permits to be returned before continuing
Example: Dining Philosophers

What does a semaphore initialized with 0 mean?

semaphore permits(0);

Zero Semaphores

We don't have any permits!
permits.wait() always initially waits for a signal, and will only stop waiting once that signal is received. E.g. you want to wait until another thread finishes before a thread continues.

void create(int creationCount, semaphore &s) {
    for (int i = 0; i < creationCount; i++) {
        cout << oslock << "Now creating " << i << endl << osunlock;
        s.signal();
    }
}

void consume_after_create(int consumeCount, semaphore &s) {
    for (int i = 0; i < consumeCount; i++) {
        s.wait();
        cout << oslock << "Now consuming " << i << endl << osunlock;
    }
}

int main(int argc, const char *argv[]) {
    semaphore zeroSemaphore(0); // can omit (0), since default initializes to 0
    int numIterations = 5;
    thread thread_waited_on(create, numIterations, ref(zeroSemaphore));
    thread waiting_thread(consume_after_create, numIterations, ref(zeroSemaphore));
    thread_waited_on.join();
    waiting_thread.join();
    return 0;
}

Zero Semaphores

$ ./zeroSemaphore
Now creating 0
Now creating 1
Now creating 2
Now creating 3
Now creating 4
Now consuming 0
Now consuming 1
Now consuming 2
Now consuming 3
Now consuming 4

Negative Semaphores

What does a semaphore initialized with a negative number mean?

semaphore permits(-9);

The semaphore must reach 1 before the initial wait would end. E.g. you want to wait until other threads finish before a final thread continues.

void writer(int i, semaphore &s) {
    cout << oslock << "Sending signal " << i << endl << osunlock;
    s.signal();
}

void read_after_ten(semaphore &s) {
    s.wait();
    cout << oslock << "Got enough signals to continue!" << endl << osunlock;
}

int main(int argc, const char *argv[]) {
    semaphore negSemaphore(-9);
    thread writers[10];
    for (size_t i = 0; i < 10; i++) {
        writers[i] = thread(writer, i, ref(negSemaphore));
    }
    thread r(read_after_ten, ref(negSemaphore));
    for (thread &t : writers) t.join();
    r.join();
    return 0;
}

Negative Semaphores

$ ./negativeSemaphores
Sending signal 0
Sending signal 1
Sending signal 2
Sending signal 3
Sending signal 5
Sending signal 7
Sending signal 8
Sending signal 9
Sending signal 6
Sending signal 4
Got enough signals to continue!

Semaphores can be used to support thread coordination.

One thread can stall—via semaphore::wait—until other thread(s) use semaphore::signal, e.g. the signaling thread prepared some data that the waiting thread needs to continue.
Generalization of thread::join

Thread Coordination

Reader-Writer

Let's implement a program that requires thread coordination with semaphores. First, we'll look at a version without semaphores to see why they are necessary.

The reader-writer pattern/program spawns 2 threads: one writer (publishes content to a shared buffer) and one reader (reads from shared buffer when content is available)
Common pattern! E.g. web server publishes content over a dedicated communication channel, and the web browser consumes that content.
More complex version: multiple readers, similar to how a web server handles many incoming requests (puts request in buffer, readers each read and process requests)

int main(int argc, const char *argv[]) {
  // Create an empty buffer
  char buffer[kNumBufferSlots];
  memset(buffer, ' ', sizeof(buffer));

  thread writer(writeToBuffer, buffer, sizeof(buffer), kNumIterations);
  thread reader(readFromBuffer, buffer, sizeof(buffer), kNumIterations);
  writer.join();
  reader.join();
  return 0;
}

Confused Reader-Writer

confused-reader-writer.cc

static void readFromBuffer(char buffer[], size_t bufferSize, size_t iterations) {
  cout << oslock << "Reader: ready to read." << endl << osunlock;
  for (size_t i = 0; i < iterations * bufferSize; i++) {
  
    // Read and process the data
    char ch = buffer[i % bufferSize];
    processData(ch); // sleep to simulate work
    buffer[i % bufferSize] = ' ';
    
    cout << oslock << "Reader: consumed data packet " 
      << "with character '" << ch << "'.\t\t" << osunlock;
    printBuffer(buffer, bufferSize);
  }
}

Confused Reader-Writer

confused-reader-writer.cc

static void writeToBuffer(char buffer[], size_t bufferSize, size_t iterations) {
  cout << oslock << "Writer: ready to write." << endl << osunlock;
  for (size_t i = 0; i < iterations * bufferSize; i++) {

    char ch = prepareData();
    buffer[i % bufferSize] = ch;
    
    cout << oslock << "Writer: published data packet with character '" 
      << ch << "'.\t\t" << osunlock;
    printBuffer(buffer, bufferSize);
  }
}

Confused Reader-Writer

Both threads share the same buffer, so they agree where content is stored (think of buffer like state for a pipe or a connection between client and server)
The writer publishes content to the circular buffer, and the reader consumes that content as it's written. Each thread cycles through the buffer the same number of times, and they both agree that i % 8 (i % bufferSize in the code, for an 8-slot buffer) identifies the next slot of interest.
Problem: each thread runs independently, without knowing how much progress the other has made.
- Example: no way for the reader to know that the slot it wants to read from has meaningful data in it. It's possible the writer hasn't gotten that far yet.
- Example: the writer could loop around and overwrite content that the reader has not yet consumed.

Goal: we must encode constraints into our program.

What constraint(s) should we add to our program?

A reader should not read until something is available to read
A writer should not write until there is space available to write

How can we model these constraint(s)?

One semaphore to manage empty slots
One semaphore to manage full slots

Reader-Writer Constraints

What might this look like in code?

The writer thread waits until at least one buffer slot is empty before writing. Once it writes, it increments the full buffer count by one.
The reader thread waits until at least one buffer slot is full before reading. Once it reads, it increments the empty buffer count by one.
Let's try it!
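The two-semaphore scheme described above might look like the sketch below. It is not the course's reader-writer.cc: the Semaphore class is a minimal stand-in, the buffer sizes are assumed values, the printing and simulated work are dropped, and the writer publishes the letters 'a', 'b', 'c', ... in place of prepareData() so the result is predictable. runReaderWriter and the consumed string are hypothetical additions for checking the outcome.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstring>
#include <functional>
#include <mutex>
#include <string>
#include <thread>

// Minimal stand-in for the course's semaphore class (name and body are ours).
class Semaphore {
 public:
  explicit Semaphore(int value = 0) : value(value) {}
  void wait() {
    std::lock_guard<std::mutex> lg(m);
    cv.wait(m, [this] { return value > 0; });
    value--;
  }
  void signal() {
    std::lock_guard<std::mutex> lg(m);
    value++;
    cv.notify_all();
  }
 private:
  int value;
  std::mutex m;
  std::condition_variable_any cv;
};

constexpr size_t kNumBufferSlots = 8;  // assumed value
constexpr size_t kNumIterations = 3;   // assumed value

static void writeToBuffer(char buffer[], size_t bufferSize, size_t iterations,
                          Semaphore& emptySlots, Semaphore& fullSlots) {
  for (size_t i = 0; i < iterations * bufferSize; i++) {
    char ch = static_cast<char>('a' + i % 26);  // stand-in for prepareData()
    emptySlots.wait();                          // wait for space to write
    buffer[i % bufferSize] = ch;
    fullSlots.signal();                         // one more slot is full
  }
}

static void readFromBuffer(char buffer[], size_t bufferSize, size_t iterations,
                           Semaphore& fullSlots, Semaphore& emptySlots,
                           std::string& consumed) {
  for (size_t i = 0; i < iterations * bufferSize; i++) {
    fullSlots.wait();                           // wait for data to read
    char ch = buffer[i % bufferSize];
    buffer[i % bufferSize] = ' ';
    emptySlots.signal();                        // one more slot is empty
    consumed += ch;                             // stand-in for processData()
  }
}

// Runs one writer and one reader; returns everything the reader consumed.
std::string runReaderWriter() {
  char buffer[kNumBufferSlots];
  std::memset(buffer, ' ', sizeof(buffer));
  Semaphore emptySlots(kNumBufferSlots);  // every slot starts out empty
  Semaphore fullSlots(0);                 // nothing to read yet
  std::string consumed;
  std::thread writer(writeToBuffer, buffer, sizeof(buffer), kNumIterations,
                     std::ref(emptySlots), std::ref(fullSlots));
  std::thread reader(readFromBuffer, buffer, sizeof(buffer), kNumIterations,
                     std::ref(fullSlots), std::ref(emptySlots), std::ref(consumed));
  writer.join();
  reader.join();
  return consumed;
}
```

Under these assumptions the reader should consume exactly the 24 characters the writer published, in order, no matter how the two threads are scheduled.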

Reader-Writer Constraints

reader-writer.cc

We have two semaphores to permit bidirectional thread coordination
- reader can communicate with writer, and writer can communicate with reader

Reader-Writer Takeaways

reader-writer.cc

Mythbusters

Let's implement a program called myth-buster that prints out how many CS110 student processes are running on each myth machine right now.

This is representative of load balancers (e.g. myth.stanford.edu or www.netflix.com) determining which internal server your request should be forwarded to.

myth51 has this many CS110-student processes: 59
myth52 has this many CS110-student processes: 135
myth53 has this many CS110-student processes: 112
myth54 has this many CS110-student processes: 89
myth55 has this many CS110-student processes: 107
myth56 has this many CS110-student processes: 58
myth57 has this many CS110-student processes: 70
myth58 has this many CS110-student processes: 93
myth59 has this many CS110-student processes: 107
myth60 has this many CS110-student processes: 145
myth61 has this many CS110-student processes: 105
myth62 has this many CS110-student processes: 126
myth63 has this many CS110-student processes: 314
myth64 has this many CS110-student processes: 119
myth65 has this many CS110-student processes: 156
myth66 has this many CS110-student processes: 144
Machine least loaded by CS110 students: myth56
Number of CS110 processes on least loaded machine: 58

We'll use the following pre-implemented function, which does some networking to fetch process counts. It connects to the specified myth machine and blocks until done.

int getNumProcesses(int mythNum, const std::unordered_set<std::string>& sunetIDs);

int main(int argc, char *argv[]) {
  // Create a set of student SUNETs
  unordered_set<string> cs110SUNETs;
  readStudentSUNETsFile(cs110SUNETs, kCS110StudentIDsFile);

  // Create a map from myth number -> CS110 process count and print its info
  map<int, int> processCountMap;
  createCS110ProcessCountMap(cs110SUNETs, processCountMap);
  printMythMachineWithFewestProcesses(processCountMap);

  return 0;
}

We'll implement createCS110ProcessCountMap sequentially and concurrently.

Mythbusters: Sequential

static void createCS110ProcessCountMap(const unordered_set<string>& sunetIDs,
                                       map<int, int>& processCountMap) {

  for (int mythNum = kMinMythMachine; mythNum <= kMaxMythMachine; mythNum++) {
    int numProcesses = getNumProcesses(mythNum, sunetIDs);

    // If successful, add to the map and print out
    if (numProcesses >= 0) {
      processCountMap[mythNum] = numProcesses;
      cout << "myth" << mythNum << " has this many CS110-student processes: " << numProcesses << endl;
    }
  }
}

This implementation fetches the count for each myth machine one after the other. This means we have to wait for 16 sequential connections to be started and completed.

myth-buster-sequential.cc

Mythbusters: Sequential

Why is this implementation slow?

We wait 16 times, because we idle while waiting for each connection to come back.

How can we improve its performance?

Each call to getNumProcesses is independent. We should call it multiple times concurrently to overlap this "dead time".

Mythbusters: Concurrent

myth-buster-concurrent.cc

What might this look like in code?

For each myth machine number, we'll spawn a new thread if there are permits available.
That thread will fetch the count for that myth machine. It must acquire a lock before modifying the map.
When the thread finishes, it returns its permit.
Let's try it!

Implementation: spawn multiple threads, each responsible for connecting to a different myth machine and updating the map. We'll cap the number of active threads to avoid overloading the myth machines.
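The steps above can be sketched as follows. This is not the course's myth-buster-concurrent.cc: getNumProcesses is replaced by a hypothetical stub that sleeps briefly and returns a made-up count, the thread cap kMaxThreads is an assumed value, the Semaphore class is a minimal stand-in, and countProcesses is a helper name introduced here.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <unordered_set>
#include <vector>

// Minimal stand-in for the course's semaphore class (name and body are ours).
class Semaphore {
 public:
  explicit Semaphore(int value = 0) : value(value) {}
  void wait() {
    std::lock_guard<std::mutex> lg(m);
    cv.wait(m, [this] { return value > 0; });
    value--;
  }
  void signal() {
    std::lock_guard<std::mutex> lg(m);
    value++;
    cv.notify_all();
  }
 private:
  int value;
  std::mutex m;
  std::condition_variable_any cv;
};

constexpr int kMinMythMachine = 51;
constexpr int kMaxMythMachine = 66;
constexpr int kMaxThreads = 8;  // assumed cap on active threads

// Hypothetical stub: the real function connects to a myth machine. This one
// sleeps to simulate the blocking connection and returns a fake count.
static int getNumProcesses(int mythNum, const std::unordered_set<std::string>& sunetIDs) {
  (void)sunetIDs;
  std::this_thread::sleep_for(std::chrono::milliseconds(10));
  return mythNum - 50;
}

// Thread routine: fetch one machine's count, record it, return the permit.
static void countProcesses(int mythNum, const std::unordered_set<std::string>& sunetIDs,
                           std::map<int, int>& processCountMap, std::mutex& mapLock,
                           Semaphore& permits) {
  int numProcesses = getNumProcesses(mythNum, sunetIDs);
  if (numProcesses >= 0) {
    std::lock_guard<std::mutex> lg(mapLock);  // the map is shared by all threads
    processCountMap[mythNum] = numProcesses;
  }
  permits.signal();  // let the main thread spawn another worker
}

static void createCS110ProcessCountMap(const std::unordered_set<std::string>& sunetIDs,
                                       std::map<int, int>& processCountMap) {
  std::vector<std::thread> threads;
  std::mutex mapLock;
  Semaphore permits(kMaxThreads);
  for (int mythNum = kMinMythMachine; mythNum <= kMaxMythMachine; mythNum++) {
    permits.wait();  // block here until fewer than kMaxThreads are working
    threads.emplace_back(countProcesses, mythNum, std::cref(sunetIDs),
                         std::ref(processCountMap), std::ref(mapLock), std::ref(permits));
  }
  for (std::thread& t : threads) t.join();
}
```

In this sketch the permit is returned by the last statement of the thread routine, so a new thread can be spawned slightly before the old one has fully terminated. The cap is therefore on threads doing work, not strictly on threads that exist.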

Mythbusters Takeaways

myth-buster-concurrent.cc

We parallelized an independent operation to speed up runtime
- One call to getNumProcesses isn't dependent on another
To share the map for updating, we need a lock
We use signal(on_thread_exit) to signal only once the thread has terminated. This more accurately reflects permits as a cap on spawned threads.

Recap

Recap: Dining With Philosophers
Semaphores and Thread Coordination
Example: Reader-Writer
Example: Mythbusters

Next time: a trip to the ice cream store

Extra Practice Problems

Multithreading Patterns

For each of the scenarios below, what multithreading patterns might we use to apply appropriate constraints and coordinate threads?

reader/writer, but reading/writing numbers using one shared int
- same as reader/writer from before, but semaphores initialized to 0 or 1 ("1 slot")
multiple workers periodically need approval from an "approval thread" before continuing. Approval may take some time, and gives back result (approve or deny).
- see next slide
multiple threads wait in line to be processed by a "processor" thread. The processor wakes up just one individual thread when it's their turn to be processed.
- stay tuned for next lecture!

Multithreading Patterns

Challenge: multiple workers periodically need approval from an "approval thread" before continuing. Approval may take some time, and gives back result (approve or deny).

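// global struct, shared by all workers and the approver
struct Approval {
    mutex available;      // held by one worker for its entire request
    int workerData;       // the data the current worker wants approved
    semaphore requested;  // worker -> approver: workerData is ready
    bool approved;        // the approver's decision
    semaphore finished;   // approver -> worker: approved is ready
} approval;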

// all N workers
// spend time creating data, then...
approval.available.lock();
approval.workerData = ....
approval.requested.signal();
approval.finished.wait();
bool success = approval.approved;
approval.available.unlock();

// approver
while (true) {
    approval.requested.wait();
    // we are the only one accessing the struct here
    approval.approved = someCalculation(approval.workerData);
    approval.finished.signal();
}
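A runnable variant of the pattern above is sketched below, under several assumptions that are not in the slide: the approver handles a fixed number of requests instead of looping forever (so the program can terminate), someCalculation approves even numbers (a made-up rule), each worker submits its own index as data, and Semaphore is a minimal stand-in for the course's class. runApprovals is a hypothetical driver.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Minimal stand-in for the course's semaphore class (name and body are ours).
class Semaphore {
 public:
  explicit Semaphore(int value = 0) : value(value) {}
  void wait() {
    std::lock_guard<std::mutex> lg(m);
    cv.wait(m, [this] { return value > 0; });
    value--;
  }
  void signal() {
    std::lock_guard<std::mutex> lg(m);
    value++;
    cv.notify_all();
  }
 private:
  int value;
  std::mutex m;
  std::condition_variable_any cv;
};

struct Approval {
  std::mutex available;   // held by one worker for its entire request
  int workerData = 0;
  Semaphore requested;    // worker -> approver: workerData is ready
  bool approved = false;
  Semaphore finished;     // approver -> worker: approved is ready
};

// Made-up approval rule for illustration: approve even numbers.
static bool someCalculation(int data) { return data % 2 == 0; }

static void worker(int data, Approval& a, std::vector<int>& results) {
  std::lock_guard<std::mutex> lg(a.available);  // one request at a time
  a.workerData = data;
  a.requested.signal();
  a.finished.wait();
  results[data] = a.approved ? 1 : 0;  // each worker writes its own slot
}

// Assumption: a bounded loop replaces while (true) so the thread can exit.
static void approver(Approval& a, int numRequests) {
  for (int i = 0; i < numRequests; i++) {
    a.requested.wait();
    // Only the approver touches the struct here: the requesting worker is
    // blocked on finished, and every other worker is blocked on available.
    a.approved = someCalculation(a.workerData);
    a.finished.signal();
  }
}

// Spawns numWorkers workers; returns 1 (approved) or 0 (denied) per worker.
std::vector<int> runApprovals(int numWorkers) {
  Approval a;
  std::vector<int> results(numWorkers, -1);
  std::thread approverThread(approver, std::ref(a), numWorkers);
  std::vector<std::thread> workers;
  for (int i = 0; i < numWorkers; i++) {
    workers.emplace_back(worker, i, std::ref(a), std::ref(results));
  }
  for (std::thread& t : workers) t.join();
  approverThread.join();
  return results;
}
```

The mutex serializes requests, and the two zero-initialized semaphores hand control from the worker to the approver and back, so workerData and approved are never accessed by two threads at once.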