CS110: Principles of Computer Systems

Spring 2021
Instructors Roz Cyrus and Jerry Cain


Multithreading and Networking

  • Implementing myth-buster!
    • The myth-buster is a command line utility that surveys all 16 myth machines to determine which is the least loaded.
      • By least loaded, we mean the myth machine that's running the fewest CS110 student processes.
      • Our myth-buster application is representative of the type of thing load balancers (e.g. myth.stanford.edu, www.facebook.com, or www.netflix.com) run to determine which internal server your request should be forwarded to.
    • The overall architecture of the program is shown below. We'll present various ways to implement compileCS110ProcessCountMap.
static const char *kCS110StudentIDsFile = "studentsunets.txt";
int main(int argc, char *argv[]) {
  unordered_set<string> cs110Students;
  readStudentFile(cs110Students, argv[1] != NULL ? argv[1] : kCS110StudentIDsFile);
  map<int, int> processCountMap;
  compileCS110ProcessCountMap(cs110Students, processCountMap);
  publishLeastLoadedMachineInfo(processCountMap);
  return 0;
}
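
The readStudentFile call in main can be pictured with a minimal sketch like the one below. This is not the course's implementation, which these slides omit. It assumes the file lists one SUNet ID per line, and readStudentStream is a hypothetical helper introduced so the parsing can be exercised without a file on disk.

```cpp
#include <fstream>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_set>
using namespace std;

// Hypothetical helper: inserts every non-empty line of `in` into `students`.
// Assumes one SUNet ID per line, which the slides do not confirm.
void readStudentStream(istream& in, unordered_set<string>& students) {
  string line;
  while (getline(in, line)) {
    if (!line.empty() && line.back() == '\r') line.pop_back(); // tolerate CRLF files
    if (!line.empty()) students.insert(line);
  }
}

// Same argument order as the call in main: the set first, then the filename.
// An unreadable file simply leaves the set unchanged.
void readStudentFile(unordered_set<string>& students, const string& filename) {
  ifstream infile(filename);
  if (infile) readStudentStream(infile, students);
}
```

Because the set is an unordered_set, repeated IDs collapse into one entry, and later membership checks take constant time on average.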

    • readStudentFile updates cs110Students to house the SUNets of all students currently enrolled in CS110. There's nothing interesting about its implementation, so I don't even show it (though you can see its implementation right here).
    • compileCS110ProcessCountMap is more interesting, since it uses networking—our first networking example!—to poll all 16 myths and count CS110 student processes.
    • processCountMap is updated to map myth numbers (e.g. 61) to process counts (e.g. 712).
    • publishLeastLoadedMachineInfo traverses processCountMap and identifies the least loaded myth.
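
The traversal described for publishLeastLoadedMachineInfo might look roughly like the sketch below. It is an illustration and not the course's code: findLeastLoaded, the optional ostream parameter, and the message printed for an empty map are all assumptions. The two summary lines follow the format that myth-buster-sequential prints in its sample runs.

```cpp
#include <algorithm>
#include <cassert>
#include <iostream>
#include <map>
#include <sstream>
#include <utility>
using namespace std;

// Hypothetical helper: returns {myth number, process count} for the entry with
// the smallest count. Ties go to the lowest myth number, since min_element
// returns the first minimum and a map iterates in ascending key order.
pair<int, int> findLeastLoaded(const map<int, int>& processCountMap) {
  assert(!processCountMap.empty()); // precondition: at least one myth responded
  auto least = min_element(processCountMap.begin(), processCountMap.end(),
                           [](const pair<const int, int>& a, const pair<const int, int>& b) {
                             return a.second < b.second;
                           });
  return {least->first, least->second};
}

// Prints the two summary lines. The `out` parameter is an assumption that
// makes the function easy to test; it defaults to cout.
void publishLeastLoadedMachineInfo(const map<int, int>& processCountMap, ostream& out = cout) {
  if (processCountMap.empty()) {
    out << "No myth machines responded." << endl; // assumed wording
    return;
  }
  pair<int, int> least = findLeastLoaded(processCountMap);
  out << "Machine least loaded by CS110 students: myth" << least.first << endl;
  out << "Number of CS110 processes on least loaded machine: " << least.second << endl;
}
```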

Multithreading and Networking

  • The networking details are hidden and packaged in a library routine with this prototype:
int getNumProcesses(int num, const unordered_set<std::string>& sunetIDs);
  • num is the myth number (e.g. 54 for myth54) and sunetIDs is a hash set housing the SUNet IDs of all students currently enrolled in CS110 (according to our /usr/class/cs110/repos/assign4 directory).
  • Here is the sequential implementation of compileCS110ProcessCountMap, which is very brute force and CS106B-ish:
static const int kMinMythMachine = 51;
static const int kMaxMythMachine = 66;
static void compileCS110ProcessCountMap(const unordered_set<string>& sunetIDs,
                                        map<int, int>& processCountMap) {
  for (int num = kMinMythMachine; num <= kMaxMythMachine; num++) {
    int numProcesses = getNumProcesses(num, sunetIDs);
    if (numProcesses >= 0) {
      processCountMap[num] = numProcesses;
      cout << "myth" << num << " has this many CS110-student processes: " << numProcesses << endl;
    }
  }
}

Multithreading and Networking

  • Here are two sample runs of myth-buster-sequential, which polls each of the myths in sequence (i.e. without concurrency).
  • Each call to getNumProcesses is relatively slow, at a quarter of a second or so, and 16 sequential calls take about 16 times that, or roughly 4 seconds. The two runs below took about 4.7 and 4.0 seconds.
poohbear@myth55:$ date
Wed 05 May 2021 12:23:54 PM PDT
poohbear@myth55:$ time ./myth-buster-sequential
myth51 has this many CS110-student processes: 195
myth52 has this many CS110-student processes: 110
myth53 has this many CS110-student processes: 209
myth54 has this many CS110-student processes: 168
myth55 has this many CS110-student processes: 161
myth56 has this many CS110-student processes: 123
myth57 has this many CS110-student processes: 134
myth58 has this many CS110-student processes: 90
myth59 has this many CS110-student processes: 112
myth60 has this many CS110-student processes: 92
myth61 has this many CS110-student processes: 295
myth62 has this many CS110-student processes: 140
myth63 has this many CS110-student processes: 334
myth64 has this many CS110-student processes: 101
myth65 has this many CS110-student processes: 138
myth66 has this many CS110-student processes: 1413
Machine least loaded by CS110 students: myth58
Number of CS110 processes on least loaded machine: 90

real	0m4.694s
user	0m0.312s
sys	0m0.139s
poohbear@myth55:$
poohbear@myth55:$ date
Wed Oct 21 17:34:19 PDT 2020
poohbear@myth55:$ time ./myth-buster-sequential
myth51 has this many CS110-student processes: 200
myth52 has this many CS110-student processes: 110
myth53 has this many CS110-student processes: 215
myth54 has this many CS110-student processes: 165
myth55 has this many CS110-student processes: 164
myth56 has this many CS110-student processes: 116
myth57 has this many CS110-student processes: 134
myth58 has this many CS110-student processes: 81
myth59 has this many CS110-student processes: 107
myth60 has this many CS110-student processes: 86
myth61 has this many CS110-student processes: 296
myth62 has this many CS110-student processes: 140
myth63 has this many CS110-student processes: 334
myth64 has this many CS110-student processes: 101
myth65 has this many CS110-student processes: 138
myth66 has this many CS110-student processes: 1415
Machine least loaded by CS110 students: myth58
Number of CS110 processes on least loaded machine: 81

real	0m3.959s
user	0m0.305s
sys	0m0.187s
poohbear@myth55:$
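
The timing above follows from simple arithmetic: 16 calls at roughly 0.25 seconds each come to about 4 seconds when nothing overlaps. The sketch below reproduces that effect at a smaller scale. fakeGetNumProcesses and pollSequentially are hypothetical stand-ins: the fake routine only sleeps to mimic network latency and returns a made-up count.

```cpp
#include <chrono>
#include <map>
#include <thread>
using namespace std;

// Stand-in for getNumProcesses: sleeps to mimic network latency, then returns
// a made-up count. The real routine contacts the myth over the network.
int fakeGetNumProcesses(int num, chrono::milliseconds latency) {
  this_thread::sleep_for(latency);
  return num * 2; // arbitrary placeholder value
}

// Polls myths [first, last] one after another and returns the elapsed time.
chrono::milliseconds pollSequentially(int first, int last, chrono::milliseconds latency,
                                      map<int, int>& processCountMap) {
  auto start = chrono::steady_clock::now();
  for (int num = first; num <= last; num++) {
    processCountMap[num] = fakeGetNumProcesses(num, latency);
  }
  return chrono::duration_cast<chrono::milliseconds>(chrono::steady_clock::now() - start);
}
```

With a 20 ms latency and myths 51 through 66, the elapsed time cannot be much less than 16 × 20 ms = 320 ms, because each call waits out its full latency before the next one starts.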

Multithreading and Networking

  • Each call to getNumProcesses spends most of its time off the CPU, waiting for a network connection to be established.
  • Idea: poll each myth machine in its own thread of execution. By doing so, we'll overlap the dead times of the getNumProcesses calls, and the total execution time will plummet.
static void countCS110Processes(int num, const unordered_set<string>& sunetIDs,
                                map<int, int>& processCountMap, mutex& processCountMapLock, 
                                semaphore& permits) {
  permits.signal(on_thread_exit); // immediately schedule signal, ensures call no matter how we exit
  int count = getNumProcesses(num, sunetIDs);
  if (count >= 0) {
    lock_guard<mutex> lg(processCountMapLock);
    processCountMap[num] = count;
    cout << "myth" << num << " has this many CS110-student processes: " << count << endl;
  }
}

static void compileCS110ProcessCountMap(const unordered_set<string>& sunetIDs,
                                        map<int, int>& processCountMap) {  
  vector<thread> threads;
  mutex processCountMapLock;
  semaphore permits(8); // limit the number of threads to the number of CPUs
  for (int num = kMinMythMachine; num <= kMaxMythMachine; num++) {
    permits.wait();
    threads.push_back(thread(countCS110Processes, num, ref(sunetIDs),
                             ref(processCountMap), ref(processCountMapLock), ref(permits)));
  }
  for (thread& t: threads) t.join();
}
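
The code above relies on the course's semaphore class, which is not part of the C++ standard library. For readers who want to try the same pattern with only standard headers, here is a minimal sketch under stated assumptions. Semaphore, PermitReturner, fakeGetNumProcesses, countProcesses, and compileCounts are hypothetical names, and the two atomic counters exist only to observe how many workers run at once. Instead of signal(on_thread_exit), it uses an RAII object that returns the permit when the thread routine returns, which happens slightly earlier than true thread exit.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <map>
#include <mutex>
#include <thread>
#include <vector>
using namespace std;

// Minimal counting semaphore, a stand-in for the course's semaphore class.
class Semaphore {
 public:
  explicit Semaphore(int initial) : value(initial) {}
  void wait() {
    unique_lock<mutex> ul(m);
    cv.wait(ul, [this] { return value > 0; });
    value--;
  }
  void signal() {
    lock_guard<mutex> lg(m);
    value++;
    cv.notify_one();
  }
 private:
  int value;
  mutex m;
  condition_variable cv;
};

// Returns a permit when destroyed, so every exit path from the worker signals.
struct PermitReturner {
  explicit PermitReturner(Semaphore& s) : s(s) {}
  ~PermitReturner() { s.signal(); }
  Semaphore& s;
};

atomic<int> activeWorkers{0};    // instrumentation only
atomic<int> maxActiveWorkers{0}; // highest concurrency observed

// Stand-in for getNumProcesses: sleeps to mimic network latency.
int fakeGetNumProcesses(int num) {
  this_thread::sleep_for(chrono::milliseconds(20));
  return num * 2; // arbitrary placeholder value
}

void countProcesses(int num, map<int, int>& counts, mutex& countsLock, Semaphore& permits) {
  PermitReturner returner(permits); // destroyed last, after the bookkeeping below
  int active = ++activeWorkers;
  int seen = maxActiveWorkers.load();
  while (active > seen && !maxActiveWorkers.compare_exchange_weak(seen, active)) {}
  int count = fakeGetNumProcesses(num);
  {
    lock_guard<mutex> lg(countsLock); // the map is shared, so guard the update
    counts[num] = count;
  }
  --activeWorkers;
}

void compileCounts(int first, int last, int maxThreads, map<int, int>& counts) {
  vector<thread> threads;
  mutex countsLock;
  Semaphore permits(maxThreads);
  for (int num = first; num <= last; num++) {
    permits.wait(); // blocks while maxThreads workers hold permits
    threads.emplace_back(countProcesses, num, ref(counts), ref(countsLock), ref(permits));
  }
  for (thread& t : threads) t.join();
}
```

Calling compileCounts(51, 66, 8, counts) fills counts with 16 entries while never letting more than 8 workers hold a permit at once.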

Multithreading and Networking

  • Here are key observations about the code on the prior slide:
    • Polling the myths concurrently means updating processCountMap concurrently. That means we need a mutex to guard access to processCountMap.
    • The implementation of compileCS110ProcessCountMap wraps a thread around each call to getNumProcesses while introducing a semaphore to limit the number of threads to a reasonably small number.
    • Note we use an overloaded version of signal. This one accepts the on_thread_exit tag as its only argument.
      • Rather than signaling the semaphore right away, this second version schedules the signal method to be invoked after the entire thread routine has exited, just as the thread is being destroyed.
      • That's the correct time to really signal if you're using the semaphore to track the number of active threads.
    • This new version, called myth-buster-concurrent, has a runtime that varies between 0.3 and 0.7 seconds. That's a substantial improvement!
    • The full implementation of myth-buster-concurrent sits right here.
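
To make the deferred signal concrete, here is one way such behavior could be approximated in standard C++. This is a sketch and not how the course's semaphore is implemented. It relies on the fact that a thread_local object's destructor runs when its thread exits. Semaphore, ExitActions, signalOnThreadExit, and demoDeferredSignal are hypothetical names.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>
using namespace std;

// Minimal counting semaphore, a stand-in for the course's semaphore class.
class Semaphore {
 public:
  explicit Semaphore(int initial) : value(initial) {}
  void wait() {
    unique_lock<mutex> ul(m);
    cv.wait(ul, [this] { return value > 0; });
    value--;
  }
  bool tryWait() { // takes a permit only if one is available right now
    lock_guard<mutex> lg(m);
    if (value == 0) return false;
    value--;
    return true;
  }
  void signal() {
    lock_guard<mutex> lg(m);
    value++;
    cv.notify_one();
  }
 private:
  int value;
  mutex m;
  condition_variable cv;
};

// Runs queued callbacks from its destructor. Declared thread_local below, so
// the destructor fires once per thread, as that thread exits.
struct ExitActions {
  vector<function<void()>> actions;
  ~ExitActions() { for (auto& action : actions) action(); }
};

// Hypothetical helper in the spirit of permits.signal(on_thread_exit): defers
// the signal until the calling thread exits. The semaphore must outlive the thread.
void signalOnThreadExit(Semaphore& s) {
  thread_local ExitActions exitActions;
  exitActions.actions.push_back([&s] { s.signal(); });
}

// Returns true if the permit was still unavailable inside the worker after
// scheduling, and available once the worker had been joined.
bool demoDeferredSignal() {
  Semaphore permits(0);
  bool availableInsideWorker = true;
  thread worker([&] {
    signalOnThreadExit(permits);
    availableInsideWorker = permits.tryWait(); // signal has not happened yet
  });
  worker.join();
  bool availableAfterJoin = permits.tryWait(); // signal happened at thread exit
  return !availableInsideWorker && availableAfterJoin;
}
```

Scheduling the signal as the first statement of the thread routine, as countCS110Processes does, means the permit comes back no matter which path the routine takes to return.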
