CS110: Principles of Computer Systems

Autumn 2021
Jerry Cain


  • There is a reason our server from last lecture is called expensive-server: it spends a lot of time busy waiting, particularly when there are no incoming connections to serve. If we run the server as is for one second, we get a continuous stream of cout statements like this:
myth57:$ more demo-the-waste
#!/bin/bash

./expensive-server &
pid=$!
sleep 1
kill $pid

myth57:$ ./demo-the-waste
Static file server listening on port 12345.
Num calls to accept: 1.
Num calls to accept: 2.
Num calls to accept: 3.
Num calls to accept: 4.
Num calls to accept: 5.
Num calls to accept: 6.
... lots of lines removed
Num calls to accept: 348319.
Num calls to accept: 348320.
Num calls to accept: 348321.
Num calls to accept: 348322.
Num calls to accept: 348323.
Num calls to accept: 348324.
myth57:$
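  • For context, expensive-server's core loop looks something like the following sketch (a reconstruction, not the actual CS110 source): the listening socket is nonblocking, so accept returns -1 immediately whenever no client is waiting, and the loop spins at full speed.
// spin-server.cc: hypothetical sketch of expensive-server's core loop
// (a reconstruction; the actual CS110 source differs in its details)
#include <iostream>      // for cout, endl
#include <sys/socket.h>  // for accept
#include <unistd.h>      // for close
using namespace std;

// assumes server is a nonblocking listening socket descriptor
static void spinAcceptingConnections(int server) {
  size_t numCalls = 0;
  while (true) {
    // on a nonblocking socket, accept returns -1 right away when no
    // client is waiting, so this loop consumes an entire CPU core
    int client = accept(server, NULL, NULL);
    cout << "Num calls to accept: " << ++numCalls << "." << endl;
    if (client == -1) continue; // nothing to accept: spin and try again
    close(client); // the real server would serve the requested file here
  }
}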
  • Although we are successful in running a responsive server in a single thread, we don't like the idea of a spinning process, because it wastes resources that the operating system could dedicate to other processes (and it makes the fans on our computer sound like jet engines).
  • In truth, we do want to put our process to sleep unless:
    • we have an incoming connection, or
    • we have to send data on an already-open connection.
  • So, we need to get the OS involved, and we will now take a look at how we can do this.

Lecture 26: Nonblocking I/O, Event-Driven Programming

  • Linux has a scalable I/O event notification mechanism called epoll that can monitor a set of file descriptors to see whether there is any I/O ready for them. Three system calls, described below, make up the epoll API.
  • int epoll_create1(int flags);
    • This function creates an epoll object and returns a file descriptor referring to it. The only valid flag is EPOLL_CLOEXEC, which, as you might expect, marks the descriptor to be closed automatically across a call to exec.
  • int epoll_ctl(int epfd, int op, int fd,
                  struct epoll_event *event);
    • This function configures which descriptors are watched by the object, and op can be EPOLL_CTL_ADD, EPOLL_CTL_MOD, or EPOLL_CTL_DEL. We will investigate struct epoll_event below.
  • int epoll_wait(int epfd, struct epoll_event *events, 
                   int maxevents, int timeout);
    • This function waits until one or more of the monitored descriptors has an event ready, or until timeout milliseconds elapse (a timeout of -1 means wait indefinitely). It returns the number of ready events, up to maxevents at once, and populates the events array with each event that has occurred (a minimal end-to-end example follows).
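  • Before diving into the server, here is a minimal, self-contained sketch (not from the lecture) that wires all three calls together by watching standard input:
// epoll-stdin.cc: minimal epoll demo (a sketch, not lecture code)
#include <iostream>      // for cout, endl
#include <sys/epoll.h>   // for epoll_create1, epoll_ctl, epoll_wait
#include <unistd.h>      // for STDIN_FILENO, close
using namespace std;

int main(int argc, char **argv) {
  int epfd = epoll_create1(0); // a new, empty watch set
  struct epoll_event info = {.events = EPOLLIN, .data = {.fd = STDIN_FILENO}};
  epoll_ctl(epfd, EPOLL_CTL_ADD, STDIN_FILENO, &info); // watch stdin for input
  struct epoll_event events[1];
  int numEvents = epoll_wait(epfd, events, 1, 5000); // sleep up to 5 seconds
  if (numEvents == 0) cout << "Timed out: nothing to read." << endl;
  else cout << "fd " << events[0].data.fd << " is ready for reading." << endl;
  close(epfd);
  return 0;
}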

  • The struct epoll_event is defined as follows:
struct epoll_event {
  uint32_t events;
  epoll_data_t data;
};
  • epoll_data_t is a typedef'd union, defined as follows:
typedef union epoll_data {
  void *ptr;
  int fd;
  uint32_t u32;
  uint64_t u64;
} epoll_data_t;
  • A union is a data structure that can hold a value of any one of its member types at a time, and it stores that value in a single memory location. The size of the union is therefore that of its largest member (a quick demo follows this list).
  • The events member is a bit mask, and for our purposes, we care about three values:
    • EPOLLIN: the file is available for reading
    • EPOLLOUT: the file is available for writing
    • EPOLLET: sets the file descriptor to be "edge triggered", meaning that events are delivered only when there is a change on the descriptor (e.g., new data arrives)
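  • To see the space sharing for yourself, here is a quick sketch (not from the lecture): writing one member of the union overwrites the others, and the whole union is only as large as its largest member, which is 8 bytes on a 64-bit machine (the size of ptr and u64).
// union-demo.cc: a quick sketch (not lecture code) of union sizing
#include <iostream>      // for cout, endl
#include <sys/epoll.h>   // for epoll_data_t
using namespace std;

int main(int argc, char **argv) {
  // fd and u32 are 4 bytes, ptr and u64 are 8, but all four members
  // share one memory location, so the union itself occupies 8 bytes
  cout << "sizeof(epoll_data_t) = " << sizeof(epoll_data_t) << endl;
  epoll_data_t data;
  data.fd = 5; // storing one member clobbers whatever was there before
  cout << "data.fd = " << data.fd << endl;
  return 0;
}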
  • The efficient server we will set up uses epoll to call functions when file descriptors are able to input or output data.
  • Let's start with main:
static const unsigned short kDefaultPort = 33333;
int main(int argc, char **argv) {
  int server = createServerSocket(kDefaultPort);
  if (server == kServerSocketFailure) {
    cerr << "Failed to start server.  Port " << kDefaultPort << " is probably already in use." << endl;
    return 1;
  }

  cout << "Server listening on port " << kDefaultPort << endl;
  runServer(server);
  return 0;
}
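  • createServerSocket is a helper from the course library, and its implementation isn't shown on these slides. A plausible sketch (an assumption based on the socket/bind/listen sequence from earlier lectures; the backlog value here is arbitrary) might look like this:
// A hypothetical createServerSocket (the real one lives in the CS110 library)
#include <cstring>       // for memset
#include <netinet/in.h>  // for sockaddr_in, htonl, htons, INADDR_ANY
#include <sys/socket.h>  // for socket, bind, listen, setsockopt
#include <unistd.h>      // for close

static const int kServerSocketFailure = -1;
static int createServerSocket(unsigned short port) {
  int server = socket(AF_INET, SOCK_STREAM, 0); // a TCP socket
  if (server == -1) return kServerSocketFailure;
  int one = 1; // allow quick server restarts on the same port
  setsockopt(server, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));
  struct sockaddr_in address;
  memset(&address, 0, sizeof(address));
  address.sin_family = AF_INET;
  address.sin_addr.s_addr = htonl(INADDR_ANY); // accept on any interface
  address.sin_port = htons(port);              // port in network byte order
  if (bind(server, (struct sockaddr *) &address, sizeof(address)) == -1 ||
      listen(server, /* backlog = */ 128) == -1) {
    close(server);
    return kServerSocketFailure;
  }
  return server;
}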

  • main simply sets up a server, and then calls the runServer function, which we will look at next.
  • The runServer function first converts the server socket to be nonblocking (a sketch of that helper follows the snippet), and sets up the epoll watch set around the socket:
static void runServer(int server) {
  setAsNonBlocking(server);
  int ws = buildInitialWatchSet(server);
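  • setAsNonBlocking is another course helper not shown on the slides. A sketch of the standard idiom it presumably uses (an assumption) appears below: fcntl fetches the descriptor's flags and sets them again with O_NONBLOCK added, after which accept, read, and write on the descriptor return -1 with errno set to EWOULDBLOCK instead of blocking.
// A hypothetical setAsNonBlocking (the real one is a CS110 library helper)
#include <fcntl.h>  // for fcntl, F_GETFL, F_SETFL, O_NONBLOCK

static void setAsNonBlocking(int descriptor) {
  int flags = fcntl(descriptor, F_GETFL, 0);      // current descriptor flags
  if (flags == -1) flags = 0;                     // fall back to O_NONBLOCK alone
  fcntl(descriptor, F_SETFL, flags | O_NONBLOCK); // add the nonblocking bit
}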

  • Let's jump to the buildInitialWatchSet function:
static const int kMaxEvents = 64;
static int buildInitialWatchSet(int server) {
  int ws = epoll_create1(0);
  struct epoll_event info = {.events = EPOLLIN | EPOLLET, .data = {.fd = server}};
  epoll_ctl(ws, EPOLL_CTL_ADD, server, &info);
  return ws;
}
  • This function creates an epoll watch set around the supplied server socket. We register an event to show our interest in being notified, via EPOLLIN, whenever the server socket is available for accept and read operations, and we also note that the notifications should be edge triggered (EPOLLET). Edge triggered means we'd only like to be notified on a transition: a call to accept or read that would have returned -1 before will now return something nonnegative.
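  • For contrast, here is what a level-triggered registration would look like (a sketch, not lecture code): without EPOLLET, epoll_wait reports the descriptor as ready on every call for as long as unaccepted connections remain queued, rather than only on the transition from not-ready to ready.
// Hypothetical level-triggered variant, shown only for contrast
#include <sys/epoll.h>  // for epoll_create1, epoll_ctl, struct epoll_event

static int buildLevelTriggeredWatchSet(int server) {
  int ws = epoll_create1(0);
  // no EPOLLET: readiness is re-reported on every epoll_wait call until
  // the pending connections are fully accepted
  struct epoll_event info = {.events = EPOLLIN, .data = {.fd = server}};
  epoll_ctl(ws, EPOLL_CTL_ADD, server, &info);
  return ws;
}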

  • Continuing where we left off with runServer, the function next creates an array of struct epoll_event objects to hold the events we may encounter.
  • Then it sets up a while (true) loop around the only blocking system call in the server, epoll_wait().
  • We never want to time out on the call, and when nothing interesting is happening with our watch set, our process is put to sleep in a similar fashion to the waitpid calls we have seen previously in class.
  • Multiple events can trigger at the same time, and we get a count (numEvents) of the number of events put into the events array.
  • When one or more of the file descriptors in our watch set trigger, we handle the events in the events array, one at a time. For our server, there are three possibilities:
    • If the event is a connection request, events[i].data.fd will be the server's file descriptor, and we accept new connections (we will look at that function shortly).
    • If the event indicates that a descriptor has incoming data (EPOLLIN), we consume the data in the request.
    • If the event indicates that a descriptor is ready for outgoing data (EPOLLOUT), we publish our response to that file descriptor.
  struct epoll_event events[kMaxEvents];
  while (true) {
    int numEvents = epoll_wait(ws, events, kMaxEvents, /* timeout = */ -1);
    for (int i = 0; i < numEvents; i++) {
      if (events[i].data.fd == server) {
        acceptNewConnections(ws, server);
      } else if (events[i].events & EPOLLIN) { // we're still reading the client's request
        consumeAvailableData(ws, events[i].data.fd);
      } else { // events[i].events & EPOLLOUT
        publishResponse(events[i].data.fd);
      }
    }
  }
}
  • Let's look at the acceptNewConnections function next.
  • We may have multiple connections that have come in at once, so we need to accept all of them. Therefore, we have a while(true) loop that continues until there are no more connections to be made:

static void acceptNewConnections(int ws, int server) {
  while (true) {
    int clientSocket = accept4(server, NULL, NULL, SOCK_NONBLOCK);
    if (clientSocket == -1) return;
    struct epoll_event info = {.events = EPOLLIN | EPOLLET, .data = {.fd = clientSocket}};
    epoll_ctl(ws, EPOLL_CTL_ADD, clientSocket, &info);
  }
}
  • When we make a connection, we use the epoll_ctl system call to add the new client socket to our epoll watch list, with a request to monitor it for input (again, as an edge-triggered event).
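  • One subtlety: the code above treats any -1 from accept4 as "no more pending connections." A slightly more defensive variant (a sketch, not the lecture code) would inspect errno to distinguish the expected EWOULDBLOCK from a genuine failure:
// Hypothetical errno-checking variant of acceptNewConnections
#include <cerrno>        // for errno, EWOULDBLOCK, EAGAIN, EINTR
#include <sys/epoll.h>   // for epoll_ctl, struct epoll_event, EPOLLIN, EPOLLET
#include <sys/socket.h>  // for accept4, SOCK_NONBLOCK

static void acceptNewConnectionsChecked(int ws, int server) {
  while (true) {
    int clientSocket = accept4(server, NULL, NULL, SOCK_NONBLOCK);
    if (clientSocket == -1) {
      if (errno == EWOULDBLOCK || errno == EAGAIN) return; // queue drained: done
      if (errno == EINTR) continue; // interrupted by a signal: just retry
      return; // a genuine error; production code would log or report it
    }
    struct epoll_event info = {.events = EPOLLIN | EPOLLET, .data = {.fd = clientSocket}};
    epoll_ctl(ws, EPOLL_CTL_ADD, clientSocket, &info);
  }
}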
  • We have two more functions to look at for our server: consumeAvailableData and publishResponse. The first is more complicated, but also happens first, so let's look at it now.
  • The consumeAvailableData function attempts to read in as much data as it can from the client, until either there isn't data to be read (meaning we have to read the rest later), or until we get enough information in the header to respond. The second condition is met when we receive the blank line that ends a request header, i.e., two consecutive CRLF sequences, "\r\n\r\n":

static const size_t kBufferSize = 1; // deliberately tiny: reads arrive one byte at a time
static const string kRequestHeaderEnding("\r\n\r\n");
static void consumeAvailableData(int ws, int client) {
  static map<int, string> requests; // tracks what's been read in thus far over each client socket
  size_t pos = string::npos;
  while (pos == string::npos) {
    char buffer[kBufferSize];
    ssize_t count = read(client, buffer, kBufferSize);
    if (count == -1 && errno == EWOULDBLOCK) return; // not done reading everything yet, so return
    if (count <= 0) { close(client); break; } // passes? then bail on connection, as it's borked
    requests[client] += string(buffer, buffer + count);
    pos = requests[client].find(kRequestHeaderEnding);
    if (pos == string::npos) continue;
    cout << "Num Active Connections: " << requests.size() << endl;
    cout << requests[client].substr(0, pos + kRequestHeaderEnding.size()) << flush;
    struct epoll_event info = {.events = EPOLLOUT | EPOLLET, .data = {.fd = client}};
    epoll_ctl(ws, EPOLL_CTL_MOD, client, &info); // MOD == modify existing event
  }

  requests.erase(client);
}
  • Notice the static map<> variable inside the function. This map persists across all calls to the function, and tracks the data we have read so far, per client.
  • If data is still available but we have not yet seen the header ending, we keep reading (because the data is there to be read).
  • Once we receive the header ending, we log how many active connections we have, and we also print out the header we've received.
  • Next, we modify our epoll watch event (via EPOLL_CTL_MOD) so that it triggers when data can be written to the client; this is what will prompt us to publish our response.
  • Notice that we don't break out of the while loop as soon as some data arrives! Because our notifications are edge triggered, we must keep reading until we've consumed all of the available data; otherwise, epoll_wait will not trigger again, since there is still data waiting for us (e.g., the rest of the request). We leave the loop either when the full header ending has arrived, or when there is no more data at all, in which case we also close the connection.
  • Also notice, above, that we return when a read would block: we don't close the connection, and we don't erase the client's entry in our requests map. Recall that as a static variable, the map persists across calls, as does the client's entry within it.
  • Once we exit the loop because there is no more data, we erase the client's entry in our requests map, because it is no longer needed.
  • Finally, let's turn our attention to publishResponse.
  • Our response needs to be a proper HTTP response, and we supplement it with the HTML for the page we will push to the client:

static const string kResponseString("HTTP/1.1 200 OK\r\n\r\n"
        "<b>Thank you for your request! We're working on it! No, really!</b><br/>"
        "<br/><img src=\"http://vignette3.wikia.nocookie.net/p__/images/e/e0/"
        "Agnes_Unicorn.png/revision/latest?cb=20160221214120&path-prefix=protagonist\"/>");
static void publishResponse(int client) {
  static map<int, size_t> responses;
  responses[client]; // insert a 0 if key isn't present
  while (responses[client] < kResponseString.size()) {
    ssize_t count = write(client, kResponseString.c_str() + responses[client], 
                          kResponseString.size() - responses[client]);
    if (count == -1 && errno == EWOULDBLOCK) return;
    if (count == -1) break;
    assert(count > 0);
    responses[client] += count;
  }

  responses.erase(client);
  close(client);
}
  • As we saw in consumeAvailableData, we have a static map, this time mapping each client's file descriptor to the number of bytes of the response we have sent so far. Remember, no blocking allowed!
  • We attempt to write all of the data in the response, but if we can't, we don't block and we return, knowing that the responses map will persist until the next time we call the function to push data. We erase the entry from the map and close the connection once we have sent all the data for the response (which may be after multiple calls to the function).
  • As we have seen over the past few lectures, there are many ways to build a server.
  • The best approach depends on your goals and your system setup (OS, hardware, network, etc.), but there are two key things that your servers should prioritize:
    • Accept and respond to as many connections as they can
    • Respond quickly to client requests
  • Both of these issues can be managed using threads or processes, or by using nonblocking I/O and OS-savvy waiting. You should probably not write a server that busy-waits (and you should almost always avoid busy-waiting in general).
  • Whatever you decide, you do need to understand some lower-level ideas, such as those we covered in class this quarter.
  • Networking comes with a number of challenges, particularly when you prioritize as above, and there are many more details than we've covered here. For those details (and even lower-level concerns), consider taking CS 144.
