Autumn 2021
Jerry Cain
Our time-client relies on createClientSocket. For now, view it as a built-in that sets up a bidirectional pipe between a client and a server running on the specified host (e.g. myth64) and bound to the specified port number (e.g. 12345).
int main(int argc, char *argv[]) {
  int clientSocket = createClientSocket("myth64.stanford.edu", 12345);
  assert(clientSocket >= 0);
  sockbuf sb(clientSocket);
  iosockstream ss(&sb);
  string timeline;
  getline(ss, timeline);
  cout << timeline << endl;
  return 0;
}
We have been talking about how to open connections between two machines. These two machines are typically a server and a client.
Formally, a protocol is a specification dictating how two computers should converse. By respecting a protocol, both client and server know they'll understand each other, even if they are running different software written by different people in different programming languages, running in different environments on varying architectures.
Network protocols are generally codified in Requests For Comments, or RFCs.
The specification for HTTP/1.1, for instance, is right here. It’s long and reads like legalese, but if you implement a client program that respects the HTTP/1.1 protocol, the client application can interact with any server that speaks HTTP/1.1.
The HTTP protocol is pretty much universal in the networking world. Since so many programs speak and understand it, it’s a common protocol for exchanging information and executing commands over network connections.
Once a client establishes a connection to a server, the client sends a request, and then the server sends a response. The client and server can go back and forth several times, issuing several requests and responses over the same connection before closing it.
To keep things simple for lecture, we'll only look at examples where the client sends a single request and ingests a single response from the server.
An HTTP request (sent by a client to a server) looks something like the following:
GET /search?q=cats&tbm=isch HTTP/1.1
Host: www.google.com
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:61.0) Gecko/20100101 Firefox/61.0
Accept-Language: en-US,en;q=0.5
The first line is called the start line or request line.
The first part of the request line is a verb. HTTP supports many verbs as part of the language, but you should know the following:
GET: requests some information from the server
POST: uploads information to the server with the expectation that the server stores it or something related to it
Following the request line are zero or more lines of request headers.
Headers are key/value pairs, written as Key: Value, with each pair on a separate line. There are many standard headers, although a program can add any extra, non-standard headers if it likes. In the example above, my browser is confessing that it's Firefox (through the User-Agent header) and that, if possible, any content should be expressed in American English (through the Accept-Language header).
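To make the Key: Value format concrete, here is a minimal sketch of how one header line might be split into its two halves. The names trim and parseHeaderLine are hypothetical helpers, not part of the lecture code, and the sketch assumes the line has already been pulled off the stream with getline (so a trailing '\r' may still be attached).

```cpp
#include <string>
#include <utility>
using namespace std;

// Hypothetical helper: strips spaces, tabs, and '\r' from both ends of s.
static string trim(const string& s) {
  const char *whitespace = " \t\r";
  size_t start = s.find_first_not_of(whitespace);
  if (start == string::npos) return "";
  size_t end = s.find_last_not_of(whitespace);
  return s.substr(start, end - start + 1);
}

// Hypothetical helper: splits "Key: Value" at the first colon.
// Returns false (leaving header untouched) if the line has no colon.
static bool parseHeaderLine(const string& line, pair<string, string>& header) {
  size_t colon = line.find(':');
  if (colon == string::npos) return false;
  header.first = trim(line.substr(0, colon));
  header.second = trim(line.substr(colon + 1)); // the value may itself contain colons
  return true;
}
```

Splitting at the first colon rather than the last matters because values such as myth64:12345 can contain colons of their own.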
If the client needs to include a payload with its request (e.g. to upload a new profile picture for your LinkedIn profile), the client can inline the payload after the blank line that ends the headers.
This typically only happens with POST requests, although GET requests are technically permitted to upload a payload as well.
One example? A request to log into a website might look something like this:
POST /login HTTP/1.1
Host: myinsecurebank.com
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:61.0) Gecko/20100101 Firefox/61.0
Accept-Language: en-US,en;q=0.5
Content-Length: 29

username=poohbear&password=pw
Another? A request to upload an image might look like this:
POST /uploadpp HTTP/1.1
Host: www.clubcardinal.com
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:61.0) Gecko/20100101 Firefox/61.0
Accept-Language: en-US,en;q=0.5
Content-Length: 3829102

{file: {name: "jerry-and-doris.png", data: "LyoqCiAqIGludC5jYwogKiAtLS0tLS0KICogU2hvcnQgcHJvZ3JhbSB1c2VkIHRvIHRlc3Qgc3RzaC4gaW50IHNwaW5zIGZvciA8bj4KICog
c2Vjb25kcyBpbiBvbmUtc2Vjb25kIGJ1cnN0cywgYW5kIHRoZW4gc2VuZHMgaXRzZWxmIGEKICogU0lHSU5ULgogKi8KI2luY2x1ZGUgPGlvc3RyZWFtPiAgICAgICAvLyBmb3IgY2VycgojaW5jbHVk
Z2V0cGlkCiNpbmNsdWRlIDxzeXMvd2FpdC5oPiAgICAgLy8gZm9yIHJhaXNlLCBTSUdJTlQKdXNpbmcgbmFtZXNwYWNlIHN0ZDsKCnN0YXRpYyBjb25zdCBpbnQga1dyb25nQXJndW1lbnRDb3VudCA9
IDE7CnN0YXRpYyBjb25zdCBpbnQga1JhaXNlRmFpbGVkID0gMjsKaW50IG1haW4oaW50IGFyZ2MsIGNoYXIgKmFyZ3ZbXSkgewogIGlmIChhcmdjICE9IDIpIHsKICAgIGNlcnIgPDwgIlVzYWdlOiAi
IDw8IGFyZ3ZbMF0gPDwgIiA8bj4iIDw8IGVuZGw7CiAgICByZXR1cm4ga1dyb25nQXJndW1lbnRDb3VudDsKICB9CgogIHNpemVfdCBzZWNzID0gYXRvaShhcmd2WzFdKTsKICBmb3IgKHNpemVfdCBp
ID0gMDsgaSA8IHNlY3M7IGkrKykgc2xlZXAoMSk7CgogIGlmIChyYWlzZShTSUdJTlQpICE9IDApIHsKICAgIGNlcnIgPDwgIlByb2JsZW0gaW50ZXJydXB0aW5nIHByb2Nlc3MgIiA8PCBnZXRwaWQoKSA8PCAiLiIgPDwgZW5kbDsKICAgIHJldHVybiBrUmFpc2VGYWlsZWQ7CiAgfQoKICByZXR1cm4g
// lots and lots of base64-encoded bytes
MDsKfQo="}}
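The two POST examples above share one shape: request line, headers, a blank line, then exactly Content-Length bytes of payload. Here is a rough sketch of how a client might assemble such a request. buildPostRequest is a hypothetical name introduced for illustration, and the sketch sends only the Host and Content-Length headers.

```cpp
#include <sstream>
#include <string>
using namespace std;

// Hypothetical helper: assembles a POST request whose Content-Length
// header is computed from the payload rather than typed in by hand.
static string buildPostRequest(const string& host, const string& path, const string& payload) {
  ostringstream request;
  request << "POST " << path << " HTTP/1.1\r\n";
  request << "Host: " << host << "\r\n";
  request << "Content-Length: " << payload.size() << "\r\n";
  request << "\r\n";   // blank line: headers end here
  request << payload;  // payload bytes follow verbatim, with no trailing "\r\n"
  return request.str();
}
```

Calling buildPostRequest("myinsecurebank.com", "/login", "username=poohbear&password=pw") yields a Content-Length of 29, matching the login example.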
An HTTP response (sent by a server back to a client) looks something like the following:
HTTP/1.1 200 OK
Date: Fri, 05 Nov 2021 00:39:59 GMT
Server: Apache
Accept-Ranges: bytes
Transfer-Encoding: chunked
Content-Type: text/html

3f6
<html lang="">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width,initial-scale=1,shrink-to-fit=no" />
<title>CS110: Principles of Computer Systems, Autumn 2021</title>
<base href="https://web.stanford.edu/class/archive/cs/cs110/cs110.1222/">
<link rel="stylesheet" href="assets/vendors/bootstrap-4.0.0-beta.2.min.css" />
<link rel="stylesheet" href="assets/styles.css" type="text/css" />
<script src="assets/cs110-code.js" async="async"></script>
<script src="https://kit.fontawesome.com/ded873da0c.js"></script>
An HTTP response is structured as follows:
The start line specifies the HTTP version that the server can speak, as well as an HTTP status code. I’m sure you’ve seen status code 404 Not Found, and maybe even 403 Forbidden and 500 Internal Server Error.
Response headers follow—e.g. Content-Type indicates that we’ve been sent an HTML page. Other common headers include Set-Cookie, used to send browser cookies, and Cache-Control, which tells the browser whether it can cache the page or not.
A blank line indicates the end of the headers, and then the server sends the payload.
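As a small illustration of the start line's three parts, here is a sketch that pulls the version, status code, and reason phrase out of a line like HTTP/1.1 404 Not Found. StatusLine and parseStatusLine are hypothetical names, and the sketch assumes the line arrived via getline, so a trailing '\r' may still be present.

```cpp
#include <sstream>
#include <string>
using namespace std;

// Hypothetical record for the three parts of a response start line.
struct StatusLine {
  string version;  // e.g. "HTTP/1.1"
  int code;        // e.g. 404
  string reason;   // e.g. "Not Found"
};

// Hypothetical helper: returns false if the line lacks a version or a numeric code.
static bool parseStatusLine(const string& line, StatusLine& status) {
  istringstream iss(line);
  if (!(iss >> status.version >> status.code)) return false;
  string rest;
  getline(iss, rest); // whatever follows the code, e.g. " Not Found\r"
  size_t start = rest.find_first_not_of(" \r");
  if (start == string::npos) {
    status.reason = ""; // a reason phrase is optional
  } else {
    size_t end = rest.find_last_not_of(" \r");
    status.reason = rest.substr(start, end - start + 1);
  }
  return true;
}
```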
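Emulation of wget
To see HTTP in action, web-get splits the URL on its command line into a host and a path, then pulls the content.
static const string kProtocolPrefix = "http://";
static const string kDefaultPath = "/";
static pair<string, string> parseURL(string url) {
  // strip the leading "http://" if it's there
  if (startsWith(url, kProtocolPrefix))
    url = url.substr(kProtocolPrefix.size());
  // everything before the first '/' is the host, everything from it onward is the path
  size_t found = url.find('/');
  if (found == string::npos)
    return make_pair(url, kDefaultPath);
  string host = url.substr(0, found);
  string path = url.substr(found);
  return make_pair(host, path);
}

int main(int argc, char *argv[]) {
  pullContent(parseURL(argv[1])); // assumes a URL was supplied as the first argument
  return 0;
}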
Emulation of wget (continued)
pullContent, of course, needs to manage everything, including the networking.
static const unsigned short kDefaultHTTPPort = 80;
static void pullContent(const pair<string, string>& components) {
  int client = createClientSocket(components.first, kDefaultHTTPPort);
  if (client == kClientSocketError) {
    cerr << "Could not connect to host named \"" << components.first << "\"." << endl;
    return;
  }
  sockbuf sb(client);
  iosockstream ss(&sb);
  issueRequest(ss, components.first, components.second);
  skipToPayload(ss);
  savePayload(ss, getFileName(components.second));
}
We've already used the createClientSocket function for our time-client. This time, we're connecting to real but arbitrary web servers that speak HTTP.
issueRequest, skipToPayload, and savePayload subdivide the client-server conversation into manageable chunks.
Here's issueRequest, which generates the smallest legitimate HTTP request possible and sends it over to the server.
static void issueRequest(iosockstream& ss, const string& host, const string& path) {
  ss << "GET " << path << " HTTP/1.0\r\n";
  ss << "Host: " << host << "\r\n";
  ss << "\r\n";
  ss.flush();
}
Each line of the request, including the blank line that ends it, is terminated by '\r' followed by '\n'.
The flush is necessary to ensure all character data is pressed over the wire and consumable at the other end.
After the flush, the client transitions from supply to ingest mode. Remember, the iosockstream is read/write, because the socket descriptor backing it is bidirectional.
skipToPayload reads through and discards all of the HTTP response header lines until it encounters either a blank line or one that contains nothing other than a '\r'.
The blank line is, indeed, supposed to be "\r\n", but some servers (often hand-rolled ones) are sloppy, so we treat the '\r' as optional. Recall that getline chews up the '\n', but it won't chew up the '\r'.
static void skipToPayload(iosockstream& ss) {
  string line;
  do {
    getline(ss, line);
  } while (!line.empty() && line != "\r");
}
A full HTTP client, like the wget we're imitating, would ingest all of the lines of the response header into a data structure and allow it to influence its execution.
static string getFileName(const string& path) {
  if (path.empty() || path[path.size() - 1] == '/') return "index.html";
  size_t found = path.rfind('/');
  return path.substr(found + 1);
}

static void savePayload(iosockstream& ss, const string& filename) {
  ofstream output(filename, ios::binary); // don't assume it's text
  size_t totalBytes = 0;
  while (!ss.fail()) {
    char buffer[2014] = {'\0'};
    ss.read(buffer, sizeof(buffer));
    totalBytes += ss.gcount();
    output.write(buffer, ss.gcount());
  }
  cout << "Total number of bytes fetched: " << totalBytes << endl;
}
Once the headers are skipped, we read from the socket until EOF, and everything we read gets saved to disk.
The next example builds on a standalone executable called scrabble-word-finder. The source code for this executable (completely unaware it'll be used in a larger networked application) can be found right here.
poohbear@myth61:$ ./scrabble-word-finder lexical
ace
// many lines omitted for brevity
lexical
li
lice
lie
lilac
xi
poohbear@myth61:$
poohbear@myth61:$ ./scrabble-word-finder network
en
// many lines omitted for brevity
work
worn
wort
wot
wren
wrote
poohbear@myth61:$
The two runs above illustrate what scrabble-word-finder is capable of.
Assuming our server is running on myth54:13133, we expect http://myth54:13133/lexical and http://myth54:13133/network to generate the following payloads:
{
  time: 0.223399,
  cached: false,
  possibilities: [
    'ace',
    // several words omitted
    'lexical',
    'li',
    'lice',
    'lie',
    'lilac',
    'xi'
  ]
}
{
  time: 0.223399,
  cached: false,
  possibilities: [
    'en',
    // several words omitted
    'work',
    'worn',
    'wort',
    'wot',
    'wren',
    'wrote'
  ]
}
We don't need to copy code out of scrabble-word-finder.cc to build the core of scrabble-word-finder-server.cc.
scrabble-word-finder already outputs the primary content we need for our payload. We're packaging the payload as JSON instead of plain text, but we can still tap scrabble-word-finder to generate the collection of formable words.
Instead, we can rely on the subprocess_t type and subprocess function from Assignment 3.
struct subprocess_t {
  pid_t pid;
  int supplyfd;
  int ingestfd;
};
subprocess_t subprocess(char *argv[], bool supplyChildInput, bool ingestChildOutput);
Here's the main function implementing our server:
int main(int argc, char *argv[]) {
  unsigned short port = extractPort(argv[1]);
  int server = createServerSocket(port);
  cout << "Server listening on port " << port << "." << endl;
  ThreadPool pool(16);
  map<string, vector<string>> cache;
  mutex cacheLock;
  while (true) {
    struct sockaddr_in address;       // used to surface IP address of client
    socklen_t size = sizeof(address); // also used to surface client IP address
    bzero(&address, size);
    int client = accept(server, (struct sockaddr *) &address, &size);
    char str[INET_ADDRSTRLEN];
    cout << "Received a connection request from "
         << inet_ntop(AF_INET, &address.sin_addr, str, INET_ADDRSTRLEN) << "." << endl;
    pool.schedule([client, &cache, &cacheLock] {
      publishScrabbleWords(client, cache, cacheLock);
    });
  }
  return 0; // server never gets here, but not all compilers can tell
}
The second and third arguments to accept are used to surface the client's IP address.
Ignore the details around address, size, and the inet_ntop function until next week, when we'll talk more about them. Right now, it's a neat-to-see!
Each request is handled by a worker thread within a ThreadPool of size 16.
publishScrabbleWords will rely on our subprocess function to marshal plain text output of scrabble-word-finder into JSON and publish that JSON as the payload of the HTTP response.
What follows is the implementation of publishScrabbleWords and some of its helper functions.
Here is publishScrabbleWords:
static void publishScrabbleWords(int client, map<string, vector<string>>& cache,
                                 mutex& cacheLock) {
  sockbuf sb(client);
  iosockstream ss(&sb);
  string letters = getLetters(ss);
  sort(letters.begin(), letters.end());
  skipHeaders(ss);
  struct timeval start;
  gettimeofday(&start, NULL); // start the clock
  vector<string> formableWords;
  bool cached = false;
  {
    lock_guard<mutex> lg(cacheLock); // hold the lock for both the lookup and the copy
    auto found = cache.find(letters);
    cached = found != cache.end();
    // copy under the lock so another thread's write to the same entry can't race with this read
    if (cached) formableWords = found->second;
  }
  if (!cached) {
    const char *command[] = {"./scrabble-word-finder", letters.c_str(), NULL};
    subprocess_t sp = subprocess(const_cast<char **>(command), false, true);
    pullFormableWords(formableWords, sp.ingestfd); // returns once the child's output reaches EOF
    waitpid(sp.pid, NULL, 0);
    lock_guard<mutex> lg(cacheLock);
    cache[letters] = formableWords;
  }
  struct timeval end, duration;
  gettimeofday(&end, NULL); // stop the clock, server-computation of formableWords is complete
  timersub(&end, &start, &duration);
  double time = duration.tv_sec + duration.tv_usec/1000000.0;
  ostringstream payload;
  constructPayload(formableWords, cached, time, payload);
  sendResponse(ss, payload.str());
}
Here are the pullFormableWords and sendResponse helper functions.
static void pullFormableWords(vector<string>& formableWords, int ingestfd) {
  stdio_filebuf<char> inbuf(ingestfd, ios::in);
  istream is(&inbuf);
  while (true) {
    string word;
    getline(is, word);
    if (is.fail()) break;
    formableWords.push_back(word);
  }
}

static void sendResponse(iosockstream& ss, const string& payload) {
  ss << "HTTP/1.1 200 OK\r\n";
  ss << "Content-Type: text/javascript; charset=UTF-8\r\n";
  ss << "Content-Length: " << payload.size() << "\r\n";
  ss << "\r\n";
  ss << payload << flush;
}
Finally, here are the getLetters and the constructPayload helper functions. I omit the implementation of skipHeaders (you saw it with web-get) and constructJSONArray, which you're welcome to view right here.
static string getLetters(iosockstream& ss) {
  string method, path, protocol;
  ss >> method >> path >> protocol;
  string rest;
  getline(ss, rest);
  size_t pos = path.rfind("/");
  return pos == string::npos ? path : path.substr(pos + 1);
}

static void constructPayload(const vector<string>& formableWords, bool cached,
                             double time, ostringstream& payload) {
  payload << "{" << endl;
  payload << "  time: " << time << "," << endl;
  payload << "  cached: " << boolalpha << cached << "," << endl;
  payload << "  possibilities: " << constructJSONArray(formableWords, 2) << endl;
  payload << "}" << endl;
}
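Since constructJSONArray isn't shown, here is one guess at what such a helper might look like, written only from the way constructPayload calls it and from the sample payloads earlier. sketchJSONArray is a hypothetical stand-in, not the course's implementation. It assumes the second argument is the indentation of the enclosing line, and it does no escaping, which is tolerable only because the words are plain lowercase letters.

```cpp
#include <sstream>
#include <string>
#include <vector>
using namespace std;

// Hypothetical stand-in for the omitted constructJSONArray: emits one
// single-quoted item per line, indented two spaces deeper than the
// enclosing line, and closes the bracket at the enclosing indentation.
static string sketchJSONArray(const vector<string>& items, size_t indent) {
  ostringstream out;
  out << "[\n";
  for (size_t i = 0; i < items.size(); i++) {
    out << string(indent + 2, ' ') << "'" << items[i] << "'";
    if (i + 1 < items.size()) out << ","; // no comma after the last item
    out << "\n";
  }
  out << string(indent, ' ') << "]";
  return out.str();
}
```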
scrabble-word-finder-server provided a single API call that resembles the types of API calls afforded by Google, Twitter, or Facebook to access search, tweet, or friend-graph data.