CS110: Principles of Computer Systems
Winter 2021-2022
Stanford University
Instructors: Nick Troccoli and Jerry Cain
Introduction to Networking
Servers / HTTP
HTTP and APIs
Networking System Calls
assign6: implement an HTTP Proxy that sits between a client device and a web server to monitor, block or modify web traffic.
Our time server chose to send a raw single-line string response to a client. A client connecting must be aware of this to know how to handle / use the response data.
Key idea: a client and server must agree on the format of the data being sent back and forth so they know what to send and how to parse the response.
HTTP ("HyperText Transfer Protocol") is the predominant protocol for Internet requests and responses (e.g. webpages, web resources, web APIs).
GET / HTTP/1.0
Host: www.google.com
...
[BLANK LINE]
{request body?}
The first line is the request line. It specifies general information about the kind of request and the protocol version. Following that is a list of headers, 1 per line, and sometimes a payload in the body.
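To make the request-line layout concrete, here is a minimal sketch of splitting one into its three whitespace-separated pieces. The `RequestLine` struct and `parseRequestLine` function are hypothetical names introduced for illustration; they are not part of the lecture code.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper type (not from the lecture code): the three pieces
// of an HTTP request line.
struct RequestLine {
  std::string method;   // e.g. "GET"
  std::string path;     // e.g. "/"
  std::string version;  // e.g. "HTTP/1.0"
};

// Splits a request line such as "GET / HTTP/1.0" on whitespace.
// Returns false if any of the three tokens is missing. A trailing '\r'
// counts as whitespace, so it does not end up in the version string.
static bool parseRequestLine(const std::string& line, RequestLine& out) {
  std::istringstream iss(line);
  return static_cast<bool>(iss >> out.method >> out.path >> out.version);
}
```

Given "GET / HTTP/1.0", this fills in the method GET, the path /, and the version HTTP/1.0.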
GET /posts?sort=recent&limit=10 HTTP/1.0
The path can have query parameters; these are key-value pairs that appear after the "?" that can specify additional information about the request.
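As a rough illustration of how a server might pull those key-value pairs out of a path, here is a small sketch. `parseQueryParams` is a hypothetical helper, and it makes simplifying assumptions: no percent-decoding, and a repeated key overwrites the earlier value.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper (not from the lecture code): extracts the query
// parameters from a path like "/posts?sort=recent&limit=10".
// Simplifications: no percent-decoding; repeated keys overwrite.
static std::map<std::string, std::string> parseQueryParams(const std::string& path) {
  std::map<std::string, std::string> params;
  std::size_t question = path.find('?');
  if (question == std::string::npos) return params;  // no query string

  std::size_t start = question + 1;
  while (start < path.size()) {
    // Each pair runs up to the next '&' (or the end of the path).
    std::size_t end = path.find('&', start);
    if (end == std::string::npos) end = path.size();
    std::string pairText = path.substr(start, end - start);

    std::size_t equals = pairText.find('=');
    if (equals == std::string::npos) {
      if (!pairText.empty()) params[pairText] = "";  // key with no value
    } else {
      params[pairText.substr(0, equals)] = pairText.substr(equals + 1);
    }
    start = end + 1;
  }
  return params;
}
```

For "/posts?sort=recent&limit=10" the resulting map holds sort mapped to recent and limit mapped to 10.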
HTTP/1.0 200 OK
Content-Type: text/html
[BLANK LINE]
{response body}
The first line is the status line. It specifies general information about how the request was handled and the protocol version. Following that is a list of headers, 1 per line, and the payload in the body.
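A client that does care about the status line could split it along these lines. This is only a sketch; `StatusLine` and `parseStatusLine` are hypothetical names, not part of the lecture code.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper type (not from the lecture code).
struct StatusLine {
  std::string version;  // e.g. "HTTP/1.0"
  int code = 0;         // e.g. 200
  std::string reason;   // e.g. "OK" or "Not Found"
};

// Parses a status line such as "HTTP/1.0 200 OK". The reason phrase may
// contain spaces, so it is read as "the rest of the line".
// Returns false if the version or the numeric code is missing.
static bool parseStatusLine(std::string line, StatusLine& out) {
  if (!line.empty() && line.back() == '\r') line.pop_back();  // drop CR
  std::istringstream iss(line);
  out.reason.clear();
  if (!(iss >> out.version >> out.code)) return false;
  std::getline(iss >> std::ws, out.reason);  // may legitimately be empty
  return true;
}
```

For "HTTP/1.0 404 Not Found" this yields the version HTTP/1.0, the code 404, and the reason phrase Not Found.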
HTTP response payloads contain the requested data. The payload format could be:
We can play around with HTTP requests and responses using browser tools and telnet.
Both will be useful for testing assign6!
wget is a command line utility that, given a URL, downloads a single document (HTML document, image, video, etc.) and saves a copy of it to the current working directory.
web-get is a program that, given a URL, downloads a single document (HTML document, image, video, etc.) and saves a copy of it to the current working directory.
int main(int argc, char *argv[]) {
  if (argc != 2) {
    cerr << "Usage: " << argv[0] << " <url>" << endl;
    return kWrongArgumentCount;
  }
  // string pair of <host, path>
  pair<string, string> hostAndPath = parseURL(argv[1]);
  fetchContent(hostAndPath.first, hostAndPath.second);
  return 0;
}
Step 1: parse the specified URL into the host and path components
static pair<string, string> parseURL(string url) {
  // If the URL starts with the protocol e.g. http://, remove it
  if (startsWith(url, kProtocolPrefix)) {
    url = url.substr(kProtocolPrefix.size());
  }
  // Search for the first /
  size_t found = url.find('/');
  // If there is none, the path should be /
  if (found == string::npos) return make_pair(url, "/");
  // Otherwise, the host is what is before the /, and the path is after the /
  string host = url.substr(0, found);
  string path = url.substr(found);
  return make_pair(host, path);
}
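To see what parseURL produces for a few inputs, here is a self-contained restatement of the same logic. The definitions of `kProtocolPrefix` and `startsWith` are not shown in the lecture, so the versions below are assumed stand-ins (with the prefix assumed to be "http://"), and `parseURLSketch` is a hypothetical name used to avoid clashing with the real function.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Assumed stand-ins: the lecture code uses these names but does not show
// how they are defined.
static const std::string kProtocolPrefix = "http://";
static bool startsWith(const std::string& text, const std::string& prefix) {
  return text.compare(0, prefix.size(), prefix) == 0;
}

// Same logic as parseURL above, restated so this sketch compiles alone.
static std::pair<std::string, std::string> parseURLSketch(std::string url) {
  if (startsWith(url, kProtocolPrefix)) {
    url = url.substr(kProtocolPrefix.size());
  }
  std::size_t found = url.find('/');
  if (found == std::string::npos) {
    return std::pair<std::string, std::string>(url, "/");  // no path given
  }
  return std::pair<std::string, std::string>(url.substr(0, found),
                                             url.substr(found));
}
```

Under these assumptions, "http://example.com/a/b.html" splits into the host example.com and the path /a/b.html, while "www.google.com" yields the host www.google.com and the default path /. Note that only the assumed "http://" prefix is stripped, so a URL starting with "https://" would not be split correctly by this sketch.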
Step 2: Send an HTTP GET request to the server for that resource
static void fetchContent(const string& host, const string& path) {
  // Create a connection to the server on the HTTP port
  int socketDescriptor = createClientSocket(host, kDefaultHTTPPort);
  if (socketDescriptor == kClientSocketError) {
    cerr << "Could not connect to host named \"" << host << "\"." << endl;
    return;
  }
  sockbuf socketBuffer(socketDescriptor);
  iosockstream socketStream(&socketBuffer);
  // Send our request (using HTTP/1.0 for simpler requests)
  socketStream << "GET " << path << " HTTP/1.0\r\n";
  socketStream << "Host: " << host << "\r\n";
  socketStream << "\r\n" << flush;
  readResponse(socketStream, getFileName(path));
}
Note: It's standard HTTP-protocol practice that each line, including the blank line that marks the end of the request, ends in CRLF (short for carriage-return-line-feed), which is '\r' followed by '\n'. We must also flush, so that the buffered request is actually sent to the server before we start waiting for its response!
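To make the exact bytes visible, here is a small sketch that builds the same request as a plain string instead of writing it to a socket stream. `buildGetRequest` is a hypothetical helper introduced for illustration, not part of the lecture code.

```cpp
#include <string>

// Hypothetical helper (not from the lecture code): returns the text of an
// HTTP/1.0 GET request with a single Host header.
static std::string buildGetRequest(const std::string& host, const std::string& path) {
  std::string request;
  request += "GET " + path + " HTTP/1.0\r\n";  // request line
  request += "Host: " + host + "\r\n";         // one header per line
  request += "\r\n";                           // blank line ends the request
  return request;
}
```

For the host www.google.com and the path /, the result is the string "GET / HTTP/1.0\r\nHost: www.google.com\r\n\r\n". Building the string first can make a request easy to print or compare while debugging, at the cost of one extra copy.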
Step 3: Read through the server's HTTP response and save its payload data to a file
static void readResponse(iosockstream& socketStream, const string& filename) {
  // Skip the status line and headers (we don't need any information from them)
  while (true) {
    string line;
    getline(socketStream, line);
    if (line.empty() || line == "\r") break;
  }
  readAndSavePayload(socketStream, filename);
}
We keep reading lines until we encounter one that is empty or "\r" (getline consumes the \n). That means we have gotten to the payload. We include line.empty() in case the server omits the "\r"; it also ends the loop if the read fails, since line is then left empty.
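If a client did need information from the headers, the same blank-line loop could collect them instead of discarding them. The following is only a sketch: `readHeaders` is a hypothetical helper, and it takes a generic std::istream so it can be tried on an in-memory string rather than a socket.

```cpp
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>  // only needed for in-memory experiments with istringstream
#include <string>

// Hypothetical helper (not from the lecture code): discards the status
// line, then collects "Name: value" headers until the blank line.
// Afterwards the stream is positioned at the start of the payload.
static std::map<std::string, std::string> readHeaders(std::istream& in) {
  std::map<std::string, std::string> headers;
  std::string line;
  std::getline(in, line);  // status line; ignored in this sketch
  while (std::getline(in, line)) {
    if (!line.empty() && line.back() == '\r') line.pop_back();  // strip CR
    if (line.empty()) break;  // blank line: the headers are over
    std::size_t colon = line.find(':');
    if (colon == std::string::npos) continue;  // malformed line; skip it
    std::string value = line.substr(colon + 1);
    std::size_t first = value.find_first_not_of(" \t");
    headers[line.substr(0, colon)] =
        (first == std::string::npos) ? "" : value.substr(first);
  }
  return headers;
}
```

Fed a std::istringstream holding "HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\nhello", this returns a map with Content-Type mapped to text/html and leaves "hello" unread in the stream.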
static void readAndSavePayload(iosockstream& socketStream, const string& filename) {
  ofstream output(filename, ios::binary); // don't assume it's text
  size_t totalBytes = 0;
  while (!socketStream.fail()) {
    char buffer[kBufferSizeBytes] = {'\0'};
    socketStream.read(buffer, sizeof(buffer));
    totalBytes += socketStream.gcount();
    output.write(buffer, socketStream.gcount());
  }
  cout << "Total number of bytes fetched: " << totalBytes << endl;
}
We won't focus too much on the intricacies of this function, but it reads the rest of the response as binary data, and saves it to a file in chunks. Once the server has sent the payload, it closes its end of the connection, which the client sees as "EOF".
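The read-then-gcount pattern can be tried in isolation on ordinary streams. Here is a minimal sketch; `copyInChunks` is a hypothetical helper, and the chunk size is deliberately tiny so that the final, partial chunk is easy to observe.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>  // only needed for in-memory experiments with string streams

// Hypothetical helper (not from the lecture code): copies everything left
// in `in` to `out` in fixed-size chunks and returns the number of bytes
// copied.
static std::size_t copyInChunks(std::istream& in, std::ostream& out) {
  constexpr std::size_t kChunkSize = 4;  // tiny on purpose, for illustration
  char buffer[kChunkSize];
  std::size_t total = 0;
  while (in) {
    // read() sets the fail bit when it hits EOF before filling the buffer,
    // but gcount() still reports how many bytes that last read delivered.
    in.read(buffer, sizeof(buffer));
    std::streamsize got = in.gcount();
    out.write(buffer, got);
    total += static_cast<std::size_t>(got);
  }
  return total;
}
```

Copying an istringstream holding "hello world" into an ostringstream takes three reads (4, 4, and 3 bytes) and returns 11. The last read fails because it reaches EOF, yet its 3 bytes are still written, which is why the loop writes before re-checking the stream state.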
Let's write a web application for finding valid scrabble words given certain letters.
Client (BROWSER) to Server: scrabble-words.com, please!
Server to Client: sure, here's HTML for that page.
Client (WEBPAGE) to Server: scrabble words for "aebght", please!
Server to Client: sure, here's a list of words.
Client (APP) to Server: scrabble words for "aebght", please!
Server to Client: sure, here's a list of words.
Next time: implementing scrabble server, and learning about how createClientSocket and createServerSocket are implemented.