CS110 Lecture 22: HTTP and APIs
CS110: Principles of Computer Systems
Winter 2021-2022
Stanford University
Instructors: Nick Troccoli and Jerry Cain
CS110 Topic 4: How can we write programs that communicate over a network with other programs?
Learning About Networking
- Introduction to Networking (Lecture 20)
- Servers / HTTP (Lecture 21)
- HTTP and APIs (Today)
- Networking System Calls (Lecture 23)
assign6: implement an HTTP Proxy that sits between a client device and a web server to monitor, block or modify web traffic.
Learning Goals
- Gain more practice with the client-server model
- Understand the HTTP protocol for making requests and responses
- Write a client program that makes HTTP requests
- Write a server program that sends back HTTP responses
Plan For Today
- Recap: Protocols and HTTP
- HTTP Client Example: wget
- HTTP Server Example: scrabble
Data Protocols
Our time server chose to send its response to a client as a raw, single-line string. A connecting client must know this in order to handle and use the response data.
Key idea: a client and server must agree on the format of the data being sent back and forth so they know what to send and how to parse the response.
- A protocol is a specification dictating how two computers should converse. By respecting a protocol, both the client and server know they'll understand each other.
HTTP ("HyperText Transfer Protocol") is the predominant protocol for Internet requests and responses (e.g. webpages, web resources, web APIs).
HTTP Request Format
GET / HTTP/1.0
Host: www.google.com
...
[BLANK LINE]
{request body?}
The first line is the request line. It specifies the kind of request (the method, e.g. GET), the path of the requested resource, and the protocol version. Following that is a list of headers, one per line, then a blank line, and sometimes a payload in the body.
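To make the format concrete, here is a minimal sketch of a helper that builds the request text above as a C++ string. The name buildGetRequest is hypothetical (it is not part of the lecture code), and it sends only a Host header; the key detail is that every line, including the final blank line, ends in "\r\n".

```cpp
#include <string>

// Hypothetical helper (not from the lecture): builds the text of a minimal
// HTTP/1.0 GET request. Every line, including the blank line that ends the
// headers, is terminated by CRLF ("\r\n").
std::string buildGetRequest(const std::string& host, const std::string& path) {
    std::string request;
    request += "GET " + path + " HTTP/1.0\r\n";  // request line: method, path, version
    request += "Host: " + host + "\r\n";          // headers, one per line
    request += "\r\n";                            // blank line: end of headers (no body for GET)
    return request;
}
```

For example, buildGetRequest("www.google.com", "/") produces exactly the first lines of the request shown above, followed by the blank line.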
HTTP Request Format
GET /posts?sort=recent&limit=10 HTTP/1.0
The path can include query parameters: key-value pairs that appear after the "?", separated by "&", that specify additional information about the request (here, sort=recent and limit=10).
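As a sketch of how a server might read these parameters, the hypothetical helper below splits everything after the "?" on "&" and then on "=". It assumes no percent-encoding (e.g. %20) and keeps the last value if a key repeats.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: extracts the key=value pairs that follow the '?' in a
// request path. Does not decode percent-encoding (e.g. %20).
std::map<std::string, std::string> parseQueryParams(const std::string& path) {
    std::map<std::string, std::string> params;
    size_t questionMark = path.find('?');
    if (questionMark == std::string::npos) return params;  // no query string

    std::istringstream query(path.substr(questionMark + 1));
    std::string keyValue;
    while (std::getline(query, keyValue, '&')) {  // pairs are separated by '&'
        if (keyValue.empty()) continue;            // tolerate "&&" or a trailing '&'
        size_t equals = keyValue.find('=');
        if (equals == std::string::npos) {
            params[keyValue] = "";                 // key with no value, e.g. "?debug"
        } else {
            params[keyValue.substr(0, equals)] = keyValue.substr(equals + 1);
        }
    }
    return params;
}
```

Calling parseQueryParams("/posts?sort=recent&limit=10") yields a map with sort → recent and limit → 10.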
HTTP Response Format
HTTP/1.0 200 OK
Content-Type: text/html
[BLANK LINE]
{response body}
The first line is the status line. It specifies the protocol version and how the request was handled (a status code such as 200 and a short reason phrase such as OK). Following that is a list of headers, one per line, then a blank line, and then the payload in the body.
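The status line can be picked apart with a string stream. In this sketch, StatusLine and parseStatusLine are hypothetical names: it reads the version and numeric code, then treats the rest of the line as the reason phrase, which may contain spaces, as in "404 Not Found".

```cpp
#include <sstream>
#include <string>

// Hypothetical type: the three parts of a status line like "HTTP/1.0 200 OK".
struct StatusLine {
    std::string version;  // e.g. "HTTP/1.0"
    int code = 0;         // e.g. 200
    std::string reason;   // e.g. "OK" (may contain spaces, e.g. "Not Found")
};

// Returns false if the line doesn't start with a version and a numeric code.
bool parseStatusLine(const std::string& line, StatusLine& status) {
    std::istringstream in(line);
    if (!(in >> status.version >> status.code)) return false;
    std::getline(in >> std::ws, status.reason);  // rest of the line
    // Drop a trailing '\r' left over from the CRLF line ending.
    if (!status.reason.empty() && status.reason.back() == '\r') status.reason.pop_back();
    return true;
}
```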
HTTP Response Format
HTTP response payloads contain the requested data. The payload format could be:
- HTML (a webpage) for a browser
- an image, file or other non-text data
- JSON ("JavaScript Object Notation"): a common text format for data types like maps, lists, strings, etc., used for sending data that another program can parse.
- or others (XML, etc.)
Demo: HTTP Requests/Responses using your browser and telnet
Browser and Telnet
We can play around with HTTP requests and responses using browser tools and telnet.
- Browser developer tools show all HTTP requests and responses being sent for us
- telnet lets us "phone" a server and manually send/receive HTTP requests/responses
Both will be useful for testing assign6!
wget
wget is a command line utility that, given a URL, downloads a single document (HTML document, image, video, etc.) and saves a copy of it to the current working directory.
- Let's see a quick demo
- wget works by sending an HTTP GET request to the specified URL!
- We can implement our own version called web-get that relies on our knowledge of HTTP requests and responses to do the same thing.
web-get
web-get is a program that, given a URL, downloads a single document (HTML document, image, video, etc.) and saves a copy of it to the current working directory.
- Parse the specified URL into the host and path components
- Send an HTTP GET request to the server for that resource
- Read through the server's HTTP response and save its payload data to a file
web-get
int main(int argc, char *argv[]) {
    if (argc != 2) {
        cerr << "Usage: " << argv[0] << " <url>" << endl;
        return kWrongArgumentCount;
    }
    // string pair of <host, path>
    pair<string, string> hostAndPath = parseURL(argv[1]);
    fetchContent(hostAndPath.first, hostAndPath.second);
    return 0;
}
web-get
Step 1: parse the specified URL into the host and path components
static pair<string, string> parseURL(string url) {
    // If the URL starts with the protocol e.g. http://, remove it
    if (startsWith(url, kProtocolPrefix)) {
        url = url.substr(kProtocolPrefix.size());
    }
    // Search for the first /
    size_t found = url.find('/');
    // If there is none, the path should be /
    if (found == string::npos) return make_pair(url, "/");
    // Otherwise, the host is what is before the /, and the path is the / and everything after it
    string host = url.substr(0, found);
    string path = url.substr(found);
    return make_pair(host, path);
}
web-get
static void fetchContent(const string& host, const string& path) {
    // Create a connection to the server on the HTTP port
    int socketDescriptor = createClientSocket(host, kDefaultHTTPPort);
    if (socketDescriptor == kClientSocketError) {
        cerr << "Could not connect to host named \"" << host << "\"." << endl;
        return;
    }
    // Wrap the descriptor in a stream so we can use << and getline on it
    sockbuf socketBuffer(socketDescriptor);
    iosockstream socketStream(&socketBuffer);
    // Send our request (using HTTP/1.0 for simpler requests)
    socketStream << "GET " << path << " HTTP/1.0\r\n";
    socketStream << "Host: " << host << "\r\n";
    socketStream << "\r\n" << flush;  // blank line ends the request; flush sends it now
    readResponse(socketStream, getFileName(path));
}
Step 2: Send an HTTP GET request to the server for that resource
Note: It's standard HTTP-protocol practice that each line, including the blank line that marks the end of the request, ends in CRLF (short for carriage-return-line-feed), which is '\r' followed by '\n'. We must also flush, so that the buffered request is actually sent to the server before we wait for its response.
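fetchContent calls getFileName(path), which isn't shown here. One plausible version, offered only as an assumption about its behavior, names the file after the last component of the path (ignoring any query string) and falls back to "index.html" when the path ends in "/".

```cpp
#include <string>

// Hypothetical sketch of getFileName (the lecture's version isn't shown):
// use the last path component as the file name, ignore any query string,
// and fall back to "index.html" when the path ends in '/'.
std::string getFileName(const std::string& path) {
    std::string withoutQuery = path.substr(0, path.find('?'));
    size_t lastSlash = withoutQuery.rfind('/');
    std::string name = (lastSlash == std::string::npos)
                           ? withoutQuery
                           : withoutQuery.substr(lastSlash + 1);
    return name.empty() ? "index.html" : name;
}
```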
web-get
Step 3: Read through the server's HTTP response and save its payload data to a file
- The server's response will contain a status line, headers, and a payload
- We don't actually care about the status line or headers in this case - let's skip them. We must read them in (even if we don't need them) in order to get to the payload.
- Once we get to the payload, we can save that part to a file
web-get
Step 3: Read through the server's HTTP response and save its payload data to a file
static void readResponse(iosockstream& socketStream, const string& filename) {
    // Skip the status line and headers (we don't need any information from them)
    while (true) {
        string line;
        getline(socketStream, line);
        if (line.empty() || line == "\r") break;
    }
    readAndSavePayload(socketStream, filename);
}
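We keep reading lines until we reach one that is empty or just "\r" (getline consumes the '\n' but keeps the '\r'). That blank line means the payload comes next. We also check line.empty() in case the server forgot the "\r"; it also ends the loop if the stream hits EOF before a blank line, since line is then left empty.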
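Since the loop only uses getline, the same idea should work on any std::istream, including a std::istringstream, which makes it easy to try without a network. The hypothetical variation below keeps the headers in a map instead of discarding them, for example to look up Content-Type later.

```cpp
#include <istream>
#include <map>
#include <sstream>  // std::istringstream: lets us try this without a socket
#include <string>

// Hypothetical variation on readResponse: reads the status line, then keeps
// each "Name: value" header in a map instead of discarding it. Stops at the
// blank line (or EOF), leaving the stream positioned at the payload.
// Note: HTTP header names are case-insensitive; this sketch doesn't normalize them.
std::map<std::string, std::string> readHeaders(std::istream& in, std::string& statusLine) {
    auto stripCR = [](std::string& line) {
        if (!line.empty() && line.back() == '\r') line.pop_back();
    };
    std::getline(in, statusLine);
    stripCR(statusLine);

    std::map<std::string, std::string> headers;
    while (true) {
        std::string line;
        std::getline(in, line);
        stripCR(line);
        if (line.empty()) break;                   // blank line: payload starts next
        size_t colon = line.find(':');
        if (colon == std::string::npos) continue;  // skip malformed lines
        size_t valueStart = line.find_first_not_of(' ', colon + 1);
        headers[line.substr(0, colon)] =
            (valueStart == std::string::npos) ? std::string() : line.substr(valueStart);
    }
    return headers;
}
```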
web-get
Step 3: Read through the server's HTTP response and save its payload data to a file
static void readAndSavePayload(iosockstream& socketStream, const string& filename) {
    ofstream output(filename, ios::binary); // don't assume it's text
    size_t totalBytes = 0;
    while (!socketStream.fail()) {
        char buffer[kBufferSizeBytes] = {'\0'};
        socketStream.read(buffer, sizeof(buffer));
        // gcount() is how many bytes the last read actually got
        totalBytes += socketStream.gcount();
        output.write(buffer, socketStream.gcount());
    }
    cout << "Total number of bytes fetched: " << totalBytes << endl;
}
We won't focus too much on the intricacies of this function, but it reads the rest of the response as binary data and saves it to a file in chunks. Once the server sends the payload, it closes its end of the connection, which the client sees as EOF: the read fails and the loop ends.
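The chunked loop can be tried the same way on an in-memory stream. This hypothetical readPayload keeps the read/gcount pattern but appends each chunk to a string; gcount matters because the final read usually returns fewer bytes than the buffer holds. The 4096-byte chunk size is an assumption.

```cpp
#include <istream>
#include <sstream>  // std::istringstream: lets us try this without a socket
#include <string>

// Hypothetical variation on readAndSavePayload: same chunked read/gcount
// loop, but appends each chunk to a string instead of writing it to a file.
// Stops once a read fails, i.e. at EOF (for a socket: the server closed it).
std::string readPayload(std::istream& in) {
    constexpr std::size_t kChunkSize = 4096;  // assumed size; any positive value works
    std::string payload;
    char buffer[kChunkSize];
    while (!in.fail()) {
        in.read(buffer, sizeof(buffer));  // the last read is usually short
        payload.append(buffer, static_cast<std::size_t>(in.gcount()));  // keep only bytes read
    }
    return payload;
}
```

Because it appends exactly gcount() bytes, binary data with embedded '\0' bytes comes through intact.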
HTTP Server: Scrabble Word Finder
Let's write a web application for finding valid Scrabble words given certain letters.
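At its core, the server has to decide which words can be spelled from the given letters. Here is a minimal sketch, assuming lowercase letters and a dictionary supplied as a vector; canMakeWord, findWords and the dictionary are hypothetical, and the actual scrabble server is implemented next time.

```cpp
#include <array>
#include <string>
#include <vector>

// Hypothetical core of a scrabble word finder: can `word` be spelled using
// each of the given letters at most once? Assumes lowercase 'a'-'z' only.
bool canMakeWord(const std::string& word, const std::string& letters) {
    std::array<int, 26> available{};  // how many of each letter we have
    for (char c : letters) {
        if (c >= 'a' && c <= 'z') available[c - 'a']++;
    }
    for (char c : word) {
        if (c < 'a' || c > 'z' || available[c - 'a'] == 0) return false;
        available[c - 'a']--;  // use up one copy of this letter
    }
    return true;
}

// Returns every dictionary word that can be formed from the letters.
std::vector<std::string> findWords(const std::vector<std::string>& dictionary,
                                   const std::string& letters) {
    std::vector<std::string> words;
    for (const std::string& word : dictionary) {
        if (canMakeWord(word, letters)) words.push_back(word);
    }
    return words;
}
```

With the letters "aebght", "bath" and "get" qualify, while "bee" does not (there is only one e).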
Web Applications
- Client (browser) asks the server: scrabble-words.com, please!
- Server replies: sure, here's HTML for that page.
Web Applications
- Client (webpage) asks the server: scrabble words for "aebght", please!
- Server replies: sure, here's a list of words.
Mobile Applications
- Client (app) asks the server: scrabble words for "aebght", please!
- Server replies: sure, here's a list of words.
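For the webpage and app exchanges above, "here's a list of words" would naturally be a JSON payload. The hypothetical buildWordsResponse sketches such a response: a status line, Content-Type and Content-Length headers, a blank line, then a JSON array. It does no JSON escaping, so it assumes words contain no quotes or backslashes.

```cpp
#include <string>
#include <vector>

// Hypothetical sketch: package a list of words as a JSON array and wrap it in
// an HTTP/1.0 response. No JSON escaping is done, so words are assumed to
// contain no quotes or backslashes.
std::string buildWordsResponse(const std::vector<std::string>& words) {
    std::string body = "[";
    for (size_t i = 0; i < words.size(); i++) {
        if (i > 0) body += ", ";
        body += "\"" + words[i] + "\"";
    }
    body += "]";

    std::string response = "HTTP/1.0 200 OK\r\n";
    response += "Content-Type: application/json\r\n";
    response += "Content-Length: " + std::to_string(body.size()) + "\r\n";
    response += "\r\n";  // blank line separates headers from payload
    response += body;
    return response;
}
```

The same response works for both clients; the webpage and the app each parse the JSON array however they like.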
Web Applications and APIs
- A web server can handle different types of requests. Some send back HTML for a browser to display; others send back non-HTML data (such as JSON) for programs or webpages to parse.
- A web application is like a "dynamic webpage" - the page can make more requests to the server while the user interacts with it.
- A web API (Application Programming Interface) is the list of request types that a given server can handle
- More generally: an API is a set of functions one can use in order to build a larger piece of software.
- APIs can be functions you import (like #include <stdio.h>) or types of requests servers can respond to (like the NASA API or other public web APIs).
- Any kind of program can send/receive HTTP requests - webpages, apps, etc. When building a product, you may have the same server API used by your webpage and app.
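One way to picture a web API in code is as a table from request paths to handler functions. The sketch below is hypothetical (the Handler type, dispatch, and any paths you register are illustrative only) and returns plain strings rather than full HTTP responses.

```cpp
#include <functional>
#include <map>
#include <string>

// Hypothetical handler type: takes the query string, returns a response body.
using Handler = std::function<std::string(const std::string& query)>;

// Looks up the handler for a request path. Paths missing from the table are
// request types that this server's API doesn't support.
std::string dispatch(const std::map<std::string, Handler>& api,
                     const std::string& path, const std::string& query) {
    auto it = api.find(path);
    if (it == api.end()) return "404 Not Found";
    return it->second(query);
}
```

For example, registering "/" to return an HTML page and "/words" to return matching words gives one table that both a webpage and an app could use.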
Recap
- Recap: Protocols and HTTP
- HTTP Client Example: wget
- HTTP Server Example: scrabble
Next time: implementing scrabble server, and learning about how createClientSocket and createServerSocket are implemented.
CS110 Lecture 22: HTTP and APIs
By Nick Troccoli