CS110 Lecture 22: HTTP and APIs

CS110: Principles of Computer Systems

Winter 2021-2022

Stanford University

Instructors: Nick Troccoli and Jerry Cain

CS110 Topic 4: How can we write programs that communicate over a network with other programs?

Learning About Networking

  • Lecture 20: Introduction to Networking
  • Lecture 21: Servers / HTTP
  • Today: HTTP and APIs
  • Lecture 23: Networking System Calls

assign6: implement an HTTP Proxy that sits between a client device and a web server to monitor, block or modify web traffic.

Learning Goals

  • Gain more practice with the client-server model
  • Understand the HTTP protocol for making requests and responses
  • Write a client program that makes HTTP requests
  • Write a server program that sends back HTTP responses

Plan For Today

  • Recap: Protocols and HTTP
  • HTTP Client Example: wget
  • HTTP Server Example: scrabble

Data Protocols

Our time server chose to send a raw single-line string response to a client.  A client connecting must be aware of this to know how to handle / use the response data.

Key idea: a client and server must agree on the format of the data being sent back and forth so they know what to send and how to parse the response.

  • A protocol is a specification dictating how two computers should converse. By respecting a protocol, both the client and server know they'll understand each other.

HTTP ("HyperText Transfer Protocol") is the predominant protocol for Internet requests and responses (e.g. webpages, web resources, web APIs).

HTTP Request Format

GET / HTTP/1.0
Host: www.google.com
...
[BLANK LINE]
{request body?}

The first line is the request line.  It specifies general information about the kind of request and the protocol version.  Following that is a list of headers, 1 per line, and sometimes a payload in the body.
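To make the structure of the request line concrete, here is a minimal sketch of how a server might split it into its three parts. The RequestLine struct and parseRequestLine function are hypothetical names introduced for illustration; they are not part of the lecture code.

```cpp
#include <sstream>
#include <string>
using namespace std;

// Hypothetical type holding the three whitespace-separated parts of a
// request line such as "GET / HTTP/1.0".
struct RequestLine {
  string method;    // e.g. "GET"
  string path;      // e.g. "/" or "/posts?sort=recent&limit=10"
  string protocol;  // e.g. "HTTP/1.0"
};

// Hypothetical helper: fills in `out` and returns true if the line has all
// three parts, false otherwise. A trailing '\r' is treated as whitespace.
bool parseRequestLine(const string& line, RequestLine& out) {
  istringstream iss(line);
  return static_cast<bool>(iss >> out.method >> out.path >> out.protocol);
}
```

This sketch only checks that three tokens are present; a real server would likely also validate the method and protocol version.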

HTTP Request Format

GET /posts?sort=recent&limit=10 HTTP/1.0

The path can have query parameters; these are key-value pairs that appear after the "?" that can specify additional information about the request.
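As a rough illustration of how a program could pull those key-value pairs out of a path, here is a minimal sketch. parseQueryParams is a hypothetical helper, not part of the lecture code, and it assumes the values need no URL decoding (e.g. no %20 sequences).

```cpp
#include <map>
#include <string>
using namespace std;

// Hypothetical helper: maps each key after the '?' to its value.
// Assumes pairs are separated by '&' and are not percent-encoded.
map<string, string> parseQueryParams(const string& path) {
  map<string, string> params;
  size_t question = path.find('?');
  if (question == string::npos) return params;  // no query string at all

  size_t start = question + 1;
  while (start < path.size()) {
    size_t end = path.find('&', start);
    if (end == string::npos) end = path.size();
    string item = path.substr(start, end - start);  // e.g. "sort=recent"
    size_t equals = item.find('=');
    if (equals == string::npos) {
      if (!item.empty()) params[item] = "";  // key with no value
    } else {
      params[item.substr(0, equals)] = item.substr(equals + 1);
    }
    start = end + 1;
  }
  return params;
}
```

For the path shown above, this would yield two entries: sort mapped to recent, and limit mapped to 10.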

HTTP Response Format

HTTP/1.0 200 OK
Content-Type: text/html
[BLANK LINE]
{response body}

The first line is the status line.  It specifies general information about how the request was handled and the protocol version.  Following that is a list of headers, 1 per line, and the payload in the body.
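A client that does care about the outcome of its request could read the numeric code out of the status line. The following is a minimal sketch; parseStatusCode is a hypothetical helper introduced here, and returning -1 for a malformed line is an arbitrary choice made for this example.

```cpp
#include <sstream>
#include <string>
using namespace std;

// Hypothetical helper: extracts the status code from a status line such as
// "HTTP/1.0 200 OK". Returns -1 (an arbitrary sentinel) if the line is
// malformed.
int parseStatusCode(const string& statusLine) {
  istringstream iss(statusLine);
  string protocol;
  int code;
  if (!(iss >> protocol >> code)) return -1;
  return code;
}
```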

HTTP Response Format

HTTP response payloads contain the requested data.  The payload format could be:

  • HTML (a webpage) for a browser
  • an image, file or other non-text data
  • JSON ("JavaScript Object Notation"): a common text format for data types like maps, lists, strings, etc.  Used for sending data that can be parsed by another program.
  • or others (XML, etc.)

Demo: HTTP Requests/Responses using your browser and telnet

Browser and Telnet

We can play around with HTTP requests and responses using browser tools and telnet.

  • Browser developer tools show all HTTP requests and responses being sent for us
  • telnet lets us "phone" a server and manually send/receive HTTP requests/responses

 

Both will be useful for testing assign6!

Plan For Today

  • Recap: Protocols and HTTP
  • HTTP Client Example: wget
  • HTTP Server Example: scrabble

wget

wget is a command line utility that, given a URL, downloads a single document (HTML document, image, video, etc.) and saves a copy of it to the current working directory.

  • Let's see a quick demo
  • wget works by sending an HTTP GET request to the specified URL!
  • We can implement our own version called web-get that relies on our knowledge of HTTP requests and responses to do the same thing.

web-get

web-get is a program that, given a URL, downloads a single document (HTML document, image, video, etc.) and saves a copy of it to the current working directory.

  1. Parse the specified URL into the host and path components
  2. Send an HTTP GET request to the server for that resource
  3. Read through the server's HTTP response and save its payload data to a file

web-get

int main(int argc, char *argv[]) {
  if (argc != 2) {
    cerr << "Usage: " << argv[0] << " <url>" << endl;
    return kWrongArgumentCount;
  }
  
  // string pair of <host, path> 
  pair<string, string> hostAndPath = parseURL(argv[1]);
  fetchContent(hostAndPath.first, hostAndPath.second);
  return 0;
}

Step 1: Parse the specified URL into the host and path components

web-get

Step 1: Parse the specified URL into the host and path components

static pair<string, string> parseURL(string url) {
  // If the URL starts with the protocol e.g. http://, remove it
  if (startsWith(url, kProtocolPrefix)) {
    url = url.substr(kProtocolPrefix.size());
  }

  // Search for the first /
  size_t found = url.find('/');

  // If there is none, the path should be /
  if (found == string::npos) return make_pair(url, "/");

  // Otherwise, the host is what is before the /, and the path is after the /
  string host = url.substr(0, found);
  string path = url.substr(found);
  return make_pair(host, path);
}
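parseURL relies on a constant and a helper that the slides do not show. The sketch below gives one plausible definition of each. Both are assumptions: the value "http://" is inferred from the comment in parseURL, and this startsWith is just one way to write such a check.

```cpp
#include <string>
using namespace std;

// Assumed value; the slides reference kProtocolPrefix without defining it.
const string kProtocolPrefix = "http://";

// Hypothetical implementation of the startsWith helper used by parseURL:
// true if the first prefix.size() characters of str equal prefix.
bool startsWith(const string& str, const string& prefix) {
  return str.size() >= prefix.size() &&
         str.compare(0, prefix.size(), prefix) == 0;
}
```

With these definitions, parseURL would split http://www.google.com/imghp into the host www.google.com and the path /imghp, and a URL with no slash after the host would get the path /.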

Step 2: Send an HTTP GET request to the server for that resource

web-get

static void fetchContent(const string& host, const string& path) {
  // Create a connection to the server on the HTTP port
  int socketDescriptor = createClientSocket(host, kDefaultHTTPPort);
  if (socketDescriptor == kClientSocketError) {
    cerr << "Could not connect to host named \"" << host << "\"." << endl;
    return;
  }

  sockbuf socketBuffer(socketDescriptor);
  iosockstream socketStream(&socketBuffer);

  // Send our request (using HTTP/1.0 for simpler requests)
  socketStream << "GET " << path << " HTTP/1.0\r\n";
  socketStream << "Host: " << host << "\r\n";
  socketStream << "\r\n" << flush;

  readResponse(socketStream, getFileName(path));
}

Step 2: Send an HTTP GET request to the server for that resource

Note: It's standard HTTP-protocol practice that each line, including the blank line that marks the end of the request, end in CRLF (short for carriage-return-line-feed), which is '\r' followed by '\n'.  We must also flush!
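To see exactly which bytes go over the wire, the same request can be assembled into a string. This is a minimal sketch; buildGetRequest is a hypothetical helper that mirrors the three writes in fetchContent and is not part of the lecture code.

```cpp
#include <string>
using namespace std;

// Hypothetical helper: returns the same text that fetchContent writes to
// the socket stream, with every line terminated by CRLF.
string buildGetRequest(const string& host, const string& path) {
  string request;
  request += "GET " + path + " HTTP/1.0\r\n";  // request line
  request += "Host: " + host + "\r\n";         // the only header we send
  request += "\r\n";                           // blank line ends the request
  return request;
}
```

Building the request as a string first can make it easier to print or test, at the cost of one extra copy before it is sent.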

web-get

Step 3: Read through the server's HTTP response and save its payload data to a file

  • The server's response will contain a status line, headers, and a payload
  • We don't actually care about the status line or headers in this case - let's skip them.  We must read them in (even if we don't need them) in order to get to the payload.
  • Once we get to the payload, we can save that part to a file
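fetchContent passes getFileName(path) as the name of the file to save to, but the slides do not show that function. The sketch below is one guess at how a file name could be derived from a path; fileNameFromPath and kDefaultFileName are hypothetical names, and the lecture's getFileName may behave differently.

```cpp
#include <string>
using namespace std;

// Assumed fallback name for paths that end in '/', such as "/".
const string kDefaultFileName = "index.html";

// Hypothetical helper: uses whatever follows the last '/' as the file name.
// Query parameters, if any, are left in the name for simplicity.
string fileNameFromPath(const string& path) {
  size_t lastSlash = path.rfind('/');
  string name = (lastSlash == string::npos) ? path : path.substr(lastSlash + 1);
  return name.empty() ? kDefaultFileName : name;
}
```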

web-get

Step 3: Read through the server's HTTP response and save its payload data to a file

static void readResponse(iosockstream& socketStream, const string& filename) {
  // Skip the status line and headers (we don't need any information from them)
  while (true) {
    string line;
    getline(socketStream, line);
    if (line.empty() || line == "\r") break;
  }

  readAndSavePayload(socketStream, filename);
}

We keep reading lines until we encounter one that is empty or "\r" (getline consumes the \n).  That means we have gotten to the payload.  We include line.empty() in case the server forgot the "\r".
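The header-skipping loop can be tried without a network connection by running it on an in-memory stream. The sketch below assumes the loop behaves the same on any istream; skipStatusLineAndHeaders is a hypothetical name for the loop body of readResponse.

```cpp
#include <istream>
#include <sstream>
#include <string>
using namespace std;

// Same loop as in readResponse, written against a generic istream so that
// it can be fed a canned response (an assumption made for this sketch).
void skipStatusLineAndHeaders(istream& stream) {
  while (true) {
    string line;
    getline(stream, line);  // consumes the '\n' but leaves the '\r' in line
    if (line.empty() || line == "\r") break;
  }
}
```

After the call, the next read from the stream starts at the first byte of the payload. Note that if the stream fails before a blank line is seen, getline leaves the line empty, so the loop still terminates.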

web-get

Step 3: Read through the server's HTTP response and save its payload data to a file

static void readAndSavePayload(iosockstream& socketStream, const string& filename) {
  ofstream output(filename, ios::binary); // don't assume it's text
  size_t totalBytes = 0;
  while (!socketStream.fail()) {
    char buffer[kBufferSizeBytes] = {'\0'};
    socketStream.read(buffer, sizeof(buffer));
    totalBytes += socketStream.gcount();
    output.write(buffer, socketStream.gcount());
  }
  cout << "Total number of bytes fetched: " << totalBytes << endl;
}

We won't focus too much on the intricacies of this function, but it reads the rest of the response as binary data and saves it to a file in chunks.  Once the server sends the payload, it closes its end of the connection, which the client sees as "EOF".
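The chunked read loop is easier to follow when run on in-memory streams. Here is a minimal sketch of the same pattern; copyInChunks and kChunkSizeBytes are hypothetical names, and the chunk size is deliberately tiny so that a short input takes several iterations.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
using namespace std;

// Deliberately small so short inputs need multiple reads (assumed value).
const size_t kChunkSizeBytes = 4;

// Copies everything remaining in `in` to `out`, one chunk at a time, and
// returns the number of bytes copied. Mirrors readAndSavePayload's loop.
size_t copyInChunks(istream& in, ostream& out) {
  size_t totalBytes = 0;
  while (!in.fail()) {
    char buffer[kChunkSizeBytes];
    in.read(buffer, sizeof(buffer));  // may read fewer bytes at end of input
    totalBytes += in.gcount();        // gcount() is how many were read
    out.write(buffer, in.gcount());   // write only the bytes actually read
  }
  return totalBytes;
}
```

The final read usually comes up short, which sets the stream's fail state and ends the loop; gcount() ensures that the partial chunk is still written.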

Plan For Today

  • Recap: Protocols and HTTP
  • HTTP Client Example: wget
  • HTTP Server Example: scrabble

HTTP Server: Scrabble Word Finder

Let's write a web application for finding valid Scrabble words given certain letters.

Web Applications

Client (browser): scrabble-words.com, please!

Server: sure, here's HTML for that page.

Web Applications

Client (webpage): scrabble words for "aebght", please!

Server: sure, here's a list of words.

Mobile Applications

Client (app): scrabble words for "aebght", please!

Server: sure, here's a list of words.

Web Applications and APIs

  • A web server can handle different types of requests.  Some can send back HTML for a browser, others can be for non-HTML data for programs or webpages to parse.
  • A web application is like a "dynamic webpage" - the page can make more requests to the server while the user interacts with it.
  • A web API (Application Programming Interface) is the list of request types that a given server can handle
    • More generally: an API is a set of functions one can use in order to build a larger piece of software.
    • APIs can be functions you import (like #include <stdio.h>) or types of requests servers can respond to (like the NASA API or other web APIs).
    • Any kind of program can send/receive HTTP requests - webpages, apps, etc.  When building a product, you may have the same server API used by your webpage and app.
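To illustrate what a non-HTML API response could look like, here is a minimal sketch of a server building an HTTP response whose payload is a JSON list of words. toJSONArray and buildJSONResponse are hypothetical helpers, not the scrabble server from lecture, and the sketch assumes the words contain only lowercase letters so that no JSON escaping is needed.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>
using namespace std;

// Hypothetical helper: formats words as a JSON array of strings.
// Assumes words need no escaping (lowercase letters only).
string toJSONArray(const vector<string>& words) {
  string json = "[";
  for (size_t i = 0; i < words.size(); i++) {
    if (i > 0) json += ",";
    json += "\"" + words[i] + "\"";
  }
  return json + "]";
}

// Hypothetical helper: wraps the JSON payload in a status line and headers,
// each terminated by CRLF, with a blank line before the payload.
string buildJSONResponse(const vector<string>& words) {
  string payload = toJSONArray(words);
  ostringstream response;
  response << "HTTP/1.0 200 OK\r\n";
  response << "Content-Type: application/json\r\n";
  response << "Content-Length: " << payload.size() << "\r\n";
  response << "\r\n";
  response << payload;
  return response.str();
}
```

Because the payload is plain JSON rather than HTML, the same response could be consumed by a webpage's script or by a mobile app.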

Recap

  • Recap: Protocols and HTTP
  • HTTP Client Example: wget
  • HTTP Server Example: scrabble

 

 

Next time: implementing scrabble server, and learning about how createClientSocket and createServerSocket are implemented.