CS110 Lecture 16: Network System Calls
Principles of Computer Systems
Winter 2020
Stanford University
Computer Science Department
Instructors: Chris Gregg and
Nick Troccoli


CS110 Topic 4: How can we write programs that communicate over a network with other programs?

Learning About Networking
Introduction to Networking
API Servers
Network System Calls
2/24
2/26
Today

Today's Learning Goals
- Learn how we can create a client socket descriptor to connect to server
 - Learn how we can create a server socket descriptor to listen for client connections
 

Plan For Today
- Recap: Networking So Far
 - createClientSocket
 - Break: Announcements
 - createServerSocket
 - assign7 Overview
 

Plan For Today
- Recap: Networking So Far
 - createClientSocket
 - Break: Announcements
 - createServerSocket
 - assign7 Overview
 

Networking So Far
- Networking allows a program to communicate with a program on another machine.
 - Networking code relies on the same descriptor framework we used for files
	
- Create a descriptor (a number) that represents that resource
 - Read or write with that descriptor
 - Close the descriptor when you're done
 
 

Networking So Far
- An IP Address identifies a machine on a network
 - A port number identifies a specific networked program running on a machine
	
- This allows multiple networked programs to run on the same machine
 - Analogy: San Fran = IP address, waterfront piers = ports
 
 - A socket is the endpoint of a single connection over a port. It is represented as a descriptor we can read from/write to.
 - "Port" is to "socket descriptor" as "filename" is to "file descriptor"
 

Networking So Far
- We can use socket descriptors the same as file descriptors (read, write)
 - However, iosockstream lets us wrap a socket descriptor in a stream (so that we can read/write like we do with cout instead of via read/write.)
 
static void writeToSocket(int socketDescriptor) {
  sockbuf sb(socketDescriptor);
  iosockstream ss(&sb);
  ss << [CONTENT HERE] << endl;
} // sockbuf destructor closes client
Plan For Today
- Recap: Networking So Far
 - createClientSocket
 - Break: Announcements
 - createServerSocket
 - assign7 Overview
 

Clients And Servers
"client" and "server" are the two main program "roles" in networking.
- A server continually listens for incoming requests and sends responses back
 - A client sends a request to a server and does something with the response
 - Examples:
	
- YouTube app (client) sends requests to the YouTube servers for what content (e.g. videos) to display
 - Web browser (client) sends requests to the server at the URL you enter for what content (e.g. webpage, images) to display
 
 

Clients
int main(int argc, char *argv[]) {
    int clientSocket = createClientSocket("myserver.com", 12345);
    sockbuf sb(clientSocket);
    iosockstream ss(&sb);
    string responseData;
    getline(ss, responseData);
    cout << responseData << endl;
    return 0;
}- We create a client socket to connect to a given server
 - This returns a descriptor we can use to read/write
 - We use an iosockstream to simplify using this file descriptor
 - Finally, we print out the line of data sent from the server
 
But what is createClientSocket really doing?

createClientSocket
- Check that the specified server and port are valid
 - Create a new socket descriptor
 - Associate this socket descriptor with a connection to that server
 - Return the socket descriptor
 
int createClientSocket(const string& host, unsigned short port);
    
createClientSocket
- Check that the specified server and port are valid - gethostbyname()
 - Create a new socket descriptor - socket()
 - Associate this socket descriptor with a connection to that server - connect()
 - Return the socket descriptor
 
int createClientSocket(const string& host, unsigned short port);
    
createClientSocket
- Check that the specified server and port are valid - gethostbyname()
 - Create a new socket descriptor - socket()
 - Associate this socket descriptor with a connection to that server - connect()
 - Return the socket descriptor
 
int createClientSocket(const string& host, unsigned short port);
    - We check the validity of the host by attempting to look up their IP address
 - gethostbyname() gets host info for the given name (e.g. "www.facebook.com")
 - 
gethostbyaddr() gets host info for the given IPv4 address (e.g. "31.13.75.17")
	
- First argument is the base address of a character array with ASCII values of 171, 64, 64, and 137 in network byte order.  For IPv4, the second argument is usually 
sizeof(struct in_addr)and the third theAF_INETconstant. 
 - First argument is the base address of a character array with ASCII values of 171, 64, 64, and 137 in network byte order.  For IPv4, the second argument is usually 
 
- Both are technically deprecated in favor of getAddrInfo, but still prevalent and good to know
 - Returns a statically allocated 
struct hostentwith host's info (or NULL if error) 
Idea: let's use gethostbyname() to look up this host and see if it's valid (non-NULL).
struct hostent *gethostbyname(const char *name);
struct hostent *gethostbyaddr(const char *addr, int len, int type);
    createClientSocket
// represents an IP Address
struct in_addr {
    unsigned int s_addr // stored in network byte order (big endian)
};
// represents a host's info
struct hostent {
    // official name of host
    char *h_name;
    // NULL-terminated list of aliases
    char **h_aliases;
    // host address type (typically AF_INET for IPv4)
    int h_addrtype;
    // address length (typically 4, or sizeof(struct in_addr) for IPv4)
    int h_length;
    // NULL-terminated list of IP addresses
    // This is really a struct in_addr ** when hostent contains IPv4 addresses
    char **h_addr_list; 
}; 
    gethostbyname()
Note: h_addr_list is typed to be a char * array, but for IPv4 records it's really struct in_addr **, so we cast it to that in our code.
Why the confusion?
- h_addr_list needs to represent an array of pointers to IP addresses.
 - struct hostent must be generic and work with e.g. both IPv4 and IPv6 hosts.
 - Thus, h_addr_list could be an array of in_addr *s (IPv4) or an array of in6_addr *s (IPv6).
 - No void * back then, so char ** it is.
 

- Check that the specified server and port are valid - gethostbyname()
 
createClientSocket
int createClientSocket(const string& host, unsigned short port) {
    struct hostent *he = gethostbyname(host.c_str());
    if (he == NULL) return -1;
    ...
    
createClientSocket
- Check that the specified server and port are valid - gethostbyname()
 - Create a new socket descriptor - socket()
 
int socket(int domain, int type, int protocol);
int createClientSocket(const string& host, unsigned short port) {
    ...
    int s = socket(AF_INET, SOCK_STREAM, 0);
    if (s < 0) return -1;
    ...
    
createClientSocket
- Check that the specified server and port are valid - gethostbyname()
 - Create a new socket descriptor - socket()
 - Associate this socket descriptor with a connection to that server - connect()
 
int connect(int clientfd, const struct sockaddr *addr, socklen_t addrlen);
    Lecture 15: Network System Calls, Library Functions
- The three data structures presented below are in place to model the IP address/port pairs:
 
struct sockaddr { // generic socket
    unsigned short sa_family; // protocol family for socket
    char sa_data[14];
    // address data (and defines full size to be 16 bytes)
};
    The sockaddr_in is used to model IPv4 address/port pairs.
- The 
sin_familyfield should always be initialized to beAF_INET, which is a constant used to be clear that IPv4 addresses are being used. If it feels redundant that a record dedicated to IPv4 needs to store a constant saying everything is IPv4, then stay tuned. - The 
sin_portfield stores a port number in network byte (i.e. big endian) order. - The 
sin_addrfield stores an IPv4 address as a packed, big endianint, as you saw withgethostbynameand thestruct hostent. - The 
sin_zerofield is generally ignored (though it's often set to store all zeroes). It exists primarily to pad the record up to 16 bytes. 
struct sockaddr_in { // IPv4 socket address record
    unsigned short sin_family;
    unsigned short sin_port;
    struct in_addr sin_addr;
    unsigned char sin_zero[8];
};
    struct sockaddr_in6 { // IPv6 socket address record
    unsigned short sin6_family;
    unsigned short sin6_port;
    unsigned int sin6_flowinfo;;
    struct in6_addr sin6_addr;
    unsigned int sin6_scope_id;
};
    Lecture 15: Network System Calls, Library Functions
- The three data structures presented below are in place to model the IP address/port pairs:
 
sockaddr_in6 is used to model IPv6 address/port pairs.
        - The 
sin6_familyfield should always be set toAF_INET6. As with thesin_familyfield,sin6_familyfield occupies the first two bytes of surrounding record. - The 
sin6_portfield holds a two-byte, network-byte-ordered port number, just like sin_port does. - A 
struct in6_addris also wedged in there to manage a 128-bit IPv6 address. - 
                
sin6_flowinfoandsin6_scope_idare beyond the scope of what we need, so we'll ignore them. 
struct sockaddr { // generic socket
    unsigned short sa_family; // protocol family for socket
    char sa_data[14];
    // address data (and defines full size to be 16 bytes)
};
    struct sockaddr_in { // IPv4 socket address record
    unsigned short sin_family;
    unsigned short sin_port;
    struct in_addr sin_addr;
    unsigned char sin_zero[8];
};
    struct sockaddr_in6 { // IPv6 socket address record
    unsigned short sin6_family;
    unsigned short sin6_port;
    unsigned int sin6_flowinfo;;
    struct in6_addr sin6_addr;
    unsigned int sin6_scope_id;
};
    Lecture 15: Network System Calls, Library Functions
- The three data structures presented below are in place to model the IP address/port pairs:
 
struct sockaddr is the best C can do to emulate an abstract base class.
        - You rarely if ever declare variables of type 
struct sockaddr, but many system calls will accept parameters of typestruct sockaddr *. - Rather than define a set of network system calls for IPv4 addresses and a second set of system calls for IPv6 addresses, Linux defines one set for both.
 - If a system call accepts a parameter of type struct sockaddr *, it really accepts the address of either a 
struct sockaddr_inor astruct sockaddr_in6. The system call relies on the value within the first two bytes—thesa_familyfield—to determine what the true record type is. 
struct sockaddr { // generic socket
    unsigned short sa_family; // protocol family for socket
    char sa_data[14];
    // address data (and defines full size to be 16 bytes)
};
    struct sockaddr_in { // IPv4 socket address record
    unsigned short sin_family;
    unsigned short sin_port;
    struct in_addr sin_addr;
    unsigned char sin_zero[8];
};
    struct sockaddr_in6 { // IPv6 socket address record
    unsigned short sin6_family;
    unsigned short sin6_port;
    unsigned int sin6_flowinfo;;
    struct in6_addr sin6_addr;
    unsigned int sin6_scope_id;
};
    
createClientSocket
- Check that the specified server and port are valid - gethostbyname()
 - Create a new socket descriptor - socket()
 - Associate this socket descriptor with a connection to that server - connect()
 
int createClientSocket(const string& host, unsigned short port) {
    ...
    struct sockaddr_in address;
    memset(&address, 0, sizeof(address));
    address.sin_family = AF_INET;
    address.sin_port = htons(port);
    // h_addr is #define for h_addr_list[0]
    address.sin_addr = *((struct in_addr *)he->h_addr); 
    if (connect(s, (struct sockaddr *) &address, sizeof(address)) == 0) return s;
    ...
    Lecture 15: Network System Calls, Library Functions
createClientSocket:int createClientSocket(const string& host, unsigned short port) {
    struct hostent *he = gethostbyname(host.c_str());
    if (he == NULL) return -1;
    int s = socket(AF_INET, SOCK_STREAM, 0);
    if (s < 0) return -1;
    struct sockaddr_in address;
    memset(&address, 0, sizeof(address));
    address.sin_family = AF_INET;
    address.sin_port = htons(port);
    // h_addr is #define for h_addr_list[0]
    address.sin_addr = *((struct in_addr *)he->h_addr); 
    if (connect(s, (struct sockaddr *) &address, sizeof(address)) == 0) return s;
    close(s);
    return -1;
}
    Lecture 15: Network System Calls, Library Functions
- 
                
addressis declared to be of typestruct sockaddr_in, since that's the data type specifically set up to model IPv4 addresses. Had we been dealing with IPv6 addresses, we'd have declared astruct sockaddr_in6instead. - It's important to embed 
AF_INETwithin thesin_familyfield, since those two bytes are examined by system calls to determine the type of socket address structure. - The 
sin_portfield is, not surprisingly, designed to hold the port of interest.htons—that's an abbreviation forhost-to-network-short—is there to ensure the port is stored in network byte order (which is big endian order). On big endian machines,htonsis implemented to return the provided short without modification. On little endian machines (like themyths),htonsreturns a figure constructed by exchanging the two bytes of the incomingshort. In addition tohtons, Linux also providedhtonlfor four-bytelongs, and it also providesntohsandntohlto restore host byte order from network byte ordered figures. - The call to 
connectassociates the descriptorswith the host/IP address pair modeled by the suppliedstruct sockaddr_in *. The second argument is downcast to astruct sockaddr *, sinceconnectneeds accept a pointer to any type within the entirestruct sockaddrfamily, not juststruct sockaddr_ins. 

Plan For Today
- Recap: Networking So Far
 - createClientSocket
 - Break: Announcements
 - createServerSocket
 - assign7 Overview
 

Announcements
- Assignment 7 due this Fri. 11:59PM
 

Plan For Today
- Recap: Networking So Far
 - createClientSocket
 - Break: Announcements
 - createServerSocket
 - assign7 Overview
 


createServerSocket
- Create a new socket descriptor - socket()
 - Bind this socket to a given port and IP address- bind()
 - Make the socket descriptor passive to listen for incoming requests - listen()
 - Return socket descriptor
 
int createServerSocket(unsigned short port, int backlog = kDefaultBacklog);
    Lecture 15: Network System Calls, Library Functions
createServerSocket:int createServerSocket(unsigned short port, int backlog) {
    int s = socket(AF_INET, SOCK_STREAM, 0);
    if (s < 0) return -1;
    struct sockaddr_in address;
    memset(&address, 0, sizeof(address));
    address.sin_family = AF_INET;
    address.sin_addr.s_addr = htonl(INADDR_ANY);
    address.sin_port = htons(port);
    if (bind(s, (struct sockaddr *)&address, sizeof(address)) == 0 &&
            listen(s, backlog) == 0) return s;
    close(s);
    return -1;
}
    Lecture 15: Network System Calls, Library Functions
createServerSocket worth calling out:
        - The call to 
socketis precisely the same here as it was increateClientSocket. It allocates a descriptor and configures it to be a socket descriptor within theAF_INETnamespace. - The address of type 
struct sockaddr_inhere is configured in much the same way it was increateClientSocket, except that thesin_addr.s_addrfield should be set to a local IP address, not a remote one. The constantINADDR_ANYis used to state that address should represent all local addresses. - The 
bindcall simply assigns the set of local IP addresses represented byaddressto the provided sockets. Because we embeddedINADDR_ANYwithinaddress,bindassociates the supplied socket with all local IP addresses. That means oncecreateServerSockethas done its job, clients can connect to any of the machine's IP addresses via the specified port. - The 
listencall is what converts the socket to be one that's willing to accept connections viaaccept. The second argument is a queue size limit, which states how many pending connection requests can accumulate and wait their turn to beaccepted. If the number of outstanding requests is at the limit, additional requests are simply refused. 

Plan For Today
- Recap: Networking So Far
 - createClientSocket
 - Break: Announcements
 - createServerSocket
 - assign7 Overview
 
- A web proxy server is a server that acts as a go-between from your browser to sites on the Internet. Proxies can serve many purposes:
	
- Block access to certain websites
 - Block access to certain documents (big documents, .zip files, etc.)
 - Act as an anonymizer to strip data from headers about what the real IP address of the client is, or by stripping out cookies or other identifying information. The Tor network, using onion routing performs this role (among other roles, such as protecting data with strong encryption)
 - Intercept image requests, serving only upside-down versions of images.
 - Intercept all traffic and redirect to kittenwar.com.
 - Cache requests for static data (e.g., images) so it can later serve local copies rather than re-request from the web.
 - Redirect to a paywall (e.g., what happens at airports)
 
 
- Once you have started the proxy (starter code), you should be able to go to any web site, and see just "You're writing a proxy!" in place of the webpage:
 

- Not much going on!
 - After you have set up the proxy, you can leave it (if you browse with another browser), as long as you always ssh into the same myth machine each time you work on the assignment. If you change myth machines, you will need to update the proxy settings (always check this first if you run into issues)
 - You should also frequently clear the browser's cache, as it can locally cache elements, too, which means that you might load a page without ever going to your proxy.
 
- Version 1: Sequential Proxy
	
- You will eventually add a 
ThreadPoolto your program, but first, write a sequential version. - You will be changing the starter code to be a true proxy, that intercepts the requests and passes them on to the intended server. There are three HTTP methods you need to support:
		
- GET: request a web page from the server
 - HEAD: exactly like GET, but only requests the headers
 - POST: send data to the website
 
 - The request line will look like this:
 
 - You will eventually add a 
 
GET http://www.cornell.edu/research/ HTTP/1.1
- 
	
- For this example, your program forwards the request to www.cornell.edu, with the first line of the request as follows:
 
 
GET /research/ HTTP/1.1
- 
	
- You already have a fully implemented 
HTTPRequestclass, although you will have to update theoperator<<function at a later stage. 
 - You already have a fully implemented 
 

Lecture Recap
- Recap: Networking So Far
 - createClientSocket
 - Break: Announcements
 - createServerSocket
 - assign7 Overview
 
Next Time: Overview of MapReduce
Lecture 16: Network System Calls (w20)
By Nick Troccoli
Lecture 16: Network System Calls (w20)
Winter 2020
- 2,649