CS110: Principles of Computer Systems
Winter 2021-2022
Stanford University
Instructors: Nick Troccoli and Jerry Cain
Introduction to Networking
Servers / HTTP
HTTP and APIs
Networking System Calls / Library Functions
assign6: implement an HTTP Proxy that sits between a client device and a web server to monitor, block or modify web traffic.
Let's see the underlying system calls and library functions needed to implement createClientSocket and createServerSocket!
We have used createClientSocket in client programs so far to connect to servers. It gives us back a descriptor we can use to read/write data.
But how is the createClientSocket helper function actually implemented?
int main(int argc, char *argv[]) {
// Open a connection to the server
int socketDescriptor = createClientSocket("myth64.stanford.edu", 12345);
// Read in the data from the server (sockbuf destructor closes descriptor)
sockbuf socketBuffer(socketDescriptor);
iosockstream socketStream(&socketBuffer);
string timeline;
getline(socketStream, timeline);
// Print the data from the server
cout << timeline << endl;
return 0;
}
int createClientSocket(const string& host, unsigned short port);
gethostbyname returns a pointer to a struct hostent with the host's info (or NULL if error):
struct hostent *gethostbyname(const char *name);
struct hostent *gethostbyname2(const char *name, int af);
int createClientSocket(const string& host, unsigned short port) {
struct hostent *he = gethostbyname(host.c_str());
if (he == NULL) return -1;
...
int socket(int domain, int type, int protocol);
int createClientSocket(const string& host, unsigned short port) {
...
int s = socket(AF_INET, SOCK_STREAM, 0);
if (s < 0) return -1;
...
The socket function creates a socket endpoint and returns a descriptor.
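As a small illustration of "returns a descriptor", the sketch below opens a TCP/IPv4 socket and then asks the kernel what type of socket the descriptor refers to. The helper names openTcpSocket and socketType are hypothetical and not part of the course library; getsockopt with SO_TYPE is a standard call.

```cpp
#include <cassert>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>  // close, for callers that are done with the descriptor

// Hypothetical helper: opens a TCP/IPv4 socket.
// Returns the descriptor (>= 0) or -1 on failure, just like socket itself.
int openTcpSocket() {
  return socket(AF_INET, SOCK_STREAM, 0);
}

// Hypothetical helper: asks the kernel for the type of an open socket.
// Returns SOCK_STREAM, SOCK_DGRAM, etc., or -1 if the query fails.
int socketType(int fd) {
  assert(fd >= 0);  // precondition: fd came from a successful socket call
  int type = 0;
  socklen_t len = sizeof(type);
  if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &type, &len) != 0) return -1;
  return type;
}
```

The descriptor is not connected to anything yet; it only names an endpoint, and it should be closed with close when no longer needed.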
int connect(int clientfd, const struct sockaddr *addr, socklen_t addrlen);
connect connects the specified socket to the specified address.
There are actually multiple different types of addresses we may want to pass in: sockaddr_in and sockaddr_in6. How can we handle these possibilities? C doesn't support inheritance or templates.
We will make the parameter type a "parent type" called sockaddr, whose memory layout matches the start of both sockaddr_in and sockaddr_in6. Its structure is a 2-byte type field followed by 14 bytes of something. Both sockaddr_in and sockaddr_in6 start with that 2-byte type field. sockaddr_in uses the remaining 14 bytes for whatever it wants, while sockaddr_in6 is larger than 16 bytes, which is why connect also takes an addrlen.
struct sockaddr { // generic socket
unsigned short sa_family; // protocol family for socket
char sa_data[14]; // address data (and defines full size to be 16 bytes)
};
struct sockaddr_in { // IPv4 socket address record
unsigned short sin_family;
unsigned short sin_port;
struct in_addr sin_addr;
unsigned char sin_zero[8];
};
struct sockaddr_in6 { // IPv6 socket address record
unsigned short sin6_family;
unsigned short sin6_port;
unsigned int sin6_flowinfo;
struct in6_addr sin6_addr;
unsigned int sin6_scope_id;
};
The sin_family field should store AF_INET for IPv4.
The sin_port field stores a port number in network byte order.
The sin_addr field stores the IPv4 address.
The sin_zero field represents the remaining 8 bytes, which are unused.
The sin6_family field should store AF_INET6 for IPv6.
The sin6_port field stores a port number in network byte order.
The sin6_addr field stores the IPv6 address.
sin6_flowinfo and sin6_scope_id are beyond the scope of what we need, so we'll ignore them.
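To make the "parent type" idea concrete, here is a minimal sketch of code that receives only a generic struct sockaddr * and uses the leading family field to decide which concrete struct it is really looking at. The helper names describeFamily and portOf are hypothetical, chosen for illustration.

```cpp
#include <cassert>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string>

// Hypothetical helper: inspects the family field shared by every address
// struct to learn which concrete type the pointer refers to.
std::string describeFamily(const struct sockaddr *addr) {
  assert(addr != NULL);
  switch (addr->sa_family) {
    case AF_INET:  return "IPv4";
    case AF_INET6: return "IPv6";
    default:       return "other";
  }
}

// Hypothetical helper: returns the port in host byte order, or -1 if the
// address family is neither IPv4 nor IPv6.
int portOf(const struct sockaddr *addr) {
  assert(addr != NULL);
  if (addr->sa_family == AF_INET) {
    // Safe to "downcast": the family field told us the real type.
    return ntohs(((const struct sockaddr_in *)addr)->sin_port);
  }
  if (addr->sa_family == AF_INET6) {
    return ntohs(((const struct sockaddr_in6 *)addr)->sin6_port);
  }
  return -1;
}
```

This is the same pattern connect relies on: the caller casts a sockaddr_in * or sockaddr_in6 * to sockaddr *, and the callee reads the family field before touching anything else.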
int createClientSocket(const string& host, unsigned short port) {
...
struct sockaddr_in address;
memset(&address, 0, sizeof(address));
address.sin_family = AF_INET;
address.sin_port = htons(port);
address.sin_addr = ???;
...
htons is "host to network short" - it converts to network byte order, which may or may not be the same as the byte order your machine uses.
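A quick way to see what htons does is to look at the bytes in memory. Network byte order is big-endian, so after htons the most significant byte comes first regardless of the host's own byte order. The helper byteInMemory below is a hypothetical name used only for this sketch.

```cpp
#include <cassert>
#include <arpa/inet.h>
#include <cstdint>
#include <cstring>

// Hypothetical helper: returns byte 0 or byte 1 of a 16-bit value exactly as
// it is laid out in memory on this machine.
unsigned char byteInMemory(uint16_t value, int index) {
  assert(index == 0 || index == 1);
  unsigned char bytes[2];
  memcpy(bytes, &value, sizeof(value));
  return bytes[index];
}

// Port 12345 is 0x3039. In network byte order the first byte in memory is
// 0x30 and the second is 0x39, on both little-endian and big-endian hosts.
bool portIsBigEndianAfterHtons() {
  uint16_t networkOrder = htons(12345);
  return byteInMemory(networkOrder, 0) == 0x30 &&
         byteInMemory(networkOrder, 1) == 0x39;
}
```

ntohs performs the reverse conversion, so ntohs(htons(port)) always gives back the original port.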
We can get the IP address for the server from the struct hostent * from gethostbyname.
struct hostent *gethostbyname(const char *name);
struct hostent *gethostbyname2(const char *name, int af);
struct hostent {
...
// NULL-terminated list of IP addresses
// This is really a struct in_addr ** when hostent contains IPv4 addresses
// This is really a struct in6_addr ** when hostent contains IPv6 addresses
char **h_addr_list;
...
};
// h_addr is #define for h_addr_list[0]
struct in_addr first_ip = *((struct in_addr *)he->h_addr);
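Since h_addr_list is a NULL-terminated list, the same cast can be applied to every entry rather than only the first. The sketch below walks the list and converts each IPv4 address to dotted-quad text with inet_ntop. The function name ipv4Addresses is hypothetical, and the sketch assumes the hostent holds IPv4 addresses (it returns an empty list otherwise).

```cpp
#include <cassert>
#include <netdb.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns every IPv4 address in he as text.
std::vector<std::string> ipv4Addresses(const struct hostent *he) {
  std::vector<std::string> result;
  if (he == NULL || he->h_addrtype != AF_INET) return result;
  // Sanity check: each entry should be exactly one struct in_addr.
  assert((size_t)he->h_length == sizeof(struct in_addr));
  for (char **entry = he->h_addr_list; *entry != NULL; entry++) {
    char text[INET_ADDRSTRLEN];
    // Each char * really points to a struct in_addr.
    if (inet_ntop(AF_INET, *entry, text, sizeof(text)) != NULL) {
      result.push_back(text);
    }
  }
  return result;
}
```

A caller would typically pass the result of gethostbyname directly, for example ipv4Addresses(gethostbyname(host.c_str())).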
int createClientSocket(const string& host, unsigned short port) {
...
struct sockaddr_in address;
memset(&address, 0, sizeof(address));
address.sin_family = AF_INET;
address.sin_port = htons(port);
// h_addr is #define for h_addr_list[0]
address.sin_addr = *((struct in_addr *)he->h_addr);
if (connect(s, (struct sockaddr *) &address, sizeof(address)) == 0) return s;
...
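int createClientSocket(const string& host, unsigned short port) {
// Look up the host's info, including its IP address(es)
struct hostent *he = gethostbyname(host.c_str());
if (he == NULL) return -1;
// Create a TCP/IPv4 socket endpoint
int s = socket(AF_INET, SOCK_STREAM, 0);
if (s < 0) return -1;
// Fill in the server's address: IPv4, port in network byte order, IP address
struct sockaddr_in address;
memset(&address, 0, sizeof(address));
address.sin_family = AF_INET;
address.sin_port = htons(port);
// h_addr is #define for h_addr_list[0]
address.sin_addr = *((struct in_addr *)he->h_addr);
// Connect; on success the descriptor is ready for reading and writing
if (connect(s, (struct sockaddr *) &address, sizeof(address)) == 0) return s;
// Connection failed, so release the descriptor
close(s);
return -1;
}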
int createServerSocket(unsigned short port, int backlog = kDefaultBacklog);
1. Create a socket - socket()
int createServerSocket(unsigned short port, int backlog) {
int s = socket(AF_INET, SOCK_STREAM, 0);
if (s < 0) return -1;
...
}
2. Bind this socket to a given port and IP address - bind()
int createServerSocket(unsigned short port, int backlog) {
...
struct sockaddr_in address;
memset(&address, 0, sizeof(address));
address.sin_family = AF_INET;
address.sin_addr.s_addr = htonl(INADDR_ANY);
address.sin_port = htons(port);
if (bind(s, (struct sockaddr *)&address, sizeof(address)) == 0 &&
...
}
bind "associates a name with a socket"
3. Make the socket descriptor passive to listen for incoming requests - listen()
int createServerSocket(unsigned short port, int backlog) {
...
if (bind(s, (struct sockaddr *)&address, sizeof(address)) == 0 &&
listen(s, backlog) == 0) return s;
...
}
listen makes the specified socket passive - one used for listening via accept.
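int createServerSocket(unsigned short port, int backlog) {
// 1. Create a TCP/IPv4 socket endpoint
int s = socket(AF_INET, SOCK_STREAM, 0);
if (s < 0) return -1;
// Fill in the address to listen on: IPv4, any local IP address, given port
struct sockaddr_in address;
memset(&address, 0, sizeof(address));
address.sin_family = AF_INET;
address.sin_addr.s_addr = htonl(INADDR_ANY);
address.sin_port = htons(port);
// 2. Bind the socket to that address, then 3. make it passive
if (bind(s, (struct sockaddr *)&address, sizeof(address)) == 0 &&
listen(s, backlog) == 0) return s;
// bind or listen failed, so release the descriptor
close(s);
return -1;
}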
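To see the client and server sides working together, here is a minimal single-process sketch that follows the same socket/bind/listen steps, connects a client to it, and calls accept to get a descriptor for that one connection. Several details are assumptions made for the sketch rather than part of the lecture code: it binds to the loopback address with port 0 so the OS picks a free port, it uses a fixed backlog of 8, and the names createLoopbackServer, connectToLoopback and roundTrip are hypothetical.

```cpp
#include <cassert>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>
#include <cstring>
#include <string>

// Server side: socket + bind + listen on 127.0.0.1 with an OS-chosen port.
// Returns the listening descriptor (or -1) and stores the port in *port.
int createLoopbackServer(unsigned short *port) {
  assert(port != NULL);
  int s = socket(AF_INET, SOCK_STREAM, 0);
  if (s < 0) return -1;
  struct sockaddr_in address;
  memset(&address, 0, sizeof(address));
  address.sin_family = AF_INET;
  address.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  address.sin_port = htons(0);  // 0 asks the OS to choose a free port
  socklen_t len = sizeof(address);
  if (bind(s, (struct sockaddr *)&address, sizeof(address)) == 0 &&
      listen(s, 8) == 0 &&
      getsockname(s, (struct sockaddr *)&address, &len) == 0) {
    *port = ntohs(address.sin_port);
    return s;
  }
  close(s);
  return -1;
}

// Client side: socket + connect to 127.0.0.1 on the given port.
int connectToLoopback(unsigned short port) {
  int s = socket(AF_INET, SOCK_STREAM, 0);
  if (s < 0) return -1;
  struct sockaddr_in address;
  memset(&address, 0, sizeof(address));
  address.sin_family = AF_INET;
  address.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  address.sin_port = htons(port);
  if (connect(s, (struct sockaddr *)&address, sizeof(address)) == 0) return s;
  close(s);
  return -1;
}

// Sends message from the client and returns what the server side read.
// Intended for short messages that fit in the socket's buffer.
std::string roundTrip(const std::string& message) {
  unsigned short port = 0;
  int server = createLoopbackServer(&port);
  if (server < 0) return "";
  int client = connectToLoopback(port);
  if (client < 0) { close(server); return ""; }
  // accept returns a new descriptor for this one connection.
  int connection = accept(server, NULL, NULL);
  std::string received;
  if (connection >= 0) {
    write(client, message.c_str(), message.size());
    close(client);  // closing tells the reader there is no more data
    char buffer[256];
    ssize_t count;
    while ((count = read(connection, buffer, sizeof(buffer))) > 0) {
      received.append(buffer, count);
    }
    close(connection);
  } else {
    close(client);
  }
  close(server);
  return received;
}
```

The client's connect can complete before accept is called because the kernel queues pending connections on the listening socket, up to the backlog.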
Task: we want to count the frequency of words in a document.
Possible Approach: program that reads document and builds a word -> frequency map
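As a baseline before parallelizing, the sequential approach might look like the sketch below. It assumes words are separated by whitespace and are already lowercase with no punctuation. The function name countWords is a hypothetical choice for this example.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Reads the document one word at a time and builds a word -> frequency map.
std::map<std::string, int> countWords(const std::string& document) {
  std::map<std::string, int> frequencies;
  std::istringstream in(document);
  std::string word;
  while (in >> word) {
    assert(!word.empty());  // operator>> never yields an empty token
    frequencies[word]++;    // a missing key starts at 0
  }
  return frequencies;
}
```

All of the work happens in a single loop over the document, which is what makes this version hard to speed up on a large input.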
How can we parallelize this?
Idea: split document into pieces, count words in each piece concurrently
Problem: what if a word appears in multiple pieces? We need to then merge the counts.
Idea: combine all the output, sort it, split into pieces, combine in each one concurrently
Idea: split document into pieces, count words in each piece concurrently. Then, combine all the text output, sort it, split into pieces, sum each one concurrently.
Example: "the very very quick fox greeted the brown fox"
Split into pieces:
the very very
quick fox greeted
the brown fox
Count the words in each piece concurrently:
Piece 1: the, 1; very, 2
Piece 2: quick, 1; fox, 1; greeted, 1
Piece 3: the, 1; brown, 1; fox, 1
Combined: the, 1; very, 2; quick, 1; fox, 1; greeted, 1; the, 1; brown, 1; fox, 1
Sorted: brown, 1; fox, 1; fox, 1; greeted, 1; quick, 1; the, 1; the, 1; very, 2
Split the sorted output into pieces and sum each piece concurrently:
brown, 1; fox, 2; greeted, 1; quick, 1; the, 2; very, 2
2 "phases" where we parallelize work
The first phase focuses on finding, and the second phase focuses on summing. So the first phase should only output 1s, and leave the summing for later.
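The two phases can be sketched as two small functions: one that only emits a 1 per word, and one that sums runs of equal words in sorted input. This is a single-threaded sketch under simple assumptions (whitespace-separated, lowercase words); the names WordCount, mapPiece, reduceSorted and wordCountTwoPhase are hypothetical, and in a real system each mapPiece call and each slice of the sorted output could run concurrently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, int> WordCount;

// Phase 1 (finding): emit (word, 1) for every word in one piece, no summing.
std::vector<WordCount> mapPiece(const std::string& piece) {
  std::vector<WordCount> emitted;
  std::istringstream in(piece);
  std::string word;
  while (in >> word) emitted.push_back(WordCount(word, 1));
  return emitted;
}

// Phase 2 (summing): the input is sorted, so equal words are adjacent and
// each run can be collapsed into a single total.
std::vector<WordCount> reduceSorted(const std::vector<WordCount>& sorted) {
  assert(std::is_sorted(sorted.begin(), sorted.end()));
  std::vector<WordCount> totals;
  for (size_t i = 0; i < sorted.size(); i++) {
    if (!totals.empty() && totals.back().first == sorted[i].first) {
      totals.back().second += sorted[i].second;
    } else {
      totals.push_back(sorted[i]);
    }
  }
  return totals;
}

// Runs phase 1 on every piece, combines and sorts the output, runs phase 2.
std::vector<WordCount> wordCountTwoPhase(const std::vector<std::string>& pieces) {
  std::vector<WordCount> combined;
  for (size_t i = 0; i < pieces.size(); i++) {
    std::vector<WordCount> emitted = mapPiece(pieces[i]);
    combined.insert(combined.end(), emitted.begin(), emitted.end());
  }
  std::sort(combined.begin(), combined.end());
  return reduceSorted(combined);
}
```

Because phase 1 never sums, it needs no knowledge of what the other pieces contain, and phase 2 only needs equal words to end up next to each other.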
Example: "the very very quick fox greeted the brown fox"
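Pieces:
the very very
quick fox greeted
the brown fox
Summing within each piece (before):
Piece 1: the, 1; very, 2
Piece 2: quick, 1; fox, 1; greeted, 1
Piece 3: the, 1; brown, 1; fox, 1
...
Outputting only 1s (after), piece 1: the, 1; very, 1; very, 1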
Split into pieces:
the very very
quick fox greeted
the brown fox
Phase 1, output a 1 for each word in each piece concurrently:
Piece 1: the, 1; very, 1; very, 1
Piece 2: quick, 1; fox, 1; greeted, 1
Piece 3: the, 1; brown, 1; fox, 1
Combined: the, 1; very, 1; very, 1; quick, 1; fox, 1; greeted, 1; the, 1; brown, 1; fox, 1
Sorted: brown, 1; fox, 1; fox, 1; greeted, 1; quick, 1; the, 1; the, 1; very, 1; very, 1
Phase 2, split the sorted output into pieces and sum each piece concurrently:
brown, 1; fox, 2; greeted, 1; quick, 1; the, 2; very, 2
Question: is there a way to parallelize this operation as well?
Next time: more MapReduce and CS110 wrap-up / systems principles