Principles of Computer Systems
Spring 2019
Stanford University
Computer Science Department
Lecturer: Chris Gregg
$ ./proxy
Listening for all incoming traffic on port 19419.
telnet
, instead:
myth51:$ telnet myth65.stanford.edu 19419
Trying 171.64.15.30...
Connected to myth65.stanford.edu.
Escape character is '^]'.
GET http://api.ipify.org/?format=json HTTP/1.1
Host: api.ipify.org
HTTP/1.0 200 OK
content-length: 23
You're writing a proxy!Connection closed by foreign host.
myth51:$
myth51:$ telnet myth65.stanford.edu 19419
Trying 171.64.15.30...
Connected to myth65.stanford.edu.
Escape character is '^]'.
GET http://api.ipify.org/?format=json HTTP/1.1
Host: api.ipify.org
HTTP/1.1 200 OK
connection: keep-alive
content-length: 21
content-type: application/json
date: Wed, 22 May 2019 16:56:33 GMT
server: Cowboy
vary: Origin
via: 1.1 vegur
{"ip":"172.27.64.82"}Connection closed by foreign host.
myth51:$
ThreadPool
to your program, but first, write a sequential version.
GET http://www.cornell.edu/research/ HTTP/1.1
GET /research/ HTTP/1.1
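The two request lines above show the rewrite the proxy performs before forwarding: the full URL in the first line becomes a bare path. A minimal Python sketch of that rewrite follows; the function name `to_origin_form` is hypothetical, and the assignment itself does this work in C++ inside the HTTPRequest class.

```python
from urllib.parse import urlsplit


def to_origin_form(request_line):
    """Rewrite 'GET http://host/path HTTP/1.1' as 'GET /path HTTP/1.1'.

    Returns the rewritten request line and the host the proxy should contact.
    (Hypothetical helper, for illustration only.)
    """
    method, target, protocol = request_line.split()
    parts = urlsplit(target)
    path = parts.path or "/"          # an empty path means the site root
    if parts.query:
        path += "?" + parts.query     # keep the query string intact
    return "%s %s %s" % (method, path, protocol), parts.netloc


line, host = to_origin_form("GET http://www.cornell.edu/research/ HTTP/1.1")
print(line)  # GET /research/ HTTP/1.1
print(host)  # www.cornell.edu
```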
HTTPRequest
class, although you will have to update the operator<<
function at a later stage.
x-forwarded-proto
and set its value to be http
. If x-forwarded-proto
is already included in the request header, then simply add it again.
x-forwarded-for
and set its value to be the IP address of the requesting client. If x-forwarded-for
is already present, then you should extend its value into a comma-separated chain of IP addresses the request has passed through before arriving at your proxy. (The IP address of the machine you’re directly hearing from would be appended to the end).
header.h/cc
files to utilize the functions, e.g., string xForwardedForStr = requestHeader.getValueAsString("x-forwarded-for");
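The header rules above can be sketched in Python as follows. This is only an illustration: it assumes headers are held in a dict keyed by lowercase name, so "adding again" overwrites the old x-forwarded-proto value, and the function name and IP addresses are made up for the example.

```python
def add_forwarding_headers(headers, client_ip):
    """Return a copy of headers with the two forwarding headers updated.

    headers: dict mapping lowercase header names to values (an assumption).
    client_ip: address of the machine the proxy is directly hearing from.
    """
    updated = dict(headers)
    # Always (re)set the protocol header to http.
    updated["x-forwarded-proto"] = "http"
    # Extend any existing chain; otherwise start a new one.
    previous = updated.get("x-forwarded-for", "")
    chain = [ip.strip() for ip in previous.split(",") if ip.strip()]
    chain.append(client_ip)
    updated["x-forwarded-for"] = ",".join(chain)
    return updated


first_hop = add_forwarding_headers({"host": "api.ipify.org"}, "10.0.0.1")
second_hop = add_forwarding_headers(first_hop, "10.0.0.2")
print(second_hop["x-forwarded-for"])  # 10.0.0.1,10.0.0.2
```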
request-handler.h/cc
, and some in request.h/cc
. HTTP://
sites you can find!
blocked-domains.txt
file that lists domains that should not be let through your proxy. When a blacklisted server is requested, you should return to the client a status code of 403 and a payload of "Forbidden Content":
blacklist.cc
file, e.g.,
if (!blacklist.serverIsAllowed(request.getServer())) { ...
"HTTP/1.0"
as the protocol.
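A rough Python sketch of the blacklist check and the 403 response is shown below. It assumes blocked-domains.txt holds one domain per line and that matching is an exact, case-insensitive comparison; the real file format and the HTTPBlacklist matching rules may differ, and all function names and the sample domain are hypothetical.

```python
def load_blocked_domains(lines):
    """Build a set of blocked domains from an iterable of text lines."""
    return {line.strip().lower() for line in lines if line.strip()}


def server_is_allowed(server, blocked):
    """True when the requested server is not on the blacklist."""
    return server.lower() not in blocked


def forbidden_response():
    """Build the 403 response, using HTTP/1.0 as the protocol."""
    payload = "Forbidden Content"
    return ("HTTP/1.0 403 Forbidden\r\n"
            "content-length: %d\r\n"
            "\r\n"
            "%s" % (len(payload), payload))


blocked = load_blocked_domains(["blocked.example.com\n"])
if not server_is_allowed("blocked.example.com", blocked):
    print(forbidden_response())
```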
HTTPRequestHandler
to check to see if you've already stored a copy of a request -- if you have, just return it instead of forwarding on! You can use the HTTPCache
class to do this check (and to add sites, as well).
cache.shouldCache(request, response)
), then you cache it for later.
myth51:$ ./proxy --clear-cache
Clearing the cache... wait for it.... done!
Listening for all incoming traffic on port 19419.
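The check-then-forward-then-store flow can be sketched in Python like this. The class below is an in-memory stand-in, not the HTTPCache class from the assignment, and its caching policy (GET requests with a 200 status) is an assumption made only for the example.

```python
import hashlib


class InMemoryCache:
    """Toy stand-in for a response cache, keyed by a hash of the request."""

    def __init__(self):
        self._entries = {}

    def _key(self, request):
        return hashlib.sha256(request.encode("utf-8")).hexdigest()

    def lookup(self, request):
        return self._entries.get(self._key(request))

    def should_cache(self, request, response):
        # Assumed policy: cache only successful GET responses.
        status = response.split(None, 2)[1:2]
        return request.startswith("GET ") and status == ["200"]

    def store(self, request, response):
        self._entries[self._key(request)] = response


def handle(request, cache, fetch):
    """Return a cached response if present; otherwise fetch and maybe cache."""
    cached = cache.lookup(request)
    if cached is not None:
        return cached                 # no need to forward the request
    response = fetch(request)         # fetch is whatever contacts the origin
    if cache.should_cache(request, response):
        cache.store(request, response)
    return response
```

Passing `fetch` in as a parameter keeps the sketch runnable without any networking; in the proxy, that step is the code that forwards the request to the origin server.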
ThreadPool
class (we give you a working version in case yours still has bugs)
HTTPProxyScheduler
class.
HTTPRequestHandler
, which already has a single HTTPBlacklist
and a single HTTPCache
. You will need to go back and add synchronization directives (e.g., mutexes) to your prior code to ensure that you don't have race conditions.
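To illustrate the kind of synchronization involved, here is a small Python sketch in which many worker threads share one cache guarded by a lock. It uses Python's `ThreadPoolExecutor` and `threading.Lock` in place of the course's ThreadPool and C++ mutex, and the `SharedCache` class is hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SharedCache:
    """A dict shared by all worker threads, guarded by a single lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}

    def get_or_fetch(self, key, fetch):
        with self._lock:                  # protect the read
            if key in self._entries:
                return self._entries[key]
        value = fetch(key)                # slow work happens outside the lock
        with self._lock:                  # protect the write
            # Two threads may have fetched the same key; keep the first value.
            return self._entries.setdefault(key, value)


def serve_all(requests, cache, fetch, workers=8):
    """Handle every request on a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda r: cache.get_or_fetch(r, fetch), requests))


results = serve_all(["a", "b", "a", "b"], SharedCache(), lambda r: r.upper())
print(results)  # ['A', 'B', 'A', 'B']
```

Holding the lock only around the dictionary accesses means a slow fetch never blocks other threads, at the cost of occasionally fetching the same key twice.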
mutexes.
size_t requestHash = hashRequest(request);)
client-socket.h/cc
files have been updated to include thread-safe versions of their functions, so no need to worry about that.
myth63:$ samples/proxy_soln --port 12345
Listening for all incoming traffic on port 12345.
myth65:$ samples/proxy_soln --proxy-server myth63.stanford.edu --proxy-port 12345
Listening for all incoming traffic on port 19419.
Requests will be directed toward another proxy at myth63.stanford.edu:12345.
"x-forwarded-for"
header! You analyze that list to see if you are about to create a cycle in the chain.
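One plausible way to analyze that list, sketched in Python: before appending the address of the machine you are hearing from, check whether it already appears in the x-forwarded-for value. The function name and addresses are hypothetical, and the assignment may define the check differently.

```python
def already_in_chain(x_forwarded_for, incoming_ip):
    """True if incoming_ip already appears in the comma-separated chain."""
    seen = [ip.strip() for ip in x_forwarded_for.split(",") if ip.strip()]
    return incoming_ip in seen


print(already_in_chain("10.0.0.1,10.0.0.2", "10.0.0.2"))  # True
print(already_in_chain("10.0.0.1,10.0.0.2", "10.0.0.3"))  # False
```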
"x-forwarded-proto"
and "x-forwarded-for"
headers.
run-proxy-farm.py
program that can manage a chain of proxies (but it doesn't check for cycles -- you would need to modify the Python code to do that).
https://
sites, you will have to implement the CONNECT HTTP method, which is not required for the assignment, but also not that much more work to add. We can give you some information if you want to add that support.
file | changes |
---|---|
cache.cc | (very minor) |
cache.h | (very minor) |
proxy.cc | (very minor) |
request.cc | (minor) |
request.h | (minor) |
request-handler.cc | (major) |
request-handler.h | (major) |
scheduler.cc | (minor) |
scheduler.h | (very minor) |
<word> 1
for every alphabetic token in that file.
import sys
import re

pattern = re.compile("^[a-z]+$") # matches purely alphabetic words
for line in sys.stdin:
    line = line.strip()
    tokens = line.split()
    for token in tokens:
        lowercaseword = token.lower()
        if pattern.match(lowercaseword):
            # parenthesized form runs under both Python 2 and Python 3
            print('%s 1' % lowercaseword)
myth61:$ cat anna-karenina.txt | ./word-count-mapper.py
happy 1
families 1
are 1
... // some 340000 words omitted for brevity
to 1
put 1
into 1
group-by-key
contributes to all MapReduce pipelines, not just this one. Our group-by-key.py
executable, presented on the next slide, assumes the mapper's output has been sorted so multiple instances of the same key are more easily grouped together, as with:
myth61:$ cat anna-karenina.txt | ./word-count-mapper.py | sort
a 1
a 1
a 1
a 1
a 1 // plus 6064 additional copies of this same line
...
zigzag 1
zoological 1
zoological 1
zoology 1
zu 1
myth61:$ cat anna-karenina.txt | ./word-count-mapper.py | sort | ./group-by-key.py
a 1 1 1 1 1 // plus 6064 more 1's on this same line
...
zeal 1 1 1
zealously 1
zest 1
zhivahov 1
zigzag 1
zoological 1 1
zoology 1
zu 1
from itertools import groupby
from operator import itemgetter
import sys

def read_mapper_output(file):
    # Each input line looks like "<word> 1"; split it into [key, value].
    for line in file:
        yield line.strip().split(' ')

data = read_mapper_output(sys.stdin)
# groupby only merges adjacent equal keys, so the input must already be sorted.
for key, keygroup in groupby(data, itemgetter(0)):
    values = ' '.join(sorted(v for k, v in keygroup))
    print("%s %s" % (key, values))
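The reason the mapper's output has to be sorted first is that `itertools.groupby` only merges runs of adjacent equal keys. A short Python 3 illustration, where the `group` helper is written just for this example:

```python
from itertools import groupby
from operator import itemgetter


def group(pairs):
    """Collect the values of each run of adjacent pairs sharing a key."""
    return [(key, [v for _, v in grp])
            for key, grp in groupby(pairs, itemgetter(0))]


unsorted_pairs = [("a", "1"), ("zu", "1"), ("a", "1")]
print(group(unsorted_pairs))          # [('a', ['1']), ('zu', ['1']), ('a', ['1'])]
print(group(sorted(unsorted_pairs)))  # [('a', ['1', '1']), ('zu', ['1'])]
```

Without the sort, the key a shows up as two separate groups, and a reducer would report two partial counts for it.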
./group-by-key.py
script.
import sys

def read_mapper_output(file):
    # Each input line looks like "<word> 1 1 1 ..."; split it into a list.
    for line in file:
        yield line.strip().split(' ')

for vec in read_mapper_output(sys.stdin):
    word = vec[0]
    count = sum(int(number) for number in vec[1:])
    print("%s %d" % (word, count))
myth61:$ cat anna-karenina.txt | ./word-count-mapper.py | sort \
| ./group-by-key.py | ./word-count-reducer.py
a 6069
abandon 6
abandoned 9
abandonment 1
...
zoological 2
zoology 1
zu 1
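For experimenting without the shell pipeline, the same map, sort, group-by-key, and reduce stages can be sketched in one Python 3 process. This is an illustrative condensation, not one of the course's scripts; the function names are made up, and the sort is folded into the grouping step.

```python
import re
from itertools import groupby
from operator import itemgetter

WORD = re.compile("^[a-z]+$")  # same pattern as the mapper: purely alphabetic


def mapper(lines):
    """Emit (word, 1) for every purely alphabetic token."""
    for line in lines:
        for token in line.strip().split():
            word = token.lower()
            if WORD.match(word):
                yield (word, 1)


def group_by_key(pairs):
    """Sort the pairs, then collect each key's values into a list."""
    for key, grp in groupby(sorted(pairs), itemgetter(0)):
        yield key, [value for _, value in grp]


def reducer(groups):
    """Sum the list of ones attached to each word."""
    for word, counts in groups:
        yield word, sum(counts)


def word_count(lines):
    return list(reducer(group_by_key(mapper(lines))))


sample = ["Happy families are all alike; every unhappy family",
          "is unhappy in its own way."]
for word, count in word_count(sample):
    print("%s %d" % (word, count))
```

Tokens such as "alike;" and "way." carry punctuation, so the pattern rejects them, exactly as the mapper script would.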