Hello HTTP!

Dr Gleb Bahmutov PhD

C / C++ / C# / Java / CoffeeScript / JavaScript / Node / Angular / Vue / Cycle.js / functional

EveryScape

virtual tours

MathWorks

MatLab on the web

Kensho

finance dashboards

VP of Eng at

Testing, the way it should be

Cypress in action https://www.cypress.io/

2017

Software Engineer

2000 B.C. - Farmer

2000 B.C. - Want to be an engineer!

The word engineer (Latin ingeniator) is derived from the Latin words ingeniare ("to contrive, devise") and ingenium ("cleverness")

80 A.D. - Good to be an engineer!

1800 - Engineers rule

1960 - Science rules

Science: 📚 or ☠️

1989

Edit and view HTML documents, 1991

source: http://digital-archaeology.org/the-nexus-browser/

first website: info.cern.ch

1991

HTTP/0.9

1991

Sir Tim Berners-Lee

Professor of Engineering in the School of Engineering with a joint appointment in the Department of Electrical Engineering and Computer Science at MIT

HTTP/0.9

1991

650 words

3800 characters

HTTP/0.9

1991

HTTP/0.9

1991

HTTP/0.9

If the port number is not specified, 80 is always assumed for HTTP

because port 79 and 81 were taken (RFC 1060)

1991

HTTP/0.9

The TCP-IP connection is broken by the server when the whole document has been transfered

1991

GET /fuzzy_bunnies.txt

HTTP/0.9

1991

The response to a simple GET request is a message in hypertext mark-up language ( HTML ). This is a byte stream of ASCII characters.

HTTP/0.9

1991

No error response codes

HTTP/0.9

1991

Only GET

No cookies / sessions / server state

HTTP/0.9

Y - you

A - aren't

G - gonna

N - need

I - it

Sept 14, 2017

What Problem Does HTTP/0.9 Solve?

<html>
  <p>More info
    <a href="http://foo.com/bar.html">here</a>
  </p>
</html>

I want to read this HTML document

GET /bar.html

<html>
  ...
</html>

HTTP relies on TCP to ask for and receive document

TCP

IP

TCP

IP

how?

where?

HTTP

HTTP performance is tied to TCP properties

1. Guaranteed data delivery

2. Guaranteed packet order

1991

Future HTTP protocols will be back-compatible with this protocol.

HTTP/0.9

1991 was

26 years ago ...

World Wide Web went from zero to "everywhere" in about two years

Then:

Date # of web sites
'93 130
'94 2700
'95 23,500
'96 100,000
<html>
  <p>See diagram
    <img src="http://foo.com/bar.jpg" />
  </p>
</html>

I want to see this JPEG image

Non-academic users:

HTTP/1.0

We Are Gonna Need It (WAGNI)

HTTP/1.0

1996

RFC 1945

60 pages

GET /mypage.html HTTP/1.0
User-Agent: NCSA_Mosaic/2.0 (Windows 3.1)

200 OK
Date: Tue, 15 Nov 1994 08:12:31 GMT
Server: CERN/3.0 libwww/2.17
Content-Type: text/html
<HTML> 
A page with an image
  <IMG src="/myimage.gif">
</HTML>

HTTP/1.0

  • Non-HTML data (images!)

  • Methods (HEAD & POST)

  • Status Codes

  • User preferences (User-Agent)

  • Format negotiation (Content-Type)

1996

RFC 1945

HTTP/1.1

Formalize best practices

HTTP/1.1

1999

RFC 2616

27 drafts, latest 2014

The Hypertext Transfer Protocol (HTTP) is a stateless application-level protocol for distributed, collaborative, hypertext information systems.

HTTP/1.1

  • Performance (Caching, connections)

  • Security (HTTPS, 3rd party cookies)

  • Usability (IP scarcity, error codes)

1999

RFC 2616

HTTP/RFC

IETF

The Internet Engineering Task Force (IETF)

make the Internet work better from an engineering point of view

The IETF's official products are documents called RFCs (Requests for Comments)

HTTP/1.1 is uniform and everywhere

Reality: multiple versions, implementations, extensions, working drafts

"The Tangled Web: A Guide to Securing Modern Web Applications"

by Michal Zalewski

ISBN-13: 978-1593273880

👍👍👍👍👍👍👍👍👍👍👍👍👍👍👍👍

What if we send HTTP request to SMTP?

GET /<html><body><h1>Hi! HTTP/1.1 
Host: example.com:25
220 example.com ESMTP
500 5.5.1 Invalid command: "GET /<html><body><h1>Hi! HTTP/1.1" 
500 5.1.1 Invalid command: "Host: example.com:25"
...
421 4.4.1 Timeout

If the browser blindly trusts returned text to be HTML this is reflected XSS attack

Are we talking about the same thing?

Similar attack due to insufficient protocol negotiation

Invalid Curve Attack

GET /bar.html

<html>
  title: bar
</html>

*** **

bar.html

** ***

index.html

title: bar

*** **

bar.html

Give me the page ...

HTTP/1.1 is still /0.9

request

response

* I am really ignoring all server-to-server communication happening over HTTP

*

HTTP Request and Response

State (cookies)

Cache control

Security, security, security

tracking pixels / cookies

58 requests to load

this page!

Related: Website Obesity Problem

Average page size: 2MB !

images, fonts, styles, scripts, ads

AJAX:

From websites

to web apps

i.e. "avoiding page reloads"

GET /index.html

*** **

** ***

<iframe src="bar.html">

index.html

GET /bar.html

1996: iframe in Internet Explorer

XMLHttpRequest

Gets the data from the server by a client side script

1999 Internet Explorer plugin

2004 Google Gmail, Kayak.com

2006 first spec draft

2014 latest spec draft

<button id="ajaxButton" type="button">Request</button>
<script>
  var httpRequest;
  document.getElementById("ajaxButton")
    .addEventListener('click', makeRequest);
  function makeRequest() {
    httpRequest = new XMLHttpRequest();
    httpRequest.onreadystatechange = alertContents;
    httpRequest.open('GET', 'test.html');
    httpRequest.send();
  }
  function alertContents() {
    if (httpRequest.readyState === XMLHttpRequest.DONE) {
      if (httpRequest.status === 200) {
        alert(httpRequest.responseText);
      } else {
        alert('There was a problem with the request.');
      }
    }
  }
</script>

declarative HTML

imperative JavaScript

Fetch: new standard

<button id="ajaxButton" type="button">Request</button>
<script>
  document.getElementById("ajaxButton")
    .addEventListener('click', makeRequest);
  function makeRequest() {
    fetch('<some url>')
      .then(r => r.text())
      .then(alert, console.error)
  }
</script>

HTTP/1.1: Today

IP

TCP

HTTP

REST

GraphQL

2000

2012

Client - Server communications

<div> ... </div>

<script>

....

</script>

index.html

<div> ... </div>

<script>

....

</script>

index.html

HTTP/1 => HTTP/2  or QUIC

WebSockets

WebRTC

ServiceWorker

browser

browser

HTTP is dead

Long live HTTPS

The browser APIs worth discussing all require secure connections

http://localhost or https://...

Usually a single command!

... 250k new certificates per day ...

LetsEncrypt Wildcard Certificates Coming January 2018

Browser

Server

HTTPS

connections are expensive because

TCP + TLS handshakes!

data

ServiceWorker

Blurring the line between client and server

If a tree falls while you are in the forest ...

<html manifest="example.appcache">
  ...
</html>

Application Cache

CACHE MANIFEST
# v1 2011-08-14
index.html
style.css
image1.png
# Use from network if available
NETWORK:
network.html
# Fallback content
FALLBACK:
/ fallback.html

declarative list

Application Cache

Turns out declaring caching strategy is hard.

ServiceWorker

Server

browser

Web Workers

ServiceWorker

Transforms

the response

Transforms

the request

Smart caching

OFFLINE SUPPORT

Image / video transcoding

Background data sync

Load ServiceWorker

navigator.serviceWorker.register(
    'app/my-service-worker.js')

Chrome, Opera, Firefox

Must be https

Inside ServiceWorker

self.addEventListener('install', ...)
self.addEventListener('activate', ...)
self.addEventListener('message', ...)
self.addEventListener('push', ...)
self.addEventListener('fetch', function (event) {
  console.log(event.request.url)
  event.respondWith(...)
})
// Cache API
self.addEventListener('install', e => {
  const urls = ['/', 'app.css', 'app.js']
  e.waitUntil(
    caches.open('my-app')
      .then(cache => 
        cache.addAll(urls))
  )
})

Cache resources on SW install

self.addEventListener('fetch', e => {
  e.respondWith(
    caches.open('my-app')
      .then(cache =>
        cache => match(e.request))
  )
})

Return cached resource

Offline support using ServiceWorker

Fetch event

browser

ServiceWorker

Request

Response

Server

Server

browser

ServiceWorker

Request

Response

express.js

http.ClientRequest

JavaScript

http.ServerResponse

JavaScript

Fetch event

browser

ServiceWorker

Server

express.js

Server-Side Rendering

Server

browser

ServiceWorker

express.js

http.ClientRequest(Request)

http.ServerResponse(Response)

Offline SSR-like Web Apps 

Server inside ServiceWorker

Express-service

What if an attacker can load malicious ServiceWorker script?

// returned JavaScript file
// from same domain
myFunc()
navigator.serviceWorker.register(
  'jsonp?callback=myFunc'
)

Example: unfiltered JSONP endpoint

dynamic text to code

navigator.serviceWorker.register will try to load this code as SW

// returned SW script
my malicious SW JS

Example: XSS + unfiltered JSONP endpoint

navigator.serviceWorker.register(
  'jsonp?callback="my malicious SW JS"'
)

navigator.serviceWorker.register will try to load this code as SW :(

Malicious ServiceWorker injected via XSS can be really hard to get rid of

Please protect yourself from XSS

Web Today

Ajax

ServiceWorker

HTTP/2

QUIC

HTTP/2

Making web faster

*

HTTP/1.1

  • How to quickly load 200 resources?
  • How to load some resources first?
  • How to avoid duplicate data overhead?

Source: Ilya Grigorik https://bit.ly/http2-opt

2009 - Google starts SPDY

Need for Speed 🚤

2015 - RFC 7540

binary, multiplexed protocol - 55% speed up on top sites!

H2: Binary Framing Layer

H2: Multiplexing

H2: Stream Prioritization

3:1

then

then

then

3:1

$ curl -I https://github.com
HTTP/1.1 200 OK
Server: GitHub.com
Date: Sat, 03 Jun 2017 03:08:50 GMT
Content-Type: text/html; charset=utf-8
Status: 200 OK
Cache-Control: no-cache
Vary: X-PJAX
X-UA-Compatible: IE=Edge,chrome=1
Set-Cookie: logged_in=no; domain=.github.com; path=/; 
expires=Wed, 03 Jun 2037 03:08:50 -0000; secure; HttpOnly
Set-Cookie: _gh_sess=eyJzZXNzaW9uX2lkIjoiYzFlYTc0YTQ0OTNhZTdlNzI4MTgwNzI5N2QyNTlkNWMiLCJfY
3NyZl90b2tlbiI6IngzaHhRREMxNEpLUFNid1ZmMkc4d0N1OG1xdjZ3MkdmWmh4YkRFazNYQkU9In0%3D--b7171bd148da0b79b248fb561b8bfd4aadf16ff5; path=/; secure; HttpOnly
X-Request-Id: 75c9420698c53b1a707b10b2a7a510cc
X-Runtime: 0.056786
Content-Security-Policy: default-src 'none'; base-uri 'self'; block-all-mixed-content; 
child-src render.githubusercontent.com; connect-src 'self' uploads.github.com 
status.github.com collector.githubapp.com api.github.com www.google-analytics.com 
github-cloud.s3.amazonaws.com github-production-repository-file-5c1aeb.s3.amazonaws.com 
github-production-user-asset-6210df.s3.amazonaws.com wss://live.github.com; font-src 
assets-cdn.github.com; form-action 'self' github.com gist.github.com; frame-ancestors 
'none'; img-src 'self' data: assets-cdn.github.com identicons.github.com 
collector.githubapp.com github-cloud.s3.amazonaws.com *.githubusercontent.com; 
media-src 'none'; script-src assets-cdn.github.com; style-src 'unsafe-inline' 
assets-cdn.github.com
Strict-Transport-Security: max-age=31536000; includeSubdomains; preload
Public-Key-Pins: max-age=5184000; pin-sha256="WoiWRyIOVNa9ihaBciRSC7XHjliYS9VwUGOIud4PB18="; pin-sha256="RRM1dGqnDFsCJXBTHky16vi1obOlCgFFn/yOhI/y+ho="; pin-sha256="k2v657xBsOVe1PQRwOsHsw3bsGT2VzIqz5K+59sNQws="; pin-sha256="K87oWBWM9UZfyddvDfoxL+8lpNyoUB2ptGtn0fv6G2Q="; pin-sha256="IQBnNBEiFuhj+8x6X8XLgh01V9Ic5/V3IRQLNFFc7v4="; pin-sha256="iie1VXtL7HzAMF+/PVPR9xzT80kQxdZeJ+zduCB3uj0="; pin-sha256="LvRiGEjRqfzurezaWuj8Wie2gyHMrW5Q06LspMnox7A="; includeSubDomains
X-Content-Type-Options: nosniff
X-Frame-Options: deny
X-XSS-Protection: 1; mode=block
X-Runtime-rack: 0.060928
Vary: Accept-Encoding
X-Served-By: e878d09eac725c89f5f15204c1326660
X-GitHub-Request-Id: F1E1:2F2B:6F2906B:A48C07A:59322842

2100 characters

~55% of HTTP/0.9 spec!

H2: Header Compression

pseudo headers

H2: Header Compression

Common headers static table on client and server

14% of all websites (Jun '17)

HTTP/2 

was 8% Jun '16

SPDY is at 8%

“One great metric around that which I enjoy is the fraction of connections created that carry just a single HTTP transaction (and thus make that transaction bear all the overhead). For HTTP/1 74% of our active connections carry just a single transaction – persistent connections just aren’t as helpful as we all want. But in HTTP/2 that number plummets to 25%. That’s a huge win for overhead reduction.” — Patrick McManus, Mozilla.

H2 is succeeding in minimizing delays

HTTP/2 Information

HTTP/2 Server Push

I know what you want

<html>
<img src="photo.jpg"/>
</html>
...
<img src="photo.jpg"/>
...

GET index.html

Server

GET photo.jpg

Client

?

photo.jpg

index.html

time

Server Push

<html>
<img src="photo.jpg"/>
</html>

Server

GET index.html

Client

photo.jpg

index.html

...
<img src="photo.jpg"/>
...

GET photo.jpg

Already have it!

time

Server Push

<html>
<link rel="stylesheet" 
  href="app.css">
<script src="app.js">
</script>
</html>

Server

GET index.html

Client

app.css

index.html

app.js

Great way to replace HTTP/1.1 practice of inlining small styles, scripts, images in the page

time

const server = require('spdy')
const express = require('express')
const app = express()
app.use(express.static('public'))

const tlsOptions = {
  key: fs.readFileSync('./server.key'),
  cert: fs.readFileSync('./server.crt')
}
const port = 5000
server.createServer(tlsOptions, app)
  .listen(port, err => { ... })

Server Push in NodeJS

const image = fs.readFileSync('./image.jpg')
const options = {
  request: {accept: 'image/*'},
  response: {'content-type': 'image/jpeg'}
}
function serveHome (req, res) {
  if (res.push) {
    const stream = res.push('/image.jpg', image, options)
    stream.end(image)
  }
}
app.get('/', serveHome)

Server Push in NodeJS

Push resource if client/server/proxy supports it

No NGINX support as of March '17, use https://github.com/indutny/bud

*

Server Push always pushed the resources even if they might be already in the browser's cache.

Server Push Behavior

"Being Pushy" by Yoav Weiss

"HTTP/2 push is tougher than I thought" by Jake Archibald

"cache-digest"

<html>
<link rel="stylesheet" 
  href="app.css">
<script src="app.js">
</script>
</html>

Server

GET index.html

Client

index.html

I have "app.css" <hash>,

"app.js" <hash>

Not going to push app.css and app.js

time

My opinion:

Wait for Server Push to be refined

HTTP/2 is changing the best performance practices

H2 changes some performance best practices

Source: Ilya Grigorik https://bit.ly/http2-opt

DO LESS WORK

H2: 4 main features 

Multiplexing

Multiple streams over same TCP connection

Compression

Headers and binary frames

H2: 4 main features 

Flow control

Dependency mechanism among resources

Resource push

Server preemptively pushes resources to the client

H2: Security Flaws

  1. Slow Read

  2. Stream Abuse

  3. HPACK Bomb

  4. Dependency DoS

multiplexing

compression

flow control

IIS10 - asking for same stream twice craches the server

Slow Read: 1 TCP connection, asking for lots of resources with small window forces the server thread pool to grow (all servers)

dependency cycle DoS (Apache, nghttpd)

Set max user agent, then send requests - the memory blows up

Web Today

Ajax

ServiceWorker

HTTP/2

QUIC

HTTP/2 Achilles' heel

H2: If a packet is lost

TCP: all have to wait!

Hooman Beheshti - HTTP/2: What no one is telling you

PLR - Packet Loss Rate

IP

TCP

HTTP/1.1

WebSockets

HTTP/2

TLS

Order of packets guarantee

TCP

HTTP/1.1

WebSockets

HTTP/2

TLS

UDP

IP

QUIC

QUIC

like SPDY but over UDP

QUIC works hard to minimize latency, which other protocols (like SCTP over DTLS over TCP/UDP) where not designed to do

'99

2009

2015

HTTP/1.1

SPDY

HTTP/2

QUIC

Secret: many Google properties communicate with Chrome using QUIC

HTTP/0.9

'91

'92

HTTP/1 - SPDY - HTTP/2 - QUIC timeline

2013

(deprecated)

Enable QUIC: chrome://flags/#enable-quic

chrome://net-internals/#quic

Visit Google property

How Secure is QUIC?

"How Secure and Quick is QUIC? Provable Security and Performance Analyses" 2015 IEEE Symposium on Security and Privacy,
S&P 2015
https://eprint.iacr.org/2015/582.pdf

We find that attacking QUIC is not easier than TCP and TLS.

The attacks are mostly DoS that prevent 0-RTT connections

Honorable mention: WebRTC data channel

Server

Content is coming from the server

As more users connect ...

Server

We need a bigger server 💸

Server

We need a bigger server 💸

🔥🔥🔥

HBO's Silicon Valley "Two Days of the Condor", season 2 finale

P2P using data channel in WebRTC

Server

offline != server is down

Checkout:

A peer-to-peer hypermedia protocol
to make the web faster, safer, and more open.

Real-Time Communications capabilities via simple APIs

Dat is the distributed data sharing tool

a streaming torrent client for the web browser and the desktop

HTTP: Conclusions

Every version of HTTP

and related protocols was a solution to a real-world problem

HTTP: Conclusions

Most of the time the problem was not apparent until real world adoption

HTTP/0.9 - loading remote HTML documents

HTTP/1.x - popularity and growth of World Wide Web

Ajax - loading data without full page reload

WebSockets - realtime communication with the server

ServiceWorker - scriptable caching

HTTP/2 - performance for loading modern websites

QUIC - an alternative take on performance for loading modern websites

Tomorrow a new protocol will appear that will try to solve HTTP/2's and QUIC's problems

Only failed products stay unchanged

Tomorrow a new protocol will appear that will try to solve HTTP/2's and QUIC's problems

Gleb Bahmutov @bahmutov

Hello, HTTP!

Hello HTTP!

By Gleb Bahmutov

Hello HTTP!

Let's walk through the history of HTTP, its original design goals and how they shape the modern Internet. We will look at the current standard, and what is coming tomorrow. We will also learn and refresh a whole bunch of acronyms: TCP, HTTP, RFC, IETF, UPD, QUIC. Every developer working or even using WWW will benefit from this presentation.

  • 5,929