Bootstrapping the Web
with Scala Native
Richard Whaling
Spantree Technology Group
This talk is about:
- Scala Native
- Systems programming
- Server programming
but also:
- Working with emerging technology
- Improvised solutions
- OS as platform
- Language as platform
(or how to get things done without the JVM)
Talk Outline
- Introduction to Scala Native
- Introduction to Server Programming
- Minimal Viable Server
- Multiplexed Protocols
- Multiplexed I/O
- Reflections
About Me
Twitter: @RichardWhaling
https://spantree.net/blog/
Scala Native is:
- Scala!
- A scalac/sbt plugin
- An LLVM-based AOT compiler
- Great for command-line tools
- No JVM
- Includes implementations of some JDK classes
- Types and Annotations for C interop
The Basics
object Hello {
def main(args: Array[String]):Unit = {
println("Hello, CASE!")
}
}
This just works!
Structs and Pointers
type Vec = CStruct3[Double, Double, Double]
val vec:Ptr[Vec] = stackalloc[Vec]
!vec._1 = 10.0 // initialize fields
!vec._2 = 20.0
!vec._3 = 30.0
length(vec) // pass by reference
Interop
@extern object stdlib {
def malloc(size: CSize): Ptr[Byte] = extern
def free(ptr: Ptr[Byte]): CInt = extern
}
val ptr = stdlib.malloc(32)
stdlib.free(ptr)
Included Libraries
- Implementations of hundreds of JDK classes
- Partial ANSI C Bindings
- Partial POSIX C Bindings
What can we do?
What can't we do?
What does a server do?
A typical server:
- Listens on a known port
- Accepts incoming connections
- Reads requests from clients
- Writes back responses
The catch: it has to do all of these at the same time...
with (traditionally) blocking system calls.
TCP Socket System Calls
socket() -- initializes a new socket and selects protocol
bind() -- assigns an address and port to a socket
listen() -- begins accepting incoming connections on a bound socket
accept() -- takes an incoming connection off the OS backlog
connect() -- initiates an outgoing connection on an unbound socket
read()/recv()/recvmsg() -- reads bytes from a connected socket
write()/send()/sendmsg() -- writes bytes to a connected socket
close() -- closes a connected socket
ioctl()/setsockopt()/fcntl() -- evil
Berkeley Socket Dance
Server: socket() -> bind() -> listen() -> accept() -> read() -> write() -> close()
Client: socket() -> connect() -> write() -> read() -> close()
Berkeley Socket Dance
def serve(port:UShort): Unit = {
// Allocate and initialize address struct
val addr_size = sizeof[sockaddr_in]
val server_address = malloc(addr_size).cast[Ptr[sockaddr_in]]
!server_address._1 = AF_INET.toUShort // IP Socket
!server_address._2 = htons(port) // port
!server_address._3._1 = INADDR_ANY // bind to 0.0.0.0
// Bind and listen on socket
val sock_fd = socket(AF_INET, SOCK_STREAM, 0) // SOCK_STREAM indicates TCP and not UDP
val bind_result = bind(sock_fd, server_address.cast[Ptr[sockaddr]], addr_size.toUInt)
println(s"bind returned $bind_result")
val listen_result = listen(sock_fd, 128)
println(s"listen returned $listen_result")
// Allocate and initialize client address struct
val client_address = malloc(addr_size).cast[Ptr[sockaddr_in]]
val client_addr_size = stackalloc[UInt]
!client_addr_size = addr_size.toUInt
// Main accept() loop
while (true) {
val conn_fd = accept(sock_fd, client_address.cast[Ptr[sockaddr]], client_addr_size)
println(s"accept returned fd $conn_fd")
handleConnection(conn_fd)
}
close(sock_fd)
}
Berkeley Socket Dance
def handleConnection(conn_fd:Int, max_size:Int = 1024): Unit = {
val line_buffer = malloc(max_size)
while (true) {
// Leave one byte free for the string-end marker appended below
val read_result = read(conn_fd, line_buffer, max_size - 1)
println(s"read $read_result bytes")
if (read_result <= 0) { // 0 is EOF, negative is an error
free(line_buffer)
close(conn_fd)
return
}
line_buffer(read_result) = 0 // Append a string-end marker
val write_result = write(conn_fd, line_buffer, read_result)
println(s"wrote $write_result bytes")
}
}
What happens when a new connection comes in?
Introducing fork()
- fork() clones a process in-place
- one process calls, two return
- parent-child relationship
- parent is responsible for supervising the child
- if a child exits, it stays as a "zombie" until "reaped"
Introducing fork()
Server: socket() -> bind() -> listen() -> accept() -> fork(); the child then runs read() -> write() -> close()
Client: socket() -> connect() -> write() -> read() -> close()
Introducing fork()
def handleConnection(conn_fd:Int, max_size:Int = 1024): Unit = {
val pid = fork()
if (pid != 0) {
// In parent process (a negative pid would mean fork() itself failed)
println(s"forked pid $pid to handle connection")
close(conn_fd)
return
} else {
// In child process
println(s"fork returned $pid, in child process")
val line_buffer = malloc(max_size)
while (true) {
// Leave one byte free for the string-end marker appended below
val read_result = read(conn_fd, line_buffer, max_size - 1)
println(s"read $read_result bytes")
if (read_result <= 0) { // 0 is EOF, negative is an error
// Cleanup
close(conn_fd)
sys.exit()
}
line_buffer(read_result) = 0
val write_result = write(conn_fd, line_buffer, read_result)
println(s"wrote $write_result bytes")
}
}
}
Downsides
- Testing
- Robustness
- Portability
- General sanity
How can we avoid writing our own socket code?
The Unix Philosophy
- Write programs that do one thing and do it well.
- Write programs to work together.
- Write programs to handle text streams, because that is a universal interface.
Peter H. Salus,
from A Quarter Century of UNIX
The Unix Philosophy
Conjecture: HTTP is a solved problem.
What is the simplest way to use a stable HTTP server for a SN app?
How does it perform?
Introducing exec()
- Actually a family of 6 very similar functions
- Executes a brand-new program; returns only if the exec itself fails
- Can set arguments and environment variables
- New program inherits open file descriptors
Introducing exec()
Server: socket() -> bind() -> listen() -> accept() -> fork() -> exec() -> ?
Client: socket() -> connect() -> write() -> read() -> close()
Introducing exec()
def handleConnectionExec(conn_fd:Int, path:CString, args:Ptr[CString]): Unit = {
val pid = fork()
if (pid != 0) {
println(s"forked pid $pid to handle connection")
close(conn_fd)
return
} else {
println(s"fork returned $pid, in child process")
// execv() only returns if it fails; the new program inherits conn_fd
execv(path, args)
}
}
Almost there!
- fork()/exec() is enough for stream-oriented services.
- HTTP adds a request/response protocol
- HTTP introduces "resources" and other metadata
- RFC 2616 (HTTP/1.1) is 176 pages long
What we need:
- Generic handling of concurrent HTTP connections
- Flexible routing of requests to various programs
- Simple request/response protocol for handlers
Apache httpd
- Traditional prefork based web server*
- Directly descended from NCSA httpd
- May or may not pun on "a patchy" web server
CGI
- Isolated processes per request
- All communication over standard file IO
- Headers and params in environment
- Can be implemented in bash, perl, awk, C...
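As a rough illustration of the CGI contract above, the sketch below builds a complete response from request metadata in plain Scala. REQUEST_METHOD, PATH_INFO and QUERY_STRING are standard CGI variables; the names CgiSketch and respond are made up for this example and are not part of the talk's code.

```scala
object CgiSketch {
  // Build a complete CGI response (headers, blank line, body) from the
  // request metadata that the web server places in the environment.
  // The environment is passed in as a Map so the logic is easy to test.
  def respond(env: Map[String, String]): String = {
    val method = env.getOrElse("REQUEST_METHOD", "GET")
    val path   = env.getOrElse("PATH_INFO", "/")
    val query  = env.getOrElse("QUERY_STRING", "")
    val body   = s"<p>$method $path?$query</p>"
    "Content-type: text/html\r\n\r\n" + body
  }
}
```

A real handler would call print(CgiSketch.respond(sys.env)) from main, since the web server reads the response from the process's standard output.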
A Minimal CGI Handler
object Main {
def main(args: Array[String]): Unit = {
println("Content-type: text/html\r\n\r\n")
println("Hello, Strangeloop!")
}
}
Building the app
# notice the FROM - AS structure
FROM scala-native-base-build AS build
# Set up the directory structure for our project
RUN mkdir -p /root/project-build/project
WORKDIR /root/project-build
# Resolve all our dependencies and plugins to speed up future compilations
ADD ./project/plugins.sbt project/
ADD ./project/build.properties project/
ADD build.sbt .
RUN sbt update
# Add and compile our actual application source code
ADD . /root/project-build/
RUN sbt clean nativeLink
# Copy the binary executable to a consistent location
RUN cp ./target/scala-2.11/*-out ./dinosaur-build-out
Packaging the app
# Start over from a clean Alpine image, in the same Dockerfile
FROM alpine:3.3
# Copy in C libraries from previous build
COPY --from=build \
/usr/lib/libunwind.so.8 \
/usr/lib/libunwind-x86_64.so.8 \
/usr/lib/libgc.so.1 \
/usr/lib/libstdc++.so.6 \
/usr/lib/libgcc_s.so.1 \
/usr/lib/
COPY --from=build \
/usr/local/lib/libre2.so.0 \
/usr/local/lib/libre2.so.0
# Copy in the executable
COPY --from=build \
/root/project-build/dinosaur-build-out /var/www/localhost/cgi-bin/app
COPY httpd.conf /etc/apache2/httpd.conf
COPY mpm.conf /etc/apache2/mpm.conf
RUN apk --update add apache2 apache2-utils
RUN mkdir -p /run/apache2
ADD apache.entrypoint.sh /root/
ENTRYPOINT "/root/apache.entrypoint.sh"
Does it work?
A CGI Micro-framework
object main {
def main(args: Array[String]): Unit = {
Router.init()
.get("/")("<H1>Welcome to Dinosaur!</H1>")
.get("/hello") { request =>
"Hello World!"
}
.get("/who")( request =>
request.pathInfo() match {
case Seq("who") => "Who's there?"
case Seq("who",x) => "Hello, " + x
case Seq("who",x,y) => "Hello both of you"
case _ => "Hello y'all!"
}
)
.get("/bye")( request =>
request.params("who")
.map { x => "Bye, " + x }
.mkString(". ")
)
.dispatch()
}
}
A CGI Micro-framework
trait Router {
def handle(method: Method, path:String)(f: Request => Response):Router
def get(path:String)(f: Request => Response):Router = handle(GET, path)(f)
def post(path:String)(f: Request => Response):Router = handle(POST, path)(f)
def put(path:String)(f: Request => Response):Router = handle(PUT, path)(f)
def delete(path:String)(f: Request => Response):Router = handle(DELETE, path)(f)
def dispatch(): Unit
}
case class Request(
method: Function0[Method],
pathInfo: Function0[Seq[String]],
params: Function1[String, Seq[String]]
)
case class Response(
body: ResponseBody,
statusCode: Int = 200,
headers: Map[String, String] = Map("Content-type" -> "text/html; charset=utf-8")
)
A CGI Micro-framework
object CgiUtils {
def env(key: CString): String = {
val lookup = stdlib.getenv(key)
if (lookup == null) {
""
} else {
fromCString(lookup)
}
}
def parsePathInfo(pathInfo: String): Seq[String] = {
pathInfo.split("/").filter( _ != "" )
}
def parseQueryString(queryString: String): Function1[String, Seq[String]] = {
// Skip malformed pairs (including an empty query string) instead of throwing a MatchError
val pairs = queryString.split("&").toSeq.flatMap( pair =>
pair.split("=", 2) match {
case Array(key, value) => Some((key,value))
case _ => None
}
)
val groupedValues = pairs.groupBy(_._1).map { case (k,v) => (k, v.map(_._2)) }
key => groupedValues.getOrElse(key, Seq.empty)
}
}
A CGI Micro-framework
case class CGIRouter(handlers:Seq[Handler]) extends Router {
def dispatch(): Unit = {
val request = Router.parseRequest()
val matches = for ( h @ Handler(method, pattern, handler) <- this.handlers
if request.method() == method
if request.pathInfo().startsWith(pattern)) yield h
val bestHandler = matches.maxBy( _.pattern.size )
val response = bestHandler.handler(request)
for ( (k,v) <- response.inferHeaders ) {
System.out.println(k + ": " + v)
}
System.out.println()
System.out.println(response.bodyToString)
}
}
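The longest-prefix matching used by dispatch() can be tried out in isolation. This is a minimal sketch in plain Scala; the Handler case class here is a hypothetical stand-in (the talk does not show its definition), and best returns an Option because maxBy alone throws when no handler matches.

```scala
object RoutingSketch {
  // Hypothetical stand-in for the Handler type used by the router
  case class Handler(method: String, pattern: Seq[String], body: String)

  // Pick the handler with the longest pattern that prefixes the request path.
  // Returns None when nothing matches.
  def best(handlers: Seq[Handler], method: String, path: Seq[String]): Option[Handler] = {
    val matches = handlers.filter(h => h.method == method && path.startsWith(h.pattern))
    if (matches.isEmpty) None else Some(matches.maxBy(_.pattern.size))
  }
}
```

With handlers registered for the empty pattern and for Seq("who"), a GET for /who/x selects the "who" handler, while a GET for /bye falls back to the root handler.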
Performance
40 ms mean response with 10 users
99th percentile response goes over 1s at 150 users
mean response plateaus around 500 ms at 300 users
peaks around 400 requests/sec
Compared to a python-based CGI app, which exhibits:
136 ms mean response with 10 users
99th percentile response goes over 1s at 75 users
mean response plateaus around 5s at 250 users
peaks around 200 requests/sec
Performance
But compared to a trivial node.js/Express app:
median response 7 ms with 10 users
99th percentile stays under 1s up to 2000 users
error rate approaches 15% around 500 users
peaks around 2000 requests/sec
Performance
How can we do better?
- Multiplexed Protocols
- Multiplexed I/O
How can we do better without spending years of our lives?
Multiplexed Protocols
- Technique for combining streams onto a single connection
- Relies on a proxy server to handle raw HTTP
- All requests and responses are "framed" with an identifier
- Proxy is responsible for routing responses to correct client.
Two prominent examples:
- FastCGI
- HTTP/2
HTTP Client (x3) -> HTTP -> Web Server -> FastCGI -> FastCGI Application
FastCGI
What makes FastCGI different from regular CGI?
- Persistent processes
- Persistent connections
- Multiplexed requests
- Framed strings + metadata
One catch -- we need a socket.
But do we need concurrency?
FastCGI
Parsing algorithm:
Read 8 byte header from socket
Extract type, Request ID, length, padding from header
Read (length + padding) bytes from socket
if (type == FCGI_STDIN and length == 0):
  request is complete, invoke handler and write response
else:
  append to pending buffers for Request ID
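The parsing algorithm above can be sketched over an in-memory byte array in plain Scala. This is an illustration under simplifying assumptions: it reads from an Array[Byte] instead of a socket, keeps one buffer per request regardless of record type, and only reports which requests are complete rather than invoking a handler. The names FcgiSketch, Header and parse are invented for this sketch; the header layout and the value 5 for FCGI_STDIN follow the FastCGI specification.

```scala
object FcgiSketch {
  val FCGI_STDIN = 5 // record type for request body data in the FastCGI spec

  case class Header(recType: Int, reqId: Int, length: Int, padding: Int)

  // Decode the fixed 8-byte record header that starts at `off`.
  def readHeader(buf: Array[Byte], off: Int): Header = {
    def u(i: Int): Int = buf(off + i) & 0xFF // read one unsigned byte
    Header(u(1), (u(2) << 8) | u(3), (u(4) << 8) | u(5), u(6))
  }

  // Walk every record in `buf`, appending content to a per-request buffer.
  // Returns the IDs of requests whose empty FCGI_STDIN record was seen,
  // together with the buffered bytes for each request.
  def parse(buf: Array[Byte]): (Seq[Int], Map[Int, Array[Byte]]) = {
    var pos = 0
    var complete = Vector.empty[Int]
    var pending = Map.empty[Int, Array[Byte]]
    while (pos + 8 <= buf.length) {
      val h = readHeader(buf, pos)
      val content = buf.slice(pos + 8, pos + 8 + h.length)
      if (h.recType == FCGI_STDIN && h.length == 0)
        complete = complete :+ h.reqId
      else
        pending = pending.updated(h.reqId, pending.getOrElse(h.reqId, Array.empty[Byte]) ++ content)
      pos += 8 + h.length + h.padding // skip header, content and padding
    }
    (complete, pending)
  }
}
```

Because each record carries its own Request ID, the loop never needs to know which client a record came from; the proxy in front of the application takes care of that.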
FastCGI
def readHeader(input: Ptr[Byte], offset:Long): RecordHeader = {
val version = input(0 + offset) & 0xFF
val rec_type = (input(1 + offset) & 0xFF) match {
case 0 => FCGI_UNKNOWN_TYPE
case 1 => FCGI_BEGIN_REQUEST
case 2 => FCGI_ABORT_REQUEST
case 3 => FCGI_END_REQUEST
case 4 => FCGI_PARAMS
case 5 => FCGI_STDIN
case 6 => FCGI_STDOUT
case 7 => FCGI_STDERR
case 8 => FCGI_DATA
case 9 => FCGI_GET_VALUES
case 10 => FCGI_GET_VALUES_RESULT
case _ => FCGI_UNKNOWN_TYPE
}
val req_id_b1 = (input(2 + offset) & 0xFF)
val req_id_b0 = (input(3 + offset) & 0xFF)
val req_id = (req_id_b1 << 8) + (req_id_b0 & 0xFF)
val length = ((input(4 + offset) & 0xFF) << 8) + (input(5 + offset) & 0xFF)
val padding = input(6 + offset) & 0xFF
RecordHeader(version,rec_type,req_id,length,padding)
}
FastCGI
def readParam(byteArray: Ptr[Byte], arr_offset:Long, length:Long)
: (Ptr[Byte], Ptr[Byte], Long) = {
val name_len_offset = arr_offset + 0
val (name_len:Long, val_len_offset:Long) =
if ((byteArray(name_len_offset) & 0x80) == 0) {
val len = byteArray(name_len_offset)
(len, arr_offset + 1)
} else {
val len = ((byteArray(name_len_offset) & 0x7F) << 24) +
((byteArray(name_len_offset + 1) & 0xFF) << 16) +
((byteArray(name_len_offset + 2) & 0xFF) << 8) +
(byteArray(name_len_offset + 3) & 0xFF)
(len, arr_offset + 4)
}
val (val_len:Long, content_offset:Long) =
if ((byteArray(val_len_offset) & 0x80) == 0) {
val len = byteArray(val_len_offset)
(len, val_len_offset + 1)
} else {
val len = ((byteArray(val_len_offset) & 0x7F) << 24) +
((byteArray(val_len_offset + 1) & 0xFF) << 16) +
((byteArray(val_len_offset + 2) & 0xFF) << 8) +
(byteArray(val_len_offset + 3) & 0xFF)
(len, val_len_offset + 4)
}
val name = byteArray + content_offset
val value = byteArray + content_offset + name_len
val next_param_offset = content_offset + name_len + val_len
(name, value, next_param_offset)
}
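The length encoding that readParam handles (one byte when the high bit is clear, four bytes otherwise) is easier to see on a plain byte array. The following is a minimal sketch in plain Scala rather than the pointer-based code above; FcgiParams, readLength and readAll are names invented for this example, and it assumes the buffer holds only complete name-value pairs.

```scala
object FcgiParams {
  // Decode one length field starting at `off`.
  // Returns (length, offset of the first byte after the length field).
  def readLength(buf: Array[Byte], off: Int): (Int, Int) =
    if ((buf(off) & 0x80) == 0) (buf(off) & 0x7F, off + 1)
    else {
      val len = ((buf(off) & 0x7F) << 24) | ((buf(off + 1) & 0xFF) << 16) |
                ((buf(off + 2) & 0xFF) << 8) | (buf(off + 3) & 0xFF)
      (len, off + 4)
    }

  // Decode every name-value pair in `buf` into a Map.
  def readAll(buf: Array[Byte]): Map[String, String] = {
    var off = 0
    var out = Map.empty[String, String]
    while (off < buf.length) {
      val (nameLen, afterName) = readLength(buf, off)
      val (valLen, start) = readLength(buf, afterName)
      val name = new String(buf, start, nameLen, "UTF-8")
      val value = new String(buf, start + nameLen, valLen, "UTF-8")
      out += (name -> value)
      off = start + nameLen + valLen // move to the next pair
    }
    out
  }
}
```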
Improvising a socket
#!/bin/bash
rm -f /tmp/app.socket
rm -f /tmp/app.fifo
mkfifo /tmp/app.fifo
nginx -g "daemon off;" &
export ROUTER_MODE=FCGI
nc -l -U /tmp/app.socket < /tmp/app.fifo | /var/www/localhost/cgi-bin/dinosaur-build-out > /tmp/app.fifo
Nginx <-> socket <-> nc -> app -> fifo -> nc
(better option: write a proxy in ~80 lines of Go)
Performance
- mean response in 4ms under light load
- 500 users -- .1% error rate, 283ms mean response
- Backlog starts to overflow around 1000 users
- Overflows register as fast refusals rather than timeouts
- Peaks around 1500 requests/sec
Multiplexed I/O
- Traditional options: select() and poll()
- Non-standard options: epoll, kqueue, iocp*
- All provide ways to poll the state of many sockets
- Polls listening and connection sockets at once
- "Quirky"
- Tends to require use of ioctl(), setsockopt(), fcntl()
- Not especially portable
Multiplexed I/O
listener = setUpListeningSocket()
pollSet = set(listener)
while true:
readySockets = poll(pollSet)
for socket in readySockets:
if socket == listener:
newConnection = accept(listener)
pollSet.add(newConnection)
else:
if socket.readyToRead:
read(socket)
else if socket.readyToWrite:
write(socket)
LibUV
LibUV, The node.js event loop:
- cross-platform C library (Linux, BSD, Windows)
- multiplexed IO on a single thread/single process.
- backed by native async primitives: epoll/kqueue/iocp
- callback-oriented API
- strict memory management requirements
LibUV
@link("uv")
@extern
object LibUV {
type PipeHandle = Ptr[Byte]
type Loop = Ptr[Byte]
type Buffer = CStruct2[Ptr[Byte],CSize]
type WriteReq = Ptr[Ptr[Byte]]
type ShutdownReq = Ptr[Ptr[Byte]]
type Connection = Ptr[Byte]
type ConnectionCB = CFunctionPtr2[PipeHandle,Int,Unit]
type AllocCB = CFunctionPtr3[PipeHandle,CSize,Ptr[Buffer],Unit]
type ReadCB = CFunctionPtr3[PipeHandle,CSSize,Ptr[Buffer],Unit]
type WriteCB = CFunctionPtr2[WriteReq,Int,Unit]
type ShutdownCB = CFunctionPtr2[ShutdownReq,Int,Unit]
type CloseCB = CFunctionPtr1[PipeHandle,Unit]
def uv_default_loop(): Loop = extern
def uv_loop_size(): CSize = extern
def uv_handle_size(h_type:Int): CSize = extern
def uv_req_size(r_type:Int): CSize = extern
def uv_pipe_init(loop:Loop, handle:PipeHandle, ipcFlag:Int ): Unit = extern
def uv_pipe_bind(handle:PipeHandle, socketName:CString): Int = extern
def uv_listen(handle:PipeHandle, backlog:Int, callback:ConnectionCB): Int = extern
def uv_accept(server:PipeHandle, client:PipeHandle): Int = extern
def uv_read_start(client:PipeHandle, allocCB:AllocCB, readCB:ReadCB): Int = extern
def uv_write(writeReq:WriteReq, client:PipeHandle, bufs: Ptr[Buffer], numBufs: Int, writeCB:WriteCB): Int = extern
def uv_read_stop(client:PipeHandle): Int = extern
def uv_shutdown(shutdownReq:ShutdownReq, client:PipeHandle, shutdownCB:ShutdownCB): Int = extern
def uv_close(handle:PipeHandle, closeCB: CloseCB): Unit = extern
def uv_run(loop:Loop, runMode:Int): Int = extern
}
LibUV
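// Shared by dispatch() and the onConnect callback below
val loop:Loop = uv_default_loop()
val pipe_size = uv_handle_size(7) // 7 is UV_NAMED_PIPE in libuv's uv_handle_type enum
def dispatch(): Unit = {
val pipe:PipeHandle = stackalloc[Byte](pipe_size)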
uv_pipe_init(loop, pipe, 0)
var r = uv_pipe_bind(pipe, c"/tmp/app.socket")
println(s"uv_pipe_bind returned $r")
r = uv_listen(pipe, 4096, onConnectCB)
println(s"uv_listen returned $r")
r = uv_run(loop, 0)
println(s"uv_run returned $r")
}
def onConnect(server:PipeHandle, status:Int): Unit = {
println("connection received!")
val client:PipeHandle = stdlib.malloc(pipe_size)
uv_pipe_init(loop, client, 0)
var r = uv_accept(server, client)
println(s"uv_accept returned $r")
uv_read_start(client, onAllocCB, onReadCB)
}
val onConnectCB = CFunctionPtr.fromFunction2(onConnect)
LibUV
def onRead(pipe:PipeHandle, size:CSSize, buffer:Ptr[Buffer]): Unit = {
if (size >= 0) {
var position = 0
// We are going to store the positions of the CGI parameter and STDIN frames
var params:(Int,RecordHeader) = (0,null)
var stdin:(Int,RecordHeader) = (0,null)
// Scan the input buffer for the positions of useful metadata
while (position < size) {
val header = readHeader(!buffer._1,position)
reqId = header.reqId
if (header.rec_type == FCGI_PARAMS & header.length > 0)
params = (position,header)
else if (header.rec_type == FCGI_STDIN & header.length > 0)
stdin = (position, header)
position += (8 + header.length + header.padding)
}
// Generate a response and enqueue it to the pipe (re-use the input buffer for output)
val write_req:WriteReq = stdlib.malloc(write_req_size).cast[WriteReq]
!write_req = !buffer._1
!buffer._2 = makeResponse(reqId, params, stdin, !write_req)
uv_write(write_req, pipe, buffer, 1, onWriteCB)
} else {
// a negative size signals EOF or a read error, so we can close the connection
uv_read_stop(pipe)
val shutdownReq = stdlib.malloc(shutdown_req_size).cast[ShutdownReq]
!shutdownReq = pipe
uv_shutdown(shutdownReq, pipe, myShutdownCB)
stdlib.free(!buffer._1)
}
}
val onReadCB = CFunctionPtr.fromFunction3(onRead)
Performance
When deployed on a UNIX socket behind nginx:
- mean response 4ms under light load
- with 1000 users:
- mean response 140ms
- error rates about 1/2 of node
- no timeouts
🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉
Reflections
What do our languages really need to provide?
Does serving up HTTP belong in our app or in infrastructure?
What can we expect from our OS?
What can we expect from our cluster?
Things are about to change.