@richardwhaling
October 3 2019
"the domain of programs that demand a mental model of the computer as a machine"
It doesn't have to be this way.
Systems programming can be elegant, fun, and done in a language you enjoy.
My hot take:
from learning C, I acquired an intuitive understanding of how to solve problems in an abstract von Neumann machine
(1903-1957)
First Draft of a Report on the EDVAC
(1945)
Electronic Discrete Variable Automatic Computer
EDVAC was one of the first stored-program computer designs: it kept data and code together in the same addressable memory.
Earlier computers like ENIAC and Colossus were programmed with patch cables and switches, which made them impractical to program, even where the hardware was theoretically Turing-complete.
Theoretical description of a realized Universal Turing Machine, i.e., a general-purpose computer
Unlike a Universal Turing Machine, von Neumann machines were practical to construct and program
In 7 years, the first computer scientists invented:
An explosion of applications and discoveries enabled by a comprehensible, practical model of a programmable general-purpose computer
C presents an enduring abstract model
of a random-access stored-program computer, with:
Hot Take:
these are the fundamental techniques
of programming a Von Neumann machine
Scala Native is a scalac compiler plugin that compiles Scala programs to binary executables ahead-of-time
Noteworthy for: its advanced optimizer, lightweight runtime, advanced GC, and C interop
Not a JVM - Graal compiles JVM bytecode to machine binary, very different model
Because it understands Scala, Scala Native can provide an elegant DSL for low-level programming
with all the capabilities of C
We're going to illustrate the fundamental techniques:
Each with a short program of less than 20 lines of code
Caveat:
Regular Scala works just fine in Scala Native.
All the features you'll see here belong to the scalanative.unsafe API
The slides that follow will contain extremely unidiomatic, imperative Scala
val i:Int = 6
println(s"Int i has value ${i} and size ${sizeof[Int]} bytes")
val b:Byte = 4
println(s"Byte b has value ${b} and size ${sizeof[Byte]} bytes")
val d:Double = 1.0
println(s"Double d has value ${d} and size ${sizeof[Double]} bytes")
val jPtr:Ptr[Int] = stackalloc[Int]
println(s"jPtr has value ${jPtr} and size ${sizeof[Ptr[Int]]} bytes")
val j:Int = !jPtr
println(s"j has value ${j} and size ${sizeof[Int]}")
!jPtr = 5
println(s"jPtr has value ${jPtr} and size ${sizeof[Ptr[Int]]} bytes")
val j2:Int = !jPtr
println(s"j2 has value ${j2} and size ${sizeof[Int]}, j has value ${j}")
val arraySize = 16 * sizeof[Int]
val allocation:Ptr[Byte] = stdlib.malloc(arraySize)
val intArray = allocation.asInstanceOf[Ptr[Int]]
for (i <- 0 until 16) { // until, not to: valid indices are 0..15
  intArray(i) = i * 2
}
for (i <- 0 until 16) {
  val address = intArray + i
  val item = intArray(i)
  val check = !address == item
  println(s"item $i at address $address has value $item, check: $check")
}
// malloc'd memory is not garbage-collected, so release it explicitly
stdlib.free(allocation)
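The address arithmetic in this example can be modeled in plain JVM Scala, as a rough sketch: element `i` of an `Int` array lives `i * 4` bytes past the base address. `PointerModel`, `store`, and `load` are hypothetical names used only for illustration, and `java.nio.ByteBuffer` stands in for malloc'd memory; none of this is the scalanative.unsafe API.

```scala
import java.nio.{ByteBuffer, ByteOrder}

object PointerModel {
  val IntSize = 4 // bytes per Int, matching sizeof[Int]

  // A 16-element Int "allocation", addressed by byte offset like malloc'd memory.
  val memory: ByteBuffer =
    ByteBuffer.allocate(16 * IntSize).order(ByteOrder.LITTLE_ENDIAN)

  // intArray(i) = v is modeled as a write at byte offset i * IntSize.
  def store(i: Int, v: Int): Unit = { memory.putInt(i * IntSize, v); () }

  // !(intArray + i) is modeled as a read at the same byte offset.
  def load(i: Int): Int = memory.getInt(i * IntSize)

  def main(args: Array[String]): Unit = {
    for (i <- 0 until 16) store(i, i * 2)
    for (i <- 0 until 16)
      println(s"item $i at byte offset ${i * IntSize} has value ${load(i)}")
  }
}
```

Indexing one element past the end (offset 64 here) throws on the JVM; with a raw pointer it would silently touch memory that does not belong to the allocation.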
val hello:CString = c"hello, world"
val helloLen = string.strlen(hello)
val helloString:String = fromCString(hello)
println(s"the string ${helloString} at ${hello} is ${helloLen} bytes long")
println(s"the CString value 'hello' itself is a pointer, ${sizeof[CString]} bytes long")
for (offset <- 0L to helloLen) {
val chr:CChar = hello(offset)
println(s"${chr.toChar} (${chr}) at ${hello + offset} is ${sizeof[CChar]} bytes long")
}
+--------+----+----+----+----+----+----+----+----+----+----+----+----+----+----+
| Offset |  0 |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 |  9 |  A |  B |  C |  D |
+--------+----+----+----+----+----+----+----+----+----+----+----+----+----+----+
| Char   |  H |  e |  l |  l |  o |  , |    |  w |  o |  r |  l |  d |  ! |    |
| Hex    | 48 | 65 | 6C | 6C | 6F | 2C | 20 | 77 | 6F | 72 | 6C | 64 | 21 | 00 |
+--------+----+----+----+----+----+----+----+----+----+----+----+----+----+----+
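The byte layout in the table can be reproduced in plain JVM Scala, as a small sketch. It uses the string "Hello, world!" exactly as drawn in the table; `CStringBytes`, `nulTerminated`, and `hex` are hypothetical helper names, not part of Scala Native.

```scala
object CStringBytes {
  // Encode a string as ASCII bytes plus the trailing NUL byte that C expects.
  def nulTerminated(s: String): Array[Byte] =
    s.getBytes("US-ASCII") :+ 0.toByte

  // Render each byte as two uppercase hex digits.
  def hex(bytes: Array[Byte]): String =
    bytes.map(b => f"${b & 0xff}%02X").mkString(" ")

  def main(args: Array[String]): Unit =
    println(hex(nulTerminated("Hello, world!")))
    // prints: 48 65 6C 6C 6F 2C 20 77 6F 72 6C 64 21 00
}
```

The 13 visible characters occupy 14 bytes: strlen does not count the final 00, but the allocation must include it.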
type LabeledPoint = CStruct3[CString,Int,Int]
val point:Ptr[LabeledPoint] = stackalloc[LabeledPoint]
point._1 = c"foo"
point._2 = 3
point._3 = 5
println(s"struct field ${point.at1} has value ${point._1}")
println(s"struct field ${point.at2} has value ${point._2}")
println(s"struct field ${point.at3} has value ${point._3}")
println(s"struct ${point} has size ${sizeof[LabeledPoint]}")
+--------+----+----+----+----+----+----+----+----+
| Offset |  0 |  1 |  2 |  3 |  4 |  5 |  6 |  7 |
+--------+----+----+----+----+----+----+----+----+
| Value  |                 5 |                12 |
+--------+----+----+----+----+----+----+----+----+
| Hex    | 05 | 00 | 00 | 00 | 0C | 00 | 00 | 00 |
+--------+----+----+----+----+----+----+----+----+
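The table shows two 4-byte integers stored back to back, least significant byte first. A plain JVM Scala sketch of the same layout, assuming a little-endian machine such as x86-64 (`StructBytes` and its helpers are hypothetical names for illustration):

```scala
import java.nio.{ByteBuffer, ByteOrder}

object StructBytes {
  // Lay out two 4-byte Int fields back to back, least significant byte first.
  def twoInts(a: Int, b: Int): Array[Byte] =
    ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN).putInt(a).putInt(b).array()

  // Render each byte as two uppercase hex digits.
  def hex(bytes: Array[Byte]): String =
    bytes.map(b => f"${b & 0xff}%02X").mkString(" ")

  def main(args: Array[String]): Unit =
    println(hex(twoInts(5, 12)))
    // prints: 05 00 00 00 0C 00 00 00
}
```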
type LabeledFoo = CStruct2[Int,Int]
type LabeledBar = CStruct2[Int,Long]
val FOO = 0
val BAR = 1
println(s"LabeledFoo size is ${sizeof[LabeledFoo]}")
println(s"LabeledBar size is ${sizeof[LabeledBar]}")
val array = stdlib.malloc(8 * sizeof[LabeledBar]).asInstanceOf[Ptr[LabeledBar]]
for (i <- 0 until 8) {
array(i)._2 = 0
if (i % 2 == 0) {
val item = array(i).asInstanceOf[LabeledFoo]
item._1 = FOO
item._2 = Random.nextInt() % 16
} else {
val item = array(i).asInstanceOf[LabeledBar]
item._1 = BAR
item._2 = Random.nextLong % 64
}
}
for (j <- 0 until 8) {
val tag = array(j)._1
if (tag == FOO) {
val item = array(j).asInstanceOf[LabeledFoo]
println(s"Foo: ${tag} at $j = ${item._2}")
} else {
val item = array(j).asInstanceOf[LabeledBar]
println(s"Bar: ${tag} at $j = ${item._2}")
}
}
+--------+----+----+----+----+----+----+----+----+
| Offset |  0 |  1 |  2 |  3 |  4 |  5 |  6 |  7 |
+--------+----+----+----+----+----+----+----+----+
| Value  |                 5 |                12 |
+--------+----+----+----+----+----+----+----+----+
| Hex    | 05 | 00 | 00 | 00 | 0C | 00 | 00 | 00 |
+--------+----+----+----+----+----+----+----+----+
+--------+----+----+----+----+----+----+----+----+----+----+----+----+
| Offset |  0 |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 |  9 |  A |  B |
+--------+----+----+----+----+----+----+----+----+----+----+----+----+
| Value  |                 3 |                                    29 |
+--------+----+----+----+----+----+----+----+----+----+----+----+----+
| Hex    | 03 | 00 | 00 | 00 | 1D | 00 | 00 | 00 | 00 | 00 | 00 | 00 |
+--------+----+----+----+----+----+----+----+----+----+----+----+----+
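The second table draws the Long immediately after the Int, for 12 bytes in total. On typical 64-bit C ABIs, fields are aligned to their own size, so the Long would more likely start at offset 8 and `sizeof[LabeledBar]` would report 16; the printed sizes from the program are the authority here. The sketch below computes offsets under that natural-alignment assumption. `StructLayout`, `alignUp`, and `layout` are hypothetical helpers in plain Scala, not a Scala Native API.

```scala
object StructLayout {
  // Round n up to the next multiple of align.
  def alignUp(n: Int, align: Int): Int = (n + align - 1) / align * align

  // Given field sizes in bytes, return (field offsets, total struct size),
  // assuming each field is aligned to its own size (natural alignment).
  def layout(fieldSizes: List[Int]): (List[Int], Int) = {
    var offset = 0
    val offsets = fieldSizes.map { size =>
      val at = alignUp(offset, size)
      offset = at + size
      at
    }
    // The whole struct is padded to a multiple of its largest field.
    (offsets, alignUp(offset, fieldSizes.max))
  }

  def main(args: Array[String]): Unit = {
    println(layout(List(4, 4))) // LabeledFoo: Int, Int
    println(layout(List(4, 8))) // LabeledBar: Int, Long
  }
}
```

Either way, the tag in field 1 sits at offset 0 in both layouts, which is what lets the reading loop inspect `_1` before deciding how to interpret the rest.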
type Comparator = CFuncPtr2[Ptr[Byte],Ptr[Byte],Int]
type Record = CStruct2[Int,Int]
val comp = new Comparator {
def apply(aPtr:Ptr[Byte], bPtr:Ptr[Byte]):Int = {
val a = !(aPtr.asInstanceOf[Ptr[Record]])
val b = !(bPtr.asInstanceOf[Ptr[Record]])
a._2 - b._2
}
}
val size = 8
val recordArray:Ptr[Record] = stdlib.malloc(size * sizeof[Record]).asInstanceOf[Ptr[Record]]
for (i <- 0 until size) {
  recordArray(i)._1 = i
  recordArray(i)._2 = Random.nextInt() % 256
}
stdlib.qsort(recordArray.asInstanceOf[Ptr[Byte]], size, sizeof[Record], comp)
for (i <- 0 until size) {
  val rec = recordArray(i)
  println(s"${i}: random value ${rec._2} from original position ${rec._1}")
}
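A note on the comparator: returning `a._2 - b._2` works here because the values are bounded by `% 256`, so the difference cannot overflow. For arbitrary Ints, subtraction can wrap around and report the wrong order. A plain Scala sketch of the difference (`CompareSketch` is a hypothetical name; `Integer.compare` is the standard Java library method):

```scala
object CompareSketch {
  // Subtraction comparator, as in the slide: fine for small values.
  def bySubtraction(a: Int, b: Int): Int = a - b

  // Overflow-safe alternative from the Java standard library.
  def safe(a: Int, b: Int): Int = Integer.compare(a, b)

  def main(args: Array[String]): Unit = {
    // Small values: subtraction gives the right sign.
    println(bySubtraction(200, -55) > 0)
    // Extreme values: Int.MaxValue - (-1) wraps around to a negative number.
    println(bySubtraction(Int.MaxValue, -1) > 0)
    // Integer.compare is not affected by overflow.
    println(safe(Int.MaxValue, -1) > 0)
  }
}
```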
object ExecutionContext {
def global: ExecutionContextExecutor = QueueExecutionContext
private object QueueExecutionContext extends ExecutionContextExecutor {
def execute(runnable: Runnable): Unit = queue += runnable
def reportFailure(t: Throwable): Unit = t.printStackTrace()
}
private val queue: ListBuffer[Runnable] = new ListBuffer
private[runtime] def loop(): Unit = { // this runs after main() returns
while (queue.nonEmpty) {
val runnable = queue.remove(0)
try {
runnable.run()
} catch {
case t: Throwable =>
QueueExecutionContext.reportFailure(t)
}
}
}
}
libuv abstracts over different operating systems
and different kinds of IO
Consistent model of callbacks attached to handles
We just need to adapt a queue-based EC to libuv's lifecycle of callbacks
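The behavior of a queue-based ExecutionContext can be tried out on the JVM with a minimal sketch. `QueueEC`, `drain`, and `log` are hypothetical names; the structure mirrors the runtime code above, but this is not the Scala Native implementation. Scheduling a Future only enqueues work, and nothing runs until the queue is drained.

```scala
import scala.collection.mutable.ListBuffer
import scala.concurrent.{ExecutionContext, Future}

object QueueEC extends ExecutionContext {
  private val queue = ListBuffer[Runnable]()
  val log = ListBuffer[String]()

  // Scheduling only enqueues; nothing runs until drain() is called.
  def execute(runnable: Runnable): Unit = { queue += runnable; () }
  def reportFailure(t: Throwable): Unit = t.printStackTrace()

  // Run queued tasks until none remain, including tasks enqueued along the way.
  def drain(): Unit =
    while (queue.nonEmpty) {
      val runnable = queue.remove(0)
      try runnable.run()
      catch { case t: Throwable => reportFailure(t) }
    }

  def main(args: Array[String]): Unit = {
    implicit val ec: ExecutionContext = this
    Future { log += "future body" }.map(_ => log += "map callback")
    log += "end of main"
    drain()
    println(log.mkString(", "))
    // prints: end of main, future body, map callback
  }
}
```

Adapting this to libuv means calling the equivalent of `drain()` from inside a loop callback instead of once after `main` returns.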
trait EventLoopLike extends ExecutionContextExecutor {
def addExtension(e:LoopExtension):Unit
def run(mode:Int = UV_RUN_DEFAULT):Unit
}
trait LoopExtension {
def activeRequests():Int
}
The LoopExtension trait lets us coordinate Future execution with other IO tasks on the same loop, and modularize our code.
object EventLoop extends EventLoopLike {
val loop = uv_default_loop()
private val taskQueue = ListBuffer[Runnable]()
def execute(runnable: Runnable): Unit = taskQueue += runnable
def reportFailure(t: Throwable): Unit = {
println(s"Future failed with Throwable $t:")
t.printStackTrace()
}
// ...
execute() is invoked as soon as a Future is ready to start running, but we can defer it until a callback fires
// ...
private def dispatchStep(handle:PrepareHandle) = {
while (taskQueue.nonEmpty) {
val runnable = taskQueue.remove(0)
try {
runnable.run()
} catch {
case t: Throwable => reportFailure(t)
}
}
if (taskQueue.isEmpty && !extensionsWorking) {
println("stopping dispatcher")
LibUV.uv_prepare_stop(handle)
}
}
private val dispatcher_cb = CFunctionPtr.fromFunction1(dispatchStep)
private def initDispatcher(loop:LibUV.Loop):PrepareHandle = {
val handle = stdlib.malloc(uv_handle_size(UV_PREPARE_T))
check(uv_prepare_init(loop, handle), "uv_prepare_init")
check(uv_prepare_start(handle, dispatcher_cb), "uv_prepare_start")
handle
}
private val dispatcher = initDispatcher(loop)
// ...
private val extensions = ListBuffer[LoopExtension]()
private def extensionsWorking():Boolean = {
extensions.exists( _.activeRequests > 0)
}
def addExtension(e:LoopExtension):Unit = {
extensions.append(e)
}
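The stop condition in dispatchStep can be simulated without libuv, as a sketch: the loop keeps iterating while tasks are queued or any extension still reports active requests. Everything here is hypothetical scaffolding in plain Scala (`LoopSketch`, `FakeIO`, `poll`); `FakeIO` stands in for pending IO that completes one request per loop iteration.

```scala
import scala.collection.mutable.ListBuffer

object LoopSketch {
  trait LoopExtension { def activeRequests(): Int }

  private val taskQueue = ListBuffer[Runnable]()
  private val extensions = ListBuffer[LoopExtension]()

  def execute(r: Runnable): Unit = { taskQueue += r; () }
  def addExtension(e: LoopExtension): Unit = { extensions += e; () }
  private def extensionsWorking: Boolean = extensions.exists(_.activeRequests() > 0)

  // One loop iteration: drain tasks, then report whether the dispatcher may stop.
  def dispatchStep(): Boolean = {
    while (taskQueue.nonEmpty) taskQueue.remove(0).run()
    taskQueue.isEmpty && !extensionsWorking
  }

  // Hypothetical extension standing in for pending IO: each poll completes
  // one request and schedules a follow-up task, like an IO callback would.
  final class FakeIO(var pending: Int) extends LoopExtension {
    def activeRequests(): Int = pending
    def poll(): Unit = if (pending > 0) {
      pending -= 1
      execute(() => println(s"IO completed, $pending left"))
    }
  }

  // Iterate until dispatchStep says it is safe to stop; return the iteration count.
  def run(io: FakeIO): Int = {
    var iterations = 1
    while (!dispatchStep()) { io.poll(); iterations += 1 }
    iterations
  }

  def main(args: Array[String]): Unit = {
    val io = new FakeIO(2)
    addExtension(io)
    execute(() => println("initial task"))
    println(s"loop stopped after ${run(io)} iterations")
  }
}
```

Without the `extensionsWorking` check, the dispatcher would stop after the first iteration, when the task queue is empty but IO is still outstanding.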
An ExecutionContext is useless without meaningful async capabilities.
We'll implement the simplest one, a delay.
object Timer extends LoopExtension {
EventLoop.addExtension(this)
var serial = 0L
var timers = mutable.HashMap[Long,Promise[Unit]]() // the secret sauce
override def activeRequests():Int =
timers.size
def delay(dur:Duration):Future[Unit] = ???
val timerCB:TimerCB = ???
}
@extern
object TimerImpl {
type TimerHandle = Ptr[Long] // why Long and not Byte?
def uv_timer_init(loop:Loop, handle:TimerHandle):Int = extern
def uv_timer_start(handle:TimerHandle, cb:TimerCB,
timeout:Long, repeat:Long):Int = extern
def uv_timer_stop(handle:TimerHandle):Int = extern
}
How is it safe to treat a TimerHandle as Ptr[Long]?
def delay(dur:Duration):Future[Unit] = {
val millis = dur.toMillis
val promise = Promise[Unit]()
serial += 1
val timer_id = serial
timers(timer_id) = promise
val timer_handle = stdlib.malloc(uv_handle_size(UV_TIMER_T))
uv_timer_init(EventLoop.loop,timer_handle)
val timer_data = timer_handle.asInstanceOf[Ptr[Long]]
!timer_data = timer_id
uv_timer_start(timer_handle, timerCB, millis, 0)
promise.future
}
We can store an 8-byte serial number in the TimerHandle, and retrieve it in our callback.
val timerCB = new TimerCB {
def apply(handle:TimerHandle):Unit = {
println("callback fired!")
val timer_data = handle.asInstanceOf[Ptr[Long]]
val timer_id = !timer_data
val timer_promise = timers(timer_id)
timers.remove(timer_id)
println(s"completing promise ${timer_id}")
timer_promise.success(())
}
}
We can dereference the TimerHandle safely: the compiler treats it as a Ptr[Long], so it only touches the first 8 bytes, and libuv reserves the first pointer-sized field of every handle (data) for user data.
Then we use the serial number for a map lookup to retrieve our state.
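The pattern is needed because a C callback receives only the handle and cannot capture Scala state. A plain JVM Scala sketch of the idea, with a `ByteBuffer` standing in for the malloc'd handle; `HandleSketch`, `newHandle`, and `callback` are hypothetical names, and the 64-byte handle size is arbitrary rather than libuv's real size.

```scala
import java.nio.{ByteBuffer, ByteOrder}
import scala.collection.mutable
import scala.concurrent.Promise

object HandleSketch {
  private var serial = 0L
  val pending = mutable.HashMap[Long, Promise[Unit]]()

  // Stand-in for a malloc'd libuv handle: an opaque block whose first
  // 8 bytes hold our serial number.
  def newHandle(promise: Promise[Unit]): ByteBuffer = {
    serial += 1
    pending(serial) = promise
    val handle = ByteBuffer.allocate(64).order(ByteOrder.nativeOrder())
    handle.putLong(0, serial) // like: !handle.asInstanceOf[Ptr[Long]] = serial
    handle
  }

  // Stand-in for the C callback: it receives only the handle, reads the
  // serial number back, and uses it to find the Scala-side state.
  def callback(handle: ByteBuffer): Unit = {
    val id = handle.getLong(0)
    pending.remove(id).foreach(_.success(()))
  }

  def main(args: Array[String]): Unit = {
    val p = Promise[Unit]()
    val h = newHandle(p)
    println(s"completed before callback: ${p.isCompleted}")
    callback(h)
    println(s"completed after callback: ${p.isCompleted}")
  }
}
```

Keeping the Promise in a Scala-side map, rather than behind a raw pointer, also keeps it reachable by the garbage collector until the callback fires.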
object Main {
implicit val ec:ExecutionContext = EventLoop
def main(args:Array[String]):Unit = {
println("hello!")
Timer.delay(3 seconds).onComplete { _ =>
println("goodbye!")
}
EventLoop.run()
}
}