A Brief Introduction to Systems Programming,

with Scala Native

@richardwhaling

 

October 3 2019

About me:

  • @RichardWhaling
  • Lead Data Engineer at M1Finance
  • Author of: 

About this talk

  • Scala Native has moved forward a lot in five months
  • I'll give a quick status update on Scala Native and concurrency
  • Context & background on Scala Native and systems programming more generally
  • Dive deep into example programs that show how Scala Native exposes the fundamental techniques of systems programming

What's New in Scala Native

  • Current release is 0.4.0-M2
  • M3 coming soon with 2.12/2.13 support
  • 0.4.0 to follow
  • Released official LibUV bindings, scala-native-loop
  • Aiming for Cats and Zio support for 0.4
  • Designed for compatibility with existing libraries
  • Targeting FP frameworks lets us bypass Java IO

What Is Systems Programming?

"the domain of programs that demand a mental model of the computer as a machine"

  • Operating Systems, IO, Compilers, VMs, Containers, Embedded, Real-time
  • Traditionally done in C
  • Traditionally taught in school and promptly forgotten

It doesn't have to be this way.

Systems programming can be elegant, fun, and done in a language you enjoy.

Why C?

  • Inertia?
  • OS and hardware vendors...
  • Many recent arguments against this:
    • Steve Klabnik: https://tinyurl.com/y7akng69
    • David Chisnall: https://tinyurl.com/yxpapq3g
  • C describes the behavior of an abstract machine 
  • Modern processors are very different from a PDP-11

My hot take:

from learning C, I acquired an intuitive understanding of how to solve problems on an abstract von Neumann machine

John von Neumann

(1903-1957)

  • Incredibly accomplished mathematician and physicist
  • Inventor of mergesort
  • Described the architecture of modern computers in his

First Draft of a Report on the EDVAC

(1945)

EDVAC

Electronic Discrete Variable Automatic Computer

  • Proposed in 1944, delivered in 1949, operational in 1951
  • Designed by John Mauchly and J. Presper Eckert
  • 1000 44-bit words of ultrasonic mercury delay-line memory

EDVAC was one of the first stored-program computer designs: it stored data and code together in the same word-addressable memory.

 

Earlier computers like ENIAC and Colossus were programmed with patch cables and switches. This could be Turing-complete in theory, but it was impractical to program.

Von Neumann Architecture

Theoretical description of a realized Universal Turing Machine, i.e., a general-purpose computer

 

Unlike a Universal Turing Machine, von Neumann machines were practical to construct and program

1944-1951

In 7 years, the first computer scientists invented:

  • electronic random-access memory
  • conditional branches
  • goto instructions
  • subroutine invocation and return
  • mergesort
  • two's complement integers
  • Monte Carlo methods
  • computer music
  • computer games

An explosion of applications and discoveries, enabled by a comprehensible, practical model of a programmable general-purpose computer

1972: C

C presents an enduring abstract model

of a random-access stored-program computer, with:

  • primitive data types: bytes, ints, floats
  • zero-terminated variable-length byte strings
  • arrays
  • structs (i.e., product types)
  • unions (i.e., sum types)
  • pointers
  • function pointers (i.e., functions as values)

Hot Take:

these are the fundamental techniques

of programming a Von Neumann machine

2017: Scala Native

Scala Native is a scalac compiler plugin that compiles Scala programs to binary executables ahead-of-time

Noteworthy for: its advanced optimizer, lightweight runtime, advanced GC, and C interop

Not a JVM. GraalVM's native-image compiles JVM bytecode to a machine binary, which is a very different model

Because it understands Scala, Scala Native can provide an elegant DSL for low-level programming

with all the capabilities of C 

Systems Programming in Scala Native

We're going to illustrate the fundamental techniques:

  1. Primitive Data Types
  2. Pointers
  3. Arrays
  4. Strings
  5. Structs
  6. Unions and "type puns"
  7. Functions

Each with a short program of less than 20 lines of code

Systems Programming in Scala Native

Caveat:

Regular Scala works just fine in Scala Native.

All the features you'll see here belong to the scalanative.unsafe API

The slides that follow will contain extremely unidiomatic, imperative Scala

1. Primitive Data Types

import scala.scalanative.unsafe._

val i:Int = 6
println(s"Int i has value ${i} and size ${sizeof[Int]} bytes")

val b:Byte = 4
println(s"Byte b has value ${b} and size ${sizeof[Byte]} bytes")

val d:Double = 1.0
println(s"Double d has value ${d} and size ${sizeof[Double]} bytes")
  • Certain data types are fundamental
  • These types have concrete representations and fixed sizes
  • Bool, Byte, Int, Long, Float, Double
  • 1-8 bytes
  • Strings are not a primitive in C

2. Pointers

val jPtr:Ptr[Int] = stackalloc[Int]
println(s"jPtr has value ${jPtr} and size ${sizeof[Ptr[Int]]} bytes")

val j:Int = !jPtr
println(s"j has value ${j} and size ${sizeof[Int]}")

!jPtr = 5
println(s"jPtr has value ${jPtr} and size ${sizeof[Ptr[Int]]} bytes")

val j2:Int = !jPtr
println(s"j2 has value ${j2} and size ${sizeof[Int]}, j has value ${j}")
  • A pointer denotes the address of a value in memory
  • Generally a 64-bit unsigned integer under the hood
  • Pointer values are created by explicit allocation
  • Pointers are read and updated with the dereference operator "!"
  • No addressOf/& operator
  • Better safety - can't break the seal on GC managed objects
  • Semantics related to reference and pointer types in Haskell/SML

4 Rules

  • Every piece of data lives somewhere in memory
  • Every piece of data has some fixed size
  • Some objects are managed (but they still live somewhere)
  • All manipulations of addresses and sizes are simple arithmetic

3. Arrays

import scala.scalanative.libc.stdlib

val arraySize = 16 * sizeof[Int]
val allocation:Ptr[Byte] = stdlib.malloc(arraySize)
val intArray = allocation.asInstanceOf[Ptr[Int]]
// valid indices are 0 through 15: "until" excludes 16
for (i <- 0 until 16) {
    intArray(i) = i * 2
}
for (i <- 0 until 16) {
    val address = intArray + i
    val item = intArray(i)
    // indexing is the same as pointer addition plus dereference
    val check = !address == item
    println(s"item $i at address $address has value $item, check: $check")
}
// malloc'd memory is not garbage collected, so release it explicitly
stdlib.free(allocation)
  • Arrays are really just pointers with arithmetic operators
  • Access by index is equivalent to addition and dereference
  • Address is incremented by offset times element size
  • Seeks are constant time because layout is uniform
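
The index arithmetic described above can be modelled in plain Scala on the JVM. This is a rough sketch, not the scalanative.unsafe API: a java.nio.ByteBuffer stands in for raw memory, and the names IndexArithmetic, offsetOf, write, read, and doubled are hypothetical.

```scala
import java.nio.{ByteBuffer, ByteOrder}

// A stand-in for raw memory: a flat byte buffer, little-endian like x86-64.
// All names here are illustrative, not part of Scala Native.
object IndexArithmetic {
  val IntSize = 4 // size of an Int in bytes

  // Byte offset of element i in an array of Ints starting at byte `base`.
  def offsetOf(base: Int, i: Int): Int = base + i * IntSize

  // Store `value` as element i: one multiplication, one addition, one write.
  def write(mem: ByteBuffer, base: Int, i: Int, value: Int): Unit =
    mem.putInt(offsetOf(base, i), value)

  // Load element i the same way; the cost does not depend on i.
  def read(mem: ByteBuffer, base: Int, i: Int): Int =
    mem.getInt(offsetOf(base, i))

  // Allocate room for n Ints and fill element i with i * 2, as in the slide.
  def doubled(n: Int): ByteBuffer = {
    val mem = ByteBuffer.allocate(n * IntSize).order(ByteOrder.LITTLE_ENDIAN)
    for (i <- 0 until n) write(mem, 0, i, i * 2)
    mem
  }
}
```

Calling IndexArithmetic.read(mem, 0, 5) on a buffer from doubled(16) touches only bytes 20 through 23, which is why a seek costs the same for every index.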

4. Strings

val hello:CString = c"hello, world"
val helloLen = string.strlen(hello)
val helloString:String = fromCString(hello)

println(s"the string ${helloString} at ${hello} is ${helloLen} bytes long")
println(s"the CString pointer 'hello' itself is ${sizeof[CString]} bytes long")

for (offset <- 0L to helloLen) {
  val chr:CChar = hello(offset)
  println(s"${chr.toChar} (${chr}) at ${hello + offset} is ${sizeof[CChar]} bytes long")
}
  • How do we handle sequential data of unknown size?
  • Two techniques: terminating with 0 byte or storing length
  • 0-terminated strings were probably a mistake
  • CChar is an alias for Byte
  • CString is an alias for Ptr[CChar]
  • Runtime helps with allocation to convert to Scala string
  • Moving to a safe representation ASAP is a huge safety win

+--------+----+----+----+----+----+----+----+----+----+----+----+----+----+----+
| Offset | 0  | 1  | 2  | 3  | 4  | 5  | 6  | 7  | 8  | 9  | A  | B  | C  | D  |
+--------+----+----+----+----+----+----+----+----+----+----+----+----+----+----+
| Char   | H  | e  | l  | l  | o  | ,  |    | w  | o  | r  | l  | d  | !  |    |
| Hex    | 48 | 65 | 6C | 6C | 6F | 2C | 20 | 77 | 6F | 72 | 6C | 64 | 21 | 00 |
+--------+----+----+----+----+----+----+----+----+----+----+----+----+----+----+
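
The two techniques for variable-length data can be compared side by side in plain JVM Scala. This is a minimal sketch under stated assumptions: an Array[Byte] stands in for memory, the text is ASCII, and the names StringLayouts, zeroTerminated, lengthPrefixed, strlen, and prefixedLength are hypothetical rather than part of any library.

```scala
import java.nio.charset.StandardCharsets

// Two encodings of variable-length text in a flat byte array.
// Plain JVM Scala for illustration; names are hypothetical.
object StringLayouts {
  // C style: the bytes followed by a single 0 terminator.
  def zeroTerminated(s: String): Array[Byte] =
    s.getBytes(StandardCharsets.US_ASCII) :+ 0.toByte

  // Length-prefixed style: one length byte, then the bytes (at most 255).
  def lengthPrefixed(s: String): Array[Byte] = {
    val bytes = s.getBytes(StandardCharsets.US_ASCII)
    require(bytes.length <= 255, "length does not fit in one byte")
    bytes.length.toByte +: bytes
  }

  // What strlen does: scan from `start` until the first 0 byte, an O(n) walk.
  def strlen(mem: Array[Byte], start: Int): Int = {
    var i = start
    while (mem(i) != 0) i += 1
    i - start
  }

  // With a length prefix the size is a single read.
  def prefixedLength(mem: Array[Byte], start: Int): Int = mem(start) & 0xff
}
```

The zero-terminated form of "Hello, world!" occupies 14 bytes, matching the table above: 13 characters plus the terminator at offset D.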

Recap

  • Ptr[T] indicates the address of zero or more items of T
  • Abstracts over Option, Seq, String-like capabilities
  • Best thought of as a mutable container:
  • Represents a capability to change remote data
  • Unsafe!  Segfaults, undefined behavior, etc.

5. Structs

  type LabeledPoint = CStruct3[CString,Int,Int]
  val point:Ptr[LabeledPoint] = stackalloc[LabeledPoint]
  point._1 = c"foo"
  point._2 = 3
  point._3 = 5

  println(s"struct field ${point.at1} has value ${point._1}")
  println(s"struct field ${point.at2} has value ${point._2}")
  println(s"struct field ${point.at3} has value ${point._3}")

  println(s"struct ${point} has size ${sizeof[LabeledPoint]}")
  • A Struct is a product type, like a case class or tuple
  • Tuple-like behavior by default
  • Fields are stored contiguously, address offset is known a priori
  • ._1 etc retrieves field. .at1 returns address of field
  • Syntactic sugar dereferences pointers to structs for convenience
  • Almost always manipulated via a pointer

+--------+----+----+----+----+----+----+----+----+
| Offset | 0  | 1  | 2  | 3  | 4  | 5  | 6  | 7  |
+--------+----+----+----+----+----+----+----+----+
| Value  | 5                 | 12                | 
+--------+----+----+----+----+----+----+----+----+
| Hex    | 05 | 00 | 00 | 00 | 0C | 00 | 00 | 00 |
+--------+----+----+----+----+----+----+----+----+
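
The byte layout in the table can be reproduced with a small JVM model. This sketch assumes a struct of two 4-byte Ints on a little-endian machine; it uses java.nio.ByteBuffer in place of real memory, and PairLayout, encode, and hex are hypothetical names.

```scala
import java.nio.{ByteBuffer, ByteOrder}

// Models a struct of two 4-byte Ints laid out back to back.
// Plain JVM Scala; the object and method names are hypothetical.
object PairLayout {
  val Field1Offset = 0 // first field starts at the struct's address
  val Field2Offset = 4 // second field starts right after the first Int
  val Size = 8

  // Write both fields into a fresh little-endian buffer.
  def encode(first: Int, second: Int): Array[Byte] = {
    val buf = ByteBuffer.allocate(Size).order(ByteOrder.LITTLE_ENDIAN)
    buf.putInt(Field1Offset, first)
    buf.putInt(Field2Offset, second)
    buf.array()
  }

  // Render bytes as two-digit hex, matching the table's "Hex" row.
  def hex(bytes: Array[Byte]): String =
    bytes.map(b => f"${b & 0xff}%02X").mkString(" ")
}
```

Because the field offsets are constants, reading a field is the same base-plus-offset arithmetic used for arrays.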

6. Unions

    type LabeledFoo = CStruct2[Int,Int]
    type LabeledBar = CStruct2[Int,Long]
    val FOO = 0
    val BAR = 1
    println(s"LabeledFoo size is ${sizeof[LabeledFoo]}")
    println(s"LabeledBar size is ${sizeof[LabeledBar]}")
    val array = stdlib.malloc(8 * sizeof[LabeledBar]).asInstanceOf[Ptr[LabeledBar]]
    for (i <- 0 until 8) {
      array(i)._2 = 0
      if (i % 2 == 0) {
        val item = array(i).asInstanceOf[LabeledFoo]
        item._1 = FOO
        item._2 = Random.nextInt() % 16
      } else {
        val item = array(i).asInstanceOf[LabeledBar]
        item._1 = BAR
        item._2 = Random.nextLong % 64
      }
    }
    for (j <- 0 until 8) {
      val tag = array(j)._1
      if (tag == FOO) {
        val item = array(j).asInstanceOf[LabeledFoo]
        println(s"Foo: ${tag} at $j = ${item._2}")
      } else {
        val item = array(j).asInstanceOf[LabeledBar]
        println(s"Bar: ${tag} at $j = ${item._2}")
      }
    }
  • A value that can be one of several types can be modeled as a sum type or union
  • Idiomatically in C we do this with unsafe casts
  • If two structs have a common prefix of fields, those fields may be safely used interchangeably
  • In the "tagged union" pattern we use a prefix field to hold type metadata
+--------+----+----+----+----+----+----+----+----+
| Offset | 0  | 1  | 2  | 3  | 4  | 5  | 6  | 7  |
+--------+----+----+----+----+----+----+----+----+
| Value  | 5                 | 12                | 
+--------+----+----+----+----+----+----+----+----+
| Hex    | 05 | 00 | 00 | 00 | 0C | 00 | 00 | 00 |
+--------+----+----+----+----+----+----+----+----+

+--------+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+
| Offset | 0  | 1  | 2  | 3  | 4  | 5  | 6  | 7  | 8  | 9  | A  | B  | C  | D  | E  | F  |
+--------+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+
| Value  | 3                 | (padding)         | 29                                    |
+--------+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+
| Hex    | 03 | 00 | 00 | 00 | 00 | 00 | 00 | 00 | 1D | 00 | 00 | 00 | 00 | 00 | 00 | 00 |
+--------+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+
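
The tagged union pattern can be sketched over raw bytes in plain JVM Scala. The following is an illustration under assumptions, not the scalanative.unsafe API: slots are 16 bytes, the Long payload sits at offset 8 (assuming 8-byte alignment), and TaggedUnion and its members are hypothetical names.

```scala
import java.nio.{ByteBuffer, ByteOrder}

// A tagged union modelled over raw bytes: every slot starts with an Int tag,
// and the tag decides how the remaining bytes are interpreted.
// Plain JVM Scala; names and the 16-byte slot size are assumptions.
object TaggedUnion {
  sealed trait Item
  final case class Foo(value: Int) extends Item
  final case class Bar(value: Long) extends Item

  val FooTag = 0
  val BarTag = 1
  val SlotSize = 16 // big enough for the larger variant

  def newSlots(n: Int): ByteBuffer =
    ByteBuffer.allocate(n * SlotSize).order(ByteOrder.LITTLE_ENDIAN)

  // Write the tag first, then the payload at the variant's own offset.
  def put(mem: ByteBuffer, slot: Int, item: Item): Unit = {
    val base = slot * SlotSize
    item match {
      case Foo(v) =>
        mem.putInt(base, FooTag)
        mem.putInt(base + 4, v)
      case Bar(v) =>
        mem.putInt(base, BarTag)
        mem.putLong(base + 8, v)
    }
  }

  // Read the shared prefix (the tag), then reinterpret the rest accordingly.
  def get(mem: ByteBuffer, slot: Int): Item = {
    val base = slot * SlotSize
    mem.getInt(base) match {
      case FooTag => Foo(mem.getInt(base + 4))
      case _      => Bar(mem.getLong(base + 8))
    }
  }
}
```

Only the tag at offset 0 is shared between the variants; everything after it is safe to read only once the tag has been checked.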

7. Function Pointers

type Comparator = CFuncPtr2[Ptr[Byte],Ptr[Byte],Int]
type Record = CStruct2[Int,Int]

val comp = new Comparator { 
  def apply(aPtr:Ptr[Byte], bPtr:Ptr[Byte]):Int = {
    val a = !(aPtr.asInstanceOf[Ptr[Record]])
    val b = !(bPtr.asInstanceOf[Ptr[Record]])
    a._2 - b._2
  }
}
val size = 8
val recordArray:Ptr[Record] = stdlib.malloc(size * sizeof[Record]).asInstanceOf[Ptr[Record]]
for (i <- 0 until size) {
  recordArray(i)._1 = i
  recordArray(i)._2 = Random.nextInt() % 256
}
// qsort sees only untyped bytes; comp casts them back to Ptr[Record]
stdlib.qsort(recordArray.asInstanceOf[Ptr[Byte]], size, sizeof[Record], comp)
for (i <- 0 until size) {
  val rec = recordArray(i)
  println(s"${i}: random value ${rec._2} from original position ${rec._1}")
}
// release the malloc'd array
stdlib.free(recordArray.asInstanceOf[Ptr[Byte]])
  • Functions are values in C, but not the same as Scala functions
  • A function has a fixed address and no lexical scope
  • Function call is basically just argument marshalling and GOTO
  • Much faster than method dispatch
  • You can fake lexical scope by storing context in extra arguments 
  • No polymorphism, but you can pass Ptr[Byte] and cast
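
Passing context through an extra argument can be illustrated in plain JVM Scala. This is only a sketch of the idea, in the spirit of C's qsort_r: the comparator captures nothing and receives its state as a parameter. ContextPassing, SortContext, compare, and sortWith are hypothetical names, and a simple insertion sort keeps the example short.

```scala
// A C-style callback cannot capture variables, so any state it needs
// travels as an explicit argument. Plain JVM Scala; names are hypothetical.
object ContextPassing {
  // The "context" a closure would otherwise capture.
  final case class SortContext(descending: Boolean)

  // Closure-free comparator: everything it uses arrives as a parameter.
  def compare(a: Int, b: Int, ctx: SortContext): Int = {
    val c = java.lang.Integer.compare(a, b)
    if (ctx.descending) -c else c
  }

  // A sort that threads the context through to the comparator on every call.
  // Insertion sort on a copy of the input, so the caller's array is untouched.
  def sortWith(xs: Array[Int], ctx: SortContext,
               cmp: (Int, Int, SortContext) => Int): Array[Int] = {
    val out = xs.clone()
    for (i <- 1 until out.length) {
      val key = out(i)
      var j = i - 1
      while (j >= 0 && cmp(out(j), key, ctx) > 0) {
        out(j + 1) = out(j)
        j -= 1
      }
      out(j + 1) = key
    }
    out
  }
}
```

The same comparator sorts ascending or descending depending only on the SortContext value handed to sortWith.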

Mechanical Sympathy, Functional Affinity

  • There is an affinity between systems and FP
  • Functions as values, sum/product types
  • Deep roots in Scala's heritage (cf. Standard ML)
  • Prior to Scala most functional languages had powerful low-level facilities (even Haskell!)

 

  • I find Scala Native's unsafe API easier, safer, and more productive than writing C
  • Working with the system directly feels, to me, more elegant than going through an OOP layer
  • Folks have suggested porting the scalanative.unsafe API to the JVM and Graal via sun.misc.Unsafe
  • A genuine breakthrough in ergonomics

Thanks!

Questions?


ExecutionContext from Scratch

  • Scala Native includes an EC already
  • The catch - it runs after main() returns
object ExecutionContext {
  def global: ExecutionContextExecutor = QueueExecutionContext

  private object QueueExecutionContext extends ExecutionContextExecutor {
    def execute(runnable: Runnable): Unit = queue += runnable
    def reportFailure(t: Throwable): Unit = t.printStackTrace()
  }

  private val queue: ListBuffer[Runnable] = new ListBuffer

  private[runtime] def loop(): Unit = {  // this runs after main() returns
    while (queue.nonEmpty) {
      val runnable = queue.remove(0)
      try {
        runnable.run()
      } catch {
        case t: Throwable =>
          QueueExecutionContext.reportFailure(t)
      }
    }
  }
}
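
The queue-then-drain behaviour is easy to observe on the JVM with a hand-rolled context. This is a rough sketch, not Scala Native's runtime code: ManualEC and drain are hypothetical names, and draining is triggered explicitly instead of after main() returns.

```scala
import scala.collection.mutable.ListBuffer
import scala.concurrent.ExecutionContext

// A single-threaded, queue-based ExecutionContext that must be drained by hand.
// Plain JVM Scala; `ManualEC` and `drain` are hypothetical names.
object ManualEC extends ExecutionContext {
  private val queue = ListBuffer[Runnable]()

  // Futures hand us work here; we only remember it.
  def execute(runnable: Runnable): Unit = queue += runnable
  def reportFailure(t: Throwable): Unit = t.printStackTrace()

  // Run tasks until none remain; tasks may enqueue further tasks.
  // Returns how many tasks were run.
  def drain(): Int = {
    var ran = 0
    while (queue.nonEmpty) {
      val task = queue.remove(0)
      try task.run()
      catch { case t: Throwable => reportFailure(t) }
      ran += 1
    }
    ran
  }
}
```

With ManualEC as the implicit context, a Future stays incomplete until drain() is called, which mirrors the catch noted above.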

LibUV's IO system

libuv abstracts over different operating systems

and different kinds of IO

 

Consistent model of callbacks attached to handles

LibUV's event loop

We just need to adapt a queue-based EC to libuv's lifecycle of callbacks
 

  • We queue up work
  • A prepare handle runs immediately prior to IO
  • It runs tasks until the queue is exhausted
  • When there are no more tasks and no more IO, we are done!
  • The catch - how do we track IO that isn't a Future?

EventLoop and LoopExtensions

trait EventLoopLike extends ExecutionContextExecutor {
  def addExtension(e:LoopExtension):Unit 
  def run(mode:Int = UV_RUN_DEFAULT):Unit
}

trait LoopExtension {
  def activeRequests():Int
}

The LoopExtension trait lets us coordinate Future execution with other IO tasks on the same loop, and modularize our code.
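
The stopping rule this implies can be sketched without libuv. The model below is plain JVM Scala under assumptions: Extension, MiniLoop, step, and shouldStop are hypothetical names, and no real IO takes place.

```scala
import scala.collection.mutable.ListBuffer

// Sketch of the stopping rule: the loop may exit only when the task queue is
// empty and no extension reports outstanding requests.
// Plain JVM Scala; `Extension`, `MiniLoop`, and `shouldStop` are hypothetical.
trait Extension { def activeRequests: Int }

final class MiniLoop {
  private val tasks = ListBuffer[Runnable]()
  private val extensions = ListBuffer[Extension]()

  def submit(r: Runnable): Unit = tasks += r
  def addExtension(e: Extension): Unit = extensions += e

  // Run queued tasks until the queue is empty, including tasks queued by earlier tasks.
  def step(): Unit =
    while (tasks.nonEmpty) tasks.remove(0).run()

  // Done only when there is neither queued work nor pending IO.
  def shouldStop: Boolean =
    tasks.isEmpty && !extensions.exists(_.activeRequests > 0)
}
```

An extension that still reports one active request keeps shouldStop false even after every queued task has run.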

Our EventLoop

object EventLoop extends EventLoopLike {
  val loop = uv_default_loop()

  private val taskQueue = ListBuffer[Runnable]()
  def execute(runnable: Runnable): Unit = taskQueue += runnable
  def reportFailure(t: Throwable): Unit = {
    println(s"Future failed with Throwable $t:")
    t.printStackTrace()
  }
  // ...

execute() is invoked as soon as a Future is ready to start running, but we can defer it until a callback fires

Our EventLoop callback

  // ...
  private def dispatchStep(handle:PrepareHandle) = {
    while (taskQueue.nonEmpty) {
      val runnable = taskQueue.remove(0)
      try {
        runnable.run()
      } catch {
        case t: Throwable => reportFailure(t)
      }
    }
    if (taskQueue.isEmpty && !extensionsWorking) {
      println("stopping dispatcher")
      LibUV.uv_prepare_stop(handle)
    }
  }

  private val dispatcher_cb = CFunctionPtr.fromFunction1(dispatchStep)

  private def initDispatcher(loop:LibUV.Loop):PrepareHandle = {
    val handle = stdlib.malloc(uv_handle_size(UV_PREPARE_T))
    check(uv_prepare_init(loop, handle), "uv_prepare_init")
    check(uv_prepare_start(handle, dispatcher_cb), "uv_prepare_start")
    handle
  }

  private val dispatcher = initDispatcher(loop)
  // ...

LoopExtensions

  private val extensions = ListBuffer[LoopExtension]()

  private def extensionsWorking():Boolean = {
    extensions.exists( _.activeRequests > 0)
  }

  def addExtension(e:LoopExtension):Unit = {
    extensions.append(e)
  }

An ExecutionContext is useless without meaningful async capabilities.

 

We'll implement the simplest one, a delay.

Timer

object Timer extends LoopExtension {
  EventLoop.addExtension(this)

  var serial = 0L
  var timers = mutable.HashMap[Long,Promise[Unit]]() // the secret sauce

  override def activeRequests():Int = 
    timers.size

  def delay(dur:Duration):Future[Unit] = ???

  val timerCB:TimerCB = ???
}

@extern
object TimerImpl {
  type TimerHandle = Ptr[Long] // why Long and not Byte?
  type TimerCB = CFuncPtr1[TimerHandle,Unit]
  def uv_timer_init(loop:Loop, handle:TimerHandle):Int = extern
  def uv_timer_start(handle:TimerHandle, cb:TimerCB, 
                     timeout:Long, repeat:Long):Int = extern
  def uv_timer_stop(handle:TimerHandle):Int = extern
}

How is it safe to treat a TimerHandle as Ptr[Long]?

Timer

  def delay(dur:Duration):Future[Unit] = {
    val millis = dur.toMillis

    val promise = Promise[Unit]()
    serial += 1
    val timer_id = serial
    timers(timer_id) = promise

    val timer_handle = stdlib.malloc(uv_handle_size(UV_TIMER_T))
    uv_timer_init(EventLoop.loop,timer_handle)
    val timer_data = timer_handle.asInstanceOf[Ptr[Long]]
    !timer_data = timer_id
    uv_timer_start(timer_handle, timerCB, millis, 0)

    promise.future
  }

We can store an 8-byte serial number in the TimerHandle, and retrieve it in our callback.

Timer

  val timerCB = new TimerCB { 
    def apply(handle:TimerHandle):Unit = {
      println("callback fired!")
      val timer_data = handle.asInstanceOf[Ptr[Long]]
      val timer_id = !timer_data
      val timer_promise = timers(timer_id)
      timers.remove(timer_id)
      println(s"completing promise ${timer_id}")
      timer_promise.success(())
    }
  }

We can dereference the TimerHandle safely: the compiler thinks it's a Ptr[Long], so it only reads the first 8 bytes, and libuv reserves the first pointer-sized field of every handle (data) for user data.

 

Then we use the serial number for a map lookup to retrieve our state.
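
The handle-plus-registry trick can be modelled on the JVM. This is a sketch under assumptions, not the talk's Timer object: a ByteBuffer stands in for the malloc'd libuv handle, the buffer size is arbitrary, and PromiseRegistry, register, and fire are hypothetical names.

```scala
import java.nio.{ByteBuffer, ByteOrder}
import scala.collection.mutable
import scala.concurrent.{Future, Promise}

// Models the handle-plus-registry trick: the opaque handle carries only an
// 8-byte id, and the real Scala state lives in a map keyed by that id.
// Plain JVM Scala; a ByteBuffer stands in for the malloc'd libuv handle,
// and every name below is hypothetical.
object PromiseRegistry {
  private var serial = 0L
  private val pending = mutable.HashMap[Long, Promise[Unit]]()

  def activeRequests: Int = pending.size

  // Register a promise and stamp its id into the first 8 bytes of the handle.
  def register(handleSize: Int): (ByteBuffer, Future[Unit]) = {
    val promise = Promise[Unit]()
    serial += 1
    pending(serial) = promise
    val handle = ByteBuffer.allocate(handleSize).order(ByteOrder.LITTLE_ENDIAN)
    handle.putLong(0, serial)
    (handle, promise.future)
  }

  // What the callback does: read the id back, look up the promise, complete it.
  def fire(handle: ByteBuffer): Unit = {
    val id = handle.getLong(0)
    pending.remove(id).foreach(_.success(()))
  }
}
```

Keeping only a number in native memory means no garbage-collected object is ever referenced from memory the GC cannot see.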

Timer

object Main {
  implicit val ec:ExecutionContext = EventLoop

  def main(args:Array[String]):Unit = {
    println("hello!")
    Timer.delay(3 seconds).onComplete { _ =>
      println("goodbye!")
    }

    EventLoop.run()
  }
}

Copy of A Brief Introduction to Systems Programming with Scala Native

By Richard Whaling
