Richard Whaling
M1 Finance
Scala Days North America 2018
(or how to get things done without the JVM)
Twitter: @RichardWhaling
Scala Native contributor, but speaking only for myself
Author of "Modern Systems Programming in Scala", coming soon from Pragmatic
Data Engineer at M1 Finance
object Hello {
  def main(args: Array[String]): Unit = {
    println("Hello, Scala Days!")
  }
}
This just works!
import scalanative.native._, stdio._
object Hello {
  def main(args: Array[String]): Unit = {
    printf(c"Hello, Scala Days!\n")
  }
}
This just works!
import scalanative.native._, stdio._
object Hello {
  def main(args: Array[String]): Unit = {
    val who: CString = c"Scala Days"
    stdio.printf(c"Hello, %s!\n", who)
  }
}
it really is the glibc printf()
Certain data types are primitive: Ints, Floats, and Bytes
All unmanaged data has an address, represented by a pointer
Pointers are typed, but can be cast to another type at compile time
Pointers are a numeric type and support arithmetic operations
Casting allows one to emulate more robust type systems (awkwardly)
A struct is a data structure with typed fields and static layout
The address of any field of a struct is known at compile-time
An array is a typed, sequence-like data structure with static layout
The address of any item in an array is computable in constant time
Arrays can contain primitives, pointers, arrays, and structs
Structs can contain primitives, pointers, arrays, and structs
Strings are not a first-class data type; a string is just a pointer to bytes
Since a string's length isn't stored alongside its data, strings are painful
+--------+----+----+----+----+----+----+----+----+----+----+----+----+----+----+
| Offset | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D |
+--------+----+----+----+----+----+----+----+----+----+----+----+----+----+----+
| Char | H | e | l | l | o | , | | w | o | r | l | d | ! | |
| Hex | 48 | 65 | 6C | 6C | 6F | 2C | 20 | 77 | 6F | 72 | 6C | 64 | 21 | 00 |
+--------+----+----+----+----+----+----+----+----+----+----+----+----+----+----+
val str:CString = c"hello, world"
val str_len = strlen(str)
printf(c"the string '%s' at %p is %d bytes long\n", str, str, str_len)
printf(c"the CString value 'str' is %d bytes long\n", sizeof[CString])
for (offset <- 0L to str_len) {
  val chr:CChar = str(offset)
  printf(c"'%c' is %d bytes long and has binary value %d\n",
         chr, sizeof[CChar], chr)
}
val str = c"hello, world"
val str_len = strlen(str)
printf(c"the string '%s' at %p is %d bytes long\n", str, str, str_len)
printf(c"the value 'str' itself is %d bytes long\n", sizeof[CString])
for (offset <- 0L to str_len) {
  val chr_addr = str + offset // pointer address arithmetic
  val chr = !chr_addr         // pointer address dereference
  stdio.printf(c"'%c'\t(%d) at address %p is %d bytes long\n",
               chr, chr, chr_addr, sizeof[CChar])
}
A CString is a Ptr[Byte], so we can re-implement array lookup with two basic pointer operators: arithmetic (+) and dereference (!)
the string 'hello, world' at address 0x5653b7aa0974 is 12 bytes long
the Ptr[Byte] value 'str' itself is 8 bytes long
'h' (104) at address 0x5653b7aa0974 is 1 bytes long
'e' (101) at address 0x5653b7aa0975 is 1 bytes long
'l' (108) at address 0x5653b7aa0976 is 1 bytes long
'l' (108) at address 0x5653b7aa0977 is 1 bytes long
'o' (111) at address 0x5653b7aa0978 is 1 bytes long
',' (44) at address 0x5653b7aa0979 is 1 bytes long
' ' (32) at address 0x5653b7aa097a is 1 bytes long
'w' (119) at address 0x5653b7aa097b is 1 bytes long
'o' (111) at address 0x5653b7aa097c is 1 bytes long
'r' (114) at address 0x5653b7aa097d is 1 bytes long
'l' (108) at address 0x5653b7aa097e is 1 bytes long
'd' (100) at address 0x5653b7aa097f is 1 bytes long
'' (0) at address 0x5653b7aa0980 is 1 bytes long
val short_lived_int:Ptr[Int] = stackalloc[Int]
val three_short_lived_ints:Ptr[Int] = stackalloc[Int](3)
val uninitialized_string_buffer:CString = stackalloc[CChar](16)
val uninitialized_buffer:Ptr[Byte] = malloc(1024)
val three_ints:Ptr[Int] = malloc(3 * sizeof[Int]).cast[Ptr[Int]]
val six_ints:Ptr[Int] = realloc(three_ints.cast[Ptr[Byte]], 6 * sizeof[Int])
  .cast[Ptr[Int]]
def heapalloc[T](num:Int = 1)(implicit tag:Tag[T]):Ptr[T] =
  malloc(num * sizeof[T]).cast[Ptr[T]]
@extern object mystdio {
  def fgetc(stream: Ptr[FILE]): CInt = extern
  def fgets(str: CString, count: CInt, stream: Ptr[FILE]): CString = extern
  def fputc(ch: CInt, stream: Ptr[FILE]): CInt = extern

  @name("scalanative_libc_stdin")
  def stdin: Ptr[FILE] = extern
  @name("scalanative_libc_stdout")
  def stdout: Ptr[FILE] = extern
}
val buffer = stackalloc[Byte](1024)
val line = mystdio.fgets(buffer, 1023, mystdio.stdin)
def fprintf(stream: Ptr[FILE], format: CString, args: CVararg*): CInt
def fgets(str: CString, count: CInt, stream: Ptr[FILE]): CString
def sscanf(buffer: CString, format: CString, args: CVararg*): CInt
def strcmp(lhs: CString, rhs: CString): CInt
def strncpy(dest: CString, src: CString, count: CSize): CString
Let's be real: the C stdlib's string facilities are badly broken.
We'll be ensuring safety in three ways:
Now we're (finally) ready to write some real programs!
Systems programming is never trivial, but Scala Native presents the fundamental concepts more clearly than C.
Scala Native's compiler is state-of-the-art, but its real power is the DSL it provides for C-style memory semantics.
The techniques we've learned will let us write programs with dramatically different performance characteristics than JVM Scala.
Word counts for the entire Google Books corpus, ~50GB total
This is big enough data to ask some interesting questions:
A'Aang_NOUN 1879 45 5
A'Aang_NOUN 1882 5 4
A'Aang_NOUN 1885 1 1
A'Aang_NOUN 1891 1 1
A'Aang_NOUN 1899 20 4
A'Aang_NOUN 1927 3 1
A'Aang_NOUN 1959 5 2
A'Aang_NOUN 1962 2 2
A'Aang_NOUN 1963 1 1
A'Aang_NOUN 1966 45 13
A'Aang_NOUN 1967 6 4
A'Aang_NOUN 1968 5 4
A'Aang_NOUN 1970 6 2
A'Aang_NOUN 1975 4 1
A'Aang_NOUN 2001 1 1
A'Aang_NOUN 2004 3 1
A'que_ADJ 1808 1 1
A'que_ADJ 1849 2 1
A'que_ADJ 1850 1 1
A'que_ADJ 1852 4 3
var max = 0
var max_word = ""
var max_year = 0
for (line <- scala.io.Source.stdin.getLines) {
  val split_fields = line.split("\\s+")
  val word = split_fields(0)
  val year = split_fields(1).toInt
  val count = split_fields(2).toInt
  if (count > max) {
    max = count
    max_word = word
    max_year = year
  }
}
println(s"max count: ${max_word}, ${max_year}; ${max} occurrences")
val linebuffer = stackalloc[Byte](1024)
val max_count = stackalloc[Int]
val max_word = stackalloc[Byte](1024)
val max_year = stackalloc[Int]
while (fgets(linebuffer, 1023, stdin) != null) {
  scan_and_compare(linebuffer, max_count, max_word, max_year)
}
printf(c"maximum word count: %d %s %d\n", !max_count, max_word, !max_year)
Our strategy:
def scan_and_compare(buffer:Ptr[Byte], max_count:Ptr[Int],
                     max_word:Ptr[Byte], max_year:Ptr[Int]):Unit = {
  val tmp_count = stackalloc[Int]
  val tmp_word = stackalloc[Byte](1024)
  val tmp_year = stackalloc[Int]
  val tmp_doc_count = stackalloc[Int]
  // input columns are: word, year, count, doc_count
  val scan_result = sscanf(buffer, c"%1023s %d %d %d\n",
                           tmp_word, tmp_year, tmp_count, tmp_doc_count)
  if (scan_result != 4) {
    throw new Exception("Bad sscanf result")
  }
  if (!tmp_count > !max_count) {
    safer_strncpy(tmp_word, max_word, 1024) // copy tmp_word into max_word
    !max_count = !tmp_count
    !max_year = !tmp_year
  }
}
Word counts for the entire Google Books corpus, ~50GB total
For our next trick:
case class NGram(word:String, count:Int, year:Int, doc_count:Int)
def read_input(input:Source):ArrayBuffer[NGram] = {
  val data = ArrayBuffer[NGram]()
  for (line <- input.getLines) {
    val split_fields = line.split("\\s+")
    val word = split_fields(0)
    val year = split_fields(1).toInt
    val count = split_fields(2).toInt
    val doc_count = split_fields(3).toInt
    val new_item = NGram(word, count, year, doc_count)
    data += new_item
  }
  return data
}
def main(args:Array[String]):Unit = {
  val data:ArrayBuffer[NGram] = read_input(scala.io.Source.stdin)
  val by_count_descending = Ordering.by { n:NGram => n.count }.reverse
  val sorted = data.sorted(by_count_descending)
  val show_count = if (sorted.length < 20) sorted.length else 20
  for (i <- 0 until show_count) {
    println(s"${sorted(i).word} ${sorted(i).count}")
  }
}
How do we do this in a Native idiom?
This will require more techniques than the prior case:
type StructPoint = CStruct2[Int, Int]
val point = stackalloc[StructPoint]
!point._1 = 5
!point._2 = 12
+--------+----+----+----+----+----+----+----+----+
| Offset | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
+--------+----+----+----+----+----+----+----+----+
| Value | 5 | 12 |
+--------+----+----+----+----+----+----+----+----+
| Hex | 05 | 00 | 00 | 00 | 0C | 00 | 00 | 00 |
+--------+----+----+----+----+----+----+----+----+
case class NGram(word:String, count:Int, year:Int, doc_count:Int)
type NGramData = CStruct4[CString, Int, Int, Int]
final case class WrappedArray[T](var data:Ptr[T], var used:Int, var capacity:Int)
def makeWrappedArray[T](size:Int):WrappedArray[T] = {
  val data = malloc(size * sizeof[T]).cast[Ptr[T]]
  return WrappedArray(data, 0, size)
}
def growWrappedArray[T](array:WrappedArray[T], size:Int):Unit = {
  val new_capacity = array.capacity + size
  val new_size = new_capacity * sizeof[T]
  val new_data = realloc(array.data.cast[Ptr[Byte]], new_size)
  array.data = new_data.cast[Ptr[T]]
  array.capacity = new_capacity
}
def qsort(data:Ptr[Byte],
num:Int,
size:Long,
comparator:CFunctionPtr2[Ptr[Byte], Ptr[Byte], Int]):Unit = extern
def sort_alphabetically(a:Ptr[Byte], b:Ptr[Byte]):Int = {
  val a_string_pointer = a.cast[Ptr[CString]]
  val b_string_pointer = b.cast[Ptr[CString]]
  return string.strcmp(!a_string_pointer, !b_string_pointer)
}
def sort_by_count(p1:Ptr[Byte], p2:Ptr[Byte]):Int = {
  val ngram_ptr_1 = p1.cast[Ptr[NGramData]]
  val ngram_ptr_2 = p2.cast[Ptr[NGramData]]
  val count_1 = !ngram_ptr_1._2
  val count_2 = !ngram_ptr_2._2
  return count_2 - count_1
}
val block_size = 65536 * 16 // ~ 1 million items - too big?
val line_buffer = stackalloc[Byte](1024)
var array = makeWrappedArray[NGramData](block_size)
while (stdio.fgets(line_buffer, 1023, stdin) != null) {
  if (array.used == array.capacity) {
    growWrappedArray(array, block_size)
  }
  parseLine(line_buffer, array.data + array.used)
  array.used += 1
}
qsort.qsort(array.data.cast[Ptr[Byte]], array.used,
            sizeof[NGramData], CFunctionPtr.fromFunction2(sort_by_count))
val to_show = if (array.used <= 20) array.used else 20
for (i <- 0 until to_show) {
  stdio.printf(c"word n: %s %d\n", !(array.data + i)._1, !(array.data + i)._2)
}
def parseLine(line_buffer:Ptr[Byte], data:Ptr[NGramData]):Unit = {
  val word = data._1      // Ptr[CString]: %ms stores a malloc'd string here
  val count = data._2     // Ptr[Int]
  val year = data._3      // Ptr[Int]
  val doc_count = data._4 // Ptr[Int]
  // input columns are: word, year, count, doc_count
  val sscanf_result = stdio.sscanf(line_buffer, c"%ms %d %d %d\n",
                                   word, year, count, doc_count)
  if (sscanf_result < 4) {
    throw new Exception("input error")
  }
}
Word counts for the entire Google Books corpus, ~50GB total
Final use case:
def read_input(input:Source):ArrayBuffer[NGram] = {
  val data = ArrayBuffer[NGram]()
  var prev_word = ""
  for (line <- input.getLines) {
    val split_fields = line.split("\\s+")
    // ... check for errors
    val word = split_fields(0)
    val year = split_fields(1).toInt
    val count = split_fields(2).toInt
    val doc_count = split_fields(3).toInt
    if (word == prev_word) {
      val last = data.last
      data(data.length - 1) = last.copy(count = last.count + count)
    } else {
      val new_item = NGram(word, count, year, doc_count)
      data += new_item
      prev_word = word
    }
  }
  return data
}
Our strategy:
What we'll change:
var prev_item:Ptr[NGramData] = null
while (stdio.fgets(line_buffer, 1023, stdin) != null) {
  if (array.used == array.capacity) {
    growWrappedArray(array, block_size)
  }
  val is_new_word = parseLine(line_buffer, array.data + array.used, prev_item)
  if (is_new_word) {
    prev_item = array.data + array.used
    array.used += 1
  }
}
def parseLine(line_buffer:CString, current_item:Ptr[NGramData],
              prev_item:Ptr[NGramData]):Boolean = {
  val temp_word = stackalloc[Byte](1024)
  val temp_count = current_item._2
  val temp_year = current_item._3
  val temp_doc_count = current_item._4
  sscanf(line_buffer, c"%1023s %d %d %d\n",
         temp_word, temp_year, temp_count, temp_doc_count)
  val new_word_length = strlen(temp_word)
  if (prev_item == null) {
    val new_word_buffer = malloc(new_word_length + 1)
    safer_strncpy(temp_word, new_word_buffer, 1023) // copy temp_word into the new buffer
    !current_item._1 = new_word_buffer
    return true
  } else if (strcmp(temp_word, !prev_item._1) == 0) {
    // same word as the previous line: accumulate into prev_item
    !prev_item._2 = !prev_item._2 + !temp_count
    return false
  } else {
    val new_word_buffer = malloc(new_word_length + 1)
    safer_strncpy(temp_word, new_word_buffer, 1023)
    !current_item._1 = new_word_buffer
    return true
  }
}
I hope that I've demonstrated that:
If you accept this, it raises the question:
where could it be appropriate to use Scala Native?
Caveats:
However:
I suggest that two of the highest-profile applications of Scala fall into this big-heap/heavy-I/O domain:
Existing Scala projects in these areas are outstanding software. But there are hardware changes on the horizon.
This will break every assumption about storage, memory, and cache hierarchy made by legacy systems.
To the extent that a hard break with the past is necessary, Scala Native is not a step back, but a step forward.