Richard Whaling
M1 Finance
Scala Days Europe 2018
(or how to get things done without the JVM)
Twitter: @RichardWhaling
Scala Native contributor, but speaking only for myself
Author of "Modern Systems Programming in Scala", coming soon from Pragmatic
Software Engineer at M1 Finance
object Hello {
  def main(args: Array[String]):Unit = {
    println("Hello, Scala Days!")
  }
}
This just works!
import scalanative.native._, stdio._
object Hello {
  def main(args: Array[String]):Unit = {
    printf(c"Hello, Scala Days!\n")
  }
}
This just works!
import scalanative.native._, stdio._
object Hello {
  def main(args: Array[String]):Unit = {
    val who:CString = c"Scala Days"
    stdio.printf(c"Hello, %s!\n", who)
  }
}
it really is the glibc printf()
+--------+----+----+----+----+----+----+----+----+----+----+----+----+----+----+
| Offset | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D |
+--------+----+----+----+----+----+----+----+----+----+----+----+----+----+----+
| Char | H | e | l | l | o | , | | w | o | r | l | d | ! | |
| Hex | 48 | 65 | 6C | 6C | 6F | 2C | 20 | 77 | 6F | 72 | 6C | 64 | 21 | 00 |
+--------+----+----+----+----+----+----+----+----+----+----+----+----+----+----+
val str:CString = c"hello, world"
val str_len = strlen(str)
printf(c"the string '%s' at address %p is %d bytes long\n", str, str, str_len)
printf(c"the CString value 'str' itself is %d bytes long\n", sizeof[CString])
for (offset <- 0L to str_len) {
  val chr:CChar = str(offset)
  printf(c"'%c' is %d bytes long and has binary value %d\n",
    chr, sizeof[CChar], chr)
}
/project/path> ./target/scala-2.11/cstring_experiment_1-out
the string 'hello, world' at address 0x55e525a2c944 is 12 bytes long
the CString value 'str' itself is 8 bytes long
'h' is 1 bytes long and has binary value 104
'e' is 1 bytes long and has binary value 101
'l' is 1 bytes long and has binary value 108
'l' is 1 bytes long and has binary value 108
'o' is 1 bytes long and has binary value 111
',' is 1 bytes long and has binary value 44
' ' is 1 bytes long and has binary value 32
'w' is 1 bytes long and has binary value 119
'o' is 1 bytes long and has binary value 111
'r' is 1 bytes long and has binary value 114
'l' is 1 bytes long and has binary value 108
'd' is 1 bytes long and has binary value 100
'' is 1 bytes long and has binary value 0
As in C, pointers (addresses) are first-class values.
The value of a CString is its address, not its content
val str = c"hello, world"
val str_len = strlen(str)
printf(c"the string '%s' at address %p is %d bytes long\n", str, str, str_len)
printf(c"the Ptr[Byte] value 'str' itself is %d bytes long\n", sizeof[CString])
for (offset <- 0L to str_len) {
  val chr_addr = str + offset // pointer address arithmetic
  val chr = !chr_addr // pointer dereference
  stdio.printf(c"'%c'\t(%d) at address %p is %d bytes long\n",
    chr, chr, chr_addr, sizeof[CChar])
}
A CString is a Ptr[Byte], so we can re-implement array lookup with two basic pointer operators:
pointer address arithmetic (+)
dereference (!)
the string 'hello, world' at address 0x5653b7aa0974 is 12 bytes long
the Ptr[Byte] value 'str' itself is 8 bytes long
'h' (104) at address 0x5653b7aa0974 is 1 bytes long
'e' (101) at address 0x5653b7aa0975 is 1 bytes long
'l' (108) at address 0x5653b7aa0976 is 1 bytes long
'l' (108) at address 0x5653b7aa0977 is 1 bytes long
'o' (111) at address 0x5653b7aa0978 is 1 bytes long
',' (44) at address 0x5653b7aa0979 is 1 bytes long
' ' (32) at address 0x5653b7aa097a is 1 bytes long
'w' (119) at address 0x5653b7aa097b is 1 bytes long
'o' (111) at address 0x5653b7aa097c is 1 bytes long
'r' (114) at address 0x5653b7aa097d is 1 bytes long
'l' (108) at address 0x5653b7aa097e is 1 bytes long
'd' (100) at address 0x5653b7aa097f is 1 bytes long
'' (0) at address 0x5653b7aa0980 is 1 bytes long
+--------+----+----+----+----+----+----+----+----+----+----+----+----+----+----+
| Offset | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D |
+--------+----+----+----+----+----+----+----+----+----+----+----+----+----+----+
| Char | H | e | l | l | o | , | | w | o | r | l | d | ! | |
| Hex | 48 | 65 | 6C | 6C | 6F | 2C | 20 | 77 | 6F | 72 | 6C | 64 | 21 | 00 |
+--------+----+----+----+----+----+----+----+----+----+----+----+----+----+----+
val short_lived_int:Ptr[Int] = stackalloc[Int]
val three_short_lived_ints:Ptr[Int] = stackalloc[Int](3)
val uninitialized_string_buffer:CString = stackalloc[CChar](16)
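One caveat worth stressing: stackalloc memory is released when the enclosing function returns, so a stack-allocated pointer must never escape the function that created it. A minimal sketch (hypothetical helper functions, Scala Native 0.3 syntax):

```scala
import scalanative.native._

// safe: the caller's stack allocation outlives the callee that reads it
def sum3(values:Ptr[Int]):Int =
  values(0) + values(1) + values(2)

def caller():Int = {
  val values:Ptr[Int] = stackalloc[Int](3)
  values(0) = 3; values(1) = 4; values(2) = 5
  sum3(values)
  // returning `values` itself here would be undefined behavior:
  // the allocation dies with this stack frame
}
```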
@extern object mystdio {
  def fgetc(stream: Ptr[FILE]): CInt = extern
  def fgets(str: CString, count: CInt, stream: Ptr[FILE]): CString = extern
  def fputc(ch: CInt, stream: Ptr[FILE]): CInt = extern
  @name("scalanative_libc_stdin")
  def stdin: Ptr[FILE] = extern
  @name("scalanative_libc_stdout")
  def stdout: Ptr[FILE] = extern
}
val buffer = stackalloc[Byte](1024)
val line = mystdio.fgets(buffer, 1023, mystdio.stdin)
def fprintf(stream: Ptr[FILE], format: CString, args: CVararg*): CInt
def fgets(str: CString, count: CInt, stream: Ptr[FILE]): CString
def sscanf(buffer: CString, format: CString, args: CVararg*): CInt
def strcmp(lhs: CString, rhs: CString): CInt
def strncpy(dest: CString, src: CString, count: CSize): CString
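As a quick illustration of sscanf's return-value convention, here is a sketch with made-up data (not from the talk):

```scala
import scalanative.native._, stdio._

val line = c"hello 1984 42 7"
val word = stackalloc[Byte](32)
val year = stackalloc[Int]
val count = stackalloc[Int]
val doc_count = stackalloc[Int]
// sscanf returns the number of conversions that succeeded:
// 4 for a well-formed line, fewer for malformed input
val fields = sscanf(line, c"%31s %d %d %d", word, year, count, doc_count)
```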
Let's be real: the C stdlib's string facilities are badly broken.
We'll be ensuring safety in three ways:
def strncpy(dest: CString, src: CString, count: CSize): CString
def safer_strncpy(src:CString, dest:CString, dest_size:Int):Int = {
  val src_size = strlen(src).toInt
  if (src_size >= dest_size) {
    // src won't fit: copy what we can, then null-terminate by hand
    strncpy(dest, src, dest_size - 1)
    dest(dest_size - 1) = 0
    return dest_size - 1
  } else {
    // src fits, including its null terminator
    strncpy(dest, src, src_size + 1)
    return src_size
  }
}
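A usage sketch, assuming the (src, dest, dest_size) signature above:

```scala
import scalanative.native._, stdio._, string._

val src = c"a string longer than our destination buffer"
val dest = stackalloc[Byte](16)
// copies at most 15 characters and always null-terminates
val copied = safer_strncpy(src, dest, 16)
printf(c"copied %d bytes: '%s'\n", copied, dest)
```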
Word counts for the entire Google Books corpus, ~50GB total
This is big enough data to ask some interesting questions:
A'Aang_NOUN 1879 45 5
A'Aang_NOUN 1882 5 4
A'Aang_NOUN 1885 1 1
A'Aang_NOUN 1891 1 1
A'Aang_NOUN 1899 20 4
A'Aang_NOUN 1927 3 1
A'Aang_NOUN 1959 5 2
A'Aang_NOUN 1962 2 2
A'Aang_NOUN 1963 1 1
A'Aang_NOUN 1966 45 13
A'Aang_NOUN 1967 6 4
A'Aang_NOUN 1968 5 4
A'Aang_NOUN 1970 6 2
A'Aang_NOUN 1975 4 1
A'Aang_NOUN 2001 1 1
A'Aang_NOUN 2004 3 1
A'que_ADJ 1808 1 1
A'que_ADJ 1849 2 1
A'que_ADJ 1850 1 1
A'que_ADJ 1852 4 3
var max = 0
var max_word = ""
var max_year = 0
for (line <- scala.io.Source.stdin.getLines) {
  val split_fields = line.split("\\s+")
  val word = split_fields(0)
  val year = split_fields(1).toInt
  val count = split_fields(2).toInt
  if (count > max) {
    max = count
    max_word = word
    max_year = year
  }
}
println(s"max count: ${max_word}, ${max_year}; ${max} occurrences")
val linebuffer = stackalloc[Byte](1024)
val max_count = stackalloc[Int]
val max_word = stackalloc[Byte](1024)
val max_year = stackalloc[Int]
!max_count = 0
while (fgets(linebuffer, 1023, stdin) != null) {
  scan_and_compare(linebuffer, max_count, max_word, max_year)
}
printf(c"maximum word count: %d %s %d\n", !max_count, max_word, !max_year)
Our strategy:
def scan_and_compare(buffer:Ptr[Byte], max_count:Ptr[Int],
                     max_word:Ptr[Byte], max_year:Ptr[Int]):Unit = {
  val tmp_count = stackalloc[Int]
  val tmp_word = stackalloc[Byte](1024)
  val tmp_year = stackalloc[Int]
  val tmp_doc_count = stackalloc[Int]
  val scan_result = sscanf(buffer, c"%1023s %d %d %d\n",
    tmp_word, tmp_year, tmp_count, tmp_doc_count)
  if (scan_result != 4) {
    throw new Exception("Bad sscanf result")
  }
  if (!tmp_count > !max_count) {
    safer_strncpy(tmp_word, max_word, 1024)
    !max_count = !tmp_count
    !max_year = !tmp_year
  }
}
Word counts for the entire Google Books corpus, ~50GB total
For our next trick:
case class NGram(word:String, count:Int, year:Int, doc_count:Int)
def read_input(input:Source):ArrayBuffer[NGram] = {
  val data = ArrayBuffer[NGram]()
  for (line <- input.getLines) {
    val split_fields = line.split("\\s+")
    val word = split_fields(0)
    val year = split_fields(1).toInt
    val count = split_fields(2).toInt
    val doc_count = split_fields(3).toInt
    val new_item = NGram(word, count, year, doc_count)
    data += new_item
  }
  return data
}
def main(args:Array[String]):Unit = {
  val data:ArrayBuffer[NGram] = read_input(scala.io.Source.stdin)
  val by_count_descending = Ordering.by { n:NGram => n.count }.reverse
  val sorted = data.sorted(by_count_descending)
  val show_count = if (data.length < 20) data.length else 20
  for (i <- 0 until show_count) {
    println(s"${sorted(i).word} ${sorted(i).count}")
  }
}
How do we do this in a Native idiom?
This will require some new techniques and new syscalls
type StructPoint = CStruct2[Int, Int]
val point = stackalloc[StructPoint]
point._1 = 5
point._2 = 12
+--------+----+----+----+----+----+----+----+----+
| Offset | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
+--------+----+----+----+----+----+----+----+----+
| Value | 5 | 12 |
+--------+----+----+----+----+----+----+----+----+
| Hex | 05 | 00 | 00 | 00 | 0C | 00 | 00 | 00 |
+--------+----+----+----+----+----+----+----+----+
case class NGram(word:String, count:Int, year:Int, doc_count:Int)
type NGramData = CStruct4[CString, Int, Int, Int]
val short_lived_int:Ptr[Int] = stackalloc[Int]
val three_short_lived_ints:Ptr[Int] = stackalloc[Int](3)
val uninitialized_string_buffer:CString = stackalloc[CChar](16)
val uninitialized_buffer:Ptr[Byte] = malloc(1024)
val three_ints:Ptr[Int] = malloc(3 * sizeof[Int]).cast[Ptr[Int]]
val six_ints:Ptr[Int] = realloc(three_ints.cast[Ptr[Byte]], 6 * sizeof[Int])
  .cast[Ptr[Int]]
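Unlike stackalloc, heap allocations live until we release them: every malloc/realloc chain needs exactly one matching free. A sketch using Scala Native 0.3's stdlib bindings:

```scala
import scalanative.native._, stdlib._

val three_ints:Ptr[Int] = malloc(3 * sizeof[Int]).cast[Ptr[Int]]
val six_ints:Ptr[Int] = realloc(three_ints.cast[Ptr[Byte]], 6 * sizeof[Int])
  .cast[Ptr[Int]]
// after a successful realloc only the new pointer is valid;
// free it once, and don't touch three_ints again
free(six_ints.cast[Ptr[Byte]])
```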
final case class WrappedArray[T](var data:Ptr[T], var used:Int, var capacity:Int)
def makeWrappedArray[T](size:Int):WrappedArray[T] = {
  val data = malloc(size * sizeof[T]).cast[Ptr[T]]
  return WrappedArray(data, 0, size)
}
def growWrappedArray[T](array:WrappedArray[T], size:Int):Unit = {
  val new_capacity = array.capacity + size
  val new_size = new_capacity * sizeof[T]
  val new_data = realloc(array.data.cast[Ptr[Byte]], new_size)
  array.data = new_data.cast[Ptr[T]]
  array.capacity = new_capacity
}
def qsort(data:Ptr[Byte],
num:Int,
size:Long,
comparator:CFunctionPtr2[Ptr[Byte], Ptr[Byte], Int]):Unit = extern
def sort_alphabetically(a:Ptr[Byte], b:Ptr[Byte]):Int = {
  val a_string_pointer = a.cast[Ptr[CString]]
  val b_string_pointer = b.cast[Ptr[CString]]
  return string.strcmp(!a_string_pointer, !b_string_pointer)
}
def sort_by_count(p1:Ptr[Byte], p2:Ptr[Byte]):Int = {
  val ngram_ptr_1 = p1.cast[Ptr[NGramData]]
  val ngram_ptr_2 = p2.cast[Ptr[NGramData]]
  val count_1 = !ngram_ptr_1._2
  val count_2 = !ngram_ptr_2._2
  return count_2 - count_1
}
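qsort wants a C function pointer, not a Scala closure; in Scala Native 0.3 a static Scala function can be converted with CFunctionPtr.fromFunction2. A sketch using the alphabetical comparator above:

```scala
import scalanative.native._

val words:Ptr[CString] = stackalloc[CString](2)
words(0) = c"pear"
words(1) = c"apple"
// convert the static Scala function into a C-compatible function pointer
val alphabetically = CFunctionPtr.fromFunction2(sort_alphabetically)
qsort.qsort(words.cast[Ptr[Byte]], 2, sizeof[CString], alphabetically)
// words(0) now points at "apple"
```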
val block_size = 65536 * 16 // ~ 1 million items - too big?
val line_buffer = stackalloc[Byte](1024)
var array = makeWrappedArray[NGramData](block_size)
while (stdio.fgets(line_buffer, 1023, stdin) != null) {
  if (array.used == array.capacity) {
    growWrappedArray(array, block_size)
  }
  parseLine(line_buffer, array.data + array.used)
  array.used += 1
}
val by_count = CFunctionPtr.fromFunction2(sort_by_count)
qsort.qsort(array.data.cast[Ptr[Byte]], array.used,
  sizeof[NGramData], by_count)
val to_show = if (array.used <= 20) array.used else 20
for (i <- 0 until to_show) {
  stdio.printf(c"word n: %s %d\n", !(array.data + i)._1, !(array.data + i)._2)
}
def parseLine(line_buffer:Ptr[Byte], data:Ptr[NGramData]):Unit = {
  val word = data._1
  val count = data._2
  val year = data._3
  val doc_count = data._4
  val sscanf_result = stdio.sscanf(line_buffer, c"%ms %d %d %d\n", word, year, count, doc_count)
  if (sscanf_result < 4) {
    throw new Exception("input error")
  }
}
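One subtlety worth flagging: %ms is a POSIX extension. sscanf itself mallocs a buffer large enough for the word and writes its address through the Ptr[CString] argument, so the caller owns that memory and must eventually free it. A sketch:

```scala
import scalanative.native._, stdio._, stdlib._

val word:Ptr[CString] = stackalloc[CString]
val year = stackalloc[Int]
val scanned = sscanf(c"hello 1984", c"%ms %d", word, year)
if (scanned == 2) {
  // !word points at a malloc'd copy of "hello"; we must free it
  free((!word).cast[Ptr[Byte]])
}
```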
Word counts for the entire Google Books corpus, ~50GB total
Final use case:
def read_input(input:Source):ArrayBuffer[NGram] = {
  val data = ArrayBuffer[NGram]()
  var prev_word = ""
  for (line <- input.getLines) {
    val split_fields = line.split("\\s+")
    // ... check for errors
    val word = split_fields(0)
    val year = split_fields(1).toInt
    val count = split_fields(2).toInt
    val doc_count = split_fields(3).toInt
    if (word == prev_word) {
      // fold this line's count into the previous entry
      val prev = data.last
      data(data.length - 1) = prev.copy(count = prev.count + count)
    } else {
      val new_item = NGram(word, count, year, doc_count)
      data += new_item
      prev_word = word
    }
  }
  return data
}
Our strategy:
What we'll change:
var prev_item:Ptr[NGramData] = null
while (stdio.fgets(line_buffer, 1023, stdin) != null) {
  if (array.used == array.capacity) {
    growWrappedArray(array, block_size)
  }
  val is_new_word = parseLine(line_buffer, array.data + array.used, prev_item)
  if (is_new_word) {
    prev_item = array.data + array.used
    array.used += 1
  }
}
def parseLine(line_buffer:CString, current_item:Ptr[NGramData],
              prev_item:Ptr[NGramData]):Boolean = {
  val temp_word = stackalloc[Byte](1024)
  val temp_count = current_item._2
  val temp_year = current_item._3
  val temp_doc_count = current_item._4
  sscanf(line_buffer, c"%1023s %d %d %d\n", temp_word, temp_year, temp_count, temp_doc_count)
  val new_word_length = strlen(temp_word)
  if (prev_item == null) {
    // first word: make a long-lived heap copy of the word
    val new_word_buffer = malloc(new_word_length + 1)
    safer_strncpy(temp_word, new_word_buffer, 1023)
    !current_item._1 = new_word_buffer
    return true
  } else if (strcmp(temp_word, !prev_item._1) == 0) {
    // same word as the previous line: fold this count into the previous item
    !prev_item._2 = !prev_item._2 + !temp_count
    return false
  } else {
    val new_word_buffer = malloc(new_word_length + 1)
    safer_strncpy(temp_word, new_word_buffer, 1023)
    !current_item._1 = new_word_buffer
    return true
  }
}
I hope that I've demonstrated that:
If you accept this, it raises the question:
where could it be appropriate to use Scala Native?
I suggest that two of the highest profile applications of Scala fall into this big-heap/heavy-io domain:
Incumbent Scala projects in these areas are outstanding software. But there are hardware changes on the horizon.
Caveats:
However:
This will break every assumption about the storage/memory/cache hierarchy made by legacy systems, including Java, including Rust
To the extent that a hard break with the past is necessary, Scala Native is not a step back, but a step forward.
Thank You!