Working with Bytes

Binary Serialization & Elm

Working with Bytes

  • Introduction to binary serialization
  • Bytes vs Json
  • Decoding binary file formats

Working with Bytes

A byte is 8 bits of information

0 \cdot 2^0 + 1 \cdot 2^1 + 0 \cdot 2^2 + 1 \cdot 2^3 + 0 \cdot 2^4 + 1 \cdot 2^5 + 0 \cdot 2^6 + 0 \cdot 2^7
= 2 + 8 + 32 = 42

The same byte can be written as two hexadecimal digits, where A stands for 10:

42 = 2 \cdot 16^1 + 10 \cdot 16^0 = \mathrm{2A}_{16}
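
The binary and hexadecimal readings of 42 can be checked with a few lines of Elm. This is only a sketch: the module name and the helpers `fromBits`, `hexDigit`, and `toHex` are made up for illustration and are not part of any package.

```elm
module ByteNotation exposing (fromBits, toHex)

-- Turn a list of bits, least significant bit first, into an Int.
-- foldr starts at the end of the list, i.e. at the most significant bit.


fromBits : List Int -> Int
fromBits bits =
    List.foldr (\bit acc -> acc * 2 + bit) 0 bits



-- One hexadecimal digit (0 to 15) as a character.


hexDigit : Int -> Char
hexDigit n =
    if n < 10 then
        Char.fromCode (Char.toCode '0' + n)

    else
        Char.fromCode (Char.toCode 'A' + n - 10)



-- A byte (0 to 255) as two hexadecimal digits.


toHex : Int -> String
toHex byte =
    String.fromList [ hexDigit (byte // 16), hexDigit (modBy 16 byte) ]
```

With these definitions, `fromBits [ 0, 1, 0, 1, 0, 1, 0, 0 ]` evaluates to `42` and `toHex 42` evaluates to `"2A"`.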

Bytes vs Json

Why Bytes?

Send fewer bytes over the wire?

Bytes vs Json

Take the number `2019`

Format   Representation      Size
Json     2019                4 bytes
Bytes    00000111 11100011   2 bytes

That is 50% smaller, and the gain grows for longer numbers: an unsigned 32-bit integer always takes 4 bytes, while its decimal text can take up to 10.
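
The size comparison can be reproduced with elm/bytes. The sketch below assumes the number is stored as a big-endian unsigned 16-bit integer; the module and value names are illustrative.

```elm
module SizeComparison exposing (binarySize, textSize)

import Bytes exposing (Endianness(..))
import Bytes.Encode as Encode



-- 2019 as a big-endian unsigned 16-bit integer: the bytes 0x07 0xE3.


binarySize : Int
binarySize =
    Bytes.width (Encode.encode (Encode.unsignedInt16 BE 2019))



-- 2019 as the text "2019": one UTF-8 byte per digit.


textSize : Int
textSize =
    Encode.getStringWidth (String.fromInt 2019)
```

Here `binarySize` is `2` and `textSize` is `4`.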

Bytes vs Json

{ 
    "title": "foo",
    "subject": "spam"
}

Json carries a lot of structure on the wire: key names, braces, quotes, and brackets.

Bytes vs Json

 3  f o o   4  s p a m
 03 666f6f  04 7370616d

Field           # of bytes   Type
titleLength     1            uint8
title           variable     string
subjectLength   1            uint8
subject         variable     string
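
A length-prefixed layout like this one could be encoded and decoded roughly as follows. This is a minimal sketch: `Entry`, `sizedString`, and `sizedStringDecoder` are illustrative names, while the `Bytes.Encode` and `Bytes.Decode` functions come from elm/bytes.

```elm
module LengthPrefixed exposing (Entry, decoder, encoder)

import Bytes.Decode as Decode exposing (Decoder)
import Bytes.Encode as Encode exposing (Encoder)


type alias Entry =
    { title : String, subject : String }



-- Write the UTF-8 byte count as a uint8, then the string itself.


sizedString : String -> Encoder
sizedString str =
    Encode.sequence
        [ Encode.unsignedInt8 (Encode.getStringWidth str)
        , Encode.string str
        ]


encoder : Entry -> Encoder
encoder entry =
    Encode.sequence
        [ sizedString entry.title
        , sizedString entry.subject
        ]



-- Read the uint8 length first, then that many bytes as a string.


sizedStringDecoder : Decoder String
sizedStringDecoder =
    Decode.unsignedInt8 |> Decode.andThen Decode.string


decoder : Decoder Entry
decoder =
    Decode.map2 Entry sizedStringDecoder sizedStringDecoder
```

A uint8 length caps each string at 255 bytes. This sketch does not check that limit, so real code would need a wider length field or a validation step.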

Working with Bytes

The Bytes type is a sequence of bytes

Bytes is to binary serialization what String is to Json decoders (and to parsers in general)

Working with Bytes

The API looks a lot like Json decoding

There are primitives for

  • integers
  • floats
  • strings
  • Bytes

and combinators like map/map2/andThen
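
To make the resemblance concrete, here is the same record decoded from Json and from Bytes. The `Point` type and the binary layout (two big-endian 32-bit floats) are assumptions made for this example.

```elm
module Point exposing (Point, bytesDecoder, jsonDecoder)

import Bytes exposing (Endianness(..))
import Bytes.Decode
import Json.Decode


type alias Point =
    { x : Float, y : Float }



-- Json: fields are looked up by name.


jsonDecoder : Json.Decode.Decoder Point
jsonDecoder =
    Json.Decode.map2 Point
        (Json.Decode.field "x" Json.Decode.float)
        (Json.Decode.field "y" Json.Decode.float)



-- Bytes: fields are read in order, so there are no keys.


bytesDecoder : Bytes.Decode.Decoder Point
bytesDecoder =
    Bytes.Decode.map2 Point
        (Bytes.Decode.float32 BE)
        (Bytes.Decode.float32 BE)
```

The main difference is that the Bytes decoder depends on field order and width, so encoder and decoder have to agree on the layout up front.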

Bytes vs Json

So binary serialization is more compact. Should we always use it?

Bytes vs Json

type Posix
    = Posix Int


type alias Item =
    { title : String
    , link : String
    , media : String
    , dateTaken : Posix
    , description : String
    , published : Posix
    , author : String
    , authorId : String
    , tags : List String
    }
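
A Bytes decoder for this record might look like the sketch below. The wire layout is an assumption: every string and list is prefixed with a big-endian 32-bit count, and each `Posix` value is stored as a 32-bit integer. The helpers `string`, `posix`, `list`, and `andMap` are written for this example; elm/bytes itself only provides `map` through `map5`, so a record with nine fields needs something like `andMap`.

```elm
module ItemDecoder exposing (Item, Posix(..), item)

import Bytes exposing (Endianness(..))
import Bytes.Decode as Decode exposing (Decoder, Step(..))


type Posix
    = Posix Int


type alias Item =
    { title : String
    , link : String
    , media : String
    , dateTaken : Posix
    , description : String
    , published : Posix
    , author : String
    , authorId : String
    , tags : List String
    }



-- Assumed layout: 32-bit byte count, then UTF-8 data.


string : Decoder String
string =
    Decode.unsignedInt32 BE |> Decode.andThen Decode.string



-- Assumed layout: the timestamp as one 32-bit integer.


posix : Decoder Posix
posix =
    Decode.map Posix (Decode.unsignedInt32 BE)



-- Assumed layout: 32-bit element count, then the elements.


list : Decoder a -> Decoder (List a)
list decoder =
    Decode.unsignedInt32 BE
        |> Decode.andThen (\count -> Decode.loop ( count, [] ) (listStep decoder))


listStep : Decoder a -> ( Int, List a ) -> Decoder (Step ( Int, List a ) (List a))
listStep decoder ( remaining, acc ) =
    if remaining <= 0 then
        Decode.succeed (Done (List.reverse acc))

    else
        Decode.map (\x -> Loop ( remaining - 1, x :: acc )) decoder



-- Apply a decoded function to the next decoded argument.


andMap : Decoder a -> Decoder (a -> b) -> Decoder b
andMap argument function =
    Decode.map2 (\f a -> f a) function argument


item : Decoder Item
item =
    Decode.succeed Item
        |> andMap string
        |> andMap string
        |> andMap string
        |> andMap posix
        |> andMap string
        |> andMap posix
        |> andMap string
        |> andMap string
        |> andMap (list string)
```

Seven of the nine fields are strings, and the last one is a list of strings, so almost all of the work here is string decoding.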

Bytes vs Json

For this record, which is mostly strings, Json is 2 times faster!

  • number of bytes is similar
  • utf-8 decoding
  • string slicing

Bytes vs Json

type alias Vec3 =
    { x : Float, y : Float, z : Float }


type alias Triangle =
    { normal : Vec3, p1 : Vec3, p2 : Vec3, p3 : Vec3 }

Here, where every field is a number, Bytes are much faster

  • parsing performance
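
As a sketch of what the Bytes side looks like for this data, the decoders below assume that every coordinate is stored as a little-endian 32-bit float, in field order. That layout is an assumption for the example, not something the record types dictate.

```elm
module Triangle exposing (Triangle, Vec3, triangle, vec3)

import Bytes exposing (Endianness(..))
import Bytes.Decode as Decode exposing (Decoder)


type alias Vec3 =
    { x : Float, y : Float, z : Float }


type alias Triangle =
    { normal : Vec3, p1 : Vec3, p2 : Vec3, p3 : Vec3 }



-- Three consecutive 32-bit floats: x, y, z.


vec3 : Decoder Vec3
vec3 =
    Decode.map3 Vec3
        (Decode.float32 LE)
        (Decode.float32 LE)
        (Decode.float32 LE)



-- Four consecutive vectors: normal, p1, p2, p3.


triangle : Decoder Triangle
triangle =
    Decode.map4 Triangle vec3 vec3 vec3 vec3
```

Under this layout each triangle takes exactly 48 bytes (12 floats of 4 bytes each), and no text has to be parsed into numbers.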

Bytes vs Json

  • Performance gain is in decoding speed, not number of bytes sent
  • Bytes are faster for numbers
  • Json is faster for strings

Decoding Binary Files

Most file formats are binary encoded, and we can now read them from Elm.

Examples include zip, tar, png, mp3, and otf.

Decoding Binary Files

I decode font files.

Font files are segmented into tables.

Each table is still a list of fields, each with a size and a type.

A header table stores information about the other tables.
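
One entry of such a header table could be decoded as in the sketch below. It assumes an OpenType-style record (a 4-character tag, then a checksum, an offset, and a length, each an unsigned big-endian 32-bit integer); check the font specification before relying on this layout. `TableRecord` and `tableRecord` are names chosen for the example.

```elm
module TableDirectory exposing (TableRecord, tableRecord)

import Bytes exposing (Endianness(..))
import Bytes.Decode as Decode exposing (Decoder)


type alias TableRecord =
    { tag : String
    , checksum : Int
    , offset : Int
    , length : Int
    }



-- Assumed layout: 4-byte tag, then three unsigned 32-bit integers.


tableRecord : Decoder TableRecord
tableRecord =
    Decode.map4 TableRecord
        (Decode.string 4)
        (Decode.unsignedInt32 BE)
        (Decode.unsignedInt32 BE)
        (Decode.unsignedInt32 BE)
```

The `offset` and `length` fields are what make it possible to cut each table out of the file as its own `Bytes` value.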

Decoding Binary Files

Say we have tables A and B.

Table A contains a variable-length array, but table B contains the length of that array.

This is a circular dependency.

Decoding Binary Files

This is similar to decoding Json, but not the same. With Json

  • we decode in one pass
  • we decode everything

Decoding Binary Files

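Solution: decode in 2 passes

  • first pass: split the file into a Dict String Bytes, from table name to that table's raw bytes
  • second pass: create decoders for the individual tables
  • data dependencies are decoder arguments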
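
The two passes can be sketched as follows. The table names "A" and "B" and their layouts are hypothetical: table B holds a single 16-bit array length, and table A holds that many 16-bit integers. `slice` is a helper written for this example.

```elm
module TwoPass exposing (decodeTableA, decodeTableB, decodeTables, slice)

import Bytes exposing (Bytes, Endianness(..))
import Bytes.Decode as Decode exposing (Decoder, Step(..))
import Dict exposing (Dict)



-- First pass helper: skip `offset` bytes, then keep the next `length` bytes.


slice : Int -> Int -> Bytes -> Maybe Bytes
slice offset length file =
    Decode.decode
        (Decode.map2 (\_ table -> table)
            (Decode.bytes offset)
            (Decode.bytes length)
        )
        file



-- Table B (hypothetical layout): the array length as a 16-bit integer.


decodeTableB : Decoder Int
decodeTableB =
    Decode.unsignedInt16 BE



-- Table A (hypothetical layout): `count` 16-bit integers.
-- The data dependency on table B is an ordinary function argument.


decodeTableA : Int -> Decoder (List Int)
decodeTableA count =
    Decode.loop ( count, [] ) arrayStep


arrayStep : ( Int, List Int ) -> Decoder (Step ( Int, List Int ) (List Int))
arrayStep ( remaining, acc ) =
    if remaining <= 0 then
        Decode.succeed (Done (List.reverse acc))

    else
        Decode.map (\x -> Loop ( remaining - 1, x :: acc ))
            (Decode.unsignedInt16 BE)



-- Second pass: decode B first, then pass its result to A's decoder.


decodeTables : Dict String Bytes -> Maybe (List Int)
decodeTables tables =
    Dict.get "B" tables
        |> Maybe.andThen (Decode.decode decodeTableB)
        |> Maybe.andThen
            (\count ->
                Dict.get "A" tables
                    |> Maybe.andThen (Decode.decode (decodeTableA count))
            )
```

Because each table is already its own `Bytes` value, the tables can be decoded in whatever order the dependencies require, and tables that are not needed are never decoded.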

Conclusion

Bytes create many new possibilities

  • efficiently load numerical data from the backend
  • read, visualize, and manipulate new types of data stored in binary files