Working with Bytes

Binary Serialization & Elm

  • Introduction to binary serialization
  • Bytes vs Json
  • Decoding binary file formats

A byte is 8 bits of information

0 \cdot 2^0 + 1 \cdot 2^1 + 0 \cdot 2^2 + 1 \cdot 2^3 + 0 \cdot 2^4 + 1 \cdot 2^5 + 0 \cdot 2^6 + 0 \cdot 2^7
= 2 + 8 + 32 = 42

42 = 2 \cdot 16 + 10 \cdot 1 = 2A in hexadecimal (the digit A stands for 10)
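
The arithmetic above can be sketched in a few lines of Elm. The names `bitsToInt` and `fortyTwo` are made up for this illustration; the bits are listed least significant first, in the same order as the sum.

```elm
module ByteValue exposing (bitsToInt, fortyTwo)

-- Sum bit * 2^position, where position 0 is the least significant bit.


bitsToInt : List Int -> Int
bitsToInt bits =
    bits
        |> List.indexedMap (\position bit -> bit * 2 ^ position)
        |> List.sum



-- The eight bits of the byte from the example: 2 + 8 + 32 = 42


fortyTwo : Int
fortyTwo =
    bitsToInt [ 0, 1, 0, 1, 0, 1, 0, 0 ]
```

Elm also accepts hexadecimal literals, so `fortyTwo == 0x2A` holds.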

Bytes vs Json

Why Bytes?

Send fewer bytes over the wire?

Take the number `2019`

Format   Representation       Size
Json     2019                 4 bytes
Bytes    00000111 11100011    2 bytes
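
A minimal sketch of how the two sizes could be measured, assuming the elm/bytes and elm/json packages. The choice of a big-endian unsigned 16-bit integer is an assumption that matches the two bytes in the table; the names `asBytes`, `bytesSize` and `jsonSize` are illustrative.

```elm
module Size exposing (asBytes, bytesSize, jsonSize)

import Bytes exposing (Bytes, Endianness(..))
import Bytes.Encode as Encode
import Json.Encode


-- 2019 as an unsigned 16-bit integer, most significant byte first: 0x07 0xE3


asBytes : Bytes
asBytes =
    Encode.encode (Encode.unsignedInt16 BE 2019)



-- Width of the binary encoding: 2 bytes


bytesSize : Int
bytesSize =
    Bytes.width asBytes



-- Length of the Json text "2019": 4 characters, one byte each


jsonSize : Int
jsonSize =
    String.length (Json.Encode.encode 0 (Json.Encode.int 2019))
```

A fixed-width integer has a ceiling: an unsigned 16-bit value stops at 65535, so larger numbers need `unsignedInt32`, which spends 4 bytes on values of up to 10 decimal digits.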

The binary form is 50% smaller, and the gain grows for larger (i.e. longer) numbers

    {
        "title": "foo",
        "subject": "spam"
    }

Json stores a lot of structure: key names, {}, "", []

 3  f o o  4  s p a m
 03 666f6f 04 7370616d
Field            # of bytes    Type
titleLength      1             uint8
title            variable      string
subjectLength    1             uint8
subject          variable      string
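
The layout in the table could be produced with an encoder along these lines. This is a sketch assuming the elm/bytes package; `Message`, `lengthPrefixed` and `encodeMessage` are names invented for the example, and it assumes each string is shorter than 256 bytes so its length fits in a uint8.

```elm
module Message exposing (Message, encodeMessage)

import Bytes exposing (Bytes)
import Bytes.Encode as Encode exposing (Encoder)


type alias Message =
    { title : String, subject : String }



-- A string preceded by its UTF-8 byte length as a uint8.
-- Assumes the string is shorter than 256 bytes.


lengthPrefixed : String -> Encoder
lengthPrefixed string =
    Encode.sequence
        [ Encode.unsignedInt8 (Encode.getStringWidth string)
        , Encode.string string
        ]


encodeMessage : Message -> Bytes
encodeMessage message =
    Encode.encode
        (Encode.sequence
            [ lengthPrefixed message.title
            , lengthPrefixed message.subject
            ]
        )
```

For `{ title = "foo", subject = "spam" }` this yields 9 bytes, against 32 bytes for the minified Json text of the same record.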

The Bytes type is a sequence of bytes

Bytes is to binary serialization what String is to Json decoders (and parsers in general)

The API looks a lot like the Json decoding API

primitives for

  • integers
  • floats
  • strings
  • Bytes

and combinators like map/map2/andThen
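
A small sketch of the primitives and combinators working together, assuming the elm/bytes package. It reads back the length-prefixed title/subject layout from the table earlier; `Message`, `lengthPrefixedString` and `decodeMessage` are illustrative names.

```elm
module MessageDecoder exposing (Message, decodeMessage)

import Bytes exposing (Bytes)
import Bytes.Decode as Decode exposing (Decoder)


type alias Message =
    { title : String, subject : String }



-- Read a uint8 length, then that many bytes as a UTF-8 string.
-- andThen lets the second step depend on the value read by the first.


lengthPrefixedString : Decoder String
lengthPrefixedString =
    Decode.unsignedInt8
        |> Decode.andThen Decode.string



-- map2 runs two decoders in sequence and combines their results.


messageDecoder : Decoder Message
messageDecoder =
    Decode.map2 Message
        lengthPrefixedString
        lengthPrefixedString


decodeMessage : Bytes -> Maybe Message
decodeMessage bytes =
    Decode.decode messageDecoder bytes
```

One difference from Json decoding: `Decode.decode` returns a `Maybe` rather than a `Result`, so a failed decode carries no error message.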

So binary serialization is more compact; we should use it, right?

type Posix = Posix Int

type alias Item =
    { title : String
    , link : String
    , media : String
    , dateTaken : Posix
    , description : String
    , published : Posix
    , author : String
    , authorId : String
    , tags : List String
    }

For this record, Json is 2 times faster!

  • number of bytes is similar
  • UTF-8 decoding
  • string slicing

type alias Vec3 =
    { x : Float, y : Float, z : Float }

type alias Triangle =
    { normal : Vec3, p1 : Vec3, p2 : Vec3, p3 : Vec3 }
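
A decoder for these types might look like the sketch below. The binary layout is an assumption, since the slides do not spell it out: every coordinate is taken to be a little-endian 32-bit float, and the triangle count is passed in by the caller. The decoder names are illustrative.

```elm
module Triangle exposing (Triangle, Vec3, triangleDecoder, trianglesDecoder)

import Bytes exposing (Endianness(..))
import Bytes.Decode as Decode exposing (Decoder, Step(..))


type alias Vec3 =
    { x : Float, y : Float, z : Float }


type alias Triangle =
    { normal : Vec3, p1 : Vec3, p2 : Vec3, p3 : Vec3 }



-- Assumption: each coordinate is a little-endian 32-bit float.


vec3Decoder : Decoder Vec3
vec3Decoder =
    Decode.map3 Vec3
        (Decode.float32 LE)
        (Decode.float32 LE)
        (Decode.float32 LE)



-- Four vectors in a row: 4 * 3 * 4 = 48 bytes per triangle.


triangleDecoder : Decoder Triangle
triangleDecoder =
    Decode.map4 Triangle vec3Decoder vec3Decoder vec3Decoder vec3Decoder



-- Decode `count` triangles with Decode.loop, accumulating in reverse.


trianglesDecoder : Int -> Decoder (List Triangle)
trianglesDecoder count =
    Decode.loop ( count, [] ) trianglesStep


trianglesStep : ( Int, List Triangle ) -> Decoder (Step ( Int, List Triangle ) (List Triangle))
trianglesStep ( remaining, accum ) =
    if remaining <= 0 then
        Decode.succeed (Done (List.reverse accum))

    else
        Decode.map (\triangle -> Loop ( remaining - 1, triangle :: accum )) triangleDecoder
```

Every field is a fixed-width number, so no UTF-8 decoding or string handling is involved.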

Here bytes are much faster

  • parsing performance

  • Performance gain is in decoding speed, not number of bytes sent
  • Bytes are faster for numbers
  • Json is faster for strings

Decoding Binary Files

Most file types are binary encoded. We can now use them from Elm.

Examples include zip, tar, png, mp3, otf.

I decode font files because 


font files are segmented into tables

tables are still lists of fields, sizes, types


header table stores info about tables

we have table A and B

table A contains a variable-length array, but table B contains the length of that array


circular dependency

This is similar to Json decoding, but not the same. With Json

  • we decode in one pass
  • we decode everything

solution: decode in 2 passes

Dict String Bytes

create decoders for individual tables

data dependencies are decoder arguments
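
The two passes could be sketched as follows, assuming the elm/bytes package. The container layout is hypothetical and much simpler than a real font file: a uint8 table count, then for each table a 4-byte tag, a uint16 size and the table content. The tags `"tabA"` and `"tabB"` and all function names are invented for the example.

```elm
module TwoPass exposing (decodeTableA, splitTables)

import Bytes exposing (Bytes, Endianness(..))
import Bytes.Decode as Decode exposing (Decoder, Step(..))
import Dict exposing (Dict)


-- PASS 1: split the file into named chunks of raw bytes.
-- Hypothetical layout: uint8 table count, then for each table
-- a 4-byte tag, a uint16 size, and `size` bytes of content.


splitTables : Bytes -> Maybe (Dict String Bytes)
splitTables file =
    Decode.decode
        (Decode.unsignedInt8
            |> Decode.andThen (\count -> repeat count tableEntry)
            |> Decode.map Dict.fromList
        )
        file


tableEntry : Decoder ( String, Bytes )
tableEntry =
    Decode.map2 Tuple.pair
        (Decode.string 4)
        (Decode.unsignedInt16 BE |> Decode.andThen Decode.bytes)



-- PASS 2: table decoders take their data dependencies as arguments.
-- Table B holds the array length, table A holds the array itself.


tableB : Decoder Int
tableB =
    Decode.unsignedInt16 BE


tableA : Int -> Decoder (List Int)
tableA arrayLength =
    repeat arrayLength (Decode.unsignedInt16 BE)


decodeTableA : Dict String Bytes -> Maybe (List Int)
decodeTableA tables =
    Dict.get "tabB" tables
        |> Maybe.andThen (Decode.decode tableB)
        |> Maybe.andThen
            (\arrayLength ->
                Dict.get "tabA" tables
                    |> Maybe.andThen (Decode.decode (tableA arrayLength))
            )



-- Run a decoder `count` times and collect the results in order.


repeat : Int -> Decoder a -> Decoder (List a)
repeat count decoder =
    Decode.loop ( count, [] )
        (\( remaining, accum ) ->
            if remaining <= 0 then
                Decode.succeed (Done (List.reverse accum))

            else
                Decode.map (\item -> Loop ( remaining - 1, item :: accum )) decoder
        )
```

Because the first pass only slices out raw `Bytes`, the order of the tables in the file no longer matters: table B can be decoded first even if it comes after table A.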


Bytes create many new possibilities


  • efficiently load numerical data from the backend
  • read, visualize and manipulate new types of data stored in binary files

By Folkert de Vries

