Working with Bytes

Binary Serialization & Elm

Core Idea

Sacrifice readability for compactness and speed

Bytes 101

  • A bit is a 0 or 1
  • A byte is 8 consecutive bits, e.g. 01101100 (108 in decimal)

Conversion between decimal and binary

Format  Representation         Size
JSON    "2019"                 4 bytes
Bytes   0000 0111 1110 0011    2 bytes
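
The size difference can be checked with elm/bytes. This is a minimal sketch, assuming a 16-bit big-endian unsigned integer is an acceptable encoding for the year; the module and value names are illustrative and not from the slides.

```elm
module YearSize exposing (binarySize, decodeYear, textSize)

import Bytes exposing (Bytes, Endianness(..))
import Bytes.Decode as Decode
import Bytes.Encode as Encode


-- 2019 written as text: one byte per ASCII digit.
yearAsText : Bytes
yearAsText =
    Encode.encode (Encode.string "2019")


-- 2019 as a 16-bit unsigned integer: 0000 0111 1110 0011.
yearAsInt : Bytes
yearAsInt =
    Encode.encode (Encode.unsignedInt16 BE 2019)


textSize : Int
textSize =
    Bytes.width yearAsText


binarySize : Int
binarySize =
    Bytes.width yearAsInt


-- Reading the integer back needs no digit parsing.
decodeYear : Maybe Int
decodeYear =
    Decode.decode (Decode.unsignedInt16 BE) yearAsInt
```

Here textSize is 4, binarySize is 2, and decodeYear is Just 2019.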

Bytes are more compact

And faster to decode

Format  Representation         Decoding work
JSON    "2019"                 parse 4 digits; arithmetic
Bytes   0000 0111 1110 0011    just put it in memory

API Overview

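import Bytes exposing (Endianness(..))
import Bytes.Decode as Decode exposing (Decoder)

-- Reads three little-endian 32-bit floats and passes them to vec3.
-- andMap is a helper that is not part of elm/bytes; Vec3 and vec3
-- are assumed to come from a vector library.
decodeVec3 : Decoder Vec3
decodeVec3 =
    Decode.succeed vec3
        |> andMap (Decode.float32 LE)
        |> andMap (Decode.float32 LE)
        |> andMap (Decode.float32 LE)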
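
For reference, a self-contained version of this decoder might look as follows. It is a sketch under assumptions: the record-based Vec3 and vec3 stand in for whatever vector library the slide used, andMap is defined locally with Decode.map2, and encodeVec3 and roundTrip are hypothetical additions used only to exercise the decoder.

```elm
module Vec3Codec exposing (Vec3, decodeVec3, encodeVec3, roundTrip)

import Bytes exposing (Endianness(..))
import Bytes.Decode as Decode exposing (Decoder)
import Bytes.Encode as Encode exposing (Encoder)


-- Stand-in for a vector library's type (an assumption, not from the slides).
type alias Vec3 =
    { x : Float, y : Float, z : Float }


vec3 : Float -> Float -> Float -> Vec3
vec3 x y z =
    { x = x, y = y, z = z }


-- Applicative-style helper: decode a function, then its next argument.
andMap : Decoder a -> Decoder (a -> b) -> Decoder b
andMap argument function =
    Decode.map2 (<|) function argument


decodeVec3 : Decoder Vec3
decodeVec3 =
    Decode.succeed vec3
        |> andMap (Decode.float32 LE)
        |> andMap (Decode.float32 LE)
        |> andMap (Decode.float32 LE)


-- Writes x, y, z in the same order and endianness the decoder expects.
encodeVec3 : Vec3 -> Encoder
encodeVec3 v =
    Encode.sequence
        [ Encode.float32 LE v.x
        , Encode.float32 LE v.y
        , Encode.float32 LE v.z
        ]


-- 12 bytes in, the same vector out (values chosen to be exact in float32).
roundTrip : Maybe Vec3
roundTrip =
    Decode.decode decodeVec3 (Encode.encode (encodeVec3 (vec3 1 2.5 0.25)))
```

The encoder and decoder have to agree on field order and endianness; nothing in the bytes themselves records either.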

type Bytes

Encode.unsignedInt8 : Int -> Encoder
Encode.string : String -> Encoder

Decode.unsignedInt8 : Decoder Int
Decode.string : Int -> Decoder String

Decode.decode : Decoder a -> Bytes -> Maybe a
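
A short round trip shows how these functions fit together. This is a sketch only; the module and value names are illustrative.

```elm
module RoundTrip exposing (byteBack, tooShort, wordBack)

import Bytes exposing (Bytes)
import Bytes.Decode as Decode
import Bytes.Encode as Encode


-- 108 is 01101100 in binary and fits in one unsigned byte.
oneByte : Bytes
oneByte =
    Encode.encode (Encode.unsignedInt8 108)


byteBack : Maybe Int
byteBack =
    Decode.decode Decode.unsignedInt8 oneByte


-- Strings carry no length of their own, so the decoder is told the width.
word : Bytes
word =
    Encode.encode (Encode.string "foo")


wordBack : Maybe String
wordBack =
    Decode.decode (Decode.string 3) word


-- Asking for more bytes than are available makes decoding fail.
tooShort : Maybe String
tooShort =
    Decode.decode (Decode.string 4) word
```

byteBack is Just 108, wordBack is Just "foo", and tooShort is Nothing. The Int argument of Decode.string is a width in bytes, which is why a length usually has to be stored next to the string.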

Compaction by

extracting structure

{ 
    "title": "foo",
    "subject": "spam"
}

 3  f o o  4  s p a m 
 03 666f6f 04 7370616d

Field          # of bytes  Type
titleLength    1           uint8
title          variable    string
subjectLength  1           uint8
subject        variable    string
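
The layout in this table can be sketched with elm/bytes as follows. The Article record and all function names are hypothetical; the one-byte length prefix follows the table, which limits each string to 255 bytes.

```elm
module Article exposing (Article, decoder, encode, example, exampleWidth)

import Bytes exposing (Bytes)
import Bytes.Decode as Decode exposing (Decoder)
import Bytes.Encode as Encode exposing (Encoder)


type alias Article =
    { title : String, subject : String }


-- A string preceded by its width in bytes, stored as a uint8.
sizedString : String -> Encoder
sizedString text =
    Encode.sequence
        [ Encode.unsignedInt8 (Encode.getStringWidth text)
        , Encode.string text
        ]


encode : Article -> Encoder
encode article =
    Encode.sequence
        [ sizedString article.title
        , sizedString article.subject
        ]


-- Read the length byte first, then exactly that many bytes of text.
sizedStringDecoder : Decoder String
sizedStringDecoder =
    Decode.unsignedInt8 |> Decode.andThen Decode.string


decoder : Decoder Article
decoder =
    Decode.map2 Article sizedStringDecoder sizedStringDecoder


-- Encodes to 03 666f6f 04 7370616d.
example : Bytes
example =
    Encode.encode (encode { title = "foo", subject = "spam" })


exampleWidth : Int
exampleWidth =
    Bytes.width example
```

exampleWidth is 9. The field names "title" and "subject" never appear in the output; they live only in the encoder and decoder, which is where the compaction comes from.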

An API Response

type alias Item =
    { title : String
    , link : String
    , media : String
    , dateTaken : Int
    , description : String
    , published : Int
    , author : String
    , authorId : String
    , tags : List String
    }
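
Most fields of Item are strings or integers, but tags is a list, so its length has to be stored as well. The sketch below shows one way to do that with Decode.loop. The uint16 item count and uint8 string widths are assumptions, not the sizes used in the talk, and all names are illustrative.

```elm
module Tags exposing (decodeTags, encodeTags, roundTrip)

import Bytes exposing (Endianness(..))
import Bytes.Decode as Decode exposing (Decoder, Step(..))
import Bytes.Encode as Encode exposing (Encoder)


-- A string preceded by its width in bytes (uint8, so at most 255 bytes).
encodeSizedString : String -> Encoder
encodeSizedString text =
    Encode.sequence
        [ Encode.unsignedInt8 (Encode.getStringWidth text)
        , Encode.string text
        ]


decodeSizedString : Decoder String
decodeSizedString =
    Decode.unsignedInt8 |> Decode.andThen Decode.string


-- The list is preceded by its item count (uint16, an assumed limit).
encodeTags : List String -> Encoder
encodeTags tags =
    Encode.sequence
        (Encode.unsignedInt16 BE (List.length tags)
            :: List.map encodeSizedString tags
        )


decodeTags : Decoder (List String)
decodeTags =
    Decode.unsignedInt16 BE
        |> Decode.andThen (\count -> Decode.loop ( count, [] ) tagStep)


-- One loop iteration: stop when no items remain, otherwise decode one more.
tagStep : ( Int, List String ) -> Decoder (Step ( Int, List String ) (List String))
tagStep ( remaining, tags ) =
    if remaining <= 0 then
        Decode.succeed (Done (List.reverse tags))

    else
        Decode.map (\tag -> Loop ( remaining - 1, tag :: tags )) decodeSizedString


roundTrip : Maybe (List String)
roundTrip =
    Decode.decode decodeTags (Encode.encode (encodeTags [ "elm", "bytes" ]))
```

roundTrip is Just [ "elm", "bytes" ].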

Compaction by

extracting structure

        Raw bytes   Zipped bytes
JSON    864         437
Bytes   732         363
Saving  15% less    17% less

But decoding is slower

[Benchmark charts: list of 100 floats; list of 1000 floats]

Results

  • Compaction and speed depend heavily on the specific data
  • Consider the maintenance cost of binary serialization
  • Schema technologies (like protobuf) still run into these issues

Case 2: Base64

data:text/plain;base64,SGVsbG8sIFdvcmxkIQ%3D%3D

A binary-to-text encoding, used for

  • inlining small files in stylesheets
  • creating images in Elm

Compaction through cleverness

6 bits are enough to distinguish 64 characters (2^6 = 64)

but we can only write/read whole bytes (8 bits)

we will not waste 2 bits per character!

Encode.unsignedInt8 : Int -> Encoder

Solution: store 4 Base64 digits (4 × 6 = 24 bits) in 3 bytes

Character  Index  6 bits
E          4      00 0100
V          21     01 0101
A          0      00 0000
N          13     00 1101
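
The character-to-index step can be sketched as a small lookup. This assumes the standard Base64 alphabet (A-Z, a-z, 0-9, +, /); the function names are illustrative.

```elm
module Base64Index exposing (charToIndex, evan)


-- Map a Base64 character to its 6-bit value, or Nothing if it is not
-- part of the standard alphabet.
charToIndex : Char -> Maybe Int
charToIndex char =
    let
        code =
            Char.toCode char
    in
    if Char.isUpper char then
        -- 'A' (code 65) maps to 0
        Just (code - 65)

    else if Char.isLower char then
        -- 'a' (code 97) maps to 26
        Just (code - 97 + 26)

    else if Char.isDigit char then
        -- '0' (code 48) maps to 52
        Just (code - 48 + 52)

    else if char == '+' then
        Just 62

    else if char == '/' then
        Just 63

    else
        Nothing


evan : List (Maybe Int)
evan =
    List.map charToIndex (String.toList "EVAN")
```

evan is [ Just 4, Just 21, Just 0, Just 13 ], matching the indices above.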

use bit shifts to line up

E << 0    00000000 00000000 00000100
V << 6    00000000 00000101 01000000
A << 12   00000000 00000000 00000000
N << 18   00110100 00000000 00000000

then bitwise or to combine

00110100 00000101 01000100
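
The shift-and-combine step can be sketched with the Bitwise module from elm/core. This follows the layout shown on the slide, where the first value stays in the lowest 6 bits; the function names are illustrative and this is not necessarily how the talk's own code is written.

```elm
module Sextets exposing (combine, pack, packed)

import Bitwise
import Bytes exposing (Bytes)
import Bytes.Encode as Encode exposing (Encoder)


-- Combine four 6-bit values into one 24-bit integer, using the slide's
-- layout: the first value stays in the lowest 6 bits, the fourth ends
-- up in the highest 6 bits.
combine : Int -> Int -> Int -> Int -> Int
combine a b c d =
    a
        |> Bitwise.or (Bitwise.shiftLeftBy 6 b)
        |> Bitwise.or (Bitwise.shiftLeftBy 12 c)
        |> Bitwise.or (Bitwise.shiftLeftBy 18 d)


-- Write the 24 bits as three whole bytes, most significant byte first.
pack : Int -> Int -> Int -> Int -> Encoder
pack a b c d =
    let
        bits =
            combine a b c d
    in
    Encode.sequence
        [ Encode.unsignedInt8 (Bitwise.shiftRightZfBy 16 bits)
        , Encode.unsignedInt8 (Bitwise.and 0xFF (Bitwise.shiftRightZfBy 8 bits))
        , Encode.unsignedInt8 (Bitwise.and 0xFF bits)
        ]


-- E = 4, V = 21, A = 0, N = 13 gives 00110100 00000101 01000100.
packed : Bytes
packed =
    Encode.encode (pack 4 21 0 13)
```

combine 4 21 0 13 is 3409220 (0x340544), and packed is 3 bytes wide. Standard Base64 (RFC 4648) places the first character in the highest bits instead; swapping the shift amounts gives that order.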

Efficiency by Benchmark

You are responsible for performance

why I like bytes

learning while writing meaningful code

and you should too

  • Bytes enable new things
  • A nice way to learn some fundamental CS
  • Lots of low-hanging fruit

Thank You

Folkert de Vries

@folkertdev