Working with Bytes

Binary Serialization & Elm

Core Idea

Sacrifice readability for compactness and speed

Bytes 101

A bit is a 0 or 1
A byte is 8 consecutive bits, e.g. 01101100

Conversion between decimal and binary
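A minimal sketch of the conversion in Elm, assuming non-negative integers. The names toBinary and fromBinary are illustrative and not part of any package.

```elm
module BinaryDigits exposing (fromBinary, toBinary)

-- Convert a non-negative Int to its binary digits, most significant first.


toBinary : Int -> String
toBinary n =
    if n < 2 then
        String.fromInt n

    else
        toBinary (n // 2) ++ String.fromInt (modBy 2 n)



-- Read a string of 0s and 1s back into an Int.
-- Any character other than '1' is treated as 0.


fromBinary : String -> Int
fromBinary digits =
    String.foldl
        (\digit total ->
            total * 2 + (if digit == '1' then 1 else 0)
        )
        0
        digits
```

For example, toBinary 2019 gives "11111100011" and fromBinary "01101100" gives 108.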

JSON	"2019"	4 bytes
bytes	0000 0111 1110 0011	2 bytes

Bytes are more compact

And faster to decode

JSON	"2019"	parse 4 digits; arithmetic
bytes	0000 0111 1110 0011	just put it in memory
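The size difference can be checked with the elm/bytes package. This is a sketch; yearAsText and yearAsBytes are illustrative names, and big-endian order is chosen so the bytes match the bit pattern shown above.

```elm
module YearSize exposing (yearAsBytes, yearAsText)

import Bytes exposing (Bytes, Endianness(..))
import Bytes.Encode as Encode


-- The four digit characters, one UTF-8 byte each.


yearAsText : Bytes
yearAsText =
    Encode.encode (Encode.string "2019")



-- The number itself as an unsigned 16-bit integer: 0x07 0xE3.


yearAsBytes : Bytes
yearAsBytes =
    Encode.encode (Encode.unsignedInt16 BE 2019)
```

Bytes.width yearAsText is 4, while Bytes.width yearAsBytes is 2.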

API Overview

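-- andMap is not part of elm/bytes; one possible definition, built on Decode.map2
andMap : Decoder a -> Decoder (a -> b) -> Decoder b
andMap argument function =
    Decode.map2 (<|) function argument

-- three little-endian 32-bit floats become one Vec3
decodeVec3 : Decoder Vec3
decodeVec3 =
    Decode.succeed vec3
        |> andMap (Decode.float32 LE)
        |> andMap (Decode.float32 LE)
        |> andMap (Decode.float32 LE)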

type Bytes

Encode.unsignedInt8 : Int -> Encoder
Encode.string : String -> Encoder

Decode.unsignedInt8 : Decoder Int
Decode.string : Int -> Decoder String

Decode.decode : Decoder a -> Bytes -> Maybe a
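A small round trip through these functions, as a sketch assuming the elm/bytes package. The names encoded, pairDecoder and tooLong are illustrative.

```elm
module RoundTrip exposing (encoded, pairDecoder, tooLong)

import Bytes exposing (Bytes)
import Bytes.Decode as Decode exposing (Decoder)
import Bytes.Encode as Encode


-- One byte holding 7, followed by the three UTF-8 bytes of "elm".


encoded : Bytes
encoded =
    Encode.encode
        (Encode.sequence
            [ Encode.unsignedInt8 7
            , Encode.string "elm"
            ]
        )



-- Read one byte as an Int, then the next three bytes as a String.


pairDecoder : Decoder ( Int, String )
pairDecoder =
    Decode.map2 Tuple.pair Decode.unsignedInt8 (Decode.string 3)



-- Asking for more bytes than are available makes decoding fail.


tooLong : Maybe String
tooLong =
    Decode.decode (Decode.string 10) encoded
```

Decode.decode takes the decoder first and the bytes second: Decode.decode pairDecoder encoded gives Just ( 7, "elm" ), and tooLong is Nothing.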

Compaction by

extracting structure

{ 
    "title": "foo",
    "subject": "spam"
}

 3  f o o  4  s p a m 
 03 666f6f 04 7370616d

Field	# of bytes	type
titleLength	1	uint8
title	variable	string
subjectLength	1	uint8
subject	variable	string
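The layout in this table could be written with elm/bytes roughly as follows. This is a sketch: the Header record and the helper names are hypothetical, and the one-byte length prefix assumes every string is shorter than 256 bytes.

```elm
module Header exposing (Header, decodeHeader, encodeHeader)

import Bytes.Decode as Decode exposing (Decoder)
import Bytes.Encode as Encode exposing (Encoder)


type alias Header =
    { title : String
    , subject : String
    }



-- uint8 length prefix, then the UTF-8 bytes (assumes width < 256)


sizedString : String -> Encoder
sizedString text =
    Encode.sequence
        [ Encode.unsignedInt8 (Encode.getStringWidth text)
        , Encode.string text
        ]


encodeHeader : Header -> Encoder
encodeHeader header =
    Encode.sequence
        [ sizedString header.title
        , sizedString header.subject
        ]



-- Read the length byte, then that many bytes as a string.


decodeSizedString : Decoder String
decodeSizedString =
    Decode.unsignedInt8 |> Decode.andThen Decode.string


decodeHeader : Decoder Header
decodeHeader =
    Decode.map2 Header decodeSizedString decodeSizedString
```

Encoding { title = "foo", subject = "spam" } this way produces the 9 bytes 03 666f6f 04 7370616d, with no field names stored.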

An API Response

type alias Item =
    { title : String
    , link : String
    , media : String
    , dateTaken : Int
    , description : String
    , published : Int
    , author : String
    , authorId : String
    , tags : List String
    }

Compaction by

extracting structure

Format	Raw size (bytes)	Zipped size (bytes)
JSON	864	437
Bytes	732	363
Saving	15% less	17% less

But decoding is slower

List of 100 floats

List of 1000 floats
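The benchmark code itself is not part of these slides. As a sketch of what decoding a list of floats can look like with elm/bytes, here is a length-prefixed list read with Decode.loop; the 32-bit count prefix, the 64-bit big-endian floats and all names are assumptions.

```elm
module FloatList exposing (decodeFloats, encodeFloats)

import Bytes exposing (Endianness(..))
import Bytes.Decode as Decode exposing (Decoder, Step(..))
import Bytes.Encode as Encode exposing (Encoder)


-- Element count as a uint32, followed by each float as 8 bytes.


encodeFloats : List Float -> Encoder
encodeFloats floats =
    Encode.sequence
        (Encode.unsignedInt32 BE (List.length floats)
            :: List.map (Encode.float64 BE) floats
        )



-- Read the count, then loop that many times.


decodeFloats : Decoder (List Float)
decodeFloats =
    Decode.unsignedInt32 BE
        |> Decode.andThen (\count -> Decode.loop ( count, [] ) step)



-- Each step decodes one float and prepends it; the list is reversed at the end.


step : ( Int, List Float ) -> Decoder (Step ( Int, List Float ) (List Float))
step ( remaining, acc ) =
    if remaining <= 0 then
        Decode.succeed (Done (List.reverse acc))

    else
        Decode.map
            (\value -> Loop ( remaining - 1, value :: acc ))
            (Decode.float64 BE)
```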

Results

Compaction and speed depend heavily on the specific data

Consider the maintenance cost of binary serialization

Schema technologies (like protobuf) still run into these issues

Case 2: Base64

data:text/plain;base64,SGVsbG8sIFdvcmxkIQ%3D%3D

A binary-to-text conversion method

Used for

inlining small files in stylesheets
creating images in Elm

Compaction through cleverness

6 bits are enough to store 64 distinct characters

But we can only write/read whole bytes (8 bits)

We will not waste 2 bits per character!

Encode.unsignedInt8 : Int -> Encoder

Solution: store 4 characters (4 × 6 = 24 bits) in 3 bytes

E	4	00 0100
V	21	01 0101
A	0	00 0000
N	13	00 1101
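The numbers in this table are positions in the standard Base64 alphabet (RFC 4648). A minimal lookup sketch; charToSextet is an illustrative name, and a linear search is used for clarity rather than speed.

```elm
module Sextet exposing (charToSextet)

-- The standard Base64 alphabet: index 0 is 'A', index 63 is '/'.


alphabet : String
alphabet =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"



-- Position of a character in the alphabet, or Nothing if it is not a Base64 character.


charToSextet : Char -> Maybe Int
charToSextet char =
    String.indexes (String.fromChar char) alphabet
        |> List.head
```

List.map charToSextet [ 'E', 'V', 'A', 'N' ] gives [ Just 4, Just 21, Just 0, Just 13 ].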

Use bit shifts to line up

E		00000000 00000000 00000100
V		00000000 00000101 01000000
A		00000000 00000000 00000000
N		00110100 00000000 00000000

Then bitwise OR to combine

00110100 00000101 01000100
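The shift-and-combine step can be sketched with Elm's Bitwise module. The function names are illustrative. The code follows the layout shown here, with the first character in the lowest six bits; RFC 4648 Base64 puts the first character in the highest six bits, so a real decoder would reverse the shift amounts.

```elm
module Pack exposing (encodeChunk, pack)

import Bitwise
import Bytes.Encode as Encode exposing (Encoder)


-- Combine four 6-bit values into one 24-bit Int.
-- The first value stays in the lowest six bits, as in the table above.


pack : Int -> Int -> Int -> Int -> Int
pack a b c d =
    a
        |> Bitwise.or (Bitwise.shiftLeftBy 6 b)
        |> Bitwise.or (Bitwise.shiftLeftBy 12 c)
        |> Bitwise.or (Bitwise.shiftLeftBy 18 d)



-- Write the 24-bit value as three whole bytes, most significant byte first.


encodeChunk : Int -> Encoder
encodeChunk bits =
    Encode.sequence
        [ Encode.unsignedInt8 (Bitwise.and 0xFF (Bitwise.shiftRightBy 16 bits))
        , Encode.unsignedInt8 (Bitwise.and 0xFF (Bitwise.shiftRightBy 8 bits))
        , Encode.unsignedInt8 (Bitwise.and 0xFF bits)
        ]
```

pack 4 21 0 13 gives 0x340544, which is 00110100 00000101 01000100 in binary, and encodeChunk writes it as exactly three bytes.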

Efficiency by Benchmark

You are responsible for performance

Why I like bytes

Learning while writing meaningful code

And you should too

Bytes enable new things

A nice way to learn some fundamental CS

Lots of low-hanging fruit

Thank You

Folkert de Vries

@folkertdev