Understanding compressions

GZIP & Brotli

Yatharth Khatri

Design Systems and Frontend Architect

Classical Pianist

GitHub:    yatharthk

Twitter:    yatharthkhatri

TAKE-AWAYS:

  • Understand the GZip magic
  • Understand the data compression algorithms at work
  • What is Brotli, and what makes it more powerful than GZip?
  • Should we stop using GZip and use Brotli instead for our web apps?

Why understand compression? 🙄

It's always nice to understand what happens under the hood of the tools and tech we use every day.

You cannot build better if you don't know what has already been built.

"You cannot understand everything but you should always try to understand the system."

 

- Ryan Dahl, Creator of Node.js

What is GZip?

GZip is a lossless data-compression tool.

(and it's not new)

A Rough Timeline

WEB 2.0

What is GZip made up of?

Uses an algorithm called "DEFLATE", which combines two techniques:

LZ77

(Invented by Lempel and Ziv in 1977)

Huffman Coding

(Invented by David Huffman in 1952)

Updated Timeline

WEB 2.0

How does GZip/Deflate work?

function add(number1, number2) {
  return number1 + number2;
}

function subtract(number1, number2) {
  return number1 - number2;
}

function multiply(number1, number2) {
  return number1 * number2;
}

function divide(number1, number2) {
  return number1 / number2;
}

export default {
  add,
  subtract,
  multiply,
  divide
};

LZ77

Huffman codes

GZip Code

Server

Client (e.g. browser)

GZIP

LZ77 Algorithm

Mr. Buffer

Mr. Sliding Window

LZ77 Algorithm

Mr. Sliding Window

Smart, and does all the logical, heavy work.

LZ77 Algorithm

Mr. Sliding Window

32 KB

32 KB capacity box

Bag for backup

Task: Reduce the text as much as possible without losing any data

Came up with a solution and asked for:

  1. A 32 KB capacity box
  2. A bag to put text into once it exceeds 32 KB
  3. An assistant

The solution:

  1. When anything new arrives, check whether it's already in the box.
  2. If not, put it into the box.
  3. If it is, emit a back-reference instead of the text itself, in the form
    <offset from current position, number of chars to copy>
  4. If the box fills up, start moving text into the bag, oldest first.
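
The steps above can be sketched in a few lines of JavaScript. This is only an illustration under stated assumptions, not the real DEFLATE encoder: the names `WINDOW_SIZE`, `MIN_MATCH`, `lz77Compress` and `lz77Decompress` are made up for this sketch, the token objects are a hypothetical format, and the brute-force search is far slower than what production encoders do.

```javascript
// LZ77-style sketch: illustrative only, not compatible with real DEFLATE output.
const WINDOW_SIZE = 32 * 1024; // the "32 KB box" (assumed constant name)
const MIN_MATCH = 3;           // assumption: shorter repeats stay as literals

function lz77Compress(text) {
  const tokens = [];
  let pos = 0;
  while (pos < text.length) {
    const windowStart = Math.max(0, pos - WINDOW_SIZE);
    let bestLength = 0;
    let bestOffset = 0;
    // Brute-force search for the longest earlier match inside the window.
    for (let start = windowStart; start < pos; start++) {
      let length = 0;
      while (
        pos + length < text.length &&
        text[start + length] === text[pos + length]
      ) {
        length++;
      }
      if (length > bestLength) {
        bestLength = length;
        bestOffset = pos - start;
      }
    }
    if (bestLength >= MIN_MATCH) {
      // Already in the box: emit <offset, length> instead of the text.
      tokens.push({ offset: bestOffset, length: bestLength });
      pos += bestLength;
    } else {
      // Not in the box: emit the character itself.
      tokens.push({ literal: text[pos] });
      pos += 1;
    }
  }
  return tokens;
}

function lz77Decompress(tokens) {
  let output = "";
  for (const token of tokens) {
    if ("literal" in token) {
      output += token.literal;
      continue;
    }
    const start = output.length - token.offset;
    // Copy one character at a time so overlapping references also work.
    for (let i = 0; i < token.length; i++) {
      output += output[start + i];
    }
  }
  return output;
}

const tokens = lz77Compress("ABCABC");
console.log(tokens.length);                       // 4 (three literals + one back-reference)
console.log(lz77Decompress(tokens) === "ABCABC"); // true
```

Nothing is ever explicitly "moved to the bag" here: text older than `WINDOW_SIZE` simply stops being searched, which has the same effect.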

LZ77 Algorithm

Mr. Buffer

Mr. Sliding Window

Can only read the text and pass it along...

LZ77 Algorithm

"ABCABC"

(text to be compressed)

LZ77 Algorithm

Mr. Buffer

Mr. Sliding Window

ABCABC

A B C <3, 3>

<3, 3> = go 3 chars back and copy 3 chars

LZ77 Algorithm

Did you notice the limitation?

If a character or phrase does not appear in the last 32 KB of data (the sliding window), it cannot be back-referenced.
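
A tiny sketch of that limitation, using small window sizes so the effect is easy to see. The helper name `findInWindow` and the 64 and 32 character windows are assumptions for illustration. Real GZip uses a 32 KB window.

```javascript
// Returns a back-reference { offset, length } for `phrase` if its most recent
// occurrence starts inside the last `windowSize` characters of `history`,
// otherwise null (the phrase would have to be sent again as plain text).
function findInWindow(history, phrase, windowSize) {
  const windowStart = Math.max(0, history.length - windowSize);
  const index = history.lastIndexOf(phrase);
  if (index === -1 || index < windowStart) return null;
  return { offset: history.length - index, length: phrase.length };
}

const history = "function" + "x".repeat(40); // 48 characters seen so far

console.log(findInWindow(history, "function", 64)); // { offset: 48, length: 8 }
console.log(findInWindow(history, "function", 32)); // null
```

With the larger window the earlier "function" is still reachable. With the smaller one it has already slid out of view.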

Huffman Codes

(A form of variable-length encoding.)

Huffman coding replaces the fixed 8 bits per character with shorter codes for frequent characters.

For example:

AAABCAD

(7 B or 56 bits)

AAABCAD

(~50 bits)

Huffman Codes

Let's do some simple math

AAABCAD

(7 B or 56 bits)

= 7 chars = 7 * 8 bits = 56 bits

Characters have a fixed size in this encoding (8 bits for each char)

Give the shortest possible bit codes to the most frequent characters and the longest to the least frequent
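
One way to assign such codes is the classic Huffman construction: repeatedly merge the two least frequent entries into a tree, then read each code off the path from the root. The sketch below is a minimal version of that idea. The function name `buildHuffmanCodes` is hypothetical, and a re-sorted array stands in for a proper priority queue. The exact bit patterns depend on how ties are broken, so only the code lengths are meaningful.

```javascript
// Builds a Huffman code table (char -> bit string) for `text`.
function buildHuffmanCodes(text) {
  const counts = new Map();
  for (const char of text) counts.set(char, (counts.get(char) || 0) + 1);

  const nodes = [...counts].map(([char, weight]) => ({ char, weight }));
  if (nodes.length === 0) return {};
  if (nodes.length === 1) return { [nodes[0].char]: "0" };

  // Merge the two lightest nodes until a single tree remains.
  while (nodes.length > 1) {
    nodes.sort((a, b) => a.weight - b.weight);
    const [left, right] = nodes.splice(0, 2);
    nodes.push({ weight: left.weight + right.weight, left, right });
  }

  // Walk the tree: going left appends "0", going right appends "1".
  const codes = {};
  const assign = (node, prefix) => {
    if (node.char !== undefined) {
      codes[node.char] = prefix;
      return;
    }
    assign(node.left, prefix + "0");
    assign(node.right, prefix + "1");
  };
  assign(nodes[0], "");
  return codes;
}

const text = "AAABCAD";
const codes = buildHuffmanCodes(text);
const totalBits = [...text].reduce((sum, char) => sum + codes[char].length, 0);
console.log(codes);
console.log(`${totalBits} bits instead of ${text.length * 8}`);
```

The most frequent character, A, ends up with a 1-bit code, while the rare ones get 2 or 3 bits.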

Huffman Codes

AAABCAD

Dictionary

  A = 0
  B = 10
  C = 110
  D = 111

= 1 + 1 + 1 + 2 + 3 + 1 + 3 = 12 bits

But we need to send the dictionary as well, 😯

which is 41 bits

Total = 53 bits

< 56 bits
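
To see why the decoder can split the bit stream without separators, here is a small encode/decode sketch. It assumes a prefix-free table (A = 0, B = 10, C = 110, D = 111), meaning no code is the beginning of another. The `table`, `encode` and `decode` names are illustrative only.

```javascript
// Assumed prefix-free table: no code is the start of any other code.
const table = { A: "0", B: "10", C: "110", D: "111" };

function encode(text, codes) {
  return [...text].map((char) => codes[char]).join("");
}

function decode(bits, codes) {
  const byCode = new Map(
    Object.entries(codes).map(([char, code]) => [code, char])
  );
  let output = "";
  let current = "";
  for (const bit of bits) {
    current += bit;
    // As soon as the collected bits form a known code, emit its character.
    if (byCode.has(current)) {
      output += byCode.get(current);
      current = "";
    }
  }
  return output;
}

const bits = encode("AAABCAD", table);
console.log(bits, bits.length);   // 000101100111 12
console.log(decode(bits, table)); // AAABCAD
```

If one code were a prefix of another (say 11 and 111), the decoder could not tell where a character ends, so the table must be prefix-free.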

Let's see it in action 🚀

function add(number1, number2) {
  return number1 + number2;
}

function subtract(number1, number2) {
  return number1 - number2;
}

function multiply(number1, number2) {
  return number1 * number2;
}

function divide(number1, number2) {
  return number1 / number2;
}

export default {
  add,
  subtract,
  multiply,
  divide
};

RAW - 343B

Let's see it in action 🚀

function add(t,u){return t+u}function subtract(t,u){return t-u}function multiply(t,u){return t*u}function divide(t,u){return t/u}export default{add:add,subtract:subtract,multiply:multiply,divide:divide};

Minified - 204B

function add(t,u){return t+u}` 4$subtract` 4)-` 7&multiply` V)*` Y&divide` v)/u}export default{add:add,`!*#:`!3#,` z#:`!##,` l!:` s!};

After LZ77 (Using JS implementation of LZ77) - 134B

(binary GZip output, not printable as text)

After Huffman Coding (Complete GZip) - 152B

Let's see it in action 🚀

GZip (in binary) - 152B
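
To try this yourself, Node's built-in `zlib` module can GZip a string and report the sizes. This is a rough sketch: the sample is rebuilt from the four functions on the earlier slide without the export block, so the byte counts will not match the numbers shown above exactly.

```javascript
const zlib = require("zlib");

// Rebuild the four sample functions (an approximation of the slide's file).
const operations = { add: "+", subtract: "-", multiply: "*", divide: "/" };
const source = Object.entries(operations)
  .map(
    ([name, op]) =>
      `function ${name}(number1, number2) {\n  return number1 ${op} number2;\n}\n`
  )
  .join("\n");

const raw = Buffer.from(source, "utf8");
const gzipped = zlib.gzipSync(raw, { level: 9 }); // LZ77 + Huffman coding
const restored = zlib.gunzipSync(gzipped).toString("utf8");

console.log(`raw: ${raw.length} B, gzip: ${gzipped.length} B`);
console.log("lossless round trip:", restored === source);
```

The GZip figure includes a small fixed header and trailer, which is why very short inputs compress poorly.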

What is Brotli?

  • Lossless data compression tool, based on LZ77 and Huffman coding, created by Google.
  • Came out in 2013 for offline compression of heavy font files.
  • Was re-released in September 2015 with:
    1. Improved compression ratio
    2. Faster encoder and decoder
    3. Improved streaming API
    4. Additional compression levels

What makes Brotli outperform GZip?

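  • Static Dictionary
    GZip can only back-reference text it has already seen in the same stream. Brotli also ships with a built-in dictionary that the encoder and decoder both already have, so its entries never need to be sent.
    It is about 120 KB and holds over 13,000 common words, phrases and fragments from HTML documents and other text.

  • Bigger and Flexible Sliding Window
    Can be as large as 16 MB, compared with GZip's fixed 32 KB.

  • Order-2 Context Modeling
    The code used for a character can depend on the two characters before it.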
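
In Node, these knobs are exposed through `zlib.brotliCompressSync` and the `zlib.constants.BROTLI_PARAM_*` options. The sketch below is a minimal example with a made-up HTML sample. The window is given as a base-2 logarithm, so 24 means roughly 16 MB.

```javascript
const zlib = require("zlib");

// Made-up sample input, repeated so there is something to compress.
const html = '<div class="container"><p>Hello, world!</p></div>'.repeat(20);
const input = Buffer.from(html, "utf8");

const compressed = zlib.brotliCompressSync(input, {
  params: {
    [zlib.constants.BROTLI_PARAM_MODE]: zlib.constants.BROTLI_MODE_TEXT,
    [zlib.constants.BROTLI_PARAM_QUALITY]: 11, // highest compression level
    [zlib.constants.BROTLI_PARAM_LGWIN]: 24,   // log2 of the window size (about 16 MB)
    [zlib.constants.BROTLI_PARAM_SIZE_HINT]: input.length,
  },
});
const restored = zlib.brotliDecompressSync(compressed);

console.log(`raw: ${input.length} B, brotli: ${compressed.length} B`);
console.log("lossless round trip:", restored.equals(input));
```

The static dictionary needs no configuration: it is built into both the encoder and the decoder.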

Should we upgrade to Brotli now or wait?

  • Brotli is production-ready.
  • It is a great option for static assets such as your fonts, scripts and styles.
  • It offers compression levels up to 11, and at level 5 it already compresses better than GZip at level 9.
  • After level 5, Brotli starts using its advanced "Order-2 Context Modeling" feature, which consumes large amounts of memory.
  • If you compress on the fly (dynamic compression), Brotli isn't for you.
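
Before deciding, it is worth measuring on your own assets. The sketch below compares output sizes for GZip level 9 and a few Brotli levels using Node's `zlib`. The helper name `compareSizes` and the CSS-like sample are assumptions, and results vary a lot with the input, so treat the numbers as indicative only.

```javascript
const zlib = require("zlib");

// Returns the compressed size in bytes for each setting.
function compareSizes(buffer) {
  const sizes = { raw: buffer.length };
  sizes.gzip9 = zlib.gzipSync(buffer, { level: 9 }).length;
  for (const quality of [1, 5, 11]) {
    const out = zlib.brotliCompressSync(buffer, {
      params: { [zlib.constants.BROTLI_PARAM_QUALITY]: quality },
    });
    sizes[`brotli${quality}`] = out.length;
  }
  return sizes;
}

// Made-up sample; replace with the contents of a real asset.
const sample = Buffer.from("body { margin: 0; padding: 0; }\n".repeat(200), "utf8");
console.log(compareSizes(sample));
```

This only measures size. For on-the-fly compression you would also want to time each setting, since higher Brotli levels cost noticeably more CPU and memory.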

I hope you learned a few good things today

Understanding the compressions: GZIP and Brotli

By Yatharth K
