Understanding compressions
GZIP & Brotli
Yatharth Khatri
Design Systems and Frontend Architect
Classical Pianist
GitHub: yatharthk
Twitter: yatharthkhatri
TAKE-AWAYS:
- Understand the GZip magic
- Understand the data compression algorithms at work
- What's Brotli and how's it more powerful than Gzip?
- Should we stop using GZip and use Brotli instead for our web-apps?
Why understand compression? 🙄
It's always nice to understand the under-the-hood concepts of the tools and tech we use every day.
You cannot build better if you don't know what has already been built.
"You cannot understand everything but you should always try to understand the system."
- Ryan Dahl, Creator of Node.js
What is GZip?
GZip is a lossless data-compression tool.
(and it's not new)
A Rough Timeline
WEB 2.0
What is GZip made up of?
Uses an algorithm called "DEFLATE"
LZ77
(Invented by Lempel and Ziv in 1977)
Huffman Coding
(Invented by David Huffman in 1952)
Updated Timeline
WEB 2.0
How does GZip/DEFLATE work?
function add(number1, number2) {
  return number1 + number2;
}
function subtract(number1, number2) {
  return number1 - number2;
}
function multiply(number1, number2) {
  return number1 * number2;
}
function divide(number1, number2) {
  return number1 / number2;
}
export default {
  add,
  subtract,
  multiply,
  divide
};
LZ77
Huffman codes
GZip Code
Server
Client (e.g. browser)
GZIP
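The server-to-client flow above can be sketched in Node.js. This is only an illustration: `encodeBody` is a hypothetical helper (not from the talk), and the header check is a simplification that ignores q-values. It uses Node's built-in `zlib` module, and the client advertises support through the standard `Accept-Encoding` request header.

```javascript
const zlib = require('zlib');

// Hypothetical helper: compress the response body only when the client
// says it understands GZip.
function encodeBody(acceptEncoding, body) {
  const raw = Buffer.from(body, 'utf8');
  // Simplification: a plain token check that ignores q-values.
  const accepted = (acceptEncoding || '')
    .split(',')
    .map((part) => part.trim().split(';')[0]);
  if (accepted.includes('gzip')) {
    return { headers: { 'Content-Encoding': 'gzip' }, body: zlib.gzipSync(raw) };
  }
  // Fall back to sending the bytes uncompressed.
  return { headers: {}, body: raw };
}

const response = encodeBody('gzip, deflate, br', 'function add(t,u){return t+u}');
console.log(response.headers, response.body.length);
```

The browser reads `Content-Encoding: gzip` and inflates the body before handing it to the page.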
LZ77 Algorithm
Mr. Buffer
Mr. Sliding Window
Smart. And does all logical, heavy stuff.
Task: Reduce the text as much as possible without loss of data
Came up with a solution and asked for:
- A 32kB capacity box
- A bag to put text into once it exceeds 32kB
- An assistant
The solution:
- When you receive anything new, check whether it's already in the box.
- If not, put it into the box.
- If it's there, just give a back-reference in the form <offset from current, how much to copy>
- If the box gets filled, start moving text into the bag, starting from the oldest.
Mr. Buffer
Can only read and pass...
"ABCABC"
(text to be compressed)
ABCABC → A B C <3, 3>
<3, 3> = Go 3 chars back and copy 3 chars
Did you notice the limitation?
If the char or phrase does not appear in the last 32kB of data stored, it cannot be back-referenced.
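The box-and-back-reference idea can be sketched in a few lines of JavaScript. This is a minimal illustration, not the code DEFLATE actually runs: `WINDOW_SIZE`, `MIN_MATCH`, the token shape and both function names are assumptions made for the example, and the window search is brute force where real implementations use hash chains.

```javascript
// Illustrative constants (assumptions for this sketch).
const WINDOW_SIZE = 32 * 1024; // the "32kB box"
const MIN_MATCH = 3;           // shorter matches are sent as plain characters

function lz77Compress(text) {
  const tokens = [];
  let pos = 0;
  while (pos < text.length) {
    const windowStart = Math.max(0, pos - WINDOW_SIZE);
    let bestLength = 0;
    let bestOffset = 0;
    // Brute-force search of the window for the longest match.
    for (let start = windowStart; start < pos; start++) {
      let length = 0;
      while (
        pos + length < text.length &&
        text[start + length] === text[pos + length]
      ) {
        length++;
      }
      if (length > bestLength) {
        bestLength = length;
        bestOffset = pos - start;
      }
    }
    if (bestLength >= MIN_MATCH) {
      // Back-reference: <offset from current, how much to copy>
      tokens.push({ offset: bestOffset, length: bestLength });
      pos += bestLength;
    } else {
      tokens.push({ literal: text[pos] });
      pos += 1;
    }
  }
  return tokens;
}

function lz77Decompress(tokens) {
  let out = '';
  for (const token of tokens) {
    if ('literal' in token) {
      out += token.literal;
    } else {
      // Copy one char at a time so a match may overlap the text being written.
      const from = out.length - token.offset;
      for (let i = 0; i < token.length; i++) {
        out += out[from + i];
      }
    }
  }
  return out;
}

console.log(lz77Compress('ABCABC'));
// Three literals (A, B, C) followed by { offset: 3, length: 3 }
```

Anything older than `WINDOW_SIZE` characters is never searched, which is exactly the limitation described above.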
Huffman Codes
(A form of variable-length encoding.)
Huffman coding reduces the number of bits needed to store each character of your code.
For example:
AAABCAD
(7B or 56bits)
AAABCAD
(~50 bits)
Let's do some simple math
AAABCAD
(7B or 56bits)
Computers normally store every character at a fixed size (8 bits for each char).
Huffman coding gives the shortest bit codes to the most frequent characters and the longest to the least frequent.
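AAABCAD
Dictionary
A = 0
B = 10
C = 110
D = 111
(No code is a prefix of another, so the bit stream can be decoded unambiguously.)
Encoded text = 4×1 + 2 + 3 + 3 = 12 bits
But we need to send the dictionary as well, 😯
which is 41 bits
Total = 53 bits
< 56 bits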
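The code assignment can be sketched in JavaScript. This is a minimal, unoptimized illustration: the function names are made up for this example, it re-sorts the node list on every merge instead of using a priority queue, and it ignores the cost of sending the dictionary. The exact bit patterns depend on how ties are broken, so they may differ from the table on the slide, but the code lengths for AAABCAD come out as 1, 2, 3 and 3 bits.

```javascript
// Build a prefix-free code table from character frequencies.
function huffmanCodes(text) {
  const freq = new Map();
  for (const ch of text) freq.set(ch, (freq.get(ch) || 0) + 1);

  // One leaf node per distinct character.
  const nodes = [...freq].map(([symbol, weight]) => ({ symbol, weight }));
  if (nodes.length === 0) return new Map();
  if (nodes.length === 1) return new Map([[nodes[0].symbol, '0']]);

  // Repeatedly merge the two lightest nodes into one parent.
  while (nodes.length > 1) {
    nodes.sort((a, b) => a.weight - b.weight);
    const [left, right] = nodes.splice(0, 2);
    nodes.push({ left, right, weight: left.weight + right.weight });
  }

  // Walk the tree: going left appends "0", going right appends "1".
  const codes = new Map();
  const walk = (node, prefix) => {
    if (node.symbol !== undefined) {
      codes.set(node.symbol, prefix);
      return;
    }
    walk(node.left, prefix + '0');
    walk(node.right, prefix + '1');
  };
  walk(nodes[0], '');
  return codes;
}

function huffmanEncode(text, codes) {
  return [...text].map((ch) => codes.get(ch)).join('');
}

const codes = huffmanCodes('AAABCAD');
const bits = huffmanEncode('AAABCAD', codes);
console.log(codes, bits.length); // the encoded text takes 12 bits
```

Because the most frequent character (A) sits closest to the root, it gets the 1-bit code.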
Let's see it in action 🚀
function add(number1, number2) {
return number1 + number2;
}
function subtract(number1, number2) {
return number1 - number2;
}
function multiply(number1, number2) {
return number1 * number2;
}
function divide(number1, number2) {
return number1 / number2;
}
export default {
add,
subtract,
multiply,
divide
};
RAW - 343B
function add(t,u){return t+u}function subtract(t,u){return t-u}function multiply(t,u){return t*u}function divide(t,u){return t/u}export default{add:add,subtract:subtract,multiply:multiply,divide:divide};
Minified - 204B
function add(t,u){return t+u}` 4$subtract` 4)-` 7&multiply` V)*` Y÷` v)/u}export default{add:add,`!*#:`!3#,` z#:`!##,` l!:` s!};
After LZ77 (using a JS implementation of LZ77) - 134B
(binary GZip output of math.js.lz77; not printable as text)
After Huffman Coding (Complete GZip) - 152B
GZip (in binary) - 152B
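To try the complete pipeline yourself, Node's built-in `zlib` module runs both stages (LZ77 plus Huffman coding) in one call. This is a sketch using the minified snippet from above as input; the byte counts it prints will not necessarily match the slide, since that depends on the GZip header contents and the compression settings.

```javascript
const zlib = require('zlib');

// The minified math helpers from the slide.
const minified =
  'function add(t,u){return t+u}function subtract(t,u){return t-u}' +
  'function multiply(t,u){return t*u}function divide(t,u){return t/u}' +
  'export default{add:add,subtract:subtract,multiply:multiply,divide:divide};';

const input = Buffer.from(minified, 'utf8');
const gzipped = zlib.gzipSync(input);       // DEFLATE data wrapped in a GZip header and trailer
const restored = zlib.gunzipSync(gzipped);  // what the browser does on the other side

console.log('minified bytes:', input.length);
console.log('gzipped bytes: ', gzipped.length);
console.log('lossless:', restored.equals(input));
```

On a payload this small the fixed GZip header and trailer take up a noticeable share of the output.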
What is Brotli?
- Lossless data compression tool, based on LZ77 and Huffman coding, created by Google.
- Came out in 2013 for offline compression of heavy font files.
- Was re-released in September 2015 with:
  - Improved compression ratio
  - Faster encoder and decoder
  - Improved streaming API
  - Extra compression levels
What makes Brotli outperform GZip?
- Static Dictionary
Brotli ships with a built-in dictionary that both the compressor and the decompressor already have, so matches against it can be referenced without sending that text.
About 120kB, with roughly 13,000 common words, phrases and fragments from HTML documents.
- Bigger and Flexible Sliding Window
Can be as large as 16MB (vs. 32kB in GZip).
- Order-2 Context Modeling.
Should we upgrade to Brotli now or wait?
- Brotli is ready for production use.
- It is a great option for static assets, like your fonts, scripts and styles.
- It offers compression levels up to 11, and at level 5 it compresses better than GZip at level 9.
- After level 5, Brotli starts using its advanced "Order-2 Context Modeling" feature, which consumes large amounts of memory.
- If you do compression on-the-fly, or dynamic compression, Brotli isn't for you.
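One way to check these trade-offs on your own assets is to compare the two with Node's built-in `zlib` module (Brotli support needs a reasonably recent Node.js). This is a sketch: the sample payload and the `brotliAt` helper are made up for illustration, and the sizes you get depend entirely on your data, so measure before deciding.

```javascript
const zlib = require('zlib');

// Hypothetical sample payload: repetitive HTML-like text.
const sample = Buffer.from(
  '<div class="item"><a href="/page">link</a></div>\n'.repeat(200),
  'utf8'
);

// Illustrative helper: Brotli at a chosen quality level (0 to 11).
function brotliAt(quality) {
  return zlib.brotliCompressSync(sample, {
    params: {
      [zlib.constants.BROTLI_PARAM_QUALITY]: quality,
      // log2 of the sliding window size; 24 means up to 16MB.
      [zlib.constants.BROTLI_PARAM_LGWIN]: 24,
    },
  });
}

const gzip9 = zlib.gzipSync(sample, { level: 9 });
const brotli5 = brotliAt(5);
const brotli11 = brotliAt(11);

console.log('raw:      ', sample.length);
console.log('gzip 9:   ', gzip9.length);
console.log('brotli 5: ', brotli5.length);
console.log('brotli 11:', brotli11.length);
```

For static assets you can afford level 11 at build time; for on-the-fly responses, time each level as well as measuring its output size.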
I hope you learned a few good things today