3h Data Compression

  • Explain what data compression is.
  • Understand why data may be compressed and that there are different ways to compress data.
  • Explain how data can be compressed using Huffman coding.
  • Be able to interpret/create Huffman trees.
  • Be able to calculate the number of bits required to store a piece of data compressed using Huffman coding.
  • Be able to calculate the number of bits required to store a piece of uncompressed data in ASCII.
  • Explain how data can be compressed using run length encoding (RLE).
  • Represent data in RLE frequency/data pairs.

Data Compression

  • Data compression is the reduction in file size to reduce download times and storage requirements.
  • Compression results in smaller file sizes and faster transfer of data around a network.
  • Compression works by exploiting repetition in the data: repeated characters, bit patterns or runs of the same pixel colour are stored using fewer bits.

Huffman Coding

Use Huffman encoding to compress "Mississippi river"

First, count how often each character appears in "Mississippi river" (17 characters, including the space):

i = 5
s = 4
p = 2
r = 2
M = 1
v = 1
e = 1
space = 1

Then build the tree. Repeatedly join the two nodes with the lowest counts into a new node whose count is their total, until only one root node remains:

M (1) + v (1) → 2
e (1) + space (1) → 2
p (2) + r (2) → 4
2 + 2 → 4
4 + 4 → 8
s (4) + i (5) → 9
8 + 9 → 17 (root)

Label the two branches below every node 0 and 1, then read the path from the root down to each character to get its code:

17
  0: 9
    0: i (5)
    1: s (4)
  1: 8
    0: 4
      0: p (2)
      1: r (2)
    1: 4
      0: 2
        0: M (1)
        1: v (1)
      1: 2
        0: e (1)
        1: space (1)

i = 00
s = 01
p = 100
r = 101
M = 1100
v = 1101
e = 1110
space = 1111
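The tree-building steps above can be sketched in Python with a priority queue (`heapq`). This is a minimal illustration under stated assumptions, not the method the slides prescribe: the function name `huffman_codes` is hypothetical, and ties between equal counts are broken by the order in which characters first appear, so the exact 0s and 1s can differ from the tree above.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(text: str) -> dict[str, str]:
    """Build a Huffman code table by repeatedly joining the two lowest counts."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # Only one distinct character: give it a 1-bit code
        return {ch: "0" for ch in freq}
    tie = count()  # unique tie-breaker so heapq never compares the dicts
    # Each heap entry is (count, tie-breaker, {character: code built so far})
    heap = [(n, next(tie), {ch: ""}) for ch, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n0, _, zero = heapq.heappop(heap)  # lowest count -> branch 0
        n1, _, one = heapq.heappop(heap)   # next lowest  -> branch 1
        merged = {ch: "0" + code for ch, code in zero.items()}
        merged.update({ch: "1" + code for ch, code in one.items()})
        heapq.heappush(heap, (n0 + n1, next(tie), merged))
    return heap[0][2]


text = "Mississippi river"
codes = huffman_codes(text)
for ch, code in sorted(codes.items(), key=lambda kv: (len(kv[1]), kv[1])):
    print(repr(ch), code)
print(sum(len(codes[ch]) for ch in text), "bits")
```

For this input, `i` and `s` get 2-bit codes, `p` and `r` get 3-bit codes and the four characters that appear once get 4-bit codes, so the total is 46 bits, the same as the tree above, even though the individual codes are not identical.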

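Huffman Coding

Mississippi river

1100000101000101001001000011111010011011110101

Huffman encoded: 46 bits
(i 5 × 2 = 10, s 4 × 2 = 8, p 2 × 3 = 6, r 2 × 3 = 6, and M, v, e, space 1 × 4 = 4 each: 10 + 8 + 6 + 6 + 16 = 46)

8-bit ASCII code: 17 characters × 8 bits = 136 bits

Characters that appear more often are given shorter codes, so text with many repeated characters compresses more.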
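To check these bit counts, the short sketch below encodes the text with the code table read off the tree and compares the result with 8-bit ASCII. `CODES` and `huffman_encode` are illustrative names chosen here, not part of any library.

```python
# Code table read off the Huffman tree for "Mississippi river"
CODES = {
    "i": "00", "s": "01", "p": "100", "r": "101",
    "M": "1100", "v": "1101", "e": "1110", " ": "1111",
}


def huffman_encode(text: str, codes: dict[str, str]) -> str:
    """Replace each character with its Huffman code and join the bits."""
    return "".join(codes[ch] for ch in text)


text = "Mississippi river"
bits = huffman_encode(text, CODES)
ascii_bits = len(text) * 8  # 8 bits per character in 8-bit ASCII

print(bits)
print("Huffman:", len(bits), "bits")
print("8-bit ASCII:", ascii_bits, "bits")
```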

Example

Use Huffman encoding to compress "access"

a = 1
c = 2
e = 1
s = 2

Join the two lowest counts first: a (1) + e (1) → 2, then s (2) + 2 → 4, then c (2) + 4 → 6 (root).

6
  0: c (2)
  1: 4
    0: s (2)
    1: 2
      0: a (1)
      1: e (1)

a = 110
c = 0
e = 111
s = 10

access → 110 0 0 111 10 10

8-bit ASCII: 6 × 8 = 48 bits

Huffman: 3 + 1 + 1 + 3 + 2 + 2 = 12 bits
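Interpreting a Huffman tree also means reading codes back. Because no code is the start of another code, a decoder can read bits one at a time until they match an entry in the table. The sketch below uses a hypothetical helper, `huffman_decode`, with the "access" codes from this example.

```python
def huffman_decode(bits: str, codes: dict[str, str]) -> str:
    """Decode a bit string by reading bits until they match a code.

    This works because no Huffman code is the start (prefix) of another,
    so the first match is always the right one.
    """
    lookup = {code: ch for ch, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:
            out.append(lookup[current])
            current = ""
    if current:
        raise ValueError(f"leftover bits: {current}")
    return "".join(out)


ACCESS_CODES = {"a": "110", "c": "0", "e": "111", "s": "10"}
bits = "".join(ACCESS_CODES[ch] for ch in "access")
print(bits, len(bits))
print(huffman_decode(bits, ACCESS_CODES))
```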

Run Length Encoding

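Run length encoding (RLE) works well with repeated data: each run of identical consecutive values is stored once, together with a count of how many times it repeats.

It is especially effective with bitmap images, because they often contain large blocks of the same colour.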

Run Length Encoding

Consider this 2-colour bitmap, shown in binary (1 = black, 0 = white):

1 1 1 1 1 1
1 0 0 0 0 1
1 0 0 0 0 1
1 0 0 0 0 1
1 1 1 1 1 1

Image size = 6 × 5 × 1 = 30 bits (6 pixels wide × 5 pixels high × 1 bit per pixel)

Reading the pixels row by row, from left to right, we could write this as:

7B 4W 2B 4W 2B 4W 7B

or, as frequency/data pairs:

71 40 21 40 21 40 71
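The row-by-row reading above can be sketched with `itertools.groupby`, which groups consecutive equal values. The bitmap is copied from the slide; the function name `rle_encode` is illustrative, and the sketch assumes one bit per pixel with 1 = black and 0 = white.

```python
from itertools import groupby

# The 6 × 5 bitmap from the slide: 1 = black, 0 = white
BITMAP = [
    [1, 1, 1, 1, 1, 1],
    [1, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 1],
]


def rle_encode(rows: list[list[int]]) -> list[tuple[int, int]]:
    """Read the pixels row by row and return (count, colour) pairs."""
    pixels = [p for row in rows for p in row]  # flatten, left to right
    return [(len(list(run)), colour) for colour, run in groupby(pixels)]


pairs = rle_encode(BITMAP)
print(" ".join(f"{n}{c}" for n, c in pairs))
```

Flattening first is what lets a run carry on from the end of one row into the start of the next, which is why the first run is 7 black pixels rather than 6.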

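Convert to RLE Format

Write the bitmap as the number of consecutive pixels of each colour, reading row by row from left to right:

9W 6B 3W 1B 2W 1B 4W 1B 2W 1B 4W 1B 2W 1B 4W 1B 2W 1B 10W

then as frequency/data pairs (0 = white, 1 = black):

90 61 30 11 20 11 40 11 20 11 40 11 20 11 40 11 20 11 100

The last digit of each pair is the colour and the digits before it are the count, so the final pair, 100, means 10 white pixels.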
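Turning runs into frequency/data pairs, and back again, can be sketched as below. The helpers `to_pairs` and `from_pairs` are hypothetical; the mapping W → 0 and B → 1 follows this slide. Reading the colour from the last digit means a count such as 10 still parses correctly.

```python
def to_pairs(runs: list[tuple[int, str]]) -> str:
    """Turn runs such as (9, "W") into frequency/data pairs such as "90"."""
    bit = {"W": "0", "B": "1"}  # colour -> data bit used on the slide
    return " ".join(f"{n}{bit[colour]}" for n, colour in runs)


def from_pairs(text: str) -> list[tuple[int, str]]:
    """Split each pair into count (all but the last digit) and colour (last digit)."""
    colour = {"0": "W", "1": "B"}
    return [(int(p[:-1]), colour[p[-1]]) for p in text.split()]


runs = [(9, "W"), (6, "B"), (3, "W"), (1, "B"), (2, "W"), (1, "B"),
        (4, "W"), (1, "B"), (2, "W"), (1, "B"), (4, "W"), (1, "B"),
        (2, "W"), (1, "B"), (4, "W"), (1, "B"), (2, "W"), (1, "B"), (10, "W")]
pairs = to_pairs(runs)
print(pairs)
print(from_pairs(pairs) == runs)
```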

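Draw from RLE Format

Start with the RLE data:

1 0 4 1 5 0 1 1 4 0 1 1 4 0 1 1 4 0 1 1 5 0 4 1 1 0

Split the data into number-colour pairs:

10 41 50 11 40 11 40 11 40 11 50 41 10

then replace the 0 with white and the 1 with black (or whatever colours you are given):

1W 4B 5W 1B 4W 1B 4W 1B 4W 1B 5W 4B 1W

Finally, colour in that many pixels for each pair, in order, filling each row from left to right before moving on to the next row.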
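Decoding can be sketched by expanding each pair into that many pixels and then cutting the result into rows. The helpers `rle_decode` and `draw` are hypothetical, and the sketch assumes single-digit counts as in this data. The slide does not state the image width; 6 is an assumption here, chosen because the 13 runs add up to 36 pixels (6 × 6).

```python
def rle_decode(data: str) -> list[int]:
    """Expand count/colour values into a flat list of pixels (0 = white, 1 = black).

    Assumes each count is a single digit, as in the example data.
    """
    values = [int(v) for v in data.split()]
    pixels = []
    for count, colour in zip(values[0::2], values[1::2]):
        pixels.extend([colour] * count)
    return pixels


def draw(pixels: list[int], width: int) -> str:
    """Lay the pixels out row by row; '#' is black and '.' is white."""
    rows = [pixels[i:i + width] for i in range(0, len(pixels), width)]
    return "\n".join("".join("#" if p else "." for p in row) for row in rows)


data = "1 0 4 1 5 0 1 1 4 0 1 1 4 0 1 1 4 0 1 1 5 0 4 1 1 0"
pixels = rle_decode(data)
print(len(pixels), "pixels")
print(draw(pixels, width=6))  # width 6 is an assumption: 36 pixels = 6 × 6
```

With a width of 6, the black pixels form the letter Z.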

Questions

  1. In computing, explain what compression is.
  2. Show how to compress this diagram using run length encoding (RLE).
  3. Show how to compress 'forever' using Huffman coding.
  4. What are the two forms of compression you need to know and explain?

Questions

  1. In computing, explain what compression is.

Using algorithms to reduce the storage requirements (the number of bits needed) for a set of data such as text, images, video or sound.

  4. What are the two forms of compression you need to know and explain?

Huffman coding

Run length encoding (RLE)

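Questions

  3. Show how to compress 'forever' using Huffman coding.

f = 1
o = 1
r = 2
e = 2
v = 1

Join the two lowest counts first: o (1) + v (1) → 2, then f (1) + 2 → 3, then e (2) + r (2) → 4, then 4 + 3 → 7 (root).

7
  0: 4
    0: e (2)
    1: r (2)
  1: 3
    0: f (1)
    1: 2
      0: o (1)
      1: v (1)

e = 00
r = 01
f = 10
o = 110
v = 111

forever: 10 110 01 00 111 00 01

Huffman: 2 + 3 + 2 + 2 + 3 + 2 + 2 = 16 bits (8-bit ASCII: 7 × 8 = 56 bits)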
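The size of the compressed word can also be worked out without writing out the bits: multiply each character's frequency by the length of its code and add the results. The sketch below applies this to 'forever' using the code table from the answer; `huffman_size` is an illustrative name.

```python
from collections import Counter

# Code table from the Huffman tree for 'forever'
FOREVER_CODES = {"e": "00", "r": "01", "f": "10", "o": "110", "v": "111"}


def huffman_size(text: str, codes: dict[str, str]) -> int:
    """Total bits = sum over characters of (frequency × code length)."""
    return sum(n * len(codes[ch]) for ch, n in Counter(text).items())


word = "forever"
print("Huffman:", huffman_size(word, FOREVER_CODES), "bits")
print("8-bit ASCII:", len(word) * 8, "bits")
```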

Questions

  2. Show how to compress this diagram using run length encoding (RLE).

5B 2W 2B 2W 6B 2W 2B 2W 5B

51 20 21 20 61 20 21 20 51 (1 = black, 0 = white)

3h Data Compression

By David James
