
Objectives

Use names, symbols and corresponding powers of 2 for binary prefixes, e.g., Ki, Mi
Differentiate between the character code of a decimal digit and its pure binary representation
Describe how character sets (ASCII and Unicode) are used to represent text

Computers process and store large numbers of bytes, often in the order of millions or billions
- When dealing with such large quantities, it is more convenient to summarise them using number prefixes
- A common example of this is the kilogram, which is equivalent to 1000 g

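The same decimal number prefixes can be used to summarise large quantities of bytes
These include:
- kilobyte (kB) = 10³ bytes = 1,000 bytes
- megabyte (MB) = 10⁶ bytes = 1,000,000 bytes
- gigabyte (GB) = 10⁹ bytes
- terabyte (TB) = 10¹² bytes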

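However, these decimal prefixes were also commonly used for powers of 2 (e.g. "kilobyte" for 1,024 = 2¹⁰ bytes), so "1 kB" could mean either 1,000 or 1,024 bytes. To eliminate this confusion, in 1998 the International Electrotechnical Commission (IEC) established separate prefixes for powers of 2:
- kibibyte (KiB) = 2¹⁰ bytes = 1,024 bytes
- mebibyte (MiB) = 2²⁰ bytes = 1,048,576 bytes
- gibibyte (GiB) = 2³⁰ bytes
- tebibyte (TiB) = 2⁴⁰ bytes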
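The gap between the two systems grows with each prefix. The short Python sketch below (the `PREFIXES` table and `convert` helper are hypothetical names chosen for illustration) converts a drive size between decimal and binary units:

```python
# Decimal (SI) prefixes are powers of 10; IEC binary prefixes are powers of 2.
PREFIXES = {
    "kB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12,
    "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40,
}


def convert(n_bytes: int, unit: str) -> float:
    """Express a number of bytes in the given unit, e.g. 'GB' or 'GiB'."""
    return n_bytes / PREFIXES[unit]


drive = 500 * PREFIXES["GB"]  # a drive advertised as 500 GB
print(f"{convert(drive, 'GB'):.2f} GB")    # 500.00 GB
print(f"{convert(drive, 'GiB'):.2f} GiB")  # 465.66 GiB
```

The same number of bytes appears "smaller" when measured in GiB, which is one reason a new drive can seem to report less space than advertised.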

If a computer only understands 1s and 0s, what happens when the 'M' key is pressed on the keyboard?

In 1963, the American Standard Code for Information Interchange (ASCII) was established to encode the characters used in English text, such as letters, digits and punctuation.

It used a 7-bit character set, giving just 2⁷ = 128 possible binary codes.
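As a sketch of what happens when 'M' is pressed, the Python below looks up the character's code with the built-in `ord()` and shows it as a 7-bit binary pattern (`ascii_code` is a hypothetical helper name):

```python
def ascii_code(ch: str) -> str:
    """Return the 7-bit ASCII code of a single character as a binary string."""
    code = ord(ch)  # ord() gives the character's code number
    if code > 127:  # 2**7 = 128 codes, numbered 0 to 127
        raise ValueError(f"{ch!r} is not an ASCII character")
    return format(code, "07b")  # binary, padded to 7 digits


print(ord("M"), ascii_code("M"))  # 77 1001101
```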

What are the limitations of having only a 7-bit character set?

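Of these 128 codes, 95 represent printable characters:
- 10 digits (0-9)
- 26 lowercase letters
- 26 uppercase letters
- 33 special characters (punctuation, symbols and the space)

The remaining 33 codes are non-printing control characters.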
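The breakdown above can be checked by sorting all 128 codes into categories. This is a minimal sketch using Python's `string` module constants; the category labels and the `classify_ascii` name are our own:

```python
import string


def classify_ascii() -> dict:
    """Count how many of the 128 ASCII codes fall into each category."""
    counts = {"digits": 0, "uppercase": 0, "lowercase": 0,
              "other printable": 0, "control": 0}
    for code in range(128):
        ch = chr(code)
        if ch in string.digits:
            counts["digits"] += 1
        elif ch in string.ascii_uppercase:
            counts["uppercase"] += 1
        elif ch in string.ascii_lowercase:
            counts["lowercase"] += 1
        elif 32 <= code <= 126:  # space, punctuation and symbols
            counts["other printable"] += 1
        else:  # codes 0-31 and 127 are non-printing control codes
            counts["control"] += 1
    return counts


print(classify_ascii())
```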

Numeric characters are also encoded
- The code 0111001 represents the character '9' in ASCII
- The pure binary byte representing the number 9 would be 00001001₂
What are the implications of this difference?
What will the following code output?

Hint: ord() returns the Unicode code point (an integer) of a character
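The difference between a digit's character code and its pure binary value can be explored in Python. In this sketch, `char_code_bits` and `pure_binary` are hypothetical helper names; only `ord()` and `format()` are built-ins:

```python
def char_code_bits(ch: str) -> str:
    """7-bit ASCII character code of a single character."""
    return format(ord(ch), "07b")


def pure_binary(n: int) -> str:
    """Pure binary representation of an integer, padded to one byte."""
    return format(n, "08b")


print(ord("9"), char_code_bits("9"))  # 57 0111001
print(pure_binary(9))                 # 00001001

# The digit's value is its code minus the code for '0' (codes for 0-9 are consecutive)
print(ord("9") - ord("0"))            # 9

# Characters concatenate; numbers add arithmetically
print("9" + "9", 9 + 9)               # 99 18
```

One implication: a digit typed at the keyboard arrives as a character code and must be converted before it can be used in arithmetic.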

Unicode was then introduced to standardise the encoding of characters from every language
- Unicode characters can be stored using different encodings: UTF-8 (1 to 4 bytes per character), UTF-16 (2 or 4 bytes) or UTF-32 (a fixed 4 bytes)
- To keep compatibility with existing text, the first 128 Unicode characters were set to be the same as the 128 ASCII characters
What could be a disadvantage of using 4 bytes per character?
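To compare storage costs, the sketch below uses Python's built-in `str.encode()` to measure how many bytes one character takes in each encoding (`encoded_sizes` is a hypothetical helper name):

```python
def encoded_sizes(ch: str) -> dict:
    """Number of bytes needed to store one character in each Unicode encoding."""
    # The '-le' variants are used so Python does not prepend a byte order mark
    return {enc: len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}


for ch in ["A", "é", "€", "한", "😀"]:
    print(ch, f"U+{ord(ch):04X}", encoded_sizes(ch))

# The first 128 code points match ASCII, so ASCII text has identical UTF-8 bytes
print("Hello".encode("ascii") == "Hello".encode("utf-8"))  # True
```

UTF-32 spends 4 bytes even on 'A', whereas UTF-8 uses just 1 byte for any ASCII character.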

In Unicode, characters from almost every written language, as well as mathematical and scientific symbols, can be represented:

Español

한국어

Македонски

ਪੰਜਾਬੀ ਦੇ

ελληνικά
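Every character in the words above has its own Unicode code point. A small Python sketch (the `code_points` name is hypothetical) lists them in the usual U+XXXX notation, and `chr()` converts a code point back into a character:

```python
def code_points(word: str) -> list:
    """List each character of a word with its Unicode code point in U+XXXX form."""
    return [f"{ch} U+{ord(ch):04X}" for ch in word]


for word in ["Español", "한국어", "Македонски", "ਪੰਜਾਬੀ", "ελληνικά"]:
    print(word, code_points(word))

# chr() is the inverse of ord(): it turns a code point back into a character
print(chr(0x00F1))  # ñ
```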

Have a go at the ASCII/Unicode worksheet on Moodle

Extension: Complete the ASCII Exam Questions on Moodle