Information Representation

Recall how storage devices store data

Magnetic
Optical
RAM

Info. Representation

A computer stores everything as two states: on/off, 0/1, yes/no, etc.
A method is needed to "code" other kinds of data into binary so that the computer can store and process them.

Binary Revision

Show understanding of binary magnitudes and the difference between binary prefixes and decimal prefixes
Use the binary, denary, hexadecimal number bases
Convert an integer value from one number base / representation to another
Perform binary addition and subtraction
Show understanding of how overflow can occur

Prefixes

Binary Prefix	Value	Denary Prefix	Value
kibi	2^10 or 1024	kilo	1000
mebi	2^20	mega	10^6

Fun facts

Storage device vendors usually measure capacity with decimal prefixes
Thus a 250 GB HDD literally means:
- 250 * 10^9 bytes
However, Windows measures with binary prefixes, so it assumes 1 GB = 2^30 bytes
Thus a 250 GB HDD is shown by Windows as only about 232 GB (250 * 10^9 / 2^30 ≈ 232.8)
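The 250 GB figure above can be checked with a short calculation. This is only a sketch of the arithmetic; the variable names are made up for illustration.

```python
# Capacity printed on the box, using the decimal prefix giga (10^9).
vendor_bytes = 250 * 10**9

# The same number of bytes measured in units of 2^30 bytes (binary prefix gibi).
shown_gib = vendor_bytes / 2**30

print(f"{vendor_bytes} bytes = {shown_gib:.2f} units of 2^30 bytes")
```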

Conversion and Addition

Class Exercises
You need to know:
- Denary <-> Binary <-> Hexadecimal
- Binary addition
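To check class exercise answers, the conversions and the column-by-column addition can be sketched in Python. `format` and `int` are Python built-ins; `add_binary` is a hypothetical helper written for this note, and it assumes unsigned binary strings.

```python
n = 181
as_binary = format(n, "08b")   # denary -> 8-bit binary string
as_hex = format(n, "X")        # denary -> hexadecimal string
from_binary = int(as_binary, 2)  # binary string -> denary
from_hex = int(as_hex, 16)       # hexadecimal string -> denary


def add_binary(a: str, b: str) -> str:
    """Add two unsigned binary strings, carrying like on paper."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)
    carry = 0
    digits = []
    # Work from the rightmost (least significant) column to the left.
    for x, y in zip(reversed(a), reversed(b)):
        total = int(x) + int(y) + carry
        digits.append(str(total % 2))  # digit written in this column
        carry = total // 2             # carry into the next column
    if carry:
        digits.append("1")
    return "".join(reversed(digits))


print(as_binary, as_hex, add_binary("0101", "0011"))
```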

Integers coding

Use the one’s and two’s complement representation for binary numbers
Binary addition and subtraction
Using positive and negative binary integers
Show understanding of how overflow can occur

Integers coding

An unsigned integer is converted directly to binary, with the unused high bits filled with 0s
- e.g. 8-bit (1 byte) for value 2 will be 00000010
For signed integers, there are three methods:
- Sign and magnitude: the highest bit is the sign (0 for positive, 1 for negative), the remaining bits hold the magnitude
- One's complement: a negative number is obtained by subtracting each digit of the positive binary number from 1 (i.e. inverting every bit)
- Two's complement: a negative number is the one's complement of the positive binary number, plus 1
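The one's and two's complement rules above can be tried out with a small sketch. The helper names are hypothetical, and 8 bits is assumed unless stated otherwise.

```python
def invert_bits(pattern: str) -> str:
    """One's complement: flip every bit of a binary string."""
    return "".join("1" if bit == "0" else "0" for bit in pattern)


def to_twos_complement(value: int, bits: int = 8) -> str:
    """Bit pattern of a signed integer in two's complement."""
    if not -(2 ** (bits - 1)) <= value < 2 ** (bits - 1):
        raise ValueError("value does not fit in the given number of bits")
    # Taking the value modulo 2^bits maps negatives onto the upper half.
    return format(value % 2**bits, f"0{bits}b")


def from_twos_complement(pattern: str) -> int:
    """Signed value of a two's complement bit pattern."""
    value = int(pattern, 2)
    if pattern[0] == "1":            # sign bit set, so the value is negative
        value -= 2 ** len(pattern)
    return value


positive = format(5, "08b")               # 00000101
ones = invert_bits(positive)              # 11111010
twos = format(int(ones, 2) + 1, "08b")    # 11111011, i.e. -5
print(positive, ones, twos, to_twos_complement(-5))
```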

Practice

Denary	2's complement (8bit)
	0010 0011
-35
71
	1010 0001

Using the results above, calculate 35 - 71 by binary addition

Old classic RPG games may use a 16-bit integer to store experience points

More about Signed Integer

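Integers are stored with a fixed length in many languages, for example Java has several integer types of different sizes (byte, short, int, long)
Python's int is signed but has no fixed size, it grows as needed; a fixed-width 4-byte signed integer (e.g. Java int) ranges from -2^31 to 2^31 - 1
When a result is too big (or too small) to fit in the available bits, it is called overflow; it often happens when incrementing a number past the maximum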
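Overflow can be demonstrated with a simulated 16-bit signed integer. Python's own integers do not wrap around, so the hypothetical helper `wrap_int16` below keeps only the lowest 16 bits and reinterprets them as two's complement; it is a sketch of the effect, not of how any particular game stores its data.

```python
def wrap_int16(value: int) -> int:
    """Keep the lowest 16 bits and read them as a signed 16-bit integer."""
    value &= 0xFFFF                  # discard anything above 16 bits
    if value & 0x8000:               # sign bit set, so the value is negative
        value -= 0x10000
    return value


exp_points = 32767                   # largest signed 16-bit value
exp_points = wrap_int16(exp_points + 1)
print(exp_points)                    # the counter has wrapped to the minimum
```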

Overflow

Binary coded decimal (BCD)

Every 4 bits represents a decimal numeral
e.g. 26 in decimal is represented by
0010 0110 BCD
2 6
Packed BCD stores two digits in 1 byte (as in the example above). Unpacked BCD uses one byte per digit, so 26 becomes:
- 00000010 00000110
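Both forms of BCD shown above can be produced digit by digit. The two helper names are hypothetical, and the sketch assumes a non-negative whole number.

```python
def to_packed_bcd(number: int) -> str:
    """One 4-bit nibble per denary digit."""
    return " ".join(format(int(digit), "04b") for digit in str(number))


def to_unpacked_bcd(number: int) -> str:
    """One 8-bit byte per denary digit."""
    return " ".join(format(int(digit), "08b") for digit in str(number))


print(to_packed_bcd(26))
print(to_unpacked_bcd(26))
```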

Advantage and application

Compared with ordinary binary conversion, converting to and from denary is much easier (each digit is handled separately)
- Common in electronics for displaying numbers (e.g. digital clock)
Precision in representing decimal fractions
e.g. 0.1 has no exact representation in pure binary, but it can be represented exactly in BCD
Disadvantage?

In a binary floating-point number system, it is impossible to represent 0.1 exactly (details will be explained at A-Level)

BCD arithmetic

Add as in usual binary addition
If any resulting nibble (4 bits) is invalid (i.e. more than 9, or 1001 in binary) or produces a carry, then:
- Add 0110 to that nibble
For details, study figure 1.04 in the textbook
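The correction step above can be sketched as follows. `bcd_add` is a hypothetical helper, not the textbook's method word for word: it adds two non-negative denary numbers one nibble at a time and adds 0110 (6) whenever a nibble goes above 9.

```python
def bcd_add(a: int, b: int) -> str:
    """Add two non-negative numbers nibble by nibble, returning packed BCD."""
    digits_a = [int(d) for d in reversed(str(a))]   # least significant first
    digits_b = [int(d) for d in reversed(str(b))]
    carry = 0
    nibbles = []
    for i in range(max(len(digits_a), len(digits_b))):
        x = digits_a[i] if i < len(digits_a) else 0
        y = digits_b[i] if i < len(digits_b) else 0
        total = x + y + carry        # ordinary binary addition of two nibbles
        if total > 9:                # not a valid BCD digit
            total += 0b0110          # correction: add 6
        carry = total >> 4           # anything above 4 bits is carried
        nibbles.append(format(total & 0b1111, "04b"))
    if carry:
        nibbles.append("0001")
    return " ".join(reversed(nibbles))


print(bcd_add(26, 37))   # 63 in BCD
print(bcd_add(58, 69))   # 127 in BCD
```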

Practice questions (s15/qp11/q1)

Text encoding

Objectives

Understand how and why a computer represents text and the use of character sets, including American standard code for information interchange (ASCII) and Unicode
- Text is converted to binary to be processed by a computer
- Unicode allows for a greater range of characters and symbols than ASCII, including different languages and emojis
- Unicode requires more bits per character than ASCII

Text encoding

There are different character sets for different languages
ASCII is a 7-bit character set, which includes:
- English characters
- Numbers and symbols
- Non-printing characters called control codes
  - e.g. 10 for new line (equiv. to '\n' in Python)
  - 0 for the NULL character, which marks the end of a string in some languages (e.g. C)
There are 8-bit extended ASCII sets, but no common standard was agreed
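Character codes can be inspected with Python's built-in `ord` and `chr`. The helper `ascii_bits` is hypothetical and simply shows each code as the 7 bits that ASCII needs.

```python
def ascii_bits(text: str) -> list[str]:
    """7-bit ASCII pattern for each character of the text."""
    patterns = []
    for ch in text:
        code = ord(ch)               # character -> numeric code
        if code > 127:
            raise ValueError(f"{ch!r} is not an ASCII character")
        patterns.append(format(code, "07b"))
    return patterns


print(ord("A"), chr(65))             # code of a letter, and back again
print(ord("\n"))                     # control code for new line
print(ascii_bits("Hi"))
```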

Think Pair Share

Look at the character codes for 0-9 and for the upper-case and lower-case letters. What patterns can you find?
ASCII is the most common character set and it is good enough for many situations. But what are its limitations?

Unicode

Characters are represented by 1-4 bytes, depending on the encoding
The most common encoding is UTF-8, which uses 1-4 bytes per character
The first 128 characters of Unicode are the same as ASCII
Unicode "code points" are written in this format:
- U+0041 refers to the letter A; this is the same as its ASCII code, so the whole ASCII character set is part of Unicode
- U+1F606 gives this face 😆
The Unicode standard is still evolving: new characters, emoji and scripts are added in each revision
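The code point notation and the variable length of UTF-8 can be seen with a short sketch. `describe` is a hypothetical helper; `ord`, `chr` and `str.encode` are Python built-ins.

```python
def describe(ch: str) -> tuple[str, int]:
    """Code point in U+ notation and the number of bytes used in UTF-8."""
    code_point = f"U+{ord(ch):04X}"
    utf8_length = len(ch.encode("utf-8"))
    return code_point, utf8_length


for ch in ["A", "é", "€", chr(0x1F606)]:
    print(ch, *describe(ch))
```

The ASCII letter takes one byte, while the emoji needs four.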

Bitmap Graphics

Pictures are composed of "dots" called pixels (picture elements), each of which has its own color information
Color depth is the number of bits used to store the color of one pixel
- e.g. a black and white picture is 1-bit
- 24-bit color depth (R, G and B channels of 8 bits each) is the most commonly used for photos
To calculate the size of an (uncompressed) bitmap graphic, multiply the pixel dimensions (width x height) by the color depth

Vector Graphics

Graphics are defined by geometric formulae and their properties, e.g. line color and style

Bitmap	Vector
Stores color of each pixel	Stores instructions to construct the graphic
Size of file ∝ number of pixels and color depth	Size of file ∝ complexity of the graphic (i.e. not related to image dimensions)
Enlarging reduces quality	Enlarging does not affect quality
Common file types: jpg (lossy), png (lossless), gif, bmp, psd (photoshop)	Common file types: ai (illustrator), svg (scalable vector graphics)
Usually good for photos	Usually good for graphics and illustrations

Calculating size of bitmap data

File size (in bits) = total number of pixels * color depth
Image resolution: number of pixels in the bitmap file (usually written as width x height, e.g. 800x600)
Screen resolution: number of pixels that a screen can display
Color depth: number of bits used to represent one pixel
Bit depth: number of bits used to represent each color channel (e.g. each of R, G and B)
DPI: dots per inch (linear)
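As a worked example of the size calculation, here is a minimal sketch. `bitmap_size_bytes` is a hypothetical helper; it ignores any file header and assumes no compression.

```python
def bitmap_size_bytes(width: int, height: int, color_depth_bits: int) -> int:
    """Uncompressed pixel data size in bytes (header not included)."""
    total_bits = width * height * color_depth_bits
    return (total_bits + 7) // 8     # round up to a whole number of bytes


photo = bitmap_size_bytes(800, 600, 24)       # 24-bit color
black_white = bitmap_size_bytes(800, 600, 1)  # 1-bit black and white
print(photo, black_white)
```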

Sound waves

Sound is a variation of air pressure; when picked up by a microphone, it becomes a varying voltage
Sampling: measuring the value of the analogue signal at regular time intervals
Sampling resolution is the number of bits the ADC uses to store each amplitude value
Sampling rate is the number of samples taken per second
The higher the rate and resolution, the better the sound quality, but the more storage is required
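The storage cost can be estimated with the usual formula, sampling rate x sampling resolution x duration x number of channels, which is not spelled out in these notes. The helper name and the CD-style figures below are assumptions chosen for illustration.

```python
def sound_size_bytes(sample_rate_hz: int, resolution_bits: int,
                     seconds: int, channels: int = 1) -> int:
    """Uncompressed sound data size in bytes."""
    total_bits = sample_rate_hz * resolution_bits * seconds * channels
    return total_bits // 8


# One minute of stereo sound at 44 100 samples per second, 16 bits per sample.
one_minute = sound_size_bytes(44_100, 16, 60, channels=2)
print(one_minute)
```

Doubling either the rate or the resolution doubles the result.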

Sound editors

Sound recorded digitally can be processed by sound editors
A sound editor can:
- Combine sound from different sources
- Remove noise
- Adjust volume

Video

Video is a sequence of still images (frames) displayed one after another
Frame rate is the number of frames displayed per second (fps)
- Human eyes usually perceive 25 fps or more as continuous motion
Progressive encoding stores the data for an entire frame and displays it all at once
Interlaced encoding splits each frame into two fields, one with the odd lines and one with the even lines
https://www.youtube.com/watch?v=H_o5h5SK_70

Progressive

The whole frame is transmitted and displayed
Requires more bandwidth

Interlaced

For each frame, only half of the lines (odd or even) are transmitted at a time
Only half of the bandwidth is required
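The split into odd and even lines can be pictured with a list of scan lines. This is only an illustration with a hypothetical helper; real video formats work on pixel data, not strings.

```python
def split_fields(frame_lines: list[str]) -> tuple[list[str], list[str]]:
    """Split a frame into its odd-line field and its even-line field."""
    odd_field = frame_lines[0::2]    # lines 1, 3, 5, ...
    even_field = frame_lines[1::2]   # lines 2, 4, 6, ...
    return odd_field, even_field


frame = ["L1", "L2", "L3", "L4", "L5", "L6"]
odd, even = split_fields(frame)
print(odd, even)                     # each field holds half of the lines
```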

Compression

Lossless compression: the content is identical to the original when decompressed
- RLE (run-length encoding) replaces a run of consecutive identical values with one value and a count
- Huffman coding encodes more frequently used characters with shorter codes
Lossy compression: some information may be lost during compression
https://www.youtube.com/watch?v=By30SCp-Tsw
https://www.youtube.com/watch?v=dM6us854Jk0
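Run-length encoding, mentioned above, can be sketched in a few lines. The representation as (count, character) pairs is one possible choice made for this example; exam questions may write the runs differently.

```python
def rle_encode(text: str) -> list[tuple[int, str]]:
    """Replace each run of identical characters with (count, character)."""
    runs: list[tuple[int, str]] = []
    for ch in text:
        if runs and runs[-1][1] == ch:
            runs[-1] = (runs[-1][0] + 1, ch)   # extend the current run
        else:
            runs.append((1, ch))               # start a new run
    return runs


def rle_decode(runs: list[tuple[int, str]]) -> str:
    """Rebuild the original text, showing that RLE is lossless."""
    return "".join(ch * count for count, ch in runs)


encoded = rle_encode("AAAABBBCCD")
print(encoded)
print(rle_decode(encoded))
```

Decoding gives back exactly the original text, which is what makes the method lossless.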

Lossy

Usually used in multimedia files

Sacrifices quality to gain a better compression ratio

Graphics: jpeg

Audio: MP3

Video: MP4

Lossless

Usually used in file / data compression

Quality/content is retained

File: zip

Graphics: png, gif

Internet: HTTP compression (when transmitting web pages/content)

Example questions

RLE:
- s16_qp_11#4