Information Representation
Recall how storage devices store data
- Magnetic
- Optical
- RAM
Info. Representation
- A computer stores everything as two states: on/off, 0/1, yes/no, etc.
- We need some method to "code" other kinds of data into binary so that the computer can store and process them.
Binary Revision
- Show understanding of binary magnitudes and the difference between binary prefixes and decimal prefixes
- Use the binary, denary, hexadecimal number bases
- Convert an integer value from one number base / representation to another
- Perform binary addition and subtraction
- Show understanding of how overflow can occur
Prefixes
Binary Prefix | Value | Denary Prefix | Value |
---|---|---|---|
kibi | 2^10 or 1024 | kilo | 1000 |
mebi | 2^20 | mega | 10^6 |
Fun facts
- Storage device vendors usually measure capacity using decimal prefixes
- Thus a 250 GB HDD literally means:
- 250 * 10^9 bytes
- However, Windows measures using binary prefixes, so it assumes 1 GB = 2^30 bytes
- Thus Windows reports a 250 GB HDD as only about 232.8 GB
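The gap between the two prefix systems can be checked with a quick calculation. This is a minimal sketch; the variable names are illustrative only.

```python
# Capacity as advertised by the vendor (decimal prefix): 250 GB = 250 * 10^9 bytes
advertised_bytes = 250 * 10**9

# Size of one "GB" as counted with binary prefixes (strictly, 1 GiB = 2^30 bytes)
GIB = 2**30

# Capacity as reported when counting in binary prefixes
reported = advertised_bytes / GIB
print(f"{reported:.2f}")  # 232.83
```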
Conversion and Addition
- Class Exercises
- You need to know:
- Denary <-> Binary <-> Hexadecimal
- Binary addition
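To check class exercise answers, the conversions can be reproduced with Python's built-in `format()` and `int()`. The `add_binary` helper below is a hypothetical name used for illustration; it adds column by column with a carry, the same way it is done on paper.

```python
n = 181

# Denary -> binary / hexadecimal strings (format() leaves out the "0b"/"0x" prefix)
as_binary = format(n, "08b")      # '10110101'
as_hex = format(n, "02X")         # 'B5'

# Binary / hexadecimal strings -> denary
from_binary = int("10110101", 2)  # 181
from_hex = int("B5", 16)          # 181


def add_binary(a: str, b: str) -> str:
    """Add two binary strings column by column, carrying as on paper."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)
    carry = 0
    digits = []
    for bit_a, bit_b in zip(reversed(a), reversed(b)):
        total = int(bit_a) + int(bit_b) + carry
        digits.append(str(total % 2))  # digit written in this column
        carry = total // 2             # carry into the next column
    if carry:
        digits.append("1")
    return "".join(reversed(digits))


print(add_binary("0110", "0111"))  # 1101 (6 + 7 = 13)
```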
Integers coding
- Use the one’s and two’s complement representation for binary numbers
- Binary addition and subtraction
- Using positive and negative binary integers
- Show understanding of how overflow can occur
Integers coding
- For an unsigned integer, convert it directly to binary and fill the unused high bits with 0s
- e.g. the value 2 in 8 bits (1 byte) is 00000010
- For signed integers, there are three methods:
- Sign and magnitude: use the highest bit as the sign (0 = positive, 1 = negative) and the remaining bits as the magnitude
- One's complement: to negate a number, subtract each bit from 1 (i.e. invert every bit)
- Two's complement: to negate a number, take its one's complement and add 1
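The three signed representations can be compared side by side. This is a minimal sketch assuming 8-bit values in the range -127 to 127; the function names and the `BITS` constant are illustrative, not from the course material.

```python
BITS = 8
MASK = 2**BITS - 1  # 11111111 for 8 bits


def sign_and_magnitude(value: int) -> str:
    # Highest bit is the sign, the other 7 bits hold the magnitude
    sign = "1" if value < 0 else "0"
    return sign + format(abs(value), f"0{BITS - 1}b")


def ones_complement(value: int) -> str:
    if value >= 0:
        return format(value, f"0{BITS}b")
    # Invert every bit of the magnitude (same as subtracting each bit from 1)
    return format(abs(value) ^ MASK, f"0{BITS}b")


def twos_complement(value: int) -> str:
    # For negatives, masking to 8 bits gives the one's complement plus 1
    return format(value & MASK, f"0{BITS}b")


for v in (18, -18):
    print(v, sign_and_magnitude(v), ones_complement(v), twos_complement(v))
# 18 00010010 00010010 00010010
# -18 10010010 11101101 11101110
```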
Practice
Denary | 2's complement (8-bit) |
---|---|
 | 0010 0011 |
-35 | |
71 | |
 | 1010 0001 |
Using the results above, calculate 35 - 71 using binary addition
Old classic RPG games may use a 16-bit integer to store experience points
More about Signed Integer
- Integers are usually stored with a fixed length; for example, Java has several integer types of different sizes (byte, short, int, long)
- Python integers are always signed and, unlike Java's, are not fixed length: they grow as needed
- When a number is too big to be stored in a fixed-length integer, overflow occurs; this usually happens when incrementing the number
Overflow
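Python's own integers do not overflow, so the sketch below simulates a 16-bit signed integer by keeping only the low 16 bits. The `to_int16` helper and the `exp_points` variable are hypothetical names, chosen to match the RPG example.

```python
def to_int16(value: int) -> int:
    """Interpret the low 16 bits of value as a two's complement integer."""
    value &= 0xFFFF        # keep only 16 bits
    if value >= 0x8000:    # top bit set -> negative number
        value -= 0x10000
    return value


exp_points = 32767                     # largest 16-bit signed value (2^15 - 1)
exp_points = to_int16(exp_points + 1)  # one more point overflows
print(exp_points)                      # -32768
```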
Binary coded decimal (BCD)
- Every 4 bits represents a decimal numeral
- e.g. 26 in decimal is represented in BCD by 0010 0110 (0010 = 2, 0110 = 6)
- Packed BCD stores two digits in 1 byte (as in the example above); the other type uses one byte per digit, so 26 becomes:
- 00000010 00000110
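Both forms can be generated digit by digit. This is a minimal sketch for non-negative integers; the function names are illustrative.

```python
def to_packed_bcd(number: int) -> str:
    """One 4-bit nibble per decimal digit."""
    return " ".join(format(int(d), "04b") for d in str(number))


def to_unpacked_bcd(number: int) -> str:
    """One whole byte per decimal digit."""
    return " ".join(format(int(d), "08b") for d in str(number))


print(to_packed_bcd(26))    # 0010 0110
print(to_unpacked_bcd(26))  # 00000010 00000110
```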
Advantage and application
- Compared to the usual binary conversion, it is much easier (very straightforward)
- Common in electronics for displaying numbers (e.g. digital clock)
- Precision in representing decimal numbers
- e.g. there is no exact representation for 0.1 in binary floating point, but BCD can represent it exactly
- Disadvantage?
In a binary floating-point number system, it is impossible to represent 0.1 exactly (details will be explained in A-Level)
BCD arithmetic
- Add like usual binary addition
- If any resulting nibble (4 bits) is invalid (i.e. greater than 9, or 1001), or the addition carries out of the nibble, then:
- Add 0110 to that nibble, passing any carry to the next nibble
- For details, study figure 1.04 in the textbook
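The correction rule can be sketched as follows. This is an illustration rather than the textbook's method: `bcd_add` is a hypothetical helper that takes each number as an equal-length list of decimal digits (one per nibble), most significant digit first.

```python
def bcd_add(a: list, b: list) -> list:
    """Add two BCD numbers given as equal-length lists of digits (nibbles)."""
    result = []
    carry = 0
    for nib_a, nib_b in zip(reversed(a), reversed(b)):
        total = nib_a + nib_b + carry  # plain binary addition of the nibbles
        if total > 9:                  # invalid BCD nibble, or carry out of it
            total += 0b0110            # correction: add 6
        carry = total >> 4             # anything above 4 bits carries over
        result.append(total & 0b1111)  # keep the low nibble
    if carry:
        result.append(carry)
    return list(reversed(result))


print(bcd_add([2, 6], [1, 7]))  # [4, 3]  (26 + 17 = 43)
print(bcd_add([8], [9]))        # [1, 7]  (8 + 9 = 17)
```

In the first call, 0110 + 0111 gives 1101 (13), which is invalid; adding 0110 gives 1 0011, so the nibble becomes 3 with a carry into the tens nibble.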
Practice questions (s15/qp11/q1)
Text encoding
Objectives
- Understand how and why a computer represents text and the use of character sets, including American Standard Code for Information Interchange (ASCII) and Unicode
- Text is converted to binary to be processed by a computer
- Unicode allows for a greater range of characters and symbols than ASCII, including different languages and emojis
- Unicode requires more bits per character than ASCII
Text encoding
- There are different character sets for different languages
- ASCII is a 7-bit character set, which includes:
- English characters
- Numbers and symbols
- Non-printing characters called control codes
- e.g. 10 for new line (equiv. to '\n' in Python)
- 0 for the NULL character, which marks the end of a string (e.g. in C)
- There are 8-bit extended ASCII sets, but no common standard was agreed
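Python's built-in `ord()` and `chr()` convert between a character and its code, which makes the points above easy to check. This is a small sketch; the variable names are illustrative.

```python
code_of_A = ord("A")                         # character -> code
char_65 = chr(65)                            # code -> character
newline_is_10 = chr(10) == "\n"              # control code 10 is the new line
seven_bit_A = format(ord("A"), "07b")        # the code written in 7 bits
codes_for_Hi = list("Hi".encode("ascii"))    # text -> list of ASCII codes

print(code_of_A, char_65, newline_is_10, seven_bit_A, codes_for_Hi)
# 65 A True 1000001 [72, 105]
```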
Think Pair Share
- Look at the character codes for 0-9 and the character codes for the upper-case and lower-case letters. What patterns can you find?
- ASCII is the most common character set and it is good enough for many situations. But what are its limitations?
Unicode
- Characters are represented by 1-4 bytes, depending on the encoding (UTF-8, UTF-16 or UTF-32)
- The most common encoding is UTF-8, which uses 1-4 bytes per character
- The first 128 characters of Unicode are the same as ASCII
- Unicode "code points" are written in this format:
- U+0041 refers to the letter A; note this is the same as its ASCII code, thus the whole ASCII character set is part of Unicode
- U+1F606 gives this face 😆
- The Unicode standard is evolving: new characters, emoji and languages are added with each revision
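The variable length of UTF-8 can be seen by encoding a few code points. This is a minimal sketch; `describe` is a hypothetical helper, and the sample code points other than U+0041 and U+1F606 are chosen here for illustration.

```python
def describe(code_point: int):
    """Return the U+ notation and the UTF-8 length in bytes for a code point."""
    ch = chr(code_point)
    return f"U+{code_point:04X}", len(ch.encode("utf-8"))


for cp in (0x0041, 0x00E9, 0x4E2D, 0x1F606):
    print(describe(cp))
# ('U+0041', 1)
# ('U+00E9', 2)
# ('U+4E2D', 3)
# ('U+1F606', 4)
```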
Bitmap Graphics
- Pictures are composed of "dots" called Pixel (Picture element), each of them has its own color information
- Color depth is the number of bits each pixel uses to store its color information.
- e.g. a Black and white picture is 1-bit
- 24-bit (R, G, B channel each of 8 bits) color depth is the most commonly used to store photos
- To calculate the size of an (uncompressed) bitmap graphic, we multiply the number of pixels (width x height) by the color depth
Vector Graphics
- Graphics are defined by geometric formulae and their properties, e.g. line color and style
Bitmap | Vector |
---|---|
Stores color of each pixel | Stores instructions to construct the graphic |
Size of the file ∝ Number of pixels and color depth | Size of file ∝ Complexity of the graphic (i.e. not related to image dimensions) |
Enlarging reduces quality | Enlarging does not affect quality |
Common file types: jpg (lossy), png (lossless), gif, bmp, psd (Photoshop) | Common file types: ai (Illustrator), svg (Scalable Vector Graphics) |
Usually good for photos | Usually good for graphics and illustrations |
Calculating size of bitmap data
- Total number of pixels * color depth
- Image resolution: number of pixels in the bitmap file (usually written as width x height, e.g. 800x600)
- Screen resolution: number of pixels that a screen can display
- Color depth: number of bits used to represent one pixel
- Bit depth: number of bits used to represent each color channel (e.g. each of R, G and B)
- DPI: dots per inch (linear)
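As a worked example of the size calculation, the sketch below computes the uncompressed size of an 800x600 image at 24-bit color depth. The function name is illustrative, and file headers are ignored.

```python
def bitmap_size_bytes(width: int, height: int, color_depth_bits: int) -> int:
    """Uncompressed pixel data size: total pixels * color depth, in bytes."""
    total_bits = width * height * color_depth_bits
    return (total_bits + 7) // 8  # round up to whole bytes


size = bitmap_size_bytes(800, 600, 24)
print(size)         # 1440000 bytes
print(size / 1024)  # 1406.25 KiB
```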
Sound waves
- Sound is a variation of pressure; when picked up by a microphone, it becomes a varying voltage
- Sampling: measuring the value of the analogue signal at a given time
- Sampling resolution: the number of bits the ADC uses to store each sample of the sound amplitude
- Sampling rate: the number of samples taken (and stored) per second
- The higher the rate and resolution, the better the sound quality, but the more storage is required
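The storage cost follows directly from these definitions: sampling rate x sampling resolution x duration. The sketch below also multiplies by the number of channels, which is an assumption added here (the notes do not mention stereo), and the 44,100 Hz / 16-bit figures are example values only.

```python
def sound_size_bytes(sampling_rate_hz: int, resolution_bits: int,
                     seconds: int, channels: int = 1) -> int:
    """Uncompressed sound size in bytes."""
    total_bits = sampling_rate_hz * resolution_bits * seconds * channels
    return total_bits // 8


# One minute of stereo sound at 44,100 samples per second, 16 bits per sample
print(sound_size_bytes(44100, 16, 60, channels=2))  # 10584000

# Halving the sampling rate halves the storage needed
print(sound_size_bytes(22050, 16, 60, channels=2))  # 5292000
```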
Sound editors
- Sound recorded digitally can be processed by sound editors
- A sound editor can:
- Combine sound from different sources
- Remove noise
- Adjust volume
Video
- Video is a sequence of still images (frames) displayed in sequence
- Frame rate is the number of frames displayed per second (fps)
- Human eyes usually perceive 25 fps or more as continuous motion
- Progressive encoding: stores the data for an entire frame and displays it all at once
- Interlaced encoding: each frame is encoded in two groups, one for odd lines and one for even lines
- https://www.youtube.com/watch?v=H_o5h5SK_70
Progressive
- The whole frame is transmitted and displayed
- Requires more bandwidth
Interlaced
- For each frame, only half of the lines (even or odd) are transmitted
- Half of the bandwidth is required
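The bandwidth difference can be estimated for uncompressed video. This is a simplified sketch assuming 1920x1080, 24-bit color and 25 fps; these figures and the function name are illustrative assumptions, not values from the notes.

```python
def data_rate_bits_per_second(width: int, height: int, color_depth_bits: int,
                              frames_per_second: int,
                              interlaced: bool = False) -> int:
    """Uncompressed data rate; interlaced sends half the lines per frame."""
    lines_sent = height // 2 if interlaced else height
    return width * lines_sent * color_depth_bits * frames_per_second


progressive = data_rate_bits_per_second(1920, 1080, 24, 25)
interlaced = data_rate_bits_per_second(1920, 1080, 24, 25, interlaced=True)
print(progressive)  # 1244160000
print(interlaced)   # 622080000
```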
Compression
- Lossless compression: content remains the same as the original when decompressed
- RLE (run-length encoding): replaces runs of consecutive repeated values with the value and a count
- Huffman coding: encodes more frequently used characters with shorter codes
- Lossy compression: some information may be lost during compression
- https://www.youtube.com/watch?v=By30SCp-Tsw
- https://www.youtube.com/watch?v=dM6us854Jk0
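Run-length encoding is simple enough to sketch in a few lines. The version below stores each run as a (symbol, count) pair; real file formats pack the runs differently, and the function names are illustrative. Decoding returns exactly the original text, which is what makes the method lossless.

```python
from itertools import groupby


def rle_encode(text: str) -> list:
    """Replace each run of identical symbols with a (symbol, count) pair."""
    return [(symbol, len(list(run))) for symbol, run in groupby(text)]


def rle_decode(pairs: list) -> str:
    """Rebuild the original text from the (symbol, count) pairs."""
    return "".join(symbol * count for symbol, count in pairs)


encoded = rle_encode("WWWWBBBWW")
print(encoded)              # [('W', 4), ('B', 3), ('W', 2)]
print(rle_decode(encoded))  # WWWWBBBWW
```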
Lossy
Usually used in multimedia files
Sacrifices quality to gain a better compression ratio
Graphics: jpeg
Audio: MP3
Video: MP4
Lossless
Usually used in file / data compression
Quality/content is retained
File: zip
Graphics: png, gif
Internet: HTTP (when transmitting web pages/content)
Example questions
- RLE:
- s16_qp_11#4
[F5CS] Information Representation
By Andy tsui