Information Representation

Recall how storage devices store data

  • Magnetic
  • Optical
  • RAM

Info. Representation

  • A computer stores everything as two states: on/off, 0/1, yes/no, etc.
  • We therefore need methods to encode other kinds of data as binary so that the computer can store and process them.

Binary Revision

  • Show understanding of binary magnitudes and the difference between binary prefixes and decimal prefixes
  • Use the binary, denary, hexadecimal number bases
  • Convert an integer value from one number base / representation to another
  • Perform binary addition and subtraction
  • Show understanding of how overflow can occur

Prefixes

Binary prefix   Value            Denary prefix   Value
kibi            2^10 (= 1024)    kilo            10^3 (= 1000)
mebi            2^20             mega            10^6

Fun facts

  • Storage device vendors usually quote capacity using decimal prefixes
  • Thus a 250 GB HDD literally means:
    • 250 * 10^9 bytes
  • However, Windows measures using binary prefixes, so it assumes 1 GB = 2^30 bytes
  • Thus a 250 GB HDD is reported as only about 232 GB (250 * 10^9 / 2^30)
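
The arithmetic behind the 250 GB example can be checked with a short sketch. The function name `reported_gib` is made up for illustration and is not from the slides.

```python
def reported_gib(vendor_gb: float) -> float:
    """Convert a capacity quoted in decimal GB (10^9 bytes) into binary units (2^30 bytes)."""
    total_bytes = vendor_gb * 10**9
    return total_bytes / 2**30

print(round(reported_gib(250), 1))  # 232.8
```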

Conversion and Addition

  • Class Exercises
  • You need to know:
    • Denary <-> Binary <-> Hexadecimal
    • Binary addition
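
A minimal sketch for checking your class-exercise answers, assuming Python 3. The value 202 and the helper `add_binary` are illustrative choices, not part of the exercises.

```python
n = 202

binary = format(n, "b")   # denary -> binary string
hexa = format(n, "X")     # denary -> hexadecimal string
print(binary, hexa)       # 11001010 CA

# Back to denary: int() takes the number base as its second argument.
assert int(binary, 2) == n
assert int(hexa, 16) == n

# Binary -> hexadecimal by grouping the bits into nibbles (4 bits each).
padded = binary.zfill((len(binary) + 3) // 4 * 4)
nibbles = [padded[i:i + 4] for i in range(0, len(padded), 4)]
hex_from_nibbles = "".join(format(int(nib, 2), "X") for nib in nibbles)

def add_binary(a: str, b: str) -> str:
    """Add two binary strings column by column, right to left, tracking the carry."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)
    carry, digits = 0, []
    for bit_a, bit_b in zip(reversed(a), reversed(b)):
        total = int(bit_a) + int(bit_b) + carry
        digits.append(str(total % 2))
        carry = total // 2
    if carry:
        digits.append("1")
    return "".join(reversed(digits))

print(add_binary("0110", "0111"))  # 1101
```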

Integer coding

  • Use the one’s and two’s complement representation for binary numbers
  • Binary addition and subtraction
  • Using positive and negative binary integers
  • Show understanding of how overflow can occur

Integer coding

  • For an unsigned integer, convert it directly to binary and fill the unused high bits with 0s
    • e.g. the value 2 in 8 bits (1 byte) is 00000010
  • For a signed integer, there are three methods:
    • Sign and magnitude: use the highest bit as the sign (0 = positive, 1 = negative) and the remaining bits as the magnitude
    • One's complement: a negative number is obtained by subtracting each bit of the positive number from 1 (i.e. flipping every bit)
    • Two's complement: a negative number is the one's complement of the positive number, plus 1
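
The three signed methods can be compared side by side. This is a sketch only: the function names are invented for illustration, and it assumes the value fits in the chosen number of bits (no range checking).

```python
BITS = 8

def sign_and_magnitude(value: int, bits: int = BITS) -> str:
    # Highest bit is the sign, the remaining bits hold the magnitude.
    sign = "1" if value < 0 else "0"
    return sign + format(abs(value), f"0{bits - 1}b")

def ones_complement(value: int, bits: int = BITS) -> str:
    if value >= 0:
        return format(value, f"0{bits}b")
    mask = (1 << bits) - 1
    # XOR with all 1s flips every bit (same as subtracting each bit from 1).
    return format(abs(value) ^ mask, f"0{bits}b")

def twos_complement(value: int, bits: int = BITS) -> str:
    if value >= 0:
        return format(value, f"0{bits}b")
    mask = (1 << bits) - 1
    # One's complement plus 1, kept to the fixed width.
    return format(((abs(value) ^ mask) + 1) & mask, f"0{bits}b")

print(sign_and_magnitude(-5))  # 10000101
print(ones_complement(-5))     # 11111010
print(twos_complement(-5))     # 11111011
```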

Practice

Denary    2's complement (8-bit)
______    0010 0011
-35       ______
71        ______
______    1010 0001

Using the results above, calculate 35 - 71 by binary addition (add the two's complement of 71 to 35).
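
To check the method without giving away the answer, here is a sketch of subtraction by adding a two's complement, using different numbers (20 - 50). The helper names and the 8-bit width are assumptions for illustration.

```python
BITS = 8
MASK = (1 << BITS) - 1  # 0b11111111

def to_twos(value: int) -> int:
    """8-bit two's complement pattern of value, as an unsigned int (0..255)."""
    return value & MASK

def from_twos(pattern: int) -> int:
    """Interpret an 8-bit pattern as a signed two's complement value."""
    return pattern - (1 << BITS) if pattern & (1 << (BITS - 1)) else pattern

a = to_twos(20)          # 0001 0100
b = to_twos(-50)         # 1100 1110
total = (a + b) & MASK   # discard any carry out of the top bit
print(format(total, "08b"), from_twos(total))  # 11100010 -30
```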

Old classic RPG games may use a 16-bit integer to store experience points

More about Signed Integer

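  • An integer is stored with a fixed length; for example, Java has several integer types: byte (8-bit), short (16-bit), int (32-bit) and long (64-bit)
  • A Python 3 integer is always signed and has no fixed length, so it grows as needed instead of overflowing
  • When a number is too big to be stored in the bits available, it is called overflow; it usually happens when incrementing or adding to the number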
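
The sketch below simulates a signed 16-bit integer by masking to 16 bits, so the wrap-around is explicit regardless of how the language stores its own integers. The name `wrap_int16` and the experience-point scenario are illustrative assumptions.

```python
def wrap_int16(value: int) -> int:
    """Keep only the low 16 bits and reinterpret them as a signed two's complement value."""
    value &= 0xFFFF
    return value - 0x10000 if value >= 0x8000 else value

exp_points = 32767                        # largest signed 16-bit value
exp_points = wrap_int16(exp_points + 1)   # one more point overflows
print(exp_points)                         # -32768
```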

Overflow

Binary coded decimal (BCD)

  • Every 4 bits (one nibble) represent one decimal digit
  • e.g. 26 in decimal is represented by
    0010 0110 (BCD)
       2    6
  • Packed BCD stores two digits in one byte (as in the example above). Unpacked BCD uses one byte per digit, so 26 becomes:
    • 00000010 00000110
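
A minimal sketch of both BCD forms, assuming non-negative integers. The function names are invented, and the packed version simply lists the nibbles without padding an odd number of digits to a whole byte.

```python
def to_packed_bcd(number: int) -> str:
    """One nibble (4 bits) per decimal digit, so two digits fit in one byte."""
    return " ".join(format(int(d), "04b") for d in str(number))

def to_unpacked_bcd(number: int) -> str:
    """One whole byte per decimal digit; the upper nibble is left as 0000."""
    return " ".join(format(int(d), "08b") for d in str(number))

print(to_packed_bcd(26))    # 0010 0110
print(to_unpacked_bcd(26))  # 00000010 00000110
```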

Advantage and application

  • Compared with ordinary binary conversion, converting between decimal and BCD is much easier because each digit is handled on its own
    • Common in electronics for displaying numbers (e.g. digital clocks)
  • Precision in representing decimal fractions
    e.g. 0.1 has no exact representation in ordinary binary, but it can be stored exactly in BCD
  • Disadvantage?

In a binary floating-point number system, 0.1 cannot be represented exactly (details will be explained at A-Level)
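
A quick way to see this in Python. Note the assumption: the `decimal` module stores decimal digits but is not literally BCD; it is used here only to show that a decimal-based representation can hold 0.1 exactly.

```python
from decimal import Decimal
from fractions import Fraction

# Binary floating point: the stored values are only close to 0.1, 0.2 and 0.3.
print(0.1 + 0.2 == 0.3)                   # False
print(Fraction(0.1) == Fraction(1, 10))   # False

# Decimal-based representation: each decimal digit is stored exactly.
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True
```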


BCD arithmetic

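  • Add as in ordinary binary addition, one nibble (4 bits) at a time
  • If any resulting nibble is not a valid digit (i.e. more than 9, or 1001), or produces a carry out of the nibble, then:
    • Add 0110 (6) to that nibble and pass the carry to the next nibble
  • For details, study figure 1.04 in the textbook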
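
The correction rule can be sketched as follows. This is an illustration rather than the textbook's method: `bcd_add` is a made-up name, and it assumes both numbers are given as equal-length lists of decimal digits, most significant first.

```python
def bcd_add(a_digits: list[int], b_digits: list[int]) -> list[int]:
    """Add two equal-length lists of BCD digits (most significant digit first)."""
    result, carry = [], 0
    for a, b in zip(reversed(a_digits), reversed(b_digits)):
        nibble = a + b + carry          # ordinary binary addition of two nibbles
        if nibble > 9:                  # not a valid BCD digit
            nibble += 0b0110            # correction: add 6
        carry = nibble >> 4             # bit that spilled out of the nibble
        result.append(nibble & 0b1111)  # keep the low 4 bits
    if carry:
        result.append(carry)
    return result[::-1]

total = bcd_add([2, 6], [1, 7])         # 26 + 17
print(" ".join(format(d, "04b") for d in total))  # 0100 0011
```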

Practice questions (s15/qp11/q1)

Text encoding

Objectives

  • Understand how and why a computer represents text and the use of character sets, including the American Standard Code for Information Interchange (ASCII) and Unicode
    • Text is converted to binary to be processed by a computer
    • Unicode allows for a greater range of characters and symbols than ASCII, including different languages and emojis
    • Unicode requires more bits per character than ASCII

Text encoding

  • There are different character sets for different languages
  • ASCII is a 7-bit character set, which includes:
    • English letters
    • Numbers and symbols
    • Non-printing characters called control codes
      • e.g. 10 for new line (equiv. to '\n' in Python)
      • 0 for the NULL character, which marks the end of a string in languages such as C
  • There are 8-bit extended ASCII sets, but no single common standard was agreed
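
Python's built-in `ord` and `chr` convert between a character and its code, which makes the points above easy to try out. The sample string "Hi!" is just an illustration.

```python
code_a = ord("A")
print(code_a)                   # 65
print(format(code_a, "07b"))    # 1000001 (fits in 7 bits)

# Code 10 is the line-feed control code.
print(chr(10) == "\n")          # True

# Encoding text as ASCII gives one code (0-127) per character.
codes = list("Hi!".encode("ascii"))
print(codes)                    # [72, 105, 33]
```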

Think Pair Share

  1. Look at the character codes for 0-9 and for the upper-case and lower-case letters. What pattern can you find?
  2. ASCII is the most common character set and is good enough for many situations. What are its limitations?

Unicode

  • A character is stored in 1 to 4 bytes, depending on the encoding used (UTF-8, UTF-16 or UTF-32)
  • The most common encoding is UTF-8, which uses 1 to 4 bytes per character
  • The first 128 code points of Unicode are the same as ASCII
  • Unicode "code points" are written in this format:
    • U+0041 refers to the letter A; note this is the same as its ASCII code, so the whole ASCII character set is part of Unicode
    • U+1F606 gives this face 😆
  • The Unicode standard is still evolving: new characters, emoji and languages are added with each revision
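
A small sketch showing code points and how many bytes UTF-8 needs for each. The helper name `utf8_length` and the sample characters are chosen for illustration.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes UTF-8 uses to store a single character."""
    return len(ch.encode("utf-8"))

for ch in ["A", "é", "€", chr(0x1F606)]:
    # Print the code point in U+XXXX form, then the UTF-8 byte count.
    print(f"U+{ord(ch):04X}", utf8_length(ch))

# ASCII characters keep the same single byte in UTF-8.
assert "A".encode("utf-8") == "A".encode("ascii")
```

The loop prints byte counts of 1, 2, 3 and 4 for the four characters.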

Bitmap Graphics

  • Pictures are composed of "dots" called pixels (picture elements), each of which has its own color information
  • Color depth is the number of bits used to store the color of one pixel
    • e.g. a black-and-white picture is 1-bit
    • 24-bit color depth (8 bits each for the R, G and B channels) is the most commonly used for photos
  • To calculate the size of an (uncompressed) bitmap graphic, multiply the pixel dimensions (width * height) by the color depth

Vector Graphics

  • Graphics are defined by geometric formulae and their properties, e.g. line color and style
Aspect             Bitmap                                       Vector
Stores             Color of each pixel                          Instructions to construct the graphic
Size of file       ∝ Number of pixels and color depth           ∝ Complexity of the graphic (i.e. not related to image dimensions)
Enlarging          Reduces quality                              Does not affect quality
Common file types  jpg (lossy), png (lossless), gif, bmp,       ai (Illustrator), svg (Scalable Vector Graphics)
                   psd (Photoshop)
Usually good for   Photos                                       Graphics and illustrations

Calculating size of bitmap data

  • File size (in bits) = total number of pixels * color depth
  • Image resolution: number of pixels in the bitmap file (usually written as width x height, e.g. 800x600)
  • Screen resolution: number of pixels that a screen can display
  • Color depth: number of bits used to represent one pixel
  • Bit depth: number of bits used to represent each color channel (e.g. each of R, G and B)
  • DPI: dots per inch (linear)
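
The size calculation as a short sketch. It covers the raw pixel data only; the function name is an assumption, and real file formats add a header (and sometimes padding) on top of this figure.

```python
def bitmap_size_bytes(width: int, height: int, color_depth_bits: int) -> float:
    """Uncompressed pixel data size: pixels * bits per pixel, converted to bytes."""
    return width * height * color_depth_bits / 8

print(bitmap_size_bytes(800, 600, 24))  # 1440000.0
print(bitmap_size_bytes(800, 600, 1))   # 60000.0
```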

Sound waves

  • Sound is a variation of air pressure; when picked up by a microphone, it becomes a varying voltage
  • Sampling: measuring the value of the analogue signal at a given time
  • Sampling resolution is the number of bits the ADC uses to store the amplitude of each sample
  • Sampling rate is how often the amplitude is measured and stored (samples per second)
  • The higher the sampling rate and resolution, the better the sound quality, but the more storage is required
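
A rough sketch of sampling and of the storage cost. Everything specific here is an assumption for illustration: the function names, the sine-wave input, the size formula (rate * resolution * channels * duration) and the 44100 Hz / 16-bit / stereo figures.

```python
import math

def sample_and_quantise(freq_hz, sample_rate, resolution_bits, duration_s):
    """Sample a sine wave and round each sample to one of 2^resolution_bits levels."""
    levels = 2 ** resolution_bits
    samples = []
    for n in range(int(sample_rate * duration_s)):
        t = n / sample_rate                               # time of this sample
        amplitude = math.sin(2 * math.pi * freq_hz * t)   # -1.0 .. 1.0
        samples.append(min(levels - 1, int((amplitude + 1) / 2 * levels)))
    return samples

def audio_size_bytes(sample_rate, resolution_bits, duration_s, channels=1):
    """Uncompressed size: one sample of resolution_bits per channel, sample_rate times a second."""
    return sample_rate * resolution_bits * channels * duration_s // 8

print(audio_size_bytes(44100, 16, 60, channels=2))  # 10584000
```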

Sound editors

  • Sound recorded digitally can be processed by sound editors
  • A sound editor can:
    • Combine sound from different sources
    • Remove noise
    • Adjust volume

Video

  • Video is a sequence of still images (frames) displayed one after another
  • Frame rate is the number of frames displayed per second (fps)
    • Human eyes usually perceive 25 fps or more as continuous motion
  • Progressive encoding: the data for an entire frame is stored and displayed at once
  • Interlaced encoding: each frame is encoded in two groups, one for the odd lines and one for the even lines
  • https://www.youtube.com/watch?v=H_o5h5SK_70

Progressive

  • The whole frame is transmitted and displayed at once
  • Requires more bandwidth

Interlaced

  • For each frame, only half of the lines (even or odd) are transmitted
  • Only about half of the bandwidth is required
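
A toy sketch of the odd/even split. It is a simplification: a "frame" here is just a list of six text lines, and both groups come from the same frame.

```python
frame = [f"line {i}" for i in range(1, 7)]   # a tiny 6-line "frame"

odd_field = frame[0::2]    # lines 1, 3, 5
even_field = frame[1::2]   # lines 2, 4, 6

# Each field carries half of the lines, hence about half the data per transmission.
print(len(odd_field), len(even_field))       # 3 3

# Weave the two fields back together to rebuild the full frame.
rebuilt = [line for pair in zip(odd_field, even_field) for line in pair]
assert rebuilt == frame
```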

Compression

  • Lossless compression: the content is exactly the same as the original when decompressed
    • RLE (run-length encoding): replaces each run of consecutive repeated values with the value and a count
    • Huffman coding: encodes more frequently used characters with shorter codes
  • Lossy compression: some information may be lost during compression
  • https://www.youtube.com/watch?v=By30SCp-Tsw
  • https://www.youtube.com/watch?v=dM6us854Jk0
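
A minimal run-length encoding sketch, assuming the data is a text string and each run is stored as a (symbol, count) pair. The function names and the sample string are illustrative; exam questions may write the pairs in a different order or format.

```python
from itertools import groupby

def rle_encode(text: str) -> list[tuple[str, int]]:
    """Replace each run of identical symbols with a (symbol, run length) pair."""
    return [(symbol, len(list(run))) for symbol, run in groupby(text)]

def rle_decode(pairs: list[tuple[str, int]]) -> str:
    """Expand each (symbol, count) pair back into the original run."""
    return "".join(symbol * count for symbol, count in pairs)

encoded = rle_encode("WWWWBBBWWCC")
print(encoded)  # [('W', 4), ('B', 3), ('W', 2), ('C', 2)]

# Lossless: decoding gives back exactly the original data.
assert rle_decode(encoded) == "WWWWBBBWWCC"
```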

Lossy

  • Usually used in multimedia files
  • Sacrifices quality to gain a better compression ratio
  • Graphics: jpeg
  • Audio: MP3
  • Video: MP4

Lossless

  • Usually used in file / data compression
  • Quality/content is retained
  • File: zip
  • Graphics: png, gif
  • Internet: HTTP (when transmitting web pages/content)

Example questions

  • RLE:
    • s16_qp_11#4

[F5CS] Information Representation

By Andy tsui
