Numbers in Digital Systems

Numerical Methods

David Mayerich

Scalable Tissue Imaging and Modeling (STIM) Laboratory

Department of Electrical and Computer Engineering

Cullen College of Engineering

University of Houston

David Mayerich

STIM Laboratory, University of Houston

Radix

Representing and Converting Base

Binary Numbers and Arithmetic

Bases in Digital Systems

Numerical Bases

  • The radix or base is the number of unique digits used to represent a number:

Decimal (base \(10\)):
2473_{10} = 2 \times 10^3 + 4 \times 10^2 + 7 \times 10^1 + 3 \times 10^0 = 2473

Octal (base \(8\)):
4651_{8} = 4 \times 8^3 + 6 \times 8^2 + 5 \times 8^1 + 1 \times 8^0 = 2473

Binary (base \(2\)):
\begin{split} 1001 1010 1001_{2} = & 1 \times 2^{11} + 0 \times 2^{10} + 0 \times 2^{9} + 1\times 2^{8} +\\ & 1 \times 2^{7} + 0 \times 2^{6} + 1 \times 2^{5} + 0 \times 2^{4} +\\ & 1 \times 2^{3} + 0 \times 2^{2} + 0 \times 2^{1} + 1 \times 2^{0} = 2473 \end{split}

Hexadecimal (base \(16\)):
9A9_{16} = 9 \times 16^2 + 10 \times 16^1 + 9 \times 16^0 = 2473
10 = A \quad 11 = B \quad 12 = C \quad 13 = D \quad 14 = E \quad 15 = F

Radix Points

  • The radix point separates the whole part of a number from its fractional part in any base

Decimal (base \(10\)):
182.5_{10} = 1 \times 10^2 + 8 \times 10^1 + 2 \times 10^0 + 5 \times 10^{-1} = 182.5

Octal (base \(8\)):
266.4_{8} = 2 \times 8^2 + 6 \times 8^1 + 6 \times 8^0 + 4 \times 8^{-1} = 182.5

Binary (base \(2\)):
\begin{split} 1011 0110.1_{2} = & 1 \times 2^{7} + 0 \times 2^{6} + 1 \times 2^{5} + 1\times 2^{4} +\\ & 0 \times 2^{3} + 1 \times 2^{2} + 1 \times 2^{1} + 0 \times 2^{0} +\\ & 1 \times 2^{-1} = 182.5 \end{split}

Hexadecimal (base \(16\)):
B6.8_{16} = 11 \times 16^1 + 6 \times 16^0 + 8 \times 16^{-1} = 182.5
10 = A \quad 11 = B \quad 12 = C \quad 13 = D \quad 14 = E \quad 15 = F

Binary Numbers

Registers

Converting Binary Numbers

Binary Arithmetic

Binary

  • Modern computers represent numbers using memory cells

  • Individual cells can occupy two distinct states: high and low voltage

  • Each cell represents one binary digit: high = 1, low = 0

[Figure: the bit sequence below, with a nibble (4 bits) and a byte (8 bits) marked]
0100 \ \ 1101 \ \ 0110 \ \ 1010 \ \ 1111 \ \ 0011

  • Numbers are represented as sequences of digits

  • The number of digits determines how many different values can be represented

[Figure: four-digit registers. Decimal (base 10): min 0000, max 9999 \(\rightarrow 10^4\) values. Binary (base 2): min 0000, max 1111 \(\rightarrow 2^4\) values]

[Figure: six-digit registers. Decimal (base 10): min 000000, max 999999 \(\rightarrow 10^6\) values. Binary (base 2): min 000000, max 111111 \(\rightarrow 2^6\) values]

Reading Binary Numbers

  1. Initialize a decimal register \(x_{0} = 0\)

  2. For each binary digit, starting from the most significant, double \(x\) and add the digit as a decimal value

  • Used to be known as "double dabble"

1101 \rightarrow 1 \quad 1 \quad 0 \quad 1
x_0 = 0
x_1 = 2(0) + 1 = 1
x_2 = 2(1) + 1 = 3
x_3 = 2(3) + 0 = 6
x_4 = 2(6) + 1 = 13

Reading Binary Numbers

0110 \ 1010 \rightarrow 0 \quad\quad 1 \quad\quad 1 \quad\quad 0 \quad\quad 1 \quad\quad 0 \quad\quad 1 \quad\quad 0
x_0 = 0
x_1 = 2(0) + 0 = 0
x_2 = 2(0) + 1 = 1
x_3 = 2(1) + 1 = 3
x_4 = 2(3) + 0 = 6
x_5 = 2(6) + 1 = 13
x_6 = 2(13) + 0 = 26
x_7 = 2(26) + 1 = 53
x_8 = 2(53) + 0 = 106

Fractional Binary Numbers

1101.1011 \rightarrow 1 \quad\quad 1 \quad\quad 0 \quad\quad 1 \quad.\quad 1 \quad\quad 0 \quad\quad 1 \quad\quad 1

Whole part (\(1101_2\)):
x_0 = 0
x_1 = 2(0) + 1 = 1
x_2 = 2(1) + 1 = 3
x_3 = 2(3) + 0 = 6
x_4 = 2(6) + 1 = 13

Fractional part (\(1011_2\), read the same way and then divided by \(2^4 = 16\) because it has four digits):
x_0 = 0
x_1 = 2(0) + 1 = 1
x_2 = 2(1) + 0 = 2
x_3 = 2(2) + 1 = 5
x_4 = 2(5) + 1 = 11

1101.1011_2 = 13 \frac{11}{16} = 13.6875

Binary Arithmetic

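  • Arithmetic works the same way in any base

Addition (with carries, as in decimal):
\begin{array}{r} 0101.010_2 \\ +\ 0011.101_2 \\ \hline 1000.111_2 \end{array} \qquad \begin{array}{r} 5.250 \\ +\ 3.625 \\ \hline 8.875 \end{array}

Multiplication (one shifted partial product per multiplier digit, ignoring the radix point until the end):
\begin{array}{r} 0101.010_2 \\ \times\ 0011.101_2 \\ \hline 0101010 \\ 0000000\phantom{0} \\ 0101010\phantom{00} \\ 0101010\phantom{000} \\ 0101010\phantom{0000} \\ \hline 10011000010 \end{array} \qquad \begin{array}{r} 5.250 \\ \times\ 3.625 \\ \hline 19.03125 \end{array}

Each factor has three fractional digits, so the product has six: \(10011.000010_2 = 19.03125\)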

Representing Integers

Integers

Signed/Unsigned Integers

Overflow

Signed and Unsigned Integers

  • Registers have no minus sign, so negative integer values such as \(-17_{10}=-0001\ 0001_2\) need an encoding

  • Sign and magnitude - leading bit represents the sign \(\bm{+}\rightarrow 0\) and \(\bm{-}\rightarrow 1\)

0001\ 0001_2 = 17_{10}
1001\ 0001_2 = -17_{10}
0111\ 1011_2 = 123_{10}
1111\ 1011_2 = -123_{10}
0101\ 0110_2 = 86_{10}
1101\ 0110_2 = -86_{10}

  • One's complement - negative values are the bitwise NOT of positive values

0001\ 0001_2 = 17_{10}
1110\ 1110_2 = -17_{10}
0111\ 1011_2 = 123_{10}
1000\ 0100_2 = -123_{10}
0101\ 0110_2 = 86_{10}
1010\ 1001_2 = -86_{10}

  • Two's complement - negative values are the bitwise NOT \(+1\)

0001\ 0001_2 = 17_{10}
1110\ 1110_2 +1
1110\ 1111_2 = -17_{10}
0111\ 1011_2 = 123_{10}
1000\ 0100_2 +1
1000\ 0101_2 = -123_{10}
0101\ 0110_2 = 86_{10}
1010\ 1001_2 +1
1010\ 1010_2 = -86_{10}

Integer Overflow

  • What happens when the result exceeds the available register size?

  • Consider a \(6\)-bit addition:

\begin{array}{rr} 110101_2 & 53 \\ +\ 101100_2 & +\ 44 \\ \hline 1\,100001_2 & 97 \end{array}

The sum needs \(7\) bits; a \(6\)-bit register keeps only \(100001_2 = 33\).

  • Overflow is defined for unsigned integers:

    1. Assume an \(n\)-bit operation

    2. Perform the operation

    3. Keep the \(n\) least significant bits

  • Keeping \(n\) bits is equivalent to the operation \(x\ \text{mod}\ 2^n\):

97\ \text{mod}\ 2^6=33

C standard, §6.2.5/9:
A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.

  • Overflow is undefined for signed integers

What Does This Mean?

  • Every register is limited - \(n\) bits let you represent \(2^n\) different values

  • Signed integers require a sign bit to flag \(+\)/\(-\) values, so the largest representable magnitude is halved:

unsigned char a;	// [0, 255]
signed char b;		// [-128, 127] (plain char may be signed or unsigned)

unsigned int x;		// [0, 2^32 - 1], assuming a 32-bit int
int y;				// [-2^31, 2^31 - 1]
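  • Signed integer overflow is undefined behavior in C/C++

  • Unsigned integers overflow by "wrapping around" past the maximum value to the minimum value:

unsigned int a = UINT_MAX + 1u;	// a = 0 (UINT_MAX is defined in <limits.h>)

    • This is the same result as a modulo operation:

unsigned int c = a + b;	// a and b are any unsigned int values
unsigned int z = (unsigned int)(((unsigned long long)a + b) % (1ULL << 32));	// assumes a 32-bit unsigned int
// c == z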

Implementing Floating Point

Floating Point Numbers

Floating Point Arithmetic

Digital Representations

Scientific Notation

  • Compressed format using a mantissa (or significand) \(m\) and an exponent \(n\):

a = m \times 10^n
\text{where}\quad m\in \mathbb{Q},\ n\in\mathbb{Z}

  • Represent large and small numbers to simplify calculations:

gravitational constant: G=6.6743 \times 10^{-11}
speed of light: c=2.9979 \times 10^{8}
elementary charge: e=1.6022 \times 10^{-19}
electric permittivity: \epsilon_0=8.8542 \times 10^{-12}

  • Computers and calculators often use "E" to denote the exponent: 6.6743E-11

  • The mantissa is normalized: \(1 \leq m < b\) where \(b\) is the base, which ensures one representation for any value

G=667.43 \times 10^{-13} (not normalized)
G=0.0066743 \times 10^{-8} (not normalized)
G=6.6743 \times 10^{-11} (normalized)

Floating Point

  • Scientific notation can be used in any base \(b\)

  • Specify a precision: number of digits for \(m\) and \(n\)

  • Mantissa and exponent can be negative or positive

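a = \pm \ m \times b^n

Example format: sign, 5-digit precision (mantissa), 2-digit exponent
G= + 6\ .\ 6\ 7\ 4\ 3 \times 10^{-\ 1\ 1}
c=+ 2\ .\ 9\ 9\ 7\ 9 \times 10^{+\ 0\ 8}

The same format with base \(b = 2\):
c = + 2\ .\ 9\ 9\ 7\ 9 \times 10^{+\ 0\ 8} =+ 1\ .\ 1\ 1\ 6\ 8 \times 2^{+\ 2\ 8}
e = + 1\ .\ 6\ 0\ 2\ 2 \times 10^{-\ 1\ 9} =+ 1\ .\ 4\ 7\ 7\ 7 \times 2^{-\ 6\ 3}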

Floating Point Arithmetic

  • Fixed precision means that a floating point value \(\text{fl}(x)\) may not match the target value \(x\)

Example with 5-digit precision:
\begin{array}{r} +\ 7\ .\ 3\ 1\ 2\ 4 \times 10^{-\ 1\ 1} \\ +\ 5\ .\ 8\ 3\ 1\ 7 \times 10^{-\ 1\ 1} \\ \hline +\ 1\ 3\ .\ 1\ 4\ 4\ 1 \times 10^{-\ 1\ 1} \end{array}
+\ 1\ .\ 3\ 1\ 4\ 4\ 1 \times 10^{-\ 1\ 0} (normalized sum: 6 digits, not representable)
+\ 1\ .\ 3\ 1\ 4\ 4 \times 10^{-\ 1\ 0} (stored value)

  • A non-representable number \(x\) is surrounded by two representable values: \(x_-\) and \(x_+\):

1.3144 < 1.31441 < 1.3145

  • Rounding options:

    • round-by-chopping: discard the extra digits, which rounds toward zero

\text{fl}(x) = \begin{cases} x_+ & x \leq 0\\ x_- & x > 0\\ \end{cases}

    • round-to-nearest: \(\text{fl}(x)\) is the closest representable value to \(x\) (ties resolve to the value whose last digit is even)

Implications

  • Operations on floating point numbers are not necessarily associative or distributive:

\text{fl}(\text{fl}(x + y) + z) \neq \text{fl}(x + \text{fl}(y + z))
\text{fl}(z \times \text{fl}(x + y)) \neq \text{fl}(\text{fl}(z \times x) + \text{fl}(z \times y))

Example with 4 significant digits:
(0.03842 +1.273) -1.221: \quad 0.03842 + 1.273 = 1.31142 \rightarrow 1.311, \quad 1.311 - 1.221 = 0.090

vs.

0.03842 +(1.273 -1.221): \quad 1.273 - 1.221 = 0.052, \quad 0.052 + 0.03842 = 0.09042

  • Cumulative operations can fail: once the running sum reaches \(100\), adding \(0.04\) no longer changes the stored value

9.993 \times 10^1 + 4.000 \times 10^{-2} = 9.997 \times 10^1
9.997\times 10^1 + 4.000\times 10^{-2} = 1.0001\times 10^2 \rightarrow 1.000\times 10^2
1.000\times 10^2 + 4.000\times 10^{-2} = 1.0004\times 10^2 \rightarrow 1.000\times 10^2

Implementation Quirks

  • Digital systems implement floating point using binary values

  • Sign-and-magnitude is used for negative/positive values

  • Normalization: \(1\) is always the leading bit, so it doesn't have to be stored (implied \(1\))

  • Exponent bias

    • signed exponents are required, but two's complement makes comparisons slower

    • a static bias \(B\) is introduced

1.m \times 2^{x - B}

where the fraction bits \(m\) and the stored exponent \(x \geq 0\) are both binary values, and \(x - B\) is the signed exponent

Floating Point Standards

  • The IEEE 754 standard is the most common for floating point in computing:

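standard    C/C++        m bits   x bits   bias
binary16    N/A (half)   10       5        15
binary32    float        23       8        127
binary64    double       52       11       1023
binary128   N/A          112      15       16383
binary256   N/A          236      19       262143

(m bits counts the stored mantissa bits; the implied leading \(1\) adds one more bit of precision)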
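  • Floating point storage within a register (binary16: 1 sign bit, 5 exponent bits, 10 mantissa bits):

\underbrace{1}_{\text{sign}}\ \ \underbrace{1\ \ 0\ \ 1\ \ 1\ \ 0}_{\text{exponent}}\ \ \underbrace{0\ \ 1\ \ 1\ \ 0\ \ 1\ \ 0\ \ 0\ \ 0\ \ 1\ \ 0}_{\text{mantissa}}
-1.0110100010_{2}\times 2^{10110_2 - 15}
2^{22 - 15} = 2^7
-10110100.010_{2} = -180.25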

Floating Point Standards (32-bit)

Word Size and Endianness

  • A word is the unit of data that is stored, handled by an operation, or transmitted at once; the word size is its length in bits

  • The smallest addressable data size is usually a byte (8 bits)

Consider a 32-bit floating point value representing \(-\pi\):

\underbrace{1}_{\text{sign}}\ \underbrace{1\ 0\ 0\ 0\ 0\ 0\ 0\ 0}_{\text{exponent (8 bits)}}\ \underbrace{1\ 0\ 0\ 1\ 0\ 0\ 1\ 0\ 0\ 0\ 0\ 1\ 1\ 1\ 1\ 1\ 1\ 0\ 1\ 1\ 0\ 1\ 1}_{\text{mantissa (23 bits)}}

  • Each 4-bit nibble has \(2^4=16\) possible values, often represented using hexadecimal; two nibbles form 1 byte

1100\ 0000 \quad 0100\ 1001 \quad 0000\ 1111 \quad 1101\ 1011 \rightarrow C0 \quad 49 \quad 0F \quad DB

  • Bytes in a sequence can be stored in two orders

    • Big-Endian:    C0  49  0F  DB

    • Little-Endian:  DB  0F  49  C0  (more common)

Data Dumps and Memory

Discussion

  • What is stored in this IEEE 754 binary32 register:

A1\quad 1E\quad EF\quad 3B (bytes as stored, little-endian)
3B\quad EF\quad 1E\quad A1 (reordered for endianness, most significant byte first)
0011\ 1011\quad 1110\ 1111\quad 0001\ 1110\quad 1010\ 0001
+1.11011110001111010100001_2 \times 2^{01110111_2 - 127}
119 - 127 = -8
+1.86812222003936767578 \times 2^{-8}
7.2973524220287799835205078125 \times 10^{-3}
\text{actual}\quad 7.2973525643 \times 10^{-3}
\epsilon= -1.422712200164794921875 \times 10^{-10}