Why 0.1 + 0.2 is not equal to 0.3

Converting decimal to binary

Converting fraction

base-q expansion form

0.625 = x_1\cdot 2^{-1}+x_2\cdot 2^{-2}+x_3\cdot 2^{-3}
0.625 = x_1\cdot \frac{1}{2}+x_2\cdot \frac{1}{2^2}+x_3\cdot \frac{1}{2^3}

Factor out 1/2

0.625 = \frac{1}{2}(x_1+\frac{1}{2}x_2+\frac{1}{2^2}x_3)

What is x_1?

Single out first digit

Multiply both sides by 2:

1.25 = x_1+x_2\cdot 2^{-1}+x_3\cdot 2^{-2}

If x_2\cdot 2^{-1}+x_3\cdot 2^{-2} is less than 1, and the left side is greater than 1, then x_1 must be equal to 1

Factor out \frac{1}{2}

Subtracting x_1 = 1 from 1.25 leaves 0.25:

0.25 = \frac{1}{2}\cdot(x_2+\frac{1}{2}\cdot x_3)

What is x_2?

Repeat until no digits with \frac{1}{2} left

0.5 = x_2+\frac{1}{2}\cdot x_3

x_2 = 0, because 0.5 is less than 1

0.5 = x_3\cdot\frac{1}{2}
1 = x_3

So the algorithm is:

multiply the fraction by 2, take the integer part of the result as the next binary digit, and continue with the fractional part until it is zero

0.625_{10} = 0.101_2
0.375_{10} = ?_2
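
The multiply-by-2 steps above can be sketched in JavaScript. This is only an illustration: `fractionToBinary` and its `maxDigits` limit are hypothetical names that do not come from the slides, and the limit is there because most decimal fractions never reach a zero remainder.

```javascript
// Hypothetical helper: converts a fraction in [0, 1) to a binary digit string.
// maxDigits is an assumed safety limit for fractions that never terminate.
function fractionToBinary(fraction, maxDigits) {
    var digits = "";
    while (fraction > 0 && digits.length < maxDigits) {
        fraction *= 2;              // shift the next binary digit in front of the point
        if (fraction >= 1) {
            digits += "1";          // integer part is 1: the digit is 1
            fraction -= 1;          // continue with the fractional part
        } else {
            digits += "0";          // integer part is 0: the digit is 0
        }
    }
    return "0." + digits;
}

console.log(fractionToBinary(0.625, 10)); // 0.101
```

The loop stops either when the remainder is zero or when the digit limit is reached, whichever comes first.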

Converting integer

base-q expansion form

12 = x_3\cdot 2^3+x_2\cdot 2^2+x_1\cdot 2^1+x_0\cdot 2^0
12 = x_3\cdot 2^3+x_2\cdot 2^2+x_1\cdot 2^1+x_0

Factor out 2

12 = 2\cdot(x_3\cdot 2^2+x_2\cdot 2^1+x_1\cdot 2^0) +x_0

What is x_0?

Single out last digits

12 = 2y + x_0, where y = x_3\cdot 2^2+x_2\cdot 2^1+x_1\cdot 2^0

If 12 and 2y are both even numbers, then x_0 must be 0

Factor out 2

6 = 2\cdot(x_3\cdot 2^1+x_2\cdot 2^0) + x_1

What is x_1?

Repeat until no digits with 2 left

Continue...

So the algorithm is:

continue dividing by 2 and taking the remainder as the next digit (from last to first) until the quotient is zero

12_{10}=1100_2
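
A minimal sketch of this repeated division, assuming a non-negative integer input. The name `integerToBinary` is hypothetical and only used for illustration.

```javascript
// Hypothetical helper: converts a non-negative integer to a binary digit string.
function integerToBinary(n) {
    if (n === 0) {
        return "0";
    }
    var digits = "";
    while (n > 0) {
        digits = (n % 2) + digits;  // the remainder is the next digit, last digit first
        n = Math.floor(n / 2);      // continue with the quotient
    }
    return digits;
}

console.log(integerToBinary(12)); // 1100
```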

Converting binary to decimal

Converting fraction

base-q expansion form

0.625 = x_1\cdot \frac{1}{2}+x_2\cdot \frac{1}{2^2}+x_3\cdot \frac{1}{2^3}
0.625_{10} = 0.101_2

So, the algorithm is...

start from 0 and go through the digits from last to first: add the current digit to the current result and divide by 2

0.625_{10} = 0.101_2
0.1011_2=?_{10}
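
The add-and-halve rule can be sketched as follows. `binaryFractionToDecimal` is a hypothetical name, and the function takes only the digits after the binary point as a string.

```javascript
// Hypothetical helper: takes the digits after the binary point, e.g. "101" for 0.101.
function binaryFractionToDecimal(digits) {
    var result = 0;
    // Go from the last digit to the first one
    for (var i = digits.length - 1; i >= 0; i--) {
        result = (result + Number(digits[i])) / 2;
    }
    return result;
}

console.log(binaryFractionToDecimal("101")); // 0.625
```

Short inputs like this one are exact in a double, so plain numbers are enough here. Longer inputs would need arbitrary precision arithmetic.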

Converting integer

base-q expansion form

12 = x_3\cdot 2^3+x_2\cdot 2^2+x_1\cdot 2^1+x_0\cdot 2^0

So, the algorithm is...

start from 0 and go through the digits from first to last: multiply the current result by 2 and add the current digit

12_{10} = 1100_2
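
A short sketch of the multiply-and-add rule. The name `binaryIntegerToDecimal` is hypothetical.

```javascript
// Hypothetical helper: converts a string of binary digits to a number.
function binaryIntegerToDecimal(digits) {
    var result = 0;
    // Go from the first (most significant) digit to the last one
    for (var i = 0; i < digits.length; i++) {
        result = result * 2 + Number(digits[i]);
    }
    return result;
}

console.log(binaryIntegerToDecimal("1100")); // 12
```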

Converting 0.1 and 0.2 to binary

Converting 0.1

0.1\cdot2=0.2\enspace\enspace\enspace0.0...
0.2\cdot2=0.4\enspace\enspace\enspace0.00...
0.4\cdot2=0.8\enspace\enspace\enspace0.000...
0.8\cdot2=1.6\enspace\enspace\enspace0.0001...
0.6\cdot2=1.2\enspace\enspace\enspace0.00011...
0.2\cdot2=0.4\enspace\enspace\enspace0.000110...

Converting 0.2

0.2\cdot2=0.4\enspace\enspace\enspace0.0...
0.4\cdot2=0.8\enspace\enspace\enspace0.00...
0.8\cdot2=1.6\enspace\enspace\enspace0.001...
0.6\cdot2=1.2\enspace\enspace\enspace0.0011...
0.2\cdot2=0.4\enspace\enspace\enspace0.00110...
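
Both conversions return to the remainder 0.2, so the digits repeat forever. To see this without any floating point error, the steps can be repeated on exact fractions (1/10 and 1/5). This is a sketch: `rationalToBinary` and its parameters are hypothetical names.

```javascript
// Hypothetical helper: binary digits of numerator/denominator (a fraction below 1),
// computed with integers only, so no rounding is involved.
function rationalToBinary(numerator, denominator, maxDigits) {
    var digits = "";
    var remainder = numerator % denominator;
    while (remainder !== 0 && digits.length < maxDigits) {
        remainder *= 2;                 // same as multiplying the fraction by 2
        if (remainder >= denominator) {
            digits += "1";
            remainder -= denominator;   // drop the integer part
        } else {
            digits += "0";
        }
    }
    return "0." + digits;
}

console.log(rationalToBinary(1, 10, 12)); // 0.000110011001
console.log(rationalToBinary(1, 5, 12));  // 0.001100110011
```

The block 0011 keeps repeating in both results, so neither 0.1 nor 0.2 has a finite binary representation.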

Representing a number in scientific notation

Why scientific notation?

Scientific notation is a way to work easily with very large or very small numbers.

General form

2 = 2\cdot10^0

Normalized form

1100.101
1100.101 = 1.100101\cdot2^3

?

Representing 0.1 and 0.2 using normalized form
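
Applying the same shift of the binary point to the expansions of 0.1 and 0.2 found earlier gives the following. The exponents -4 and -3 match the ones used for the IEEE 754 exponent later in the deck. The digits are truncated here for readability.

```latex
0.1_{10} = 0.000110011001..._2 = 1.10011001..._2\cdot2^{-4}
0.2_{10} = 0.00110011001..._2 = 1.10011001..._2\cdot2^{-3}
```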

Rounding binary numbers

Possible rounding options

Round 0.42385 to 2 places. What numbers can it be rounded to?

0.42 and 0.43

Round 0.110110 to 2 places. What numbers can it be rounded to?

  • for a lower number we simply discard the remaining places
  • for a larger number we add 1 to the number in the last place

0.11 and 1.00 (0.11+0.01)

How do we decide between them?

Defining shortest distance

x_1 = |0.11-0.110110|=0.000110
x_2 = |1.00-0.110110|=0.001010

Is x_1 < x_2 or x_2 < x_1?

So, x_1 < x_2 and we should round to 0.11

But!

Subtraction is cumbersome and won't work for infinite fractions

Comparing with the middle

The middle between 0.11 and 1.00 is 0.111

Is 0.110110 > 0.111 or 0.110110 < 0.111?

0.110110
0.111000     (larger)

So, 0.110110 < 0.111 and we should round to 0.11

General rule for rounding

Round 0.UVXYZ... to 2 places, the middle is 0.UV1, so

  • If X is 0, round down

0.UV0YZ...

0.UV1               (middle)

  • If X is 1 and any remaining digit is 1, round up

0.UV101...

0.UV1               (middle)

  • If X is 1 and all remaining digits are 0, apply tie breaker rule (round to even)

0.UV100...

0.UV1               (middle)
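
The three cases can be sketched as a function that only looks at the digits and never subtracts. `roundingDirection` is a hypothetical name. It takes the digits after the binary point and the number of places to keep, and it assumes "round to even" means rounding up only when the last kept digit is 1.

```javascript
// Hypothetical helper: decides the rounding direction for 0.<digits> kept to `places` digits.
function roundingDirection(digits, places) {
    var x = digits[places];               // first discarded digit (X)
    var rest = digits.slice(places + 1);  // remaining discarded digits (YZ...)
    if (x !== "1") {
        return "down";                    // below the middle
    }
    if (rest.indexOf("1") !== -1) {
        return "up";                      // above the middle
    }
    // Exactly the middle: round to even, so go up only if the last kept digit is 1
    return digits[places - 1] === "1" ? "up" : "down";
}

console.log(roundingDirection("110110", 2)); // down
console.log(roundingDirection("1011", 2));   // up
console.log(roundingDirection("111", 2));    // up (tie, last kept digit is odd)
```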

Rounding 0.1 and 0.2

Rounding 0.1 to 52 bits

Finding middle

Comparing: the number is greater than (>) the middle

Rounding up

Rounding 0.2 to 52 bits

Finding middle

Comparing: the number is greater than (>) the middle

Rounding up
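
The comparison with the middle can be checked with exact integer arithmetic. This sketch assumes an engine with BigInt support, and all variable names are hypothetical. It uses the fact that 0.1 = (8/5) \cdot 2^{-4} and 0.2 = (8/5) \cdot 2^{-3}, so both numbers share the significand 8/5 = 1.10011001..._2.

```javascript
// The significand 8/5 scaled by 2^52 is 2^55 / 5 (52 bits after the binary point).
var scaled = 2n ** 55n;
var truncated = scaled / 5n;   // rounding down: everything after 52 places is discarded
var remainder = scaled % 5n;   // discarded part, in fifths of the last place

// The middle is one half of the last place, so compare remainder / 5 with 1 / 2.
// A tie cannot happen here because 5 is odd.
var aboveMiddle = remainder * 2n > 5n;
var rounded = aboveMiddle ? truncated + 1n : truncated;

console.log(aboveMiddle);          // true, so we round up
console.log(rounded.toString(2));  // leading 1 followed by the 52 stored bits
```

The printed bits end in 1010 instead of 1001, which is the effect of rounding up.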

Representing negative numbers with offset binary (Excess-K, biased representation)

Offset-K

How many numbers can we represent with 4 bits?

2^4 = 16

For positive numbers the range is

[0;15]
[0000;1111]

But if we include negative numbers, it can be

[-8;7]
[0000;1111]
K = 2^{n-1}

(common usage)

[-7;8]
[0000;1111]
K = 2^{n-1}-1

(floating point standard)

where n is the number of bits

Offset defines the range

8 and 7 are offsets, referred to as `K` (excess-K: excess-8, excess-7)

Calculating number

How to store the number 3 in 4 bits?

1. Calculating offset (K)

K = 2^{4-1}-1 = 7

If 0000 is -7, then what number should we add to get 3?

-7+10=3\rightarrow10=3+7

representation = number to store + K

3+K=3+7=10_{10}=1010_2

number to store = representation - K

1010_2=10_{10};\enspace\enspace10-K=10-7=3
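
Both directions can be sketched in a few lines, assuming the floating point offset K = 2^{n-1}-1. The names `offsetEncode` and `offsetDecode` are hypothetical.

```javascript
// Hypothetical helpers for excess-K with K = 2^(bits - 1) - 1.
function offsetEncode(value, bits) {
    var k = Math.pow(2, bits - 1) - 1;
    var binary = (value + k).toString(2);   // representation = number to store + K
    while (binary.length < bits) {
        binary = "0" + binary;              // pad to the full width
    }
    return binary;
}

function offsetDecode(binary) {
    var k = Math.pow(2, binary.length - 1) - 1;
    return parseInt(binary, 2) - k;         // number to store = representation - K
}

console.log(offsetEncode(3, 4));    // 1010
console.log(offsetDecode("1010"));  // 3
```

With 11 bits the same functions use K = 1023, the offset of the double precision exponent.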

First bit defines the sign

With K = 8:

-8_{10} \rightarrow 0000_2
K = 8_{10} = 1000_2
0_{10} = -8_{10} + 8_{10} \rightarrow 0000_{2}+1000_{2} = 1000_{2}

First bit is 0 - negative number

First bit is 1 - zero or positive number

Floating point according to IEEE 754

Format

Name              Total bits  Exponent  Significand
Single precision  32          8         23
Double precision  64          11        52

Sign

0 - positive value

1 - negative value

Exponent

offset binary (biased representation)

Converting 0.1 to IEEE 754 double precision

Calculating exponent

K = 2^{11-1}-1 = 1023
-4+1023=1019_{10}=01111111011_2

Converting 0.2 to IEEE 754 double precision

Calculating exponent

K = 2^{11-1}-1 = 1023
-3+1023=1020_{10}=01111111100_2

Validating conversion 

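function to64bitFloat(number) {
    // Store the number as a double and look at its raw bytes
    var f = new Float64Array(1);
    f[0] = number;
    var view = new Uint8Array(f.buffer);
    var i, result = "";
    // Reading the bytes from last to first assumes a little-endian platform
    for (i = view.length - 1; i >= 0; i--) {
        var bits = view[i].toString(2);
        // Pad every byte to 8 binary digits
        if (bits.length < 8) {
            bits = new Array(8 - bits.length).fill('0').join("") + bits;
        }
        result += bits;
    }
    // Separate sign (1 bit), exponent (11 bits) and significand (52 bits)
    return result.slice(0, 1) + " " + result.slice(1, 12) + " " + result.slice(12);
}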
to64bitFloat(0.1);
// 0 01111111011 1001100110011001100110011001100110011001100110011010

to64bitFloat(0.2);
// 0 01111111100 1001100110011001100110011001100110011001100110011010

Calculating 0.1 + 0.2

Adjusting the exponent

0.1 has the smaller exponent (-4), so it is rewritten with the exponent of 0.2 (-3):

1.100110011001100110011001100110011001100110011001101\cdot2^{-4} =
0.1100110011001100110011001100110011001100110011001101\cdot2^{-3}

Adding numbers

0.1100110011001100110011001100110011001100110011001101     (rounded 0.1)

+

1.1001100110011001100110011001100110011001100110011010     (rounded 0.2)

=

10.0110011001100110011001100110011001100110011001100111

Normalizing

10.0110011001100110011001100110011001100110011001100111\cdot2^{-3}=
1.00110011001100110011001100110011001100110011001100111\cdot2^{-2}

Rounding

Finding middle

Comparing: the number is equal to (=) the middle

Round to even (round up here)

1.00110011...001100110011

+

0.00000000...000000000001

=

1.00110011...001100110100
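
The addition, normalization and rounding can be replayed with exact integers. This is a sketch that assumes BigInt support, with hypothetical variable names. It works in units of 2^{-56}, where the rounded 0.1 is the 53-bit significand itself and the rounded 0.2 is twice that.

```javascript
// Leading 1 plus the 52 stored bits shared by the rounded 0.1 and 0.2
var significand = BigInt("0b1" + "1001100110011001100110011001100110011001100110011010");

// 0.1 + 0.2 in units of 2^-56
var sum = significand + 2n * significand;

// The sum needs 55 bits, so the last 2 bits are discarded to get back to 53
var kept = sum >> 2n;
var discarded = sum & 3n;   // the middle of the discarded range is 2 (binary 10)

var result = kept;
if (discarded > 2n || (discarded === 2n && (kept & 1n) === 1n)) {
    result = kept + 1n;     // above the middle, or a tie with an odd last bit
}

console.log(discarded === 2n);     // true: exactly the middle, so the tie breaker applies
console.log(result.toString(2));   // leading 1 followed by 52 bits ending in 0100
```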

Converting to decimal

1.00110011001100110011001100110011001100110011001101 \cdot 2^{-2}=
0.0100110011001100110011001100110011001100110011001101
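// Big is the arbitrary precision decimal type from the big.js library
// Keep 52 decimal places, enough for 52 binary places
Big.DP = 52;
// Digits after the binary point, reversed so that we start from the last one
var numbers = "0100110011001100110011001100110011001100110011001101".split("").reverse();
var sum = Big(0);

numbers.forEach(function (number) {
    // Add the current digit to the current result and divide by 2
    sum = sum.add(number).div(2);
    console.log(sum.toString());
});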

Output

0.5
0.25
0.625
0.8125
0.40625
0.203125
0.6015625
0.80078125
0.400390625
0.2001953125
0.60009765625
0.800048828125
0.4000244140625
0.20001220703125
0.600006103515625
0.8000030517578125
0.40000152587890625
0.200000762939453125
0.6000003814697265625
0.80000019073486328125
0.400000095367431640625
0.2000000476837158203125
0.60000002384185791015625
0.800000011920928955078125
0.4000000059604644775390625
0.20000000298023223876953125
0.600000001490116119384765625
0.8000000007450580596923828125
0.40000000037252902984619140625
0.200000000186264514923095703125
0.6000000000931322574615478515625
0.80000000004656612873077392578125
0.400000000023283064365386962890625
0.2000000000116415321826934814453125
0.60000000000582076609134674072265625
0.800000000002910383045673370361328125
0.4000000000014551915228366851806640625
0.20000000000072759576141834259033203125
0.600000000000363797880709171295166015625
0.8000000000001818989403545856475830078125
0.40000000000009094947017729282379150390625
0.200000000000045474735088646411895751953125
0.6000000000000227373675443232059478759765625
0.80000000000001136868377216160297393798828125
0.400000000000005684341886080801486968994140625
0.2000000000000028421709430404007434844970703125
0.60000000000000142108547152020037174224853515625
0.800000000000000710542735760100185871124267578125
0.4000000000000003552713678800500929355621337890625
0.20000000000000017763568394002504646778106689453125
0.600000000000000088817841970012523233890533447265625
0.3000000000000000444089209850062616169452667236328125

Result

0.3000000000000000444089209850062616169452667236328125
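
The same result can be reproduced without an external library, as a cross-check. This sketch assumes BigInt support, and `exactDecimal` is a hypothetical name. It relies on the identity 2^{-n} = 5^n / 10^n, so a binary fraction with n places always has an exact decimal expansion with at most n places.

```javascript
// Hypothetical helper: exact decimal string of significand * 2^exponent, for exponent < 0.
function exactDecimal(significand, exponent) {
    var places = -exponent;
    // significand / 2^places = significand * 5^places / 10^places
    var digits = (significand * 5n ** BigInt(places)).toString();
    while (digits.length <= places) {
        digits = "0" + digits;              // make room for the leading "0."
    }
    var integerPart = digits.slice(0, digits.length - places);
    var fraction = digits.slice(digits.length - places).replace(/0+$/, "");
    return fraction ? integerPart + "." + fraction : integerPart;
}

var bits = "0100110011001100110011001100110011001100110011001101";
console.log(exactDecimal(BigInt("0b" + bits), -bits.length));
// 0.3000000000000000444089209850062616169452667236328125
```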

Resources

http://www.exploringbinary.com/binary-converter/

http://blog.chewxy.com/2014/02/24/what-every-javascript-developer-should-know-about-floating-point-numbers/

https://en.wikipedia.org/wiki/Offset_binary
