Catastrophic Cancellation

Numerical Methods

David Mayerich

Scalable Tissue Imaging and Modeling (STIM) Laboratory

Department of Electrical and Computer Engineering

Cullen College of Engineering

University of Houston

David Mayerich

STIM Laboratory, University of Houston

Loss of Precision

Losing Digits to Cancellation

Loss of Precision Theorem

David Mayerich

STIM Laboratory, University of Houston

Catastrophic Cancellation

Consider a floating point representation with a 5-digit mantissa:

David Mayerich

STIM Laboratory, University of Houston

n\ .\ n\ n\ n\ n\ \times 10^c

Apply a simple addition and subtraction:

3.2342 \times 10^3 + 2.51 \times 10^{-1} - 3.2343 \times 10^3

3234.451

0.151

0.2

3234.2

0.251

3234.451

3234.3

3234.5

3234.3

round

chop

Catastrophic Cancellation

This subtraction effectively reduced our numerical precision:

David Mayerich

STIM Laboratory, University of Houston

x=0.151 \quad \rightarrow \quad x=0.2

Each operation maintains the appropriate precision
However, we ended up with far less precision than we expect:
- The result \(x=0.2\) has one digit of precision
- The correct value \(x=0.151\) requires two digits of precision
- We have \(5\) digits of precision, expect \(3\), but only get \(2\)
This is known as catastrophic cancellation

creating a relative error of

\epsilon_r = \frac{|0.2-0.151|}{|0.151|}\approx 32.5\%

Loss of Precision Theorem

David Mayerich

STIM Laboratory, University of Houston

Let \(x\) and \(y\) be normalized floating-point numbers such that \(x > y > 0\). For a base \(b\) value, we can find two adjacent natural numbers \(p,q \in \mathbb{N}\) such that:

b^{-p} \leq 1 - \frac{y}{x} \leq b^{-q} \quad \text{where} \quad q+1 = p

At most \(p\) and at least \(q\) significant digits are lost in the subtraction \(x-y\).

Calculate the number of digits lost when calculating \(x-y\) for:

x=37.593621 \quad \text{and} \quad y=37.584216

Lost Digits

David Mayerich

STIM Laboratory, University of Houston

You will lose between \(\left\lfloor \frac{-log\left(1 - \frac{y}{x} \right)}{\log b} \right\rfloor\) and \(\left\lceil \frac{-log\left(1 - \frac{y}{x} \right)}{\log b} \right\rceil\)

b^{-p} \leq 1 - \frac{y}{x} \leq b^{-q}

Loss of Precision Theorem

\log \left(b^{-p}\right) \leq \log \left(1 - \frac{y}{x}\right) \leq \log \left(b^{-q}\right)

Calculate the logarithm of all terms

-p \log \left(b\right) \leq \log \left(1 - \frac{y}{x}\right) \leq -q\log \left(b\right)

Separate exponents

p \log \left(b\right) \geq -\log \left(1 - \frac{y}{x}\right) \geq q\log \left(b\right)

Multiply by \(-1\) (and reverse inequalities)

p \leq \frac{-\log \left(1 - \frac{y}{x}\right)}{\log \left(b\right)} \leq q

Divide by \(\log(b)\)

Lost Digits

David Mayerich

STIM Laboratory, University of Houston

p \leq \frac{-\log \left(1 - \frac{y}{x}\right)}{\log \left(10\right)} \leq q

p \leq \frac{-\log \left(1 - \frac{37.584216}{37.593621}\right)}{\log \left(10\right)} \leq q

p \leq \frac{8.2933482}{\log \left(10\right)} \leq q

p \leq 3.6017554 \leq q

p = \left\lfloor 3.6017554 \right\rfloor = 3

q = \left\lceil 3.6017554 \right\rceil = 4

0\ 0\ .\ 0\ 0\ 9\ 4\ 0\ 5

Calculate the number of digits lost when calculating \(x-y\) for:

x=37.593621 \quad \text{and} \quad y=37.584216

3\ 7\ .\ 5\ 9\ 3\ 6\ 2\ 1

-\ 3\ 7\ .\ 5\ 8\ 4\ 2\ 1\ 6

Lost Bits

David Mayerich

STIM Laboratory, University of Houston

p = \left\lfloor 11.964772 \right\rfloor = 11

q = \left\lceil 11.964772 \right\rceil = 12

000000.00000010011010000110

x = 100101.10010111111101111001\\ y=100101.10010101100011110011

Calculate the number of bits lost when calculating \(x-y\) for:

x=37.593621 \quad \text{and} \quad y=37.584216

p \leq 11.964772 \leq q

p \leq \frac{-\log \left(1 - \frac{y}{x}\right)}{\log \left(2\right)} \leq q

p \leq \frac{-\log \left(1 - \frac{37.584216}{37.593621}\right)}{\log \left(2\right)} \leq q

p \leq \frac{8.2933482}{\log \left(2\right)} \leq q

Discussion: Precision After Subtraction

How many bits of precision are lost when calculating \(\alpha - \sin\alpha\) if \(\alpha = \frac{1}{64}\)?

David Mayerich

STIM Laboratory, University of Houston

p = \left\lfloor 14.58 \right\rfloor = 14

q = \left\lceil 14.58 \right\rceil = 15

p \leq \frac{-\log \left(1 - \frac{\sin\left(\frac{1}{64}\right)}{\frac{1}{64}}\right)}{\log \left(2\right)} \leq q

p \leq 14.58 \leq q

What precision can we expect if this calculation is performed on float32 registers?

23-15=8

standard	C/C++	m bits	x bits	bias
binary16	`single`	10	5	15
binary32	`float`	23	8	127
binary64	`double`	53	11	1023
binary128	N/A	113	15	16383
binary256	N/A	19	237	262143

IEEE 754 Standard

Recovering Precision

Reformulation

Verifying Reformulations

The Quadratic Equation

David Mayerich

STIM Laboratory, University of Houston

Summary of Arithmetic Operations

Given \(d\) digits of precision:
- multiplication: preserves \(d\) digits because the exponent and mantissa can be calculated independently
- addition: exponents must be equal, so significant digits may be chopped
Evaluate \(x=1 - \sin 1.6\) using three digits of precision:

David Mayerich

STIM Laboratory, University of Houston

y = 1.00- \sin 1.61

= 0.00100

= 0.0007684

\epsilon_r=\frac{|0.000232|}{|0.000768|} = 0.30 \quad (30\%)

expected result (with 3 digits)

y = 1.00 - 0.9992316

y = 1.00 - 0.999

y = 1 - 0.9992316

\epsilon_a=|0.00100 - .000768| = .000232

Reformulation

Precision loss appears after subtraction (but the loss of data occurs earlier)
Reformulate expressions to remove subtractions

David Mayerich

STIM Laboratory, University of Houston

y = 1 - \sin x

y = \left( 1 - \sin x \right) \left( \frac{1+ \sin x}{1+ \sin x} \right)

y = \frac{1 - \sin^2 x}{1 + \sin x}

y = \frac{\cos^2 1.61}{1 + \sin 1.61}

y = \frac{\left(-0.0391936 \right)^2}{1.00 + 0.9992316}

y = \frac{0.00153664}{1.999}

y = .00077

\epsilon_r=\frac{|.000768-.00077|}{|0.000768|} = 0.0026 \quad (0.26\%)

\sin^2 x + \cos^2 x = 1

y = \frac{\cos^2 x}{1 + \sin x}

y = \frac{\left(-0.0392 \right)^2}{1.00 + 0.999}

y = \frac{0.00154}{2.00}

Reformulation (Part Deux)

When can catastrophic cancellation occur in the expression \(\sqrt{x+1}-\sqrt{x}\)?
When \(x\) is large -- fix it for \(x=10\):

David Mayerich

STIM Laboratory, University of Houston

y = \left(\sqrt{x+1} - \sqrt{x}\right) \left( \frac{\sqrt{x+1} + \sqrt{x}}{\sqrt{x+1} + \sqrt{x}} \right)

y = \left( \frac{x+1-x}{\sqrt{x+1} + \sqrt{x}} \right)

y = \frac{1}{\sqrt{x+1} + \sqrt{x}}

Original Expression:

y = \sqrt{11} - \sqrt{10}

y = 3.31662 - 3.16228

y = 3.32 - 3.16 = 0.160

Reformulation:

y = \frac{1.00}{3.32 + 3.16}

y = \frac{1.00}{6.48} = 1.54321

Expected Result:

y = 0.154347 = 0.154

\epsilon_r = \frac{|0.154 - 0.160}{|0.154|} \approx 3.9\%

\epsilon_r = 0\%

Quadratic Equation

Loss of Precision

Reformulation

Implementation

David Mayerich

STIM Laboratory, University of Houston

Quadratic Equation

Find the roots of a polynomial \(ax^2+bx+c=0\)
Quadratic Equation:

David Mayerich

STIM Laboratory, University of Houston

x = \frac{-b \pm \sqrt{b^2-4ac}}{2a}

Two cases cause computational problems:
1. If \(a=0\), then the equation is undefined (the polynomial isn't quadratic)
2. If \(4ac \ll b^2\) then catastrophic cancellation can occur in the numerator

Two cases for cancellation:

if \(b>0\) then \(-b+\sqrt{b^2-4ac}\) can result in cancellation
if \(b<0\), then \(b-\sqrt{b^2-4ac}\) can result in cancellation

Testing the Quadratic Equation

Consider the roots of the polynomial \(x^2 + 182^3x-1\) (using actual floating point):

David Mayerich

STIM Laboratory, University of Houston

x_1 = \frac{-b - \sqrt{b^2-4ac}}{2a}

x_2 = \frac{-b + \sqrt{b^2-4ac}}{2a}

x_1 = \frac{-182^3 - \sqrt{182^6-4}}{2}

x_2 = \frac{-182^3 + \sqrt{182^6-4}}{2}

Performing this calculation using IEEE 754 floating point, we get:

\epsilon_r \approx 9.5\%

binary64

x_1 = -6.0286\times 10^8

x_1 = -1.657865 \times 10^{-7}

binary32

x_1 = -6.0286\times 10^8

x_1 = -1.5 \times 10^{-7}

Reformulating the Quadratic Equation

When \(b \gg 4ac\) there is one very small root
Assume that \(b>0\) and reformulate the "small" root:

David Mayerich

STIM Laboratory, University of Houston

x_2 = \frac{-b + \sqrt{b^2-4ac}}{2a}

x_2 = \frac{-b + \sqrt{b^2-4ac}}{2a} \cdot \frac{-b - \sqrt{b^2-4ac}} {-b - \sqrt{b^2-4ac}}

x_2 = \frac{b^2 - (b^2-4ac)} {2a \left( -b-\sqrt{b^2-4ac} \right)}

x_2 = \frac{-2c} { b+\sqrt{b^2-4ac}}

Simplifying the Reformulation

For the case where \(b > 0\) we have two roots:

David Mayerich

STIM Laboratory, University of Houston

-2ax_1 = b + \sqrt{b^2-4ac}

x_1 = \frac{-b - \sqrt{b^2-4ac}}{2a}

(large root)

x_2 = \frac{-2c} { b+\sqrt{b^2-4ac}}

(small root)

x_2 = \frac{-2c} { -2ax_1}

x_1 = \frac{-b - \sqrt{b^2-4ac}}{2a}

x_2 = \frac{c} {ax_1}

Implementing the Quadratic Equation

Consider the reformulation of both cases for \(b\):

David Mayerich

STIM Laboratory, University of Houston

x_1 = \frac{-b - \sqrt{b^2-4ac}}{2a}

x_2 = \frac{c} {ax_1}

b>0

b\leq0

x_1 = \frac{-b + \sqrt{b^2-4ac}}{2a}

x_2 = \frac{c} {ax_1}

q= -\frac{1}{2}\left[b+\text{sgn}(b)\sqrt{b^2 - 4ac} \right]

\text{where}\quad \text{sgn}(x)= \begin{cases} -1 & \text{if}\ x<0\\ 1 & \text{if}\ x \geq 0 \end{cases}

x_1 = \frac{q}{a}

x_2 = \frac{c}{q}

Series Approximations

Preserving Precision

Mean Value Theorem

David Mayerich

STIM Laboratory, University of Houston

Approximating Special Functions

Identify and fix the catastrophic cancellation in the following expression at \(x=0.123\):

David Mayerich

STIM Laboratory, University of Houston

y = \text{sinc}(x) - \cos(x)

Calculate expansions at \(c=0\):

\sin(x) \approx x - \frac{x^3}{6}

\text{where}\ \ \text{sinc}(x) =\frac{\sin(x)}{x}

\cos(x) \approx 1 - \frac{x^2}{2}

y=\frac{x - \frac{x^3}{6}}{x} - 1 + \frac{x^2}{2}

y=1 - \frac{x^2}{6} - 1 + \frac{x^2}{2}

y=-\frac{x^2}{6} + \frac{3x^2}{6}

y=\frac{x^2}{3}

Reformulate:

Testing the Reformulation

Evaluate \(y=\text{sinc}(x) - \cos(x)\) at \(x=0.123\):

David Mayerich

STIM Laboratory, University of Houston

Original Expression:

Reformulation:

Expected Result:

y = 0.00503537 = 0.00504

y = \frac{0.12269}{0.123} - 0.992445

y = \frac{0.123}{0.123} - 0.992

y = 1.00 - 0.992 = 0.008

\frac{\sin(0.123)}{0.123} - \cos(0.123)

y=\frac{(0.123)^2}{3}

y=\frac{0.015129}{3}

= \frac{0.0151}{3}

y =.00503333

=.00503

\epsilon_r = \frac{|0.008-0.00504|}{|0.00504|} \approx 58.7\%

\epsilon_r = \frac{|0.00503-0.00504|}{|0.00504|} \approx 0.2\%

Improve Precision Loss (Part II)

Mitigate precision loss in the expression \(y=e^x - e^{-2x}\)

David Mayerich

STIM Laboratory, University of Houston

e^x = 1 + x + \frac{x^2}{2} + \frac{x^3}{6}\cdots

y=\left(1 + x + \frac{x^2}{2} + \frac{x^3}{6}\cdots \right)- \left( 1 -2x + \frac{4x^2}{2} - \frac{8x^3}{6}\cdots \right)

y=3x - \frac{3x^2}{2} + \frac{9x^3}{6} \cdots

Mean Value Theorem

If \(f\) is a continuous function on the closed interval \([a,b]\) and possesses a derivative at each point of the open interval \((a,b)\), then

David Mayerich

STIM Laboratory, University of Houston

f(b)-f(a) = (b-a)f'(\theta)

for some \(\theta\) in \((a,b)\). A basic approximation is \(\theta=\frac{a+b}{2}\).

Mitigate catastrophic cancellation in the expression \(y=e^x - e^{x-\epsilon}\):

f(z)= e^z

b=x

a=x-\epsilon

f'(z)=e^z

\theta=\frac{2x-\epsilon}{2}

(b-a)f'(\theta)

(x-x+\epsilon)e^{\frac{2x-\epsilon}{2}}

\epsilon e^{\frac{2x-\epsilon}{2}}

Test precision for \(x=2\) and \(\epsilon=0.01\):

e^2-e^{1.99}

7.39-7.32=0.07

\epsilon e^{\frac{2x-\epsilon}{2}}

0.01 e^{\frac{3.99}{2}}

0.01 e^{2} = 0.0739

Original Expression:

Reformulation:

Expected Results:

y=0.0735

Mean Value Theorem Examples

Reformulate \(y=\ln(x+\epsilon)-\ln x\)

David Mayerich

STIM Laboratory, University of Houston

Reformulate \(y=e^\epsilon - 1\)

f(b)-f(a) = (b-a)f'(\theta)

f(z)=\ln z

a=x

b=x+\epsilon

f'(z)=\frac{1}{z}

\theta = \frac{2x+\epsilon}{2}

(b-a)f'(\theta)

(x+\epsilon-x)\frac{1}{\frac{2x+\epsilon}{2}}

y=\frac{2\epsilon}{2x+\epsilon}

f(z)=e^z

a=0

b=\epsilon

f'(z)=e^z

\theta = \frac{\epsilon}{2}

(\epsilon-0)e^{\frac{\epsilon}{2}}

\epsilon e^{\frac{\epsilon}{2}}