Measuring Algorithms

Numerical Methods

David Mayerich

Scalable Tissue Imaging and Modeling (STIM) Laboratory

Department of Electrical and Computer Engineering

Cullen College of Engineering

University of Houston

David Mayerich

STIM Laboratory, University of Houston

Approximations for Error

  • Approximate \(e^x\) as a Maclaurin series:

David Mayerich

STIM Laboratory, University of Houston

e^x = 1 + x + \frac{x^2}{2} + \frac{x^3}{3!} + \frac{x^4}{4!} + \epsilon
  • "Big-O" notation specifies how the term behaves as \(x\) changes

    • in this case \(\epsilon\) behaves like \(x^5\) when \(x \rightarrow 0\) or \(x \rightarrow \infty\)

    • the final term is bounded by \(O\left( x^5\right) \leq C |x|^5\) for some \(C\)

    • \(C_1 |x|^n < C_2 |x|^{n+1}\) for large \(x\) for any \(C_1\) and \(C_2\)

    • if \(C_1 \gg C_2\), there is still an \(x\) for which \(C_1 |x|^n < C_2 |x|^{n+1}\)

M(x)\\ \epsilon = O\left(x^5\right)

approximation

error

Estimating Relative Errors

  • Consider an algorithm for the exponential function based on its Maclaurin series:

David Mayerich

STIM Laboratory, University of Houston

e^x = 1 + x + \frac{x^2}{2} + \frac{x^3}{3!} + \frac{x^4}{4!} + \cdots + O\left( x^N \right)
\epsilon = O\left(x^5\right)\\ \epsilon \leq C\left| x \right|^5
float exp(float x, int N) {
	float ex = 0.0;
    float factorial = 1;
    for(int k = 0; k <= N; i++){
    	factorial *= i;
    	ex += pow(x, k) / factorial;
    }
    return ex;
}
e^x = \sum_{k=0}^{N} \frac{x^k}{k!} + O\left( x^N \right)
  • How does the error differ when \(N\) and \(x\) change?

exp(x, 5)   vs   exp(0.5*x, 5)

\epsilon = O\left(\frac{1}{2}x^5\right)\\ \epsilon \leq C\left| \frac{1}{2}x \right|^5\\ \epsilon \leq C\frac{1}{32}\left| x \right|^5

exp(x, 5)   vs   exp(2*x, 3)

  • We can adjust error bounds when comparing algorithms

Estimating Algorithmic Speed

  • The same concept applies to the time to solve a numerical problem

  • How many operations are required for \(c=a+b\) where \(a,b\in\mathbb{R}\)?

David Mayerich

STIM Laboratory, University of Houston

1 addition, or 1 floating point operation (FLOP)

  • What if \(\mathbf{a},\mathbf{b} \in \mathbb{R}^3\)?

\mathbf{a}= \begin{bmatrix} a_1\\ a_2\\ a_3 \end{bmatrix} , \quad\quad \mathbf{b}= \begin{bmatrix} b_1\\ b_2\\ b_3 \end{bmatrix}

3 floating point operations:     \(a_1+b_1\quad a_2 + b_2\quad a_3 + b_3\)

  • What if \(\mathbf{a}, \mathbf{b} \in \mathbb{R}^N\)?

  • Floating point operations take time, and that time varies across processors

  • If we measure the time to calculate \(\mathbf{a} + \mathbf{b}\) for \(\mathbb{R}^3\), we can approximate the time for \(\mathbb{R}^N\)

Running an \(O(n)\) Algorithm

David Mayerich

STIM Laboratory, University of Houston

float* a = new float[N];
float* b = new float[N];
float* c = new float[N];
float* d = new float[N];
float* e = new float[N];

// perform 1 FLOP vector operation
for(int i = 0; i < N; i++) {
	c[i] = a[i] + b[i];
}

Running an \(O(n)\) Algorithm

David Mayerich

STIM Laboratory, University of Houston

float* a = new float[N];
float* b = new float[N];
float* c = new float[N];
float* d = new float[N];
float* e = new float[N];

// perform 1 FLOP vector operation
for(int i = 0; i < N; i++) {
	c[i] = a[i] + b[i];
}

// perform 2 FLOP vector operation
for(int i = 0; i < N; i++) {
	d[i] = (a[i] + b[i]) * c[i];
}

Running an \(O(n)\) Algorithm

David Mayerich

STIM Laboratory, University of Houston

float* a = new float[N];
float* b = new float[N];
float* c = new float[N];
float* d = new float[N];
float* e = new float[N];

// perform 1 FLOP vector operation
for(int i = 0; i < N; i++) {
	c[i] = a[i] + b[i];
}

// perform 2 FLOP vector operation
for(int i = 0; i < N; i++) {
	d[i] = (a[i] + b[i]) * c[i];
}

// perform 3 FLOP vector operation
for(int i = 0; i < N; i++) {
	e[i] = (a[i] + b[i]) * c[i] + d[i];
}

Running an \(O\left( n^2 \right)\) Algorithm

David Mayerich

STIM Laboratory, University of Houston

float* A = new float[M * N];
float* x = new float[N];
float* y = new float[N];
float* z = new float[N]
float alpha, beta;

// load data

// perform 1 FLOP matrix multiplication
for(int m = 0; m < M; m++) {
	float(int n = 0; n < N; n++) {
    	y[m] = A[n * M + m] * x[n];
    }
}

Running an \(O\left( n^2 \right)\) Algorithm

David Mayerich

STIM Laboratory, University of Houston

float* A = new float[M * N];
float* x = new float[N];
float* y = new float[N];
float* z = new float[N]
float alpha, beta;

// load data

// perform 1 FLOP matrix multiplication
for(int m = 0; m < M; m++) {
	float(int n = 0; n < N; n++) {
    	y[m] = A[n * M + m] * x[n];
    }
}

// perform GEMM matrix multiplication
for(int m = 0; m < M; m++) {
	float(int n = 0; n < N; n++) {
    	y[m] = alpha * A[n * M + m] 
        	   * x[n] + beta * y[m];
    }
}

Comparing Algorithmic Complexity

  • An \(O\left(n\right)\) algorithm will beat an \(O\left(n^2\right)\) algorithm for large \(n\)

  • Two types of algorithmic optimization:

    1. Reduce computational complexity (ex. \(O\left(n^2\right) \rightarrow O\left( n \right)\)). This is a mostly theoretical (mathematical) task.

    2. Improve performance to decrease \(C\). This requires optimization within the source code, hardware, or instructions

David Mayerich

STIM Laboratory, University of Houston

Loops and Complexity

David Mayerich

STIM Laboratory, University of Houston

for(int i = 0; i < N; i++) {
  c[i] = a[i] + b[i];
}
\mathbf{c} = \mathbf{a} + \mathbf{b}\\ \mathbf{c}\in\mathbb{R}^n \quad \mathbf{a}\in\mathbb{R}^n \quad \mathbf{b}\in\mathbb{R}^n
\mathbf{y} = \mathbf{A}\mathbf{x}\\ \mathbf{y}\in\mathbb{R}^m \quad \mathbf{A}\in\mathbb{R}^{m\times n}\quad \mathbf{x}\in\mathbb{R}^n
for(int m = 0; i < M; i++){
  for(int n = 0; n < N; n++){
    y[m] += A[n * M + m]
  }
}
for(int k = 0; k < K; k++){	
  for(int m = 0; m < M; m++){
    for(int n = 0; n < N; n++){
      C[k * N + n] += A[n * M + m] 
      				* B[m * K + k]
    }
  }
}
\mathbf{C} = \mathbf{A}\mathbf{B}\\ \mathbf{A}\in\mathbb{R}^{m\times n} \quad \mathbf{B}\in\mathbb{R}^{m\times k} \quad \mathbf{C}\in\mathbb{R}^{n\times k}
O\left(N\right)
O\left(N^3\right)\\ \text{if } N=M=K
O\left(NMK\right)
O\left(N^2\right)\\ \text{ if }N = M
O\left(NM\right)

Complexity of Common Problems

  • \(O\left( 1 \right) \) - constant time

    • single operation on two values of constant bit size

    • accessing a value in memory (RAM), on an SSD, or in a register

    • use of a lookup table

  • \(O\left( \log{n}\right)\) - logarithmic time

    • binary or alphabetic search

    • traversing a tree (binary, quadtree, octree)

David Mayerich

STIM Laboratory, University of Houston

  • \(O\left( n \right)\) - linear time

    • finding an item in an unsorted list

    • multiplying two \(n\)-dimensional vectors

  • \(O\left(n^2\right)\) - quadratic time

    • bubble sort algorithm

    • matrix-vector multiplication

  • \(O\left(n!\right)\) - factorial time

    • evaluate all permutations of a set

    • naive Traveling Salesman Problem

Polynomials

Nested Polynomials

Synthetic Division

Horner's Algorithm

David Mayerich

STIM Laboratory, University of Houston

Polynomial Evaluation

  • Consider an order-\(n\) polynomial:

David Mayerich

STIM Laboratory, University of Houston

p(x) = a_0 + a_1x + a_2x^2 + \cdots + a_{n-1}x^{n-1} + a_nx^n
  • How many floating point operations are needed to get \(p(x)\)?

p_2(x) = a_0 + a_1 \cdot x + a_2 \cdot x \cdot x
p_3(x) = a_0 + a_1 \cdot x + a_2 \cdot x \cdot x + a_3 \cdot x \cdot x \cdot x
p_4(x) = a_0 + a_1 \cdot x + a_2 \cdot x \cdot x + a_3 \cdot x \cdot x \cdot x + a_4 \cdot x \cdot x \cdot x \cdot x
2_+ + 3_\times = 5
3_+ + 6_\times = 9
4_+ + 10_\times = 14
p_n(x)\rightarrow n_+ + \left[ \frac{n(n+1)}{2}\right]_\times = n + \frac{1}{2}n + \frac{1}{2}n^2 = O\left(n^2\right)

Nested Polynomial

Reformulate polynomials to a nested form:

David Mayerich

STIM Laboratory, University of Houston

p_2(x) = a_0 + a_1x + a_2x^2
p_2(x) = a_0 + \left(a_1x + x\left(a_2\right)\right)
p_3(x) = a_0 + a_1x + a_2x^2 + a_3x^3
p_3(x) = a_0 + \left(a_1x + x\left(a_2 + x\left(a_3\right)\right)\right)
p_2(x) = a_0 + \left(a_1x + x\left(a_2\right)\right)
p_3(x) = a_0 + \left(a_1x + x\left(a_2 + x\left(a_3\right)\right)\right)
p_4(x) = a_0 + \left(a_1x + x\left(a_2 + x\left(a_3 + x\left(a_4\right)\right)\right)\right)
2_+ + 2_\times = 4
3_+ + 3_\times = 6
4_+ + 4_\times = 8
p_n(x) \rightarrow n_+ + n_\times = 2n = O\left(n\right)

Polynomial Evaluation Algorithm

David Mayerich

STIM Laboratory, University of Houston

double p(double x, double* a, int n){
	double p_x = a[0];
    for(int i = 1; i <= n; i++){
    	double xn = a[0];
        for(int p = 0; p <= i; p++){
        	xn *= x;
        }
        p_x += xn;
    }
    return p_x;
}
double p(double x, double* a, int n){
	double p_x = a[0];
    for(int i = n; i >= 0; i++){
    	p_x = p_x * x + a[i];
    }
    return p_x;
}

Polynomial Division

  • Consider the following polynomial division:

David Mayerich

STIM Laboratory, University of Houston

\frac{x^4 + 6x^2 - 7x + 8}{x-2}
x^3
-\left(x^4 - 2x^3 \right)
x-2
x^4 + 0x^3 + 6x^2 - 7x + 8
+ 2x^2
-\left(2x^3 - 4x^2\right)
-\left(10x^2 - 20x\right)
+ 10x
-\left(13x -26\right)
+13
\left(x - 2 \right)\left(x^3 + 2x^2 + 10x + 13\right) + 34 = x^4 + 6x^2 - 7x + 8
p(x) = x^4 + 6x^2 - 7x + 8
p(2) = (2)^4 + 6(2)^2 - 7(2) + 8
= 16 + 24 - 14 + 8
= 34

What happens when \(x=2\)?

2x^3 + 6x^2
10x^2 - 7x
13x + 8
34

The remainder term is the solution of the polynomial at the root of the divisor

Nested Polynomials and Division

David Mayerich

STIM Laboratory, University of Houston

x^4 + 0x^3 + 6x^2 - 7x + 8
(8 + x(-7 + x(6 + x(0+x(1)))))
p(2) = (8 + 2(-7 + 2(6 + 2(0+2(1)))))
(8 + 2(-7 + 2(6 + 2(2))))
(8 + 2(-7 + 2(10)))
(8 + 2(13))
(34)

The algorithm for solving \(p(c)\) is identical to calculating the remainder of \(\frac{p(c)}{x-c}\)

x^3
x-2
x^4 + 0x^3 + 6x^2 - 7x + 8
2x^3 + 6x^2
+ 2x^2
10x^2 - 7x
+ 10x
13x + 8
+13
34
-\left(x^4 - 2x^3 \right)
-\left(2x^3 - 4x^2\right)
-\left(10x^2 - 20x\right)
-\left(13x -26\right)

Synthetic Division

  • Long division for polynomials is often simplified using synthetic division:

David Mayerich

STIM Laboratory, University of Houston

2
1
0
6
-7
8
1
2
2
4
10
20
13
26
34

x \(c\)

x^3 + 2x^2 + 10x + 13 + \frac{34}{x-2}
x^3
x-2
x^4 + 0x^3 + 6x^2 - 7x + 8
+ 2x^2
+ 10x
+13
+\frac{34}{x-2}

\(c\)

+

Horner's Algorithm

David Mayerich

STIM Laboratory, University of Houston

double horner(double* C, int N, double x){
	if(N == 0) return 0.0;			// output zero if there are no coefficients
    px = C[0];						// initialize the first coefficient
    
    for(int i = 1; i < N-1; i++){	// for each coefficient
    	px = px * x + C[i];			// evaluate the innermost nested term
    }
    return px;
}

C is an array of length N holding the coefficients of \(p(x)\)

N is the number of coefficients

x is the point for which \(p(x)\) is to be evaluated

Calculating Derivatives

David Mayerich

STIM Laboratory, University of Houston

\frac{x^4 + 6x^2 - 7x + 8}{x-2} = x^3 + 2x^2 + 10x + 13 + \frac{34}{x-2}
p_n(x) = a_n x^n + a_{n-1}x^{n-1} + \cdots + a_1 x^1 + a_0
\frac{p_n(x)}{x-c} = r_{n-1}(x) + \frac{p(c)}{x-c}
p_n(x) = (x-c)r_{n-1}(x) + p(c)
p'_n(x) = (x-c)r'_{n-1}(x) + r_{n-1}(c)
  • Consider an \(n\)-order polynomial \(p_n(x)\):
     
  • Division by \((x-c)\) produces a new polynomial \(r_{n-1}(x)\) with a remainder term:

     

  • Reformulate and calculate the derivative \(p'(x)\):

     

  • We know the term \(r_{n-1}(x)\) but do not have \(r'_{n-1}(x)\), but setting \(x=c\) yields:

p'_n(c) = r_{n-1}(c)

Discussion: Polynomials and Derivatives

David Mayerich

STIM Laboratory, University of Houston

double horner(double* C, int N, double x, double* R){






}

Discussion: Polynomials and Derivatives

David Mayerich

STIM Laboratory, University of Houston

double horner(double* C, int N, double x, double* R){
	if(N == 0) return 0.0;			// output zero if there are no coefficients
    R[0] = R[0];					// initialize the first coefficient
    



}

Discussion: Polynomials and Derivatives

David Mayerich

STIM Laboratory, University of Houston

void horner(double* C, int N, double x, double* R){
	if(N == 0) return 0.0;			// output zero if there are no coefficients
    R[0] = R[0];					// initialize the first coefficient
    
    for(int i = 1; i < N-1; i++){	// for each coefficient
    	R[i] = R[i-1] * x + C[i];	// evaluate the innermost nested term
    }
}

Discussion: Polynomials and Derivatives

David Mayerich

STIM Laboratory, University of Houston

double horner(double* C, int N, double x, double* R){
	if(N == 0) return 0.0;			// output zero if there are no coefficients
    R[0] = R[0];					// initialize the first coefficient
    
    for(int i = 1; i < N-1; i++){	// for each coefficient
    	R[i] = R[i-1] * x + C[i];	// evaluate the innermost nested term
    }
}
int N;									// order of the polynomial
double* P = new double[N + 1];			// array to store coefficients of p(x)
double* R = new double[N + 1];			// array to store coefficients of r(x)
double c;								// c value for the polynomial
// ...




Discussion: Polynomials and Derivatives

David Mayerich

STIM Laboratory, University of Houston

double horner(double* C, int N, double x, double* R){
	if(N == 0) return 0.0;			// output zero if there are no coefficients
    R[0] = R[0];					// initialize the first coefficient
    
    for(int i = 1; i < N-1; i++){	// for each coefficient
    	R[i] = R[i-1] * x + C[i];	// evaluate the innermost nested term
    }
}
int N;									// order of the polynomial
double* P = new double[N + 1];			// array to store coefficients of p(x)
double* R = new double[N + 1];			// array to store coefficients of r(x)
double c;								// c value for the polynomial
// ...
horner(P, N, c, R);						// calculate R
double p_c = R[N - 1];					// p(c) (and also remainder term)


Discussion: Polynomials and Derivatives

David Mayerich

STIM Laboratory, University of Houston

double horner(double* C, int N, double x, double* R){
	if(N == 0) return 0.0;			// output zero if there are no coefficients
    R[0] = R[0];					// initialize the first coefficient
    
    for(int i = 1; i < N-1; i++){	// for each coefficient
    	R[i] = R[i-1] * x + C[i];	// evaluate the innermost nested term
    }
}
int N;									// order of the polynomial
double* P = new double[N + 1];			// array to store coefficients of p(x)
double* R = new double[N + 1];			// array to store coefficients of r(x)
double c;								// c value for the polynomial
// ...
horner(P, N, c, R);						// calculate R
double p_c = R[N - 1];					// p(c) (and also remainder term)

horner(R, N, c, R);						// calculate r(c), overwriting R
double pp_c = R[N-1];					// p'(c)

B.1 Measuring Algorithms

By STIM Laboratory

B.1 Measuring Algorithms

  • 219