Scalable Tissue Imaging and Modeling (STIM) Laboratory
Department of Electrical and Computer Engineering
Cullen College of Engineering
University of Houston
David Mayerich
STIM Laboratory, University of Houston
Approximate \(e^x\) as a Maclaurin series:
David Mayerich
STIM Laboratory, University of Houston
"Big-O" notation specifies how the term behaves as \(x\) changes
in this case \(\epsilon\) behaves like \(x^5\) when \(x \rightarrow 0\) or \(x \rightarrow \infty\)
the final term is bounded by \(O\left( x^5\right) \leq C |x|^5\) for some \(C\)
\(C_1 |x|^n < C_2 |x|^{n+1}\) for large \(x\) for any \(C_1\) and \(C_2\)
if \(C_1 \gg C_2\), there is still an \(x\) for which \(C_1 |x|^n < C_2 |x|^{n+1}\)
approximation
error
Consider an algorithm for the exponential function based on its Maclaurin series:
David Mayerich
STIM Laboratory, University of Houston
float exp(float x, int N) {
float ex = 0.0;
float factorial = 1;
for(int k = 0; k <= N; i++){
factorial *= i;
ex += pow(x, k) / factorial;
}
return ex;
}How does the error differ when \(N\) and \(x\) change?
exp(x, 5) vs exp(0.5*x, 5)
exp(x, 5) vs exp(2*x, 3)
We can adjust error bounds when comparing algorithms
The same concept applies to the time to solve a numerical problem
How many operations are required for \(c=a+b\) where \(a,b\in\mathbb{R}\)?
David Mayerich
STIM Laboratory, University of Houston
1 addition, or 1 floating point operation (FLOP)
What if \(\mathbf{a},\mathbf{b} \in \mathbb{R}^3\)?
3 floating point operations: \(a_1+b_1\quad a_2 + b_2\quad a_3 + b_3\)
What if \(\mathbf{a}, \mathbf{b} \in \mathbb{R}^N\)?
Floating point operations take time, and that time varies across processors
If we measure the time to calculate \(\mathbf{a} + \mathbf{b}\) for \(\mathbb{R}^3\), we can approximate the time for \(\mathbb{R}^N\)
David Mayerich
STIM Laboratory, University of Houston
float* a = new float[N];
float* b = new float[N];
float* c = new float[N];
float* d = new float[N];
float* e = new float[N];
// perform 1 FLOP vector operation
for(int i = 0; i < N; i++) {
c[i] = a[i] + b[i];
}David Mayerich
STIM Laboratory, University of Houston
float* a = new float[N];
float* b = new float[N];
float* c = new float[N];
float* d = new float[N];
float* e = new float[N];
// perform 1 FLOP vector operation
for(int i = 0; i < N; i++) {
c[i] = a[i] + b[i];
}
// perform 2 FLOP vector operation
for(int i = 0; i < N; i++) {
d[i] = (a[i] + b[i]) * c[i];
}David Mayerich
STIM Laboratory, University of Houston
float* a = new float[N];
float* b = new float[N];
float* c = new float[N];
float* d = new float[N];
float* e = new float[N];
// perform 1 FLOP vector operation
for(int i = 0; i < N; i++) {
c[i] = a[i] + b[i];
}
// perform 2 FLOP vector operation
for(int i = 0; i < N; i++) {
d[i] = (a[i] + b[i]) * c[i];
}
// perform 3 FLOP vector operation
for(int i = 0; i < N; i++) {
e[i] = (a[i] + b[i]) * c[i] + d[i];
}David Mayerich
STIM Laboratory, University of Houston
float* A = new float[M * N];
float* x = new float[N];
float* y = new float[N];
float* z = new float[N]
float alpha, beta;
// load data
// perform 1 FLOP matrix multiplication
for(int m = 0; m < M; m++) {
float(int n = 0; n < N; n++) {
y[m] = A[n * M + m] * x[n];
}
}David Mayerich
STIM Laboratory, University of Houston
float* A = new float[M * N];
float* x = new float[N];
float* y = new float[N];
float* z = new float[N]
float alpha, beta;
// load data
// perform 1 FLOP matrix multiplication
for(int m = 0; m < M; m++) {
float(int n = 0; n < N; n++) {
y[m] = A[n * M + m] * x[n];
}
}
// perform GEMM matrix multiplication
for(int m = 0; m < M; m++) {
float(int n = 0; n < N; n++) {
y[m] = alpha * A[n * M + m]
* x[n] + beta * y[m];
}
}An \(O\left(n\right)\) algorithm will beat an \(O\left(n^2\right)\) algorithm for large \(n\)
Two types of algorithmic optimization:
Reduce computational complexity (ex. \(O\left(n^2\right) \rightarrow O\left( n \right)\)). This is a mostly theoretical (mathematical) task.
Improve performance to decrease \(C\). This requires optimization within the source code, hardware, or instructions
David Mayerich
STIM Laboratory, University of Houston
David Mayerich
STIM Laboratory, University of Houston
for(int i = 0; i < N; i++) {
c[i] = a[i] + b[i];
}for(int m = 0; i < M; i++){
for(int n = 0; n < N; n++){
y[m] += A[n * M + m]
}
}for(int k = 0; k < K; k++){
for(int m = 0; m < M; m++){
for(int n = 0; n < N; n++){
C[k * N + n] += A[n * M + m]
* B[m * K + k]
}
}
}\(O\left( 1 \right) \) - constant time
single operation on two values of constant bit size
accessing a value in memory (RAM), on an SSD, or in a register
use of a lookup table
\(O\left( \log{n}\right)\) - logarithmic time
binary or alphabetic search
traversing a tree (binary, quadtree, octree)
David Mayerich
STIM Laboratory, University of Houston
\(O\left( n \right)\) - linear time
finding an item in an unsorted list
multiplying two \(n\)-dimensional vectors
\(O\left(n^2\right)\) - quadratic time
bubble sort algorithm
matrix-vector multiplication
\(O\left(n!\right)\) - factorial time
evaluate all permutations of a set
naive Traveling Salesman Problem
Nested Polynomials
Synthetic Division
Horner's Algorithm
David Mayerich
STIM Laboratory, University of Houston
Consider an order-\(n\) polynomial:
David Mayerich
STIM Laboratory, University of Houston
How many floating point operations are needed to get \(p(x)\)?
Reformulate polynomials to a nested form:
David Mayerich
STIM Laboratory, University of Houston
David Mayerich
STIM Laboratory, University of Houston
double p(double x, double* a, int n){
double p_x = a[0];
for(int i = 1; i <= n; i++){
double xn = a[0];
for(int p = 0; p <= i; p++){
xn *= x;
}
p_x += xn;
}
return p_x;
}double p(double x, double* a, int n){
double p_x = a[0];
for(int i = n; i >= 0; i++){
p_x = p_x * x + a[i];
}
return p_x;
}Consider the following polynomial division:
David Mayerich
STIM Laboratory, University of Houston
What happens when \(x=2\)?
The remainder term is the solution of the polynomial at the root of the divisor
David Mayerich
STIM Laboratory, University of Houston
The algorithm for solving \(p(c)\) is identical to calculating the remainder of \(\frac{p(c)}{x-c}\)
Long division for polynomials is often simplified using synthetic division:
David Mayerich
STIM Laboratory, University of Houston
x \(c\)
\(c\)
+
David Mayerich
STIM Laboratory, University of Houston
double horner(double* C, int N, double x){
if(N == 0) return 0.0; // output zero if there are no coefficients
px = C[0]; // initialize the first coefficient
for(int i = 1; i < N-1; i++){ // for each coefficient
px = px * x + C[i]; // evaluate the innermost nested term
}
return px;
}C is an array of length N holding the coefficients of \(p(x)\)
N is the number of coefficients
x is the point for which \(p(x)\) is to be evaluated
David Mayerich
STIM Laboratory, University of Houston
Division by \((x-c)\) produces a new polynomial \(r_{n-1}(x)\) with a remainder term:
Reformulate and calculate the derivative \(p'(x)\):
We know the term \(r_{n-1}(x)\) but do not have \(r'_{n-1}(x)\), but setting \(x=c\) yields:
David Mayerich
STIM Laboratory, University of Houston
double horner(double* C, int N, double x, double* R){
}David Mayerich
STIM Laboratory, University of Houston
double horner(double* C, int N, double x, double* R){
if(N == 0) return 0.0; // output zero if there are no coefficients
R[0] = R[0]; // initialize the first coefficient
}David Mayerich
STIM Laboratory, University of Houston
void horner(double* C, int N, double x, double* R){
if(N == 0) return 0.0; // output zero if there are no coefficients
R[0] = R[0]; // initialize the first coefficient
for(int i = 1; i < N-1; i++){ // for each coefficient
R[i] = R[i-1] * x + C[i]; // evaluate the innermost nested term
}
}David Mayerich
STIM Laboratory, University of Houston
double horner(double* C, int N, double x, double* R){
if(N == 0) return 0.0; // output zero if there are no coefficients
R[0] = R[0]; // initialize the first coefficient
for(int i = 1; i < N-1; i++){ // for each coefficient
R[i] = R[i-1] * x + C[i]; // evaluate the innermost nested term
}
}int N; // order of the polynomial
double* P = new double[N + 1]; // array to store coefficients of p(x)
double* R = new double[N + 1]; // array to store coefficients of r(x)
double c; // c value for the polynomial
// ...
David Mayerich
STIM Laboratory, University of Houston
double horner(double* C, int N, double x, double* R){
if(N == 0) return 0.0; // output zero if there are no coefficients
R[0] = R[0]; // initialize the first coefficient
for(int i = 1; i < N-1; i++){ // for each coefficient
R[i] = R[i-1] * x + C[i]; // evaluate the innermost nested term
}
}int N; // order of the polynomial
double* P = new double[N + 1]; // array to store coefficients of p(x)
double* R = new double[N + 1]; // array to store coefficients of r(x)
double c; // c value for the polynomial
// ...
horner(P, N, c, R); // calculate R
double p_c = R[N - 1]; // p(c) (and also remainder term)
David Mayerich
STIM Laboratory, University of Houston
double horner(double* C, int N, double x, double* R){
if(N == 0) return 0.0; // output zero if there are no coefficients
R[0] = R[0]; // initialize the first coefficient
for(int i = 1; i < N-1; i++){ // for each coefficient
R[i] = R[i-1] * x + C[i]; // evaluate the innermost nested term
}
}int N; // order of the polynomial
double* P = new double[N + 1]; // array to store coefficients of p(x)
double* R = new double[N + 1]; // array to store coefficients of r(x)
double c; // c value for the polynomial
// ...
horner(P, N, c, R); // calculate R
double p_c = R[N - 1]; // p(c) (and also remainder term)
horner(R, N, c, R); // calculate r(c), overwriting R
double pp_c = R[N-1]; // p'(c)