COMP3010: Algorithm Theory and Design

Daniel Sutantyo, Department of Computing, Macquarie University

9.1 - Probabilistic Analysis

Probabilistic Analysis

  • What is it?
    • the use of probability theory to analyse the running time of an algorithm
    • our main example for this discussion is finding the largest element in an array arr of n elements:
int max = arr[0];          // start from the first element (assumes n >= 1)

for (int i = 1; i < n; i++){
    if (arr[i] > max){     // comparison
        max = arr[i];      // assignment
    }
}

let M = element 1

for i = 2 to n:
   compare element i with M         // cost is c_c
   if M is less than element i:
      assign element i to M         // cost is c_a
  • The pseudocode above generalises the code: there are two main operations, comparison and assignment
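To make the two operations concrete, here is a minimal C sketch that follows the pseudocode and counts each operation as it runs. `MaxResult` and `find_max_counted` are hypothetical names chosen for illustration, and the initial `let M = element 1` is counted as one assignment (the same convention used for the per-permutation counts later in this section).

```c
#include <assert.h>

/* Result of one run of the max-finding pseudocode, with operation counts.
 * MaxResult and find_max_counted are hypothetical names for this sketch. */
typedef struct {
    int max;           /* final value of M */
    long comparisons;  /* times "compare element i with M" ran */
    long assignments;  /* times a value was assigned to M, including M = element 1 */
} MaxResult;

MaxResult find_max_counted(const int *arr, int n) {
    assert(n >= 1);                   /* the pseudocode needs at least one element */
    MaxResult r = { arr[0], 0, 1 };   /* let M = element 1: the first assignment */
    for (int i = 1; i < n; i++) {     /* i = 2 to n in the pseudocode's 1-based terms */
        r.comparisons++;              /* compare element i with M (cost c_c) */
        if (r.max < arr[i]) {
            r.max = arr[i];           /* assign element i to M (cost c_a) */
            r.assignments++;
        }
    }
    return r;
}
```

For example, `{1, 2, 3, 4}` gives 3 comparisons and 4 assignments, while `{4, 3, 2, 1}` gives 3 comparisons and only 1 assignment: the number of comparisons never changes, only the number of assignments does.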

  • Let the comparison operation cost \(c_c\) and the assignment operation cost \(c_a\)

  • What is the complexity of the above algorithm?
    • we have to compare each of the remaining \(n-1\) elements with M, for a total of \((n-1)c_c \approx c_c n\)
    • do we also have to do \(n\) assignments, for a total of \(c_a n\)?
      • we may have to do \(n\) assignments (when the elements are in increasing order), but sometimes we only need 1 assignment (when the first element is the largest)

  • In the worst case (elements in increasing order), every comparison is followed by an assignment, so the total cost is roughly \(n(c_c + c_a)\)
    • but it is reasonable to expect that we don't need to do \(n\) assignments on an average input
    • let's ignore the comparison cost \(c_c\), since we have to pay it on every input anyway
    • what do you think is the average number of assignments?
      • a common guess is about half of the elements, e.g. 2 when \(n = 4\) (or 2.5, i.e. \((n+1)/2\), by analogy with linear search)
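Counting the operations in the pseudocode exactly (treating `let M = element 1` as one assignment), the two extremes work out as follows:

\[ \text{best case (largest element first):} \quad c_a + (n-1)\,c_c \]

\[ \text{worst case (increasing order):} \quad n\,c_a + (n-1)\,c_c \approx n(c_c + c_a) \]

Only the assignment term differs between the two, which is why the rest of the analysis concentrates on \(c_a\).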

  • In order to perform a probabilistic analysis, we need to know the distribution of the input, or at least make some assumptions about it
    • to do this properly, we need to understand the different types of probability distributions, which is beyond the scope of this unit
  • For this example, let us assume a uniform random permutation: the \(n\) elements are distinct, and every ordering of them is equally likely
    • How many different inputs are possible?
      • there are \(n!\) possible permutations, each equally likely (each occurs with probability \(1/n!\))
    • Do you think half of these permutations require 2 assignment operations?
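To see what the uniform random permutation assumption means in practice, the sketch below draws random permutations of 1..n with the Fisher-Yates shuffle (a standard shuffling technique, not something introduced in this unit) and averages the number of assignments the pseudocode performs. It assumes the C library's `rand()` is an acceptable source of randomness, and the function names are hypothetical. Simulation only estimates the average; it does not replace the exact analysis.

```c
#include <assert.h>
#include <stdlib.h>

/* Fill perm[0..n-1] with a random permutation of 1..n using the Fisher-Yates
 * shuffle. rand() % (i + 1) has a small modulo bias, negligible for small n. */
static void random_permutation(int *perm, int n) {
    for (int i = 0; i < n; i++) {
        perm[i] = i + 1;
    }
    for (int i = n - 1; i > 0; i--) {
        int j = rand() % (i + 1);       /* pick j from 0..i */
        int t = perm[i]; perm[i] = perm[j]; perm[j] = t;
    }
}

/* Assignments made by the max-finding pseudocode: one for M = element 1,
 * plus one for every later element larger than the current M. */
static int count_assignments(const int *perm, int n) {
    int m = perm[0], count = 1;
    for (int i = 1; i < n; i++) {
        if (m < perm[i]) {
            m = perm[i];
            count++;
        }
    }
    return count;
}

/* Average number of assignments over `trials` random permutations of 1..n. */
double estimate_average(int n, int trials, unsigned seed) {
    assert(n >= 1 && n <= 64 && trials > 0);
    int perm[64];
    long total = 0;
    srand(seed);
    for (int t = 0; t < trials; t++) {
        random_permutation(perm, n);
        total += count_assignments(perm, n);
    }
    return (double)total / trials;
}
```

With many trials the estimate for \(n = 4\) should settle near 50/24 ≈ 2.083, though the exact digits depend on the platform's `rand()` and the seed.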

Sample Space

  • Suppose that there are only 4 elements, so there are 4! = 24 permutations in total:

1, 2, 3, 4    1, 2, 4, 3    1, 3, 2, 4    1, 3, 4, 2    1, 4, 2, 3    1, 4, 3, 2
2, 1, 3, 4    2, 1, 4, 3    2, 3, 1, 4    2, 3, 4, 1    2, 4, 1, 3    2, 4, 3, 1
3, 1, 2, 4    3, 1, 4, 2    3, 2, 1, 4    3, 2, 4, 1    3, 4, 1, 2    3, 4, 2, 1
4, 1, 2, 3    4, 1, 3, 2    4, 2, 1, 3    4, 2, 3, 1    4, 3, 1, 2    4, 3, 2, 1

  • How many of these permutations require us to do 4 assignment operations?
    • only one, 1, 2, 3, 4, in which every element is larger than all the elements before it

  • How many of these permutations require us to do 1 assignment operation?
    • whenever 4 is the first element, i.e. 3! = 6 of the 24 permutations
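These counting questions can also be answered by brute force. The sketch below enumerates all \(n!\) permutations of 1..n with swap-based backtracking and tallies how many of them need exactly \(k\) assignments; `permutations_with_assignments`, its helper `tally` and the `MAX_N` limit are hypothetical choices for illustration.

```c
#include <assert.h>

#define MAX_N 10   /* hypothetical limit: 10! = 3,628,800 permutations */

/* Visit every permutation of perm[k..n-1] by swap-based backtracking.
 * For each complete permutation, count the assignments the max-finding
 * pseudocode makes and add one to histogram[count]. */
static void tally(int *perm, int n, int k, long *histogram) {
    if (k == n) {
        int m = perm[0], count = 1;            /* let M = element 1 */
        for (int i = 1; i < n; i++) {
            if (m < perm[i]) {                 /* new maximum: assign to M */
                m = perm[i];
                count++;
            }
        }
        histogram[count]++;
        return;
    }
    for (int i = k; i < n; i++) {
        int t = perm[k]; perm[k] = perm[i]; perm[i] = t;   /* put perm[i] at position k */
        tally(perm, n, k + 1, histogram);
        t = perm[k]; perm[k] = perm[i]; perm[i] = t;       /* undo the swap */
    }
}

/* Number of permutations of 1..n that need exactly k assignments. */
long permutations_with_assignments(int n, int k) {
    assert(n >= 1 && n <= MAX_N);
    long histogram[MAX_N + 1] = {0};
    int perm[MAX_N];
    for (int i = 0; i < n; i++) {
        perm[i] = i + 1;
    }
    tally(perm, n, 0, histogram);
    return (k >= 1 && k <= n) ? histogram[k] : 0;
}
```

For \(n = 4\) it reports 6, 11, 6 and 1 permutations needing 1, 2, 3 and 4 assignments respectively, which adds up to 6 + 22 + 18 + 4 = 50 assignments over the 24 permutations.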

permutation    assignments
1, 2, 3, 4     4
1, 2, 4, 3     3
1, 3, 2, 4     3
1, 3, 4, 2     3
1, 4, 2, 3     2
1, 4, 3, 2     2
2, 1, 3, 4     3
2, 1, 4, 3     2
2, 3, 1, 4     3
2, 3, 4, 1     3
2, 4, 1, 3     2
2, 4, 3, 1     2
3, 1, 2, 4     2
3, 1, 4, 2     2
3, 2, 1, 4     2
3, 2, 4, 1     2
3, 4, 1, 2     2
3, 4, 2, 1     2
4, 1, 2, 3     1
4, 1, 3, 2     1
4, 2, 1, 3     1
4, 2, 3, 1     1
4, 3, 1, 2     1
4, 3, 2, 1     1
total          50

  • If you thought that the average number of assignments is 2 (or 2.5), do you still think so?
  • The 24 permutations need 50 assignments in total, so the average is 50/24 ≈ 2.083

n     permutations (n!)    total assignments    average
4     24                   50                   2.083
5     120                  274                  2.283
6     720                  1,764                2.450
7     5,040                13,068               2.593
8     40,320               109,584              2.718
9     362,880              1,026,576            2.829
10    3,628,800            10,628,640           2.929
11    39,916,800           120,543,840          3.020
12    479,001,600          1,486,442,880        3.103

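  • The average number of assignments does not appear to grow linearly with \(n\)
    • in fact it appears to grow logarithmically, i.e. a cost of roughly \(c_a\log n\): doubling \(n\) from 4 to 8 raises the average by only about 0.64, and doubling from 6 to 12 by about 0.65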

Calculating the Average Case Complexity

  • We need to do an assignment whenever the element we are inspecting is greater than every element before it
    • if the elements are arranged randomly, then among the first \(i\) elements, each one is equally likely to be the greatest
    • so the probability that element \(i\) is the greatest of the first \(i\) elements is \(1/i\)
      • when you have one element, it is obviously the greatest
      • when you have two elements, each has a 1/2 chance of being the greatest
      • when you have three elements, each has a 1/3 chance, and so on
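The \(1/i\) argument can be checked by brute force: if position \(i\) holds a new maximum with probability \(1/i\), then it should do so in exactly \(n!/i\) of the \(n!\) permutations. The sketch below assumes the elements are 1..n and uses hypothetical helper names and a hypothetical `MAX_N` limit.

```c
#include <assert.h>

#define MAX_N 10   /* hypothetical limit to keep the enumeration small */

/* Visit every permutation of perm[k..n-1] by swap-based backtracking and
 * count those in which position pos (1-based) is larger than every earlier
 * position, i.e. where the pseudocode assigns element pos to M. */
static long count_new_max(int *perm, int n, int k, int pos) {
    if (k == n) {
        for (int j = 0; j < pos - 1; j++) {
            if (perm[j] > perm[pos - 1]) {
                return 0;                      /* an earlier element is larger */
            }
        }
        return 1;                              /* element pos is a new maximum */
    }
    long count = 0;
    for (int i = k; i < n; i++) {
        int t = perm[k]; perm[k] = perm[i]; perm[i] = t;
        count += count_new_max(perm, n, k + 1, pos);
        t = perm[k]; perm[k] = perm[i]; perm[i] = t;   /* undo the swap */
    }
    return count;
}

/* Number of permutations of 1..n in which position pos triggers an assignment. */
long permutations_with_new_max_at(int n, int pos) {
    assert(n >= 1 && n <= MAX_N && pos >= 1 && pos <= n);
    int perm[MAX_N];
    for (int i = 0; i < n; i++) {
        perm[i] = i + 1;
    }
    return count_new_max(perm, n, 0, pos);
}
```

For \(n = 4\) the counts for positions 1 to 4 are 24, 12, 8 and 6, i.e. probabilities 1, 1/2, 1/3 and 1/4, and they sum to 50, the total number of assignments over all 24 permutations.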

  • Consider the elements \(x_1, x_2, x_3, x_4, \dots\) in the order in which they are inspected:
    • \(x_1\): we always have to do one assignment; expected number of assignments so far: 1
    • \(x_2\): we have to do one assignment if \(x_2 > x_1\), a 1 in 2 chance; expected so far: 1 + 0.5
    • \(x_3\): we have to do one assignment if \(x_3 > x_1\) and \(x_3 > x_2\), a 1 in 3 chance; expected so far: 1 + 0.5 + 0.33
    • \(x_4\): we have to do one assignment if \(x_4 > x_1\), \(x_4 > x_2\) and \(x_4 > x_3\), a 1 in 4 chance; expected so far: 1 + 0.5 + 0.33 + 0.25 ≈ 2.083, which matches the average of 50/24 found for \(n = 4\)

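  • More generally, the expected number of assignments is

\[ 1 + \left(1 \times \frac{1}{2}\right) + \left(1 \times \frac{1}{3}\right) + \cdots + \left(1 \times \frac{1}{n}\right) = \sum_{i=1}^n \frac{1}{i} \le \ln n + 1 \]

(the sum \(\sum_{i=1}^n \frac{1}{i}\) is a partial sum of the harmonic series, also called the \(n\)-th harmonic number; the bound holds because \(\sum_{i=2}^n \frac{1}{i} \le \int_1^n \frac{dx}{x} = \ln n\))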
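The formula can be compared directly with the table of averages. Summed over all \(n!\) equally likely permutations, the total number of assignments is \(n!\sum_{i=1}^n 1/i = \sum_{i=1}^n n!/i\), and each term \(n!/i\) is an integer, so the total can be computed exactly. A sketch with hypothetical function names:

```c
#include <assert.h>

/* Expected number of assignments for n distinct elements in uniformly random
 * order: the n-th harmonic number H_n = 1 + 1/2 + ... + 1/n. */
double harmonic(int n) {
    double h = 0.0;
    for (int i = 1; i <= n; i++) {
        h += 1.0 / i;
    }
    return h;
}

/* Total number of assignments summed over all n! permutations, i.e. n! * H_n.
 * Computed exactly as the integer sum of n!/i, since i divides n! for i <= n. */
long long total_assignments(int n) {
    assert(n >= 1 && n <= 15);   /* keeps n! and the sum well inside long long */
    long long fact = 1;
    for (int i = 2; i <= n; i++) {
        fact *= i;
    }
    long long total = 0;
    for (int i = 1; i <= n; i++) {
        total += fact / i;
    }
    return total;
}
```

`total_assignments(4)` is 50 and `total_assignments(12)` is 1,486,442,880, matching the table, and `harmonic(n)` gives the average column (about 2.083 for \(n = 4\)).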

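  • Therefore the expected number of assignments, and hence the average assignment cost of roughly \(c_a \ln n\), is \(O(\log n)\), which is better than the worst case of \(O(n)\)
    • the \(n-1\) comparisons still have to be done on every input, so the total running time of the algorithm remains \(O(n)\)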

  • Some of you may initially have thought that the average number of assignments in this algorithm is also \(O(n)\)
    • you may have been influenced by the average-case complexity of linear search, which is \(O(n)\), the same as its worst-case complexity
    • note that in the case of linear search, each element in the array has an equal chance of being the element we are looking for (assuming it is present), so the expected number of comparisons is

\[ \frac{1}{n} + \frac{2}{n} + \frac{3}{n} + \cdots + \frac{n}{n} = \frac{1}{n}\sum_{i=1}^{n} i\]

\[= \frac{1}{n} \left(\frac{n(n+1)}{2}\right) = \frac{(n+1)}{2} = O(n)\]

Closing Words

  • Finding the average-case complexity of an algorithm is not a trivial task because we have to assume the distribution of the input data, and this is often beyond the scope of a computing unit
    • it is something you can pick up in a STAT or MATH unit
  • In this unit, we will only use the uniform probability distribution mentioned in the previous lecture, and you should already have a good understanding of how it works