Session One, 19 April 2016

What's an algorithm?

A list of steps for a computer to solve a given problem

There are many kinds of algorithms

*Take the following problem:*

Given a list of numbers, find the pair of numbers that, when multiplied together, give the largest product


```
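def slowPairwiseProduct(a):
    # naive approach: try every pair (i, j) with i < j, keep the largest product
    n = len(a)
    result = 0  # fine as a starting value for non-negative inputs
    for i in range(0, n):
        for j in range(i + 1, n):
            if a[i] * a[j] > result:
                result = a[i] * a[j]
    return result

numbers = [2, 5, 1, 10, 12]
print(slowPairwiseProduct(numbers))  # prints 120
# 2*5 = 10 > 0   => result = 10
# 2*1 = 2 <= 10  => result stays 10
# 2*10 = 20 > 10 => result = 20
# and so on...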
```

- extremely slow! Why?
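- for a list of size 5, the loops above check 10 pairs (n(n-1)/2); looping j over the whole list would make it 25 (n^2)
- won't scale: the work grows quadratically with the size of the list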
- very time-intensive (the extra memory is just the one running result)
- duplication of effort if j runs over the whole list: 5 * 2 and 2 * 5 are the same product (starting j at i+1 avoids this)
- redundant -- continues crunching even when the largest number has been found

- most modern computers can execute roughly 10^9 basic operations per second
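
To put that figure in context, here is a rough back-of-the-envelope sketch. It assumes the naïve search checks each unordered pair once (n(n-1)/2 products) and counts each product-and-compare as one basic operation, which is a simplification. The names `pair_count` and `estimated_seconds` are made up for illustration, and interpreted Python will be a good deal slower than this estimate.

```python
OPS_PER_SECOND = 10**9  # rough figure from the notes, not a measurement

def pair_count(n):
    """Number of unordered pairs (i, j) with i < j in a list of length n."""
    return n * (n - 1) // 2

def estimated_seconds(n):
    """Very rough running-time estimate for the naive pairwise search."""
    return pair_count(n) / OPS_PER_SECOND

# Print list size, number of pairs, and the estimated time in seconds
for n in (5, 1_000, 100_000, 10_000_000):
    print(n, pair_count(n), estimated_seconds(n))
```

On this estimate a list of 100,000 numbers already needs about 5 × 10^9 pair checks, i.e. around five seconds at best, and a list of 10,000,000 would take hours.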

Can we be smarter about this?

Insight: for non-negative numbers, the greatest product is simply the product of the two greatest numbers in the list

```
def fastPairwiseProduct(a):
    n = len(a)
    # first pass: index of the largest element
    ultimate_index = -1
    for i in range(0, n):
        if ultimate_index == -1 or a[i] > a[ultimate_index]:
            ultimate_index = i
    # second pass: index of the largest element other than the one found above
    penultimate_index = -1
    for j in range(0, n):
        if (j != ultimate_index) and (penultimate_index == -1 or a[j] > a[penultimate_index]):
            penultimate_index = j
    return a[ultimate_index] * a[penultimate_index]
```
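
The two passes can also be folded into one. The following is only a sketch of that idea, not the version from the session: the name `single_pass_pairwise_product` is made up, and it assumes the list has at least two elements and contains no negative numbers.

```python
def single_pass_pairwise_product(a):
    """Product of the two largest values in a, found in one pass.

    Assumes len(a) >= 2 and non-negative numbers, as in the notes' examples.
    """
    # Seed the two slots with the first two elements, larger one first.
    largest, second = (a[0], a[1]) if a[0] >= a[1] else (a[1], a[0])
    for x in a[2:]:
        if x > largest:
            # x becomes the new largest; the old largest drops to second place.
            largest, second = x, largest
        elif x > second:
            second = x
    return largest * second

print(single_pass_pairwise_product([2, 5, 1, 10, 12]))  # 120
```

The non-negative assumption matters: for a list like [-10, -9, 1, 2] the largest product is (-10) * (-9) = 90, which no "two greatest numbers" approach will find.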

Standard optimisation techniques generally find and eliminate duplication of effort and cut down on memory usage

But naïve algorithms are still useful, e.g. for stress-testing an optimised version:

- Generate large, random datasets
- Naïve algorithm as the base case
- Compare output of base case to proposed optimisation
- If solutions differ, there is a problem with the optimised case, or the base case, or both 😱

```
# a scrappy example of stress-testing
from random import randint

while True:
    # build a random list of 2 to 100 numbers
    numbers = []
    random_length = randint(2, 100)
    for num in range(random_length):
        numbers.append(randint(0, 10000000))
    # run both implementations on the same input and compare
    outcome1 = slowPairwiseProduct(numbers)
    outcome2 = fastPairwiseProduct(numbers)
    if outcome1 != outcome2:
        print("Error: solutions don't match!")
        print(outcome1)
        print(outcome2)
        break
    else:
        print("OK ---- " + str(outcome1))
```

- Limited manual testing
- Try to generate different answers
- Smallest/largest possible outputs
- Null/undefined/divide by 0 errors
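
A handful of hand-picked cases along these lines might look like the sketch below. Everything here is illustrative: `brute_force`, `CASES` and `run_manual_tests` are made-up names, the expected values were worked out by hand, and the inputs assume non-negative integers up to 10^7 as in the stress test.

```python
def brute_force(a):
    """Reference implementation: try every pair once (non-negative inputs)."""
    best = 0
    for i in range(len(a)):
        for j in range(i + 1, len(a)):
            best = max(best, a[i] * a[j])
    return best

# Hand-picked (input, expected) cases; the values are illustrative.
CASES = [
    ([2, 5, 1, 10, 12], 120),   # example from the notes
    ([0, 0], 0),                # smallest possible output
    ([7, 7], 49),               # tie for the largest value
    ([1, 0, 0, 0], 0),          # only one non-zero value
    ([10**7, 10**7], 10**14),   # largest output for values up to 10^7
]

def run_manual_tests(func):
    """Return the list of cases where func disagrees with the expected value."""
    failures = []
    for numbers, expected in CASES:
        actual = func(list(numbers))
        if actual != expected:
            failures.append((numbers, expected, actual))
    return failures

print(run_manual_tests(brute_force))  # [] when every case passes
```

The last case is worth keeping if the code is ever ported: 10^14 fits in a Python integer but overflows a 32-bit integer in languages such as C.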

- Understand time/memory consumption with increasing dataset size
- Corner cases -- "degenerate cases", i.e. wrong datatypes as inputs
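
One rough way to see how running time grows with dataset size is to time the same function on lists of increasing length. This is a sketch only: `naive_pairwise_product` and `time_once` are made-up names, the sizes are arbitrary, and single timings like these are noisy.

```python
import random
import time

def naive_pairwise_product(a):
    """Quadratic search over every pair (non-negative inputs)."""
    result = 0
    for i in range(len(a)):
        for j in range(i + 1, len(a)):
            if a[i] * a[j] > result:
                result = a[i] * a[j]
    return result

def time_once(func, n, seed=0):
    """Time a single call of func on a random list of length n, in seconds."""
    rng = random.Random(seed)  # seeded so every run uses the same data
    data = [rng.randint(0, 10_000) for _ in range(n)]
    start = time.perf_counter()
    func(data)
    return time.perf_counter() - start

# Print list size and elapsed seconds for each size
for n in (250, 500, 1000, 2000):
    print(n, round(time_once(naive_pairwise_product, n), 4))
```

For a quadratic algorithm, doubling the list size should roughly quadruple the time, although the exact numbers depend on the machine.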

- To appreciate algorithms built into modern languages, you have to start writing __very__ imperative code -- no more high-level functions like sort, reduce, filter, etc. 😭
- Maths is amazingly useful but you don't need a deep understanding to start writing algorithms. You will spend more time learning the standard optimisation tools
- Python is not the worst language ever (C is)