I'm a software engineer. You can usually find me at the local pub, bouldering, or hunting for the best Korean fried chicken in London.
Session One, 19 April 2016
what's an algorithm?
A list of steps for a computer to solve a given problem
There are many kinds of algorithms
Take the following problem:
Given a list of numbers, find the pair of numbers that, when multiplied together, return the largest product
the naïve algorithm
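def slowPairwiseProduct(a):
    # Naïve approach: multiply every pair (i, j) with i < j and keep the largest product
    n = len(a)
    result = 0
    for i in range(0, n):
        for j in range(i + 1, n):
            if a[i] * a[j] > result:
                result = a[i] * a[j]
    return result

numbers = [2, 5, 1, 10, 12]
print(slowPairwiseProduct(numbers))  # 120
# 2*5 = 10 > 0   => result = 10
# 2*1 = 2 < 10   => result stays 10
# 2*10 = 20 > 10 => result = 20
# and so on...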
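- extremely slow! Why?
- the nested loops multiply every pair: 10 products for an array of size 5 (n(n-1)/2 in general), or 25 (n²) if the inner loop also starts at 0
- won't scale -- the work grows quadratically with the length of the list
- memory isn't the problem (it only stores the running result); time is
- duplication of effort: an inner loop starting at 0 computes both 5 * 2 and 2 * 5 (and multiplies each number by itself); starting at i+1 avoids this but is still quadratic
- redundant -- continues crunching even after the largest product has been found
- most modern computers can execute roughly 10^9 basic operations per second, so 100,000 numbers (about 5 × 10^9 pairs) already take seconds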
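A small sketch that counts the products each loop shape performs, comparing an inner loop that starts at 0 with one that starts at i + 1 (as in the naïve code). The helper names `count_all_pairs` and `count_distinct_pairs` are hypothetical, introduced only for this illustration.

```python
def count_all_pairs(n):
    # Both loops run over range(n): every ordered pair, including i == j
    count = 0
    for i in range(n):
        for j in range(n):
            count += 1
    return count


def count_distinct_pairs(n):
    # Inner loop starts at i + 1: each unordered pair is visited exactly once
    count = 0
    for i in range(n):
        for j in range(i + 1, n):
            count += 1
    return count


print(count_all_pairs(5), count_distinct_pairs(5))  # 25 10

# Closed forms n * n and n * (n - 1) // 2, without running the loops
for n in (5, 1_000, 100_000):
    print(n, n * n, n * (n - 1) // 2)
# 5 25 10
# 1000 1000000 499500
# 100000 10000000000 4999950000
```

Both shapes grow with n², so halving the work by starting at i + 1 doesn't change how badly the algorithm scales.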
Can we be smarter about this?
a basic optimisation
Insight: Recognise that, as long as the numbers are non-negative, the greatest product is the product of the two greatest numbers in the list
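def fastPairwiseProduct(a):
    n = len(a)
    # First pass: find the index of the largest element
    ultimate_index = -1
    for i in range(0, n):
        if ultimate_index == -1 or a[i] > a[ultimate_index]:
            ultimate_index = i
    # Second pass: find the largest element at any *other* index
    # (comparing indices, not values, keeps duplicates such as [7, 7] correct)
    penultimate_index = -1
    for j in range(0, n):
        if (j != ultimate_index) and (penultimate_index == -1 or a[j] > a[penultimate_index]):
            penultimate_index = j
    return a[ultimate_index] * a[penultimate_index]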
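The two-greatest-numbers insight relies on every number being non-negative, which holds for the stress test's `randint(0, 10000000)` inputs. If negatives were allowed, two large negatives could win: in [-10, -9, 1, 2] the answer is (-10) * (-9) = 90, not 1 * 2 = 2. A minimal sketch of one way to extend the idea, assuming the list has at least two elements; `maxPairwiseProductSigned` is a hypothetical name, not part of the original talk, and it uses `sorted` purely for brevity.

```python
def maxPairwiseProductSigned(a):
    # Assumes len(a) >= 2; sorting costs O(n log n), used here only for brevity
    s = sorted(a)
    # The answer is either the two largest values or the two smallest (most negative)
    return max(s[-1] * s[-2], s[0] * s[1])


print(maxPairwiseProductSigned([-10, -9, 1, 2]))    # 90
print(maxPairwiseProductSigned([2, 5, 1, 10, 12]))  # 120
```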
Standard optimisation techniques generally find and eliminate duplication of effort and cut down on memory usage
But, naïve algorithms are still useful!
- Generate large, random datasets
- Naïve algorithm as the base case
- Compare output of base case to proposed optimisation
- If solutions differ, there is a problem with the optimised case, or the base case, or both 😱
# a scrappy example of stress-testing
from random import randint

while True:
    numbers = []
    random_length = randint(2, 100)
    for num in range(random_length):
        numbers.append(randint(0, 10000000))
    outcome1 = slowPairwiseProduct(numbers)
    outcome2 = fastPairwiseProduct(numbers)
    if outcome1 != outcome2:
        print("Error: solutions don't match!")
        print(numbers)
        print(outcome1)
        print(outcome2)
        break
    else:
        print("OK ---- " + str(outcome1))
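The stress-test loop runs until it finds a mismatch. A variant that may be easier to run in an automated test suite is sketched below: it caps the number of trials, seeds the random generator so a failure can be reproduced, and returns the failing input instead of printing. The function `stress_test` and its parameters are hypothetical, and the two implementations are compact restatements of the naïve and two-largest approaches, inlined so the block runs on its own.

```python
import random


def slow_pairwise_product(a):
    # Naïve O(n^2): check every pair i < j
    result = 0
    for i in range(len(a)):
        for j in range(i + 1, len(a)):
            result = max(result, a[i] * a[j])
    return result


def fast_pairwise_product(a):
    # O(n): product of the two largest values (assumes non-negative inputs, len(a) >= 2)
    first = max(range(len(a)), key=lambda i: a[i])
    second = max((j for j in range(len(a)) if j != first), key=lambda j: a[j])
    return a[first] * a[second]


def stress_test(slow, fast, trials=500, max_len=100, max_value=10_000_000, seed=0):
    # Returns None if every trial agrees, else (numbers, slow_result, fast_result)
    rng = random.Random(seed)  # seeded, so a failing run can be reproduced exactly
    for _ in range(trials):
        length = rng.randint(2, max_len)
        numbers = [rng.randint(0, max_value) for _ in range(length)]
        expected, actual = slow(numbers), fast(numbers)
        if expected != actual:
            return numbers, expected, actual
    return None


def buggy_pairwise_product(a):
    # Deliberately wrong: reuses the largest element twice
    return max(a) ** 2


print(stress_test(slow_pairwise_product, fast_pairwise_product))  # None
print(stress_test(slow_pairwise_product, buggy_pairwise_product) is not None)  # True
```

Returning the failing list (rather than just two numbers) gives you a concrete input to shrink by hand when debugging.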
- Limited manual testing
- Try to generate different answers
- Smallest/largest possible outputs
- Null/undefined/divide by 0 errors
- Understand time/memory consumption with increasing dataset size
- Corner cases -- "degenerate cases" such as the shortest possible list, plus invalid inputs such as wrong datatypes
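Some of these manual checks can be written as plain assertions. The sketch below restates the two-pass `fastPairwiseProduct` logic so the block runs standalone; the specific cases (a two-element list, a repeated maximum, zeros, large values, one- and zero-element lists) are illustrative choices, not taken from the talk.

```python
def fastPairwiseProduct(a):
    # Two passes: index of the largest value, then the largest value at any other index
    n = len(a)
    ultimate_index = -1
    for i in range(n):
        if ultimate_index == -1 or a[i] > a[ultimate_index]:
            ultimate_index = i
    penultimate_index = -1
    for j in range(n):
        if j != ultimate_index and (penultimate_index == -1 or a[j] > a[penultimate_index]):
            penultimate_index = j
    return a[ultimate_index] * a[penultimate_index]


# Smallest valid input: exactly two numbers
assert fastPairwiseProduct([3, 4]) == 12
# Repeated maximum: comparing indices (not values) keeps both copies usable
assert fastPairwiseProduct([7, 7, 1]) == 49
# Smallest possible output for non-negative inputs
assert fastPairwiseProduct([0, 0]) == 0
# Largest outputs: 10**7 * (10**7 - 1) overflows a 32-bit int; Python ints don't overflow
assert fastPairwiseProduct([10**7, 10**7 - 1, 5]) == 10**7 * (10**7 - 1)

# Degenerate input: one element. penultimate_index stays -1 and a[-1] is the last
# element, so the function silently squares it instead of reporting an error
assert fastPairwiseProduct([5]) == 25

# Empty list: a[-1] on an empty list raises IndexError
try:
    fastPairwiseProduct([])
except IndexError:
    print("empty list raises IndexError")
```

The one-element case is a good example of why degenerate inputs are worth testing by hand: the random stress test never generates lists shorter than two, so it would not catch this.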
some things I learnt
- To appreciate algorithms built into modern languages, you have to start writing very imperative code -- no more high-level functions like sort, reduce, filter, etc. 😭
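For contrast, here is roughly what a high-level version can look like in Python, using the standard library's `heapq.nlargest`. This is an illustrative sketch, not code from the talk, and `pairwise_product_builtin` is a hypothetical name; it carries the same non-negative-input assumption as the two-greatest-numbers insight. It can also serve as an extra reference implementation in stress tests.

```python
import heapq


def pairwise_product_builtin(a):
    # heapq.nlargest(2, a) returns the two largest values, largest first
    first, second = heapq.nlargest(2, a)
    return first * second


print(pairwise_product_builtin([2, 5, 1, 10, 12]))  # 120
```

With a one-element list, `nlargest` returns a single value and the unpacking raises `ValueError`, which is arguably the behaviour you want for a degenerate input.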
- Maths is amazingly useful but you don't need a deep understanding to start writing algorithms. You will spend more time learning the standard optimisation tools
- Python is not the worst language ever (C is)
Algo Brownbags #1
By Denise Yu