lunchtime algorithms

Session One, 19 April 2016

what's an algorithm?

A list of steps for a computer to solve a given problem

There are many kinds of algorithms

Take the following problem:

Given a list of numbers, find the pair of numbers that, when multiplied together, gives the largest product

the naïve algorithm

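def slowPairwiseProduct(a):
    n = len(a)
    result = 0  # assumes non-negative numbers, so no product is below 0

    # try every pair (i, j) with i < j
    for i in range(0, n):
        for j in range(i+1, n):
            if a[i]*a[j] > result:
                result = a[i]*a[j]

    return result

numbers = [2, 5, 1, 10, 12]
print(slowPairwiseProduct(numbers))  # 120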


# 2*5 > 0      => result = 10
# 2*1 > 10? no => result stays 10
# 2*10 > 10    => result = 20
# and so on...

drawbacks

  • extremely slow! Why?
    • for an array of size 5, involves 10 multiplications; for size n it is n(n-1)/2, which grows with n²
    • won't scale
    • very memory-intensive
    • duplication of effort: looping over every (i, j) would compute both 5 * 2 and 2 * 5, which is why the inner loop starts at i+1
    • redundant -- continues crunching even when the largest number has been found
  • most modern computers can execute 10^9 basic operations per second
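
To put rough numbers on the drawbacks above, here is a minimal sketch that counts the pairs the nested loop visits and estimates the running time at 10^9 operations per second. The names `pair_count` and `estimated_seconds` are made up for illustration, and treating one multiply-and-compare as a single basic operation is a simplifying assumption.

```python
OPS_PER_SECOND = 10**9  # rough figure quoted on the slide


def pair_count(n):
    """Number of (i, j) pairs with i < j in a list of length n."""
    return n * (n - 1) // 2


def estimated_seconds(n):
    """Rough running time, assuming one pair costs one basic operation."""
    return pair_count(n) / OPS_PER_SECOND


for n in (5, 1000, 10**6):
    print(n, pair_count(n), estimated_seconds(n))
```

Under these assumptions a list of a million numbers needs about 500 seconds, while a single pass over the same list needs about a millisecond.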

Can we be smarter about this?

a basic optimisation

Insight: Recognise that, for non-negative numbers, the greatest product is the product of the two greatest numbers in the list

def fastPairwiseProduct(a):
    n = len(a)

    ultimate_index = -1
    for i in range(0, n):
        if ultimate_index == -1 or a[i] > a[ultimate_index]:
            ultimate_index = i

    penultimate_index = -1
    for j in range(0, n):
        if (j != ultimate_index) and (penultimate_index == -1 or a[j] > a[penultimate_index]):
            penultimate_index = j

    return a[ultimate_index] * a[penultimate_index]

Standard optimisation techniques generally find and eliminate duplication of effort and cut down on memory usage
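
As one illustration of removing duplicated effort, the two scans can be folded into a single pass that remembers the two largest values seen so far. This is a sketch, not the version from the slides: the name `single_pass_pairwise_product` is hypothetical, and it assumes a list of at least two non-negative numbers.

```python
def single_pass_pairwise_product(a):
    """Largest pairwise product, assuming at least two non-negative numbers."""
    if len(a) < 2:
        raise ValueError("need at least two numbers")

    # Keep the two largest values seen so far, largest first.
    if a[0] >= a[1]:
        largest, second = a[0], a[1]
    else:
        largest, second = a[1], a[0]

    # Walk the rest by index so no copy of the list is made.
    for i in range(2, len(a)):
        if a[i] > largest:
            largest, second = a[i], largest
        elif a[i] > second:
            second = a[i]

    return largest * second


print(single_pass_pairwise_product([2, 5, 1, 10, 12]))  # 120
```

It keeps only two extra values in memory, whatever the size of the list.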

But naïve algorithms are still useful!

Stress testing

  • Generate large, random datasets
  • Naïve algorithm as the base case
  • Compare output of base case to proposed optimisation
  • If solutions differ, there is a problem with the optimised case, or the base case, or both 😱

# a scrappy example of stress-testing

from random import randint

# loops until the two implementations disagree
while True:
    # build a random list of 2 to 100 non-negative numbers
    numbers = []
    random_length = randint(2,100)
    for num in range(random_length):
        numbers.append(randint(0,10000000))

    outcome1 = slowPairwiseProduct(numbers)
    outcome2 = fastPairwiseProduct(numbers)

    if outcome1 != outcome2:
        print("Error: solutions don't match!")
        print(outcome1)
        print(outcome2)
        break
    else:
        print("OK ---- " + str(outcome1))
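
A slightly less scrappy variant could stop after a fixed number of rounds and use a seed, so a failing list can be reproduced. The sketch below is self-contained and makes several assumptions: `slow_pairwise_product` and `fast_pairwise_product` are simplified stand-ins for the two functions under test (the fast one uses `sorted` purely for brevity), and `rounds` and `seed` are illustrative parameters.

```python
import random


def slow_pairwise_product(a):
    """Base case (stand-in): try every pair of non-negative numbers."""
    result = 0
    for i in range(len(a)):
        for j in range(i + 1, len(a)):
            result = max(result, a[i] * a[j])
    return result


def fast_pairwise_product(a):
    """Proposed optimisation (stand-in): multiply the two largest values."""
    ordered = sorted(a)
    return ordered[-1] * ordered[-2]


def stress_test(rounds=200, seed=2016):
    """Return the first list where the two disagree, or None if all agree."""
    rng = random.Random(seed)  # seeded, so a failure can be replayed
    for _ in range(rounds):
        length = rng.randint(2, 100)
        numbers = [rng.randint(0, 10000000) for _ in range(length)]
        if slow_pairwise_product(numbers) != fast_pairwise_product(numbers):
            return numbers
    return None


print(stress_test())  # None: no mismatch found
```

Returning the offending list, rather than only printing the two outcomes, makes it easier to debug by hand.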

Verifying algorithms

  • Limited manual testing
  • Try to generate different answers
    • Smallest/largest possible outputs
    • Null/undefined/divide by 0 errors
  • Understand time/memory consumption with increasing dataset size
  • Corner cases -- "degenerate cases", e.g. wrong datatypes as inputs
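
Some of the checks above can be written down as a handful of hand-picked inputs. A minimal sketch: `brute_force` and `two_largest` are hypothetical stand-ins for a reference and an optimised implementation, and raising `ValueError` for lists shorter than two is an assumed design choice, not something the slides specify. The last case shows an input that produces different answers: with negative numbers, the two most negative values can beat the two largest.

```python
def brute_force(a):
    """Reference answer: best product over every pair, any sign."""
    if len(a) < 2:
        raise ValueError("need at least two numbers")
    return max(a[i] * a[j]
               for i in range(len(a))
               for j in range(i + 1, len(a)))


def two_largest(a):
    """Shortcut under test: product of the two largest values."""
    if len(a) < 2:
        raise ValueError("need at least two numbers")
    ordered = sorted(a)
    return ordered[-1] * ordered[-2]


corner_cases = [
    [0, 0],                # smallest possible output
    [10000000, 10000000],  # largest inputs, duplicated
    [7, 7, 7],             # all values equal
    [-10, -9, 1, 2],       # negatives: the shortcut's assumption breaks
]

for case in corner_cases:
    print(case, brute_force(case), two_largest(case))
```

The first three cases agree; on the last one the reference gives 90 and the shortcut gives 2.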

some things I learnt

  • To appreciate algorithms built into modern languages, you have to start writing very imperative code -- no more high-level functions like sort, reduce, filter, etc. 😭
  • Maths is amazingly useful but you don't need a deep understanding to start writing algorithms. You will spend more time learning the standard optimisation tools
  • Python is not the worst language ever (C is)

discussion? 🍔🍱🍩☕️
