The Computer Science of Human Decisions

Explore/Exploit

 

  • Latest or greatest?
  • Novelty or tradition?
  • Stick or twist?

 

e.g. Going out for dinner

Formalism

  • The multi-armed bandit problem
  • "Life is a casino"
  • Which arm should you pull?

Simplistic approach

 

  • Maximise expected value
  • But how do we know?
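
A minimal sketch of why "maximise expected value" is only easy when the odds are known. The names `best_arm_if_known` and `estimate_win_rates`, and the model of each arm as a fixed win probability, are assumptions made for this illustration.

```python
import random

def best_arm_if_known(probs):
    """If the true win probabilities are known, just pick the largest."""
    return max(range(len(probs)), key=lambda arm: probs[arm])

def estimate_win_rates(probs, pulls_per_arm, seed=0):
    """Estimate each arm's win rate by sampling it pulls_per_arm times.

    probs holds the hidden true probabilities; a real player never sees
    them and only has the estimates returned here.
    pulls_per_arm must be at least 1.
    """
    rng = random.Random(seed)
    estimates = []
    for p in probs:
        wins = sum(1 for _ in range(pulls_per_arm) if rng.random() < p)
        estimates.append(wins / pulls_per_arm)
    return estimates

if __name__ == "__main__":
    true_probs = [0.3, 0.7, 0.5]  # hypothetical casino
    print("Best arm if known:", best_arm_if_known(true_probs))
    print("Estimates from 5 pulls each:", estimate_win_rates(true_probs, 5))
    print("Estimates from 5000 pulls each:", estimate_win_rates(true_probs, 5000))
```

With only a few pulls per arm the estimates can rank the arms wrongly, and every pull spent estimating is a pull not spent on the best arm.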

1. Win stay, lose shift

 

  • Herbert Robbins (1952)
  • Not optimal, but simple
  • And provably better than chance
  • But: always shift after a single loss? And really explore on your last night?
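
The rule above can be sketched as a short simulation. This is an illustration under stated assumptions: each arm pays out with a fixed probability, "shift" means moving to a different arm chosen at random, and the function names are hypothetical.

```python
import random

def win_stay_lose_shift(probs, pulls, seed=0):
    """Play win-stay, lose-shift on a bandit; return the number of wins."""
    rng = random.Random(seed)
    arm = rng.randrange(len(probs))  # start on a random arm
    wins = 0
    for _ in range(pulls):
        if rng.random() < probs[arm]:
            wins += 1  # win: stay on the same arm
        else:
            # lose: shift to a different arm (if there is one)
            others = [a for a in range(len(probs)) if a != arm]
            if others:
                arm = rng.choice(others)
    return wins

def random_play(probs, pulls, seed=0):
    """Baseline: pull a uniformly random arm every time."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(pulls):
        arm = rng.randrange(len(probs))
        if rng.random() < probs[arm]:
            wins += 1
    return wins

if __name__ == "__main__":
    probs = [0.9, 0.1]  # hypothetical pay-out probabilities
    print("Win-stay, lose-shift:", win_stay_lose_shift(probs, 10_000))
    print("Random play:", random_play(probs, 10_000))
```

Note that the rule abandons even a very good arm after one unlucky pull, which is the weakness the last bullet points at.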

The interval

 

  • Key to determining real-world strategy
  • Gathering vs using information
  • Explore whilst time remains to exploit

2. Gittins Index

 

  • Discounting handles indefinite interval
  • Play arm with highest "index"
  • % pay-off reflects commitment to stay
  • (or expectation of making further decisions)
  • Assumes no switching cost
  • Needs a lookup table and a discount rate
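
A small sketch of the discounting idea only, not of the Gittins index itself (computing the index is considerably more involved). It assumes geometric discounting, where each future pay-off is worth a fixed fraction of the one before it; `discounted_value` and its `discount` parameter are names made up for this example.

```python
def discounted_value(rewards, discount):
    """Sum a sequence of pay-offs, each worth `discount` times the previous one.

    discount is assumed to lie in [0, 1); reward t is weighted by discount**t.
    """
    return sum(r * discount ** t for t, r in enumerate(rewards))

if __name__ == "__main__":
    # A pay-off of 1 on every turn, for a very long time:
    for discount in (0.5, 0.9, 0.99):
        total = discounted_value([1] * 10_000, discount)
        print(f"discount={discount}: total value is about {total:.2f}")
```

Under this assumption an endless stream of pay-offs still has a finite total, close to 1 / (1 - discount), so the discount plays the role of "how much time is left" when the interval has no fixed end.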

3. Regret minimisation

"I knew that when I was 80 I was not going to regret having tried this"

 

  • Regret is the shortfall compared with the optimal strategy
  • Best-case scenario is logarithmic growth in total regret
    • i.e. fewer new regrets with each passing year
  • Strategy: pick option with greatest potential for greatness, given what you know
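
One common way to make "greatest potential, given what you know" concrete is an upper confidence bound rule. The sketch below uses UCB1; that choice, the fixed win probabilities, and the name `ucb1` are assumptions of this example rather than something the slides specify.

```python
import math
import random

def ucb1(probs, pulls, seed=0):
    """Play a bandit with UCB1; return (pull counts per arm, total reward)."""
    rng = random.Random(seed)
    n_arms = len(probs)
    counts = [0] * n_arms    # how often each arm has been pulled
    totals = [0.0] * n_arms  # total reward collected from each arm

    def upper_bound(arm, t):
        # Observed mean plus a bonus that shrinks as the arm is tried more.
        mean = totals[arm] / counts[arm]
        bonus = math.sqrt(2 * math.log(t) / counts[arm])
        return mean + bonus

    for t in range(1, pulls + 1):
        if t <= n_arms:
            arm = t - 1  # try every arm once before comparing bounds
        else:
            arm = max(range(n_arms), key=lambda a: upper_bound(a, t))
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
    return counts, sum(totals)

if __name__ == "__main__":
    probs = [0.2, 0.8]  # hypothetical pay-out probabilities
    counts, reward = ucb1(probs, 5_000)
    regret = 5_000 * max(probs) - reward
    print("Pulls per arm:", counts)
    print("Regret versus always playing the best arm:", regret)
```

Rarely tried arms get a large bonus, so they are given the benefit of the doubt; as evidence accumulates the bonus shrinks and play concentrates on the arm that has actually paid best.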

Human behaviour

 

  • People over-explore (i.e. under-commit)
  • Example 1:
    • 1000 x observe vs bet (60%/40%)
    • Optimal: cash in after 38 observations
    • Actual: 505 observations!

Example 2

 

  • Punctuality of new airline
  • Optimal: use it exclusively until its punctuality falls below that of the established airline
  • Then never switch back
    • You won't be getting any further information

 

Problem? 

"A restless world"

 

  • Rewards restlessness
  • "Never stop exploring"
  • Decision making: it's never your last...
  • ... contrary to human psychology

Further reading

 

  • "Algorithms to Live By"
  • Brian Christian and Tom Griffiths
    • How to search
    • How to know when to stop
    • How to schedule
    • ...

Algorithms to Live By

By Mark Woodbridge