The Computer Science of Human Decisions
Explore/Exploit
- Latest or greatest?
- Novelty or tradition?
- Stick or twist?
e.g. Going out for dinner
Formalism
- The multi-armed bandit problem
- "Life is a casino"
- Which arm should you pull?
Simplistic approach
- Maximise expected value
- But how do we know?
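A minimal sketch of the "how do we know?" problem, assuming each arm pays out 1 with a fixed hidden probability. The `BernoulliBandit` class, the `estimate_payoffs` helper, and the probabilities 0.3 and 0.6 are hypothetical, chosen only for illustration. Expected value can only be estimated by pulling arms, and a handful of pulls gives a noisy estimate.

```python
import random


class BernoulliBandit:
    """Hypothetical slot machines: arm i pays 1 with hidden probability probs[i]."""

    def __init__(self, probs, seed=0):
        self.probs = list(probs)
        self.rng = random.Random(seed)

    def pull(self, arm):
        return 1 if self.rng.random() < self.probs[arm] else 0


def estimate_payoffs(bandit, pulls_per_arm):
    """Estimate each arm's expected value as its observed win rate."""
    estimates = []
    for arm in range(len(bandit.probs)):
        wins = sum(bandit.pull(arm) for _ in range(pulls_per_arm))
        estimates.append(wins / pulls_per_arm)
    return estimates


bandit = BernoulliBandit([0.3, 0.6])
rough = estimate_payoffs(bandit, 5)      # few pulls: unreliable estimate
better = estimate_payoffs(bandit, 5000)  # many pulls: close to the hidden rates
```

Every pull spent estimating is a pull not spent on the arm that currently looks best, which is the explore/exploit tension in miniature.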
1. Win-stay, lose-shift
- Herbert Robbins (1952)
- Not optimal, but simple
- And provably better than chance
- But: always shift after a single loss? And really explore on the last night?
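The rule above can be sketched in a few lines, assuming arms that pay out 1 with a fixed probability. The function names and the pay-off probabilities are hypothetical; shifting to a random other arm after a loss is one reasonable reading of "lose shift", not the only one.

```python
import random


def win_stay_lose_shift(probs, rounds, seed=0):
    """Keep pulling the same arm after a win; move to another arm after a loss."""
    rng = random.Random(seed)
    arm = rng.randrange(len(probs))
    total = 0
    for _ in range(rounds):
        win = rng.random() < probs[arm]
        total += win
        if not win:
            # Lose: shift to a different arm, chosen at random.
            arm = rng.choice([a for a in range(len(probs)) if a != arm])
    return total


def pure_chance(probs, rounds, seed=0):
    """Baseline: pick an arm uniformly at random every round."""
    rng = random.Random(seed)
    total = 0
    for _ in range(rounds):
        arm = rng.randrange(len(probs))
        total += rng.random() < probs[arm]
    return total


probs = [0.3, 0.7]  # assumed hidden pay-off probabilities
wsls_wins = win_stay_lose_shift(probs, 10_000)
chance_wins = pure_chance(probs, 10_000)
```

With these two arms the strategy spends more of its time on the better arm than chance does, but it still abandons a good arm after a single unlucky loss, which is the weakness the last bullet points at.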
The interval
- Key to determining real-world strategy
- Gathering vs using information
- Explore whilst time remains to exploit
2. Gittins Index
- Discounting handles indefinite interval
- Play arm with highest "index"
- Discount % (how much the next pay-off is worth relative to this one) reflects commitment to stay
- (or expectation of making further decisions)
- Assumes no switching cost
- Needs lookup table, discounting ratio
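As a rough illustration of where such a lookup table could come from, the sketch below approximates the index for a win/lose arm. It rests on several assumptions not stated on the slide: a uniform prior over the arm's pay-off probability, a 90% discount, a truncated look-ahead (`horizon`), and the standard "retirement" calibration, in which the index is the guaranteed pay-off rate that makes you indifferent between the arm and the sure thing. The function name `gittins_index` is hypothetical.

```python
def gittins_index(wins, losses, discount=0.9, horizon=100, tol=1e-4):
    """Approximate index of an arm with the given record (uniform prior assumed)."""
    a0, b0 = wins + 1, losses + 1  # Beta(a0, b0) belief about the pay-off rate
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        lam = (lo + hi) / 2
        retire = lam / (1 - discount)  # value of taking lam every round forever
        n = a0 + b0 + horizon
        # At the look-ahead limit, assume we either retire or play on at the mean.
        values = [max(retire, ((a0 + i) / n) / (1 - discount))
                  for i in range(horizon + 1)]
        cont = 0.0
        for depth in range(horizon - 1, -1, -1):
            nxt = []
            for i in range(depth + 1):  # i = wins seen so far in the look-ahead
                p = (a0 + i) / (a0 + b0 + depth)
                cont = (p * (1 + discount * values[i + 1])
                        + (1 - p) * discount * values[i])
                nxt.append(max(retire, cont))
            values = nxt
        # After the loop, cont is the value of pulling the arm at the root state.
        if cont > retire:
            lo = lam  # the arm is still worth more than the sure thing
        else:
            hi = lam
    return (lo + hi) / 2


untried = gittins_index(0, 0)
one_win = gittins_index(1, 0)
one_loss = gittins_index(0, 1)
```

Under these assumptions an untried arm gets an index above its 50% expected pay-off: the extra is the value of what a pull might teach you.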
3. Regret minimisation
- "I knew that when I was 80 I was not going to regret having tried this" (Jeff Bezos)
- Regret is difference to optimal strategy
- Best-case scenario: total regret grows only logarithmically over time
- i.e. fewer new regrets with each passing year
- Strategy: pick option with greatest potential for greatness, given what you know
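One common way to make "greatest potential, given what you know" concrete is an upper confidence bound rule. The sketch below uses UCB1, which is an assumption on my part (the slide does not name a specific algorithm), with hypothetical pay-off probabilities. Regret is measured as on the slide: the gap between always playing the best arm and what the strategy actually played.

```python
import math
import random


def ucb1(probs, rounds, seed=0):
    """Play the arm whose optimistic estimate (mean + uncertainty bonus) is highest."""
    rng = random.Random(seed)
    n_arms = len(probs)
    counts = [0] * n_arms
    wins = [0] * n_arms
    for t in range(1, rounds + 1):
        if t <= n_arms:
            arm = t - 1  # try every arm once before comparing bounds
        else:
            arm = max(
                range(n_arms),
                key=lambda a: wins[a] / counts[a]
                + math.sqrt(2 * math.log(t) / counts[a]),
            )
        counts[arm] += 1
        wins[arm] += 1 if rng.random() < probs[arm] else 0
    # Expected regret: best arm's rate every round, minus the rates actually played.
    regret = max(probs) * rounds - sum(c * p for c, p in zip(counts, probs))
    return counts, regret


probs = [0.3, 0.7]  # assumed hidden pay-off probabilities
counts, regret = ucb1(probs, 5000)
```

The uncertainty bonus shrinks as an arm is pulled more often, so rarely tried arms keep getting the benefit of the doubt while proven arms are judged on their record.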
Human behaviour
- People over-explore (i.e. under-commit)
- Example 1:
- 1000 rounds, each spent either observing or betting (60%/40% odds)
- Optimal: cash in after 38 observations
- Actual: 505 observations!
- Example 2:
- Punctuality of new airline
- Optimal: use it exclusively until its punctuality is worse than the established airline's
- Then never switch back
- You won't be getting any further information
Problem?
"A restless world"
- Rewards restlessness
- "Never stop exploring"
- Decision making: it's never your last...
- ... contrary to human psychology
Further reading
- "Algorithms to Live By"
- Brian Christian and Tom Griffiths
- How to search
- How to know when to stop
- How to schedule
- ...
Algorithms to Live By
By Mark Woodbridge