The Computer Science of Human Decisions

Explore/Exploit

Latest or greatest?
Novelty or tradition?
Stick or twist?

e.g. Going out for dinner

Formalism

The multi-armed bandit problem
"Life is a casino"
Which arm should you pull?

Simplistic approach

Maximise expected value
But how do we know?

1. Win stay, lose shift

Herbert Robbins (1952)
Not optimal, but simple
And provably better than chance
But: always shift after a single loss? And really explore on your last night?
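
A minimal sketch of win-stay, lose-shift on a simulated casino, to make the rule concrete. The function name, the Bernoulli (win/lose) pay-outs and the "shift to the next arm" rule for more than two arms are illustrative assumptions, not part of Robbins' formulation.

```python
import random


def win_stay_lose_shift(payout_probs, pulls, seed=0):
    """Play a win/lose bandit with win-stay, lose-shift; return total wins.

    payout_probs: assumed true win probability of each arm (unknown to the player).
    """
    rng = random.Random(seed)
    arm = rng.randrange(len(payout_probs))  # start on a random arm
    wins = 0
    for _ in range(pulls):
        if rng.random() < payout_probs[arm]:
            wins += 1  # win: stay on the same arm
        else:
            # lose: shift (assumption: simply move to the next arm in order)
            arm = (arm + 1) % len(payout_probs)
    return wins


if __name__ == "__main__":
    print(win_stay_lose_shift([0.9, 0.1], pulls=1000))
```

Note that the rule abandons even a very good arm after one unlucky pull, which is the weakness the slide points at.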

The interval

Key to determining real-world strategy
Gathering vs using information
Explore whilst time remains to exploit

2. Gittins Index

Discounting handles indefinite interval
Play arm with highest "index"
% pay-off reflects commitment to stay
(or expectation of making further decisions)
Assumes no switching cost
Needs lookup table, discounting ratio
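
A rough sketch of how one entry of such a lookup table could be approximated. It assumes win/lose arms, a uniform prior on each arm's win rate, and the standard "calibration" view of the index: find the guaranteed pay-off rate at which you would be indifferent between taking it forever and pulling the uncertain arm. The function name, the `horizon` truncation and the tolerance are illustrative choices, not taken from the slides.

```python
def gittins_index(wins, losses, discount=0.9, horizon=200, tol=1e-4):
    """Approximate the Gittins index of a win/lose arm by binary search."""
    a0, b0 = wins + 1, losses + 1  # Beta(1, 1) prior: an assumption

    def value_of_pulling(safe_rate):
        # Value of pulling the arm once, then choosing optimally between
        # continuing and retiring on the guaranteed rate forever.
        safe = safe_rate / (1 - discount)
        # Far in the future (truncation), stop learning: take the better option.
        values = []
        for i in range(horizon + 1):
            a, b = a0 + i, b0 + horizon - i
            values.append(max(safe, (a / (a + b)) / (1 - discount)))
        # Backward induction; index i = number of extra wins observed so far.
        for depth in range(horizon - 1, -1, -1):
            layer = []
            for i in range(depth + 1):
                a, b = a0 + i, b0 + depth - i
                p = a / (a + b)  # posterior chance of a win
                pull = p + discount * (p * values[i + 1] + (1 - p) * values[i])
                # At depth 0 the first pull is forced; later we may retire.
                layer.append(pull if depth == 0 else max(safe, pull))
            values = layer
        return values[0]

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if value_of_pulling(mid) > mid / (1 - discount):
            lo = mid  # the arm is still worth more than this guaranteed rate
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    for record in [(0, 0), (1, 0), (0, 1)]:
        print(record, round(gittins_index(*record), 3))
```

Under these assumptions an untried arm scores well above its 50% expected win rate, which is the bonus the index gives to exploration.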

3. Regret minimisation

"I knew that when I was 80 I was not going to regret having tried this"

Regret is the shortfall relative to the optimal strategy
Best-case scenario is logarithmic growth in total regret
- i.e. fewer new regrets with each passing year
Strategy: pick the option with the greatest potential for greatness, given what you know (an "upper confidence bound")
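
One common way to turn "greatest potential given what you know" into code is the UCB1 rule: score each arm by its average pay-off so far plus a bonus that shrinks as the arm is tried more often. The sketch below assumes win/lose arms and a simulated casino. The function name and the regret bookkeeping (which uses the true probabilities, something a real player would not have) are illustrative.

```python
import math
import random


def ucb1(payout_probs, pulls, seed=0):
    """Play a win/lose bandit with UCB1; return (pull counts, expected regret)."""
    rng = random.Random(seed)
    n_arms = len(payout_probs)
    counts = [0] * n_arms    # how often each arm has been pulled
    totals = [0.0] * n_arms  # total reward collected from each arm

    def upper_bound(i, t):
        # Optimistic estimate: observed mean plus an uncertainty bonus.
        return totals[i] / counts[i] + math.sqrt(2 * math.log(t) / counts[i])

    for t in range(1, pulls + 1):
        if t <= n_arms:
            arm = t - 1  # try every arm once before comparing bounds
        else:
            arm = max(range(n_arms), key=lambda i: upper_bound(i, t))
        reward = 1.0 if rng.random() < payout_probs[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward

    # Regret: expected pay-off of always playing the best arm, minus ours.
    best = max(payout_probs) * pulls
    ours = sum(c * p for c, p in zip(counts, payout_probs))
    return counts, best - ours


if __name__ == "__main__":
    counts, regret = ucb1([0.9, 0.1], pulls=5000)
    print(counts, round(regret, 1))
```

Because the bonus shrinks only slowly, poor arms are still revisited occasionally, but less and less often as evidence accumulates.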

Human behaviour

People over-explore (i.e. under-commit)
Example 1
- 1000 rounds: either observe or bet, on two lights that come on 60%/40% of the time
- Optimal: cash in after 38 observations
- Actual: 505 observations!

Example 2

Punctuality of new airline
Optimal: use it exclusively until its punctuality looks worse than the established airline's
Then never switch back
- You won't be getting any further information

Problem?

"A restless world"

Rewards restlessness
"Never stop exploring"
Decision making: it's never your last...
... contrary to human psychology

Further reading

"Algorithms to Live By"
Brian Christian and Tom Griffiths
- How to search
- How to know when to stop
- How to schedule
- ...

Algorithms to Live By

By Mark Woodbridge
