Multi-Armed Bandit vs. A/B Testing

Chen-Yu Yang

 

WiFi: IIIEDU-TAF / hellohello

About Me

Multi-Arm Bandit

Let's play a game

http://iosband.github.io/2015/07/28/Beat-the-bandit.html

Regret

  • If we had known the best option from the start, how much reward did we give up by not always playing it?
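The idea above can be written as a tiny sketch (function and variable names are my own; rewards are assumed to be the arms' true means):

```python
# Cumulative regret: reward lost versus always playing the best arm.
def cumulative_regret(true_means, chosen_arms):
    best = max(true_means)
    return sum(best - true_means[a] for a in chosen_arms)

# e.g. always picking the worse of two arms for 10 rounds
# loses (0.6 - 0.3) per round, ~3.0 in total.
cumulative_regret([0.3, 0.6], [0] * 10)
```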

Solving MAB

  • Greedy
  • Epsilon Greedy
  • Upper Confidence Bound
  • Thompson Sampling

Greedy

  • Always pull the arm with the best observed mean
  • Reduces to A/B testing: explore briefly, then commit
  • Linear regret: it can lock onto a suboptimal arm forever
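A minimal greedy sketch, assuming Bernoulli (0/1) rewards; the function name and structure are my own, not from the class project:

```python
import random

def greedy_bandit(true_means, horizon, seed=0):
    """Pull each arm once, then always play the best-looking arm."""
    rng = random.Random(seed)
    n = [0] * len(true_means)        # pulls per arm
    total = [0.0] * len(true_means)  # summed reward per arm
    rewards = []
    for _ in range(horizon):
        untried = [i for i, c in enumerate(n) if c == 0]
        if untried:
            arm = untried[0]          # initial exploration
        else:                         # greedy: highest sample mean
            arm = max(range(len(n)), key=lambda i: total[i] / n[i])
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        n[arm] += 1
        total[arm] += reward
        rewards.append(reward)
    return rewards

rewards = greedy_bandit([0.3, 0.6], horizon=1000)
```

If the good arm happens to pay 0 on its first pull, greedy may never try it again, which is where the linear regret comes from.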

Epsilon Greedy

  • Act greedily, except explore a random arm with probability ε
  • Lower ε once we believe we've explored enough
  • Linear regret for a fixed ε
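A sketch of the fixed-ε variant, again assuming Bernoulli rewards (names are my own):

```python
import random

def epsilon_greedy(true_means, horizon, eps=0.1, seed=0):
    """With probability eps explore uniformly, otherwise exploit."""
    rng = random.Random(seed)
    n = [0] * len(true_means)
    mean = [0.0] * len(true_means)   # running sample mean per arm
    rewards = []
    for _ in range(horizon):
        if rng.random() < eps:
            arm = rng.randrange(len(true_means))          # explore
        else:
            arm = max(range(len(mean)), key=lambda i: mean[i])  # exploit
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        n[arm] += 1
        mean[arm] += (reward - mean[arm]) / n[arm]  # incremental mean
        rewards.append(reward)
    return rewards
```

With a fixed ε we keep wasting an ε fraction of pulls on bad arms forever, hence the linear regret; decaying ε over time is the usual fix.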

Upper Confidence Bound

  • For each option, estimate an optimistic upper bound on its mean
  • UCB1: UCB_i ≈ sample mean + sqrt(2 * log(t) / n_i)
  • Sublinear regret
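A sketch of UCB1 under the same Bernoulli assumption (names are my own; the exploration constant follows the standard UCB1 formula):

```python
import math
import random

def ucb1(true_means, horizon, seed=0):
    """Play the arm with the highest optimistic upper bound."""
    rng = random.Random(seed)
    n = [0] * len(true_means)
    mean = [0.0] * len(true_means)
    rewards = []
    for t in range(horizon):
        untried = [i for i, c in enumerate(n) if c == 0]
        if untried:
            arm = untried[0]   # play each arm once first
        else:
            # index = sample mean + exploration bonus
            arm = max(range(len(n)),
                      key=lambda i: mean[i] + math.sqrt(2 * math.log(t + 1) / n[i]))
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        n[arm] += 1
        mean[arm] += (reward - mean[arm]) / n[arm]
        rewards.append(reward)
    return rewards
```

The bonus shrinks as an arm is pulled more, so under-explored arms get revisited automatically; no ε to tune.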

Thompson Sampling

  • Bayesian approach
  • Sample from each arm's posterior; play the best sample
  • For Bernoulli rewards the posterior is Beta(#successes + 1, #failures + 1)
  • Sublinear regret
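The Beta posterior above makes the sketch very short (names are my own; a uniform Beta(1, 1) prior is assumed):

```python
import random

def thompson_sampling(true_means, horizon, seed=0):
    """Sample a plausible mean per arm from Beta(s+1, f+1); play the best."""
    rng = random.Random(seed)
    successes = [0] * len(true_means)
    failures = [0] * len(true_means)
    rewards = []
    for _ in range(horizon):
        samples = [rng.betavariate(successes[i] + 1, failures[i] + 1)
                   for i in range(len(true_means))]
        arm = max(range(len(samples)), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        rewards.append(reward)
    return rewards
```

Arms with wide posteriors occasionally produce large samples and get explored; as evidence accumulates, the posteriors concentrate and play shifts to the best arm.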

Code

From our class final project

https://github.com/hshim/Bandits/blob/master/playground.ipynb

Applications

  • Any problem you would otherwise solve with A/B testing
  • Choosing between options
  • Selling flight tickets
  • Determining target audiences
  • Clinical trials
  • etc.

Questions?
