Xu et al. 2022
Extensive Form Games
- Reasoning about action histories
[Figure: rock-paper-scissors game tree. Player 1 chooses R, P, or S; Player 2 then chooses R, P, or S. Leaf payoffs are (0,0), (-1,1), or (1,-1).]
Imperfect Information Extensive Form Games
- Reasoning about information sets
[Figure: rock-paper-scissors game tree under imperfect information. Player 1 chooses R, P, or S; Player 2 then chooses R, P, or S. Leaf payoffs are (0,0), (-1,1), or (1,-1).]
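The difference between reasoning about action histories and reasoning about information sets can be sketched in a few lines of Python. This is only an illustration: the names `payoff` and `info_set_key` are hypothetical helpers, not taken from the slides. It assumes the imperfect-information version is the usual simultaneous-move reading of rock-paper-scissors, where Player 2 cannot see Player 1's choice.

```python
from itertools import product

ACTIONS = ("R", "P", "S")
BEATS = {"R": "S", "P": "R", "S": "P"}  # each key beats its value


def payoff(a1, a2):
    """Leaf payoff (Player 1, Player 2) for the action history (a1, a2)."""
    if a1 == a2:
        return (0, 0)
    return (1, -1) if BEATS[a1] == a2 else (-1, 1)


def info_set_key(history, perfect_information):
    """Label of the decision point that the acting player can distinguish.

    Assumption: with imperfect information, Player 2 does not observe
    Player 1's action, so every length-1 history maps to the same label.
    """
    player = len(history) + 1
    if perfect_information or player == 1:
        return f"P{player}:{''.join(history)}"
    return f"P{player}:?"


# Player 2 acts after each of Player 1's three possible actions.
p2_histories = [(a,) for a in ACTIONS]
perfect = {info_set_key(h, True) for h in p2_histories}
imperfect = {info_set_key(h, False) for h in p2_histories}

# All nine leaves of the tree, keyed by full action history.
leaves = {(a1, a2): payoff(a1, a2) for a1, a2 in product(ACTIONS, ACTIONS)}
```

With perfect information Player 2 has three distinct decision points, one per history. With imperfect information they collapse into a single information set, so Player 2 must use one strategy for all three.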
Subgame utility evaluation
(non-Monte Carlo)
The tightest known convergence bound for CFR+ is 2x worse than that of vanilla CFR
(but in practice CFR+ converges significantly faster)
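One ingredient behind the practical speed-up is that CFR+ replaces regret matching with regret matching+, which floors cumulative regrets at zero after every update. The sketch below isolates only that difference at a single decision point. The function names and the regret numbers are made up for illustration, and the other parts of CFR+ (alternating updates, weighted averaging) are left out.

```python
def regret_matching(cum_regret):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    if total == 0:
        # No action has positive regret: fall back to uniform play.
        return [1.0 / len(cum_regret)] * len(cum_regret)
    return [p / total for p in positive]


def accumulate(cum_regret, inst_regret, plus):
    """Add instantaneous regrets; with plus=True, floor the result at zero."""
    updated = [c + r for c, r in zip(cum_regret, inst_regret)]
    return [max(u, 0.0) for u in updated] if plus else updated


# Illustrative numbers only: the second action starts with a large
# negative regret and then becomes attractive.
start = [1.0, -3.0, 0.0]
steps = [[-2.0, 1.0, 1.0], [0.0, 3.0, 0.0]]

vanilla, plus = list(start), list(start)
for inst in steps:
    vanilla = accumulate(vanilla, inst, plus=False)
    plus = accumulate(plus, inst, plus=True)

vanilla_strategy = regret_matching(vanilla)
plus_strategy = regret_matching(plus)
```

Because the floored version does not carry the old negative regret forward, it shifts probability toward the newly good action faster than the unfloored version does in this example.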
(A solution to graduate student descent)
Random architecture population initialization
Sample subset of population
Mutate best, evaluate, store
Cull the old
Back to simulink...
Clipped, normalized exploitability
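The slide does not spell out how the exploitability is clipped or normalized, so the sketch below computes only plain exploitability for a two-player zero-sum matrix game, using rock-paper-scissors as the example. The definition used here (sum of both players' best-response gains) is one common convention and is an assumption; some authors halve it.

```python
# Player 1's payoffs in rock-paper-scissors; rows are Player 1's action
# (R, P, S), columns are Player 2's action. Player 2 receives the negation.
A = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def exploitability(x, y, payoff=A):
    """Sum of what each player could gain by best-responding.

    x and y are mixed strategies (probability lists) for Players 1 and 2.
    Assumed convention: no halving, no clipping, no normalization.
    """
    n, m = len(payoff), len(payoff[0])
    # Best value Player 1 can get against y.
    best_p1 = max(sum(payoff[i][j] * y[j] for j in range(m)) for i in range(n))
    # Lowest value Player 2 can hold Player 1 to against x.
    worst_p1 = min(sum(x[i] * payoff[i][j] for i in range(n)) for j in range(m))
    return best_p1 - worst_p1


uniform = [1 / 3, 1 / 3, 1 / 3]
always_rock = [1.0, 0.0, 0.0]

eq_gap = exploitability(uniform, uniform)        # equilibrium profile
rock_gap = exploitability(always_rock, uniform)  # Player 1 is exploitable
```

The uniform profile is the equilibrium of this game, so its exploitability is zero. Always playing rock loses one unit per round to a best response of paper.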
Initialize random (or bootstrapped) population
Sample from subset of population
Mutate & ensure candidate is reasonable contender
Evaluate mutant & store
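The four steps above, together with the "cull the old" step from the earlier slide, can be sketched as a small evolutionary loop. Everything concrete here is a stand-in: candidates are plain floats, `toy_score` is a hypothetical replacement for the real (expensive) evaluation, and the range check is a hypothetical version of the "reasonable contender" screen.

```python
import random
from collections import deque


def toy_score(x):
    # Hypothetical stand-in for an expensive evaluation; higher is better.
    return -(x - 3.0) ** 2


def evolve(pop_size=20, sample_size=5, cycles=200, seed=0):
    rng = random.Random(seed)

    # 1. Initialize a random population of (candidate, score) pairs.
    population = deque()
    for _ in range(pop_size):
        x = rng.uniform(-10.0, 10.0)
        population.append((x, toy_score(x)))
    history = list(population)

    for _ in range(cycles):
        # 2. Sample a subset of the population and take its best member.
        sample = rng.sample(list(population), sample_size)
        parent = max(sample, key=lambda member: member[1])

        # 3. Mutate, then apply a cheap screen before the full evaluation.
        child = parent[0] + rng.gauss(0.0, 0.5)
        if abs(child) > 10.0:  # assumed screen: discard out-of-range mutants
            continue

        # 4. Evaluate the mutant and store it.
        member = (child, toy_score(child))
        population.append(member)
        history.append(member)

        # Cull the oldest member so the population size stays fixed.
        population.popleft()

    return population, history


population, history = evolve()
best_candidate, best_score = max(history, key=lambda member: member[1])
```

Removing the oldest member rather than the worst one keeps the population turning over, so a candidate that scored well by luck does not persist indefinitely.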
Intermediate Policy Improvement
Intermediate Policy Evaluation
Final Policy Evaluation
Double-clipped Discounted CFR
Training
Testing