By Denis and Steve
GOOD
BAD
Take over Sample D
Solve
Maximize reward while reducing variance in policy gradient
Makes changes to ρ smoother
Can still subtract baseline from R(h) for best results
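The effect of subtracting a baseline from R(h) can be illustrated with a small sketch. Everything in it is a hypothetical setup chosen for illustration, not something taken from the slides: a two-armed bandit where a history h is a single action, a sigmoid policy with one parameter ρ, arm rewards of 10 and 11 plus unit Gaussian noise, and a constant baseline of 10.5. The function name `sample_gradients` is also an assumption.

```python
import math
import random
import statistics


def sample_gradients(rho, baseline, n_samples, seed=0):
    """Per-sample policy gradient estimates for a toy two-armed bandit.

    Assumed setup (illustrative only): pi(a=1) = sigmoid(rho), and the
    return R(h) is 10 for arm 0 or 11 for arm 1, plus N(0, 1) noise.
    """
    rng = random.Random(seed)
    p = 1.0 / (1.0 + math.exp(-rho))  # probability of choosing arm 1
    grads = []
    for _ in range(n_samples):
        action = 1 if rng.random() < p else 0
        reward = (11.0 if action == 1 else 10.0) + rng.gauss(0.0, 1.0)
        # d/d rho of log pi(action) for a sigmoid policy
        grad_log_pi = (1.0 - p) if action == 1 else -p
        # Score-function estimate with the baseline subtracted from R(h)
        grads.append(grad_log_pi * (reward - baseline))
    return grads


# Same seed, so both runs see identical actions and rewards.
plain = sample_gradients(rho=0.0, baseline=0.0, n_samples=20000)
centered = sample_gradients(rho=0.0, baseline=10.5, n_samples=20000)

print("no baseline:   mean", statistics.mean(plain),
      "variance", statistics.variance(plain))
print("with baseline: mean", statistics.mean(centered),
      "variance", statistics.variance(centered))
```

In this toy setting both estimators target the same expected gradient, p(1 - p)(11 - 10) = 0.25 at ρ = 0, but the baselined version should show a much smaller sample variance, which is what makes the updates to ρ smoother.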