"Empowerment" as an Intrinsic Motivation to Explore Sparse Environments
How to calculate empowerment
Variational Bound on Mutual Information (Mohamed and Rezende, 2015; the bound itself goes back to Barber and Agakov, 2003)
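For reference, the bound can be written out as below. The notation is an assumption chosen to match the symbols used in these notes: s is the current state, a the action (or action sequence), s' the resulting state, w(a|s) the policy (source distribution over actions) and q(a|s',s) the variational approximation to the true posterior p(a|s',s). Empowerment is the maximum of the mutual information over w.

```latex
\begin{align}
\mathcal{I}^{w}(a; s' \mid s)
  &= H(a \mid s) - H(a \mid s', s) \\
  &\ge H(a \mid s)
     + \mathbb{E}_{p(s' \mid a, s)\, w(a \mid s)}
       \left[ \ln q(a \mid s', s) \right]
\end{align}
```

The gap between the two sides is the expected KL divergence from q(a|s',s) to the true posterior p(a|s',s), so the bound is tight when q matches that posterior.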
Both q (the variational distribution over actions) and w (the policy) can be represented by neural networks, and the bound can then be maximised directly. Mohamed and Rezende do this by alternating between maximising w.r.t. q (a maximum-likelihood problem) and w.r.t. w (they derive an expression for the functional derivative of the bound), using SGD in both cases.
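The alternating scheme can be illustrated on a toy problem. The sketch below is not Mohamed and Rezende's implementation: it assumes a single fixed state, a small discrete action set, a known transition matrix P[a, s'] = p(s'|a), softmax tables instead of neural networks, and exact gradients instead of SGD. The names `log_softmax`, `bound` and `maximise_bound` are hypothetical helpers introduced only for this example.

```python
import numpy as np

def log_softmax(x, axis=0):
    # Numerically stable log of a softmax along `axis`.
    shifted = x - x.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def bound(theta, phi, P):
    # Variational bound H(w) + E_{w(a) p(s'|a)}[log q(a|s')] for one state.
    log_w = log_softmax(theta)            # log w(a), shape (K,)
    w = np.exp(log_w)
    log_q = log_softmax(phi, axis=0)      # log q(a|s'), normalised over a
    return float(-(w * log_w).sum() + (w[:, None] * P * log_q).sum())

def maximise_bound(P, steps=4000, lr=0.5, seed=0):
    # Alternate gradient-ascent steps on q (logits phi) and w (logits theta).
    K, M = P.shape                        # K actions, M next states
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=K)            # assumption: random initial policy
    phi = np.zeros((K, M))                # assumption: uniform initial q
    for _ in range(steps):
        # q step: maximum likelihood of actions given next states.
        w = np.exp(log_softmax(theta))
        joint = w[:, None] * P            # w(a) p(s'|a)
        q = np.exp(log_softmax(phi, axis=0))
        phi += lr * (joint - q * joint.sum(axis=0, keepdims=True))

        # w step: gradient of the bound through the softmax policy.
        log_w = log_softmax(theta)
        w = np.exp(log_w)
        g = (P * log_softmax(phi, axis=0)).sum(axis=1) - log_w
        theta += lr * w * (g - w @ g)
    w = np.exp(log_softmax(theta))
    q = np.exp(log_softmax(phi, axis=0))
    return w, q, bound(theta, phi, P)

# Noiseless channel: each of 3 actions leads to its own distinct next state.
w, q, value = maximise_bound(np.eye(3))
print(w, value)
```

For the noiseless three-action channel the bound should approach log 3 (about 1.10 nats) from below, with w close to uniform. If every action leads to the same next-state distribution, the bound stays at or below zero, reflecting that the agent has no control over the outcome.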