Model Free
Active RL
An easy way to rate how good an action is in a specific state
Q-value: Q(s,a), initialized to 0 for every (s,a)
Temporal Difference Learning
Learn Q(s,a) values as you go
Take an action based on the max Q-value: a = argmax_a Q(s,a)
Receive a sample (s,a,s’,r)
Consider your old estimate: Q(s,a)
Consider your new sample estimate: sample = r + γ max_a’ Q(s’,a’)
Incorporate the new estimate into a running average: Q(s,a) ← (1 − α) Q(s,a) + α · sample
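The update steps above can be sketched in Python. This is a minimal illustration, not a reference implementation: the function names (q_update, greedy_action), the state and action labels, and the values of the learning rate alpha and discount gamma are all assumptions made for the example. It blends the old estimate Q(s,a) with the sample r + gamma * max over a’ of Q(s’,a’).

```python
from collections import defaultdict


def q_update(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    """Apply one temporal-difference update to Q[(s, a)] from a sample (s, a, s', r)."""
    old_estimate = Q[(s, a)]
    # New sample estimate: reward plus the discounted best Q-value in s'.
    sample = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    # Running average of the old estimate and the new sample.
    Q[(s, a)] = (1 - alpha) * old_estimate + alpha * sample
    return Q[(s, a)]


def greedy_action(Q, s, actions):
    """Pick the action with the largest Q-value in state s."""
    return max(actions, key=lambda a: Q[(s, a)])


# Hypothetical two-state example; every Q(s,a) starts at 0.
Q = defaultdict(float)
actions = ["left", "right"]
first = q_update(Q, "A", "right", 1.0, "B", actions)
second = q_update(Q, "A", "right", 1.0, "B", actions)
best = greedy_action(Q, "A", actions)
```

With alpha = 0.5, each update moves Q(s,a) halfway toward the sample, so repeated identical samples pull the estimate steadily closer to it. In practice the agent would also explore (for example, by occasionally taking a random action) instead of always acting greedily.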
Solution for large state spaces: describe a state using a vector of features (properties), so that Q(s,a) = w_1 f_1(s,a) + ... + w_n f_n(s,a)
Update the weights: w_i ← w_i + α · difference · f_i(s,a), where difference = [r + γ max_a’ Q(s’,a’)] − Q(s,a)
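A minimal sketch of the feature-based version, assuming Q(s,a) is a weighted sum of features and each weight is nudged by alpha * difference * f_i(s,a), where difference is the gap between the sample r + gamma * max over a’ of Q(s’,a’) and the current Q(s,a). The helper names (q_value, update_weights), the feature vectors, and the alpha and gamma values are hypothetical and chosen only for illustration.

```python
def q_value(weights, features):
    """Linear Q-value: the weighted sum of the feature values for one (s, a) pair."""
    return sum(w * f for w, f in zip(weights, features))


def update_weights(weights, features, r, next_features_per_action,
                   alpha=0.1, gamma=0.9):
    """Return new weights after one sample.

    features: feature vector f(s, a) for the action that was taken.
    next_features_per_action: one feature vector f(s', a') per action a'
    available in s' (empty if s' is terminal).
    """
    best_next = max(
        (q_value(weights, f) for f in next_features_per_action),
        default=0.0,
    )
    # Gap between the new sample estimate and the current estimate.
    difference = (r + gamma * best_next) - q_value(weights, features)
    # Each weight moves in proportion to its own feature's value.
    return [w + alpha * difference * f for w, f in zip(weights, features)]


# Hypothetical example with two features and all weights starting at 0.
weights = [0.0, 0.0]
features = [1.0, 0.5]
new_weights = update_weights(
    weights, features, r=1.0,
    next_features_per_action=[[0.0, 1.0], [1.0, 0.0]],
)
```

Because the weights are shared across states, one sample changes the Q-value of every state-action pair that has the same active features, which is what lets the agent generalize to states it has not visited.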