Q(s,a) = w1*f1(s,a)+w2*f2(s,a)+...+wn*fn(s,a)
difference = [r + gamma * max_a' Q(s',a')] - Q(s,a)
wi <- wi + alpha * difference * fi(s,a)
alpha = learning rate, e.g. 0.1 or 0.001
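
The update rule above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the names q_value and update_weights, the argument max_next_q (the largest Q(s',a') over the next actions, assumed to be computed by the caller), and the default gamma = 0.9 are assumptions that do not come from the notes.

```python
from typing import List, Sequence


def q_value(weights: Sequence[float], features: Sequence[float]) -> float:
    """Q(s,a) = w1*f1(s,a) + w2*f2(s,a) + ... + wn*fn(s,a)."""
    return sum(w * f for w, f in zip(weights, features))


def update_weights(
    weights: Sequence[float],
    features: Sequence[float],
    reward: float,
    max_next_q: float,   # assumed: max over a' of Q(s',a'), supplied by the caller
    alpha: float = 0.1,  # learning rate
    gamma: float = 0.9,  # discount factor (assumed value, not given in the notes)
) -> List[float]:
    """Apply wi <- wi + alpha * difference * fi(s,a) to every weight."""
    difference = (reward + gamma * max_next_q) - q_value(weights, features)
    return [w + alpha * difference * f for w, f in zip(weights, features)]
```

Only the weights whose feature is non-zero change, because each update is scaled by fi(s,a).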
Weights: (10, 15, 0, 0)
State: (0, 0, 1, 1)
QValue = 10*0 + 15*0 + 0*1 + 0*1 = 0
Weights: (10, 15, 0, 0)
State: (1, 0, 0, 0)
QValue = 10*1 + 15*0 + 0*0 + 0*0 = 10
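
The two worked examples can be checked with a short Python sketch. The helper name q_value is hypothetical, and the "State" tuples are treated here as the feature values f1..f4.

```python
def q_value(weights, features):
    """Weighted sum of features: Q = w1*f1 + w2*f2 + ... + wn*fn."""
    return sum(w * f for w, f in zip(weights, features))


weights = (10, 15, 0, 0)

# First example: only features 3 and 4 are active, and their weights are 0.
q_first = q_value(weights, (0, 0, 1, 1))

# Second example: only feature 1 is active, and its weight is 10.
q_second = q_value(weights, (1, 0, 0, 0))

print(q_first)   # 0
print(q_second)  # 10
```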