Note: This is gonna be Meetup style
S ← ∅
b ← b0
PUSH(S, b)
while ∃n, where (b, n) ∈ H do
    A ← {n | (b, n) ∈ H}
    C ← {a | a ∈ A, TEST(preconds(a), W)}
    ----> b ← CHOOSE(C, P, S, W) <----
    PUSH(S, b)
This highlighted CHOOSE step is what it brings to the table when we talk about self-acting agents.
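A minimal Python sketch of the selection loop above, under some assumptions: H is a dict mapping each task to its candidate sub-tasks, and preconds_hold and choose are hypothetical helpers standing in for TEST and CHOOSE.

def select_tasks(b0, H, Q, W, preconds_hold, choose):
    S = [b0]                    # stack of selected tasks, starting at the root
    b = b0
    while H.get(b):             # while b still has sub-tasks in the hierarchy H
        A = H[b]                                     # all sub-tasks of b
        C = [a for a in A if preconds_hold(a, W)]    # keep only the applicable ones
        b = choose(Q, C, W)                          # pick the next task to expand
        S.append(b)
    return S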
CHOOSE(Q, C, W) :-
    a ← ZEROVAL()
    for all c ∈ C do
        t ← GET(Q, W, c)
        if t > a then
            a ← t
    return DECIDE(a, C, Q)
The maximum Q-value is picked, and then a DECISION between exploitation and exploration has to be made.
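A sketch of CHOOSE with an epsilon-greedy exploitation/exploration decision. This is one common way to implement the DECIDE step, not necessarily the author's; Q is assumed to be a dict keyed by (state, action), and epsilon is a hypothetical parameter.

import random

def choose_action(Q, C, W, epsilon=0.1):
    if not C:
        return None
    if random.random() < epsilon:                       # explore: random applicable action
        return random.choice(C)
    return max(C, key=lambda c: Q.get((W, c), 0.0))     # exploit: action with the highest Q-value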
LEARN(Q, a, W, W*, P) :-
    v ← GET(Q, W*, a)
    d ← ZEROVAL()
    for all c* ∈ C do
        t ← GET(Q, W, c*) − GET(Q, W*, a)
        if t > d then
            d ← t
    r ← REWARD(a, W, W*, P)
    q ← v + α*(r + γ*d)
    PUT(Q, W*, a, q)
The learning process is basically updating our assumptions about the problem (represented by a value in Q), based on new information from the last action performed.
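A Python sketch mirroring the update in the LEARN pseudocode above. Q is assumed to be a dict keyed by (state, action); W_prev is the state before the action and W the state after it; alpha, gamma, the candidate set C, and the reward callable are assumptions standing in for α, γ, C and REWARD.

def learn(Q, a, W, W_prev, reward, C, alpha=0.1, gamma=0.9):
    v = Q.get((W_prev, a), 0.0)                        # current estimate for (W_prev, a)
    best_next = max((Q.get((W, c), 0.0) for c in C), default=0.0)
    d = best_next - v                                  # temporal-difference term
    r = reward(a, W_prev, W)                           # reward observed for this transition
    Q[(W_prev, a)] = v + alpha * (r + gamma * d)       # write the updated estimate back into Q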
E ← ∅
while E = ∅ do
    W* ← W
    K ← UPDATE(W)
    W ← REVISE(W, K)
    E ← {a | a ∈ S, TEST(termconds(a), W)}
for all e ∈ E do
    LEARN(Q, e, W, W*, P)
Once the selected step and its sub-steps have finished running, this is where the state gets updated, which lets us apply the learning process.
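A sketch of that state-update / learning loop, under assumed helpers: sense stands in for UPDATE (gathering new knowledge K), revise for REVISE (folding K into the world state), term_conds_hold for the termination-condition TEST, and learn and reward are the hypothetical functions sketched earlier. S is the stack of started actions, Q the value table, C the candidate set.

def update_and_learn(S, Q, W, C, sense, revise, term_conds_hold, learn, reward):
    finished = []
    while not finished:                    # keep sensing until some started action terminates
        W_prev = W
        K = sense(W)                       # new information about the world
        W = revise(W, K)                   # fold it into the current state
        finished = [a for a in S if term_conds_hold(a, W)]
    for e in finished:
        learn(Q, e, W, W_prev, reward, C)  # update Q for every action that finished
    return W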
*Stagadush*