Note: This is gonna be Meetup style
S ← ∅
b ← b0
PUSH(S, b)
while ∃n, where (b, n) ∈ H do
    A ← {n | (b, n) ∈ H}
    C ← {a | a ∈ A, TEST(preconds(a), W)}
    ----> b ← CHOOSE(C, P, S, W) <----
    PUSH(S, b)
This highlighted CHOOSE step is what it brings to the table when we talk about self-acting agents.
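A minimal Python sketch of the selection loop above, under some assumptions: H is a dict mapping each task to its candidate sub-tasks, and preconds_hold and choose are hypothetical helpers standing in for TEST and CHOOSE.

def select_tasks(b0, H, Q, W, preconds_hold, choose):
    S = [b0]                    # stack of selected tasks, starting at the root
    b = b0
    while H.get(b):             # while b still has sub-tasks in the hierarchy H
        A = H[b]                                     # all sub-tasks of b
        C = [a for a in A if preconds_hold(a, W)]    # keep only the applicable ones
        b = choose(Q, C, W)                          # pick the next task to expand
        S.append(b)
    return S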
CHOOSE(Q, C, W) :-
    a ← ZEROVAL()
    for all c ∈ C do
        t ← GET(Q, W, c)
        if t > a then
            a ← t
    return DECIDE(a, C, Q)
The maximum Q-value is picked, and then a DECISION between exploitation and exploration has to be made.
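A sketch of CHOOSE with an epsilon-greedy exploitation/exploration decision. This is one common way to implement the DECIDE step, not necessarily the author's; Q is assumed to be a dict keyed by (state, action), and epsilon is a hypothetical parameter.

import random

def choose_action(Q, C, W, epsilon=0.1):
    if not C:
        return None
    if random.random() < epsilon:                       # explore: random applicable action
        return random.choice(C)
    return max(C, key=lambda c: Q.get((W, c), 0.0))     # exploit: action with the highest Q-value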
LEARN(Q, a, W, W*, P) :-
    v ← GET(Q, W*, a)
    d ← ZEROVAL()
    for all c* ∈ C do
        t ← GET(Q, W, c*) − GET(Q, W*, a)
        if t > d then
            d ← t
    r ← REWARD(a, W, W*, P)
    q ← v + α*(r + γ*d)
    PUT(Q, W*, a, q)
The learning process is basically updating our assumptions about the problem (represented by a value in Q), based on new information from the last action performed.
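A Python sketch mirroring the update in the LEARN pseudocode above. Q is assumed to be a dict keyed by (state, action); W_prev is the state before the action and W the state after it; alpha, gamma, the candidate set C, and the reward callable are assumptions standing in for α, γ, C and REWARD.

def learn(Q, a, W, W_prev, reward, C, alpha=0.1, gamma=0.9):
    v = Q.get((W_prev, a), 0.0)                        # current estimate for (W_prev, a)
    best_next = max((Q.get((W, c), 0.0) for c in C), default=0.0)
    d = best_next - v                                  # temporal-difference term
    r = reward(a, W_prev, W)                           # reward observed for this transition
    Q[(W_prev, a)] = v + alpha * (r + gamma * d)       # write the updated estimate back into Q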
E ← ∅
while E = ∅ do
    W* ← W
    K ← UPDATE(W)
    W ← REVISE(W, K)
    E ← {a | a ∈ S, TEST(termconds(a), W)}
for all e ∈ E do
    LEARN(Q, e, W, W*, P)
Once the selected step and its sub-steps have finished running, this is where the state gets updated, which lets us apply the learning process.
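A sketch of that state-update / learning loop, under assumed helpers: sense stands in for UPDATE (gathering new knowledge K), revise for REVISE (folding K into the world state), term_conds_hold for the termination-condition TEST, and learn and reward are the hypothetical functions sketched earlier. S is the stack of started actions, Q the value table, C the candidate set.

def update_and_learn(S, Q, W, C, sense, revise, term_conds_hold, learn, reward):
    finished = []
    while not finished:                    # keep sensing until some started action terminates
        W_prev = W
        K = sense(W)                       # new information about the world
        W = revise(W, K)                   # fold it into the current state
        finished = [a for a in S if term_conds_hold(a, W)]
    for e in finished:
        learn(Q, e, W, W_prev, reward, C)  # update Q for every action that finished
    return W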
*Stagadush*