S ← ∅
b ← b0
PUSH(S, b)
while ∃n such that (b, n) ∈ H do
    A ← {n | (b, n) ∈ H}
    C ← {a | a ∈ A, TEST(preconds(a), W)}
    ----> b ← CHOOSE(C, P, S, W) <----
    PUSH(S, b)
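To make this concrete, here is a minimal Python sketch of the same loop, under assumed data shapes: H is modeled as a set of (parent, child) edges, each behavior exposes a preconds(world) predicate, and choose is a stand-in for CHOOSE (the P and S arguments are dropped for brevity).

    def run_selection(b0, H, world, choose):
        S = []                                             # trace of selected behaviors (the PUSH target)
        b = b0
        S.append(b)
        while any(parent == b for parent, _ in H):         # b still has children in the hierarchy
            A = {child for parent, child in H if parent == b}
            C = {a for a in A if a.preconds(world)}        # candidates whose preconditions hold
            b = choose(C, world)                           # the decision we zoom in on next
            S.append(b)
        return S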
How do we CHOOSE(...)?
Decisions are only valid if our "strategy" holds
Reinforcement Learning
What it brings to the table when we talk about self-acting agents
In a nutshell
CHOOSE(Q, C, W) :-
    a ← ZEROVAL()
    for all c ∈ C do
        t ← GET(Q, W, c)
        if t > a then
            a ← t
    return DECIDE(a, C, Q)
We have a table Q that maps a composite key (W, c) to a value.
C represents all the possible actions.
W represents our state.
The maximum value is picked, and then a DECISION of Exploitation vs Exploration has to be made.
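One possible (assumed) way to flesh out CHOOSE and DECIDE in Python is an epsilon-greedy rule: the Q table is a plain dict keyed by (state, action), and state_key is a hypothetical helper that turns the world state W into a hashable key.

    import random

    def state_key(world):
        # Assumed helper: reduce the world state W (here a dict) to a hashable key.
        return tuple(sorted(world.items()))

    def choose(Q, C, world, epsilon=0.1):
        # Exploration: with probability epsilon, ignore the table and try something new.
        if random.random() < epsilon:
            return random.choice(list(C))
        # Exploitation: pick the candidate with the highest learned value for this state.
        state = state_key(world)
        return max(C, key=lambda c: Q.get((state, c), 0.0))

Epsilon-greedy is only one way to implement DECIDE; softmax selection or a decaying epsilon are common alternatives. To plug this into the earlier run_selection sketch you would bind Q first, e.g. functools.partial(choose, Q).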
Q Table, What is your PROFESSION?!
LEARN(Q, a, W, W*, P) :-
    v ← GET(Q, W*, a)
    d ← ZEROVAL()
    for all c* ∈ C do
        t ← GET(Q, W, c*)
        if t > d then
            d ← t
    r ← REWARD(a, W, W*, P)
    q ← v + α*(r + γ*d − v)
    PUT(Q, W*, a, q)
The learning process basically updates our assumptions about the problem (represented by the values in Q) based on new information from the last action performed.
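Under the same assumed dict-based Q table (and reusing state_key from the previous sketch), LEARN is the standard Q-learning update; the constants below stand in for the learning rate α and discount factor γ, and the reward callable stands in for REWARD(a, W, W*, P).

    ALPHA = 0.1    # learning rate (alpha)
    GAMMA = 0.9    # discount factor (gamma)

    def learn(Q, a, world, prev_world, C, reward):
        # world is the state after executing action a, prev_world is W* (the state before),
        # C is the set of actions available in the new state.
        s_prev, s_new = state_key(prev_world), state_key(world)
        v = Q.get((s_prev, a), 0.0)                                # current estimate of Q(W*, a)
        d = max((Q.get((s_new, c), 0.0) for c in C), default=0.0)  # best value reachable from W
        r = reward(a, world, prev_world)
        # Temporal-difference update: move the old estimate toward r + gamma * d.
        Q[(s_prev, a)] = v + ALPHA * (r + GAMMA * d - v)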
E ← ∅
while E = ∅ do
    W* ← W
    K ← UPDATE(W)
    W ← REVISE(W, K)
    E ← {a | a ∈ S, TEST(termconds(a), W)}
for all e ∈ E do
    LEARN(Q, e, W, W*, P)
Once the selected step and its sub-steps have finished running, the state is updated, which allows us to apply the learning process.
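Putting it together, a Python sketch of this monitoring loop could look like the following; update, revise, reward and candidates are assumed stand-ins for UPDATE, REVISE, REWARD and "the actions available in the revised state", and each behavior in S is assumed to expose a termconds(world) predicate.

    def run_step(Q, S, world, update, revise, reward, candidates):
        E = set()
        prev_world = world
        while not E:
            prev_world = world                          # remember W* before revising
            K = update(world)                           # gather new information
            world = revise(world, K)                    # fold it into the current state
            E = {a for a in S if a.termconds(world)}    # steps whose termination conditions hold
        for e in E:
            learn(Q, e, world, prev_world, candidates(world), reward)
        return world, E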
So, over time, our table Q should converge to an optimal policy.