RL Food gathering 

\(S^e_t = \vec x_t\)

0 0 1

 

\(S^a_t\)

\(a_t\)

\( c = n \times \textrm{sign}(\| \vec x_{t-1} \| - \| \vec x_t \|) \)

Smells \(c\) after moving

-1 with some probability
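One possible reading of the smell signal, sketched in Python. It assumes that \(n\) is a noise factor equal to \(-1\) with some probability and \(+1\) otherwise, that positions are 2D, and that the food sits at the origin. The names `smell` and `flip_prob` are hypothetical and do not come from the slides.

```python
import math
import random


def sign(v):
    """Return -1, 0 or +1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def smell(x_prev, x_now, flip_prob, rng=random):
    """Noisy smell signal c = n * sign(|x_prev| - |x_now|).

    Assumption: n = -1 with probability flip_prob, else n = +1.
    """
    n = -1 if rng.random() < flip_prob else 1
    return n * sign(math.hypot(*x_prev) - math.hypot(*x_now))


# Moving from (2, 0) to (1, 0) brings the agent closer to the origin.
print(smell((2, 0), (1, 0), flip_prob=0.0))  # noise-free signal
print(smell((2, 0), (1, 0), flip_prob=1.0))  # always-flipped signal
```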

\( Q(s, a) \)  has \(4^3 \times 2^3 \times 4 = 2048 \) degrees of freedom

\( Q(R s, R a) = Q(s, a) \quad \forall R \in \text{rotations} \) 

\( \Longrightarrow 512 \) degrees of freedom
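The count can be checked by brute force. The sketch below assumes the state is made of the last three moves (4 directions each) and the last three binary smells, which matches the factors \(4^3 \times 2^3\), and that a rotation turns the moves and the action together while leaving the smells unchanged. The integer encoding of directions is an illustrative choice.

```python
from itertools import product

N_DIR = 4  # assumed encoding: 0 = right, 1 = up, 2 = left, 3 = down


def rotate(moves, action, k):
    """Rotate every move and the action by k quarter turns."""
    return tuple((m + k) % N_DIR for m in moves), (action + k) % N_DIR


def canonical(moves, smells, action):
    """Pick one representative per rotation class of (state, action)."""
    candidates = []
    for k in range(N_DIR):
        rotated_moves, rotated_action = rotate(moves, action, k)
        candidates.append((rotated_moves, smells, rotated_action))
    return min(candidates)


pairs = [
    (moves, smells, action)
    for moves in product(range(N_DIR), repeat=3)
    for smells in product(range(2), repeat=3)
    for action in range(N_DIR)
]
classes = {canonical(m, s, a) for m, s, a in pairs}

print(len(pairs))    # 2048 entries without symmetry
print(len(classes))  # 512 independent entries with rotation symmetry
```

Storing \(Q\) under the key `canonical(moves, smells, action)` would be one way to enforce the symmetry in a table.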

\( Q(\downarrow\rightarrow\uparrow001, \rightarrow) = Q(\rightarrow\uparrow\leftarrow001, \rightarrow) = Q(\uparrow\leftarrow\downarrow001, \rightarrow) \)

\( Q(\rightarrow\rightarrow\rightarrow) = \uparrow \)

\( Q(\rightarrow\rightarrow\uparrow) = \leftarrow \)

\( Q(\rightarrow\uparrow\leftarrow) = \leftarrow \)

\( Q(\uparrow\leftarrow\leftarrow) = \leftarrow \)

\( Q(\leftarrow\leftarrow\leftarrow) = \uparrow \)

Problem of invariance by rotation

\( \pi = \textrm{softmax}(\beta Q) \)

\( \pi^* = \textrm{argmax}(Q) \)
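A minimal sketch of the two policies for a single state, assuming the four action values are given as a list. The function names are illustrative. With \(\beta = 0\) the softmax policy is uniform, and as \(\beta\) grows it approaches the greedy policy.

```python
import math


def softmax_policy(q_values, beta):
    """Action probabilities proportional to exp(beta * Q)."""
    q_max = max(q_values)  # subtracted for numerical stability
    weights = [math.exp(beta * (q - q_max)) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def greedy_action(q_values):
    """Index of the action with the largest Q value."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


q = [0.1, 0.5, 0.2, 0.0]  # hypothetical values for one state
print(softmax_policy(q, beta=0.0))   # uniform exploration
print(softmax_policy(q, beta=20.0))  # concentrated on the best action
print(greedy_action(q))
```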


\( Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha (r_t + \gamma Q(s_{t+1}, a^*) - Q(s_t, a_t)) \)
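A sketch of one tabular Q-learning step, in which the stored value moves toward the target \(r_t + \gamma Q(s_{t+1}, a^*)\) by a fraction \(\alpha\), with \(a^*\) the greedy action in \(s_{t+1}\). The dictionary-based table and the function name are assumptions made for illustration.

```python
from collections import defaultdict


def q_learning_step(Q, s, a, r, s_next, actions, alpha, gamma):
    """Apply one Q-learning update in place and return the TD error."""
    best_next = max(Q[(s_next, b)] for b in actions)  # Q(s_{t+1}, a*)
    td_error = r + gamma * best_next - Q[(s, a)]
    Q[(s, a)] += alpha * td_error
    return td_error


Q = defaultdict(float)  # unseen (state, action) pairs start at 0
actions = range(4)
q_learning_step(Q, "s0", 0, r=1.0, s_next="s1",
                actions=actions, alpha=0.5, gamma=0.9)
print(Q[("s0", 0)])
```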

\(\pi\)

\(\pi^*\)

\( r_t = \| \vec x_{t-1} \| - \| \vec x_t \| \)
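A small sketch of this reward, assuming 2D positions measured relative to the food at the origin. Summed over a trajectory the rewards telescope to \(\| \vec x_0 \| - \| \vec x_T \|\), so the undiscounted total depends only on the start and end points. The trajectory below is made up for illustration.

```python
import math


def reward(x_prev, x_now):
    """Decrease in distance to the origin during one step."""
    return math.hypot(*x_prev) - math.hypot(*x_now)


# Hypothetical trajectory ending on the food at the origin.
path = [(3, 4), (3, 3), (2, 3), (0, 0)]
rewards = [reward(a, b) for a, b in zip(path, path[1:])]

total = sum(rewards)
expected = math.hypot(*path[0]) - math.hypot(*path[-1])
print(abs(total - expected) < 1e-9)  # the sum telescopes
```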

  • RL vs supervised (importance of \(\gamma\))
  • reward is the final goal
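The role of \(\gamma\) can be illustrated with the discounted return \(G = \sum_k \gamma^k r_{t+k}\). In the sketch below the reward sequence is hypothetical: a detour that first moves away from the food and then reaches it. With \(\gamma = 0\) only the immediate reward counts, which is closer to a supervised, one-step view, while a larger \(\gamma\) credits the final outcome.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**k * rewards[k], accumulated from the last step."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


detour = [-1.0, -1.0, 3.0]  # hypothetical rewards along a detour
print(discounted_return(detour, gamma=0.0))  # -1.0: looks like a bad move
print(discounted_return(detour, gamma=1.0))  # 1.0: the detour pays off
```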

RL food gathering

By Mario Geiger
