Juraj Micko (jm2186)
Kwot Sin Lee (ksl36)
Wilson Suen (wss28)
Use reinforcement learning to learn a robot controller to follow a path
Main task: Learn velocities
Extension: Learn hyper-parameters of a controller
Action space:
Path:
Observation space:
Total reward = Direction reward + Distance reward - Angular velocity
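A minimal sketch of how the three reward terms might be combined. The slides give only the names of the terms, so the concrete definitions here are assumptions: the direction reward is taken as the cosine of the heading error, the distance reward as an exponential decay of the distance to the path, and the angular velocity term as a weighted absolute value. The function name and the `distance_scale` and `angular_weight` parameters are hypothetical.

```python
import math


def total_reward(heading, path_direction, distance_to_path, angular_velocity,
                 distance_scale=1.0, angular_weight=0.1):
    """Total reward = direction reward + distance reward - angular velocity penalty.

    All three term definitions are illustrative assumptions, not taken
    from the slides.
    """
    # Assumed direction reward: 1 when the robot heads along the path,
    # -1 when it heads the opposite way.
    direction_reward = math.cos(heading - path_direction)
    # Assumed distance reward: 1 on the path, decaying towards 0 far from it.
    distance_reward = math.exp(-distance_to_path / distance_scale)
    # Assumed penalty: discourages fast turning regardless of sign.
    angular_penalty = angular_weight * abs(angular_velocity)
    return direction_reward + distance_reward - angular_penalty
```

With these assumed definitions, a robot that sits on the path, points along it, and does not turn receives the maximum reward of 2.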
Learn hyper-parameters of a feedback-linearized low-level path-following controller
[Diagram: agent-environment loop. Labels: Observation, Agent, controller coefficients, Move holonomic point, Move the robot (feedback linearization), Action in the environment]
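A minimal sketch of the feedback-linearization step for a differential-drive robot, using the standard formulation in which a holonomic point is held a distance `epsilon` ahead of the robot. The slides do not say which controller coefficients are learned. This sketch assumes they are a proportional `gain` and the offset `epsilon`. The function names are hypothetical.

```python
import math


def feedback_linearize(vx, vy, theta, epsilon):
    """Map a desired holonomic-point velocity (vx, vy) to robot commands.

    Returns (u, w): forward velocity and angular velocity for a robot
    with heading theta, with the point held epsilon ahead of the robot.
    """
    u = vx * math.cos(theta) + vy * math.sin(theta)
    w = (-vx * math.sin(theta) + vy * math.cos(theta)) / epsilon
    return u, w


def controller_step(pose, target, gain, epsilon):
    """One low-level control step towards a target point on the path.

    `gain` and `epsilon` stand in for the controller coefficients the
    agent would output in the extension task (an assumption).
    """
    x, y, theta = pose
    # Position of the holonomic point ahead of the robot.
    px = x + epsilon * math.cos(theta)
    py = y + epsilon * math.sin(theta)
    # Proportional velocity that moves the holonomic point to the target.
    vx = gain * (target[0] - px)
    vy = gain * (target[1] - py)
    return feedback_linearize(vx, vy, theta, epsilon)
```

In the main task the agent would output the velocities directly. In the extension it would instead output coefficients such as `gain` and `epsilon`, and a step like `controller_step` would turn them into the action applied in the environment.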
[Diagrams: learning algorithm. Labels: Policy, Q-function, (target), Advantage Function, Policy Network, Time Step, Action, State, policy before update, Scale, Warmup]