Learning to Race in Hours with Reinforcement Learning

Antonin Raffin

Outline

I. Reinforcement Learning 101

II. Learning to drive in minutes

III. Learning to race in hours

Donkey Car Sim

[Figure: simulator screenshot showing the controls and the on-board camera view]

RL 101

Goal: maximize sum of rewards
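A minimal formal sketch (the discount factor \gamma is an assumption, not stated in the slides): find a policy \pi that maximizes the expected discounted return,

\max_\pi \; \mathbb{E}_\pi \left[ \sum_{t=0}^{T} \gamma^t \, r_t \right]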

Reward Example

Is the car still on the road?

  • yes: +1
  • no: -10
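As a minimal sketch in Python (the function name and signature are illustrative, not from the talk):

```python
def reward(on_road: bool) -> float:
    # +1 while the car stays on the road, -10 when it leaves it
    return 1.0 if on_road else -10.0
```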

RL in Practice

RL Tips and Tricks (RL Virtual School 2021): https://rl-vs.github.io/rlvs2021/

Questions?

Learning to Drive in Minutes

Challenges

  • minimal number of sensors
  • variability of the scene (light, shadows, other cars, ...)
  • oscillations
  • limited computing power
  • communication delay
  • sample efficiency

Decoupling Feature Extraction from Policy Learning

Learning a State Representation
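A minimal PyTorch sketch of this decoupling, assuming 64×64 RGB camera images and a 32-dimensional latent space (both sizes are assumptions): train an autoencoder on logged images first, then freeze the encoder and feed its latent vector to the policy instead of raw pixels.

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    """Compress 3x64x64 camera images into a small latent state."""

    def __init__(self, latent_dim: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=4, stride=2, padding=1),   # 64 -> 32
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2, padding=1),  # 32 -> 16
            nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1), # 16 -> 8
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(128 * 8 * 8, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128 * 8 * 8),
            nn.Unflatten(1, (128, 8, 8)),
            nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),  # 8 -> 16
            nn.ReLU(),
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1),   # 16 -> 32
            nn.ReLU(),
            nn.ConvTranspose2d(32, 3, kernel_size=4, stride=2, padding=1),    # 32 -> 64
            nn.Sigmoid(),
        )

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(images))

# Pre-train on logged images with a reconstruction loss,
# then use only the (frozen) encoder during policy learning:
ae = AutoEncoder()
images = torch.rand(8, 3, 64, 64)  # placeholder batch
loss = nn.functional.mse_loss(ae(images), images)
with torch.no_grad():
    state = ae.encoder(images)  # low-dimensional input for the RL policy
```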

Real Car

Exploring the Latent Space

Questions?

Reward?

  • Primary: stay on the track
  • Secondary: smooth driving

Reward Hacking

  • +1 for every timestep without crash → minimal throttle
  • minimize steering diff → no steering / constant steering
  • maximize distance travelled → zig-zag driving

Solution?

reward =
  • +1 + const · throttle (every step without crash)
  • -10 - const · throttle (on crash)
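A sketch of that piecewise reward in Python (throttle_weight stands in for the constant; the names are illustrative):

```python
def reward(crashed: bool, throttle: float, throttle_weight: float = 0.1) -> float:
    if crashed:
        # strong penalty, larger when crashing at high throttle
        return -10.0 - throttle_weight * throttle
    # small bonus for staying alive, slightly larger at higher throttle
    return 1.0 + throttle_weight * throttle
```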

Smooth Control

  • clip steering diff
  • continuity cost: (steering - last_steering)^2
  • to be continued... (see the wrapper sketch below)

Be careful with the Markov assumption: the continuity cost depends on the previous action, so last_steering should also be part of the observation.
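A minimal sketch of both tricks as a Gymnasium wrapper (the class name, limits, and weight are assumptions; the action is taken to be [steering, throttle] with steering in [-1, 1]):

```python
import numpy as np
import gymnasium as gym

class SmoothSteeringWrapper(gym.Wrapper):
    """Clip the change in steering and penalize abrupt steering."""

    def __init__(self, env, max_steering_diff=0.15, continuity_weight=1.0):
        super().__init__(env)
        self.max_steering_diff = max_steering_diff
        self.continuity_weight = continuity_weight
        self.last_steering = 0.0

    def reset(self, **kwargs):
        self.last_steering = 0.0
        return self.env.reset(**kwargs)

    def step(self, action):
        steering, throttle = float(action[0]), float(action[1])
        # clip steering diff: limit how fast the steering can change
        diff = np.clip(steering - self.last_steering,
                       -self.max_steering_diff, self.max_steering_diff)
        steering = self.last_steering + diff
        obs, reward, terminated, truncated, info = self.env.step(
            np.array([steering, throttle], dtype=np.float32))
        # continuity cost: (steering - last_steering)^2
        reward -= self.continuity_weight * diff ** 2
        self.last_steering = steering
        return obs, reward, terminated, truncated, info
```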

Terminations

  • crash (user intervention)
  • timeout (needs to be handled as a truncation, not a failure, so the value estimate is still bootstrapped; see the snippet below)
  • to be continued...
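A sketch of the timeout case with Gymnasium's TimeLimit wrapper (the step budget and environment id are stand-ins): the wrapper reports the cutoff via truncated, which Stable-Baselines3 uses to keep bootstrapping instead of treating the cutoff as a crash.

```python
import gymnasium as gym
from gymnasium.wrappers import TimeLimit

# "Pendulum-v1" stands in for the driving environment
env = TimeLimit(gym.make("Pendulum-v1"), max_episode_steps=1000)
obs, info = env.reset()
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
# on timeout: terminated=False, truncated=True -> bootstrap, don't treat as crash
```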

RL Library: Stable Baselines3

  • Reliable implementations
  • User friendly
  • Active community
  • Extensive documentation
  • RL Zoo for training
  • Algo: TQC (sample efficient)
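A minimal training sketch with TQC from sb3_contrib (the environment id is a stand-in for the Donkey Car simulator, and the hyperparameters here are library defaults, not the talk's tuned values):

```python
from sb3_contrib import TQC

# "Pendulum-v1" stands in for the Donkey Car environment
model = TQC("MlpPolicy", "Pendulum-v1", verbose=1)
model.learn(total_timesteps=50_000)
model.save("tqc_driver")
```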

Result

Learning to Race in Hours

  • new sensor: speed
  • new objective

Reward

  • proxy: maximize speed
  • penalize crashes

Smooth Control

reward = reward · (1 - delta_steering) - continuity_cost
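A sketch of that shaping in Python (names and the speed proxy are assumptions; steering is taken as normalized so delta_steering stays small):

```python
def racing_reward(speed, steering, last_steering, continuity_weight=1.0):
    delta_steering = abs(steering - last_steering)
    base = speed  # proxy objective: go as fast as possible
    continuity_cost = continuity_weight * delta_steering ** 2
    # scale down the reward when the steering changes abruptly
    return base * (1.0 - delta_steering) - continuity_cost
```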

Terminations

  • after n hits (crash)
  • low speed (stuck)
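A sketch of these termination conditions (thresholds are illustrative):

```python
def is_episode_over(n_hits: int, speed: float,
                    max_hits: int = 3, min_speed: float = 0.1) -> bool:
    # end the episode after repeated crashes or when the car is stuck
    return n_hits >= max_hits or speed < min_speed
```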

Result

live demo

Questions?
