Learning to Race in Hours with Reinforcement Learning
![](https://s3.amazonaws.com/media-p.slid.es/uploads/222895/images/8456148/overlay_cropped.png)
Antonin Raffin
Outline
I. Reinforcement Learning 101
II. Learning to drive in minutes
III. Learning to race in hours
Donkey Car Sim
![](https://s3.amazonaws.com/media-p.slid.es/uploads/222895/images/8456249/donkey_sim.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/222895/images/8456957/904.jpg)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/222895/images/8456961/001-car-steering-wheel.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/222895/images/8456963/002-speedometer.png)
Controls
on-board camera
RL 101
![](https://s3.amazonaws.com/media-p.slid.es/uploads/222895/images/8456212/RL101.png)
Goal: maximize sum of rewards
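The objective above can be sketched in a few lines. This is a minimal illustration (the discount factor `gamma` and the example rewards are assumptions, not values from the talk): the agent maximizes the expected discounted sum of rewards.

```python
# Minimal sketch: the RL objective is the (discounted) sum of rewards.
def discounted_return(rewards, gamma=0.99):
    """Compute r_0 + gamma * r_1 + gamma^2 * r_2 + ..."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total

# Two steps on the road, then a crash: the crash penalty dominates.
ret = discounted_return([1.0, 1.0, -10.0], gamma=0.9)
```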
Reward Example

Is the car still on the road?
- yes → +1
- no → -10
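As a function, this example reward is a one-liner (a sketch; the real simulator exposes the on-road check differently):

```python
def reward(on_road: bool) -> float:
    # +1 while the car stays on the road, -10 when it leaves it
    return 1.0 if on_road else -10.0
```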
RL in Practice
RL Tips and Tricks (RLVS21): https://rl-vs.github.io/rlvs2021/
Questions?
Learning to Drive in Minutes
Challenges
- minimal number of sensors
- variability of the scene (light, shadows, other cars, ...)
- oscillations
- limited computing power
- communication delay
- sample efficiency
![](https://miro.medium.com/max/720/1*z7xGwQwzLI-5PoyxsfvqrQ.gif)
Decoupling Features Extraction from Policy Learning
![](https://s3.amazonaws.com/media-p.slid.es/uploads/222895/images/8457120/deliverables_D2.1_state_representation_learning_srl_toolbox.png)
Learning a State Representation
![](https://s3.amazonaws.com/media-p.slid.es/uploads/222895/images/8457145/race_auto_encoder.png)
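The decoupling can be sketched as follows. Here a random linear map stands in for the pretrained autoencoder encoder (the latent dimension, image shape, and linear policy are all illustrative assumptions): the policy never sees raw pixels, only the frozen latent features.

```python
import numpy as np

rng = np.random.default_rng(0)

LATENT_DIM = 32            # assumed latent size
IMAGE_SHAPE = (80, 160, 3) # assumed camera resolution
N_PIXELS = int(np.prod(IMAGE_SHAPE))

# Frozen "encoder": stands in for a pretrained autoencoder.
W_enc = rng.normal(size=(N_PIXELS, LATENT_DIM)) / np.sqrt(N_PIXELS)

def encode(image):
    """Frozen feature extractor: image -> small latent vector."""
    return image.reshape(-1) @ W_enc

def policy(latent, W_pi):
    """Tiny linear policy on the latent: outputs [steering, throttle] in [-1, 1]."""
    return np.tanh(latent @ W_pi)

image = rng.random(IMAGE_SHAPE)
W_pi = rng.normal(size=(LATENT_DIM, 2)) * 0.1
action = policy(encode(image), W_pi)
```

Only `W_pi` would be trained by RL; the encoder stays fixed, which makes policy learning much cheaper.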
Real Car
![](https://araffin.github.io/slides/rlvs-tips-tricks/images/car/ae_robot.jpg)
Exploring the Latent Space
![](https://miro.medium.com/max/720/1*EvSBN0zynU-RuWbhXWfdhQ.gif)
Questions?
Reward?
- primary: stay on the track
- secondary: smooth driving
Reward Hacking
| reward | resulting hack |
| --- | --- |
| +1 for every timestep without crash | minimal throttle |
| minimize steering diff | no steering / constant steering |
| maximize distance travelled | zig-zag driving |
Solution?
reward =
- +1 + const × throttle (every step without crash)
- -10 - const × throttle (on crash)
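This reward can be sketched directly (the weight `w` stands for the "const" factor above; its value here is an assumption, tuned in practice):

```python
def reward(crashed: bool, throttle: float, w: float = 0.1) -> float:
    if crashed:
        # crashing at high speed is punished more
        return -10.0 - w * throttle
    # survive, and some throttle is better than none
    return 1.0 + w * throttle
```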
Smooth Control
![](https://miro.medium.com/max/768/1*20LZ6UZUHUjaCarRgNHfEw.gif)
- clip steering diff
- continuity cost: (steering - last_steering)^2
- to be continued...

Be careful with the Markov assumption!
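The continuity cost is a quadratic penalty on how much the steering changes between two consecutive steps (the weight is an assumed tuning parameter):

```python
def continuity_cost(steering: float, last_steering: float,
                    weight: float = 1.0) -> float:
    # Penalize jerky steering: large changes between consecutive
    # steps cost quadratically more than small ones.
    return weight * (steering - last_steering) ** 2
```

Note the Markov caveat: once the reward depends on `last_steering`, that value must be part of the observation, otherwise the state no longer contains everything needed to predict the reward.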
Terminations
- crash (user intervention)
- timeout (needs special handling)
- to be continued...
RL Library: Stable Baselines3
![](https://s3.amazonaws.com/media-p.slid.es/uploads/222895/images/8457097/logo.png)
- Reliable implementations
- User friendly
- Active community
- Extensive documentation
- RL Zoo for training
- Algorithm: TQC (sample-efficient)
Result
Learning to Race in Hours
- new sensor: speed
- new objective
![](https://s3.amazonaws.com/media-p.slid.es/uploads/222895/images/8461309/race.jpg)
Reward
- proxy: maximize speed
- penalize crashes
Smooth Control
- generalized State-Dependent Exploration (gSDE)
- condition reward on smoothness
- reduce max steering
- history wrapper
reward = reward × (1 - delta_steering) - continuity_cost
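The history wrapper can be sketched as follows: stack the last k observations so the policy can infer trends (e.g. how fast the steering is changing) despite delays and partial observability. The interface below is a generic sketch, not a specific gym API, and `horizon` is an assumed parameter.

```python
import numpy as np
from collections import deque

class HistoryWrapper:
    """Concatenate the last `horizon` observations into one vector."""

    def __init__(self, env, horizon=2):
        self.env = env
        self.horizon = horizon
        self.history = deque(maxlen=horizon)

    def reset(self):
        obs = self.env.reset()
        # pad with zeros so the stacked shape is constant from step one
        for _ in range(self.horizon - 1):
            self.history.append(np.zeros_like(obs))
        self.history.append(obs)
        return np.concatenate(self.history)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.history.append(obs)
        return np.concatenate(self.history), reward, done, info

# Toy environment to exercise the wrapper.
class ToyEnv:
    def reset(self):
        return np.array([1.0])

    def step(self, action):
        return np.array([2.0]), 0.0, False, {}

env = HistoryWrapper(ToyEnv(), horizon=3)
stacked = env.reset()  # zero-padded history followed by the first observation
```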
Terminations
- after n hits (crash)
- low speed (stuck)
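Both termination conditions fit in one check (the `max_hits` and `min_speed` thresholds below are illustrative assumptions, not the talk's values):

```python
def should_terminate(n_hits: int, speed: float,
                     max_hits: int = 5, min_speed: float = 0.1) -> bool:
    # End the episode after too many hits (crash),
    # or when the car is stuck at very low speed.
    return n_hits >= max_hits or speed < min_speed
```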
Result
live demo
Questions?