Athletic Intelligence
for Legged Robots

Shamel Fahmi

June, 2023

"Marc Raibert divided intelligence into Cognitive Intelligence (CI) and Athletic Intelligence (AtI). CI allows us to make abstract plans, and to understand and solve broader problems. AtI allows us to operate our bodies in such a way that we can balance, stand, walk, climb, etc. AtI also lets us do real-time perception so that we can interact with the world around us."

S. Fahmi, On Terrain-Aware Locomotion for Legged Robots, PhD. dissertation, Italian Institute of Technology, 2021.
M. Raibert and S. Kuindersma, Boston dynamics, the Challenges of Real-World Reinforcement Learning Workshop in NeurIPS, 2021.

Athletic Intelligence

for Legged Robots?

Athletic Intelligence allows the robot to take decisions based on the surrounding world using its perception.

decisions

world

perception

decisions

world

perception

Using Optimization and Learning

  • Control

  • State Estimation

  • Planning

  • Proprioception

  • Vision

  • Physical Properties

  • Geometry

STANCE: Whole-Body Control for Legged Robots

ViTAL: Vision-Based Terrain-Aware Locomotion Planning

MIMOC: Reinforcement Learning for Legged Robots

Locomotion Planning

Two ways to deal with locomotion planning:

1. Coupled Planning:

  • Plan the feet & base states simultaneously.

2. Decoupled planning:

  • Separate the plan into feet & base plans.

  • Planning the feet motion ⇒ foothold selection

  • Planning the body motion ⇒ pose adaptation

Vision-Based Foothold Adaptation (VFA)

Villarreal 2019

Barasuol 2015

Foothold Selection

V. Barasuol et al., Reactive Trotting with Foot Placement Corrections through Visual Pattern Classification, IROS, 2015.
O. Villarreal et al., Fast and Continuous Foothold Adaptation for Dynamic Locomotion through CNNs, RA-L, 2019.

Goal: Select footholds based on the terrain information and the robot's capabilities. 

Pose Adaptation

  • Goal: find the optimal robot's pose that can:

    • Maximize the reachablity of the given selected footholds.

    • Avoid terrain collision and maintain static stability.

 

  • A pose optimizer to deal with these objectives.

M. Kalakrishnan et al., Learning, planning, and control for quadruped locomotion over challenging terrain, IJRR 2011.
P. Fankhauser et al., Robust Rough-Terrain Locomotion with a Quadrupedal Robot, IROS, 2018.

Kalakrishnan 2011

Fankhauser 2018

The Problem with Decoupled Planning

  • Current decoupled planning strategies focus on finding one optimal solution based on given selected footholds.

 

  • First, the footholds are selected.

  • Then, the pose is optimized based on that.

 

  • If the selected footholds are not reached (the robot gets disturbed), the robot may end up in a state with no reachable safe footholds. 

  • Propose a different paradigm.

 

  • Instead of finding body poses that are optimal w.r.t given footholds,

 

  • we find body poses that will maximize the chances of the robot reaching safe footholds.

 

  • Put the robot in a state that if it gets disturbed, it remains around a set of solutions that are still safe & reachable.

How to Solve This Problem?

Vision-Based Terrain Aware Locomotion (ViTAL)

  • ViTAL is an online vision-based locomotion planning strategy.

 

  • ViTAL selects footholds based on the robot's skills

 

  • Also, find the robot pose that maximizes the chances of success in reaching these footholds.

 

  • The notion of safety & success emerges from skills that characterize the robot's capabilities.

Vision-Based Foothold Adaptation (VFA)

Vision-Based Pose Adaptation (VPA)

Foothold Evaluation Criteria (FEC)

What the robot is capable of doing?

  • Avoid edges, corners, gaps, etc.
  • Remain with the workspace of the legs.
  • Avoid colliding with the terrain.

Foothold Evaluation Criteria (FEC)

The FEC is a way to evaluate footholds given a heightmap, the leg's hip height, & other robot states.

When the criteria are evaluated, a boolean matrix is given as an output.

Each pixel in the heightmap represents the terrain height corresponding to the pixel's location in the terrain map.

The heightmap is a 2D representation of the terrain.

Each pixel is a candidate foothold for the robot.

The boolean matrix characterizes which pixel (location) in the heightmap is feasible (safe) or not based on the robot's capabilities (the criteria). 

Foothold Evaluation Criteria (FEC)

Rejects footholds that cause foot trajectory collision.

- Foot Trajectory Collision (FTC)

Rejects footholds that are outside the workspace during the entire gait phase.

-  Kinematic Feasibility (KF)

Rejects footholds that cause leg collision with the terrain.

-  Leg Collision (LC)

-  Terrain Roughness (TR)

Rejects footholds that are near holes, spikes, edges, and corners.

Vision-Based Foothold Adaptation (VFA)

From the VFA to the VPA

First, we extract heightmaps that are centered around the projection of the leg hip location.

Vision-Based Pose Adaptation (VPA)

Then, the pose evaluation stage will evaluate the criteria for all the samples of the hip heights. 

The pose evaluation stage will output samples of number of safe footholds.

Vision-Based Pose Adaptation (VPA)

The output of the function approximation is

\hat{\mathcal{F}}_l

These samples are then used to approximate a continuous function of the number of safe footholds at the function approximation stage. 

Vision-Based Pose Adaptation (VPA)

Finally, the pose optimization stage finds the optimal pose that maximizes            for all of the legs

\hat{\mathcal{F}}_l

Vision-Based Pose Adaptation (VPA)

Find the optimal robot's trunk pose that maximizes the number of safe footholds for all the legs

Height

Roll

Pitch

Pose Optimization

Pose Optimization

front hips rising up

front hips lowering down

hind hips rising up

hind hips lowering down

Baseline: Terrain-Based Reference (TBR)

O. Villarreal et al., MPC-based Controller with Terrain Insight for Dynamic Legged Locomotion, ICRA, 2020.

Conclusions: ViTAL

  • We presented ViTAL, a decoupled visual planning strategy

 

  • ViTAL consisted of :

  • VFA for foothold selection

  • VPA for pose adaptation

    • Introduced a different paradigm in pose adaptation

    • Finds the body poses that maximize the chances of the legs to succeed in reaching safe footholds

  • FEC (notion of safety) that characterizes the robot's capabilities

Conclusion: ViTAL

  • ViTAL allows HyQ and HyQReal to traverse a wide variety of terrains under various forward velocities.

    • These terrains included stairs, gaps, etc.

    • Forward velocities varied from 0.2m/s to 0.75 m/s.

 

  • We showed that the VPA resulted in body poses that are aware of the terrain, and of what the robot and its legs can do.

 

  • The VPA puts the robot in a pose that provides the feet with higher number of safe footholds which allows the robot to succeed unlike the baseline that fails.

Published In:

 

S. Fahmi, V. Barasuol, D. Esteban, O. Villarreal, and C. Semini,

“ViTAL: Vision-Based Terrain-Aware Locomotion for Legged Robots,” IEEE Trans. Robot. (T-RO), Nov. 2022, doi: 10.1109/TRO.2022.3222958.

Model-Based vs. Reinforcement Learning

  • Model-Based Control

    • Interpretable

    • Based on physics

    • Easier to tune & design

  • Reinforcement Learning

    • Model-free

    • Harder to interpret, design, & tune

    • Less sensitive to modeling & state estimation inaccuracies

  • Can we combine the best of both worlds?
     

  • Train an RL policy that tracks (imitates) reference trajectories from model-based optimal controllers?

RL for Legged Robots

[1] N. Rudin et al., Learning to walk in minutes using massively parallel deep reinforcement learning, CoRL, 2021.
[2] G. Ji et al., Concurrent training of a control policy and a state estimator for dynamic and robust legged locomotion,” RA-L, 2022.
[3] J. Siekmann et al., Sim-to-real learning of all common bipedal gaits via periodic reward composition, ICRA, 2021

[4] X. B. Peng et al., Learning agile robotic locomotion skills by imitating animals, RSS 2020.
[5] X. B. Peng, et al., Deepmimic: Example-guided deep reinforcement learning of physics-based character skills,” ACM Trans. Graphics, 2018.

  • Heuristic Rewards [1-3]

    • Hand-crafted

    • Requires tedious tuning

  • Motion Imitation [4,5]

    • Uses MoCap (or video clips) hard to get

    • Motion retargeting kinematically inaccurate

  • PMTGs

    GPGs + residuals

    Gait dependent

Proposed Approach

  • MIMOC is a locomotion controller that learns from model-based optimal controllers.
     

  • This is an RL approach via motion imitation.
     

  • Instead of MoCap, we rely on reference trajectories from model-based optimal controllers.
     

  • MIMOC: Motion Imitation from Model-based Optimal Control

MIMOC's Contributions

  • Unlike RL via motion imitation (MoCap), our references

    1. Consider the robot's whole-body dynamics ⇒ dynamically feasible

    2. Do not need motion retargeting (robot-specific) ⇒ kinematically feasible

    3. Include torques and GRFs which are crucial for balance
       

  • As a result, compared to RL via motion imitation

    • The sim-to-real gap is significantly reduced

    • No fine-tuning stage and no tedious reward tuning were needed
       

  • Unlike model-based controllers

    • There were no issues with running the policy in real-time

    • The policy is less sensitive to state estimation noise and model inaccuracies

MIMOC's RL Framework

  • Inputs: commands, phase (clock), and (some) robot observations
     

  • Actions: joint set-points going through a PD controller at 100 Hz
     

  • The references are only used to calculate the rewards
     

  • The rewards are only tracking rewards, without hand-tuning

MIMOC's RL Framework

  • We used an MPC + WBC for the Mini-Cheetah and the MIT Humanoid
     

  • We used IsaacGym with the Legged Gym Implementation

D. Kim et al., Highly dynamic quadruped locomotion via whole-body impulse control and model predictive control, ArXiV, 2019.
N. Rudin et al., Learning to walk in minutes using massively parallel deep reinforcement learning, CoRL, 2021.

  • Rewards: only tracking rewards of weighted squared exponentials

 

 

 

 

  • Observations

     

  • Actions: PD set-points

  • Domain Randomization: noise, friction, body mass, push robots, and gravity.

r_t = r_{q} + r_{\dot{q}} + r_{\tau} + r_{x}+ r_{\dot{x}} + r_{\theta} + r_{\omega} + r_{e} + r_{\dot{e}}
r_{\beta} = w_{\beta} \sum_{i=1}^{n_{d}} \Bigg[ \mathrm{exp}( - \alpha_{\beta} \big\| \beta_{i,\mathrm{ref}} - \beta_i \big\|^2 ) \Bigg]
\alpha_\beta = \left(\frac{2}{\max(\beta^{\mathrm{ref}}) -\min(\beta^{\mathrm{ref}}) }\right)^2
o_t = [ q^T \quad \dot{q}^T \quad z^T \quad \dot{x}^T \quad \omega^T \quad c^T \quad \cos(\phi) \quad \sin(\phi) ]^T

MIMOC's RL Framework

Published In:

 

A.J. Miller, S. Fahmi, M. Chignoli, and S. Kim,

"Reinforcement Learning for Legged Robots: Motion Imitation from Model-Based Optimal Control," ArXiV preprint, May 2023.

Made with Slides.com