Few Remarks on Benchmarks
Roberto Calandra
Facebook AI Research
Workshop in Benchmarking Robotics - 13 August 2019
Two Meta-questions about Benchmarks
- What is a benchmark testing? (i.e., are these the right metrics?)
  - Nature is not single-objective
  - Solving a specific task vs designing general purpose systems
- Is this benchmark representative of real-world problems?
  - Are potential advances useful in the real world?
  - If using simulations, is the result indicative of real-world performance?
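One way to read "not single-objective" is that a benchmark can report which systems are undominated across several metrics rather than ranking them by a single score. The sketch below is only an illustration of that idea: the system names, the two metrics, and the helper functions `dominates` and `pareto_front` are hypothetical and do not come from the talk. It assumes higher is better on every metric.

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is at least as good as `b` on every metric and
    strictly better on at least one (higher is better)."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(scores: Dict[str, Sequence[float]]) -> List[str]:
    """Names of the systems that no other system dominates."""
    front = []
    for name, score in scores.items():
        dominated = any(
            dominates(other_score, score)
            for other_name, other_score in scores.items()
            if other_name != name
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical scores: (task success rate, energy efficiency).
example_scores = {
    "system_a": (0.9, 0.2),
    "system_b": (0.6, 0.7),
    "system_c": (0.5, 0.5),
}
front = pareto_front(example_scores)
print(front)  # ['system_a', 'system_b']
```

Here `system_c` is worse than `system_b` on both metrics, while `system_a` and `system_b` each win on one metric, so a single-number ranking would hide a real trade-off.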
Solving a specific task vs designing general purpose systems
- In robotics there is a tension between creating systems that just work and advancing scientific understanding
- Do we care about being able to pick the same object over and over (e.g., industrial application)?
- Or do we care about a system that can adapt to different tasks (potentially unknown at training time)?
- In system identification (and with kids), we do not know the task beforehand. Can we still learn something useful?
Benchmarks on Real Robots are Hard...
- Not everyone has the same setup (robot, sensors, etc.)
- Running real-world experiments is time-consuming and expensive
- Can we just use simulation?
We should not decouple software from hardware!
How do we evaluate the importance of the hardware too?
(Not abstracting it away, but reasoning about it)
Formalizing the Hardware



Benchmarks should consider both!
One way to do
Final Remarks
- What are we really benchmarking? (Perception vs learning vs controller vs hardware)
- Agree with Jan's point about "mine is better than yours"
- Reproductions and negative results MUST be valued (see Physics)
- Many of the current learning benchmarks (e.g., OpenAI) are atrocious (clearly not well designed as meaningful benchmarks)
- Benchmarks should not be proprietary (e.g., MuJoCo)
Mimic Benchmark
- Let a robot "play" in an environment for a long time (e.g., 3 months) without any goal
- Now bring in humans, and the robot has to reproduce any skill that the humans demonstrate
- The humans win if the robot cannot reproduce the shown skill (Generative Adversarial Human)
- How long does it take for the humans to win?
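The scoring rule of the Mimic Benchmark can be sketched as a simple loop: humans demonstrate skills one at a time, and the score is the number of demonstrations needed before the robot first fails. This is a minimal sketch under stated assumptions, not a specification from the talk. The function `rounds_until_humans_win`, the `robot_can_reproduce` callback, and the skill names are all hypothetical; in a real evaluation the callback would stand in for a physical trial judged by a human.

```python
from typing import Callable, Optional, Sequence


def rounds_until_humans_win(
    demonstrations: Sequence[str],
    robot_can_reproduce: Callable[[str], bool],
) -> Optional[int]:
    """Return the 1-based round in which the robot first fails to
    reproduce a demonstrated skill, or None if it never fails."""
    for round_number, skill in enumerate(demonstrations, start=1):
        if not robot_can_reproduce(skill):
            return round_number  # the humans win in this round
    return None  # the robot reproduced every demonstrated skill


# Hypothetical stand-in for what the robot acquired during free play.
skills_learned_during_play = {"push", "grasp"}

# Hypothetical sequence of human demonstrations.
demonstrations = ["push", "grasp", "stack", "pour"]

score = rounds_until_humans_win(
    demonstrations,
    lambda skill: skill in skills_learned_during_play,
)
print(score)  # 3
```

Under this scoring, a higher number means the humans needed longer to find a skill the robot could not reproduce, which is one possible reading of "how long does it take for the humans to win".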