Robot Learning

Ken Nakahara, Cong Wang, Zdravko Dugonjic, Elia Rühle, and Johannes Busch

Learning, Adaptive Systems, and Robotics (LASR) Lab

Research Seminar

v1.0

Structure of the Course

  • You will have to form a group of 3 students (exceptions are possible).
  • Each group can select a topic to work on.
  • Each topic consists of 3-4 papers.
  • The group will have to read the papers, write a report, and present the topic in front of the course.
  • The presentation, including questions, should take about 45 minutes.
  • The presentation/report should contain:
    • Explanation of the provided papers.
    • Information on additional papers that fit the respective topic.
    • A comparison and analysis of common patterns across the papers.
    • Discussion of the topic.
  • We will propose a number of topics, but feel free to propose your own.

Evaluation and Grading

  • Both the presentation and the report will be graded individually.
  • Every group will partner up with a second group. The groups will
    • be tasked to lead the question round after the presentation of the other group.
    • write a review of the report of the respective other group. The final grading will be done by the course supervisors, but the reviews will be taken into account.
  • Attendance will be checked and is mandatory for passing the course. 

Schedule

  • The seminars will take place from 13.05 to 15.07, which leaves the first group 4 weeks to prepare. Every seminar will consist of two presentations, each followed by questions.
  • Please sign up by adding your group to the Calendar we will publish on Opal today.
    • Put in the topic and the 3 team members.
    • Do not enter the same topic twice. If you really want a specific topic, talk to us and we will find a solution.
    • If possible, try to reduce the number of sessions by filling the second time slot of a date whose first slot is already taken.
    • Do not delete other entries.

Deadlines

  • Report: 07.07.25
  • Review: 30.08.25

Available Topics

Skill-Learning

Papers:

Pertsch, et al. - Accelerating Reinforcement Learning with Learned Skill Priors

Eysenbach, et al. - Diversity is All You Need: Learning Skills without a Reward Function

Mishra, et al. - Generative Skill Chaining: Long-Horizon Skill Planning with Diffusion Models

Liang, et al. - SkillDiffuser: Interpretable Hierarchical Planning via Skill Abstractions in Diffusion-Based Task Execution

Goal:

The word “skill” is used in a broad sense in decision making. This work should illuminate and structure its different meanings and show how the ideas are connected.

Source: Pertsch, et al.

Latent World Models

Papers:

Hafner, et al. - Learning Latent Dynamics for Planning from Pixels

Hafner, et al. - Dream to Control: Learning Behaviors by Latent Imagination

Hafner, et al. - Mastering Atari with Discrete World Models

Hafner, et al. - Mastering Diverse Domains through World Models

Goal:

Show the evolution of Latent World Models.

Source: Hafner, et al.

Diffusion-Generative Models

Papers:

Ho, et al. - Denoising Diffusion Probabilistic Models

Song, et al. - Denoising Diffusion Implicit Models

Song, et al. - Consistency Models

Karras, et al. - Elucidating the Design Space of Diffusion-Based Generative Models

Goal:

Explain the different mathematical formulations of diffusion models and show how they can be unified.

Source: Sohl-Dickstein, et al.

Diffusion Policies

Papers:

Chi, et al. - Diffusion Policy: Visuomotor Policy Learning via Action Diffusion

Wang, et al. - Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning

Lai, et al. - Diffusion-Reward Adversarial Imitation Learning

Ren, et al. - Diffusion Policy Policy Optimization

Goal:

Explain and contrast different approaches to using diffusion models as expressive policies for decision making.

Source: Chi, et al.

Generative World Modelling

Papers:

Alonso, et al. - Diffusion for World Modeling: Visual Details Matter in Atari

Bruce, et al. - Genie: Generative Interactive Environments

Savov, et al. - Exploration-Driven Generative Interactive Environments

Russell, et al. - GAIA-2: A Controllable Multi-View Generative World Model for Autonomous Driving

Goal:

Outline how generative models can be used to model the world. Highlight how these findings relate to training generally capable agents.

Source: Bruce, et al.

Augmenting Reinforcement Learning with Self-Play

Papers:

Silver, et al. - Mastering the game of Go with deep neural networks and tree search

Vinyals et al. - Grandmaster level in StarCraft II using multi-agent reinforcement learning

Perolat, et al. - Mastering the game of Stratego with model-free multiagent reinforcement learning

Berner et al. - Dota 2 with Large Scale Deep Reinforcement Learning

Goal:

Explain how reinforcement learning can benefit from agents playing against themselves to achieve super-human performance.

Source: Vinyals, et al.

Unsupervised Environment Design for Autocurricula

Papers:

Samvelyan, et al. - MAESTRO - Open-ended Environment Design For Multiagent Reinforcement Learning

Parker-Holder, et al. - Evolving curricula with regret-based environment design

Rutherford, et al. - No Regrets: Investigating and Improving Regret Approximations for Curriculum Discovery

Dennis, et al. - Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design

Goal:

Understand how unsupervised environment design paved the way for open-ended learning.

Source: Rutherford, et al.

Joint Embedding Predictive Architecture for Self-Supervised Learning

Papers:

Assran, et al. - Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture

Bardes, et al. - MC-JEPA: A Joint-Embedding Predictive Architecture for Self-Supervised Learning of Motion and Content Features

Drozdov, et al. - Video Representation Learning with Joint-Embedding Predictive Architectures

Skenderi, et al. - Graph-level Representation Learning with Joint-Embedding Predictive Architectures

Goal:

Understand how the JEPA approach differs from current generative models and how it can be used for different modalities.

Source: Assran, et al.

Mental State Attribution to Robots

Papers:

Thellman et al. - Mental State Attribution to Robots: A Systematic Review of Conceptions, Methods, and Findings

Thellman et al. - Do You See what I See? Tracking the Perceptual Beliefs of Robots

Thellman et al. - Does the Robot Know It Is Being Distracted? Attitudinal and Behavioral Consequences of Second-Order Mental State Attribution in HRI

Source: Thellman et al.

Unexpected Robot Behavior

Papers:

Schadenberg et al. - “I See What You Did There”: Understanding People’s Social Perception of a Robot and Its Predictability

Abe et al. - Human Understanding and Perception of Unanticipated Robot Action in the Context of Physical Interaction

Honig et al. - Understanding and Resolving Failures in Human-Robot Interaction: Literature Review and Model Development

Source: Schadenberg et al.

Robot Intent Expression and Communication

Papers:

Pascher et al. - How to Communicate Robot Motion Intent: A Scoping Review

Bodden et al. - A flexible optimization-based method for synthesizing intent-expressive robot arm motion

Yi et al. - Your Way Or My Way: Improving Human-Robot Co-Navigation Through Robot Intent and Pedestrian Prediction Visualisations

Source: Bodden et al.

Knowledge Representation in Service Robotics

Papers:

Beetz et al. - KnowRob 2.0 — A 2nd Generation Knowledge Processing Framework for Cognition-Enabled Robotic Agents

Hughes et al. - Foundations of spatial perception for robotics: Hierarchical representations and real-time systems

Paulius et al. - A Survey of Knowledge Representation in Service Robotics

Source: Paulius et al.

Evaluation Methods for Robot Explainability

Papers:

Wachowiak et al. - A Survey of Evaluation Methods and Metrics for Explanations in Human–Robot Interaction (HRI)

Doshi-Velez et al. - Towards A Rigorous Science of Interpretable Machine Learning

Sakai et al. - Implementation and Evaluation of Algorithms for Realizing Explainable Autonomous Robots

Source: Sakai et al.

Learning Stable Grasping for Any Object

Papers:

Xu, et al. - LeTac-MPC: Learning Model Predictive Control for Tactile-reactive Grasping

Zhong, et al. - DexGrasp Anything: Towards Universal Robotic Dexterous Grasping with Physics Awareness

Calandra, et al. - More Than a Feeling: Learning to Grasp and Regrasp using Vision and Touch

Mahler, et al. - Dex-Net 2.0: Deep Learning to Plan Robust Grasps with Synthetic Point Clouds and Analytic Grasp Metrics

Source: Xu, et al.

Robotic In-hand Rotation and Its Application

Papers:

Qi, et al. - From Simple to Complex Skills: The Case of In-Hand Object Reorientation

Yuan, et al. - Robot synesthesia: In-hand manipulation with visuotactile sensing

Yang, et al. - AnyRotate: Gravity Invariant In-Hand Object Rotation with Sim-to-Real Touch

Suresh, et al. - Neural feels with Neural Fields: Visuo-tactile Perception for In-Hand Manipulation

Source: Qi, et al.

In-hand Manipulation in Various Scenarios

Papers:

Funabashi, et al. - Focused Blind Switching Manipulation Based on Constrained and Regional Touch States of Multi-Fingered Hand Using Deep Learning

Yin, et al. - Learning In-Hand Translation Using Tactile Skin With Shear and Normal Force Sensing

Wang, et al. - Lessons from Learning to Spin “Pens”

OpenAI - Learning dexterous in-hand manipulation

Source: Funabashi, et al.

Bimanual Manipulation with Multi-fingered Robotic Hands

Papers:

Lin, et al. - Learning Visuotactile Skills with Two Multifingered Hands

Chen, et al. - Bi-DexHands: Towards Human-Level Bimanual Dexterous Manipulation

Shaw, et al. - Bimanual Dexterity for Complex Tasks

Lin, et al. - Twisting Lids Off with Two Hands

Source: Lin, et al.

Manipulation Tasks on Humanoids

Papers:

Li, et al. - OKAMI: Teaching Humanoid Robots Manipulation Skills through Single Video Imitation

Ze, et al. - Generalizable Humanoid Manipulation with 3D Diffusion Policies

Lin, et al. - Sim-to-Real Reinforcement Learning for Vision-Based Dexterous Manipulation on Humanoids

Qiu, et al. - Humanoid Policy ∼ Human Policy

Source: Li, et al.

Robot Foundation Models

Papers:

Physical Intelligence - π0: A Vision-Language-Action Flow Model for General Robot Control

Octo Model Team - Octo: An Open-Source Generalist Robot Policy

Kawaharazuka, et al. - Real-world robot applications of foundation models: a review

Source: Physical Intelligence

Neural Representations and Emerging Vision Capabilities

Papers:

Tiwary, et al. - What if Eye...? Computationally Recreating Vision Evolution

Tang, et al. - Emergent Correspondence from Image Diffusion

Zholus, et al. - TAPNext: Tracking Any Point (TAP) as Next Token Prediction

Eslami, et al. - Neural Scene Representation and Rendering

Learning Simulation from Data

Papers:

Alonso, et al. - Diffusion for World Modeling: Visual Details Matter in Atari

Kanervisto, et al. - World and Human Action Models towards gameplay ideation

Chen, et al. - Model as a Game: On Numerical and Spatial Consistency for Generative Games

Understanding Physics Through Video 

Papers:

Liu, et al. - Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models

Li, et al. - Sora Generates Videos with Stunning Geometrical Consistency

Radford, et al. - Learning Transferable Visual Models From Natural Language Supervision

Motamed, et al. - Do generative video models understand physical principles?

Deformable Object Representations

Papers:

Pumarola, et al. - D-NeRF: Neural Radiance Fields for Dynamic Scenes

Wu, et al. - 4D Gaussian Splatting for Real-Time Dynamic Scene Rendering

Liu, et al. - LGS: A Light-weight 4D Gaussian Splatting for Efficient Surgical Scene Reconstruction

Cao, et al. - HexPlane: A Fast Representation for Dynamic Scenes
