Ethan Faculty Job Talk (v1)

Empowering Physically Assistive Robots

Ethan K. Gordon

Postdoc, University of Pennsylvania

PhD 2023, University of Washington

with Contact-Rich Active Learning

Physically Assistive Robots (PARs)

“If I can have a robot do it, I can learn to adapt to it, but it would be me feeding me, and that would be huge”

Tyler Schrenk

1985-2023

The Promise of PARs:

Empowerment
Independence

What is needed for PARs?

Contact-Rich Manipulation

Sliding to clean the spoon and bowl
Shaking to smoothen
In-Mouth Hand-Off
(vision-denied)

Online Adaptation

Bite Size Adjustment

What is needed for PARs?

Online Adaptation

Totally Different Food
Multi-bite: different shapes for each bite

There is no time for

re-training!

Tractable Adaptability

How can robots efficiently learn, during deployment, how to manipulate previously-unseen objects?

Policy Space Reduction

Model-Based Methods

Leveraging Haptics

Support

Inform

Multimodal Active Learning

Physically Assistive Robots

The Technology/Application Cycle

Support

Inform

Physically Assistive Robots (PARs)

Active Learning in Contact

Support

Inform

Multimodal Active Learning

Physically Assistive Robots

Summary

The Promise of Physically Assistive Robotics
Robot-Assisted Feeding: User-Defined Metrics
Food Bite Acquisition as a Contextual Bandit
- Policy space reduction a priori
- Haptics as post hoc bandit context
Active Learning Through Contact
Community-Based Participatory Design
Where can PARs go from here?

PAR

Do we need autonomy? What kind?

Community-Based Participatory Research

It is important to ask users, observational and qualitative research before experimentation.

Time Per Bite:

Caretaker: ~20s
Preferred: <2min
Teleoperated Robot: 5-40min

Example: Why Single-Utensil Feeding?

It's intuitive and familiar.

The Assistive Dexterous Arm (ADA)

User Studies Capture Diversity

T. Bhattacharjee, E.K. Gordon et al, “Is more autonomy always better?...", HRI 2020

User Studies Capture Metrics

Trade-off between autonomy (with chance of error) and high-effort manual control.

What errors are tolerable?

T. Bhattacharjee, E.K. Gordon et al, “Is more autonomy always better?...", HRI 2020

Acceptance Can be User-Dependent

Users with more limited mobility had a preference for greater autonomy, even if it experienced errors.

But error tolerance was still low, e.g.,

Minimum Food Acquisition Success Rate:

80%

T. Bhattacharjee, E.K. Gordon et al, “Is more autonomy always better?...", HRI 2020

Autonomy Preference Given Errors

User Rating

Summary

The Promise of Physically Assistive Robotics
Robot-Assisted Feeding: User-Defined Metrics
Food Bite Acquisition as a Contextual Bandit
- Policy space reduction a priori
- Haptics as post hoc bandit context
Active Learning Through Contact
Community-Based Participatory Design
Where can PARs go from here?

PAR

AL

Data Driven Bite Acquisition

R. Feng, Y. Kim, G. Lee, E. K. Gordon, et al, "...Generalizing skewering strategies...", ISRR 2019

Real data collection takes a lot of time, but we can do it. What if we just train an RL policy?

Example: 10 trajectories x 16 food types

85 person-hours

Can it run on portable compute?
Is it safe? Is it predictable / comfortable?
How does it handle covariate shift?

Online Learning with Policy Space Reduction

Hierarchy and Bandits

Leveraging Expert Data

 T. Bhattacharjee et al, “Towards Robotic Feeding...", R-AL 2019

Qualitative Taxonomy

Insights:

Discrete classes of strategies

Lots of variations within those classes

Imitation Learning for Policy Space Reduction

 E. K. Gordon, A. Nanavati et al, “Towards General Single-Utensil...", CoRL 2023

Splines and force/torque thresholds.

Comparable with Euclidean Metric.

Emergent Behavior

Wiggling

Tilting

High Pressure

Scooping

 E. K. Gordon, A. Nanavati et al, “Towards General Single-Utensil...", CoRL 2023

Is this expressive enough?

Yes!

(Note the 80% acceptance threshold)

 E. K. Gordon, A. Nanavati et al, “Towards General Single-Utensil...", CoRL 2023

Online learning with a discrete action space maps cleanly on the contextual bandit setting.

The Multi-Arm Bandit (MAB)

\(r \sim\)

\( \mathcal{N}(0.5, 1)\)

Reward

\( \mathcal{N}(0.8, 1)\)

\( \mathcal{N}(0.1, 1)\)

Interaction Protocol:

Select \(a = \pi(a_{0\ldots t}, r_{0\ldots t})\)
Observe \(r\)
Update \(\pi\)

\(a =\)

\(1\)

\(2\)

\(3\)

Metric: Regret: \(\mathbb{E}[r(a^*) - r(a_t)]\)

Test time metric, balances exploration vs. exploitation,

often theoretically bounded

The (Stochastic) Contextual Bandit

\(r \sim\)

\( \mathcal{N}(\mu_1(c), \sigma)\)

Reward

Interaction Protocol:

Observe \(c_t\)
Select \(a_t = \pi(c_t)\)
Observe \(r(a_t, c_t)\)
Update \(\pi\)

\(a =\)

\(1\)

\(2\)

\(3\)

\( \mathcal{N}(\mu_2(c), \sigma)\)

\( \mathcal{N}(\mu_3(c), \sigma)\)

\(c_t\)

Bite Acquisition as a Contextual Bandit

Discrete Actions \(a\)

Visual Context: \(c_t\)

\(r\): Success/Failure \(\{1, 0\}\)

 E. K. Gordon et al, “Adaptive robot-assisted feeding...", IROS 2020

Eye-in-Hand RGBD

SegmentAnything

ResNet Features

Bite Acquisition as a Contextual Bandit

\(c\): Visual Context

Eye-in-Hand RGBD

SegmentAnything

ResNet Features

\(l\): Success/Failure

\(\{0, 1\}\)

\(a\):

Control Policy

Action Space Trade-Off:

Expressive enough for general success
Not too big for tractable learning

 E. Gordon et al, “Adaptive robot-assisted feeding...", IROS 2020

Post Hoc Haptics for Bite Acquisition

Haptic data is really good for food classification.

55ms of force data:

T. Bhattacharjee et al, R-AL 2019 ; E. Gordon et al, "Leveraging post hoc context...", ICRA 2021

Supervised vs. Bandit Learning

\(l_a(c) = [0, 1, 1, 1, 1, 1, 1, 1, 1, 1]\)

Supervised Learning sees \(l_a(c) \forall a\)

Full Feedback

\(c\)

Bandit Algorithm sees \(l_{a_t}(c)\)

Bandit Feedback (Harder)

No counterfactual.

Post Hoc Haptics for Bite Acquisition

E. Gordon et al, "Leveraging post hoc context...", ICRA 2021

Consider a joint loss model:

\(l_a + \epsilon = c \cdot \theta^*_a = p \cdot \phi^*_a\)

\(\epsilon \sim \mathcal{N}\)

visual context

haptic context

\(\tilde{\theta},\tilde{\phi} = \arg\min_{\theta, \phi}\)

\(\sum_{t : \text{action}=a}||c_t\cdot\theta_a - l_t||^2 + ||p_t\cdot\phi_a - l_t||^2\)

s.t. \(\forall t,a: c_t \cdot \theta_a = p_t \cdot \phi_a\)

Once either model is learned, the other model reduces from bandit feedback to full feedback

Example:

After only 1 action, robot determines that kiwi \(\approx\) banana, and can impute the counterfactual.

Are the 11 actions tractable for learning?

T. Bhattacharjee et al, R-AL 2019 ; E. Gordon et al, "Leveraging post hoc context...", ICRA 2021

Yes

New foods take ~7-8 actions to learn to user satisfaction.

Summary

The Promise of Physically Assistive Robotics
Robot-Assisted Feeding: User-Defined Metrics
Food Bite Acquisition as a Contextual Bandit
- Policy space reduction a priori
- Haptics as post hoc bandit context
Active Learning Through Contact
Community-Based Participatory Design
Where can PARs go from here?

AL

Bite Acquisition: The Data Problem

 E. Heiden et al, “DiSECt", RSS 2021 ; R. Feng et al, "...Generalizing skewering strategies...", ISRR 2019

Food Sim/Real gap is significant.

(this is just planar cutting)

Real data takes a lot of time.

Example: 10 trajectories x 16 food types

85 person-hours

Deep Dive on Active Tactile Exploration

Online Learning:

Given data, what is our best guess of the object's parameters and location?

Active Exploration:

Estimate how much information we have from previous data.
Choose the next action to maximize the novel information collected.

Dynamic Object System Identification

Choose:

Robot Trajectory \(r_t\)

Measure:

Find:

Object Geometry \(\theta^*\)
Object Pose \(x^*_T\)

Previous Work in Tactile SysID

Static Objects: "assume a sensor that can detect contact before causing movement" [2]

Utilizes 2D OR discrete object priors.

Spatially Sparse Data -> Active Learning

 [1] Hu et al, Biomimetic Intelligence and Robotics 2024 ; [2] Xu et al. "TANDEM3D...", ICRA 2023

Learning and Exploration Through Contact

Any gradients \(\nabla_\Theta\mathcal{L}\) have inverse Jacobian terms: \(\frac{\partial x_t}{\partial x_T}\)

Fundamental Problem:

What is the sensitivity of past measurements to the current state?

For e.g. Coulomb friction, this is ill-posed.

(Even the simpler: "what is the past state given the current state" is unanswerable)

For both learning and exploration, how do we handle this?

 E. Gordon et al, "Active Tactile Exploration...", ICRA 2026

Log-Likelihood Loss: \(\mathcal{L} := -\sum_t\log\mathbb{P}(m_t = \{c_t, \hat{n}_{m,t}\} | \Theta = \{\theta, x_T\})\)

(Past) Measurements

Geometry and (Current) State

Online Learning Through Contact Dynamics

Goal: Define a Loss Function as a Negative-Log-Likelihood

\(\mathcal{L} := -\sum_t\log\mathbb{P}(m_t = \{c_t, \hat{n}_{m,t}\} | \Theta = \{\theta, x_T\})\)

\phi

Step 1: Define a per-timestep Measurement Model

\(\mathbb{P}(\hat{n}_{m,t} | x_t(\Theta)) := \mathcal{N}(||\hat{n}_{m,t}-\hat{n}_t(x_t, \theta)||^2_2, \Sigma_n)\)

\(\mathbb{P}(c_t=0 | x_t(\Theta)) := sigmoid(\alpha\phi_t(x_t, \theta))\)

(Differential Collision)

Step 2 (The Hard Part): What is \(x_t(x_T, \theta)\) ?

(and its gradient)

 E. Gordon et al, "Active Tactile Exploration...", ICRA 2026

Online Learning Through Contact Dynamics

Option 1: DiffSim (Shooting)

Pretend \(\Theta = \{\theta, x_0\}\), then \(x_{t+1} = f_\theta(x_t)\)

MLE \(\tilde{x}_T = f^T_\theta(\tilde{x}_0)\)

Clear Problem: Unstable. Accurate \(\tilde{x}_T\) requires an accurate \(\tilde{x}_0\)

Option 2: DiffSim (Collocation) [with Prediction Loss]

Pretend \(\Theta = \{\theta, x_t\}\), add dynamics as a penalty.

\(\mathcal{L} := \sum_t-\log\mathbb{P}(m_t | x_t,\theta) + ||x_t - f_\theta(x_{t-1})||^2\)

Fundamental Problem: \(f\) could have near-0 or near-\(\infty\) gradients.

\(f_\theta(x_t) = g_\theta(x_t, \lambda_t)\)

\(\lambda_t = \min_\lambda h_\theta(x_t, \lambda)\)

 B. Bianchini et al, "Generalization Bounded...", L4DC 2022; E. Gordon et al, "Active Tactile Exploration...", ICRA 2026

Contact Forces

(Approximately) Minimizing Graph Distance

 B. Bianchini et al, "Generalization Bounded...", L4DC 2022; E. Gordon et al, "Active Tactile Exploration...", ICRA 2026

Analogy: \(y = H(x-\theta)\)

\(\theta\)

\(x\)

\(y\)

\(\mathcal{D}\)

Mean Square Error

\(\mathcal{L}_{MSE} = \sum_\mathcal{D}||y_{\mathcal{D}} - H(x_\mathcal{D}-\theta)||^2\)

MSE

Alternative: Graph Distance

\(\mathcal{L}_{GD} = \sum_\mathcal{D}\min_x||(x_{\mathcal{D}}, y_{\mathcal{D}}) - (x, H(x-\theta))||^2\)

Problem: look at the gradient w.r.t. \(\theta\).

It is 0 almost everywhere!

Trade-Off:

Pro: Loss gradient is finite (or bounded) everywhere.
Con: Potentially expensive inner optimization loop.

Learning with a Violation-Implicit Loss

Inner Opt (QP) over contact forces.

\(\mathbb{P}(m_t | x_t,\theta)\)

Active Learning with Expected Information Gain

\(\Theta\)

\(\mathcal{L}\)

\(\tilde{\Theta}\)

\(\Theta\)

\(\mathcal{L}\)

\(\tilde{\Theta}\)

Maximum Likelihood Estimate: \(\tilde{\Theta} = \arg\min_\Theta\mathcal{L}(\Theta)\)

\(\rightarrow \frac{d\mathcal{L}}{d\Theta}(\tilde{\Theta}) = 0\)

Information \(\rightarrow\) How certain am I? Ideally: answer without a strong prior.

How certain is this?

Noise Floor

Low Info

High Info

Observed Information:

\(\mathcal{I} := \sum_{m_t}\nabla_{\Theta}^2\mathcal{L}\)

Expected (Fisher) Information:

\(\mathcal{F} := \mathbb{E}_{m_t}\left[\nabla_{\Theta}^2\mathcal{L}\right]\)

\(= Var_{m_t}\left[\nabla_{\Theta}\mathcal{L}\right]\)

\(= \mathbb{E}_{m_t}\left[\nabla_{\Theta}\mathcal{L}\left(\nabla_{\Theta}\mathcal{L}\right)^T\right] \)

EIG \(:= \log\det(\mathcal{F}\mathcal{I}^{-1} + \mathbf{I})\)

 E. Gordon et al, "Active Tactile Exploration...", ICRA 2026

Expected Information Gain (EIG)

Learn; Compute

Observed Info \(\mathcal{I}\)

Sample + Simulate

Expected Fisher Info \(\mathcal{F}\)

\(\max EIG := \log\det\left(\mathcal{F}\mathcal{I}^{-1} + \mathbf{I}\right)\)

Choose actions where simulated, expected Fisher info is distinct from Observed info.

Information Maximization In Action

 E. Gordon et al, "Active Tactile Exploration...", ICRA 2026

Information Through Marginalization

Ongoing Work

Currently, to reparameterize from \(\Theta = x_t \rightarrow \Theta = x_T\), we pretend \(\frac{d x_t}{d x_T} = \mathbf{I}\). This will neglect significant dynamics. Ongoing work will address this.

Information Through Marginalization

We want \(\nabla_{x_T} \log\mathbb{P}(m_t | x_T)\)

Ongoing Work

\(=\nabla_{x_T} \log\int_{x_t}\mathbb{P}(m_t | x_t)\mathbb{P}(x_t | x_T)\)

Assume:

\(\mathbb{P}(x_t | x_T) \propto \exp(f(x_t, x_T))\)

\(\approx\nabla_{x_T} \log\sum_{x_t\sim U}\mathbb{P}(m_t | x_t)\mathbb{P}(x_t | x_T)\)

\(=softmax_{x_t\sim U}\left(\log \mathbb{P}(m_t | x_t)+ f(x_t, x_T)\right) \cdot \left(\nabla_{x_T}f(x_t, x_T)\right)\)

\(=softmax_{x_t\sim MCMC}\left(\log \mathbb{P}(m_t | x_t)\right) \cdot \left(\nabla_{x_T}f(x_t, x_T)\right)\)

No gradient through sampling

At the cost of sampling trajectories \(x_t\) (via MCMC), we bypass the inverse Jacobian.

(If we have a good guess \(\tilde{x}_t\), MCMC should be quick)

Summary

The Promise of Physically Assistive Robotics
Robot-Assisted Feeding: User-Defined Metrics
Food Bite Acquisition as a Contextual Bandit
- Policy space reduction a priori
- Haptics as post hoc bandit context
Active Learning Through Contact
Community-Based Participatory Design
Where can PARs go from here?

PAR

Community-Based Participatory Design

E. Gordon et al, "An adaptable, safe, and portable robot-assisted feeding system.", HRI Companion 2024

Community-Based Participatory Design

A. Nanavati, E. Gordon et al, "Lessons learned from designing...", HRI 2025

Summary

The Promise of Physically Assistive Robotics
Robot-Assisted Feeding: User-Defined Metrics
Food Bite Acquisition as a Contextual Bandit
Leveraging Haptic Sensing
- Model-Based Tactile Active Exploration
- Haptics as Post-Hoc Bandit Context
RAF: Community-Based Participatory Design
Where can PARs go from here?

Skills List

Bonus:

Other Leadership Activities

Responsibilities: Lead the other officers (Program Committee, Panel Committee, Sponsorship Chair, etc.); Secure Funding (NSF + AIJ grants); run the workshop day-of; coordinate travel for all attendees; Reviewer/AC of last resort.

What I didn't do:

reach out to some prospective speakers, update the website after creation

Why HRI?

Research Motivation: "positioning robots to work alongside humans in collaborative, scalable networks aimed at extending human capabilities."
- That has been a driving motivation for working on PARs.
"Do what works" mentality: model-based / physics explicit? learning-based / physics implicit? Encourages cultivating a broad knowledge base.
Keeping many of the "perks" of academia:
- Mentorship: continuously working with interns and postdocs
- Collaboration:
  - Internal: research engineers and across research teams
  - External: with both academic and industry partners
- Research Dissemination: presentation at conferences / workshops

Research Plans

Safe Active Exploration in Contact

As et al, "ActSafe...", ICLR 2025

Beneficial to play optimistically w.r.t. loss

Safe to play pessimistically w.r.t. model parameters

Leverage Fisher Information to better quantify uncertainty through contact dynamics.
Identify loss components that are safety-critical vs. performance-critical.

Dressing "Acquisition"

Kapusta et al, Autonomous Robots 2019; Jenamani et al, HRI 2024; McMurray, "Robotics... for poultry processing" (Book, 2011)

Food Preparation

Lots of Work

Bite Transfer

Some Work

Cloth Folding

Lots of Work

Sleeve Insertion

Some Work

Picking up and orienting clothes in preparation for insertion motions.

Multi-Function PARs

Can feeding / dressing / ambulation / etc. all be done with a single system?
How can information be shared between tasks?
How will users feel about having a robot for the entire day?

Thank you!

DAIR Lab

Amal Nanavati

Tractable Adaptability

Ethan K. Gordon

Postdoc, University of Pennsylvania

PhD 2023, University of Washington

Online and Active Learning for Physically Assistive Robotics

Empowering Physically Assistive Robots

Physically Assistive Robots (PARs)

What is needed for PARs?

What is needed for PARs?

Tractable Adaptability

Multimodal Active Learning

Physically Assistive Robots

The Technology/Application Cycle

Physically Assistive Robots (PARs)

Active Learning in Contact

Multimodal Active Learning

Physically Assistive Robots

Summary

The Promise of Physically Assistive Robotics

Robot-Assisted Feeding: User-Defined Metrics

Food Bite Acquisition as a Contextual Bandit

Active Learning Through Contact

Community-Based Participatory Design

Where can PARs go from here?

PAR

Do we need autonomy? What kind?

Community-Based Participatory Research

Example: Why Single-Utensil Feeding?

The Assistive Dexterous Arm (ADA)

User Studies Capture Diversity

User Studies Capture Metrics

Acceptance Can be User-Dependent

Summary

The Promise of Physically Assistive Robotics

Robot-Assisted Feeding: User-Defined Metrics

Food Bite Acquisition as a Contextual Bandit

Active Learning Through Contact

Community-Based Participatory Design

Where can PARs go from here?

PAR

AL

Data Driven Bite Acquisition

Online Learning with Policy Space Reduction

Leveraging Expert Data

Imitation Learning for Policy Space Reduction

Emergent Behavior

Is this expressive enough?

The Multi-Arm Bandit (MAB)

The (Stochastic) Contextual Bandit

Bite Acquisition as a Contextual Bandit

Bite Acquisition as a Contextual Bandit

Post Hoc Haptics for Bite Acquisition

Supervised vs. Bandit Learning

Post Hoc Haptics for Bite Acquisition

Are the 11 actions tractable for learning?

Summary

The Promise of Physically Assistive Robotics

Robot-Assisted Feeding: User-Defined Metrics

Food Bite Acquisition as a Contextual Bandit

Active Learning Through Contact

Community-Based Participatory Design

Where can PARs go from here?

AL

Bite Acquisition: The Data Problem

Deep Dive on Active Tactile Exploration

Online Learning:

Active Exploration:

Dynamic Object System Identification

Choose:

Measure:

Find:

Previous Work in Tactile SysID

Learning and Exploration Through Contact

Online Learning Through Contact Dynamics

Online Learning Through Contact Dynamics

(Approximately) Minimizing Graph Distance

Learning with a Violation-Implicit Loss

Active Learning with Expected Information Gain

Expected Information Gain (EIG)

Information Maximization In Action

Information Through Marginalization

Information Through Marginalization

Summary

The Promise of Physically Assistive Robotics

Robot-Assisted Feeding: User-Defined Metrics