Ethan Job Talk Med

Empowering Physically Assistive Robots with Contact-Rich Active Learning

Ethan K. Gordon

Postdoc, University of Pennsylvania

PhD 2023, University of Washington

Physically Assistive Robots (PARs)

Physical device that moves at least partially autonomously.

Aids in tasks that users would otherwise find impossible, uncomfortable, or inconvenient.

Direct or indirect physical contact with people and the environment.

Activities of daily living (ADLs) for those with short- or long-term physical impairments.

Rehabilitation and physical therapy.

Assisting nurses and physicians with patient care.

Physically Assistive Robots (PARs)

“If I can have a robot do it, I can learn to adapt to it, but it would be me feeding me, and that would be huge”

Tyler Schrenk

1985-2023

The Promise of PARs:

Empowerment
Independence

What is needed for PARs?

Contact-Rich Manipulation

Sliding to clean the spoon and bowl
Shaking to smoothen
In-Mouth Hand-Off
(vision-denied)

Online Adaptation

Bite Size Adjustment

What is needed for PARs?

Online Adaptation

Totally Different Food
Multi-bite: different shapes for each bite

There is no time for re-training!

Key Technology:

Tractable Adaptability

How can robots adapt at deployment time efficiently, safely, and portably?

Policy Space Reduction

Model-Based Methods

Leveraging Haptics

Support

Inform

Multimodal Active Learning

Physically Assistive Robots

Summary

The Promise of Physically Assistive Robotics
Robot-Assisted Feeding: User-Defined Metrics
Online Learning for Food Acquisition
- Policy space reduction a priori
- Haptics as post hoc data
Active Learning with Dynamic Contact
Community-Based Participatory Design
Where can PARs go from here?

Do we need autonomy? What kind?

Community-Based Participatory Research

It is important to ask users first: do observational and qualitative research before experimentation.

Time Per Bite:

Caretaker: ~20s
Preferred: <2min
Teleoperated Robot: 5-40min

Why Single-Utensil Feeding?

It's intuitive and familiar.

The Assistive Dexterous Arm (ADA)

User Studies Capture Diversity

T. Bhattacharjee, E.K. Gordon et al, “Is more autonomy always better?...", HRI 2020

Acceptance is User-Dependent, But High

Users with more limited mobility preferred greater autonomy, even when the robot made errors.

T. Bhattacharjee, E.K. Gordon et al, “Is more autonomy always better?...", HRI 2020

Autonomy Preference Given Errors

User Rating

User Studies Capture Metrics

Trade-off between autonomy (with chance of error) and high-effort manual control.

What errors are tolerable? Minimum Food Acquisition Success Rate: 80%

T. Bhattacharjee, E.K. Gordon et al, “Is more autonomy always better?...", HRI 2020

Summary

The Promise of Physically Assistive Robotics
Robot-Assisted Feeding: User-Defined Metrics
Online Learning for Food Acquisition
- Policy space reduction a priori
- Haptics as post hoc data
Active Learning with Dynamic Contact
Community-Based Participatory Design
Where can PARs go from here?

Data Driven Bite Acquisition

R. Feng, Y. Kim, G. Lee, E. K. Gordon, et al, "...Generalizing skewering strategies...", ISRR 2019

Food simulation is hard*, but we can collect real data. What if we just use machine learning?

Example: 10 trajectories x 16 food types

85 person-hours

Is it portable?
Is it safe?
Is it adaptable?

Leveraging Expert Data

 T. Bhattacharjee et al, “Towards Robotic Feeding...", RA-L 2019

Qualitative Taxonomy

Insights:

Discrete classes of strategies

Lots of variations within those classes

Emergent Discrete Action Space

Wiggling

Tilting

High Pressure

Scooping

 E. K. Gordon, A. Nanavati et al, “Towards General Single-Utensil...", CoRL 2023

Is this expressive enough?

Yes!

(Note the 80% acceptance threshold)

 E. K. Gordon, A. Nanavati et al, “Towards General Single-Utensil...", CoRL 2023

Online learning with a discrete action space is easier, safer, and more predictable for the patient.

Online Learning for Bite Acquisition

Discrete Actions $a$

Visual Context: $c$

$r$ : Success/Failure $\{1, 0\}$

 E. K. Gordon et al, “Adaptive robot-assisted feeding...", IROS 2020

Learn some parameters $\theta$

$\mathbb{P}(r=1) \approx f_\theta(c, a)$

$N_a$ is small enough that we can do this at deployment time!
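The online learning loop above can be sketched in a few lines of Python. This is only an illustration under assumptions that are not from the talk: one-hot visual contexts, a per-action ridge-regression estimate standing in for $f_\theta(c, a)$, epsilon-greedy action selection, and made-up success probabilities. The names `EpsilonGreedyBandit` and `run_demo` are hypothetical.

```python
import numpy as np

class EpsilonGreedyBandit:
    """Per-action ridge-regression estimate of P(r=1 | c, a), epsilon-greedy selection."""

    def __init__(self, n_actions, dim_c, epsilon=0.1, ridge=1.0, seed=0):
        self.rng = np.random.default_rng(seed)
        self.epsilon = epsilon
        # One linear model per discrete action: A_a theta_a = b_a
        self.A = np.stack([ridge * np.eye(dim_c) for _ in range(n_actions)])
        self.b = np.zeros((n_actions, dim_c))

    def predict(self, c):
        # Solve all per-action systems at once; result has shape (n_actions, dim_c)
        theta = np.linalg.solve(self.A, self.b[..., None])[..., 0]
        return theta @ c

    def select(self, c):
        if self.rng.random() < self.epsilon:
            return int(self.rng.integers(len(self.b)))  # explore
        return int(np.argmax(self.predict(c)))          # exploit

    def update(self, c, a, r):
        # Bandit feedback: only the model of the action taken is updated
        self.A[a] += np.outer(c, c)
        self.b[a] += r * c

def run_demo(n_steps=3000, seed=0):
    # Made-up ground truth: rows are food contexts, columns are actions
    true_p = np.array([[0.9, 0.1, 0.5], [0.2, 0.8, 0.4]])
    rng = np.random.default_rng(seed)
    bandit = EpsilonGreedyBandit(n_actions=3, dim_c=2, seed=seed + 1)
    for _ in range(n_steps):
        food = rng.integers(2)
        c = np.eye(2)[food]                       # observe context
        a = bandit.select(c)                      # select action
        r = float(rng.random() < true_p[food, a]) # observe success/failure
        bandit.update(c, a, r)                    # update the policy
    return bandit

bandit = run_demo()
```

Because the number of discrete actions is small, each update only touches one small linear system, which is what keeps this kind of learning cheap enough to run at deployment time.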

$r_a + \epsilon = f_{\theta^*}(c) = g_{\phi^*}(p)$

Leveraging Haptic Data

Haptic data is really good for food classification, and we already have a haptic sensor on board for safety!

55ms of force data:

T. Bhattacharjee et al, RA-L 2019; E. Gordon et al, "Leveraging post hoc context...", ICRA 2021

Post Hoc Haptics for Bite Acquisition

E. Gordon et al, "Leveraging post hoc context...", ICRA 2021

Consider a joint loss model:

$r_a + \epsilon = f_{\theta^*}(c) = g_{\phi^*}(p)$

$\mathbb{P}(r = 1) = f_\theta(c) = g_\phi(p)$

visual context

haptic context

$f_\theta(c, a) = [0.9, 0.1, 0.5, 0.8, \ldots]$

Once either model is learned, the complexity of the other one is significantly reduced:

Example:

After only 1 action, robot determines that kiwi $\approx$ banana, and can impute the counterfactual.

Observe $a=1 \rightarrow [0, ?, ?, ?, \ldots]$

$g_\phi(p, a) = [0.8, 0.2, 0.4, 0.7, \ldots]$

Can provide the counterfactual

$O(\dim c) \rightarrow O(\min(\dim c,\dim p))$
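The counterfactual imputation in the example above can be sketched as follows. This is a toy illustration, not the method from the cited paper: the haptic model is a hypothetical nearest-prototype lookup, the prototype features are invented, and the visual model is a plain running-average table. Only the vector $[0.8, 0.2, 0.4, 0.7]$ is taken from the slide.

```python
import numpy as np

# Hypothetical, previously learned haptic model: one row of per-action
# success probabilities for each known haptic prototype (values are illustrative).
HAPTIC_PROTOTYPES = np.array([[0.2, 1.0], [1.5, 0.3]])  # (n_prototypes, dim_p)
G_TABLE = np.array([[0.8, 0.2, 0.4, 0.7],               # g_phi(p, a), prototype 0
                    [0.1, 0.9, 0.6, 0.3]])              # g_phi(p, a), prototype 1

def g_phi(p):
    """Imputed success probability of every action given haptic context p."""
    nearest = np.argmin(np.linalg.norm(HAPTIC_PROTOTYPES - p, axis=1))
    return G_TABLE[nearest]

def bandit_update(f_table, counts, food, action, reward):
    """Bandit feedback: only the entry for the action taken is updated."""
    counts[food, action] += 1
    f_table[food, action] += (reward - f_table[food, action]) / counts[food, action]

def post_hoc_update(f_table, counts, food, action, reward, p):
    """Post hoc feedback: observed reward for the action taken, imputed values elsewhere."""
    imputed = g_phi(p).copy()
    imputed[action] = reward
    counts[food] += 1
    f_table[food] += (imputed - f_table[food]) / counts[food]

f_table = np.zeros((2, 4))   # running estimate of f_theta(c, a) per food and action
counts = np.zeros((2, 4))

# New food 0: the first action fails, but the haptic reading fills in the other actions
post_hoc_update(f_table, counts, food=0, action=0, reward=0.0, p=np.array([0.25, 0.9]))
# New food 1: plain bandit feedback leaves the untried actions unknown
bandit_update(f_table, counts, food=1, action=2, reward=1.0)
```

After a single attempt, the row for food 0 already holds an estimate for every action, while the row for food 1 has information about one action only.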

Are the 11 actions tractable for learning?

E. Gordon et al, "Leveraging post hoc context...", ICRA 2021

Yes

New foods take ~7-8 actions to learn to user satisfaction.

Summary

The Promise of Physically Assistive Robotics
Robot-Assisted Feeding: User-Defined Metrics
Online Learning for Food Acquisition
- Policy space reduction a priori
- Haptics as post hoc data
Active Learning with Dynamic Contact
Community-Based Participatory Design
Where can PARs go from here?

Active Tactile Exploration Through Contact

[1] Kapusta et al, Autonomous Robots 2019; [2] Hello Robot; [3] Kuka

Dressing

Rehabilitation

How can we perform active learning with:

Dynamic Objects?

Visual Occlusions (and other uncertainties)?

Distilled Challenge: Can a robot do this?

Can we do better if we have tactile sensors and robust simulators?

Surgery

System Identification and Measuring Uncertainty

Choose:

Robot Trajectory

Measure:

Find:

Object Geometry and Pose
How certain are we?

Active Exploration with the Trifinger Robot

Exploration to Maximize Information

Learn; Compute

Observed Information $\mathcal{I}$

Sample Actions + Simulate

Expected Future Information

Choose actions where simulated, expected future info is distinct from observed info.

Information Maximization In Action

 E. Gordon et al, "Active Tactile Exploration...", ICRA 2026

Summary

The Promise of Physically Assistive Robotics
Robot-Assisted Feeding: User-Defined Metrics
Online Learning for Food Acquisition
- Policy space reduction a priori
- Haptics as post hoc data
Active Learning with Dynamic Contact
Community-Based Participatory Design
Where can PARs go from here?

Community-Based Participatory Design

E. Gordon et al, "An adaptable, safe, and portable robot-assisted feeding system.", HRI Companion 2024

Community-Based Participatory Design

A. Nanavati, E. Gordon et al, "Lessons learned from designing...", HRI 2025

Summary

The Promise of Physically Assistive Robotics
Robot-Assisted Feeding: User-Defined Metrics
Food Bite Acquisition as a Contextual Bandit
- Policy space reduction a priori
- Haptics as post hoc bandit context
Active Learning with Dynamic Contact
Community-Based Participatory Design
Where can PARs go from here?

Challenges in Feeding and Beyond

Nanavati, Alves-Oliveira, Schrenk, Gordon, et al., HRI 2023

Challenges in Feeding and Beyond

Many are relevant across multiple tasks!

Dressing

Grooming

Rehabilitation

Safe Active Exploration

As et al, "ActSafe...", ICLR 2025

Beneficial to play optimistically w.r.t. loss

Safer to play pessimistically w.r.t. model parameters

Which loss components are:

Safety-Critical

(zero user error tolerance)

vs.

Performance-Critical

(higher user error tolerance)

Adjust play for each metric separately.

Leveraging Foundation Models

"VLAs": ChatGPT for Robots. Impressive general performance.
I would not deploy these models directly with patients right now.
Is it portable? No: internet access or large computers required.
Is it safe? Not guaranteed.
Is it adaptable? Not at deployment time (but can be fine-tuned in advance).

π0.5; Kapusta et al, Autonomous Robots 2019

Multi-Function Longitudinal Studies

Can feeding / dressing / ambulation / physical therapy / etc. all be done with a single system or connected ensemble?
How can information be shared between tasks?
How will users feel about having a robot 24/7 for weeks or months?

Thank you!

DAIR Lab

Amal Nanavati

Online Learning with Policy Space Reduction

Hierarchy and Bandits

Data Driven Bite Acquisition

 E. Heiden et al, “DiSECt", RSS 2021

(only planar cutting)

Imitation Learning for Policy Space Reduction

 E. K. Gordon, A. Nanavati et al, “Towards General Single-Utensil...", CoRL 2023

Splines and force/torque thresholds.

Comparable with Euclidean Metric.

The Multi-Arm Bandit (MAB)

$r \sim$

$\mathcal{N}(0.5, 1)$

Reward

$\mathcal{N}(0.8, 1)$

$\mathcal{N}(0.1, 1)$

Interaction Protocol:

Select $a = \pi(a_{0\ldots t}, r_{0\ldots t})$
Observe $r$
Update $\pi$

$a =$

$1$

$2$

$3$

Metric: Regret: $\mathbb{E}[r(a^*) - r(a_t)]$

Test-time metric; balances exploration vs. exploitation; often theoretically bounded.
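The interaction protocol and regret metric above can be illustrated with a small simulation of the three Gaussian arms shown on the slide. The choice of UCB1 as the policy $\pi$ is an assumption made here for illustration (the slide does not name an algorithm), and the function name `ucb1` is hypothetical.

```python
import numpy as np

def ucb1(means, n_steps=5000, seed=0):
    """Run UCB1 on unit-variance Gaussian arms; return pull counts and pseudo-regret."""
    rng = np.random.default_rng(seed)
    n_arms = len(means)
    counts = np.zeros(n_arms)
    sums = np.zeros(n_arms)
    regret = 0.0
    for t in range(n_steps):
        if t < n_arms:
            a = t  # pull every arm once to initialize the estimates
        else:
            # Optimism: empirical mean plus an exploration bonus that shrinks with pulls
            bonus = np.sqrt(2.0 * np.log(t + 1) / counts)
            a = int(np.argmax(sums / counts + bonus))
        r = rng.normal(means[a], 1.0)       # observe reward
        counts[a] += 1                      # update the policy's statistics
        sums[a] += r
        regret += max(means) - means[a]     # expected reward lost versus the best arm
    return counts, regret

# Arm means taken from the slide: N(0.5, 1), N(0.8, 1), N(0.1, 1)
counts, regret = ucb1([0.5, 0.8, 0.1])
```

The exploration bonus forces occasional pulls of under-sampled arms, while the empirical mean steers most pulls toward the arm with mean 0.8.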

The (Stochastic) Contextual Bandit

$r \sim$

$\mathcal{N}(\mu_1(c), \sigma)$

Reward

Interaction Protocol:

Observe $c_t$
Select $a_t = \pi(c_t)$
Observe $r(a_t, c_t)$
Update $\pi$

$a =$

$1$

$2$

$3$

$\mathcal{N}(\mu_2(c), \sigma)$

$\mathcal{N}(\mu_3(c), \sigma)$

$c_t$

Supervised vs. Bandit Learning

$l_a(c) = [0, 1, 1, 1, 1, 1, 1, 1, 1, 1]$

Supervised Learning sees $l_a(c) \forall a$

Full Feedback

$c$

Bandit Algorithm sees $l_{a_t}(c)$

Bandit Feedback (Harder)

No counterfactual.

E. Gordon et al, "Leveraging post hoc context...", ICRA 2021

Regret: $O(\sqrt{T})$

Regret: $O(\dim a \cdot \dim c \cdot \sqrt{T})$

Previous Work in Tactile SysID

Static Objects: "assume a sensor that can detect contact before causing movement" [2]

Utilizes 2D OR discrete object priors.

Spatially Sparse Data -> Active Learning

 [1] Hu et al, Biomimetic Intelligence and Robotics 2024 ; [2] Xu et al. "TANDEM3D...", ICRA 2023

Online Learning Through Contact

Problem Formulation

 E.K. Gordon et al, "Active Tactile Exploration...", ICRA 2026

$m_t$

$\ldots$

$x_T$ ?

Example: we measure contact at $t$ .

(Learning) Where is the object at $t=T$ ?

(Information) How certain are we?

Measurement Model: $\mathbb{P}(m_t | \theta, x_t)$

Dynamics: $x_{t+1} = f(\theta, x_t)$

$\theta$ ?

Online Learning

 E.K. Gordon et al, "Active Tactile Exploration...", ICRA 2026

Maximum (Log) Likelihood as Trajectory Optimization

$\mathbb{P}(m_t | x_T) \rightarrow \mathbb{P}(m_t | \tau=[x_0,\ldots,x_T])\mathbb{P}(\tau)$

Loss $\mathcal{L} := -\sum_t\log\mathbb{P}(m_t | x_t) + ||x_t - f(x_{t-1})||^2$

Key Difficulty: contact dynamics $f$ often have near-0 or near- $\infty$ gradients.
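As a rough illustration of the loss above, the sketch below evaluates it for a one-dimensional trajectory. The Gaussian measurement model, the noise level `meas_sigma`, and the constant-velocity dynamics are all assumptions chosen for simplicity; real contact dynamics would replace `dynamics`, and that is where the gradient difficulty arises.

```python
import numpy as np

def trajectory_loss(traj, measurements, dynamics, meas_sigma=0.1):
    """Negative log-likelihood of the measurements plus squared dynamics residuals."""
    traj = np.asarray(traj, dtype=float)
    measurements = np.asarray(measurements, dtype=float)
    # Assumed Gaussian measurement model m_t ~ N(x_t, meas_sigma^2), constants dropped
    nll = 0.5 * np.sum((measurements - traj) ** 2) / meas_sigma**2
    # Penalize states that disagree with the dynamics x_t = f(x_{t-1})
    residual = traj[1:] - dynamics(traj[:-1])
    return float(nll + np.sum(residual**2))

def constant_velocity(x):
    """Toy smooth dynamics, used only for illustration."""
    return x + 0.1

true_traj = 0.1 * np.arange(5)
loss_true = trajectory_loss(true_traj, true_traj, constant_velocity)

# A trajectory that matches neither the measurements nor the dynamics scores worse
perturbed = true_traj + np.array([0.0, 0.05, -0.05, 0.0, 0.1])
loss_perturbed = trajectory_loss(perturbed, true_traj, constant_velocity)
```

Minimizing this loss over the whole trajectory is what turns maximum likelihood estimation into a trajectory optimization problem.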

(Approximately) Minimizing Graph Distance

 B. Bianchini et al, "Generalization Bounded...", L4DC 2022; E.K. Gordon et al, "Active Tactile Exploration...", ICRA 2026

Analogy: $y = H(x-\theta)$

$\theta$

$x$

$y$

$\mathcal{D}$

Mean Square Error

$\mathcal{L}_{MSE} = \sum_\mathcal{D}||y_{\mathcal{D}} - H(x_\mathcal{D}-\theta)||^2$

MSE

Alternative: Graph Distance

$\mathcal{L}_{GD} = \sum_\mathcal{D}\min_x||(x_{\mathcal{D}}, y_{\mathcal{D}}) - (x, H(x-\theta))||^2$

Problem: look at $\nabla_\theta \mathcal{L}_{MSE}$ .

It is 0 or undefined everywhere!

Trade-Off:

Pro: Loss gradient is finite (or bounded) almost everywhere!
Con: Potentially expensive inner optimization loop.
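The trade-off above can be checked numerically on the step-function analogy $y = H(x-\theta)$. In this sketch the data points and the initial guess $\theta = 0.8$ are made up, and the inner minimization of the graph distance is done in closed form because the graph of a step function is just two horizontal branches; for a general model it would be an optimization loop.

```python
import numpy as np

def step_model(x, theta):
    return (x >= theta).astype(float)

def mse_loss(theta, xs, ys):
    return float(np.sum((ys - step_model(xs, theta)) ** 2))

def graph_distance_loss(theta, xs, ys):
    """Squared distance from each data point to the graph of y = H(x - theta)."""
    # Distance to the lower branch (y = 0, x <= theta) and the upper branch (y = 1, x >= theta)
    d_lower = np.maximum(xs - theta, 0.0) ** 2 + ys**2
    d_upper = np.maximum(theta - xs, 0.0) ** 2 + (ys - 1.0) ** 2
    return float(np.sum(np.minimum(d_lower, d_upper)))

def finite_diff(loss, theta, xs, ys, h=1e-4):
    """Central finite-difference estimate of d(loss)/d(theta)."""
    return (loss(theta + h, xs, ys) - loss(theta - h, xs, ys)) / (2 * h)

# Data generated by a step at theta = 0; the current guess theta = 0.8 is wrong
xs = np.array([-1.0, -0.5, 0.5, 1.0])
ys = np.array([0.0, 0.0, 1.0, 1.0])
theta_guess = 0.8

grad_mse = finite_diff(mse_loss, theta_guess, xs, ys)
grad_gd = finite_diff(graph_distance_loss, theta_guess, xs, ys)
```

Here the MSE gradient is zero even though one point is misclassified, so gradient descent gets no signal. The graph-distance gradient is positive, which pushes $\theta$ down toward the misclassified point at $x = 0.5$.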

Quantifying Information Without a Prior

$\Theta$

$\mathcal{L}$

$\tilde{\Theta}$

$\Theta$

$\mathcal{L}$

$\tilde{\Theta}$

Maximum Likelihood Estimate: $\tilde{\Theta} = \arg\min_\Theta\mathcal{L}(\Theta)$

$\rightarrow \frac{d\mathcal{L}}{d\Theta}(\tilde{\Theta}) = 0$

Information $\rightarrow$ How certain am I? Ideally: answer without a strong prior.

How certain is this?

Noise Floor

Low Info

High Info

Past (Observed) Information:

$\mathcal{I} := \sum_{m_t}\nabla_{\Theta}\mathcal{L}\left(\nabla_{\Theta}\mathcal{L}\right)^T$

Future (Fisher) Information:

$\mathcal{F} := Var_{m_t}\left[\nabla_{\Theta}\mathcal{L}\right]$

$= \mathbb{E}_{m_t}\left[\nabla_{\Theta}\mathcal{L}\left(\nabla_{\Theta}\mathcal{L}\right)^T\right]$

Expected Information Gain (EIG) $:= \log\det(\mathcal{F}\mathcal{I}^{-1} + \mathbf{I})$
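The three quantities above translate directly into a few NumPy calls. In this sketch the gradient arrays are assumed to come from elsewhere (a learned loss and a simulator), the small regularizer `reg` is an added assumption so that the inverse exists when some direction has not been observed yet, and the two candidate actions are invented for illustration.

```python
import numpy as np

def observed_information(grads):
    """Sum of outer products of per-measurement loss gradients: (n, d) -> (d, d)."""
    grads = np.asarray(grads, dtype=float)
    return grads.T @ grads

def fisher_information(sampled_grads):
    """Monte Carlo estimate of E[g g^T] over simulated future measurements."""
    sampled_grads = np.asarray(sampled_grads, dtype=float)
    return sampled_grads.T @ sampled_grads / len(sampled_grads)

def expected_information_gain(fisher, observed, reg=1e-6):
    """EIG = log det(F I^{-1} + Identity), with a small regularizer on I."""
    d = observed.shape[0]
    ratio = fisher @ np.linalg.inv(observed + reg * np.eye(d))
    _, logdet = np.linalg.slogdet(ratio + np.eye(d))
    return float(logdet)

# Past data is already informative about parameter 0, much less about parameter 1
observed = observed_information([[10.0, 0.0], [0.0, 1.0]])

# Candidate action A would add information about parameter 0, candidate B about parameter 1
fisher_a = np.diag([10.0, 0.0])
fisher_b = np.diag([0.0, 10.0])
eig_a = expected_information_gain(fisher_a, observed)
eig_b = expected_information_gain(fisher_b, observed)
```

Candidate B scores higher because it targets the direction where the observed information is low, which matches the idea of choosing actions whose expected future information is distinct from what has already been observed.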

 E.K. Gordon et al, "Active Tactile Exploration...", ICRA 2026

Computing Observed Information

Computing Fisher Information

Key Difficulty: Backwards simulation isn't well-defined for Coulomb frictional contact.

 E.K. Gordon et al, "Active Tactile Exploration...", ICRA 2026

"Quasi-Static" Solution: pretend $\frac{\partial x_t}{\partial x_{t+1}} = \frac{\partial x_{t+1}}{\partial x_{t}} = \mathbf{I}$

$\mathcal{I} = \sum_t \mathbb{E}_{m_t}[(\nabla_{x_t}\mathbb{P}(m_t|x_t))^2]$

Pro: Easy to compute

For Gaussian measurement model, no sampling required for $\mathbb{E}$

$\mathcal{I} := Var_{m_t}\left[\nabla_{(\theta, x_T)}\mathcal{L}\right] = Var_{m_t}\left[\nabla_{(\theta, \tau)}\mathcal{L}\nabla_{x_T}\tau\right]$

Recall: our loss optimizes the entire trajectory $\tau$, but we want information about $x_T$.

$\frac{\partial x_t}{\partial x_{t+1}}$

Gradient of "backwards simulation"

Information Through Marginalization

Ongoing Work

"Quasi-Static" Solution won't work for more dynamic systems.

Can we do better? Yes, through marginalization.

We want $\nabla_{x_T} \log\mathbb{P}(m_0 | x_T)$

$=\nabla_{x_T} \log\int_{\tau}\mathbb{P}(m_0 | \tau)\mathbb{P}(\tau| x_T)$

$\approx softmax_{\tau\sim MCMC}\left(\log \mathbb{P}(m_0 | x_0)\right) \cdot \left(\nabla_{x_T}||f(x_{T-1}) - x_T||_2^2\right)$

Con: We introduce sampling. However...

Pros:

No gradient of $f$ required
For sampling: we already have a good guess of $\tau$ from the learning algorithm

Journal extension in the works...

Information Through Marginalization

We want $\nabla_{x_T} \log\mathbb{P}(m_t | x_T)$

Ongoing Work

$=\nabla_{x_T} \log\int_{x_t}\mathbb{P}(m_t | x_t)\mathbb{P}(x_t | x_T)$

Assume:

$\mathbb{P}(x_t | x_T) \propto \exp(f(x_t, x_T))$

$\approx\nabla_{x_T} \log\sum_{x_t\sim U}\mathbb{P}(m_t | x_t)\mathbb{P}(x_t | x_T)$

$=softmax_{x_t\sim U}\left(\log \mathbb{P}(m_t | x_t)+ f(x_t, x_T)\right) \cdot \left(\nabla_{x_T}f(x_t, x_T)\right)$

$=softmax_{x_t\sim MCMC}\left(\log \mathbb{P}(m_t | x_t)\right) \cdot \left(\nabla_{x_T}f(x_t, x_T)\right)$

No gradient through sampling

At the cost of sampling trajectories $x_t$ (via MCMC), we bypass the inverse Jacobian.

(If we have a good guess $\tilde{x}_t$ , MCMC should be quick)
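The softmax identity in the derivation above can be checked numerically in one dimension. Everything specific in this sketch is an assumption for illustration: the samples $x_t$ are a fixed uniform grid rather than MCMC draws, the measurement model is Gaussian, and $f(x_t, x_T)$ is a hypothetical quadratic rather than a contact-dynamics residual.

```python
import numpy as np

def _log_terms(x_T, samples, m, meas_sigma, dyn_sigma):
    log_p = -0.5 * ((m - samples) / meas_sigma) ** 2   # log P(m_t | x_t), constants dropped
    f = -0.5 * ((samples - x_T) / dyn_sigma) ** 2      # assumed f(x_t, x_T)
    return log_p + f

def log_marginal(x_T, samples, m, meas_sigma=0.5, dyn_sigma=1.0):
    """log sum_i P(m | x_i) exp(f(x_i, x_T)) over fixed samples, via log-sum-exp."""
    z = _log_terms(x_T, samples, m, meas_sigma, dyn_sigma)
    return float(z.max() + np.log(np.sum(np.exp(z - z.max()))))

def softmax_gradient(x_T, samples, m, meas_sigma=0.5, dyn_sigma=1.0):
    """Softmax-weighted average of grad_{x_T} f; no gradient flows through the samples."""
    z = _log_terms(x_T, samples, m, meas_sigma, dyn_sigma)
    w = np.exp(z - z.max())
    w /= w.sum()
    grad_f = (samples - x_T) / dyn_sigma**2            # d f / d x_T for the quadratic f
    return float(w @ grad_f)

samples = np.linspace(-3.0, 3.0, 61)   # fixed uniform samples of x_t
m, x_T, h = 0.4, 1.0, 1e-5

analytic = softmax_gradient(x_T, samples, m)
numeric = (log_marginal(x_T + h, samples, m) - log_marginal(x_T - h, samples, m)) / (2 * h)
```

The two values agree, and the estimator only ever differentiates $f$ with respect to $x_T$, never through the sampling step or an inverse Jacobian.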

Assistive Direct pHRI

Large communities in HRI and Contact-Rich Manipulation
The overlap was much smaller.

Leveraging Model-Based Methods

Dataset $\mathcal{D}$

Perception

Learned Model + Policy

$\mathcal{L}$

VLA

Run Classical Techniques on Learned Model:

Information Quantification (Observed and $\mathbb{E}_\pi$ )
Generate / efficiently sample fine-tuning sim data.
Offline MPC when compute or connectivity are limited (portability).

Uncertainty Quantification

Images

$\mathcal{L}$

Implicit approach, e.g. look at action entropy [1]
Alternative: try to learn that uncertainty explicitly.
Used privileged simulator information to compute observed info $\approx$ uncertainty.

States

Observed Information:

$\mathcal{I} := \sum_{m_t}\nabla_{\Theta}\mathcal{L}\left(\nabla_{\Theta}\mathcal{L}\right)^T$

VLA

[1] Yang et al, "Uncertainty-aware Observation Reinjection...", preprint
