Tractable Adaptability

Ethan K. Gordon

Postdoc, University of Pennsylvania

PhD 2023, University of Washington

Online and Active Learning for Physically Assistive Robotics

Physically Assistive Robots (PARs)

“If I can have a robot do it, I can learn to adapt to it, but it would be me feeding me, and that would be huge”

 

Tyler Schrenk

1985-2023

The Promise of PARs:

  • Empowerment
  • Independence

What is needed for PARs?

Contact-Rich Manipulation

  • Sliding to clean the spoon and bowl
  • Shaking to smoothen
  • In-Mouth Hand-Off
    (vision-denied)

 

Online Adaptation

  • Bite Size Adjustment

What is needed for PARs?

Online Adaptation

  • Totally Different Food
  • Multi-bite: different shapes for each bite

 

There is no time for re-training!

Tractable Adaptability

How can robots efficiently learn, during deployment, how to manipulate previously-unseen objects?

Policy Space Reduction
Model-Based Methods
Leveraging Haptics

The Technology/Application Cycle

(Diagram: a cycle linking Physically Assistive Robots (PARs) with Active Learning in Contact and Multimodal Active Learning; the arrows are labeled Support and Inform.)

Summary

  • The Promise of Physically Assistive Robotics

  • Robot-Assisted Feeding: User-Defined Metrics

  • Food Bite Acquisition as a Contextual Bandit

  • Leveraging Haptic Sensing

    • Model-Based Tactile Active Exploration
    • Haptics as Post-Hoc Bandit Context
  • RAF: Community-Based Participatory Design

  • Where can PARs go from here?

Do we need autonomy? What kind?

Community-Based Participatory Research

It is important to ask users first: do observational and qualitative research before experimentation.

Time Per Bite:

  • Caretaker: ~20s
  • Preferred: <2min
  • Teleoperated Robot: 5-40min

 

Example: Why Single-Utensil Feeding?

It's intuitive and familiar.

Obi:

The Assistive Dexterous Arm (ADA)

User Studies Capture Diversity

T. Bhattacharjee, E.K. Gordon et al, "Is more autonomy always better?...", HRI 2020

User Studies Capture Metrics 

Trade-off between autonomy (with chance of error) and high-effort manual control.

What errors are tolerable?

T. Bhattacharjee, E.K. Gordon et al, "Is more autonomy always better?...", HRI 2020

Acceptance Can be User-Dependent

Users with more limited mobility had a preference for greater autonomy

And greater tolerance for errors.

 

But the average was still high, e.g.,

Desired Food Acquisition Success Rate:

80%

T. Bhattacharjee, E.K. Gordon et al, "Is more autonomy always better?...", HRI 2020


Bite Acquisition: A Data Problem

E. Heiden et al, "DiSECt", RSS 2021; R. Feng et al, "...Generalizing skewering strategies...", ISRR 2019

Food simulation is not quite there.

(this is just planar cutting)

Real data takes a lot of time.

Example: 10 trajectories x 16 food types

85 person-hours

Case for online learning:

Data collection for food is uniquely expensive.

Large models limit portability.

Desire life-long deployment.

 

Policy Space Reduction

Hierarchy and Bandits

*(especially pre-2023)

Leveraging Expert Data

T. Bhattacharjee et al, "Towards Robotic Feeding...", RA-L 2019

Qualitative Taxonomy

Insights:

Discrete classes of strategies

Lots of variations within those classes

Imitation Learning for Policy Space Reduction

E. Gordon, A. Nanavati et al, "Towards General Single-Utensil...", CoRL 2023

Splines and force/torque thresholds.

Comparable with Euclidean Metric.

Emergent Behavior

Wiggling

Tilting

High Pressure

Scooping

E. Gordon, A. Nanavati et al, "Towards General Single-Utensil...", CoRL 2023

Is this expressive enough?

Yes!

(Note the 80% acceptance threshold)

E. Gordon, A. Nanavati et al, "Towards General Single-Utensil...", CoRL 2023

Online learning with a discrete action space maps cleanly onto the contextual bandit setting.

The Multi-Armed Bandit (MAB)

Loss \(l \sim \mathcal{N}(0.5, 1)\), \(\mathcal{N}(0.8, 1)\), \(\mathcal{N}(0.1, 1)\) for arms \(a = 1, 2, 3\)

Interaction Protocol:

  • Select \(a_t = \pi(a_{0\ldots t-1}, l_{0\ldots t-1})\)
  • Observe \(l_t\)
  • Update \(\pi\)

Metric: Regret: \(\mathbb{E}[l(a_t) - l(a^*)]\)

Test time metric, balances exploration vs. exploitation,

often theoretically bounded
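The interaction protocol above can be sketched in a few lines. This is a minimal illustration, not the method used in the cited work: it assumes an epsilon-greedy policy (a swapped-in, simple choice), reuses the three loss means from the slide, and accumulates the expected loss gap between the chosen arm and the best arm. The names `run_bandit`, `horizon`, and `epsilon` are hypothetical.

```python
import numpy as np

def run_bandit(means, horizon=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy on a Gaussian bandit with unit-variance losses.

    Returns the cumulative expected regret and the pull count of each arm.
    """
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)
    n_arms = len(means)
    counts = np.zeros(n_arms)
    loss_sums = np.zeros(n_arms)
    regret = 0.0
    for t in range(horizon):
        if t < n_arms:
            a = t                                    # try every arm once
        elif rng.random() < epsilon:
            a = int(rng.integers(n_arms))            # explore
        else:
            a = int(np.argmin(loss_sums / counts))   # exploit lowest mean loss
        loss = rng.normal(means[a], 1.0)             # observe l ~ N(mu_a, 1)
        counts[a] += 1                               # update the policy statistics
        loss_sums[a] += loss
        regret += means[a] - means.min()             # expected gap to the best arm
    return regret, counts

regret, counts = run_bandit([0.5, 0.8, 0.1])
print(f"cumulative regret: {regret:.1f}, pulls per arm: {counts}")
```

A uniformly random policy would accumulate roughly 0.37 regret per round on these means, so anything well below that rate indicates the policy is exploiting what it has learned.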

The (Stochastic) Contextual Bandit

Loss \(l \sim \mathcal{N}(\mu_a(c_t), \sigma)\) for arms \(a = 1, 2, 3\)

Interaction Protocol:

  • Observe \(c_t\)
  • Select \(a_t = \pi(c_t)\)
  • Observe \(l(a_t, c_t)\)
  • Update \(\pi\)
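As a rough sketch of the contextual protocol, the loop below fits one ridge regression per arm and selects arms epsilon-greedily. Everything here is an illustrative assumption rather than the algorithm from the cited papers: the class name `LinearEpsGreedy`, the one-hot contexts, the linear form of \(\mu_a(c)\), and the values in `true_mu` are all hypothetical.

```python
import numpy as np

class LinearEpsGreedy:
    """Per-arm ridge regression of loss on context, epsilon-greedy selection."""

    def __init__(self, n_arms, dim, epsilon=0.1, reg=1.0, seed=0):
        self.rng = np.random.default_rng(seed)
        self.epsilon = epsilon
        self.A = np.stack([reg * np.eye(dim) for _ in range(n_arms)])  # sums of c c^T
        self.b = np.zeros((n_arms, dim))                               # sums of loss * c

    def predict(self, c):
        # Predicted mean loss mu_a(c) for every arm.
        theta = np.linalg.solve(self.A, self.b[..., None])[..., 0]
        return theta @ c

    def select(self, c):
        if self.rng.random() < self.epsilon:
            return int(self.rng.integers(len(self.b)))   # explore
        return int(np.argmin(self.predict(c)))           # exploit

    def update(self, c, a, loss):
        self.A[a] += np.outer(c, c)
        self.b[a] += loss * c

# Hypothetical environment: 3 arms, 2 one-hot contexts, mu_a(c) = true_mu[a] @ c.
true_mu = np.array([[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]])
env_rng = np.random.default_rng(1)
learner = LinearEpsGreedy(n_arms=3, dim=2)
for t in range(1000):
    c = np.eye(2)[env_rng.integers(2)]            # observe c_t
    a = learner.select(c)                         # select a_t = pi(c_t)
    loss = env_rng.normal(true_mu[a] @ c, 0.1)    # observe l(a_t, c_t)
    learner.update(c, a, loss)                    # update pi

best_arms = [int(np.argmin(learner.predict(e))) for e in np.eye(2)]
print("preferred arm per context:", best_arms)
```

The point of the sketch is that the best arm changes with the context, which a context-free bandit cannot represent.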

Bite Acquisition as a Contextual Bandit

  • \(a\): Discrete Actions (Control Policy)
  • \(c_t\): Visual Context (Eye-in-Hand RGBD, SegmentAnything, ResNet Features)
  • \(l\): Success/Failure \(\{0, 1\}\)

Action Space Trade-Off:

  • Expressive enough for general success
  • Not too big for tractable learning

E. Gordon et al, "Adaptive robot-assisted feeding...", IROS 2020


Deep Dive on Active Tactile Exploration

Online Learning:

  • Given data, what is our best guess of the object's parameters and location?

Active Exploration:

  • Estimate how much information we have from previous data.

  • Choose the next action to maximize the novel information collected.

Dynamic Object System Identification

Choose:

  • Robot Trajectory \(r_t\)

Measure:

  • \(m_t = \{c_t, \hat{n}_{m,t}\}\)

Find:

  • Object Geometry \(\theta^*\)

  • Object Pose \(x^*_T\)

Previous Work in Tactile SysID

Static Objects: "assume a sensor that can detect contact before causing movement" [2]

Utilizes 2D OR discrete object priors.

Spatially Sparse Data -> Active Learning

 [1] Hu et al, Biomimetic Intelligence and Robotics 2024 ; [2] Xu et al. "TANDEM3D...", ICRA 2023 

Learning and Exploration Through Contact

Any gradients \(\nabla_\Theta\mathcal{L}\) have inverse Jacobian terms: \(\frac{\partial x_t}{\partial x_T}\)

Fundamental Problem:

What is the sensitivity of past measurements to the current state?

For Coulomb friction, for example, this is ill-posed.

(Even the simpler: "what is the past state given the current state" is unanswerable)

For both learning and exploration, how do we handle this?

 E. Gordon et al, "Active Tactile Exploration...", ICRA 2026

Log-Likelihood Loss: \(\mathcal{L} := -\sum_t\log\mathbb{P}(m_t = \{c_t, \hat{n}_{m,t}\} | \Theta = \{\theta, x_T\})\)

where \(m_t\) are the (past) measurements and \(\Theta\) is the geometry and (current) state.

Online Learning Through Contact Dynamics

Goal: Define a Loss Function as a Negative-Log-Likelihood

\(\mathcal{L} := -\sum_t\log\mathbb{P}(m_t = \{c_t, \hat{n}_{m,t}\} | \Theta = \{\theta, x_T\})\)

Step 1: Define a per-timestep Measurement Model

\(\mathbb{P}(\hat{n}_{m,t} | x_t(\Theta)) := \mathcal{N}(||\hat{n}_{m,t}-\hat{n}_t(x_t, \theta)||^2_2, \Sigma_n)\)

\(\mathbb{P}(c_t=0 | x_t(\Theta)) := sigmoid(\alpha\phi_t(x_t, \theta))\)

(Differential Collision)

Step 2 (The Hard Part): What is \(x_t(x_T, \theta)\) ?

(and its gradient)

 E. Gordon et al, "Active Tactile Exploration...", ICRA 2026

Online Learning Through Contact Dynamics

Option 1: DiffSim (Shooting)

Pretend \(\Theta = \{\theta, x_0\}\), then \(x_{t+1} = f_\theta(x_t)\)

MLE \(\tilde{x}_T = f^T_\theta(\tilde{x}_0)\)

Clear Problem: Unstable. Accurate \(\tilde{x}_T\) requires an accurate \(\tilde{x}_0\)

Option 2: DiffSim (Collocation) [with Prediction Loss]

Pretend \(\Theta = \{\theta, x_t\}\), add dynamics as a penalty.

\(\mathcal{L} := \sum_t-\log\mathbb{P}(m_t | x_t,\theta) + ||x_t - f_\theta(x_{t-1})||^2\)

Fundamental Problem: \(f\) could have near-0 or near-\(\infty\) gradients.

\(f_\theta(x_t) = g_\theta(x_t, \lambda_t)\)

\(\lambda_t = \arg\min_\lambda h_\theta(x_t, \lambda)\) (Contact Forces)

B. Bianchini et al, "Generalization Bounded...", L4DC 2022; E. Gordon et al, "Active Tactile Exploration...", ICRA 2026

(Approximately) Minimizing Graph Distance

 B. Bianchini et al, "Generalization Bounded...", L4DC 2022; E. Gordon et al, "Active Tactile Exploration...", ICRA 2026 

Analogy: \(y = H(x-\theta)\), fit to data \(\mathcal{D}\)

Mean Square Error (MSE)

\(\mathcal{L}_{MSE} = \sum_\mathcal{D}||y_{\mathcal{D}} - H(x_\mathcal{D}-\theta)||^2\)

Problem: look at the gradient w.r.t. \(\theta\). It is 0 almost everywhere!

Alternative: Graph Distance (GD)

\(\mathcal{L}_{GD} = \sum_\mathcal{D}\min_x||(x_{\mathcal{D}}, y_{\mathcal{D}}) - (x, H(x-\theta))||^2\)

Trade-Off:

  • Pro: Loss gradient is finite (or bounded) everywhere.
  • Con: Potentially expensive inner optimization loop.
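The step-function analogy can be checked numerically. The sketch below assumes \(H\) is the Heaviside step and uses a closed-form inner minimization that only holds for this toy function (in general the inner problem needs its own solver, which is the cost noted above). The data, the guess `theta_guess`, and all function names are hypothetical.

```python
import numpy as np

def heaviside(z):
    return (z >= 0).astype(float)

def mse_loss(theta, x, y):
    return np.sum((y - heaviside(x - theta)) ** 2)

def graph_distance_loss(theta, x, y):
    # Squared distance from each data point to the graph of H(x - theta).
    # A point already on its branch costs 0; otherwise it either slides
    # horizontally to the step at x = theta or jumps vertically by 1.
    sign = 2.0 * y - 1.0                          # +1 for y = 1, -1 for y = 0
    horizontal = np.maximum(0.0, sign * (theta - x))
    return np.sum(np.minimum(horizontal ** 2, 1.0))

def finite_diff(loss, theta, x, y, eps=1e-4):
    return (loss(theta + eps, x, y) - loss(theta - eps, x, y)) / (2 * eps)

x = np.linspace(-1.0, 1.0, 21)    # data generated with true theta = 0
y = heaviside(x)
theta_guess = 0.35                # wrong guess, between two data points

mse_grad = finite_diff(mse_loss, theta_guess, x, y)
gd_grad = finite_diff(graph_distance_loss, theta_guess, x, y)
print("MSE gradient:", mse_grad)             # flat: no learning signal
print("graph-distance gradient:", gd_grad)   # positive: descent moves theta toward 0
```

With this data the MSE loss is piecewise constant in \(\theta\), so its gradient vanishes at the guess, while the graph-distance loss still points toward the true step location.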

Learning with a Violation-Implicit Loss

Inner Opt (QP) over contact forces.

\(\mathbb{P}(m_t | x_t,\theta)\)

Active Learning with Expected Information Gain

(Plot: loss \(\mathcal{L}\) against \(\Theta\), with the estimate \(\tilde{\Theta}\) marked.)

Maximum Likelihood Estimate: \(\tilde{\Theta} = \arg\min_\Theta\mathcal{L}(\Theta)\)

\(\rightarrow \frac{d\mathcal{L}}{d\Theta}(\tilde{\Theta}) = 0\)

Information \(\rightarrow\) How certain am I? Ideally: answer without a strong prior.

How certain is this? (Plot labels: Noise Floor, Low Info, High Info)

Observed Information:

\(\mathcal{I} := \sum_{m_t}\nabla_{\Theta}^2\mathcal{L}\)

Expected (Fisher) Information:

\(\mathcal{F} := \mathbb{E}_{m_t}\left[\nabla_{\Theta}^2\mathcal{L}\right] = Var_{m_t}\left[\nabla_{\Theta}\mathcal{L}\right] = \mathbb{E}_{m_t}\left[\nabla_{\Theta}\mathcal{L}\left(\nabla_{\Theta}\mathcal{L}\right)^T\right]\)

EIG \(:= \log\det(\mathcal{F}\mathcal{I}^{-1} + \mathbf{I})\)
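To make the EIG formula concrete, here is a small numerical sketch. The matrices are made-up values for a two-parameter problem, and `expected_information_gain` is a hypothetical helper name; only the formula \(\log\det(\mathcal{F}\mathcal{I}^{-1} + \mathbf{I})\) comes from the slide.

```python
import numpy as np

def expected_information_gain(fisher, observed):
    """EIG = log det(F I^{-1} + identity), with I the observed information."""
    dim = observed.shape[0]
    # fisher @ inv(observed), computed with a linear solve instead of an inverse.
    ratio = np.linalg.solve(observed.T, fisher.T).T
    _, logdet = np.linalg.slogdet(ratio + np.eye(dim))
    return logdet

# Hypothetical state of knowledge: parameter 0 is well observed, parameter 1 is not.
observed = np.diag([100.0, 1.0])

# Two candidate actions with the same total expected Fisher information.
fisher_a = np.diag([10.0, 0.0])   # probes the already-certain parameter
fisher_b = np.diag([0.0, 10.0])   # probes the uncertain parameter

eig_a = expected_information_gain(fisher_a, observed)
eig_b = expected_information_gain(fisher_b, observed)
print(f"EIG(a) = {eig_a:.4f}, EIG(b) = {eig_b:.4f}")
```

Both candidates carry the same amount of Fisher information, but the one aimed at the poorly observed direction scores far higher, which is the behavior the criterion is meant to reward.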

 E. Gordon et al, "Active Tactile Exploration...", ICRA 2026 

Expected Information Gain (EIG)

Learn; Compute

Observed Info \(\mathcal{I}\)

Sample + Simulate

Expected Fisher Info \(\mathcal{F}\)

\(\max EIG := \log\det\left(\mathcal{F}\mathcal{I}^{-1} + \mathbf{I}\right)\)

Choose actions where simulated, expected Fisher info is distinct from Observed info.

Information Maximization In Action

 E. Gordon et al, "Active Tactile Exploration...", ICRA 2026 

Information Through Marginalization

Ongoing Work

Currently, to reparameterize from \(\Theta = x_t \rightarrow \Theta = x_T\), we pretend \(\frac{d x_t}{d x_T} = \mathbf{I}\). This will neglect significant dynamics. Ongoing work will address this.

We want \(\nabla_{x_T} \log\mathbb{P}(m_t | x_T)\)

\(=\nabla_{x_T} \log\int_{x_t}\mathbb{P}(m_t | x_t)\mathbb{P}(x_t | x_T)\,dx_t\)

Assume:

\(\mathbb{P}(x_t | x_T) \propto \exp(f(x_t, x_T))\)

\(\approx\nabla_{x_T} \log\sum_{x_t\sim U}\mathbb{P}(m_t | x_t)\mathbb{P}(x_t | x_T)\)

\(=softmax_{x_t\sim U}\left(\log \mathbb{P}(m_t | x_t)+ f(x_t, x_T)\right) \cdot \left(\nabla_{x_T}f(x_t, x_T)\right)\)

\(=softmax_{x_t\sim MCMC}\left(\log \mathbb{P}(m_t | x_t)\right) \cdot \left(\nabla_{x_T}f(x_t, x_T)\right)\)

No gradient through sampling

At the cost of sampling trajectories \(x_t\) (via MCMC), we bypass the inverse Jacobian.

(If we have a good guess \(\tilde{x}_t\), MCMC should be quick)
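A one-dimensional toy version of the softmax-weighted estimator can be compared against a closed form. The sketch assumes Gaussian choices that are not from the slides: \(f(x_t, x_T) = -\frac{(x_t - x_T)^2}{2\sigma_{dyn}^2}\) and a Gaussian measurement model, for which the normalizer of \(\mathbb{P}(x_t | x_T)\) does not depend on \(x_T\). A dense uniform grid stands in for the samples \(x_t \sim U\); the names `marginal_score`, `sigma_dyn`, and `sigma_meas` are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)              # stabilize before exponentiating
    w = np.exp(z)
    return w / np.sum(w)

def marginal_score(m, x_T, samples, sigma_dyn=0.5, sigma_meas=0.3):
    """Estimate d/dx_T log P(m | x_T) from softmax-weighted samples of x_t."""
    log_meas = -0.5 * ((m - samples) / sigma_meas) ** 2   # log P(m | x_t) + const
    f = -0.5 * ((samples - x_T) / sigma_dyn) ** 2         # f(x_t, x_T)
    grad_f = (samples - x_T) / sigma_dyn ** 2             # d f / d x_T
    weights = softmax(log_meas + f)
    return np.sum(weights * grad_f)                       # no inverse Jacobian needed

samples = np.linspace(-5.0, 5.0, 2001)   # stand-in for x_t ~ U
estimate = marginal_score(m=1.0, x_T=0.2, samples=samples)

# In this Gaussian toy case the marginal is N(m; x_T, sigma_dyn^2 + sigma_meas^2).
exact = (1.0 - 0.2) / (0.5 ** 2 + 0.3 ** 2)
print(f"estimate = {estimate:.4f}, exact = {exact:.4f}")
```

The estimator only differentiates \(f\) with respect to \(x_T\); the samples themselves carry no gradient, matching the "no gradient through sampling" note.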


Post Hoc Haptics for Bite Acquisition

Haptic data is really good for food classification.

55ms of force data:

T. Bhattacharjee et al, RA-L 2019; E. Gordon et al, "Leveraging post hoc context...", ICRA 2021

Supervised vs. Bandit Learning

\(l_a(c) = [0, 1, 1, 1, 1, 1, 1, 1, 1, 1]\)

Supervised Learning sees \(l_a(c) \forall a\)

Full Feedback

\(c\)

Bandit Algorithm sees \(l_{a_t}(c)\)

Bandit Feedback (Harder)

No counterfactual.

Post Hoc Haptics for Bite Acquisition

E. Gordon et al, "Leveraging post hoc context...", ICRA 2021

Consider a joint loss model:

\(l_a + \epsilon = c \cdot \theta^*_a = p \cdot \phi^*_a\), with \(\epsilon \sim \mathcal{N}\), visual context \(c\), and haptic context \(p\)

\(\tilde{\theta},\tilde{\phi} = \arg\min_{\theta, \phi}\sum_{t : \text{action}=a}||c_t\cdot\theta_a - l_t||^2 + ||p_t\cdot\phi_a - l_t||^2\)

s.t. \(\forall t,a: c_t \cdot \theta_a = p_t \cdot \phi_a\)

Once either model is learned, the other model reduces from bandit feedback to full feedback

Example:

After only 1 action, robot determines that kiwi \(\approx\) banana, and can impute the counterfactual.
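The imputation step in this example might look roughly like the following. It is a simplified sketch, not the algorithm from the ICRA 2021 paper: it assumes the haptic model \(\phi\) is already learned, uses one-hot haptic classes, and updates the visual model by ridge regression. The values in `phi`, the context vectors, and the helper names are all hypothetical.

```python
import numpy as np

# Hypothetical learned haptic model: rows are actions, columns are
# haptic classes (soft, hard); entries are expected losses.
phi = np.array([[0.1, 0.9],
                [0.8, 0.2],
                [0.5, 0.5]])

def impute_losses(p, phi):
    """Predict the loss of every action from the post hoc haptic context p."""
    return phi @ p

def visual_update(gram, targets, c, losses):
    """Accumulate ridge-regression statistics for all actions at once."""
    return gram + np.outer(c, c), targets + np.outer(losses, c)

dim_c = 3
gram = np.eye(dim_c)                          # ridge regularizer
targets = np.zeros((phi.shape[0], dim_c))

c_new = np.array([1.0, 0.0, 0.0])             # visual context of an unseen food
p_new = np.array([1.0, 0.0])                  # haptics after ONE action: feels soft

imputed = impute_losses(p_new, phi)           # full feedback, not just the arm tried
gram, targets = visual_update(gram, targets, c_new, imputed)

theta = np.linalg.solve(gram, targets.T).T    # visual model, one row per action
best_action = int(np.argmin(theta @ c_new))
print("imputed losses:", imputed, "best action:", best_action)
```

A single haptic observation updates the visual model for every action, which is the sense in which bandit feedback turns into full feedback.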

Are the 11 actions tractable for learning?

T. Bhattacharjee et al, RA-L 2019; E. Gordon et al, "Leveraging post hoc context...", ICRA 2021

Yes

 

New foods take ~7-8 actions to learn to user satisfaction.


Community-Based Participatory Design

E. Gordon et al, "An adaptable, safe, and portable robot-assisted feeding system.", HRI Companion 2024

Community-Based Participatory Design

A. Nanavati, E. Gordon et al, "Lessons learned from designing...", HRI 2025


Skills List

Bonus:

Other Leadership Activities

Responsibilities: Lead the other officers (Program Committee, Panel Committee, Sponsorship Chair, etc.); Secure Funding (NSF + AIJ grants); run the workshop day-of; coordinate travel for all attendees; Reviewer/AC of last resort.

What I didn't do:

reach out to some prospective speakers, update the website after creation

Why HRI?

  • Research Motivation: "positioning robots to work alongside humans in collaborative, scalable networks aimed at extending human capabilities."
    • That has been a driving motivation for working on PARs.
  • "Do what works" mentality: model-based / physics explicit? learning-based / physics implicit? Encourages cultivating a broad knowledge base.
  • Keeping many of the "perks" of academia:
    • Mentorship: continuously working with interns and postdocs
    • Collaboration:
      • Internal: research engineers and across research teams
      • External: with both academic and industry partners
    • Research Dissemination: presentation at conferences / workshops

Research Plans

Safe Active Exploration in Contact

As et al, "ActSafe...", ICLR 2025

Beneficial to play optimistically w.r.t. loss

Safe to play pessimistically w.r.t. model parameters

  1. Leverage Fisher Information to better quantify uncertainty through contact dynamics.
  2. Identify loss components that are safety-critical vs. performance-critical.

Dressing "Acquisition"

Kapusta et al, Autonomous Robots 2019; Jenamani et al, HRI 2024; McMurray, "Robotics... for poultry processing" (Book, 2011) 

  • Food Preparation: Lots of Work
  • Bite Transfer: Some Work
  • Cloth Folding: Lots of Work
  • Sleeve Insertion: Some Work

Picking up and orienting clothes in preparation for insertion motions.

Multi-Function PARs

  1. Can feeding / dressing / ambulation / etc. all be done with a single system?
  2. How can information be shared between tasks?
  3. How will users feel about having a robot for the entire day?

Thank you!

DAIR Lab

Amal Nanavati


Ethan Faculty Job Talk

By Michael Posa
