Tractable Adaptability
Ethan K. Gordon
Postdoc, University of Pennsylvania
PhD 2023, University of Washington


Online and Active Learning for Physically Assistive Robotics
Physically Assistive Robots (PARs)
“If I can have a robot do it, I can learn to adapt to it, but it would be me feeding me, and that would be huge”
Tyler Schrenk
1985-2023


The Promise of PARs:
- Empowerment
- Independence
What is needed for PARs?
Contact-Rich Manipulation
- Sliding to clean the spoon and bowl
- Shaking to smoothen
- In-Mouth Hand-Off
(vision-denied)
Online Adaptation
- Bite Size Adjustment
What is needed for PARs?
Online Adaptation
- Totally Different Food
- Multi-bite: different shapes for each bite
There is no time for
re-training!
Tractable Adaptability
How can robots efficiently learn, during deployment, how to manipulate previously-unseen objects?
Policy Space Reduction
Model-Based Methods
Leveraging Haptics
The Technology/Application Cycle
Support
Inform
Physically Assistive Robots (PARs)
Active Learning in Contact







Support
Inform
Multimodal Active Learning
Physically Assistive Robots












Summary
-
The Promise of Physically Assistive Robotics
-
Robot-Assisted Feeding: User-Defined Metrics
-
Food Bite Acquisition as a Contextual Bandit
-
Leveraging Haptic Sensing
- Model-Based Tactile Active Exploration
- Haptics as Post-Hoc Bandit Context
-
RAF: Community-Based Participatory Design
-
Where can PARs go from here?
Do we need autonomy? What kind?
Community-Based Participatory Research
It is important to ask users, observational and qualitative research before experimentation.
Time Per Bite:
- Caretaker: ~20s
- Preferred: <2min
- Teleoperated Robot: 5-40min
Example: Why Single-Utensil Feeding?
It's intuitive and familiar.
Obi:
The Assistive Dexterous Arm (ADA)
User Studies Capture Diversity
T. Bhattacharjee, E.K. Gordon et al, “Is more autonomy always better?...", HRI 2020User Studies Capture Metrics
Trade-off between autonomy (with chance of error) and high-effort manual control.
What errors are tolerable?
T. Bhattacharjee, E.K. Gordon et al, “Is more autonomy always better?...", HRI 2020Acceptance Can be User-Dependent
Users with more limited mobility had a preference for greater autonomy.
And greater tolerance for errors.
But the average was still high, e.g.,
Desired Food Acquisition Success Rate:
80%
T. Bhattacharjee, E.K. Gordon et al, “Is more autonomy always better?...", HRI 2020
Summary
-
The Promise of Physically Assistive Robotics
-
Robot-Assisted Feeding: User-Defined Metrics
-
Food Bite Acquisition as a Contextual Bandit
-
Leveraging Haptic Sensing
- Model-Based Tactile Active Exploration
- Haptics as Post-Hoc Bandit Context
-
RAF: Community-Based Participatory Design
-
Where can PARs go from here?
Bite Acquisition: A Data Problem
E. Heiden et al, “DiSECt", RSS 2021 ; R. Feng et al, "...Generalizing skewering strategies...", ISRR 2019
Food simulation is not quite there.
(this is just planar cutting)
Real data takes a lot of time.
Example: 10 trajectories x 16 food types
85 person-hours
Case for online learning:
Data collection for food is uniquely expensive.
Large models limit portability.
Desire life-long deployment.
Policy Space Reduction
Hierarchy and Bandits
*(especially pre-2023)
Leveraging Expert Data
T. Bhattacharjee et al, “Towards Robotic Feeding...", R-AL 2019

Qualitative Taxonomy
Insights:
Discrete classes of strategies
Lots of variations within those classes
Imitation Learning for Policy Space Reduction



E. Gordon, A. Nanavati et al, “Towards General Single-Utensil...", CoRL 2023
Splines and force/torque thresholds.
Comparable with Euclidean Metric.
Emergent Behavior
Wiggling
Tilting
High Pressure
Scooping
E. Gordon, A. Nanavati et al, “Towards General Single-Utensil...", CoRL 2023
Is this expressive enough?

Yes!
(Note the 80% acceptance threshold)
E. Gordon, A. Nanavati et al, “Towards General Single-Utensil...", CoRL 2023
Online learning with a discrete action space maps cleanly on the contextual bandit setting.
The Multi-Arm Bandit (MAB)



\(l \sim\)
\( \mathcal{N}(0.5, 1)\)
Loss
\( \mathcal{N}(0.8, 1)\)
\( \mathcal{N}(0.1, 1)\)
Interaction Protocol:
- Select \(a = \pi(a_{0\ldots t}, l_{0\ldots t})\)
- Observe \(l\)
- Update \(\pi\)
\(a =\)
\(1\)
\(2\)
\(3\)



Metric: Regret: \(\mathbb{E}[l(a^*) - l(a_t)]\)
Test time metric, balances exploration vs. exploitation,
often theoretically bounded
The (Stochastic) Contextual Bandit



\(l \sim\)
\( \mathcal{N}(\mu_1(c), \sigma)\)
Loss
Interaction Protocol:
- Observe \(c_t\)
- Select \(a_t = \pi(c_t)\)
- Observe \(l(a_t, c_t)\)
- Update \(\pi\)
\(a =\)
\(1\)
\(2\)
\(3\)
\( \mathcal{N}(\mu_2(c), \sigma)\)
\( \mathcal{N}(\mu_3(c), \sigma)\)

\(c_t\)
Bite Acquisition as a Contextual Bandit
Discrete Actions \(a\)
Visual Context: \(c_t\)

\(l\): Success/Failure \(\{0, 1\}\)
E. Gordon et al, “Adaptive robot-assisted feeding...", IROS 2020
Eye-in-Hand RGBD
SegmentAnything
ResNet Features
Bite Acquisition as a Contextual Bandit

\(c\): Visual Context
Eye-in-Hand RGBD
SegmentAnything
ResNet Features
\(l\): Success/Failure
\(\{0, 1\}\)

\(a\):
Control Policy
Action Space Trade-Off:
- Expressive enough for general success
- Not too big for tractable learning
E. Gordon et al, “Adaptive robot-assisted feeding...", IROS 2020
Summary
-
The Promise of Physically Assistive Robotics
-
Robot-Assisted Feeding: User-Defined Metrics
-
Food Bite Acquisition as a Contextual Bandit
-
Leveraging Haptic Sensing
- Model-Based Tactile Active Exploration
- Haptics as Post-Hoc Bandit Context
-
RAF: Community-Based Participatory Design
-
Where can PARs go from here?
Deep Dive on Active Tactile Exploration
Online Learning:
-
Given data, what is our best guess of the object's parameters and location?
Active Exploration:
-
Estimate how much information we have from previous data.
- Choose the next action to maximize the novel information collected.
Dynamic Object System Identification
Choose:
-
Robot Trajectory \(r_t\)
Measure:
Find:
-
Object Geometry \(\theta^*\)
-
Object Pose \(x^*_T\)

Previous Work in Tactile SysID
Static Objects: "assume a sensor that can detect contact before causing movement" [2]
Utilizes 2D OR discrete object priors.
Spatially Sparse Data -> Active Learning

[1] Hu et al, Biomimetic Intelligence and Robotics 2024 ; [2] Xu et al. "TANDEM3D...", ICRA 2023
Learning and Exploration Through Contact
Any gradients \(\nabla_\Theta\mathcal{L}\) have inverse Jacobian terms: \(\frac{\partial x_t}{\partial x_T}\)
Fundamental Problem:
What is the sensitivity of past measurements to the current state?
For e.g. Coulomb friction, this is ill-posed.
(Even the simpler: "what is the past state given the current state" is unanswerable)
For both learning and exploration, how do we handle this?
E. Gordon et al, "Active Tactile Exploration...", ICRA 2026
Log-Likelihood Loss: \(\mathcal{L} := -\sum_t\log\mathbb{P}(m_t = \{c_t, \hat{n}_{m,t}\} | \Theta = \{\theta, x_T\})\)
(Past) Measurements
Geometry and (Current) State
Online Learning Through Contact Dynamics
Goal: Define a Loss Function as a Negative-Log-Likelihood
\(\mathcal{L} := -\sum_t\log\mathbb{P}(m_t = \{c_t, \hat{n}_{m,t}\} | \Theta = \{\theta, x_T\})\)

Step 1: Define a per-timestep Measurement Model
\(\mathbb{P}(\hat{n}_{m,t} | x_t(\Theta)) := \mathcal{N}(||\hat{n}_{m,t}-\hat{n}_t(x_t, \theta)||^2_2, \Sigma_n)\)
\(\mathbb{P}(c_t=0 | x_t(\Theta)) := sigmoid(\alpha\phi_t(x_t, \theta))\)

(Differential Collision)
Step 2 (The Hard Part): What is \(x_t(x_T, \theta)\) ?
(and its gradient)

E. Gordon et al, "Active Tactile Exploration...", ICRA 2026
Online Learning Through Contact Dynamics
Option 1: DiffSim (Shooting)
Pretend \(\Theta = \{\theta, x_0\}\), then \(x_{t+1} = f_\theta(x_t)\)
MLE \(\tilde{x}_T = f^T_\theta(\tilde{x}_0)\)
Clear Problem: Unstable. Accurate \(\tilde{x}_T\) requires an accurate \(\tilde{x}_0\)
Option 2: DiffSim (Collocation) [with Prediction Loss]
Pretend \(\Theta = \{\theta, x_t\}\), add dynamics as a penalty.
\(\mathcal{L} := \sum_t-\log\mathbb{P}(m_t | x_t,\theta) + ||x_t - f_\theta(x_{t-1})||^2\)
Fundamental Problem: \(f\) could have near-0 or near-\(\infty\) gradients.
\(f_\theta(x_t) = g_\theta(x_t, \lambda_t)\)
\(\lambda_t = \min_\lambda h_\theta(x_t, \lambda)\)
B. Bianchini et al, "Generalization Bounded...", L4DC 2022; E. Gordon et al, "Active Tactile Exploration...", ICRA 2026
Contact Forces
(Approximately) Minimizing Graph Distance
B. Bianchini et al, "Generalization Bounded...", L4DC 2022; E. Gordon et al, "Active Tactile Exploration...", ICRA 2026

Analogy: \(y = H(x-\theta)\)
\(\theta\)
\(x\)
\(y\)
\(\mathcal{D}\)
Mean Square Error
\(\mathcal{L}_{MSE} = \sum_\mathcal{D}||y_{\mathcal{D}} - H(x_\mathcal{D}-\theta)||^2\)
MSE
GD
Alternative: Graph Distance
\(\mathcal{L}_{GD} = \sum_\mathcal{D}\min_x||(x_{\mathcal{D}}, y_{\mathcal{D}}) - (x, H(x-\theta))||^2\)
Problem: look at the gradient w.r.t. \(\theta\).
It is 0 almost everywhere!
Trade-Off:
- Pro: Loss gradient is finite (or bounded) everywhere.
- Con: Potentially expensive inner optimization loop.
Learning with a Violation-Implicit Loss
Inner Opt (QP) over contact forces.
\(\mathbb{P}(m_t | x_t,\theta)\)
Active Learning with Expected Information Gain
\(\Theta\)
\(\mathcal{L}\)
\(\tilde{\Theta}\)

\(\Theta\)
\(\mathcal{L}\)
\(\tilde{\Theta}\)

Maximum Likelihood Estimate: \(\tilde{\Theta} = \arg\min_\Theta\mathcal{L}(\Theta)\)
\(\rightarrow \frac{d\mathcal{L}}{d\Theta}(\tilde{\Theta}) = 0\)
Information \(\rightarrow\) How certain am I? Ideally: answer without a strong prior.
How certain is this?
Noise Floor
Low Info
High Info
Observed Information:
\(\mathcal{I} := \sum_{m_t}\nabla_{\Theta}^2\mathcal{L}\)
Expected (Fisher) Information:
\(\mathcal{F} := \mathbb{E}_{m_t}\left[\nabla_{\Theta}^2\mathcal{L}\right]\)
\(= Var_{m_t}\left[\nabla_{\Theta}\mathcal{L}\right]\)
\(= \mathbb{E}_{m_t}\left[\nabla_{\Theta}\mathcal{L}\left(\nabla_{\Theta}\mathcal{L}\right)^T\right] \)
EIG \(:= \log\det(\mathcal{F}\mathcal{I}^{-1} + \mathbf{I})\)
E. Gordon et al, "Active Tactile Exploration...", ICRA 2026
Expected Information Gain (EIG)
Learn; Compute
Observed Info \(\mathcal{I}\)
Sample + Simulate
Expected Fisher Info \(\mathcal{F}\)
\(\max EIG := \log\det\left(\mathcal{F}\mathcal{I}^{-1} + \mathbf{I}\right)\)
Choose actions where simulated, expected Fisher info is distinct from Observed info.




Information Maximization In Action



E. Gordon et al, "Active Tactile Exploration...", ICRA 2026
Information Through Marginalization


Ongoing Work
Currently, to reparameterize from \(\Theta = x_t \rightarrow \Theta = x_T\), we pretend \(\frac{d x_t}{d x_T} = \mathbf{I}\). This will neglect significant dynamics. Ongoing work will address this.
Information Through Marginalization

We want \(\nabla_{x_T} \log\mathbb{P}(m_t | x_T)\)

Ongoing Work
\(=\nabla_{x_T} \log\int_{x_t}\mathbb{P}(m_t | x_t)\mathbb{P}(x_t | x_T)\)
Assume:
\(\mathbb{P}(x_t | x_T) \propto \exp(f(x_t, x_T))\)
\(\approx\nabla_{x_T} \log\sum_{x_t\sim U}\mathbb{P}(m_t | x_t)\mathbb{P}(x_t | x_T)\)
\(=softmax_{x_t\sim U}\left(\log \mathbb{P}(m_t | x_t)+ f(x_t, x_T)\right) \cdot \left(\nabla_{x_T}f(x_t, x_T)\right)\)
\(=softmax_{x_t\sim MCMC}\left(\log \mathbb{P}(m_t | x_t)\right) \cdot \left(\nabla_{x_T}f(x_t, x_T)\right)\)
No gradient through sampling
At the cost of sampling trajectories \(x_t\) (via MCMC), we bypass the inverse Jacobian.
(If we have a good guess \(\tilde{x}_t\), MCMC should be quick)
Summary
-
The Promise of Physically Assistive Robotics
-
Robot-Assisted Feeding: User-Defined Metrics
-
Food Bite Acquisition as a Contextual Bandit
-
Leveraging Haptic Sensing
- Model-Based Tactile Active Exploration
- Haptics as Post-Hoc Bandit Context
-
RAF: Community-Based Participatory Design
-
Where can PARs go from here?
Post Hoc Haptics for Bite Acquisition



Haptic data is really good for food classification.
55ms of force data:

T. Bhattacharjee et al, R-AL 2019 ; E. Gordon et al, "Leveraging post hoc context...", ICRA 2021Supervised vs. Bandit Learning


\(l_a(c) = [0, 1, 1, 1, 1, 1, 1, 1, 1, 1]\)
Supervised Learning sees \(l_a(c) \forall a\)
Full Feedback
\(c\)
Bandit Algorithm sees \(l_{a_t}(c)\)
Bandit Feedback (Harder)
No counterfactual.
Post Hoc Haptics for Bite Acquisition
E. Gordon et al, "Leveraging post hoc context...", ICRA 2021Consider a joint loss model:
\(l_a + \epsilon = c \cdot \theta^*_a = p \cdot \phi^*_a\)
\(\epsilon \sim \mathcal{N}\)
visual context
haptic context
\(\tilde{\theta},\tilde{\phi} = \arg\min_{\theta, \phi}\)
\(\sum_{t : \text{action}=a}||c_t\cdot\theta_a - l_t||^2 + ||p_t\cdot\phi_a - l_t||^2\)
s.t. \(\forall t,a: c_t \cdot \theta_a = p_t \cdot \phi_a\)
Once either model is learned, the other model reduces from bandit feedback to full feedback
Example:
After only 1 action, robot determines that kiwi \(\approx\) banana, and can impute the counterfactual.
Are the 11 actions tractable for learning?
T. Bhattacharjee et al, R-AL 2019 ; E. Gordon et al, "Leveraging post hoc context...", ICRA 2021
Yes
New foods take ~7-8 actions to learn to user satisfaction.
Summary
-
The Promise of Physically Assistive Robotics
-
Robot-Assisted Feeding: User-Defined Metrics
-
Food Bite Acquisition as a Contextual Bandit
-
Leveraging Haptic Sensing
- Model-Based Tactile Active Exploration
- Haptics as Post-Hoc Bandit Context
-
RAF: Community-Based Participatory Design
-
Where can PARs go from here?
Community-Based Participatory Design
E. Gordon et al, "An adaptable, safe, and portable robot-assisted feeding system.", HRI Companion 2024Community-Based Participatory Design
A. Nanavati, E. Gordon et al, "Lessons learned from designing...", HRI 2025Summary
-
The Promise of Physically Assistive Robotics
-
Robot-Assisted Feeding: User-Defined Metrics
-
Food Bite Acquisition as a Contextual Bandit
-
Leveraging Haptic Sensing
- Model-Based Tactile Active Exploration
- Haptics as Post-Hoc Bandit Context
-
RAF: Community-Based Participatory Design
-
Where can PARs go from here?
Skills List











Bonus:






Other Leadership Activities



Responsibilities: Lead the other officers (Program Committee, Panel Committee, Sponsorship Chair, etc.); Secure Funding (NSF + AIJ grants); run the workshop day-of; coordinate travel for all attendees; Reviewer/AC of last resort.
What I didn't do:
reach out to some prospective speakers, update the website after creation
Why HRI?
- Research Motivation: "positioning robots to work alongside humans in collaborative, scalable networks aimed at extending human capabilities."
- That has been a driving motivation for working on PARs.
- "Do what works" mentality: model-based / physics explicit? learning-based / physics implicit? Encourages cultivating a broad knowledge base.
- Keeping many of the "perks" of academia:
- Mentorship: continuously working with interns and postdocs
- Collaboration:
- Internal: research engineers and across research teams
- External: with both academic and industry partners
- Research Dissemination: presentation at conferences / workshops
Research Plans
Safe Active Exploration in Contact


As et al, "ActSafe...", ICLR 2025Beneficial to play optimistically w.r.t. loss
Safe to play pessimistically w.r.t. model parameters
- Leverage Fisher Information to better quantify uncertainty through contact dynamics.
- Identify loss components that are safety-critical vs. performance-critical.
Dressing "Acquisition"

Kapusta et al, Autonomous Robots 2019; Jenamani et al, HRI 2024; McMurray, "Robotics... for poultry processing" (Book, 2011) 


Food Preparation
Lots of Work
Bite Transfer
Some Work
Cloth Folding
Lots of Work
Sleeve Insertion
Some Work
Picking up and orienting clothes in preparation for insertion motions.
Multi-Function PARs




?
- Can feeding / dressing / ambulation / etc. all be done with a single system?
- How can information be shared between tasks?
- How will users feel about having a robot for the entire day?
Thank you!










DAIR Lab
Amal Nanavati




Tractable Adaptability
Ethan K. Gordon
Postdoc, University of Pennsylvania
PhD 2023, University of Washington


Online and Active Learning for Physically Assistive Robotics
Ethan Faculty Job Talk
By Michael Posa
Ethan Faculty Job Talk
45min
- 30