Empowering Physically Assistive Robots with Contact-Rich Active Learning
Ethan K. Gordon
Postdoc, University of Pennsylvania
PhD 2023, University of Washington
Physically Assistive Robots (PARs)
Physical device that moves at least partially autonomously.
Aids in tasks that users would otherwise find impossible, uncomfortable, or inconvenient.
Direct or indirect physical contact with people and the environment.


Activities of daily living (ADLs) for those with short- or long-term physical impairments.
Rehabilitation and physical therapy.

Assisting nurses and physicians with patient care.

Physically Assistive Robots (PARs)
“If I can have a robot do it, I can learn to adapt to it, but it would be me feeding me, and that would be huge”
Tyler Schrenk
1985-2023


The Promise of PARs:
- Empowerment
- Independence
What is needed for PARs?
Contact-Rich Manipulation
- Sliding to clean the spoon and bowl
- Shaking to smoothen
- In-Mouth Hand-Off
(vision-denied)
Online Adaptation
- Bite Size Adjustment
- Totally Different Food
- Multi-bite: different shapes for each bite
There is no time for re-training!
Key Technology:
Tractable Adaptability
How can robots adapt at deployment-time
efficiently, safely, and portably?
Policy Space Reduction
Model-Based Methods
Leveraging Haptics
Support
Inform
Multimodal Active Learning
Physically Assistive Robots







Summary
- The Promise of Physically Assistive Robotics
- Robot-Assisted Feeding: User-Defined Metrics
- Online Learning for Food Acquisition
  - Policy space reduction a priori
  - Haptics as post hoc data
- Active Learning with Dynamic Contact
- Community-Based Participatory Design
- Where can PARs go from here?
Do we need autonomy? What kind?
Community-Based Participatory Research
It is important to ask users: conduct observational and qualitative research before experimentation.
Time Per Bite:
- Caretaker: ~20s
- Preferred: <2min
- Teleoperated Robot: 5-40min
Why Single-Utensil Feeding?
It's intuitive and familiar.
The Assistive Dexterous Arm (ADA)
User Studies Capture Diversity
T. Bhattacharjee, E.K. Gordon et al, “Is more autonomy always better?...", HRI 2020
Acceptance is User-Dependent, But High
Users with more limited mobility preferred greater autonomy, even when the robot made errors.
T. Bhattacharjee, E.K. Gordon et al, “Is more autonomy always better?...", HRI 2020
Figure: Autonomy Preference Given Errors (axis: User Rating)
User Studies Capture Metrics
Trade-off between autonomy (with chance of error) and high-effort manual control.
What errors are tolerable? Minimum Food Acquisition Success Rate: 80%
T. Bhattacharjee, E.K. Gordon et al, “Is more autonomy always better?...", HRI 2020
Data Driven Bite Acquisition
R. Feng, Y. Kim, G. Lee, E. K. Gordon, et al, "...Generalizing skewering strategies...", ISRR 2019
Food simulation is hard*, but we can collect real data. What if we just use machine learning?
Example: 10 trajectories x 16 food types
85 person-hours
- Is it portable?
- Is it safe?
- Is it adaptable?


Leveraging Expert Data
T. Bhattacharjee et al, “Towards Robotic Feeding...", RA-L 2019

Qualitative Taxonomy
Insights:
Discrete classes of strategies
Lots of variations within those classes
Emergent Discrete Action Space
Wiggling
Tilting
High Pressure
Scooping
E. K. Gordon, A. Nanavati et al, “Towards General Single-Utensil...", CoRL 2023
Is this expressive enough?

Yes!
(Note the 80% acceptance threshold)
E. K. Gordon, A. Nanavati et al, “Towards General Single-Utensil...", CoRL 2023
Online learning with a discrete action space is easier, safer, and more predictable for the patient.
Online Learning for Bite Acquisition
Discrete Actions: $a$
Visual Context: $c_t$

$r$: Success/Failure $\{1, 0\}$
E. K. Gordon et al, “Adaptive robot-assisted feeding...", IROS 2020
Learn some parameters $\theta$:
$P(r = 1) \approx f_\theta(c, a)$
$N_a$ is small enough that we can do this at deployment time!
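The online-learning loop above can be sketched in a few lines. This is a minimal illustration, not the method from the cited paper: it assumes a per-action logistic model for $f_\theta(c, a)$, an epsilon-greedy selection rule, and a made-up toy environment. The names `N_ACTIONS`, `DIM_C`, `true_success`, and the learning rate are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

N_ACTIONS = 4   # hypothetical N_a; small by design
DIM_C = 3       # hypothetical visual-context dimension


def predict(theta, c):
    """f_theta(c, a) for every action a: estimated P(r = 1)."""
    return 1.0 / (1.0 + np.exp(-(theta @ c)))


def select_action(theta, c, epsilon=0.1):
    """Epsilon-greedy: mostly exploit the best estimate, sometimes explore."""
    if rng.random() < epsilon:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax(predict(theta, c)))


def update(theta, c, a, r, lr=0.5):
    """One logistic-regression gradient step on the row of the chosen action."""
    p = predict(theta, c)[a]
    theta = theta.copy()
    theta[a] += lr * (r - p) * c
    return theta


# Toy environment (an assumption): action index 2 succeeds 90% of the time.
true_success = np.array([0.2, 0.3, 0.9, 0.1])

theta = np.zeros((N_ACTIONS, DIM_C))
c = np.array([1.0, 0.0, 0.0])  # a single fixed context, for simplicity
for t in range(500):
    a = select_action(theta, c)
    r = float(rng.random() < true_success[a])  # success/failure in {1, 0}
    theta = update(theta, c, a, r)

best_action = int(np.argmax(predict(theta, c)))
```

Because only one row of `theta` changes per trial, each update touches `DIM_C` numbers, which is why a small action count keeps deployment-time learning cheap.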
Leveraging Haptic Data



Haptic data is really good for food classification, and we already have a haptic sensor on board for safety!
55ms of force data:

T. Bhattacharjee et al, RA-L 2019; E. Gordon et al, "Leveraging post hoc context...", ICRA 2021
Post Hoc Haptics for Bite Acquisition
E. Gordon et al, "Leveraging post hoc context...", ICRA 2021
Consider a joint loss model:
$r_a + \epsilon = f_{\theta^*}(c) = g_{\phi^*}(p)$
$P(r = 1) = f_\theta(c) = g_\phi(p)$
where $c$ is the visual context and $p$ is the haptic context.
Once either model is learned, the complexity of the other one is significantly reduced:
$O(\dim c) \rightarrow O(\min(\dim c, \dim p))$
Example:
$f_\theta(c, a) = [0.9, 0.1, 0.5, 0.8, \dots]$
Observe $a = 1 \rightarrow [0, ?, ?, ?, \dots]$
$g_\phi(p, a) = [0.8, 0.2, 0.4, 0.7, \dots]$ can provide the counterfactual.
After only 1 action, the robot determines that kiwi ≈ banana, and can impute the counterfactual.
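One simple reading of "imputing the counterfactual" is to fill the unobserved entries of the bandit feedback vector with the haptic model's predictions. The sketch below uses the numbers from the example; the helper `impute_counterfactual` is hypothetical and the haptic predictions are taken as given rather than learned.

```python
import numpy as np


def impute_counterfactual(observed, haptic_prediction):
    """Fill unobserved entries (NaN) with the haptic model's predictions."""
    observed = np.asarray(observed, dtype=float)
    haptic_prediction = np.asarray(haptic_prediction, dtype=float)
    return np.where(np.isnan(observed), haptic_prediction, observed)


# Bandit feedback: only the first action was tried, and it failed.
observed = [0.0, np.nan, np.nan, np.nan]
# Assumed output of g_phi(p, a) once the haptic context p is available.
g_phi = [0.8, 0.2, 0.4, 0.7]

full_feedback = impute_counterfactual(observed, g_phi)
```

The result can then be used as a full-feedback training target for the visual model, which is what turns the harder bandit problem into something closer to supervised learning.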
Are the 11 actions tractable for learning?
E. Gordon et al, "Leveraging post hoc context...", ICRA 2021
Yes
New foods take ~7-8 actions to learn to user satisfaction.
Active Tactile Exploration Through Contact



[1] Kapusta et al, Autonomous Robots 2019; [2] Hello Robot; [3] KukaDressing
Rehabilitation
How can we perform active learning with:
Dynamic Objects?
Visual Occlusions (and other uncertainties)?
Distilled Challenge: Can a robot do this?
Can we do better if we have tactile sensors and robust simulators?

Surgery
System Identification and Measuring Uncertainty
Choose:
- Robot Trajectory $r_t$
Measure:
Find:
- Object Geometry and Pose
- How certain are we?

Active Exploration with the Trifinger Robot
Exploration to Maximize Information
Learn; Compute Observed Information $\mathcal{I}$
Sample Actions + Simulate
Expected Future Information
Choose actions where simulated, expected future info is distinct from observed info.




Information Maximization In Action



E. Gordon et al, "Active Tactile Exploration...", ICRA 2026
Community-Based Participatory Design
E. Gordon et al, "An adaptable, safe, and portable robot-assisted feeding system.", HRI Companion 2024
Community-Based Participatory Design
A. Nanavati, E. Gordon et al, "Lessons learned from designing...", HRI 2025
Summary
- The Promise of Physically Assistive Robotics
- Robot-Assisted Feeding: User-Defined Metrics
- Food Bite Acquisition as a Contextual Bandit
  - Policy space reduction a priori
  - Haptics as post hoc bandit context
- Active Learning with Dynamic Contact
- Community-Based Participatory Design
- Where can PARs go from here?
Challenges in Feeding and Beyond

Nanavati, Alves-Oliveira, Schrenk, Gordon, et al., HRI 2023

Many are relevant across multiple tasks!



Dressing
Grooming
Rehabilitation
Safe Active Exploration


As et al, "ActSafe...", ICLR 2025
Beneficial to play optimistically w.r.t. loss
Safer to play pessimistically w.r.t. model parameters
Which loss components are:
- Safety-Critical (zero user error tolerance)
vs.
- Performance-Critical (higher user error tolerance)
Adjust play for each metric separately.
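One way to picture "adjusting play for each metric separately" is with confidence bounds: be optimistic about the performance loss and pessimistic about the safety loss. The sketch below is an illustration only and is not the ActSafe algorithm; the function `choose_action`, the `beta` multiplier, and all numbers are assumptions.

```python
import numpy as np


def choose_action(perf_mean, perf_std, safety_mean, safety_std,
                  safety_limit, beta=2.0):
    """Pick the action with the best optimistic performance loss among
    actions whose pessimistic safety loss stays under the limit."""
    perf_optimistic = perf_mean - beta * perf_std         # lower bound on loss
    safety_pessimistic = safety_mean + beta * safety_std  # upper bound on loss
    safe = safety_pessimistic <= safety_limit
    if not np.any(safe):
        return None  # nothing is certifiably safe; defer to the user
    masked = np.where(safe, perf_optimistic, np.inf)
    return int(np.argmin(masked))


perf_mean = np.array([0.5, 0.4, 0.3])
perf_std = np.array([0.05, 0.2, 0.3])
safety_mean = np.array([0.1, 0.1, 0.3])
safety_std = np.array([0.02, 0.05, 0.2])

action = choose_action(perf_mean, perf_std, safety_mean, safety_std,
                       safety_limit=0.5)
no_action = choose_action(perf_mean, perf_std, safety_mean, safety_std,
                          safety_limit=0.1)
```

In this toy case the third action looks best optimistically, but its pessimistic safety bound (0.7) exceeds the limit, so the second action is chosen instead.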
Leveraging Foundation Models
- Vision-language-action models ("VLAs"): "ChatGPT for robots." Impressive general performance.
- I would not deploy these models directly with patients right now.
- Is it portable? No: internet access or large computers required.
- Is it safe? Not guaranteed.
- Is it adaptable? Not at deployment time (but can be fine-tuned in advance).

π0.5; Kapusta et al, Autonomous Robots 2019
?
Multi-Function Longitudinal Studies



?
- Can feeding / dressing / ambulation / physical therapy / etc. all be done with a single system or connected ensemble?
- How can information be shared between tasks?
- How will users feel about having a robot 24/7 for weeks or months?

Thank you!










DAIR Lab
Amal Nanavati




Online Learning with Policy Space Reduction
Hierarchy and Bandits
Data Driven Bite Acquisition
E. Heiden et al, “DiSECt", RSS 2021
(only planar cutting)
Imitation Learning for Policy Space Reduction



E. K. Gordon, A. Nanavati et al, “Towards General Single-Utensil...", CoRL 2023
Splines and force/torque thresholds.
Comparable with Euclidean Metric.
The Multi-Armed Bandit (MAB)
Arms $a \in \{1, 2, 3\}$ with rewards $r \sim \mathcal{N}(0.5, 1)$, $\mathcal{N}(0.8, 1)$, $\mathcal{N}(0.1, 1)$
Interaction Protocol:
- Select $a = \pi(a_{0 \dots t}, r_{0 \dots t})$
- Observe $r$
- Update $\pi$
Metric: Regret: $E[r(a^*) - r(a_t)]$
Test-time metric, balances exploration vs. exploitation, often theoretically bounded
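The interaction protocol can be made concrete with a standard UCB1-style policy on three unit-variance Gaussian arms. This is a textbook sketch, not something taken from the slides: the choice of UCB, the horizon `T`, the seed, and the assignment of the three means to arm indices are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
means = np.array([0.5, 0.8, 0.1])  # assumed arm order; unit variance
T = 5000

counts = np.zeros(3)
sums = np.zeros(3)
regret = 0.0

for t in range(T):
    if t < 3:
        a = t  # play each arm once to initialize
    else:
        # Select: empirical mean plus an exploration bonus
        ucb = sums / counts + np.sqrt(2.0 * np.log(t + 1) / counts)
        a = int(np.argmax(ucb))
    r = rng.normal(means[a], 1.0)     # observe r
    counts[a] += 1                    # update pi
    sums[a] += r
    regret += means.max() - means[a]  # expected regret of this choice

average_regret = regret / T
```

The bonus term shrinks as an arm is pulled more often, which is how the policy balances exploration against exploitation; uniform random play on these arms would have an average regret of about 0.33 per step.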
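The (Stochastic) Contextual Bandit
Arms $a \in \{1, 2, 3\}$ with rewards $r \sim \mathcal{N}(\mu_1(c), \sigma)$, $\mathcal{N}(\mu_2(c), \sigma)$, $\mathcal{N}(\mu_3(c), \sigma)$ given context $c_t$
Interaction Protocol:
- Observe $c_t$
- Select $a_t = \pi(c_t)$
- Observe $r(a_t, c_t)$
- Update $\pi$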
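A common instance of this protocol is LinUCB, which assumes each arm's mean reward is linear in the context. The sketch below is a generic textbook version under that linearity assumption; `true_w`, `alpha`, the noise level, and the uniform context distribution are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N_ACTIONS, DIM_C, T = 3, 2, 3000
alpha = 1.0  # exploration weight (assumed)

# Hypothetical linear mean rewards mu_a(c) = w_a . c
true_w = np.array([[1.0, 0.0], [0.0, 1.0], [0.4, 0.4]])

A = np.stack([np.eye(DIM_C) for _ in range(N_ACTIONS)])  # per-arm Gram matrices
b = np.zeros((N_ACTIONS, DIM_C))
regret = 0.0

for t in range(T):
    c = rng.random(DIM_C)                     # observe c_t
    scores = np.empty(N_ACTIONS)
    for a in range(N_ACTIONS):
        A_inv = np.linalg.inv(A[a])
        w_hat = A_inv @ b[a]                  # ridge-regression estimate
        scores[a] = w_hat @ c + alpha * np.sqrt(c @ A_inv @ c)
    a_t = int(np.argmax(scores))              # select a_t = pi(c_t)
    r = rng.normal(true_w[a_t] @ c, 0.1)      # observe r(a_t, c_t)
    A[a_t] += np.outer(c, c)                  # update pi
    b[a_t] += r * c
    regret += np.max(true_w @ c) - true_w[a_t] @ c

average_regret = regret / T
```

Only the chosen arm's statistics are updated each round, which is the bandit-feedback restriction: the rewards of the other arms for that context are never seen.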
Supervised vs. Bandit Learning


$l_a(c) = [0, 1, 1, 1, 1, 1, 1, 1, 1, 1]$ for context $c$
Supervised Learning sees $l_a(c) \; \forall a$
Full Feedback
Bandit Algorithm sees $l_{a_t}(c)$
Bandit Feedback (Harder)
No counterfactual.
E. Gordon et al, "Leveraging post hoc context...", ICRA 2021
Regret: $O(T)$
Regret: $O(\dim a \, \dim c \ast T)$
Previous Work in Tactile SysID
Static Objects: "assume a sensor that can detect contact before causing movement" [2]
Utilizes 2D OR discrete object priors.
Spatially Sparse Data -> Active Learning

[1] Hu et al, Biomimetic Intelligence and Robotics 2024 ; [2] Xu et al. "TANDEM3D...", ICRA 2023
Online Learning Through Contact
Problem Formulation
E.K. Gordon et al, "Active Tactile Exploration...", ICRA 2026

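Diagram: measurements $m_t$, …; unknown state $x_T$?; unknown parameters $\theta$?
Example: we measure contact at $t$.
(Learning) Where is the object at $t = T$?
(Information) How certain are we?
Measurement Model: $P(m_t \mid \theta, x_t)$
Dynamics: $x_{t+1} = f(\theta, x_t)$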
Online Learning
E.K. Gordon et al, "Active Tactile Exploration...", ICRA 2026
Maximum (Log) Likelihood as Trajectory Optimization
$P(m_t \mid x_T) \rightarrow P(m_t \mid \tau = [x_0, \dots, x_T]) \, P(\tau)$
Loss $L := -\sum_t \log P(m_t \mid x_t) + \|x_t - f(x_{t-1})\|^2$
Key Difficulty: contact dynamics f often have near-0 or near-∞ gradients.
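To see the loss structure in the simplest possible setting, the sketch below optimizes a whole trajectory and one parameter for a toy system with smooth dynamics $x_{t+1} = x_t + \theta$ and Gaussian measurements. Everything about this toy system is an assumption; it deliberately sidesteps the contact-dynamics gradient problem, since with linear dynamics the loss is a linear least-squares problem.

```python
import numpy as np

rng = np.random.default_rng(0)
T, sigma, theta_true = 20, 0.1, 0.5

# Toy dynamics (an assumption): x_{t+1} = f(theta, x_t) = x_t + theta
x_true = theta_true * np.arange(T + 1)
m = x_true + rng.normal(0.0, sigma, T + 1)  # Gaussian measurements m_t

# Unknowns z = [x_0, ..., x_T, theta]. Both loss terms are squared residuals
# that are linear in z, so least squares minimizes the loss exactly.
n = T + 2
rows, rhs = [], []
for t in range(T + 1):       # -log P(m_t | x_t) = (x_t - m_t)^2 / (2 sigma^2)
    row = np.zeros(n)
    row[t] = 1.0 / (np.sqrt(2.0) * sigma)
    rows.append(row)
    rhs.append(m[t] / (np.sqrt(2.0) * sigma))
for t in range(1, T + 1):    # dynamics penalty: (x_t - x_{t-1} - theta)^2
    row = np.zeros(n)
    row[t], row[t - 1], row[-1] = 1.0, -1.0, -1.0
    rows.append(row)
    rhs.append(0.0)

z, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
x_T_hat, theta_hat = z[T], z[-1]
```

With contact, $f$ is no longer linear or even smooth, so the same loss has to be minimized iteratively and its gradients become the bottleneck.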
(Approximately) Minimizing Graph Distance
B. Bianchini et al, "Generalization Bounded...", L4DC 2022; E.K. Gordon et al, "Active Tactile Exploration...", ICRA 2026

Analogy: $y = H(x - \theta)$
(Plot: data $D$ on axes $x$, $y$ with a step at $\theta$; loss curves labeled MSE and GD.)
Mean Square Error
$L_{MSE} = \sum_D \|y_D - H(x_D - \theta)\|^2$
Problem: look at $\nabla_\theta L_{MSE}$.
It is 0 or undefined everywhere!
Alternative: Graph Distance
$L_{GD} = \sum_D \min_x \|(x_D, y_D) - (x, H(x - \theta))\|^2$
Trade-Off:
- Pro: Loss gradient is finite (or bounded) almost everywhere!
- Con: Potentially expensive inner optimization loop.
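The difference between the two losses is easy to check numerically. The sketch below assumes $H$ is the Heaviside step and treats its graph as two horizontal half-lines (it omits the vertical segment at the jump), so the inner minimization over $x$ has a closed form. The data points are made up, with a true threshold of 0.

```python
import numpy as np


def heaviside(z):
    return (z >= 0).astype(float)


def mse_loss(theta, x_d, y_d):
    return float(np.sum((y_d - heaviside(x_d - theta)) ** 2))


def graph_distance_loss(theta, x_d, y_d):
    """Sum of squared distances from each data point to the graph of H(x - theta)."""
    # Closest point on the lower branch (x <= theta, y = 0)
    d_low = np.maximum(x_d - theta, 0.0) ** 2 + y_d ** 2
    # Closest point on the upper branch (x >= theta, y = 1)
    d_high = np.maximum(theta - x_d, 0.0) ** 2 + (y_d - 1.0) ** 2
    return float(np.sum(np.minimum(d_low, d_high)))


x_d = np.array([-1.0, -0.5, 0.5, 1.0])
y_d = heaviside(x_d)  # generated with true theta = 0

mse_a, mse_b = mse_loss(0.8, x_d, y_d), mse_loss(0.9, x_d, y_d)
gd_a, gd_b = graph_distance_loss(0.8, x_d, y_d), graph_distance_loss(0.9, x_d, y_d)
```

Moving $\theta$ from 0.8 to 0.9 leaves the MSE unchanged (no data point crosses the step), while the graph distance grows, so only the latter tells an optimizer which way to move.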
Quantifying Information Without a Prior
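(Figure: two plots of loss $L$ vs. $\Theta$ around the estimate $\tilde{\Theta}$, labeled Low Info and High Info, with a Noise Floor.)
Maximum Likelihood Estimate: $\tilde{\Theta} = \arg\min_\Theta L(\Theta)$
$\rightarrow \frac{dL}{d\Theta}(\tilde{\Theta}) = 0$
Information → How certain am I? Ideally: answer without a strong prior.
How certain is this?
Past (Observed) Information:
$\mathcal{I} := \sum_{m_t} \nabla_\Theta L (\nabla_\Theta L)^T$
Future (Fisher) Information:
$\mathcal{F} := \mathrm{Var}_{m_t}[\nabla_\Theta L] = E_{m_t}[\nabla_\Theta L (\nabla_\Theta L)^T]$
Expected Information Gain (EIG) $:= \log\det(\mathcal{F} \mathcal{I}^{-1} + I)$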
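These three quantities can be computed directly for a toy linear-Gaussian measurement model $m = h \cdot \Theta + \text{noise}$. The sketch below is only an illustration under that assumption: the model, the measurement directions, and the helper names are hypothetical, and the trailing $I$ in the EIG formula is read as the identity matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.5
theta_true = np.array([1.0, -2.0])


def grad_loss(theta, h, m):
    """Gradient of L = (m - h.theta)^2 / (2 sigma^2) with respect to theta."""
    return -(m - h @ theta) * h / sigma**2


# Past measurements: all taken along nearly the same direction.
H_past = np.array([[1.0, 0.0], [1.0, 0.1], [1.0, -0.1], [0.9, 0.0]] * 10)
m_past = H_past @ theta_true + rng.normal(0.0, sigma, len(H_past))
theta_mle, *_ = np.linalg.lstsq(H_past, m_past, rcond=None)

# Observed information: sum of gradient outer products at the estimate.
grads = np.array([grad_loss(theta_mle, h, m) for h, m in zip(H_past, m_past)])
observed = grads.T @ grads


def fisher_information(h, n_samples=20000):
    """Monte Carlo Var_m[grad L] for one future measurement along h."""
    m = h @ theta_mle + rng.normal(0.0, sigma, n_samples)
    g = -(m - h @ theta_mle)[:, None] * h[None, :] / sigma**2
    return g.T @ g / n_samples  # the mean gradient is zero at the estimate


def eig(fisher, observed):
    """log det(F I^{-1} + Id), computed with slogdet for stability."""
    d = len(observed)
    return np.linalg.slogdet(fisher @ np.linalg.inv(observed) + np.eye(d))[1]


eig_same = eig(fisher_information(np.array([1.0, 0.0])), observed)
eig_new = eig(fisher_information(np.array([0.0, 1.0])), observed)
```

Measuring along the already well-observed direction yields almost no expected gain, while the unexplored direction yields a large one.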
E.K. Gordon et al, "Active Tactile Exploration...", ICRA 2026
Computing Observed Information
Computing Fisher Information
$\mathcal{I} := \mathrm{Var}_{m_t}[\nabla_{(\theta, x_T)} L] = \mathrm{Var}_{m_t}[\nabla_{(\theta, \tau)} L \, \nabla_{x_T} \tau]$
Recall: our loss optimizes the entire trajectory $\tau$, but we want information about $x_T$.
$\frac{\partial x_t}{\partial x_{t+1}}$: gradient of "backwards simulation"
Key Difficulty: Backwards simulation isn't well-defined for Coulomb frictional contact.
E.K. Gordon et al, "Active Tactile Exploration...", ICRA 2026
"Quasi-Static" Solution: pretend $\frac{\partial x_t}{\partial x_{t+1}} = \frac{\partial x_{t+1}}{\partial x_t} = I$ (identity)
$\mathcal{I} = \sum_t E_{m_t}[(\nabla_{x_t} P(m_t \mid x_t))^2]$
Pro: Easy to compute
For Gaussian measurement model, no sampling required for $E$
Information Through Marginalization
Ongoing Work
"Quasi-Static" Solution won't work for more dynamic systems.
Can we do better? Yes, through marginalization.
We want $\nabla_{x_T} \log P(m_0 \mid x_T)$
$= \nabla_{x_T} \log \int_\tau P(m_0 \mid \tau) P(\tau \mid x_T)$
$\approx \mathrm{softmax}_{\tau \sim \mathrm{MCMC}}(\log P(m_0 \mid x_0)) \cdot (\nabla_{x_T} \|f(x_{T-1}) - x_T\|_2^2)$
Con: We introduce sampling. However...
Pros:
- No gradient of $f$ required
- For sampling: we already have a good guess of $\tau$ from the learning algorithm
Journal extension in the works...
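Information Through Marginalization
Ongoing Work
We want $\nabla_{x_T} \log P(m_t \mid x_T)$
$= \nabla_{x_T} \log \int_{x_t} P(m_t \mid x_t) P(x_t \mid x_T)$
Assume: $P(x_t \mid x_T) \propto \exp(f(x_t, x_T))$
$\approx \nabla_{x_T} \log \sum_{x_t \sim U} P(m_t \mid x_t) P(x_t \mid x_T)$
$= \mathrm{softmax}_{x_t \sim U}(\log P(m_t \mid x_t) + f(x_t, x_T)) \cdot (\nabla_{x_T} f(x_t, x_T))$
$= \mathrm{softmax}_{x_t \sim \mathrm{MCMC}}(\log P(m_t \mid x_t)) \cdot (\nabla_{x_T} f(x_t, x_T))$
No gradient through sampling
At the cost of sampling trajectories $x_t$ (via MCMC), we bypass the inverse Jacobian.
(If we have a good guess $\tilde{x}_t$, MCMC should be quick)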
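The final softmax-weighted estimator can be checked on a one-dimensional toy problem where the answer is known in closed form. The sketch assumes a Gaussian measurement model and $f(x_t, x_T) = -(x_t - x_T)^2 / (2 s^2)$, so that $P(x_t \mid x_T)$ is a Gaussian that can be sampled directly instead of by MCMC. The values of `sigma`, `s`, `x_T`, and `m_t` are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, s = 0.3, 0.5   # measurement noise; spread of P(x_t | x_T) (both assumed)
x_T, m_t = 1.0, 1.8


def log_p_measure(m, x):
    """log P(m_t | x_t) for a Gaussian measurement model, up to a constant."""
    return -0.5 * ((m - x) / sigma) ** 2


def grad_f(x_t, x_T):
    """d/dx_T of f(x_t, x_T) = -(x_t - x_T)^2 / (2 s^2)."""
    return (x_t - x_T) / s**2


# Sample x_t ~ P(x_t | x_T); direct Gaussian sampling stands in for MCMC here.
samples = rng.normal(x_T, s, 200_000)

logits = log_p_measure(m_t, samples)
weights = np.exp(logits - logits.max())
weights /= weights.sum()                      # softmax over the samples

grad_estimate = float(weights @ grad_f(samples, x_T))
grad_exact = (m_t - x_T) / (sigma**2 + s**2)  # closed form for this toy case
```

No derivative is taken through the sampler or through the dynamics; only $\nabla_{x_T} f$ is evaluated at the sampled points, which is the property the derivation is after.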
Assistive Direct pHRI

- Large communities in HRI and Contact-Rich Manipulation
- The overlap between them is much smaller.
Leveraging Model-Based Methods
Dataset D

Perception
Learned Model + Policy
L

VLA
Run Classical Techniques on Learned Model:
- Information Quantification (Observed and $E_\pi$)
- Generate / efficiently sample fine-tuning sim data.
- Offline MPC when compute or connectivity are limited (portability).
Uncertainty Quantification
Images

L
- Implicit approach, e.g. look at action entropy [1]
- Alternative: try to learn that uncertainty explicitly.
- Used privileged simulator information to compute observed info ≈ uncertainty.
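As a small illustration of the implicit approach, the entropy of a policy's action distribution can serve as an uncertainty score. The sketch assumes the policy exposes a discrete probability vector over actions, which may not hold for a given VLA; the function name and the example distributions are hypothetical.

```python
import numpy as np


def action_entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    probs = np.asarray(probs, dtype=float)
    nonzero = probs[probs > 0]  # 0 * log(0) is treated as 0
    return float(-np.sum(nonzero * np.log(nonzero)))


confident = action_entropy([0.97, 0.01, 0.01, 0.01])
uncertain = action_entropy([0.25, 0.25, 0.25, 0.25])
```

A uniform distribution gives the maximum entropy, $\log 4$ here, while a peaked distribution gives a value near zero.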
States
Observed Information:
$\mathcal{I} := \sum_{m_t} \nabla_\Theta L (\nabla_\Theta L)^T$

VLA
[1] Yang et al, "Uncertainty-aware Observation Reinjection...", preprint