Ethan K. Gordon
Postdoc, University of Pennsylvania
PhD 2023, University of Washington
with Contact-Rich Active Learning
Physical device that moves at least partially autonomously.
Aids in tasks that users would otherwise find impossible, uncomfortable, or inconvenient.
Direct or indirect physical contact with people and the environment.
Activities of daily living (ADLs) for those with short- or long-term physical impairments.
Rehabilitation and physical therapy.
Assisting nurses and physicians with patient care.
“If I can have a robot do it, I can learn to adapt to it, but it would be me feeding me, and that would be huge”
Tyler Schrenk
1985-2023
The Promise of PARs:
Contact-Rich Manipulation
Online Adaptation
There is no time for re-training!
How can robots adapt at deployment time efficiently, safely, and portably?
Policy Space Reduction
Model-Based Methods
Leveraging Haptics
Support
Inform
It is important to ask users: do observational and qualitative research before experimentation.
Time Per Bite:
It's intuitive and familiar.
T. Bhattacharjee, E.K. Gordon et al, "Is more autonomy always better?...", HRI 2020
Users with more limited mobility preferred greater autonomy, even when the robot made errors.
T. Bhattacharjee, E.K. Gordon et al, "Is more autonomy always better?...", HRI 2020
Autonomy Preference Given Errors
User Rating
Trade-off between autonomy (with chance of error) and high-effort manual control.
What errors are tolerable? Minimum Food Acquisition Success Rate: 80%
T. Bhattacharjee, E.K. Gordon et al, "Is more autonomy always better?...", HRI 2020
R. Feng, Y. Kim, G. Lee, E. K. Gordon, et al, "...Generalizing skewering strategies...", ISRR 2019
Food simulation is hard*, but we can collect real data. What if we just use machine learning?
Example: 10 trajectories x 16 food types
85 person-hours
T. Bhattacharjee et al, "Towards Robotic Feeding...", RA-L 2019
Qualitative Taxonomy
Insights:
Discrete classes of strategies
Lots of variations within those classes
Wiggling
Tilting
High Pressure
Scooping
E. K. Gordon, A. Nanavati et al, “Towards General Single-Utensil...", CoRL 2023
Yes!
(Note the 80% acceptance threshold)
E. K. Gordon, A. Nanavati et al, “Towards General Single-Utensil...", CoRL 2023
Online learning with a discrete action space is easier, safer, and more predictable for the patient.
Discrete Actions a
Visual Context: $c_t$
r: Success/Failure {1,0}
E. K. Gordon et al, “Adaptive robot-assisted feeding...", IROS 2020
Learn some parameters $\theta$
$P(r = 1) \approx f_\theta(c, a)$
$N_a$ is small enough that we can do this at deployment time!
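A minimal sketch of online learning over a small discrete action set, assuming an epsilon-greedy policy and one logistic model per action. The class name, the one-hot contexts, and the success probabilities in `run_demo` are hypothetical illustrations and are not taken from the cited work, which may use a different bandit algorithm.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class DiscreteActionBandit:
    """Per-action logistic model: P(r=1 | c, a) ~ sigmoid(theta[a] . c)."""

    def __init__(self, n_actions, dim_c, epsilon=0.1, lr=0.1, seed=0):
        self.theta = [[0.0] * dim_c for _ in range(n_actions)]
        self.epsilon = epsilon
        self.lr = lr
        self.rng = random.Random(seed)

    def predict(self, c, a):
        return sigmoid(sum(w * x for w, x in zip(self.theta[a], c)))

    def greedy(self, c):
        return max(range(len(self.theta)), key=lambda a: self.predict(c, a))

    def choose(self, c):
        # Explore uniformly with probability epsilon, otherwise exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.theta))
        return self.greedy(c)

    def update(self, c, a, r):
        # One SGD step on the log loss, only for the action actually taken.
        err = r - self.predict(c, a)
        self.theta[a] = [w + self.lr * err * x for w, x in zip(self.theta[a], c)]


def run_demo(steps=4000, seed=0):
    # Hypothetical success probabilities per (food, action); not from the talk.
    true_p = {0: [0.9, 0.1, 0.5], 1: [0.2, 0.8, 0.3]}
    env = random.Random(seed + 1)
    bandit = DiscreteActionBandit(n_actions=3, dim_c=2, seed=seed)
    for _ in range(steps):
        food = env.randrange(2)
        c = [1.0, 0.0] if food == 0 else [0.0, 1.0]  # one-hot visual context
        a = bandit.choose(c)
        r = 1 if env.random() < true_p[food][a] else 0  # success / failure
        bandit.update(c, a, r)
    return bandit
```

Because only `n_actions * dim_c` parameters are learned, each update is a handful of multiplications, which is the sense in which a small action set keeps deployment-time learning cheap.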
$r_a + \epsilon = f_{\theta^*}(c) = g_{\phi^*}(p)$
Haptic data is really good for food classification, and we already have a haptic sensor for safety!
55ms of force data:
T. Bhattacharjee et al, RA-L 2019; E. Gordon et al, "Leveraging post hoc context...", ICRA 2021
Consider a joint loss model:
$r_a + \epsilon = f_{\theta^*}(c) = g_{\phi^*}(p)$
$P(r = 1) = f_\theta(c) = g_\phi(p)$
visual context
haptic context
$f_\theta(c, a) = [0.9, 0.1, 0.5, 0.8, \ldots]$
Once either model is learned, the complexity of the other one is significantly reduced:
Example:
After only 1 action, robot determines that kiwi ≈ banana, and can impute the counterfactual.
Observe $a = 1 \rightarrow [0, ?, ?, ?, \ldots]$
$g_\phi(p, a) = [0.8, 0.2, 0.4, 0.7, \ldots]$
Can provide the counterfactual
$O(\dim c) \rightarrow O(\min(\dim c, \dim p))$
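The imputation step can be sketched as follows, assuming the haptic model is a nearest-prototype classifier with a stored success vector per class. The prototypes, class names, and numbers are hypothetical placeholders, and the cited paper's actual haptic model may differ.

```python
def nearest_class(p, prototypes):
    """Return the label whose haptic prototype is closest to reading p."""
    return min(
        prototypes,
        key=lambda k: sum((a - b) ** 2 for a, b in zip(p, prototypes[k])),
    )


def impute_rewards(observed, p, prototypes, class_success):
    """Fill unobserved (None) entries of a reward vector from the haptic model."""
    predicted = class_success[nearest_class(p, prototypes)]
    return [pred if r is None else r for r, pred in zip(observed, predicted)]


# Hypothetical haptic prototypes (e.g. peak force, stiffness) and success rates.
PROTOTYPES = {"banana": [1.0, 0.2], "carrot": [6.0, 3.0]}
CLASS_SUCCESS = {"banana": [0.8, 0.2, 0.4, 0.7], "carrot": [0.1, 0.9, 0.3, 0.2]}

# One action was tried on a new, soft food: the first action failed (r = 0).
observed = [0, None, None, None]
filled = impute_rewards(observed, [1.2, 0.3], PROTOTYPES, CLASS_SUCCESS)
```

The observed entry is kept as measured; only the counterfactual entries are taken from the haptic prediction.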
E. Gordon et al, "Leveraging post hoc context...", ICRA 2021
Yes
New foods take ~7-8 actions to learn to user satisfaction.
[1] Kapusta et al, Autonomous Robots 2019; [2] Hello Robot; [3] KukaDressing
Rehabilitation
How can we perform active learning with:
Dynamic Objects?
Visual Occlusions (and other uncertainties)?
Distilled Challenge: Can a robot do this?
Can we do better if we have tactile sensors and robust simulators?
Surgery
Robot Trajectory $r_t$
Learn; Compute
Observed Information $\mathcal{I}$
Sample Actions + Simulate
Expected Future Information
Choose actions where simulated, expected future info is distinct from observed info.
E. Gordon et al, "Active Tactile Exploration...", ICRA 2026
E. Gordon et al, "An adaptable, safe, and portable robot-assisted feeding system.", HRI Companion 2024
A. Nanavati, E. Gordon et al, "Lessons learned from designing...", HRI 2025
Nanavati, Alves-Oliveira, Schrenk, Gordon, et al., HRI 2023
Many are relevant across multiple tasks!
Dressing
Grooming
Rehabilitation
As et al, "ActSafe...", ICLR 2025
Beneficial to play optimistically w.r.t. loss
Safer to play pessimistically w.r.t. model parameters
Which loss components are:
Safety-Critical
(zero user error tolerance)
vs.
Performance-Critical
(higher user error tolerance)
Adjust play for each metric separately.
π0.5; Kapusta et al, Autonomous Robots 2019
?
?
DAIR Lab
Amal Nanavati
Hierarchy and Bandits
E. Heiden et al, “DiSECt", RSS 2021
(only planar cutting)
E. K. Gordon, A. Nanavati et al, “Towards General Single-Utensil...", CoRL 2023
Splines and force/torque thresholds.
Comparable with Euclidean Metric.
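[Figure: multi-armed bandit with three actions]
Interaction Protocol: choose an action $a \in \{1, 2, 3\}$, then observe a reward $r$ drawn from that action's distribution.
Reward distributions: $\mathcal{N}(0.5, 1)$, $\mathcal{N}(0.8, 1)$, $\mathcal{N}(0.1, 1)$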
Metric: Regret: $\mathbb{E}[r(a^*) - r(a_t)]$
Test time metric, balances exploration vs. exploitation,
often theoretically bounded
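To make the regret metric concrete, here is a small simulation of the three-armed example, assuming the UCB1 rule as the bandit algorithm (a standard choice swapped in for illustration; the talk does not say which algorithm it has in mind). The function name and step count are hypothetical.

```python
import math
import random

ARM_MEANS = [0.5, 0.8, 0.1]  # reward means from the three-armed example


def run_ucb1(steps=5000, seed=0):
    """Run UCB1 and return (cumulative expected regret, pull counts)."""
    rng = random.Random(seed)
    counts = [0] * len(ARM_MEANS)
    sums = [0.0] * len(ARM_MEANS)
    best_mean = max(ARM_MEANS)
    regret = 0.0
    for t in range(1, steps + 1):
        if t <= len(ARM_MEANS):
            a = t - 1  # pull every arm once to initialize the estimates
        else:
            # Empirical mean plus an exploration bonus that shrinks with pulls.
            a = max(
                range(len(ARM_MEANS)),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        r = rng.gauss(ARM_MEANS[a], 1.0)  # r ~ N(mu_a, 1)
        counts[a] += 1
        sums[a] += r
        regret += best_mean - ARM_MEANS[a]  # E[r(a*) - r(a_t)] for this step
    return regret, counts
```

A policy that picked arms uniformly at random would accumulate about one third of a unit of regret per step here, so comparing against that baseline shows how much the exploration/exploitation balance buys.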
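[Figure: contextual bandit with three actions]
Interaction Protocol: observe a context $c_t$, choose an action $a \in \{1, 2, 3\}$, then observe a reward $r$.
Reward distributions: $\mathcal{N}(\mu_1(c), \sigma)$, $\mathcal{N}(\mu_2(c), \sigma)$, $\mathcal{N}(\mu_3(c), \sigma)$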
Loss vector for context $c$: $l_a(c) = [0, 1, 1, 1, 1, 1, 1, 1, 1, 1]$
Full Feedback: Supervised Learning sees $l_a(c)\ \forall a$
Bandit Feedback (Harder): Bandit Algorithm sees $l_{a_t}(c)$
No counterfactual.
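The difference between the two feedback models can be shown in a few lines. This is only an illustration of what each learner gets to see; the function names are hypothetical.

```python
def full_feedback(loss_vector):
    """Supervised learning: the loss of every action is revealed."""
    return list(loss_vector)


def bandit_feedback(loss_vector, a_t):
    """Bandit setting: only the loss of the chosen action a_t is revealed."""
    return [l if a == a_t else None for a, l in enumerate(loss_vector)]


losses = [0, 1, 1, 1, 1, 1, 1, 1, 1, 1]  # loss vector from the slide
seen_by_bandit = bandit_feedback(losses, a_t=3)
```

With bandit feedback the learner that picked action 3 learns that it failed, but not that action 0 would have succeeded.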
E. Gordon et al, "Leveraging post hoc context...", ICRA 2021
Regret: $O(T)$
Regret: $O(\dim a \, \dim c \cdot T)$
Static Objects: "assume a sensor that can detect contact before causing movement" [2]
Utilizes 2D OR discrete object priors.
Spatially Sparse Data -> Active Learning
[1] Hu et al, Biomimetic Intelligence and Robotics 2024 ; [2] Xu et al. "TANDEM3D...", ICRA 2023
E.K. Gordon et al, "Active Tactile Exploration...", ICRA 2026
$m_t$
…
$x_T$?
Example: we measure contact at t.
(Learning) Where is the object at t=T?
(Information) How certain are we?
Measurement Model: $P(m_t \mid \theta, x_t)$
Dynamics: $x_{t+1} = f(\theta, x_t)$
$\theta$?
E.K. Gordon et al, "Active Tactile Exploration...", ICRA 2026
Maximum (Log) Likelihood as Trajectory Optimization
$P(m_t \mid x_T) \rightarrow P(m_t \mid \tau = [x_0, \ldots, x_T]) \, P(\tau)$
Loss $L := -\sum_t \log P(m_t \mid x_t) + \|x_t - f(x_{t-1})\|^2$
Key Difficulty: contact dynamics f often have near-0 or near-∞ gradients.
B. Bianchini et al, "Generalization Bounded...", L4DC 2022; E.K. Gordon et al, "Active Tactile Exploration...", ICRA 2026
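A toy version of the trajectory-optimization loss, assuming a 1D state, a Gaussian measurement model, and smooth known dynamics. The smooth dynamics deliberately sidestep the contact-gradient difficulty; the function names, the finite-difference gradient, and all numbers are hypothetical choices for illustration.

```python
def trajectory_loss(traj, measurements, f, sigma=0.1):
    """Negative log-likelihood of measurements plus a dynamics penalty."""
    # Gaussian measurement model: -log P(m_t | x_t) = (m_t - x_t)^2 / (2 sigma^2) + const
    nll = sum((m - traj[t]) ** 2 / (2.0 * sigma**2) for t, m in measurements.items())
    # Dynamics residual: ||x_t - f(x_{t-1})||^2
    dyn = sum((traj[t] - f(traj[t - 1])) ** 2 for t in range(1, len(traj)))
    return nll + dyn


def fit_trajectory(measurements, f, horizon, steps=3000, lr=0.005, eps=1e-6):
    """Gradient descent over the whole trajectory with central differences."""
    traj = [0.0] * (horizon + 1)
    for _ in range(steps):
        grad = []
        for i in range(len(traj)):
            hi = traj[:i] + [traj[i] + eps] + traj[i + 1:]
            lo = traj[:i] + [traj[i] - eps] + traj[i + 1:]
            diff = trajectory_loss(hi, measurements, f) - trajectory_loss(lo, measurements, f)
            grad.append(diff / (2.0 * eps))
        traj = [x - lr * g for x, g in zip(traj, grad)]
    return traj


def drift(x):
    return x + 0.1  # hypothetical smooth dynamics: constant drift per step


# Contact is measured only at t = 0 and t = 1; the state at T = 3 is inferred.
sparse_measurements = {0: 0.0, 1: 0.1}
estimate = fit_trajectory(sparse_measurements, drift, horizon=3)
```

The final state is never measured directly; it is pinned down only through the dynamics term, which is why the gradients of `f` matter so much in the contact-rich case.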
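Analogy: $y = H(x - \theta)$
[Figure: step function $y$ vs. $x$ with threshold $\theta$ and dataset $D$]
Mean Square Error
$L_{MSE} = \sum_D \|y_D - H(x_D - \theta)\|^2$
[Plot legend: MSE, GD]
Alternative: Graph Distance
$L_{GD} = \sum_D \min_x \|(x_D, y_D) - (x, H(x - \theta))\|^2$
Problem: look at $\nabla_\theta L_{MSE}$.
It is 0 or undefined everywhere!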
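The contrast between the two losses can be checked numerically. This sketch assumes $H$ is the Heaviside step function and treats the graph as the closure of its two flat branches; the dataset and function names are hypothetical.

```python
def heaviside(z):
    return 1.0 if z >= 0 else 0.0


def mse_loss(theta, data):
    """Sum of squared prediction errors; piecewise constant in theta."""
    return sum((y - heaviside(x - theta)) ** 2 for x, y in data)


def graph_distance_loss(theta, data):
    """Sum of squared distances from each data point to the graph of H(x - theta)."""
    total = 0.0
    for x, y in data:
        # Lower branch {(u, 0): u <= theta}: closest point has u = min(x, theta).
        d_low = (x - min(x, theta)) ** 2 + y**2
        # Upper branch {(u, 1): u >= theta}: closest point has u = max(x, theta).
        d_high = (x - max(x, theta)) ** 2 + (y - 1.0) ** 2
        total += min(d_low, d_high)
    return total


# Hypothetical dataset generated with a true threshold of 0.5.
DATA = [(0.0, 0.0), (0.2, 0.0), (0.4, 0.0), (0.6, 1.0), (0.8, 1.0), (1.0, 1.0)]

mse_a, mse_b = mse_loss(0.70, DATA), mse_loss(0.75, DATA)
gd_a, gd_b = graph_distance_loss(0.70, DATA), graph_distance_loss(0.75, DATA)
```

Moving $\theta$ from 0.70 to 0.75 leaves the MSE unchanged, since the same single point is misclassified, while the graph distance grows, so it gives a descent direction back toward the data.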
Trade-Off:
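[Figure: two loss curves $L(\Theta)$, each marked with its estimate $\tilde{\Theta}$]
Maximum Likelihood Estimate: $\tilde{\Theta} = \arg\min_\Theta L(\Theta)$
$\rightarrow \frac{dL}{d\Theta}(\tilde{\Theta}) = 0$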
Information → How certain am I? Ideally: answer without a strong prior.
How certain is this?
Noise Floor
Low Info
High Info
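Past (Observed) Information:
$\mathcal{I} := \sum_{m_t} \nabla_\Theta L \, (\nabla_\Theta L)^T$
Future (Fisher) Information:
$\mathcal{F} := \mathrm{Var}_{m_t}[\nabla_\Theta L]$
$= \mathbb{E}_{m_t}[\nabla_\Theta L \, (\nabla_\Theta L)^T]$
Expected Information Gain: $\mathrm{EIG} := \log\det(\mathcal{F} \mathcal{I}^{-1} + I)$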
E.K. Gordon et al, "Active Tactile Exploration...", ICRA 2026
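A small worked example of the expected information gain, under two assumptions: the trailing $I$ in the EIG formula is read as the identity matrix, and each candidate action yields a linear-Gaussian measurement $m = j \cdot \Theta + \text{noise}$, for which the Fisher information is $j j^T / \sigma^2$. The 2x2 helpers and the numbers are hypothetical.

```python
import math


def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def inv2(m):
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]


def matmul2(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def fisher_linear_gaussian(j, sigma=1.0):
    """Fisher information of m = j . Theta + N(0, sigma^2): j j^T / sigma^2."""
    return [[j[r] * j[c] / sigma**2 for c in range(2)] for r in range(2)]


def expected_information_gain(fisher, observed):
    """EIG = log det(F I_obs^{-1} + Id), with Id the 2x2 identity (assumed reading)."""
    prod = matmul2(fisher, inv2(observed))
    prod[0][0] += 1.0
    prod[1][1] += 1.0
    return math.log(det2(prod))


# Past measurements already constrain the first parameter far better than the second.
observed_info = [[10.0, 0.0], [0.0, 0.1]]
candidates = [(1.0, 0.0), (0.0, 1.0)]  # hypothetical measurement directions
gains = [expected_information_gain(fisher_linear_gaussian(j), observed_info) for j in candidates]
best = gains.index(max(gains))
```

The second candidate probes the poorly known parameter and scores higher, which matches the idea of choosing actions whose expected future information is distinct from what has already been observed.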
Key Difficulty: Backwards simulation isn't well-defined for Coulomb frictional contact.
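E.K. Gordon et al, "Active Tactile Exploration...", ICRA 2026
"Quasi-Static" Solution: pretend $\frac{\partial x_t}{\partial x_{t+1}} = \frac{\partial x_{t+1}}{\partial x_t} = I$
$\mathcal{I} = \sum_t \mathbb{E}_{m_t}\left[(\nabla_{x_t} P(m_t \mid x_t))^2\right]$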
Pro: Easy to compute
For Gaussian measurement model, no sampling required for E
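$\mathcal{I} := \mathrm{Var}_{m_t}[\nabla_{(\theta, x_T)} L] = \mathrm{Var}_{m_t}[\nabla_{(\theta, \tau)} L \, \nabla_{x_T} \tau]$
Recall: our loss optimizes the entire trajectory $\tau$,
but we want information about $x_T$
$\frac{\partial x_t}{\partial x_{t+1}}$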
Gradient of "backwards simulation"
Ongoing Work
"Quasi-Static" Solution won't work for more dynamic systems.
Can we do better? Yes, through marginalization.
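We want $\nabla_{x_T} \log P(m_0 \mid x_T)$
$= \nabla_{x_T} \log \int_\tau P(m_0 \mid \tau) \, P(\tau \mid x_T)$
$\approx \mathrm{softmax}_{\tau \sim \mathrm{MCMC}}(\log P(m_0 \mid x_0)) \cdot (\nabla_{x_T} \|f(x_{T-1}) - x_T\|_2^2)$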
Con: We introduce sampling. However...
Pros:
Journal extension in the works...
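Ongoing Work
We want $\nabla_{x_T} \log P(m_t \mid x_T)$
$= \nabla_{x_T} \log \int_{x_t} P(m_t \mid x_t) \, P(x_t \mid x_T)$
Assume:
$P(x_t \mid x_T) \propto \exp(f(x_t, x_T))$
$\approx \nabla_{x_T} \log \sum_{x_t \sim U} P(m_t \mid x_t) \, P(x_t \mid x_T)$
$= \mathrm{softmax}_{x_t \sim U}(\log P(m_t \mid x_t) + f(x_t, x_T)) \cdot (\nabla_{x_T} f(x_t, x_T))$
$= \mathrm{softmax}_{x_t \sim \mathrm{MCMC}}(\log P(m_t \mid x_t)) \cdot (\nabla_{x_T} f(x_t, x_T))$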
No gradient through sampling
At the cost of sampling trajectories $x_t$ (via MCMC), we bypass the inverse Jacobian.
(If we have a good guess $\tilde{x}_t$, MCMC should be quick)
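The softmax form of the gradient can be sanity-checked on a 1D toy problem. This sketch implements only the uniform-sampling variant (not MCMC), with a hypothetical Gaussian measurement model and a hypothetical quadratic energy $f(x_t, x_T)$; for a fixed set of samples, the softmax-weighted sum equals the gradient of the log of the sampled sum, which the finite-difference comparison checks.

```python
import math
import random


def log_measurement(m, x_t, sigma=0.5):
    """log P(m_t | x_t) up to a constant, for a Gaussian measurement model."""
    return -((m - x_t) ** 2) / (2.0 * sigma**2)


def energy(x_t, x_T):
    """Hypothetical f(x_t, x_T): x_t is expected one unit behind x_T."""
    return -0.5 * (x_t - (x_T - 1.0)) ** 2


def grad_energy(x_t, x_T):
    """Analytic gradient of the energy with respect to x_T."""
    return x_t - (x_T - 1.0)


def softmax(scores):
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def grad_log_marginal(m, x_T, samples):
    """softmax(log P(m | x_t) + f(x_t, x_T)) . grad_{x_T} f(x_t, x_T)."""
    weights = softmax([log_measurement(m, x) + energy(x, x_T) for x in samples])
    return sum(w * grad_energy(x, x_T) for w, x in zip(weights, samples))


def log_marginal(m, x_T, samples):
    """log sum_{x_t} P(m | x_t) exp(f(x_t, x_T)) over the same fixed samples."""
    scores = [log_measurement(m, x) + energy(x, x_T) for x in samples]
    top = max(scores)
    return top + math.log(sum(math.exp(s - top) for s in scores))


rng = random.Random(0)
uniform_samples = [rng.uniform(-3.0, 3.0) for _ in range(200)]  # x_t ~ U
estimate = grad_log_marginal(0.2, 1.5, uniform_samples)
```

No gradient is taken through the sampling step itself: the samples are held fixed and only the energy is differentiated.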
Dataset D
Perception
Learned Model + Policy
L
VLA
Run Classical Techniques on Learned Model:
Images
L
States
Observed Information:
$\mathcal{I} := \sum_{m_t} \nabla_\Theta L \, (\nabla_\Theta L)^T$
VLA
[1] Yang et al, "Uncertainty-aware Observation Reinjection...", preprint