Ethan K. Gordon
Postdoc, University of Pennsylvania
PhD 2023, University of Washington
with Contact-Rich Active Learning
“If I can have a robot do it, I can learn to adapt to it, but it would be me feeding me, and that would be huge”
Tyler Schrenk
1985-2023
The Promise of PARs:
Contact-Rich Manipulation
Online Adaptation
Online Adaptation
There is no time for
re-training!
How can robots efficiently learn, during deployment, how to manipulate previously-unseen objects?
Policy Space Reduction
Model-Based Methods
Leveraging Haptics
Support
Inform
Support
Inform
Support
Inform
It is important to ask users, observational and qualitative research before experimentation.
Time Per Bite:
It's intuitive and familiar.
T. Bhattacharjee, E.K. Gordon et al, “Is more autonomy always better?...", HRI 2020Trade-off between autonomy (with chance of error) and high-effort manual control.
What errors are tolerable?
T. Bhattacharjee, E.K. Gordon et al, “Is more autonomy always better?...", HRI 2020Users with more limited mobility had a preference for greater autonomy, even if it experienced errors.
But error tolerance was still low, e.g.,
Minimum Food Acquisition Success Rate:
80%
T. Bhattacharjee, E.K. Gordon et al, “Is more autonomy always better?...", HRI 2020Autonomy Preference Given Errors
User Rating
R. Feng, Y. Kim, G. Lee, E. K. Gordon, et al, "...Generalizing skewering strategies...", ISRR 2019
Real data collection takes a lot of time, but we can do it. What if we just train an RL policy?
Example: 10 trajectories x 16 food types
85 person-hours
Hierarchy and Bandits
T. Bhattacharjee et al, “Towards Robotic Feeding...", R-AL 2019
Qualitative Taxonomy
Insights:
Discrete classes of strategies
Lots of variations within those classes
E. K. Gordon, A. Nanavati et al, “Towards General Single-Utensil...", CoRL 2023
Splines and force/torque thresholds.
Comparable with Euclidean Metric.
Wiggling
Tilting
High Pressure
Scooping
E. K. Gordon, A. Nanavati et al, “Towards General Single-Utensil...", CoRL 2023
Yes!
(Note the 80% acceptance threshold)
E. K. Gordon, A. Nanavati et al, “Towards General Single-Utensil...", CoRL 2023
Online learning with a discrete action space maps cleanly on the contextual bandit setting.
\(r \sim\)
\( \mathcal{N}(0.5, 1)\)
Reward
\( \mathcal{N}(0.8, 1)\)
\( \mathcal{N}(0.1, 1)\)
Interaction Protocol:
\(a =\)
\(1\)
\(2\)
\(3\)
Metric: Regret: \(\mathbb{E}[r(a^*) - r(a_t)]\)
Test time metric, balances exploration vs. exploitation,
often theoretically bounded
\(r \sim\)
\( \mathcal{N}(\mu_1(c), \sigma)\)
Reward
Interaction Protocol:
\(a =\)
\(1\)
\(2\)
\(3\)
\( \mathcal{N}(\mu_2(c), \sigma)\)
\( \mathcal{N}(\mu_3(c), \sigma)\)
\(c_t\)
Discrete Actions \(a\)
Visual Context: \(c_t\)
\(r\): Success/Failure \(\{1, 0\}\)
E. K. Gordon et al, “Adaptive robot-assisted feeding...", IROS 2020
Eye-in-Hand RGBD
SegmentAnything
ResNet Features
\(c\): Visual Context
Eye-in-Hand RGBD
SegmentAnything
ResNet Features
\(l\): Success/Failure
\(\{0, 1\}\)
\(a\):
Control Policy
Action Space Trade-Off:
E. Gordon et al, “Adaptive robot-assisted feeding...", IROS 2020
Haptic data is really good for food classification.
55ms of force data:
T. Bhattacharjee et al, R-AL 2019 ; E. Gordon et al, "Leveraging post hoc context...", ICRA 2021\(l_a(c) = [0, 1, 1, 1, 1, 1, 1, 1, 1, 1]\)
Supervised Learning sees \(l_a(c) \forall a\)
Full Feedback
\(c\)
Bandit Algorithm sees \(l_{a_t}(c)\)
Bandit Feedback (Harder)
No counterfactual.
E. Gordon et al, "Leveraging post hoc context...", ICRA 2021Consider a joint loss model:
\(l_a + \epsilon = c \cdot \theta^*_a = p \cdot \phi^*_a\)
\(\epsilon \sim \mathcal{N}\)
visual context
haptic context
\(\tilde{\theta},\tilde{\phi} = \arg\min_{\theta, \phi}\)
\(\sum_{t : \text{action}=a}||c_t\cdot\theta_a - l_t||^2 + ||p_t\cdot\phi_a - l_t||^2\)
s.t. \(\forall t,a: c_t \cdot \theta_a = p_t \cdot \phi_a\)
Once either model is learned, the other model reduces from bandit feedback to full feedback
Example:
After only 1 action, robot determines that kiwi \(\approx\) banana, and can impute the counterfactual.
T. Bhattacharjee et al, R-AL 2019 ; E. Gordon et al, "Leveraging post hoc context...", ICRA 2021Yes
New foods take ~7-8 actions to learn to user satisfaction.
E. Heiden et al, “DiSECt", RSS 2021 ; R. Feng et al, "...Generalizing skewering strategies...", ISRR 2019
Food Sim/Real gap is significant.
(this is just planar cutting)
Real data takes a lot of time.
Example: 10 trajectories x 16 food types
85 person-hours
Given data, what is our best guess of the object's parameters and location?
Estimate how much information we have from previous data.
Robot Trajectory \(r_t\)
Object Geometry \(\theta^*\)
Object Pose \(x^*_T\)
Static Objects: "assume a sensor that can detect contact before causing movement" [2]
Utilizes 2D OR discrete object priors.
Spatially Sparse Data -> Active Learning
[1] Hu et al, Biomimetic Intelligence and Robotics 2024 ; [2] Xu et al. "TANDEM3D...", ICRA 2023
Any gradients \(\nabla_\Theta\mathcal{L}\) have inverse Jacobian terms: \(\frac{\partial x_t}{\partial x_T}\)
Fundamental Problem:
What is the sensitivity of past measurements to the current state?
For e.g. Coulomb friction, this is ill-posed.
(Even the simpler: "what is the past state given the current state" is unanswerable)
For both learning and exploration, how do we handle this?
E. Gordon et al, "Active Tactile Exploration...", ICRA 2026
Log-Likelihood Loss: \(\mathcal{L} := -\sum_t\log\mathbb{P}(m_t = \{c_t, \hat{n}_{m,t}\} | \Theta = \{\theta, x_T\})\)
(Past) Measurements
Geometry and (Current) State
Goal: Define a Loss Function as a Negative-Log-Likelihood
\(\mathcal{L} := -\sum_t\log\mathbb{P}(m_t = \{c_t, \hat{n}_{m,t}\} | \Theta = \{\theta, x_T\})\)
Step 1: Define a per-timestep Measurement Model
\(\mathbb{P}(\hat{n}_{m,t} | x_t(\Theta)) := \mathcal{N}(||\hat{n}_{m,t}-\hat{n}_t(x_t, \theta)||^2_2, \Sigma_n)\)
\(\mathbb{P}(c_t=0 | x_t(\Theta)) := sigmoid(\alpha\phi_t(x_t, \theta))\)
(Differential Collision)
Step 2 (The Hard Part): What is \(x_t(x_T, \theta)\) ?
(and its gradient)
E. Gordon et al, "Active Tactile Exploration...", ICRA 2026
Option 1: DiffSim (Shooting)
Pretend \(\Theta = \{\theta, x_0\}\), then \(x_{t+1} = f_\theta(x_t)\)
MLE \(\tilde{x}_T = f^T_\theta(\tilde{x}_0)\)
Clear Problem: Unstable. Accurate \(\tilde{x}_T\) requires an accurate \(\tilde{x}_0\)
Option 2: DiffSim (Collocation) [with Prediction Loss]
Pretend \(\Theta = \{\theta, x_t\}\), add dynamics as a penalty.
\(\mathcal{L} := \sum_t-\log\mathbb{P}(m_t | x_t,\theta) + ||x_t - f_\theta(x_{t-1})||^2\)
Fundamental Problem: \(f\) could have near-0 or near-\(\infty\) gradients.
\(f_\theta(x_t) = g_\theta(x_t, \lambda_t)\)
\(\lambda_t = \min_\lambda h_\theta(x_t, \lambda)\)
B. Bianchini et al, "Generalization Bounded...", L4DC 2022; E. Gordon et al, "Active Tactile Exploration...", ICRA 2026
Contact Forces
B. Bianchini et al, "Generalization Bounded...", L4DC 2022; E. Gordon et al, "Active Tactile Exploration...", ICRA 2026
Analogy: \(y = H(x-\theta)\)
\(\theta\)
\(x\)
\(y\)
\(\mathcal{D}\)
Mean Square Error
\(\mathcal{L}_{MSE} = \sum_\mathcal{D}||y_{\mathcal{D}} - H(x_\mathcal{D}-\theta)||^2\)
MSE
GD
Alternative: Graph Distance
\(\mathcal{L}_{GD} = \sum_\mathcal{D}\min_x||(x_{\mathcal{D}}, y_{\mathcal{D}}) - (x, H(x-\theta))||^2\)
Problem: look at the gradient w.r.t. \(\theta\).
It is 0 almost everywhere!
Trade-Off:
Inner Opt (QP) over contact forces.
\(\mathbb{P}(m_t | x_t,\theta)\)
\(\Theta\)
\(\mathcal{L}\)
\(\tilde{\Theta}\)
\(\Theta\)
\(\mathcal{L}\)
\(\tilde{\Theta}\)
Maximum Likelihood Estimate: \(\tilde{\Theta} = \arg\min_\Theta\mathcal{L}(\Theta)\)
\(\rightarrow \frac{d\mathcal{L}}{d\Theta}(\tilde{\Theta}) = 0\)
Information \(\rightarrow\) How certain am I? Ideally: answer without a strong prior.
How certain is this?
Noise Floor
Low Info
High Info
Observed Information:
\(\mathcal{I} := \sum_{m_t}\nabla_{\Theta}^2\mathcal{L}\)
Expected (Fisher) Information:
\(\mathcal{F} := \mathbb{E}_{m_t}\left[\nabla_{\Theta}^2\mathcal{L}\right]\)
\(= Var_{m_t}\left[\nabla_{\Theta}\mathcal{L}\right]\)
\(= \mathbb{E}_{m_t}\left[\nabla_{\Theta}\mathcal{L}\left(\nabla_{\Theta}\mathcal{L}\right)^T\right] \)
EIG \(:= \log\det(\mathcal{F}\mathcal{I}^{-1} + \mathbf{I})\)
E. Gordon et al, "Active Tactile Exploration...", ICRA 2026
Learn; Compute
Observed Info \(\mathcal{I}\)
Sample + Simulate
Expected Fisher Info \(\mathcal{F}\)
\(\max EIG := \log\det\left(\mathcal{F}\mathcal{I}^{-1} + \mathbf{I}\right)\)
Choose actions where simulated, expected Fisher info is distinct from Observed info.
E. Gordon et al, "Active Tactile Exploration...", ICRA 2026
Ongoing Work
Currently, to reparameterize from \(\Theta = x_t \rightarrow \Theta = x_T\), we pretend \(\frac{d x_t}{d x_T} = \mathbf{I}\). This will neglect significant dynamics. Ongoing work will address this.
We want \(\nabla_{x_T} \log\mathbb{P}(m_t | x_T)\)
Ongoing Work
\(=\nabla_{x_T} \log\int_{x_t}\mathbb{P}(m_t | x_t)\mathbb{P}(x_t | x_T)\)
Assume:
\(\mathbb{P}(x_t | x_T) \propto \exp(f(x_t, x_T))\)
\(\approx\nabla_{x_T} \log\sum_{x_t\sim U}\mathbb{P}(m_t | x_t)\mathbb{P}(x_t | x_T)\)
\(=softmax_{x_t\sim U}\left(\log \mathbb{P}(m_t | x_t)+ f(x_t, x_T)\right) \cdot \left(\nabla_{x_T}f(x_t, x_T)\right)\)
\(=softmax_{x_t\sim MCMC}\left(\log \mathbb{P}(m_t | x_t)\right) \cdot \left(\nabla_{x_T}f(x_t, x_T)\right)\)
No gradient through sampling
At the cost of sampling trajectories \(x_t\) (via MCMC), we bypass the inverse Jacobian.
(If we have a good guess \(\tilde{x}_t\), MCMC should be quick)
E. Gordon et al, "An adaptable, safe, and portable robot-assisted feeding system.", HRI Companion 2024A. Nanavati, E. Gordon et al, "Lessons learned from designing...", HRI 2025Bonus:
Responsibilities: Lead the other officers (Program Committee, Panel Committee, Sponsorship Chair, etc.); Secure Funding (NSF + AIJ grants); run the workshop day-of; coordinate travel for all attendees; Reviewer/AC of last resort.
What I didn't do:
reach out to some prospective speakers, update the website after creation
As et al, "ActSafe...", ICLR 2025Beneficial to play optimistically w.r.t. loss
Safe to play pessimistically w.r.t. model parameters
Kapusta et al, Autonomous Robots 2019; Jenamani et al, HRI 2024; McMurray, "Robotics... for poultry processing" (Book, 2011) Food Preparation
Lots of Work
Bite Transfer
Some Work
Cloth Folding
Lots of Work
Sleeve Insertion
Some Work
Picking up and orienting clothes in preparation for insertion motions.
?
DAIR Lab
Amal Nanavati
Ethan K. Gordon
Postdoc, University of Pennsylvania
PhD 2023, University of Washington
Online and Active Learning for Physically Assistive Robotics