Active Tactile Exploration for Rigid Body State Estimation

Ethan K. Gordon, Bruke Baraki, Michael Posa

<Items in Brackets are Meta Notes / Still in Progress. Feedback Appreciated!>

Known / Estimated:

  • Object Geometry

  • Object Pose

  • Object Mass / Inertia
  • Frictional Properties

In Robotics, Models are Powerful

Max Planck Real Robotics Challenge 2020

Arbitrary Convex Object Repose Task

Bauer et al. "Real Robot Challenge: A Robotics Competition in the Cloud". NeurIPS 2021 Competition.

Models are Difficult to Build Online

  • Occlusions / Darkness

  • Clutter

  • Heterogeneous Materials

  • Broken Objects

Visual Model Learning

Structure from Motion (SFM)

Bianco et al. "Evaluating the Performance of Structure from Motion Pipelines", Journal of Imaging 2018

Wen et al. "BundleSDF: Neural 6-DoF Tracking and 3D Reconstruction of Unknown Objects", CVPR 2023

Geometry from Video

Pros: Spatially Dense, Mature HW and SW

Cons:

  • Occlusions / Darkness
  • SFM: can't capture physical properties
  • Video: What's doing the manipulating?

State-of-the-Art Tactile Model Learning

Hu et al. "Active shape reconstruction using a novel visuotactile palm sensor", Biomimetic Intelligence and Robotics 2024

Xu et al. "TANDEM3D: Active Tactile Exploration for 3D Object Recognition", ICRA 2023

Single-Finger Poking: No friction or inertia.

Utilizes discrete object priors.

Spatially Sparse Data -> Active Learning

Active Tactile Exploration: Problem Statement


Assumptions:

  • Rigidity

  • Convexity

  • Coulomb friction

What we know / measure:

  • Robot state trajectory \(r[t]\)

  • Contact force \(\lambda_m[t]\)

  • Contact normal \(\hat{n}_m[t]\)

Unknown object properties:

  • State \(x[t] = (q[t], v[t])\) (configuration and velocity)

  • Geometry, inertial, and frictional properties, collected in \(\theta\)

Measurement Probability Model

<Gaussian: Major (likely incorrect) Assumption>

<A Gamma is likely more accurate (>0 and mean-dependent variance, with variance -> 0 when mean -> 0). However, in practice, a Gaussian estimator often achieves similar performance to a Gamma.>

\lambda_m[t] = \hat{\lambda}(\theta, x[t]; r[t]) + \epsilon, \quad \epsilon \sim \mathcal{N}(0, \Sigma)

Given a simulator that can compute \(\hat{\lambda}\), minimize the negative log-likelihood as a loss function for a Maximum Likelihood Estimate:

\mathcal{L} = -\log\mathbb{P}(\lambda_m | \theta, x; r) = \sum_t\left|\left|\hat{\lambda}-\lambda_m\right|\right|_2^2
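As a minimal sketch (illustrative names only, not an existing API), assuming the Gaussian model above with covariance \(\Sigma\):

```python
import torch

def gaussian_nll(lambda_hat, lambda_m, Sigma):
    """Negative log-likelihood of measured impulses, up to constants,
    under lambda_m = lambda_hat + eps, eps ~ N(0, Sigma).
    With Sigma = I this reduces to the sum-of-squares loss above."""
    resid = (lambda_m - lambda_hat).reshape(-1)
    return 0.5 * resid @ torch.linalg.solve(Sigma, resid)
```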

One Possibility is Differentiable Simulation + Shooting

\mathcal{L} = \sum_t\left|\left|\hat{\lambda}(\theta, x)-\lambda_m\right|\right|_2^2
\text{s.t. } \hat{\lambda} = \arg\min_{\lambda \in \mathcal{FC}(\mu)}\sum_t\left|\left|M\Delta v_c - J^{\textrm T}\lambda\right|\right| _{M^{-1}}^2 + \phi^{\textrm T}\lambda
\text{s.t. } q[t] = q[t-1] + \Delta t\, v[t-1], \quad \phi > 0
\text{Given: } x[0]

(\(\phi\): signed distance to contact; \(\mathcal{FC}(\mu)\): Coulomb friction cone with coefficient \(\mu\))

Anitescu. "Optimization-based simulation of nonsmooth rigid multibody dynamics," Mathematical Programming 2006
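A rollout sketch of this shooting loss. Here `sim_step` is a hypothetical differentiable contact solver (e.g., the Anitescu QP above); gradients flow through the whole rollout, which is exactly what makes the loss sensitive to \(x[0]\):

```python
import torch

def shooting_loss(theta, x0, r_traj, lambda_m_traj, sim_step):
    """Differentiable-simulation shooting loss.

    sim_step(theta, x, r) -> (lambda_hat, x_next) is assumed to solve the
    contact QP and integrate q[t] = q[t-1] + dt * v[t-1] internally.
    """
    x, loss = x0, torch.tensor(0.0)
    for r, lam_m in zip(r_traj, lambda_m_traj):
        lam_hat, x = sim_step(theta, x, r)
        loss = loss + torch.sum((lam_hat - lam_m) ** 2)
    return loss
```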

Improving Stability with an Implicit Loss

\mathcal{L} = \min_{\lambda}\sum_t\left|\left|\lambda-\lambda_m\right|\right|_2^2 + \left|\left|M\Delta v_c - J^{\textrm T}\lambda\right|\right| _{M^{-1}}^2 + \phi^{\textrm T}\lambda

Bianchini et al. "Generalization Bounded Implicit Learning of Nearly Discontinuous Functions," L4DC 2022

<TODO: Replace with self-made plot>

DiffSim + Shooting Limitations:

  • Sensitivity to \(x[0]\)
  • Discontinuities given process noise \(\epsilon_p\): measurements become \(\lambda_m[t] = \hat{\lambda}(\theta, x[t] + \epsilon_p; r[t]) + \epsilon_m\)
  • Only gets worse with smaller \(dt\)

The solution is to bring the optimization into the loss function: add integration and penetration residuals \(\left|\left|\Delta q - v\right|\right|_2^2 + \min(\phi, 0)\), and replace the MSE with a distance to the graph of the contact dynamics (MSE -> Graph Distance).

Violation Implicit Loss Summary

Pfrommer et al. "ContactNets: Learning Discontinuous Contact Dynamics with Smooth, Implicit Representations," CoRL 2020

  • Measurement: \(\left|\left|\lambda-\lambda_m\right|\right|_2^2 + (1 - \hat{n}_m\cdot\hat{n})\) (measured normal \(\hat{n}_m\) vs. predicted normal \(\hat{n}\))

  • Prediction: \(\left|\left|M\Delta v_c - J^{\textrm T}\lambda\right|\right| _{M^{-1}}^2 + \left|\left|\Delta q - v\right|\right|_2^2\)

  • Complementarity: \(\phi^{\textrm T}\lambda\)

  • Penetration: \(\min(\phi, 0)\)

  • Power Dissipation (relaxed in Anitescu): \(\left|\left|\mu J_tv\right|\right|\lambda_n + \lambda_t^{\textrm T}\mu J_tv + \max(J_nv,0)^{\textrm T}\lambda_n\); a separating contact (\(J_nv > 0\)) should carry \(\lambda=0\)
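Putting the terms together, a single-contact PyTorch sketch (argument names, shapes, and the sign convention on the penetration term are assumptions, not the authors' implementation):

```python
import torch

def violation_loss(lam_n, lam_t, lam_m, n_hat, n_m,
                   M, J, Jn, Jt, phi, dv_c, dq, v, mu):
    """Single-contact sketch of the violation implicit loss terms above.

    Assumed shapes: lam_n scalar, lam_t (2,), lam = [lam_n; lam_t] (3,),
    phi scalar, Jn (nv,), Jt (2, nv), J (3, nv). The implicit loss
    minimizes this expression over (lam_n, lam_t).
    """
    lam = torch.cat([lam_n.reshape(1), lam_t])
    # Measurement: match measured impulse and measured contact normal
    meas = torch.sum((lam - lam_m) ** 2) + (1.0 - torch.dot(n_m, n_hat))
    # Prediction: inertia-weighted dynamics residual + integration residual
    resid = M @ dv_c - J.T @ lam
    pred = resid @ torch.linalg.solve(M, resid) + torch.sum((dq - v) ** 2)
    # Complementarity: no force at a distance
    comp = phi * lam_n
    # Penetration: penalize phi < 0 (one sign convention for min(phi, 0))
    pen = -torch.clamp(phi, max=0.0)
    # Power dissipation (relaxed in Anitescu), plus: a separating contact
    # (Jn v > 0) should carry no normal impulse
    slip = Jt @ v
    diss = (torch.norm(mu * slip) * lam_n + lam_t @ (mu * slip)
            + torch.clamp(torch.dot(Jn, v), min=0.0) * lam_n)
    return meas + pred + comp + pen + diss
```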

Learning <Preliminary> Results

Real-Time Simulated Data Collection, Real-Time Gradient Descent

Active Exploration: What is Information?

We want to (possibly) be surprised

[Plots: loss \(\mathcal{L}\) as a function of parameters \(\Theta\)]

Ideally, information is local (i.e., no belief distribution on \(\Theta\)).

  • Estimate \(\hat{\Theta}\)
  • Choose \(r\)
  • Observe (random) \(\lambda_m\)


Fisher Information: Variance of the score

\mathcal{L}(\Theta, r, \lambda_m) = -\log\mathbb{P}(\lambda_m | \Theta; r)

"log-likelihood"

\nabla_\Theta\mathcal{L}(\Theta, r, \lambda_m)

"score"

We are surprised if, at \(\hat{\Theta}\), the score varies a lot with new data.

\mathcal{I} = Var_{\lambda_m}\left[\nabla_\Theta\mathcal{L}(\Theta, r, \lambda_m)\Bigr\rvert_{\hat{\Theta}}\right]

"Fisher Information"

Fisher Information Definitions

\mathcal{I} = Var_{\lambda_m}\left[\nabla_\Theta\mathcal{L}\Bigr\rvert_{\hat{\Theta}}\right] = \mathbb{E}_{\lambda_m}\left[\left(\nabla_\Theta\mathcal{L}\Bigr\rvert_{\hat{\Theta}}\right)^2\right] - \mathbb{E}_{\lambda_m}\left[\nabla_\Theta\mathcal{L}\Bigr\rvert_{\hat{\Theta}}\right]^2

The second term is \(0\) because \(\hat{\Theta}\) is a Maximum Likelihood Estimate; here \((\cdot)^2\) denotes an outer product. Under regularity conditions (below), this also equals the expected Hessian:

Var_{\lambda_m}\left[\nabla_\Theta\mathcal{L}\Bigr\rvert_{\hat{\Theta}}\right] = \mathbb{E}_{\lambda_m}\left[\nabla_\Theta\otimes\nabla_\Theta\mathcal{L}\Bigr\rvert_{\hat{\Theta}}\right]

The variance of the gradient is the expected sensitivity of the gradient to small changes in the loss function.
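As a toy sanity check (not from the slides): for a scalar Gaussian measurement \(\lambda_m \sim \mathcal{N}(\mu, \sigma^2)\), the Fisher information about \(\mu\) is \(1/\sigma^2\), and the empirical variance of the score recovers it:

```python
import numpy as np

rng = np.random.default_rng(0)
mu_hat, sigma = 2.0, 0.5                      # MLE and known noise scale
lam_m = mu_hat + sigma * rng.standard_normal(100_000)

# Score of L = (lam_m - mu)^2 / (2 sigma^2), evaluated at mu = mu_hat
score = -(lam_m - mu_hat) / sigma**2

print(score.mean())  # ~ 0: expected score vanishes at the MLE
print(score.var())   # ~ 1 / sigma^2 = 4.0: the Fisher information
```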


Mathematically requires "certain regularity conditions":

  • \(\mathbb{E}\) is necessary: requires swapping integral and derivative order
  • Requires the \(\log\mathbb{P}\): uses normalization of the probability distribution

How to Calculate Fisher Information

\mathcal{I} \approx \mathbb{E}_{\lambda_m}\left[\left(\nabla_\Theta\mathcal{L}\Bigr\rvert_{\hat{\Theta}}\right)^2\right]
  1. Start with the probability model: \(\lambda_m = \hat{\lambda} + \epsilon\)
    1. Not Necessarily Gaussian
  2. For a given \(r\), simulate forward to find \(\hat{\lambda}\)
  3. Sample possible forward values for \(\lambda_m\)
  4. Use autodiff to calculate \(\nabla_\Theta\mathcal{L}\Bigr\rvert_{\hat{\Theta}}\) for each sample
  5. Take the empirical mean of the outer products
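A sketch of these steps with PyTorch autodiff (all names are placeholders, not an existing API; the Gaussian sampling in step 3 inherits the caveat from the measurement-model slide):

```python
import torch

def empirical_fisher(loss_fn, theta_hat, r, lambda_hat, Sigma_chol, n=200):
    """Monte Carlo Fisher information at theta_hat.

    loss_fn(theta, r, lambda_m) -> scalar negative log-likelihood;
    lambda_hat: simulated noiseless measurement for action r (step 2);
    Sigma_chol: Cholesky factor of the measurement covariance.
    """
    d = theta_hat.numel()
    fisher = torch.zeros(d, d)
    for _ in range(n):
        # Step 3: sample a possible measurement lambda_m = lambda_hat + eps
        eps = Sigma_chol @ torch.randn(lambda_hat.numel())
        lambda_m = lambda_hat + eps.reshape(lambda_hat.shape)
        # Step 4: autodiff the score at theta_hat
        theta = theta_hat.detach().clone().requires_grad_(True)
        (score,) = torch.autograd.grad(loss_fn(theta, r, lambda_m), theta)
        g = score.reshape(-1)
        # Step 5: average the outer products
        fisher += torch.outer(g, g) / n
    return fisher
```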

<What is the right probability model? Can also simulate with process noise>

<Currently I have a bug in my implementation, so I don't have complete results. I feed in \(\lambda_m\) as post-optimization impulses. Instead, I need to re-optimize in the calculation of \(\mathcal{L}\).>

Example: Complementarity

With \(\lambda_m = \hat{\lambda} + \epsilon\), \(\epsilon \sim \mathcal{N}(0, \Sigma)\), and \(\nabla\mathcal{L}(\hat{\lambda}) = 0\):

\mathbb{E}_{\lambda_m}\left[\left(\nabla_\Theta\phi\,\lambda_m + ...\right)^2\right] = \mathbb{E}_{\lambda_m}\left[\lambda_m\left(\nabla_\Theta\phi\right)^2\lambda_m\right] = \hat{\lambda}\left(\nabla_\Theta\phi\right)^2\hat{\lambda} + tr\left(\left(\nabla_\Theta\phi\right)^2\Sigma\right) + ... \text{(cross terms)}

For example, with \(\phi = (l, w)\), different contact configurations give

(\nabla\phi)^2 = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \quad \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}, \quad \text{or} \quad \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}
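A quick Monte Carlo check (illustrative numbers, not from the slides) of \(\mathbb{E}[\lambda_m^{\textrm T} A \lambda_m] = \hat{\lambda}^{\textrm T} A \hat{\lambda} + tr(A\Sigma)\) for the corner-contact \(A = (\nabla\phi)^2\):

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 1.0], [1.0, 1.0]])   # (grad phi)^2, corner contact
lam_hat = np.array([3.0, 1.0])           # nominal impulse (made up)
Sigma = np.diag([0.2, 0.1])              # measurement covariance (made up)

lam_m = rng.multivariate_normal(lam_hat, Sigma, size=200_000)
mc = np.einsum('ni,ij,nj->n', lam_m, A, lam_m).mean()

print(mc)                                           # ~16.3 (Monte Carlo)
print(lam_hat @ A @ lam_hat + np.trace(A @ Sigma))  # 16.3 (closed form)
```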

Example with action library

For the actions [2-finger X Pinch, 2-finger Z Pinch w/ Ground, 1-finger Cube Corner Hit]:

\(tr(\mathcal{I})\) = [1388150.4359, 2878543.4818, 2905122.0841]

Expected Info Gain: Avoid Redundancy

  • \(\mathcal{I}\) is independent of past data \(\mathcal{D}\)
  • Same action will be taken every time!
  • Solution: de-prioritize info we've already seen.

Note \(\sum_\mathcal{D}\left(\nabla_\Theta\mathcal{L}\Bigr\rvert_{\hat{\Theta}}\right) = \nabla_\Theta\left(\sum_\mathcal{D}\mathcal{L}\Bigr\rvert_{\hat{\Theta}}\right) = 0\), since \(\hat{\Theta}\) is the MLE

\mathcal{I}_\mathcal{O} = \sum_\mathcal{D}\left(\nabla_\Theta\mathcal{L}\Bigr\rvert_{\hat{\Theta}}\right)^2 \approx \sum_\mathcal{D}\left(\nabla_\Theta\otimes\nabla_\Theta\mathcal{L}\Bigr\rvert_{\hat{\Theta}}\right) \quad \text{(observed information)}
EIG(r) = \mathcal{I}(r)\mathcal{I}_\mathcal{O}^{-1}

Final Maximization Problem

r = \arg\max_r \text{scalarization}(EIG(r))

Choosing a scalarization is a whole field of study. Common choices include:

  1. A (average): \(tr(EIG)\) -> average EIG across parameters
  2. E (eigenvalue): \(\min(eigenvalue(EIG))\) -> prioritize the parameter we know the least about
  3. D (determinant): \(\det(EIG)\) -> maximize the volume of the "uncertainty ellipsoid" around the score
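A sketch of the resulting selection rule (hypothetical names; `fisher_per_action` could come from the `empirical_fisher` sketch above, `fisher_observed` from \(\mathcal{I}_\mathcal{O}\)):

```python
import numpy as np

def choose_action(fisher_per_action, fisher_observed, scalarization="A"):
    """Pick the action r maximizing a scalarized EIG(r) = I(r) inv(I_O)."""
    scores = []
    for I_r in fisher_per_action:
        eig = I_r @ np.linalg.inv(fisher_observed)
        eig = 0.5 * (eig + eig.T)            # symmetrize for eig/det
        if scalarization == "A":             # average across parameters
            s = np.trace(eig)
        elif scalarization == "E":           # least-known parameter direction
            s = np.linalg.eigvalsh(eig).min()
        else:                                # "D": score-ellipsoid volume
            s = np.linalg.det(eig)
        scores.append(s)
    return int(np.argmax(scores))
```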

Thank You!

<Other Funding Orgs>