Shen Shen
April 30, 2025
2:30pm, Room 32-144
Image source: Ho et al. 2020
The denoiser can be conditioned on additional inputs \(u\): \(p_\theta(x_{t-1} \mid x_t, u)\)
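A minimal sketch of one conditioned reverse step, assuming the DDPM mean parameterization from Ho et al. 2020 with variance \(\sigma_t^2 = \beta_t\). The noise predictor `toy_eps_model`, the linear schedule, and all parameter values are hypothetical stand-ins for illustration, not the model from these slides.

```python
import math
import random


def make_schedule(T=10, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule (assumed); returns betas, alphas, cumulative alpha products."""
    betas = [beta_start + (beta_end - beta_start) * i / (T - 1) for i in range(T)]
    alphas = [1.0 - b for b in betas]
    alpha_bars = []
    prod = 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)
    return betas, alphas, alpha_bars


def toy_eps_model(x_t, t, u):
    """Hypothetical stand-in for a learned noise predictor eps_theta(x_t, t, u).

    It treats the offset of x_t from the conditioning vector u as the noise,
    so subtracting it nudges x toward u. A real model would be a trained network.
    """
    return [xi - ui for xi, ui in zip(x_t, u)]


def reverse_step(x_t, t, u, betas, alphas, alpha_bars, rng):
    """Draw x_{t-1} from p_theta(x_{t-1} | x_t, u)."""
    eps = toy_eps_model(x_t, t, u)
    coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
    mean = [(xi - coef * ei) / math.sqrt(alphas[t]) for xi, ei in zip(x_t, eps)]
    if t == 0:
        return mean  # no noise is added at the last step
    sigma = math.sqrt(betas[t])
    return [m + sigma * rng.gauss(0.0, 1.0) for m in mean]


def sample(u, dim=2, T=10, seed=0):
    """Run the full reverse chain from Gaussian noise, conditioned on u."""
    rng = random.Random(seed)
    betas, alphas, alpha_bars = make_schedule(T)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for t in reversed(range(T)):
        x = reverse_step(x, t, u, betas, alphas, alpha_bars, rng)
    return x
```

The only change from the unconditional case is that \(u\) is passed to the noise predictor at every step; the update rule itself is unchanged.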
Image backbone: ResNet-18 (pretrained on ImageNet)
Total: 110M-150M Parameters
Training Time: 3-6 GPU Days ($150-$300)
LLMs can copy the logic and extrapolate it!
Do As I Can, Not As I Say: Grounding Language in Robotic Affordances, Ahn et al., 2022
What do task-based affordances remind us of in MDP/RL?
Value functions!
[Value Function Spaces, Shah, Xu, Lu, Xiao, Toshev, Levine, Ichter, ICLR 2022]
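A minimal sketch of the affordance-as-value-function idea, in the spirit of Ahn et al., 2022: each candidate skill is scored by the product of a language-model relevance score and a value-function estimate of success in the current state. The skill names and all numbers below are hypothetical, chosen only for illustration.

```python
def select_skill(llm_scores, affordance_values):
    """Pick the skill maximizing (LLM relevance) x (value-function affordance).

    llm_scores: dict mapping skill name -> relevance to the instruction.
    affordance_values: dict mapping skill name -> estimated success in the
        current state. Skills missing from this dict are treated as infeasible.
    """
    combined = {
        skill: llm_scores[skill] * affordance_values.get(skill, 0.0)
        for skill in llm_scores
    }
    best = max(combined, key=combined.get)
    return best, combined


# Hypothetical scores: the LLM prefers the sponge, but the robot is not near it.
llm_scores = {
    "pick up the sponge": 0.5,
    "go to the table": 0.3,
    "pick up the apple": 0.2,
}
affordance_values = {
    "pick up the sponge": 0.1,
    "go to the table": 0.9,
    "pick up the apple": 0.6,
}

best, combined = select_skill(llm_scores, affordance_values)
print(best)
```

The language model alone would choose the sponge; weighting by the value function shifts the choice to a skill the robot can actually complete from its current state.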
Towards grounding everything in language
Figure labels: Language, Control, Vision, Tactile
Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language
https://socraticmodels.github.io
Figure labels: Lots of data, Less data, Less data; Roboticist, Vision, NLP
adapted from Tomás Lozano-Pérez
Instruction: Make a Line
LeanDojo
Impact: Formal verification, theorem proving automation, software correctness.
Impact: Pure mathematics, symbolic reasoning, novel conjectures.
We'd love to hear your thoughts.
1. Judged by its test-time behavior, MBRL is definitely a policy.
2. It uses search to generate data.
Goal: Minimize T-gate counts in quantum circuits (critical for practical quantum computing).
Methods:
Secret Sauce:
Impact: