Towards grounding everything in language

Language

Control

Vision

Tactile

Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language

https://socraticmodels.github.io

Lots of data

Less data

Less data

"Language" as the glue for intelligent machines

Language

Perception

Planning

Control

Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language

https://socraticmodels.github.io

Some limits of "language" as intermediate representation?

-  Only for high level? what about control?

Perception

Planning

Control

Socratic Models

Inner Monologue

PaLi-3, BLIP

PaLM-SayCan

Wenlong Huang et al, 2022

Chinchilla, Sparrow

Imitation? RL?

Engineered?

PaLM-E

Challenge:  not a lot of paired

language + control data

Code is a linguistic representation of actions

and we have massive amounts of (pretraining) data for it

sax demo

Language models can write code

Code as a medium to express more complex plans

Jacky Liang, Wenlong Huang, Fei Xia, Peng Xu, Karol Hausman, Brian Ichter, Pete Florence, Andy Zeng

code-as-policies.github.io

Code as Policies: Language Model Programs for Embodied Control

Language models can write code

Code as a medium to express more complex plans

Jacky Liang, Wenlong Huang, Fei Xia, Peng Xu, Karol Hausman, Brian Ichter, Pete Florence, Andy Zeng

code-as-policies.github.io

Code as Policies: Language Model Programs for Embodied Control

SoTA on HumanEval

Language models can write code

Jacky Liang, Wenlong Huang, Fei Xia, Peng Xu, Karol Hausman, Brian Ichter, Pete Florence, Andy Zeng

code-as-policies.github.io

Code as Policies: Language Model Programs for Embodied Control

use NumPy,

SciPy code...

Language models can write code

Jacky Liang, Wenlong Huang, Fei Xia, Peng Xu, Karol Hausman, Brian Ichter, Pete Florence, Andy Zeng

code-as-policies.github.io

Code as Policies: Language Model Programs for Embodied Control

  • PD controllers
  • impedance controllers

Language models can write code

Jacky Liang, Wenlong Huang, Fei Xia, Peng Xu, Karol Hausman, Brian Ichter, Pete Florence, Andy Zeng

code-as-policies.github.io

Code as Policies: Language Model Programs for Embodied Control

Language models can write code

Jacky Liang, Wenlong Huang, Fei Xia, Peng Xu, Karol Hausman, Brian Ichter, Pete Florence, Andy Zeng

code-as-policies.github.io

Code as Policies: Language Model Programs for Embodied Control

What is the foundation models for robotics?

Extensions to Code as Policies

1. Fuse visual-language features into a robot map

2. Use code as policies to do various navigation tasks

"Visual Language Maps" Chenguang Huang et al., ICRA 2023

Extensions to Code as Policies

1. Fuse visual-language features into a robot map

2. Use code as policies to do various navigation tasks

"Visual Language Maps" Chenguang Huang et al., ICRA 2023

1. Write code to do visual reasoning

2. Few-shot SOTA improvements on VQA

"Modular VQA via Code Generation"

Sanjay Subramanian et al., 2023