Search & Policy

achieves AGI?

Search

Given a dynamics model and a reward model, walk down a tree of possible futures and search for good actions, i.e. do planning at test time

e.g. Tree Search, RRT, GCS
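A minimal sketch of the idea in code, assuming hypothetical `dynamics(state, action)` and `reward(state, action)` callables and a small, finite action set (all names are illustrative, not from any library):

```python
# Minimal sketch of test-time planning by walking a tree of futures.
# `dynamics(state, action) -> next_state` and `reward(state, action) -> float`
# are assumed, deterministic callables; `actions` is a small, finite list.

def plan(state, actions, dynamics, reward, depth=3, gamma=0.99):
    """Return (best_action, estimated_value) via exhaustive depth-limited search."""
    def value(s, d):
        if d <= 0:
            return 0.0
        return max(reward(s, a) + gamma * value(dynamics(s, a), d - 1)
                   for a in actions)

    scored = [(reward(state, a) + gamma * value(dynamics(state, a), depth - 1), a)
              for a in actions]
    best_value, best_action = max(scored, key=lambda pair: pair[0])
    return best_action, best_value
```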

 

 

When to use:

  • transition dynamics are easy to obtain (e.g. chess, sim)
  • the action space is small
  • interaction is expensive
  • online execution isn't too demanding on speed
  • you want stronger guarantees & accuracy

 

Policy

Have an intuition, learned from past interactions or expert demonstrations, of what immediate action(s) to take

 

e.g. LLM, PPO, Imitation Learning
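A minimal sketch of a learned reactive policy: a tabular softmax policy with a REINFORCE-style update (far simpler than PPO, but the same policy-gradient family); the class name and interface are illustrative only:

```python
# Sketch of a reactive policy: a softmax over per-state action logits,
# improved with a REINFORCE-style update from a sampled episode.
import numpy as np

class SoftmaxPolicy:
    def __init__(self, n_states, n_actions, lr=0.1):
        self.logits = np.zeros((n_states, n_actions))
        self.lr = lr

    def probs(self, s):
        z = self.logits[s] - self.logits[s].max()   # stabilised softmax
        p = np.exp(z)
        return p / p.sum()

    def act(self, s):
        return int(np.random.choice(self.logits.shape[1], p=self.probs(s)))

    def reinforce_update(self, episode, gamma=0.99):
        """episode: list of (state, action, reward) tuples from one rollout."""
        G = 0.0
        for s, a, r in reversed(episode):
            G = r + gamma * G                 # discounted return from step onward
            grad = -self.probs(s)             # d log pi / d logits, off-action part
            grad[a] += 1.0                    # on-action part
            self.logits[s] += self.lr * G * grad
```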

 

When to use:

  • interaction data or expert demos are plentiful
  • the dynamics are too complex to model
  • the action space is large

Terminology

Planning is the test time behavior of Search

 

Search itself is more general

"Tree traversal, also known as tree search, is a process of visiting each node of a tree data structure"

Combine both

We distinguish planning vs. policy by test time behavior

But we can combine both at training time

 

With a current policy and a learned dynamics model, run MCTS to plan actions, take those actions to collect interaction data, and use the data to improve both the policy and the dynamics model

e.g. Chess agents, Tree of thought
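A rough sketch of that training loop, in the spirit of AlphaZero-style chess agents; `env`, `policy`, `dynamics_model`, and `mcts_plan` are assumed, hypothetical components with the interfaces noted in the comments, not APIs of any specific library:

```python
# Combining search and a policy at training time (heavily simplified sketch).
# Assumed interfaces:
#   env.reset() -> state;  env.step(action) -> (next_state, reward, done)
#   mcts_plan(state, policy, dynamics_model) -> action chosen by tree search
#   policy.fit(states, actions);  dynamics_model.fit(transitions)

def train_with_search(env, policy, dynamics_model, mcts_plan, n_episodes=1000):
    replay = []  # real transitions: (state, action, reward, next_state)
    for _ in range(n_episodes):
        state, done = env.reset(), False
        while not done:
            # Plan with search, guided by the current policy and learned model.
            action = mcts_plan(state, policy, dynamics_model)
            next_state, reward, done = env.step(action)
            replay.append((state, action, reward, next_state))
            state = next_state
        # Distil the search decisions back into the policy ...
        policy.fit(states=[t[0] for t in replay], actions=[t[1] for t in replay])
        # ... and fit the dynamics model on the real transitions.
        dynamics_model.fit(replay)
    return policy, dynamics_model
```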

 

When to use:

  • Almost always!
  • Especially when you want to combine knowledge from a dynamics model with expert demos

Example: AlphaGo

Example: Tree of Thought

Example: Model Based RL?

 

1. By test time behavior, MBRL is definitely a policy

2. But it does search at training time to generate data
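A sketch of how both points hold at once, Dyna/Dreamer-flavoured; `env`, `world_model`, and `policy` are assumed, hypothetical objects with the interfaces noted in the comments:

```python
# Model-based RL sketch: rollouts/search in the learned model happen at
# training time, while test time runs only the distilled policy.
# Assumed interfaces:
#   env.reset() -> state;  env.step(action) -> (next_state, reward, done)
#   world_model.fit(transitions);  world_model.rollout(policy, horizon) -> imagined data
#   policy.act(state) -> action;  policy.improve(imagined_data)

def train_mbrl(env, world_model, policy, n_rounds=100, horizon=15):
    real_transitions = []
    for _ in range(n_rounds):
        # 1. Collect real experience with the current policy.
        state, done = env.reset(), False
        while not done:
            action = policy.act(state)
            next_state, reward, done = env.step(action)
            real_transitions.append((state, action, reward, next_state))
            state = next_state
        # 2. Fit the dynamics/world model on real data.
        world_model.fit(real_transitions)
        # 3. Search the model: imagined rollouts improve the policy.
        policy.improve(world_model.rollout(policy, horizon=horizon))
    return policy

def act_at_test_time(policy, state):
    # Test-time behavior is a plain policy call: no model, no search.
    return policy.act(state)
```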
