achieves AGI?
Given a dynamics model and a reward model, walk down a tree of possible futures and search for actions at test time (test-time planning)
e.g. Tree Search, RRT, GCS
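A minimal sketch of the idea above, assuming a deterministic dynamics model, a small discrete action set, and a fixed search depth. It is exhaustive depth-limited tree search, not RRT or GCS. The names `plan`, `toy_dynamics`, and `toy_reward` are hypothetical and chosen only for illustration.

```python
from typing import Callable, Optional, Sequence, Tuple

def plan(
    state: int,
    dynamics: Callable[[int, int], int],    # (state, action) -> next state
    reward: Callable[[int, int], float],    # (state, action) -> immediate reward
    actions: Sequence[int],
    depth: int,
) -> Tuple[float, Optional[int]]:
    """Return (best undiscounted return, best first action) within `depth` steps."""
    if depth == 0:
        return 0.0, None
    best_return, best_action = float("-inf"), None
    for a in actions:
        # Expand one branch of the tree of futures using the models only.
        next_state = dynamics(state, a)
        future_return, _ = plan(next_state, dynamics, reward, actions, depth - 1)
        total = reward(state, a) + future_return
        if total > best_return:
            best_return, best_action = total, a
    return best_return, best_action

# Toy problem (an assumption for illustration): walk on a line, reward for landing on 3.
def toy_dynamics(s: int, a: int) -> int:
    return s + a

def toy_reward(s: int, a: int) -> float:
    return 1.0 if s + a == 3 else 0.0

value, action = plan(0, toy_dynamics, toy_reward, actions=(-1, +1), depth=3)
```

All the work happens at test time: nothing is learned or stored, and the cost grows as len(actions) ** depth.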
When to use:
Have an intuition for which immediate action(s) to take, learned from past interactions or expert demonstrations
e.g. LLM, PPO, Imitation Learning
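A minimal sketch of a policy learned from expert demonstrations, assuming discrete states and actions. This is tabular behavioral cloning by majority vote, far simpler than an LLM or PPO. `fit_policy` and the `demos` data are hypothetical, made up for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

def fit_policy(demos: Iterable[Tuple[int, int]]) -> Dict[int, int]:
    """Map each demonstrated state to the expert's most frequent action there."""
    counts = defaultdict(Counter)
    for state, action in demos:
        counts[state][action] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}

# Made-up (state, action) pairs standing in for expert demonstrations.
demos = [(0, +1), (1, +1), (1, +1), (1, -1), (2, +1)]
policy = fit_policy(demos)

# Test-time behavior: a direct lookup, no model and no search.
action = policy[1]
```

The contrast with planning is in the last line: acting costs a single lookup, but the policy has nothing to say about states it never saw.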
When to use:
Planning is the test-time behavior of search
Search itself is more general
"Tree traversal, also known as tree search, is a process of visiting each node of a tree data structure"
What distinguishes planning from policy is test-time behavior
But we can combine both at training time
With the current policy and a learned dynamics model, run MCTS to plan actions, take those actions to collect interaction data, then use the data to improve the policy and the dynamics model
e.g. Chess agents, Tree of thought
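A rough sketch of that training loop on a toy problem, under several assumptions that are not from the notes: exhaustive depth-limited search stands in for MCTS, the dynamics model is a lookup table, unseen transitions are assumed to leave the state unchanged, the reward function is known, and the policy is a table that copies the planner's choices. Every name here (`env_step`, `search`, `train`) is hypothetical.

```python
import random

ACTIONS = (-1, +1)
GOAL, SIZE, GAMMA = 4, 5, 0.9

def env_step(s, a):
    """Ground-truth environment (hidden from the planner): clipped walk on 0..SIZE-1."""
    return min(max(s + a, 0), SIZE - 1)

def reward(s_next):
    return 1.0 if s_next == GOAL else 0.0

def search(s, model, depth):
    """Exhaustive depth-limited search through the learned model (stand-in for MCTS)."""
    if depth == 0 or s == GOAL:
        return 0.0, None
    best_value, best_action = float("-inf"), None
    for a in ACTIONS:
        s_next = model.get((s, a), s)  # assumption: unseen transitions stay put
        value = reward(s_next) + GAMMA * search(s_next, model, depth - 1)[0]
        if value > best_value:
            best_value, best_action = value, a
    return best_value, best_action

def train(episodes=30, horizon=8, depth=4, epsilon=0.3, seed=0):
    rng = random.Random(seed)
    model = {}   # learned dynamics: (state, action) -> next state
    policy = {}  # learned policy: state -> action
    for _ in range(episodes):
        s = 0
        for _ in range(horizon):
            if s == GOAL:
                break
            value, a = search(s, model, depth)   # plan with the current model
            if value <= 0.0 or rng.random() < epsilon:
                a = rng.choice(ACTIONS)          # explore: no useful plan, or by chance
            else:
                policy[s] = a                    # improve the policy from the plan
            s_next = env_step(s, a)              # act to collect interaction data
            model[(s, a)] = s_next               # improve the dynamics model
            s = s_next
    return policy, model

policy, model = train()
```

Search appears only inside `train`. At test time the agent would read `policy[s]` directly for any state the table covers. How many states end up covered depends on the random exploration, so this sketch makes no claim about a specific run.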
When to use:
1. Judged by its test-time behavior, MBRL is definitely a policy
2. It uses search to generate training data