Final Project for 11-785: Intro to Deep Learning
Rohan Pandya, Alvin Shek, Deval Shah, Chandrayee Bhaumik
Learning to learn (meta-learning) is a skill that enables humans to generalize and transfer prior knowledge.
Goal: Quickly retrain a trained model to learn a new task from only a few samples.
Meta-Learning enables this kind of fast adaptation to new tasks from limited data.
Problem Description
Reinforcement Learning: learning useful policies by directly interacting with the environment and receiving rewards for good behavior.
Advantages:
Disadvantages:
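To make this interaction loop concrete, below is a minimal sketch of a single episode rollout, assuming a gym-style environment API (reset/step) and a generic policy callable; none of these names come from our implementation.

def rollout(env, policy, max_steps=200):
    # Minimal agent-environment interaction loop (gym-style API assumed)
    obs = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(obs)                        # policy maps observation -> action
        obs, reward, done, info = env.step(action)  # environment transitions and rewards
        total_reward += reward                      # rewards reinforce good behavior
        if done:
            break
    return total_reward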
Robotic Manipulation refers to the use of robots to effect physical changes to the world around us. Examples include moving objects, grasping, carrying, pushing, dropping, throwing, and so on.
Challenges:
Broad categorization of Meta-Learning approaches: optimization-based (e.g., MAML, Reptile), metric-based, and model-based methods.
Fig: Train and Test tasks showing parametric task variation
Fig: Parametric vs Non-Parametric Task Variation
Fig: 7 DOF Arm performing reach-task in RL-Bench Simulation
Policy Network
Inputs: EE Pose, Goal Pose
Outputs: Delta EE Pose
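A minimal sketch of such a policy network in PyTorch; the 7-D pose size (position + quaternion) and hidden width are illustrative assumptions, not values from our implementation.

import torch
import torch.nn as nn

class ReachPolicy(nn.Module):
    # Maps (EE pose, goal pose) -> delta EE pose with a small MLP
    def __init__(self, pose_dim=7, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * pose_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, pose_dim),  # delta EE pose
        )

    def forward(self, ee_pose, goal_pose):
        return self.net(torch.cat([ee_pose, goal_pose], dim=-1))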
PPO
• Policy-gradient RL algorithm
• Improves stability over other policy-gradient methods via a clipped surrogate objective
• Still sample-inefficient (200K+ environment samples)
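For reference, a minimal sketch of PPO's clipped surrogate loss in PyTorch; tensor names and the clip range are illustrative, not taken from our training code.

import torch

def ppo_clip_loss(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
    # Probability ratio r = pi_new(a|s) / pi_old(a|s)
    ratio = torch.exp(log_probs_new - log_probs_old)
    # Clipped surrogate objective; return the negative for minimization
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()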
Reptile
• First-order meta-learning algorithm
• Repeatedly adapts to a sampled task, then moves the initialization toward the task-adapted weights
• Avoids second-order derivatives, making it cheaper to compute than MAML
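A minimal sketch of one Reptile meta-step, assuming a hypothetical task.loss(model) hook for the inner-loop objective; hyperparameters are illustrative.

import copy
import torch

def reptile_step(model, task, inner_steps=5, inner_lr=1e-2, meta_lr=0.1):
    # Adapt a copy of the current initialization to one task
    adapted = copy.deepcopy(model)
    opt = torch.optim.SGD(adapted.parameters(), lr=inner_lr)
    for _ in range(inner_steps):
        loss = task.loss(adapted)  # hypothetical inner-loop loss hook
        opt.zero_grad()
        loss.backward()
        opt.step()
    # Move the initialization toward the task-adapted weights:
    # theta <- theta + meta_lr * (theta_tilde - theta)
    with torch.no_grad():
        for p, p_adapted in zip(model.parameters(), adapted.parameters()):
            p.add_(meta_lr * (p_adapted - p))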
MAML
• Model-Agnostic Meta-Learning
• General optimization algorithm, compatible with any model trained with gradient descent
• Learns good policy parameter initializations
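A minimal sketch of one MAML meta-batch, assuming hypothetical task.support_loss / task.query_loss_with hooks; the key detail is create_graph=True, which retains the second-order terms MAML differentiates through.

import torch

def maml_meta_loss(model, tasks, inner_lr=1e-2):
    # Adapt on each task's support set, evaluate on its query set,
    # and average the post-adaptation losses
    meta_loss = 0.0
    for task in tasks:
        loss = task.support_loss(model)  # hypothetical hook
        grads = torch.autograd.grad(loss, model.parameters(),
                                    create_graph=True)  # keep graph for 2nd-order terms
        # Functional inner update: theta_i' = theta - alpha * grad
        adapted = [p - inner_lr * g for p, g in zip(model.parameters(), grads)]
        meta_loss = meta_loss + task.query_loss_with(adapted)  # hypothetical hook
    return meta_loss / len(tasks)

Calling backward() on this meta-loss and stepping an outer optimizer updates the shared initialization.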
Batch update rule:
$$\theta \leftarrow \theta + \epsilon \, \frac{1}{n} \sum_{i=1}^{n} \left( \tilde{\theta}_i - \theta \right)$$
where $\theta$ denotes the initialization parameters and $\tilde{\theta}_i$ the updated parameters after training on the i-th task.
Batch updates have lower variance and can lead to faster convergence. This is analogous to performing mini-batch gradient descent versus SGD.
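A minimal sketch of this batched update, assuming adapted_models holds the n task-adapted copies of model produced by the inner loops.

import torch

def reptile_batch_update(model, adapted_models, meta_lr=0.1):
    # theta <- theta + eps * (1/n) * sum_i (theta_tilde_i - theta)
    n = len(adapted_models)
    with torch.no_grad():
        for params in zip(model.parameters(), *[m.parameters() for m in adapted_models]):
            p, adapted = params[0], params[1:]
            direction = sum(pa - p for pa in adapted) / n
            p.add_(meta_lr * direction)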
Fig: N parallel workers computing per-task updates
Fig: Success rates of Reptile, MAML, Multi-Headed Reptile, and Vanilla PPO (up to an 80% success rate)
In Multi-Headed Reptile, we directly change the shared parameters based on the updated parameters of the task-specific models. Effectively, we account only for first-order gradient terms and entirely lose information from higher-order terms.
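To see exactly what is dropped, compare the exact MAML meta-gradient after one inner step with its first-order approximation:
$$\theta' = \theta - \alpha \nabla_\theta \mathcal{L}_{\text{train}}(\theta)$$
$$\nabla_\theta \mathcal{L}_{\text{test}}(\theta') = \left( I - \alpha \nabla^2_\theta \mathcal{L}_{\text{train}}(\theta) \right) \nabla_{\theta'} \mathcal{L}_{\text{test}}(\theta')$$
MAML keeps the Hessian term $\alpha \nabla^2_\theta \mathcal{L}_{\text{train}}(\theta)$; first-order methods such as Reptile and Multi-Headed Reptile drop it, keeping only $\nabla_{\theta'} \mathcal{L}_{\text{test}}(\theta')$.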
Averaging gradients (batch gradient descent) and averaging updated weights (batch parameter updates) have different convergence properties.
Observed performance ranking: MAML > Reptile > PPO ~ Multi-Headed Reptile
In this work we explored Multi-Headed Reptile, which introduces the following two changes (see the sketch after this list):
Task-specific heads : shared parameters are updated directly from the updated weights of the task-specific models
Asynchronous training : this speeds up the training process
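A minimal sketch of the asynchronous scheme, assuming hypothetical helpers adapt_on_task (inner-loop training on one task) and reptile_apply (a single-model Reptile parameter update, as in reptile_step above):

import copy
import random
from concurrent.futures import ThreadPoolExecutor, as_completed

def train_async(model, tasks, num_workers=4, meta_lr=0.1, rounds=100):
    # N parallel workers each adapt a copy of the model on a sampled task;
    # the shared initialization is updated as soon as each worker finishes,
    # rather than waiting for the whole batch
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        for _ in range(rounds):
            futures = [pool.submit(adapt_on_task, copy.deepcopy(model), t)
                       for t in random.sample(tasks, num_workers)]
            for fut in as_completed(futures):
                reptile_apply(model, fut.result(), meta_lr)  # apply updates asynchronously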
Further investigation is needed into the probable causes of its failure and potential fixes.