MLRG Fall 2022: Transformers
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2232563/images/9874691/Screen_Shot_2022-09-19_at_5.28.42_PM.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2232563/images/9874689/GitHub-Copilot-1.gif)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2232563/images/9874704/Screen_Shot_2022-09-27_at_10.28.42_PM.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2232563/images/9874706/Screen_Shot_2022-09-27_at_10.30.14_PM.png)
A brief history...
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2232563/images/9874715/Screen_Shot_2022-09-27_at_10.35.24_PM.png)
RNNs
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2232563/images/9877535/Screen_Shot_2022-09-28_at_11.43.13_AM.png)
RNNs + Attention
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2232563/images/9877537/Screen_Shot_2022-09-28_at_11.43.30_AM.png)
RNNs + Attention
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2232563/images/9877541/Screen_Shot_2022-09-28_at_11.43.46_AM.png)
Transformers
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2232563/images/9865745/Screen_Shot_2022-09-24_at_8.46.12_PM.png)
Transformers at Scale
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2232563/images/9877494/Screen_Shot_2022-09-19_at_6.13.58_PM.png)
Transformers at Scale
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2232563/images/9865777/Screen_Shot_2022-09-24_at_10.27.55_PM.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2232563/images/9877571/Screen_Shot_2022-09-28_at_11.57.03_AM.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2232563/images/9877572/Screen_Shot_2022-09-28_at_11.56.10_AM.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2232563/images/9877573/Screen_Shot_2022-09-28_at_11.55.44_AM.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2232563/images/9877574/Screen_Shot_2022-09-28_at_11.54.32_AM.png)
Goals:
- Understand how and why transformers work
- Explore the phenomenon of emergent abilities in LLMs as they are scaled up
- Learn how they are being applied in NLP and beyond
Papers
Attention is All You Need (Vaswani et al., 2017)
- The original Transformer paper.
- Goal: get us all on the same page with respect to what makes up a Transformer:
  - Attention
  - Self-Attention
  - Positional Encodings
  - etc.
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2232563/images/9865745/Screen_Shot_2022-09-24_at_8.46.12_PM.png)
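The core operation the paper introduces can be sketched in a few lines. This is a minimal NumPy version of scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ/√d_k)V, not the paper's multi-head implementation; shapes and the toy input are illustrative.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)          # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                        # weighted average of the values

# Self-attention: queries, keys, and values all come from the same sequence.
x = np.random.randn(4, 8)                     # 4 tokens, model dimension 8
out = scaled_dot_product_attention(x, x, x)
print(out.shape)                              # (4, 8): one output per token
```

In the full model, Q, K, and V are learned linear projections of the input, and several such attention "heads" run in parallel.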
Language Models are Few-Shot Learners
- The GPT-3 paper.
- A look at the trend of increasingly large LMs and, more importantly, their ability to perform well on tasks unseen during training.
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2232563/images/9865746/1_C-KNWQC_wXh-Q2wc6VPK1g.png)
Chain of Thought Prompting Elicits Reasoning in Large Language Models
- The reasoning capabilities of language models can be improved by prompting them appropriately.
- Other papers linked in the document.
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2232563/images/9865774/Screen_Shot_2022-09-24_at_10.19.40_PM.png)
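To make the idea concrete, here is a sketch of the prompt format the paper compares, using its well-known tennis-ball example. The strings below are illustrative prompts, not library code; the only difference between the two is that the chain-of-thought exemplar includes intermediate reasoning before the answer.

```python
# Standard few-shot prompting: the exemplar jumps straight to the answer.
standard = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A: The answer is 11.\n"
    "Q: The cafeteria had 23 apples. If they used 20 to make lunch and "
    "bought 6 more, how many apples do they have?\nA:"
)

# Chain-of-thought prompting: the exemplar spells out the reasoning,
# which encourages the model to produce its own reasoning before answering.
cot = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 "
    "tennis balls. 5 + 6 = 11. The answer is 11.\n"
    "Q: The cafeteria had 23 apples. If they used 20 to make lunch and "
    "bought 6 more, how many apples do they have?\nA:"
)
print(cot)
```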
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
- Transformers as an alternative to CNNs?
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2232563/images/9865775/Screen_Shot_2022-09-24_at_10.22.10_PM.png)
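The title's "16x16 words" refers to how ViT turns an image into a token sequence: split it into 16x16 patches and flatten each one. A minimal NumPy sketch of that patching step (the learned linear projection and position embeddings that follow are omitted):

```python
import numpy as np

def image_to_patches(img, patch=16):
    """Split an (H, W, C) image into flattened patch x patch tiles —
    the 'words' a ViT feeds to a standard Transformer encoder."""
    H, W, C = img.shape
    tiles = img.reshape(H // patch, patch, W // patch, patch, C)
    tiles = tiles.transpose(0, 2, 1, 3, 4)            # group by patch grid position
    return tiles.reshape(-1, patch * patch * C)       # one flat vector per patch

img = np.zeros((224, 224, 3))                         # standard ImageNet-sized input
tokens = image_to_patches(img)
print(tokens.shape)                                   # (196, 768): 14x14 patches, 16*16*3 dims each
```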
Decision Transformer: Reinforcement Learning via Sequence Modeling
- Recasts reinforcement learning as a conditional sequence modelling problem.
- Matches or exceeds the performance of SoTA model-free offline RL algorithms.
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2232563/images/9865776/Screen_Shot_2022-09-24_at_10.24.30_PM.png)
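The "conditional sequence modelling" framing can be sketched concretely: a trajectory is serialized as interleaved (return-to-go, state, action) triples and modeled autoregressively, so conditioning on a high return-to-go asks the model for actions that achieve it. The token values below are placeholders, not the paper's actual encoding.

```python
# Toy trajectory of 3 timesteps (placeholder states/actions).
rewards = [1.0, 0.0, 2.0]
states = ["s0", "s1", "s2"]
actions = ["a0", "a1", "a2"]

# Return-to-go at step t is the sum of rewards from t to the end of the episode.
rtg = [sum(rewards[t:]) for t in range(len(rewards))]
print(rtg)   # [3.0, 2.0, 2.0]

# Serialize as (return-to-go, state, action) triples — the sequence a
# Decision Transformer models autoregressively.
sequence = [tok for t in range(len(rewards))
            for tok in (rtg[t], states[t], actions[t])]
print(sequence)   # [3.0, 's0', 'a0', 2.0, 's1', 'a1', 2.0, 's2', 'a2']
```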
Neural Scaling Laws + Exploring the Limits of Large Scale Pre-Training (2 papers)
- Neural Scaling Laws: empirically shows that the performance of LLMs follows a power law.
- Exploring the Limits: investigates the implications of this for downstream tasks.
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2232563/images/9865777/Screen_Shot_2022-09-24_at_10.27.55_PM.png)
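The power law in question has a simple form: test loss falls as L(N) ≈ (N_c / N)^α_N in the number of non-embedding parameters N. A quick sketch, using constants roughly like those Kaplan et al. report (treat them as illustrative, not exact):

```python
# Power-law scaling of language-model loss with parameter count N,
# L(N) = (N_c / N) ** alpha_N. Constants below are roughly those
# reported by Kaplan et al. (2020); treat them as illustrative.
ALPHA_N = 0.076
N_C = 8.8e13

def loss(N):
    return (N_C / N) ** ALPHA_N

for N in [1e6, 1e9, 1e12]:
    print(f"N = {N:.0e} params -> loss ~ {loss(N):.2f}")
```

The key empirical point is the straight line on a log-log plot: every 1000x increase in parameters buys a roughly constant multiplicative reduction in loss, with no sign of saturation over the range studied.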
Unveiling Transformers with LEGO: A Synthetic Reasoning Task
- An attempt to better understand how and what transformers learn by training them on a simple reasoning task.
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2232563/images/9865778/Screen_Shot_2022-09-24_at_10.32.29_PM.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2232563/images/9874656/FU5e41NUAAc7PIZ.png)
Learning Transferable Visual Models From Natural Language Supervision
- A joint image/language model (CLIP), trained to match captions to images.
- Impressive few-shot transfer to downstream tasks.
- An essential part of DALL-E 2's architecture.
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2232563/images/9865779/Screen_Shot_2022-09-24_at_10.35.20_PM.png)
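The "trained to match captions to images" part is a contrastive objective: embed a batch of N (image, caption) pairs, and push each correct pair's similarity above all N-1 mismatched ones in both directions. A NumPy sketch under assumed shapes (not OpenAI's code; embedding dimension and temperature are illustrative):

```python
import numpy as np

def clip_style_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric contrastive loss over an (N, N) image-text similarity matrix.
    Row i of img_emb and row i of txt_emb are assumed to be a matching pair."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature        # (N, N); correct pairs on the diagonal
    n = len(img)

    def xent(l):                              # mean cross-entropy, target = diagonal
        p = np.exp(l - l.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        return -np.log(p[np.arange(n), np.arange(n)]).mean()

    return (xent(logits) + xent(logits.T)) / 2   # image->text and text->image

rng = np.random.default_rng(0)
loss = clip_style_loss(rng.normal(size=(4, 32)), rng.normal(size=(4, 32)))
print(loss)
```

Perfectly aligned embeddings drive the loss toward zero; random embeddings sit near log N.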
Hierarchical Text-Conditional Image Generation with CLIP Latents
- The DALL-E 2 paper.
- Requires (at least) a brief explanation of diffusion models.
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2232563/images/9867085/Screen_Shot_2022-09-25_at_2.05.49_PM.png)
Paper Sign-Up
By Dylan Green