MLRG Fall 2022: Transformers

Title Text

A brief history...


RNNs + Attention

RNNs + Attention


Transformers at Scale

Transformers at Scale


  • Understand how and why transformers work
  • Explore the phenomenon of emergent abilities in LLMs as they are scaled up
  • Learn how they are being applied in NLP and beyond


Attention is All You Need (Vaswani et al., 2017)

  • The original Transformer paper. 

  • Goal: Get us all on the same page with respect to what makes up a Transformer:

    • Attention

    • Self-Attention

    • Positional Encodings

    • etc.

Language Models are Few-Shot Learners 

  • The GPT-3 paper 

  • A look at the trend of increasingly large LMs and, more importantly, their ability to perform well on tasks unseen during training.

Chain of Thought Prompting Elicits Reasoning in Large Language Models

  • The reasoning capabilities of language models can be improved by prompting them appropriately.

  • Other papers linked in document. 

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

  • Transformers as an alternative to CNNs?

Decision Transformer: Reinforcement Learning via Sequence Modeling

  • Recasts reinforcement learning as a conditional sequence modelling problem

  • Matches or exceeds performance of SoTA model-free offline RL algorithms

Neural Scaling Laws + Exploring the Limits of Large Scale Pre-Training (2 papers)

  • Neural Scaling Laws: Empirically shows that the performance of LLMs follows a power law

  • Exploring the limits: investigates the implications of this for downstream tasks 

Unveiling Transformers with LEGO: A Synthetic Reasoning Task

  • An attempt to better understand how/what transformers learn by training them on a simple reasoning task.

Learning Transferable Visual Models From Natural Language Supervision

  • A joint image/language model, trained to match captions to images

  • Impressive few-shot transfer to downstream tasks

  • An essential part of DALL-E's architecture

Hierarchical Text-Conditional Image Generation with CLIP Latents

  • DALL-E

  • Requires (at least) a brief explanation of diffusion models

Paper Sign-Up

MLRG Fall 2022: Transformers

By Dylan Green

MLRG Fall 2022: Transformers

  • 240