Transfer Learning in NLP

Hamish - OVO

Part 1: Transfer learning

Hi! I'm Hamish 👋

  • Joined ACE team ~2 months ago
  • Background in physics
  • Previously at startups doing MLE/DE/FP stuff
  • Kinds of things I know a bit about
    • Sequence models, NLP stuff
    • Building models quickly
    • Building models with very little labelled data
    • Getting things deployed

Transformer models

Killer app idea: potato/not-potato

  • build a potato/not-potato app
  • just a few hundred images
  • just a few lines of code (rough sketch below)
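
To make "a few lines of code" concrete, here's a rough sketch of the data-loading side (not from the slides), assuming a hypothetical folder layout with data/potato/ and data/not_potato/ images; torchvision's ImageFolder infers the labels from the folder names.

from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# hypothetical layout: data/potato/*.jpg and data/not_potato/*.jpg
transform = transforms.Compose([
    transforms.Resize((224, 224)),  # ResNet-sized input
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],  # ImageNet stats,
                         std=[0.229, 0.224, 0.225]),  # matching pretrained weights
])

dataset = datasets.ImageFolder("data", transform=transform)
train_loader = DataLoader(dataset, batch_size=32, shuffle=True)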

Problem 1: Architecture

Often the best first step is to see what everyone else is using

image → ???? → 🥔/❌

ResNet

from torchvision import models

# a standard ResNet from torchvision (randomly initialised for now)
model = models.resnet34()
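
As a quick sanity check (not on the slides), a fake batch can be pushed through the model to confirm the input/output shapes; note the 1000 outputs, one per ImageNet class, which is the mismatch we deal with below.

import torch

model.eval()  # inference mode for the shape check
with torch.no_grad():
    dummy = torch.randn(1, 3, 224, 224)  # one fake RGB image
    out = model(dummy)

print(out.shape)  # torch.Size([1, 1000]) - one score per ImageNet class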

Problem 2: Training

  • Training SOTA models isn't easy
  • Lots of data: 1M+ images, 1000 categories
  • Resource heavy: 50 GPUs for 100 epochs
  • Can't expect to have 1M+ images, 50 GPUs, 1000 categories
  • What can we reasonably do?

What do NNs learn?

Zeiler & Fergus, 2014

Can exploit this

  • get a pretrained model close to what we want
  • fix its layers (transfer learning)
  • or keep them trainable (fine-tuning)
  • replace the old output with something matching what we need
  • randomly initialise the new output layer and train (see the training sketch after the code below)

from torchvision import models
import torch.nn as nn

# get a pre-trained model
model = models.resnet18(pretrained=True)

# fix the parameters so they don't train
for param in model.parameters():
    param.requires_grad = False

# define a new last layer with random initialisation
# (resnet18's final fc layer takes 512 features; 1 output for potato/not-potato)
model.fc = nn.Linear(512, 1)
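
A minimal sketch of the training step (assumptions: the hypothetical train_loader from earlier, labels with 1 = potato and 0 = not-potato). Because everything except the new fc layer is frozen, only its parameters are given to the optimiser.

import torch
import torch.nn as nn

# only the new head is trainable, so only its parameters are optimised
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.BCEWithLogitsLoss()  # single-logit binary classification

model.train()
for images, labels in train_loader:           # hypothetical DataLoader
    optimizer.zero_grad()
    logits = model(images).squeeze(1)         # (batch, 1) -> (batch,)
    loss = criterion(logits, labels.float())  # 1 = potato, 0 = not-potato
    loss.backward()
    optimizer.step()

For the fine-tuning variant you would skip the freezing loop and pass model.parameters() to the optimiser instead, usually with a smaller learning rate.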

Wait! I have code!

Why does this work?

Not a golden hammer

  • Very common in CV
  • Almost never seen in NLP
  • Need a pretrained NN close to what you want to do
  • Doesn't help much if you're going to train at scale anyway (link to FAIR paper)

Transfer Learning

By Hamish Dickson