FitLaM

FitLaM: Fine-tuned Language Models

The method should be able to:
- leverage large amounts of available data;
- utilize a task that can be optimized independently, leading to further downstream improvements;
- rely on a single model that can be used as-is for most NLP tasks.

Discriminative fine-tuning fine-tunes lower layers to a lesser extent than higher layers in order to retain the knowledge acquired through language modeling.

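One way to realize discriminative fine-tuning is to give each layer its own learning rate, smaller for lower layers. The sketch below is a minimal plain-Python illustration under assumptions: the divide-by-2.6 rule between adjacent layers is borrowed from the later ULMFiT paper and is not stated in these notes, and the names discriminative_lrs and sgd_step are hypothetical. In PyTorch the same effect is usually obtained by passing one parameter group per layer, each with its own lr, to the optimizer.

```python
from typing import List

Layer = List[float]


def discriminative_lrs(top_lr: float, n_layers: int, factor: float = 2.6) -> List[float]:
    """Return one learning rate per layer, index 0 = lowest layer.

    The top layer gets top_lr; each layer below gets the rate of the layer
    above divided by `factor` (2.6 is an assumed default, not from the notes).
    """
    if n_layers < 1:
        raise ValueError("n_layers must be >= 1")
    if factor <= 0:
        raise ValueError("factor must be positive")
    # Build from the top layer down, then reverse so index 0 is the lowest layer.
    lrs_top_down = [top_lr / factor ** k for k in range(n_layers)]
    return lrs_top_down[::-1]


def sgd_step(params: List[Layer], grads: List[Layer], lrs: List[float]) -> List[Layer]:
    """Plain SGD update in which layer i uses learning rate lrs[i]."""
    return [
        [w - lr * g for w, g in zip(layer_w, layer_g)]
        for layer_w, layer_g, lr in zip(params, grads, lrs)
    ]


lrs = discriminative_lrs(0.01, 3)
print([round(x, 6) for x in lrs])
```

With identical gradients, the lowest layer moves the least, which is the property the notes describe: lower layers, holding more general language-model knowledge, are changed less.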
Backpropagation Through Time for Text Classification (BPT3C)
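A sketch of the forward bookkeeping behind BPT3C, assuming the usual description: the document is split into fixed-length chunks, each chunk starts from the final hidden state of the previous chunk, and every hidden state is kept so it can be pooled for classification. toy_cell is a hypothetical scalar recurrence standing in for the AWD-LSTM, inputs are assumed to be already-embedded scalars, and gradient flow between chunks is not modeled here.

```python
import math
from typing import Callable, List, Tuple


def split_into_chunks(tokens: List[float], bptt: int) -> List[List[float]]:
    """Split a document into consecutive fixed-length chunks (the last may be shorter)."""
    if bptt < 1:
        raise ValueError("bptt must be >= 1")
    return [tokens[i:i + bptt] for i in range(0, len(tokens), bptt)]


def toy_cell(h: float, x: float, w: float = 0.5) -> float:
    """Hypothetical scalar recurrent cell standing in for the AWD-LSTM."""
    return math.tanh(w * h + x)


def run_document(
    tokens: List[float],
    bptt: int,
    cell: Callable[[float, float], float] = toy_cell,
    h0: float = 0.0,
) -> Tuple[float, List[float]]:
    """Process a long document chunk by chunk.

    Each chunk is initialized with the final hidden state of the previous
    chunk; all hidden states are collected for later mean/max pooling.
    """
    h = h0
    all_states: List[float] = []
    for chunk in split_into_chunks(tokens, bptt):
        for x in chunk:
            h = cell(h, x)
            all_states.append(h)
    return h, all_states
```

Because the state is carried across chunk boundaries, running a document with bptt=3 yields the same hidden states as a single pass; the chunking matters for memory use and gradient truncation, not for the forward result.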

Language Modeling with AWD-LSTM
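Among other regularizers, the AWD-LSTM applies DropConnect ("weight drop") to its hidden-to-hidden weight matrices, with one mask sampled per forward pass and reused at every timestep. The sketch below shows only that piece, assuming weights are stored as nested Python lists; weight_drop is a hypothetical name, and the model's other regularization techniques are left out.

```python
import random
from typing import List, Optional

Matrix = List[List[float]]


def weight_drop(w_hh: Matrix, p: float, rng: Optional[random.Random] = None) -> Matrix:
    """DropConnect on a hidden-to-hidden weight matrix.

    Each weight is zeroed with probability p; survivors are scaled by
    1 / (1 - p) so the expected value of every weight is unchanged.
    Call once per forward pass and reuse the result for all timesteps.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    rng = rng or random.Random()
    scale = 1.0 / (1.0 - p)
    # rng.random() is uniform on [0, 1), so a weight is kept with probability 1 - p.
    return [[w * scale if rng.random() >= p else 0.0 for w in row] for row in w_hh]
```

Dropping weights rather than activations leaves the recurrent computation itself untouched, which is why this style of regularization works with standard, optimized LSTM implementations.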
Target Task LM Fine-Tuning
- Gradual Unfreezing
- Cosine Annealing
- Reverse Annealing
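A sketch of these three techniques under stated assumptions. Cosine annealing uses the standard formula lr_min + 0.5 (lr_max - lr_min)(1 + cos(pi t / T)). "Reverse annealing" is read here as its mirror image, a warm-up during which the rate rises from lr_min to lr_max; that reading is an assumption, since the notes do not give its form. Gradual unfreezing is modeled as thawing one more layer per epoch, starting from the top. lr_at and trainable_layers are hypothetical names.

```python
import math
from typing import List


def lr_at(step: int, total_steps: int, warmup_steps: int,
          lr_max: float, lr_min: float = 0.0) -> float:
    """Learning rate at `step`: a rising warm-up, then cosine annealing."""
    if not 0 <= warmup_steps < total_steps:
        raise ValueError("need 0 <= warmup_steps < total_steps")
    if step < warmup_steps:
        # Reverse annealing (assumed form): cosine curve rising from lr_min to lr_max.
        frac = step / warmup_steps
        return lr_min + 0.5 * (lr_max - lr_min) * (1 - math.cos(math.pi * frac))
    # Cosine annealing from lr_max down to lr_min over the remaining steps.
    frac = (step - warmup_steps) / (total_steps - warmup_steps)
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * frac))


def trainable_layers(epoch: int, n_layers: int) -> List[bool]:
    """Gradual unfreezing: layers ordered lowest (0) to highest.

    In epoch 0 only the top layer trains; each later epoch thaws one more.
    """
    n_unfrozen = min(epoch + 1, n_layers)
    return [i >= n_layers - n_unfrozen for i in range(n_layers)]
```

The two phases meet at lr_max when step equals warmup_steps, so the schedule is continuous. Unfreezing from the top mirrors discriminative fine-tuning: the most task-specific layers adapt first, and the more general lower layers are exposed to the new task last.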
Classifier Fine-Tuning
- Concat Pooling
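Concat pooling builds the classifier input by concatenating the last hidden state with the max-pooled and mean-pooled hidden states over all timesteps: h_c = [h_T, maxpool(H), meanpool(H)], where H = {h_1, ..., h_T}. The following is a plain-Python sketch over lists of vectors; concat_pool is a hypothetical name.

```python
from typing import List

Vector = List[float]


def concat_pool(hidden_states: List[Vector]) -> Vector:
    """Concat pooling over the hidden states h_1..h_T of one document.

    Returns [h_T ; element-wise max ; element-wise mean] as one flat list.
    """
    if not hidden_states:
        raise ValueError("need at least one hidden state")
    dim = len(hidden_states[0])
    if any(len(h) != dim for h in hidden_states):
        raise ValueError("all hidden states must have the same size")
    last = list(hidden_states[-1])
    max_pool = [max(h[d] for h in hidden_states) for d in range(dim)]
    mean_pool = [sum(h[d] for h in hidden_states) / len(hidden_states) for d in range(dim)]
    return last + max_pool + mean_pool


states = [[1.0, -2.0], [3.0, 0.0], [2.0, 4.0]]
print(concat_pool(states))  # → [2.0, 4.0, 3.0, 4.0, 2.0, 0.6666666666666666]
```

The pooled vector is three times the hidden size, so the classifier head's first layer must be sized accordingly. Pooling lets the classifier use information from the whole document rather than only the final state, which matters for long inputs.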
Discriminative Fine-Tuning