Probabilistic Ensembles of Zero- and Few-Shot Learning Models for Emotion Classification
Angelo Basile
Guillermo Pérez-Torró
Marc Franco-Salvador
angelo.basile@symanto.com
guillermo.perez@symanto.com
marc.franco@symanto.com
![](https://s3.amazonaws.com/media-p.slid.es/uploads/421921/images/8884944/pasted-from-clipboard.png)
![](https://media4.giphy.com/media/12d19apJyRsmA/giphy.gif)
The angry wolf ate the happy boy
![](https://media3.giphy.com/media/11tTNkNy1SdXGg/giphy.gif)
![](https://media2.giphy.com/media/SXCQWrsob9TGg/giphy.gif)
Building affective corpora is hard.
![](https://media0.giphy.com/media/bBsLmHGPrZKN2/giphy.gif)
Can we at least tackle the existing datasets using few or no annotated instances?
Research Question
Emotion Classification as Entailment
I loved the pizza!
This person expressed a feeling of pleasure.
This person feels sad.
Premise
Hypothesis
This person feels [...].
- Entailment
- Contradiction
- Neutral
Natural Language Inference (NLI)
JOY
SADNESS
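The entailment formulation above can be sketched in a few lines: each candidate emotion is turned into a hypothesis via a template, and the label whose hypothesis is most entailed by the input wins. The scorer below is a toy keyword stand-in, not a real NLI model; in practice the entailment probability would come from a PLM fine-tuned on an NLI dataset.

```python
# Sketch of emotion classification as NLI (toy scorer, hypothetical names).
EMOTIONS = ["joy", "sadness", "anger", "fear"]
TEMPLATE = "This person feels {}."

def classify(premise, entailment_prob):
    """Pick the emotion whose hypothesis is most entailed by the premise.

    `entailment_prob(premise, hypothesis)` is assumed to return the
    entailment probability for the pair (here: a toy stand-in).
    """
    scores = {
        emotion: entailment_prob(premise, TEMPLATE.format(emotion))
        for emotion in EMOTIONS
    }
    return max(scores, key=scores.get), scores

def toy_scorer(premise, hypothesis):
    """Keyword-overlap stand-in for a real NLI entailment model."""
    cues = {"joy": ["loved", "great"], "sadness": ["cried", "miss"],
            "anger": ["hate", "angry"], "fear": ["scared", "afraid"]}
    emotion = hypothesis.split()[-1].rstrip(".")
    return float(any(w in premise.lower() for w in cues.get(emotion, [])))

label, scores = classify("I loved the pizza!", toy_scorer)  # label: "joy"
```

A real system would swap `toy_scorer` for a pretrained NLI model and score one premise-hypothesis pair per candidate emotion.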
What NLI dataset?
What PLM?
What hypothesis?
???
ANLI, FEVER, MNLI, XNLI?
BERT, BART, RoBERTa?
This person feels [...], This person is feeling [...]?
![](https://media4.giphy.com/media/26ufc0OsEUTWhDw0E/giphy.gif)
Our Proposal
Build many different models and infer the best possible label from their predictions.
![](https://s3.amazonaws.com/media-p.slid.es/uploads/421921/images/8884836/pasted-from-clipboard.png)
MACE (Hovy et al., 2013)
The angry wolf ate the happy boy
![](https://media3.giphy.com/media/7VzgMsB6FLCilwS30v/giphy.gif)
Model 1
![](https://media3.giphy.com/media/7VzgMsB6FLCilwS30v/giphy.gif)
Model 2
![](https://media3.giphy.com/media/7VzgMsB6FLCilwS30v/giphy.gif)
...
![](https://media3.giphy.com/media/7VzgMsB6FLCilwS30v/giphy.gif)
Model N
JOY
SADNESS
SADNESS
...
Final Label
Experiments
![](https://media2.giphy.com/media/QU1pSfyEynvgY/giphy.gif)
Unified Emotion Dataset
(Bostan and Klinger, 2018)
![](https://media1.giphy.com/media/xUA7b7yLPq3IPOLnk4/giphy.gif)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/421921/images/8884861/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/421921/images/8884902/pasted-from-clipboard.png)
- Zero-shot models provide modest performance
- A model of aggregation helps
- Few-shot NLI models are almost as good as fully supervised models
![](https://media3.giphy.com/media/RkDX47fpp2nHlaZdjY/giphy.gif)
The benefits of a model of aggregation
- confidence value for each instance
- estimation of each model's performance
- usually better than majority voting
- integrate labeled data, if available
- merge rule-based and deep learning systems
(see Passonneau and Carpenter, 2014)
RANLP 2021
By Angelo Basile