Simple n-gram-based models perform well for gender prediction.
Sometimes.
Evalita 2018 - Torino
![](https://media1.giphy.com/media/haB6FriHgXPuE/giphy.gif)
Cape Town
Milano
Tirana
![](https://s3.amazonaws.com/media-p.slid.es/uploads/421921/images/5593473/map.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/421921/images/5593511/gareth.jpeg)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/421921/images/5593530/photo_2018-12-11_15-04-12.jpg)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/421921/images/5593537/WhatsApp_Image_2018-12-11_at_15.04.41.jpeg)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/421921/images/5592411/1.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/421921/images/5592410/2.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/421921/images/5592473/3.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/421921/images/5592472/4.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/421921/images/5592479/5.png)
![](https://media3.giphy.com/media/3o7TKHKjrDyqphX9Cg/giphy.gif)
BUT...
![](https://media1.giphy.com/media/Pn1gZzAY38kbm/giphy.gif)
ASSUMPTION: If a model performs well on a gender-labelled data set, then it is (dangerously) modelling gender
![](https://media1.giphy.com/media/xUPGcvLrwIh8ZmC8fK/giphy.gif)
Let's give GXG a try!
Build the best possible model
Research Questions
RQ1: Take a state-of-the-art gender prediction system and test it under new conditions
RQ2: In case it performs poorly, try something to improve it
MODEL
![](https://media3.giphy.com/media/PPUoRDqV054A2YHumS/giphy.gif)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/421921/images/5593057/pan.png)
text
word n-grams
character n-grams
Linear SVM
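The text → word n-grams + character n-grams → Linear SVM pipeline above can be sketched in a few lines. This is a minimal illustration of the feature-extraction step only, not the authors' code: the n-gram ranges (word 1-2, character 3-5), the `w:`/`c:` prefixes, and the function names are assumptions made for the example. A linear SVM (for instance scikit-learn's `LinearSVC`) could then be trained on vectors built from these counts.

```python
from collections import Counter


def word_ngrams(text, n_min=1, n_max=2):
    """Return lowercased word n-grams for every n in [n_min, n_max]."""
    tokens = text.lower().split()
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def char_ngrams(text, n_min=3, n_max=5):
    """Return character n-grams (spaces included) for every n in [n_min, n_max]."""
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            grams.append(text[i:i + n])
    return grams


def featurize(text):
    """Count word and character n-grams in one sparse feature dictionary.

    The prefixes keep the two feature families apart (assumed convention).
    """
    features = Counter()
    features.update("w:" + g for g in word_ngrams(text))
    features.update("c:" + g for g in char_ngrams(text))
    return features


example = featurize("che bella giornata")
```

Here `example` maps each n-gram to its count, so `example["w:che bella"]` is the count of that word bigram in the input.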
![](https://media2.giphy.com/media/abGHxGls6a1K8/giphy.gif)
PAN 2017
![](https://s3.amazonaws.com/media-p.slid.es/uploads/421921/images/5593071/bleaching.png)
languages? genres?
![](https://s3.amazonaws.com/media-p.slid.es/uploads/421921/images/5596936/bleach.png)
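Bleaching replaces each word with abstract, non-lexical descriptions so that a model cannot rely on vocabulary tied to one language or genre. The sketch below shows the idea under assumptions: the three features (character shape, vowel/consonant pattern, length), the vowel set, and the function names are illustrative choices, not necessarily the feature set used in this system.

```python
import re

# Assumed vowel inventory, including common Italian accented vowels.
VOWELS = set("aeiouàèéìòù")


def shape(token):
    """Map characters to U/L/D/X classes and collapse repeated classes."""
    classes = []
    for ch in token:
        if ch.isupper():
            classes.append("U")
        elif ch.islower():
            classes.append("L")
        elif ch.isdigit():
            classes.append("D")
        else:
            classes.append("X")
    # Collapse runs such as "LLL" into a single "L".
    return re.sub(r"(.)\1+", r"\1", "".join(classes))


def vowel_pattern(token):
    """Map letters to V (vowel) or C (consonant); everything else to O."""
    out = []
    for ch in token.lower():
        if ch in VOWELS:
            out.append("V")
        elif ch.isalpha():
            out.append("C")
        else:
            out.append("O")
    return "".join(out)


def bleach(text):
    """Replace every whitespace-separated token with its abstract description."""
    bleached = []
    for token in text.split():
        bleached.append("{}_{}_{}".format(shape(token), vowel_pattern(token), len(token)))
    return " ".join(bleached)


print(bleach("Ciao mondo 2018!"))  # prints "UL_CVVV_4 L_CVCCV_5 DX_OOOOO_5"
```

The bleached string can be fed to the same n-gram extraction as the lexical text, which makes the two representations directly comparable.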
RESULTS
Results: accuracy, IN (in-genre)

| | lexical | bleached |
|---|---|---|
| youtube | 62 | 59 |
| | 74 | 67 |
| diaries | 70 | 67 |
| journalism | 62 | 54 |
| children | 54 | 53 |
Results: accuracy, CROSS (cross-genre)

| | lexical | bleached |
|---|---|---|
| youtube | 57 | 53 |
| | 52 | 50 |
| diaries | 62 | 53 |
| journalism | 56 | 53 |
| children | 60 | 53 |
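To compare the two settings at a glance, the per-column averages can be computed directly from the scores on the two results slides. This is only arithmetic over the reported numbers. The row whose genre label is missing on the slides is kept under a placeholder key, and the dictionary and function names are assumptions made for the example.

```python
from statistics import mean

# Accuracy scores copied from the results slides: genre -> (lexical, bleached).
# "(unlabelled row)" stands for the row whose genre name is missing on the slides.
IN_GENRE = {
    "youtube": (62, 59),
    "(unlabelled row)": (74, 67),
    "diaries": (70, 67),
    "journalism": (62, 54),
    "children": (54, 53),
}
CROSS_GENRE = {
    "youtube": (57, 53),
    "(unlabelled row)": (52, 50),
    "diaries": (62, 53),
    "journalism": (56, 53),
    "children": (60, 53),
}


def column_means(table):
    """Return (mean lexical accuracy, mean bleached accuracy) over all genres."""
    lexical = mean(scores[0] for scores in table.values())
    bleached = mean(scores[1] for scores in table.values())
    return lexical, bleached


in_lexical, in_bleached = column_means(IN_GENRE)
cross_lexical, cross_bleached = column_means(CROSS_GENRE)

print("IN    lexical / bleached:", in_lexical, in_bleached)
print("CROSS lexical / bleached:", cross_lexical, cross_bleached)
```

On these numbers the averages are 64.4 (lexical) and 60.0 (bleached) in the IN setting, and 57.4 and 52.4 in the CROSS setting.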
Conclusions
Gender prediction is hard!
We don't know if it is dangerous
Abstract features produce consistent but low results
RQ1
RQ2
also...
github.com/anbasile/gxg
DOWNLOAD TRAINED MODEL
![](https://media0.giphy.com/media/3o7WTAkv7Ze17SWMOQ/giphy.gif)
What next?
![](https://s3.amazonaws.com/media-p.slid.es/uploads/421921/images/5593080/clin.png)
gxg
By Angelo