GXG

Simple n-gram based models perform well for gender prediction.

Sometimes.

Evalita 2018 - Torino

Cape Town

Milano

Tirana

BUT...

ASSUMPTION

If a model performs well on a gender-labelled data set, then it is (dangerously) modelling gender.

Let's give GXG a try!

Build the best possible model

Research Questions

RQ1: Take a state-of-the-art gender prediction system and test it under new conditions.

RQ2: In case it performs poorly, try something to improve it.

MODEL

text

word n-grams

character n-grams

Linear SVM

PAN 2017

languages? genres?
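The feature side of the MODEL slide (text, then word n-grams and character n-grams) can be sketched in plain Python. This is a minimal illustration, not the system's actual code: the function names, the n-gram ranges (word 1-2, character 3-5), the lowercasing, and the `w:` / `c:` prefixes are all assumptions, since the slides do not give these settings.

```python
from collections import Counter


def word_ngrams(text, n_min=1, n_max=2):
    """Return lowercase word n-grams for n in [n_min, n_max] (assumed range)."""
    tokens = text.lower().split()
    return [
        " ".join(tokens[i:i + n])
        for n in range(n_min, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def char_ngrams(text, n_min=3, n_max=5):
    """Return lowercase character n-grams for n in [n_min, n_max] (assumed range)."""
    chars = text.lower()
    return [
        chars[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(chars) - n + 1)
    ]


def extract_features(text):
    """Count word and character n-grams in one sparse feature bag.

    The prefixes keep the two feature families apart, so the word
    "ciao" and the character 4-gram "ciao" are separate features.
    """
    features = Counter()
    features.update("w:" + gram for gram in word_ngrams(text))
    features.update("c:" + gram for gram in char_ngrams(text))
    return features


features = extract_features("Ciao ciao")
```

The classifier itself is left out here. In a full pipeline these counts would be turned into vectors and passed to a linear SVM, for example scikit-learn's `LinearSVC`, which is one possible choice rather than a confirmed detail of the system.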

RESULTS

Results: Accuracy (IN)

Results: Accuracy (CROSS)

Conclusions

RQ1: Gender prediction is hard!

RQ2: Abstract features produce consistent but low results.

Also: we don't know if it is dangerous.

github.com/anbasile/gxg

DOWNLOAD TRAINED MODEL

What next?