Simple n-gram based models perform well for gender prediction.

Sometimes.

Evalita 2018 - Torino

Capetown

Milano

Tirana

BUT...

If a model performs well on a gender-labelled data set, then it is (dangerously) modelling gender

ASSUMPTION

Let's give GxG a try!

Build the best possible model

Research Questions

RQ1: Take a state-of-the-art gender prediction system and test it under new conditions

RQ2: In case it performs poorly, try something to improve it
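One way to read "test it under new conditions" is the cross-genre setting: train on some genres, then test on a genre never seen in training. The sketch below assumes a leave-one-genre-out split (train on all other genres, test on the held-out one). The function name and the `(text, label)` data layout are hypothetical, not taken from the task or the repository.

```python
from typing import Dict, Iterator, List, Tuple

Example = Tuple[str, str]  # (text, gender label); layout is an assumption


def cross_genre_splits(
    data: Dict[str, List[Example]],
) -> Iterator[Tuple[str, List[Example], List[Example]]]:
    """Leave-one-genre-out: train on every other genre, test on the held-out one."""
    for test_genre in data:
        train = [ex for genre, exs in data.items() if genre != test_genre for ex in exs]
        yield test_genre, train, data[test_genre]


# Toy usage: three genres with one example each.
toy = {
    "twitter": [("ciao", "F")],
    "diaries": [("caro diario", "M")],
    "children": [("io gioco", "F")],
}
for genre, train, test in cross_genre_splits(toy):
    print(genre, len(train), len(test))  # e.g. "twitter 2 1"
```

The in-genre setting would instead train and test on the same genre, so the gap between the two settings measures how much of a model's accuracy depends on genre-specific cues.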

MODEL

text -> word n-grams + character n-grams -> Linear SVM

PAN 2017

languages? genres?
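A minimal, dependency-free sketch of the pipeline above: word and character n-gram counts, L2-normalised, fed to a linear SVM. The n-gram ranges, lowercasing and normalisation are illustrative assumptions, not the settings of the PAN 2017 system, and all function names are hypothetical. To avoid external libraries, the SVM is trained with Pegasos (stochastic sub-gradient descent on the L2-regularised hinge loss, no bias term); a real system would normally use an off-the-shelf linear SVM solver instead.

```python
import math
import random
from collections import Counter


def ngram_features(text, word_range=(1, 2), char_range=(3, 5)):
    """Sparse bag of word and character n-grams, L2-normalised.

    The n-gram ranges are illustrative defaults, not the original settings.
    """
    text = text.lower()
    feats = Counter()
    tokens = text.split()
    for n in range(word_range[0], word_range[1] + 1):
        for i in range(len(tokens) - n + 1):
            feats["w:" + " ".join(tokens[i:i + n])] += 1
    for n in range(char_range[0], char_range[1] + 1):
        for i in range(len(text) - n + 1):
            feats["c:" + text[i:i + n]] += 1
    norm = math.sqrt(sum(v * v for v in feats.values())) or 1.0
    return {f: v / norm for f, v in feats.items()}


def dot(w, x):
    """Dot product between a weight dict and a sparse feature dict."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_linear_svm(X, y, lam=0.01, epochs=20, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularised hinge loss.

    X is a list of sparse feature dicts, y a list of labels in {-1, +1}.
    """
    rng = random.Random(seed)
    w = {}
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * dot(w, X[i]) < 1  # hinge loss is active
            for f in w:  # regularisation step: shrink every weight (O(|w|))
                w[f] *= 1.0 - eta * lam
            if violated:  # sub-gradient step towards the misclassified side
                for f, v in X[i].items():
                    w[f] = w.get(f, 0.0) + eta * y[i] * v
    return w


def predict(w, text):
    return 1 if dot(w, ngram_features(text)) >= 0 else -1


# Toy usage (not GxG data); +1 / -1 could stand for F / M, the mapping is arbitrary.
docs = ["cat cat", "cat", "dog dog", "dog"]
labels = [1, 1, -1, -1]
w = train_linear_svm([ngram_features(d) for d in docs], labels)
print(predict(w, "a cat"), predict(w, "a dog"))  # 1 -1
```

Keeping word and character n-grams in one sparse space lets a single linear model weigh both; character n-grams also catch morphology and spelling habits that word features miss, which is relevant for a morphologically rich language such as Italian.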

RESULTS

Results: IN-genre accuracy (%)

genre        lexical  bleached
youtube           62        59
twitter           74        67
diaries           70        67
journalism        62        54
children          54        53

Results: CROSS-genre accuracy (%)

genre        lexical  bleached
youtube           57        53
twitter           52        50
diaries           62        53
journalism        56        53
children          60        53
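Averaging each column of the two tables makes "consistent but low" concrete. The accuracies below are copied from the IN-genre and CROSS-genre tables; the `summarise` helper is illustrative.

```python
# Accuracies (%) from the two result tables: (lexical, bleached)
IN_GENRE = {
    "youtube": (62, 59),
    "twitter": (74, 67),
    "diaries": (70, 67),
    "journalism": (62, 54),
    "children": (54, 53),
}
CROSS_GENRE = {
    "youtube": (57, 53),
    "twitter": (52, 50),
    "diaries": (62, 53),
    "journalism": (56, 53),
    "children": (60, 53),
}


def summarise(table):
    """Mean accuracy and max-min spread across genres for each feature set."""
    result = {}
    for col, name in enumerate(("lexical", "bleached")):
        scores = [row[col] for row in table.values()]
        result[name] = (sum(scores) / len(scores), max(scores) - min(scores))
    return result


for setting, table in (("IN", IN_GENRE), ("CROSS", CROSS_GENRE)):
    for name, (mean, spread) in summarise(table).items():
        print(f"{setting:5} {name:8} mean={mean:.1f} spread={spread}")
# IN    lexical  mean=64.4 spread=20
# IN    bleached mean=60.0 spread=14
# CROSS lexical  mean=57.4 spread=10
# CROSS bleached mean=52.4 spread=3
```

Cross-genre, the bleached features vary by only 3 points across genres (50 to 53) but average 52.4, close to the 50% expected from random guessing on a two-class task. Lexical features average 57.4 cross-genre but spread over 10 points.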

Conclusions

RQ1: Gender prediction is hard!

We don't know if it is dangerous

RQ2: Abstract features produce consistent but low results

also...

github.com/anbasile/gxg

DOWNLOAD TRAINED MODEL

What next?

gxg

By Angelo