Dimitrina Zlatkova (AI)
Daniel Kopev (AI)
Atanas Atanasov (AI)
SemEval-2018
Emoji Prediction
Data: So this happened :) with my girls @user and erinblonshine. ️ #29 #sacredhearttattoo @ Sacred…
Label: red_heart
Classes
- two_hearts
- blue_heart
- purple_heart
- camera_with_flash
- camera
Our Plan
- Data
- Text Preprocessing
- Feature Engineering
- Classification
- Evaluation
Text Processing
Original: So this happened :) with my girls @user and erinblonshine. ️ #29 #sacredhearttattoo @ Sacred…
Text Processing (2)
Pattern replace: So this happened __smile__ with my girls @user and erinblonshine. ️ #29 #sacredhearttattoo @ Sacred…
Char filter: So this happened __smile__ with my girls and erinblonshine#sacredhearttattoo Sacred
Tokenize: ['so', 'this', 'happened', '__smile__', 'with', 'my', 'girls', 'and', 'erinblonshine', 'sacred', 'heart', 'tat', 'too', 'sacred']
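The first three steps can be sketched with the standard `re` module. This is a minimal sketch, not the pipeline used for the slides: the emoticon table, the exact filter rules, and all function names are assumptions, and only the `:)` to `__smile__` mapping comes from the example above. The sketch does not split hashtags into words, which the slide output does.

```python
import re

# Hypothetical emoticon table; only ":)" -> "__smile__" is taken from the slide.
EMOTICONS = {":)": "__smile__", ":-)": "__smile__", ":(": "__sad__"}

def replace_patterns(text):
    """Swap each known emoticon for its placeholder token."""
    for emoticon, placeholder in EMOTICONS.items():
        text = text.replace(emoticon, placeholder)
    return text

def char_filter(text):
    """Drop mentions and numeric hashtags, then keep letters, '_', '#' and spaces."""
    text = re.sub(r"@\w*", " ", text)          # "@user" and a bare "@"
    text = re.sub(r"#\d+", " ", text)          # hashtags that are only digits
    text = re.sub(r"[^A-Za-z_# ]", " ", text)  # punctuation and other symbols
    return re.sub(r"\s+", " ", text).strip()   # collapse runs of whitespace

def tokenize(text):
    """Lowercase and extract runs of letters and underscores."""
    return re.findall(r"[a-z_]+", text.lower())

tweet = "So this happened :) with my girls @user and erinblonshine. #29 #sacredhearttattoo @ Sacred…"
cleaned = char_filter(replace_patterns(tweet))
tokens = tokenize(cleaned)
```

Running the placeholder replacement before the character filter matters here: the filter would otherwise delete `:)` along with the rest of the punctuation.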
Text Processing (3)
Stop words: ['happened', 'smile', 'girls', 'erinblonshine', 'sacred', 'heart', 'tat', 'sacred']
Lemmatize: ['happen', 'smile', 'girl', 'erinblonshine', 'sacred', 'heart', 'tat', 'sacred']
POS Tagger: [('so','RB'),('this','DT'),('happened','VBD'),('smile','NN'), ('with','IN'),('my','PRP$'), ('girls','NNS'),('and','CC'), ('erinblonshine','NN'),('sacred','JJ'),('heart','NN'),('tat','VB'),('too', 'RB'),('sacred','VBD')]
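The stop-word step can be sketched as a set lookup over the tokens. The stop-word set below is a tiny illustrative assumption, not the list used for the slides, and stripping the underscores from placeholders is a guess based on `__smile__` appearing as `smile` in the slide output. Lemmatization and POS tagging need a linguistic library and are not shown.

```python
# Tiny illustrative stop-word set (an assumption, not the list used in the slides).
STOP_WORDS = {"so", "this", "with", "my", "and", "too"}

def remove_stop_words(tokens):
    """Drop stop words and strip the underscores around placeholder tokens."""
    return [token.strip("_") for token in tokens if token not in STOP_WORDS]

tokens = ["so", "this", "happened", "__smile__", "with", "my", "girls", "and",
          "erinblonshine", "sacred", "heart", "tat", "too", "sacred"]
filtered = remove_stop_words(tokens)
```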
Vectorization
- tf-idf
- GloVe
- word2vec
- FB research (StarSpace)
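To make the first option concrete, here is a minimal tf-idf sketch in plain Python. It uses the textbook weighting (term frequency times log(N / document frequency)); libraries such as scikit-learn apply smoothed variants by default, so their numbers will differ. The function name and the toy documents are assumptions for illustration.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

docs = [["sacred", "heart", "sacred"], ["happy", "heart"]]
vectors = tfidf(docs)
```

A term that occurs in every document, such as `heart` here, gets weight zero, while a term concentrated in one document, such as `sacred`, is weighted up.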
Feature Engineering
Text Features
- words
- unique words
- stopwords
- @user
- Words Title
- WORDS UPPER
- mean word length
- #
- a b c
- 1 2 3
- $
- %
- !!!
- ???
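The surface features listed above are simple counts over the raw tweet. The sketch below is one possible reading of the list: it treats "a b c" as a letter count and "1 2 3" as a digit count, and the feature names and the small stop-word set are assumptions.

```python
# Small illustrative stop-word set (an assumption).
STOP_WORDS = {"so", "this", "with", "my", "and", "the", "a"}

def text_features(tweet):
    """Count surface properties of a raw tweet."""
    words = tweet.split()
    return {
        "n_words": len(words),
        "n_unique_words": len({w.lower() for w in words}),
        "n_stop_words": sum(w.lower() in STOP_WORDS for w in words),
        "n_mentions": tweet.count("@user"),
        "n_title_words": sum(w.istitle() for w in words),
        "n_upper_words": sum(w.isupper() for w in words),
        "mean_word_length": sum(len(w) for w in words) / len(words) if words else 0.0,
        "n_hashtags": tweet.count("#"),
        "n_letters": sum(c.isalpha() for c in tweet),
        "n_digits": sum(c.isdigit() for c in tweet),
        "n_dollar": tweet.count("$"),
        "n_percent": tweet.count("%"),
        "n_exclamation": tweet.count("!"),
        "n_question": tweet.count("?"),
    }

features = text_features("So this happened with my GIRLS @user #29 !!!")
```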
Emotions
- anger
- sadness
- disgust
- fear
- joy: fun, sun
- anticipation: fun, sun
- surprise: sun
- trust: sun
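Emotion features of this kind are typically counts of lexicon hits per emotion. The sketch below assumes the words shown on the slide ("fun", "sun") are example lexicon entries attached to the last four emotions; the toy lexicon and the function name are illustrative, and a real lexicon would have thousands of entries.

```python
from collections import Counter

EMOTIONS = ["anger", "sadness", "disgust", "fear",
            "joy", "anticipation", "surprise", "trust"]

# Toy word-to-emotions lexicon (an assumption built from the slide's example words).
LEXICON = {
    "fun": {"joy", "anticipation"},
    "sun": {"joy", "anticipation", "surprise", "trust"},
}

def emotion_counts(tokens):
    """Return one count per emotion, in the order of EMOTIONS."""
    counts = Counter()
    for token in tokens:
        for emotion in LEXICON.get(token, ()):
            counts[emotion] += 1
    return [counts[emotion] for emotion in EMOTIONS]

vector = emotion_counts(["fun", "sun", "park"])
```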
Colors
- black
- blue
- brown
- green
- grey
- orange
- pink
- purple
- red
- white
- yellow
- sun
- sun
- park
Sentiment
Positive: "pos_0", "pos_.15", "pos_.20", "pos_.27", "pos_.4", "pos_above"
Negative: "neg_0", "neg_.15", "neg_.25", "neg_.35", "neg_.6", "neg_above"
Hierarchical Twitter Clusters
^010011000 | got qot gott g0t gotz qott gottt gawt ghot gotcho goht ggot |
^111010100010 | lmao lmfao lmaoo lmaooo lool rofl loool lmfaoo lmfaooo lmaoooo |
^111010100011 | haha hahaha hehe hahahaha hahah aha hehehe ahaha hah hahahah hahaa ahah |
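Each row above pairs a bit-string path in the cluster hierarchy with the words in that cluster, so words with a shared path prefix are related (the "lmao" and "haha" clusters differ only in the last bit). One common way to turn this into features is to emit path prefixes of several lengths. The sketch below parses rows in the format shown; the prefix lengths and function names are assumptions.

```python
CLUSTER_LINES = [
    "^010011000 | got qot gott g0t gotz qott gottt gawt ghot gotcho goht ggot |",
    "^111010100010 | lmao lmfao lmaoo lmaooo lool rofl loool lmfaoo lmfaooo lmaoooo |",
    "^111010100011 | haha hahaha hehe hahahaha hahah aha hehehe ahaha hah hahahah hahaa ahah |",
]

def load_clusters(lines):
    """Map every word to the bit-string path of its cluster."""
    word_to_path = {}
    for line in lines:
        path, words = line.split("|")[:2]
        for word in words.split():
            word_to_path[word] = path.strip().lstrip("^")
    return word_to_path

def cluster_features(token, word_to_path, prefix_lengths=(4, 8, 12)):
    """Return one feature per path prefix; unknown words yield no features."""
    path = word_to_path.get(token)
    if path is None:
        return []
    return [f"cluster_{path[:n]}" for n in prefix_lengths if len(path) >= n]

word_to_path = load_clusters(CLUSTER_LINES)
haha_features = cluster_features("haha", word_to_path)
lmao_features = cluster_features("lmao", word_to_path)
```

With these prefix lengths, "haha" and "lmao" share their two shorter prefix features and differ only in the full 12-bit one.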
Experimental Results
Model | Precision | Recall | F1 Macro (%) |
Multinomial Naive Bayes | 0.05 | 0.21 | 1.763 |
Logistic Regression with L-BFGS | 0.22 | 0.28 | 13.16 |
MLP, 2 hidden layers, ReLU | 0.26 | 0.26 | 17.898 |
Random Forest (50 estimators) | 0.20 | 0.26 | 16.167 |
SVM, tf-idf | 0.23 | 0.27 | 19.554 |
SVM, Twitter embeddings | 0.16 | 0.18 | 8.522 |
AdaBoost, Extra Tree base | 0.15 | 0.19 | 7.825 |
SVM+AdaBoost+Random Forest | 0.25 | 0.24 | 13.764 |
SVM+AdaBoost+MLP | 0.25 | 0.28 | 20.106 |
(10k train, 1k test)
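For reference, macro-averaged scores are computed per class and then averaged with equal weight per class, so macro F1 is the mean of the per-class F1 values rather than the harmonic mean of the averaged precision and recall. The sketch below is a plain-Python illustration with made-up labels; it is not the evaluation script behind the table, and the averaging used for the table's precision and recall columns is not stated in the slides.

```python
def macro_scores(y_true, y_pred):
    """Return (precision, recall, f1), each averaged over classes with equal weight."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0, 0.0, 0.0
    precisions, recalls, f1s = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n

y_true = ["red_heart", "red_heart", "camera", "camera"]
y_pred = ["red_heart", "camera", "camera", "camera"]
precision, recall, f1 = macro_scores(y_true, y_pred)
```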
Deep Learning
CNN for Text Classification
LSTM Networks
Hierarchical Attention Neural Networks for Text Classification
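The core operation in a hierarchical attention network is attention pooling: a sequence of vectors is reduced to one vector by a softmax-weighted sum, first over words and then over sentences. The sketch below shows only that pooling step in plain Python, in a simplified form. In a trained network the inputs would be recurrent hidden states passed through a learned layer and the context vector would be learned; here both are fixed toy values, and the function name is hypothetical.

```python
import math

def attention_pool(vectors, context):
    """Softmax the dot products with `context`; return (weights, weighted sum)."""
    scores = [sum(v_i * c_i for v_i, c_i in zip(v, context)) for v in vectors]
    max_score = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [sum(w * v[d] for w, v in zip(weights, vectors))
              for d in range(len(context))]
    return weights, pooled

word_vectors = [[1.0, 0.0], [0.0, 1.0]]
weights, pooled = attention_pool(word_vectors, context=[10.0, 0.0])
```

With this toy context vector, almost all of the weight goes to the first word, so the pooled vector is close to that word's vector.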
Experimental Results
Model | Precision | Recall | F1 Macro (%) |
CNN | 0.15 | 0.14 | 12.034 |
RNN with LSTM | 0.24 | 0.17 | 13.106 |
HANN | 0.30 | 0.13 | 15.999 |
(10k train, 1k test)
Final Results
Model | Precision | Recall | F1 Macro (%) |
SVM, tf-idf | 0.30 | 0.33 | 23.3 |
HANN | 0.30 | 0.13 | 22.518 |
(488k train, 50k test)
valentine, loveofmylife, heart full, heart
cool kid, sunglasses, coolin, shade, cool, sunglass
ti season, christmastree, tree, christmas tree, merry christmas, merry, christmas
pretty pink, breast, pink, breast cancer
daze, beachin, sunshine state, fun sun, sunny day, sun, sunny, sunshine
veteran day, murica, veteran, america, ivoted, election, merica, vote, usa
Leaderboard
23.3
Thank you!
Emoji Prediction Seminar
By Dimitrina Zlatkova