



Dimitrina Zlatkova (AI)
Daniel Kopev (AI)
Atanas Atanasov (AI)
SemEval-2018
Emoji Prediction
Data:
So this happened :) with my girls @user and erinblonshine. #29 #sacredhearttattoo @ Sacred…
Label: (emoji image)



Classes
- red_heart
- two_hearts
- blue_heart
- purple_heart
- camera_with_flash
- camera
Our Plan
- Data
- Text Preprocessing
- Feature Engineering
- Classification
- Evaluation
Text Processing
Original:
So this happened :) with my girls @user and erinblonshine. #29 #sacredhearttattoo @ Sacred…
Text Processing (2)
Pattern replace:
So this happened __smile__ with my girls @user and erinblonshine. #29 #sacredhearttattoo @ Sacred…
Char filter:
So this happened __smile__ with my girls and erinblonshine#sacredhearttattoo Sacred
Tokenize:
['so', 'this', 'happened', '__smile__', 'with', 'my', 'girls', 'and', 'erinblonshine', 'sacred', 'heart', 'tat', 'too', 'sacred']
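The three steps above (pattern replace, char filter, tokenize) plus the hashtag splitting visible in the tokens can be sketched as below. This is a minimal sketch, not the pipeline used in the project: the emoticon table, the regular expressions and the unigram counts used to segment hashtags are all hypothetical. Segmentation picks the most probable split under a unigram model (a small Viterbi pass over prefixes).

```python
import math
import re

# Hypothetical emoticon table; the slides only show ":)" -> "__smile__".
# Longer emoticons come first so ":-)" is not partially replaced by ":)".
EMOTICONS = {":-)": "__smile__", ":)": "__smile__", ":(": "__sad__"}

# Hypothetical unigram counts for hashtag segmentation; a real system would
# estimate these from a large corpus.
UNIGRAMS = {"sacred": 50, "heart": 400, "tattoo": 30, "tat": 5, "too": 900}
TOTAL = sum(UNIGRAMS.values())


def pattern_replace(text: str) -> str:
    """Replace emoticons with placeholder tokens."""
    for emoticon, token in EMOTICONS.items():
        text = text.replace(emoticon, token)
    return text


def char_filter(text: str) -> str:
    """Drop @mentions, numeric hashtags and punctuation other than '#'."""
    text = re.sub(r"@\w+", " ", text)      # user mentions such as @user
    text = re.sub(r"#\d+\b", " ", text)    # numeric hashtags such as #29
    text = re.sub(r"[^\w#\s]", " ", text)  # remaining punctuation: '.', '@', '…'
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text: str) -> list:
    """Lowercase and keep runs of letters and underscores ('#' splits tokens)."""
    return re.findall(r"[a-z_]+", text.lower())


def segment(word: str) -> list:
    """Most probable split of `word` into known unigrams (Viterbi over prefixes)."""
    n = len(word)
    # best[i] = (log-probability of the best split of word[:i], start of last piece)
    best = [(-math.inf, -1)] * (n + 1)
    best[0] = (0.0, -1)
    for end in range(1, n + 1):
        for start in range(max(0, end - 20), end):
            piece = word[start:end]
            if piece in UNIGRAMS and best[start][0] > -math.inf:
                score = best[start][0] + math.log(UNIGRAMS[piece] / TOTAL)
                if score > best[end][0]:
                    best[end] = (score, start)
    if best[n][0] == -math.inf:
        return [word]  # no complete split: keep the token unchanged
    pieces, end = [], n
    while end > 0:
        start = best[end][1]
        pieces.append(word[start:end])
        end = start
    return pieces[::-1]


def preprocess(text: str) -> list:
    """Pattern replace -> char filter -> tokenize -> split glued words."""
    tokens = tokenize(char_filter(pattern_replace(text)))
    return [piece for tok in tokens for piece in segment(tok)]
```

With these toy counts `#sacredhearttattoo` splits into `sacred heart tattoo`; the slide's `tat too` shows how strongly the split depends on the counts behind the segmenter.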
Text Processing (3)
Stop words:
['happened', 'smile', 'girls', 'erinblonshine', 'sacred', 'heart', 'tat', 'sacred']
Lemmatize:
['happen', 'smile', 'girl', 'erinblonshine', 'sacred', 'heart', 'tat', 'sacred']
POS Tagger:
[('so', 'RB'), ('this', 'DT'), ('happened', 'VBD'), ('smile', 'NN'), ('with', 'IN'), ('my', 'PRP$'), ('girls', 'NNS'), ('and', 'CC'), ('erinblonshine', 'NN'), ('sacred', 'JJ'), ('heart', 'NN'), ('tat', 'VB'), ('too', 'RB'), ('sacred', 'VBD')]
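Stop-word removal and POS-aware lemmatization can be sketched as follows. This is an illustration, not the tools used on the slides: it uses a simplified version of WordNet's "morphy" detachment rules (strip a suffix, then accept the candidate only if it is a known base form), while the stop-word list and the tiny base-form lexicon are hypothetical. The POS tag decides which rules apply, which is why `sacred/VBD` stays unchanged.

```python
from typing import Optional

# Hypothetical stop-word list (a real system would use a full English list).
STOP_WORDS = {"so", "this", "with", "my", "and", "too", "the", "a", "i"}

# Simplified WordNet-"morphy"-style detachment rules: (suffix, replacement).
RULES = {
    "n": [("ses", "s"), ("ies", "y"), ("s", "")],
    "v": [("ies", "y"), ("es", "e"), ("es", ""), ("ed", "e"), ("ed", ""),
          ("ing", "e"), ("ing", ""), ("s", "")],
}

# Hypothetical base-form lexicon: a candidate lemma is accepted only if listed.
LEXICON = {"n": {"girl", "smile", "heart"}, "v": {"happen", "smile"}}


def coarse_pos(tag: str) -> Optional[str]:
    """Map a Penn Treebank tag to a coarse noun/verb class (others are left alone)."""
    if tag.startswith("NN"):
        return "n"
    if tag.startswith("VB"):
        return "v"
    return None


def lemmatize(word: str, tag: str) -> str:
    pos = coarse_pos(tag)
    if pos is None or word in LEXICON[pos]:
        return word
    for suffix, replacement in RULES[pos]:
        if word.endswith(suffix):
            candidate = word[: -len(suffix)] + replacement
            if candidate in LEXICON[pos]:
                return candidate
    return word  # unknown word: keep the surface form


def normalize(tagged):
    """Drop stop words, then lemmatize each remaining (word, tag) pair."""
    return [lemmatize(w, t) for w, t in tagged if w not in STOP_WORDS]
```

Applied to the tagged tokens from the slide, `normalize` returns the lemmatized list shown above.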

Vectorization
- tf-idf
- GloVe
- word2vec
- Facebook Research (StarSpace)
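As an illustration of the first option, here is a from-scratch tf-idf sketch. It assumes the smoothed idf that scikit-learn's `TfidfVectorizer` uses by default, idf(t) = ln((1 + n) / (1 + df(t))) + 1, with raw term counts and L2-normalized rows; the slides do not say which tf-idf variant was used.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one sparse {term: weight} vector per tokenized document, plus the idf table."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts
        vec = {t: c * idf[t] for t, c in tf.items()}
        norm = math.sqrt(sum(v * v for v in vec.values()))
        vectors.append({t: v / norm for t, v in vec.items()} if norm else vec)
    return vectors, idf
```

Rare terms (e.g. a word that occurs in one tweet only) get a larger idf than terms shared by many tweets, and each row has unit length so tweet length does not dominate the SVM input.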

Feature Engineering

Text Features
- words
- unique words
- stopwords
- @user
- Words Title
- WORDS UPPER
- mean word length
- #
- a b c
- 1 2 3
- $
- %
- !!!
- ???
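The surface features listed above are simple counts over the raw tweet. A minimal sketch, assuming whitespace tokenization, a hypothetical stop-word list, and that `!!!` and `???` stand for counts of `!` and `?` characters (the slides do not define them precisely):

```python
STOP_WORDS = {"so", "this", "is", "with", "my", "and", "too", "the", "a", "i"}  # hypothetical


def text_features(text: str) -> dict:
    """Count-based surface features of one raw tweet."""
    words = text.split()
    return {
        "words": len(words),
        "unique_words": len({w.lower() for w in words}),
        "stopwords": sum(w.lower() in STOP_WORDS for w in words),
        "mentions": text.count("@user"),
        "title_words": sum(w.istitle() for w in words),   # e.g. "Words Title"
        "upper_words": sum(w.isupper() for w in words),   # e.g. "WORDS UPPER"
        "mean_word_length": sum(map(len, words)) / len(words) if words else 0.0,
        "hashtags": text.count("#"),
        "letters": sum(c.isalpha() for c in text),
        "digits": sum(c.isdigit() for c in text),
        "dollars": text.count("$"),
        "percents": text.count("%"),
        "exclamations": text.count("!"),
        "questions": text.count("?"),
    }
```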
Emotions
- anger
- sadness
- disgust
- fear
- joy
- anticipation
- surprise
- trust

- fun, sun
- fun, sun
- sun
- sun
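The emotion features can be read as per-emotion counts of lexicon hits. A minimal sketch, assuming a word-to-emotions lexicon in the style of the NRC Emotion Lexicon; the three entries below are illustrative placeholders, not the resource used in the project. With them, the tokens of "fun in the sun" give two joy, two anticipation, one surprise and one trust hit, which is one plausible reading of the fun/sun example above. The color features could be built the same way from a word-to-color lexicon.

```python
from collections import Counter

EMOTIONS = ["anger", "sadness", "disgust", "fear",
            "joy", "anticipation", "surprise", "trust"]

# Illustrative lexicon entries (assumed, not taken from the slides).
LEXICON = {
    "fun": {"joy", "anticipation"},
    "sun": {"joy", "anticipation", "surprise", "trust"},
    "hate": {"anger", "disgust"},
}


def emotion_features(tokens):
    """One count per emotion, in the fixed order of EMOTIONS."""
    counts = Counter()
    for tok in tokens:
        for emotion in LEXICON.get(tok, ()):
            counts[emotion] += 1
    return [counts[e] for e in EMOTIONS]
```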




Colors
- black
- blue
- brown
- green
- grey
- orange
- pink
- purple
- red
- white
- yellow

- sun
- sun
- park




Sentiment
Positive: "pos_0", "pos_.15", "pos_.20", "pos_.27", "pos_.4", "pos_above"
Negative: "neg_0", "neg_.15", "neg_.25", "neg_.35", "neg_.6", "neg_above"
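The bucket names suggest that positive and negative sentiment scores are discretized into one-hot bins. A sketch under two assumptions not stated on the slides: each score lies in [0, 1] (for example a per-polarity proportion), and each label's number is the inclusive upper bound of its bin, with exactly 0 and values above the last bound getting their own bins.

```python
# Bin upper bounds copied from the label names; their meaning is an assumption.
POS_BINS = [(0.15, "pos_.15"), (0.20, "pos_.20"), (0.27, "pos_.27"), (0.4, "pos_.4")]
NEG_BINS = [(0.15, "neg_.15"), (0.25, "neg_.25"), (0.35, "neg_.35"), (0.6, "neg_.6")]

LABELS = (["pos_0"] + [label for _, label in POS_BINS] + ["pos_above"]
          + ["neg_0"] + [label for _, label in NEG_BINS] + ["neg_above"])


def bin_score(score, bins, prefix):
    """Map a score to its bucket label: exactly 0, <= an upper bound, or above all."""
    if score == 0:
        return f"{prefix}_0"
    for upper, label in bins:
        if score <= upper:
            return label
    return f"{prefix}_above"


def sentiment_features(pos, neg):
    """12-dimensional one-hot vector: one active positive and one active negative bin."""
    active = {bin_score(pos, POS_BINS, "pos"), bin_score(neg, NEG_BINS, "neg")}
    return [1 if label in active else 0 for label in LABELS]
```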
Hierarchical Twitter Clusters
| Cluster path | Words |
| ^010011000 | got qot gott g0t gotz qott gottt gawt ghot gotcho goht ggot |
| ^111010100010 | lmao lmfao lmaoo lmaooo lool rofl loool lmfaoo lmfaooo lmaoooo |
| ^111010100011 | haha hahaha hehe hahahaha hahah aha hehehe ahaha hah hahahah hahaa ahah |
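Cluster paths are bit strings from a hierarchical (Brown-style) clustering, so a prefix of a path names a coarser cluster: the lmao and haha clusters above differ only in their last bit. A sketch of prefix features, assuming the leading `^` in the table is a display marker rather than part of the path, and using hypothetical prefix lengths 4, 8 and 12:

```python
# Excerpt of the clusters shown above (paths written without the leading '^').
CLUSTERS = {
    "got": "010011000", "gotz": "010011000",
    "lmao": "111010100010", "lmfao": "111010100010",
    "haha": "111010100011", "hehe": "111010100011",
}
PREFIX_LENGTHS = (4, 8, 12)  # hypothetical choice of granularities


def cluster_features(tokens):
    """Set of sparse features 'c<k>_<prefix>' plus the full path of each known token."""
    feats = set()
    for tok in tokens:
        path = CLUSTERS.get(tok.lower())
        if path is None:
            continue
        for k in PREFIX_LENGTHS:
            feats.add(f"c{k}_{path[:k]}")
        feats.add(f"cfull_{path}")
    return feats
```

Short prefixes let spelling variants and near-synonyms ("lmao", "haha") share features, while the full path keeps them distinguishable.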
Experimental Results
| Model | Precision | Recall | F1 Macro (%) |
| Multinomial Naive Bayes | 0.05 | 0.21 | 1.763 |
| Logistic Regression with L-BFGS | 0.22 | 0.28 | 13.16 |
| MLP, 2 hidden layers, ReLU | 0.26 | 0.26 | 17.898 |
| Random Forest (50 estimators) | 0.20 | 0.26 | 16.167 |
| SVM, tf-idf | 0.23 | 0.27 | 19.554 |
| SVM, Twitter embeddings | 0.16 | 0.18 | 8.522 |
| AdaBoost, Extra Tree base | 0.15 | 0.19 | 7.825 |
| SVM+AdaBoost+Random Forest | 0.25 | 0.24 | 13.764 |
| SVM+AdaBoost+MLP | 0.25 | 0.28 | 20.106 |
(10k train, 1k test)
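Precision and recall are given as fractions, while F1 Macro appears to be on a percentage scale. Macro F1 averages the per-class F1 scores with equal weight, so a model that mostly predicts frequent classes is penalized for every rare class it misses. A minimal sketch, averaging over the labels that occur in either list (the tables do not state how precision and recall were averaged):

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, returned as a percentage."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but it was wrong
            fn[t] += 1  # true class t was missed
    f1s = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]  # F1 = 2TP / (2TP + FP + FN)
        f1s.append(2 * tp[c] / denom if denom else 0.0)
    return 100 * sum(f1s) / len(f1s)
```

For example, always predicting `red_heart` on `["red_heart", "red_heart", "camera", "blue_heart"]` is 50% accurate but scores only about 22.2 macro F1, because the two missed classes each contribute an F1 of 0.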

Deep Learning
CNN for Text Classification
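The core of a CNN text classifier is a set of filters slid over the sequence of word embeddings, followed by max-over-time pooling, which yields one feature per filter regardless of tweet length. A dependency-free sketch of that single layer (embedding lookup, several filter widths, dropout and the softmax output of a real model are omitted; the architecture used in the project is not specified on the slides):

```python
def conv_max_pool(embeddings, filters, biases):
    """One feature per filter: ReLU(filter * window + bias), max-pooled over time.

    embeddings: n word vectors of dimension d; each filter: k rows of length d.
    """
    features = []
    for weights, bias in zip(filters, biases):
        k = len(weights)
        best = 0.0  # ReLU outputs are >= 0, so 0.0 is the floor (also covers n < k)
        for i in range(len(embeddings) - k + 1):
            activation = bias + sum(
                w * x
                for row, vec in zip(weights, embeddings[i:i + k])
                for w, x in zip(row, vec)
            )
            best = max(best, activation)  # max over time of max(0, activation)
        features.append(best)
    return features
```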

LSTM Networks

Hierarchical Attention Neural Networks for Text Classification
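A hierarchical attention network pools word states into a sentence vector, and sentence vectors into a document vector, with the same attention step at each level: u_i = tanh(W h_i + b), α_i = softmax(u_i · u_w), s = Σ α_i h_i, where u_w is a learned context vector. A plain-Python sketch of that one pooling step (for a single tweet, essentially only the word level applies); the RNN producing the states h_i and all parameter values are assumed here:

```python
import math


def attention_pool(H, W, b, u_w):
    """Attention pooling over hidden states H; returns (pooled vector, weights)."""
    # u_i = tanh(W h_i + b)
    U = [[math.tanh(sum(w * x for w, x in zip(row, h)) + bias)
          for row, bias in zip(W, b)] for h in H]
    # alpha = softmax(u_i . u_w), stabilized by subtracting the max score
    scores = [sum(ui * ci for ui, ci in zip(u, u_w)) for u in U]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    # s = sum_i alpha_i * h_i
    pooled = [sum(a * h[j] for a, h in zip(alphas, H)) for j in range(len(H[0]))]
    return pooled, alphas
```

States whose projection aligns with u_w receive larger weights, so the pooled vector is dominated by the most informative words.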

Experimental Results
| Model | Precision | Recall | F1 Macro (%) |
| CNN | 0.15 | 0.14 | 12.034 |
| RNN with LSTM | 0.24 | 0.17 | 13.106 |
| HANN | 0.30 | 0.13 | 15.999 |
(10k train, 1k test)

Final Results
| Model | Precision | Recall | F1 Macro (%) |
| SVM, tf-idf | 0.30 | 0.33 | 23.3 |
| HANN | 0.30 | 0.13 | 22.518 |
(488k train, 50k test)


valentine, loveofmylife, heart full, heart

cool kid, sunglasses, coolin, shade, cool, sunglass

ti season, christmastree, tree, christmas tree, merry christmas, merry, christmas

pretty pink, breast, pink, breast cancer


daze, beachin, sunshine state, fun sun, sunny day, sun, sunny, sunshine

veteran day, murica, veteran, america, ivoted, election, merica, vote, usa

Leaderboard
F1 Macro: 23.3
Thank you!

Emoji Prediction Seminar
By Dimitrina Zlatkova