



Dimitrina Zlatkova (AI)
Daniel Kopev (AI)
Atanas Atanasov (AI)
SemEval-2018
Emoji Prediction


Action Plan
- Get Data
- Process Text
- Extract Features
- Combine Features
- Classification
- Evaluation
Text Processing
Original:
So this happened :) with my girls @user and erinblonshine. #29 #sacredhearttattoo @ Sacred…
Text Processing (2)
Pattern replace:
So this happened smile with my girls @user and erinblonshine. #29 #sacredhearttattoo @ Sacred…
Tokenize:
['so', 'this', 'happened', 'smile', 'with', 'my', 'girls', 'and', 'erinblonshine', 'sacred', 'heart', 'tattoo', 'sacred']
Char filter:
So this happened smile with my girls and erinblonshine#sacredhearttattoo Sacred
Text Processing (3)
Stop words:
['happened', 'smile', 'girls', 'erinblonshine', 'sacred', 'heart', 'tattoo', 'sacred']
Lemmatize:
['happen', 'smile', 'girl', 'erinblonshine', 'sacred', 'heart', 'tattoo', 'sacred']
POS Tagger:
[('so','RB'),('this','DT'),('happened','VBD'),('smile','NN'),('with','IN'),('my','PRP$'),('girls','NNS'),('and','CC'),('erinblonshine','NN'),('sacred','JJ'),('heart','NN'),('tattoo','NN'),('sacred','VBD')]
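
The first few text processing steps can be sketched with the Python standard library alone. The emoticon map, the stop-word list, and the regular expressions below are illustrative assumptions, not the ones used in the project, and the sketch filters characters before tokenizing, which is also an assumption about the order. Hashtag splitting, lemmatization, and POS tagging are left out because they need an external lexicon or NLP library.

```python
import re

# Hypothetical emoticon map and stop-word list, for illustration only.
EMOTICONS = {":)": "smile", ":(": "sad"}
STOP_WORDS = {"so", "this", "with", "my", "and"}


def pattern_replace(text):
    """Replace known emoticons with a plain word."""
    for emoticon, word in EMOTICONS.items():
        text = text.replace(emoticon, word)
    return text


def char_filter(text):
    """Drop @mentions and numeric hashtags, then blank out stray '#' and '@'."""
    text = re.sub(r"@\w+", " ", text)    # e.g. @user
    text = re.sub(r"#\d+\b", " ", text)  # e.g. #29
    return re.sub(r"[#@]", " ", text)


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters only."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    """Keep only tokens that are not in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


tweet = ("So this happened :) with my girls @user and erinblonshine. "
         "#29 #sacredhearttattoo @ Sacred…")
tokens = tokenize(char_filter(pattern_replace(tweet)))
content_tokens = remove_stop_words(tokens)
```

Unlike the slide output, this sketch keeps `sacredhearttattoo` as a single token, since splitting a hashtag into words needs a dictionary.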

Vectorization
- n-grams
- tf-idf
- GloVe
- word2vec
- StarSpace (Facebook Research)
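
As a rough illustration of the first two items, the sketch below builds word n-grams and weights them with a plain tf-idf formula, tf × log(N / df). This unsmoothed formula and the toy documents are assumptions made for the example; library implementations usually apply smoothing and normalization, and the project's actual settings are not stated here.

```python
import math
from collections import Counter


def ngrams(tokens, n_max=2):
    """Return all word n-grams of length 1..n_max as space-joined strings."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def tfidf(docs, n_max=2):
    """Map each tokenized document to a sparse {ngram: weight} dict."""
    grams = [ngrams(doc, n_max) for doc in docs]
    # Document frequency: in how many documents each n-gram occurs.
    doc_freq = Counter(g for doc in grams for g in set(doc))
    n_docs = len(docs)
    vectors = []
    for doc in grams:
        counts = Counter(doc)
        total = len(doc) or 1
        vectors.append({
            g: (c / total) * math.log(n_docs / doc_freq[g])
            for g, c in counts.items()
        })
    return vectors


# Toy corpus, invented for the example.
docs = [["fun", "sun"], ["sunny", "day"], ["fun", "day"]]
vectors = tfidf(docs)
```

An n-gram that appears in fewer documents (here `sun`) gets a higher weight than one shared by several documents (here `fun`).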

Feature Engineering

Text Features
- words
- unique words
- stopwords
- @user
- Words Title
- WORDS UPPER
- mean word length
- #
- a b c
- 1 2 3
- $
- %
- !!!
- ???
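
Most of the counts in this list can be computed directly from the raw tweet. The sketch below is one possible reading of the list; the feature names are made up for the example, and the stop-word count is omitted because it depends on a stop-word list that is not given here.

```python
def text_features(text):
    """Count simple surface features of a raw tweet."""
    words = text.split()
    return {
        "words": len(words),
        "unique_words": len({w.lower() for w in words}),
        "mentions": text.count("@user"),
        "title_words": sum(w.istitle() for w in words),
        "upper_words": sum(w.isupper() for w in words),
        "mean_word_length": sum(map(len, words)) / len(words) if words else 0.0,
        "hashtags": text.count("#"),
        "letters": sum(ch.isalpha() for ch in text),
        "digits": sum(ch.isdigit() for ch in text),
        "dollars": text.count("$"),
        "percents": text.count("%"),
        "exclamations": text.count("!"),
        "questions": text.count("?"),
    }


features = text_features("So this happened :) with my girls @user #29 WOW!!!")
```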
Emotions
- anger
- sadness
- disgust
- fear
- joy: fun, sun
- anticipation: fun, sun
- surprise: sun
- trust: sun




Colors
- black
- blue
- brown
- green
- grey
- orange
- pink
- purple
- red
- white
- yellow

- sun
- sun
- park




Sentiment
Positive: "pos_0", "pos_.15", "pos_.20", "pos_.27", "pos_.4", "pos_above"
Negative: "neg_0", "neg_.15", "neg_.25", "neg_.35", "neg_.6", "neg_above"
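
One way to read these names is as bins over a positive and a negative sentiment score. The sketch below assumes that each name gives the upper edge of its bin, that scores lie in [0, 1], and that one bin per polarity is switched on for a tweet; none of this is confirmed by the slides, and how the scores are obtained is left open.

```python
# Upper bin edges, read off the feature names (an assumption).
POS_EDGES = [(0.0, "pos_0"), (0.15, "pos_.15"), (0.20, "pos_.20"),
             (0.27, "pos_.27"), (0.4, "pos_.4")]
NEG_EDGES = [(0.0, "neg_0"), (0.15, "neg_.15"), (0.25, "neg_.25"),
             (0.35, "neg_.35"), (0.6, "neg_.6")]


def bin_label(score, edges, overflow):
    """Return the label of the first bin whose upper edge is >= score."""
    for upper, label in edges:
        if score <= upper:
            return label
    return overflow


def sentiment_features(pos_score, neg_score):
    """Return a dict with the active positive and negative bin set to 1."""
    return {
        bin_label(pos_score, POS_EDGES, "pos_above"): 1,
        bin_label(neg_score, NEG_EDGES, "neg_above"): 1,
    }


features = sentiment_features(0.18, 0.0)
```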
Hierarchical Twitter Clusters
| ^010011000 | got qot gott g0t gotz qott gottt gawt ghot gotcho goht ggot |
| ^111010100010 | lmao lmfao lmaoo lmaooo lool rofl loool lmfaoo lmfaooo lmaoooo |
| ^111010100011 | haha hahaha hehe hahahaha hahah aha hehehe ahaha hah hahahah hahaa ahah |
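
Each row pairs a bit-string path in the cluster hierarchy with the word variants that share it. A common way to use such a table is to map every token to its path and emit prefixes of that path as features, so that nearby clusters share features. The sketch below does this for the three rows shown; the prefix lengths and feature names are hypothetical, and a full cluster table would normally be loaded from a file.

```python
# Paths and words copied from the three rows above (leading '^' dropped).
CLUSTERS = {
    "010011000": "got qot gott g0t gotz qott gottt gawt ghot gotcho goht ggot",
    "111010100010": "lmao lmfao lmaoo lmaooo lool rofl loool lmfaoo lmfaooo lmaoooo",
    "111010100011": "haha hahaha hehe hahahaha hahah aha hehehe ahaha hah hahahah hahaa ahah",
}
WORD_TO_PATH = {
    word: path for path, words in CLUSTERS.items() for word in words.split()
}


def cluster_features(tokens, prefix_lengths=(4, 8, 12)):
    """Emit one feature per known token and prefix length; skip unknown tokens."""
    features = []
    for token in tokens:
        path = WORD_TO_PATH.get(token)
        if path is None:
            continue
        for n in prefix_lengths:
            features.append(f"cluster_{n}_{path[:n]}")
    return features


features = cluster_features(["lmao", "haha", "unknown"], prefix_lengths=(11,))
```

With an 11-bit prefix, `lmao` and `haha` fall into the same feature, because their paths differ only in the last bit.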
Classification
| Model | Precision | Recall | F1 Macro (%) |
| Naive Bayes | 1.00 | 0.21 | 1.763 |
| SVM (non-linear) | 1.00 | 0.21 | 1.763 |
| Random Forest | 0.57 | 0.27 | 14.979 |
| MLP | 0.41 | 0.26 | 17.173 |
| StarSpace + NN | | | |
(10k train, 1k test)
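
Macro F1 averages the per-class F1 scores without weighting by class size, so a model that predicts almost only the most frequent emoji scores very low on it. The sketch below is a minimal standard-library version that returns a percentage; it is not the official task scorer, and the toy labels are invented for the example.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores, returned in percent."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return 100 * sum(scores) / len(scores)


# A classifier that always predicts class 0 on a three-class toy set.
y_true = [0, 0, 1, 1, 2]
y_pred = [0, 0, 0, 0, 0]
score = macro_f1(y_true, y_pred)
```

In the toy case only class 0 gets a non-zero F1, so the average over three classes stays low even though two of five predictions are correct.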

Ensemble Learning
- Random Forest
- Extra Trees
- AdaBoost
- Stacking

Deep Learning

LSTMs



Hierarchical Attention Networks

Embeddings
StarSpace

GloVe
Word2Vec

Results
| Model | Precision | Recall | F1 Macro (%) |
| SVM (linear kernel, SGD) | 0.65 | 0.61 | 59.171 |
| Standard BiLSTM (10 epochs) | 0.60 | 0.35 | 39.28 |
| Convolutional LSTM | 0.60 | 0.30 | 40 |
| Hierarchical Attention | 0.76 | 0.48 | 49 |
(488k train, 50k test)






valentine, loveofmylife, heart full, heart

cool kid, sunglasses, coolin, shade, cool, sunglass

ti season, christmastree, tree, christmas tree, merry christmas, merry, christmas

pretty pink, breast, pink, breast cancer


daze, beachin, sunshine state, fun sun, sunny day, sun, sunny, sunshine

veteran day, murica, veteran, america, ivoted, election, merica, vote, usa

Tools







Emoji Prediction
By Dimitrina Zlatkova