Dimitrina Zlatkova (AI)
Daniel Kopev (AI)
Atanas Atanasov (AI)
SemEval-2018
Emoji Prediction
Action Plan
- Get Data
- Process Text
- Extract Features
- Combine Features
- Classification
- Evaluation
Text Processing
Original:
So this happened :) with my girls @user and erinblonshine. ️ #29 #sacredhearttattoo @ Sacred…
Text Processing (2)
Pattern replace:
So this happened smile with my girls @user and erinblonshine. ️ #29 #sacredhearttattoo @ Sacred…
Tokenize:
['so', 'this', 'happened', 'smile', 'with', 'my', 'girls', 'and', 'erinblonshine', 'sacred', 'heart', 'tattoo', 'sacred']
Char filter:
So this happened smile with my girls and erinblonshine#sacredhearttattoo Sacred
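The pattern-replace and tokenize steps can be sketched in plain Python. This is a minimal sketch, not the authors' code: the emoticon map and the hashtag vocabulary are hypothetical stand-ins (the slides only show `:)` becoming `smile`), and character filtering is folded into tokenization. Hashtags are split by dynamic programming over the vocabulary, preferring the fewest words.

```python
import re

# Hypothetical emoticon map; the slides only show ":)" -> "smile".
EMOTICONS = {":)": "smile", ":(": "sad", ":D": "laugh"}

# Toy vocabulary for hashtag segmentation (an assumption, not the authors' lexicon).
VOCAB = {"sacred", "heart", "tattoo"}


def pattern_replace(text):
    """Replace each emoticon with a descriptive word."""
    for emoticon, word in EMOTICONS.items():
        text = text.replace(emoticon, word)
    return text


def segment(tag, vocab=VOCAB):
    """Split a concatenated hashtag into vocabulary words (dynamic programming).

    Prefers the segmentation with the fewest words; returns [tag] unchanged
    when no full segmentation exists.
    """
    best = [None] * (len(tag) + 1)  # best[i]: segmentation of tag[:i]
    best[0] = []
    for end in range(1, len(tag) + 1):
        for start in range(end):
            piece = tag[start:end]
            if best[start] is not None and piece in vocab:
                candidate = best[start] + [piece]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[-1] if best[-1] is not None else [tag]


def tokenize(text):
    """Lowercase, drop '@' tokens and numeric hashtags, segment other hashtags,
    and strip non-letter characters from ordinary words."""
    tokens = []
    for raw in text.split():
        if raw.startswith("@"):
            continue  # "@user" placeholders and the "@ <place>" check-in marker
        if raw.startswith("#"):
            tag = raw[1:].lower()
            if not tag.isdigit():
                tokens.extend(segment(tag))
            continue
        word = re.sub(r"[^a-z]", "", raw.lower())
        if word:
            tokens.append(word)
    return tokens


tweet = "So this happened :) with my girls @user and erinblonshine. \ufe0f #29 #sacredhearttattoo @ Sacred\u2026"
print(tokenize(pattern_replace(tweet)))
# ['so', 'this', 'happened', 'smile', 'with', 'my', 'girls', 'and', 'erinblonshine', 'sacred', 'heart', 'tattoo', 'sacred']
```

On the example tweet this reproduces the Tokenize list from the slide; a real system would need a much larger vocabulary (or word-frequency statistics) to segment arbitrary hashtags.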
Text Processing (3)
Stop words:
['happened', 'smile', 'girls', 'erinblonshine', 'sacred', 'heart', 'tattoo', 'sacred']
Lemmatize:
['happen', 'smile', 'girl', 'erinblonshine', 'sacred', 'heart', 'tattoo', 'sacred']
POS Tagger:
[('so','RB'),('this','DT'),('happened','VBD'),('smile','NN'), ('with','IN'),('my','PRP$'), ('girls','NNS'),('and','CC'), ('erinblonshine','NN'),('sacred','JJ'),('heart','NN'),('tattoo','NN'),('sacred','VBD')]
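Stop-word removal and lemmatization reduce to set and dictionary lookups in a minimal sketch. Both the stop-word list and the lemma table here are hypothetical toy data covering only the example tweet; a real pipeline would use a full stop-word list and a lemmatizer such as NLTK's `WordNetLemmatizer`, whose `lemmatize(word, pos)` call benefits from POS tags like the ones above (for instance, "happened" only maps to "happen" when treated as a verb).

```python
# Toy stop-word list and lemma table (assumptions that cover only the example tweet).
STOPWORDS = {"so", "this", "with", "my", "and", "a", "an", "the", "i", "to"}
LEMMAS = {"happened": "happen", "girls": "girl"}


def remove_stopwords(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOPWORDS]


def lemmatize(tokens):
    """Map each token to its lemma, leaving unknown tokens unchanged."""
    return [LEMMAS.get(t, t) for t in tokens]


tokens = ["so", "this", "happened", "smile", "with", "my", "girls", "and",
          "erinblonshine", "sacred", "heart", "tattoo", "sacred"]
filtered = remove_stopwords(tokens)
print(filtered)             # ['happened', 'smile', 'girls', 'erinblonshine', 'sacred', 'heart', 'tattoo', 'sacred']
print(lemmatize(filtered))  # ['happen', 'smile', 'girl', 'erinblonshine', 'sacred', 'heart', 'tattoo', 'sacred']
```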
Vectorization
- n-grams
- tf-idf
- GloVe
- word2vec
- Facebook Research (StarSpace)
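The n-gram and tf-idf items above combine naturally: each tweet becomes a sparse vector of n-gram weights. A minimal dependency-free sketch, assuming unigrams plus bigrams and the smoothed idf formula that scikit-learn's `TfidfVectorizer` uses by default (without its L2 normalization); the example tweets are made up.

```python
import math
from collections import Counter


def ngrams(tokens, n_max=2):
    """All contiguous n-grams of length 1..n_max, each joined with spaces."""
    grams = []
    for n in range(1, n_max + 1):
        grams += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return grams


def tfidf(docs, n_max=2):
    """Return one sparse {ngram: weight} dict per tokenized document.

    tf is the raw n-gram count in the document and
    idf = ln((1 + N) / (1 + df)) + 1, where N is the number of documents and
    df the number of documents containing the n-gram.
    """
    counts = [Counter(ngrams(doc, n_max)) for doc in docs]
    df = Counter(gram for doc_counts in counts for gram in doc_counts)
    n_docs = len(docs)
    return [
        {gram: tf * (math.log((1 + n_docs) / (1 + df[gram])) + 1)
         for gram, tf in doc_counts.items()}
        for doc_counts in counts
    ]


docs = [["fun", "in", "the", "sun"], ["sunny", "day", "at", "the", "beach"]]
vectors = tfidf(docs)
print(vectors[0]["the"], vectors[0]["fun in"])  # "the" occurs in both tweets, so its idf is 1.0
```

The embedding options in the same list (GloVe, word2vec, StarSpace) replace these sparse counts with dense learned vectors instead.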
Feature Engineering
Text Features
- words
- unique words
- stopwords
- @user
- Words Title
- WORDS UPPER
- mean word length
- #
- a b c
- 1 2 3
- $
- %
- !!!
- ???
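A sketch of the count-based text features listed above. Mapping the slide's items onto code is an interpretation: "a b c" and "1 2 3" are read as letter and digit counts, "!!!" and "???" as runs of repeated punctuation, and the stop-word list is a toy assumption.

```python
import re

STOPWORDS = {"so", "this", "with", "my", "and", "the", "a", "i"}  # toy list (assumption)


def text_features(text):
    """Count-based surface features of a raw tweet."""
    words = text.split()
    return {
        "words": len(words),
        "unique_words": len({w.lower() for w in words}),
        "stopwords": sum(w.lower() in STOPWORDS for w in words),
        "user_mentions": text.count("@user"),
        "title_words": sum(w.istitle() for w in words),   # "Words Title"
        "upper_words": sum(w.isupper() for w in words),   # "WORDS UPPER"
        "mean_word_length": sum(map(len, words)) / len(words) if words else 0.0,
        "hashtags": text.count("#"),
        "letters": sum(ch.isalpha() for ch in text),      # "a b c" (interpretation)
        "digits": sum(ch.isdigit() for ch in text),       # "1 2 3" (interpretation)
        "dollars": text.count("$"),
        "percents": text.count("%"),
        "exclamation_runs": len(re.findall(r"!{2,}", text)),
        "question_runs": len(re.findall(r"\?{2,}", text)),
    }


print(text_features("WOW this is GREAT!!! Cost $5 #deal @user"))
```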
Emotions
- anger
- sadness
- disgust
- fear
- joy: fun, sun
- anticipation: fun, sun
- surprise: sun
- trust: sun
Colors
- black
- blue
- brown
- green
- grey
- orange
- pink
- purple
- red
- white
- yellow
- sun
- sun
- park
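The emotion and color groups above read as lexicon lookups: every word contributes to each category that a word-emotion or word-color lexicon assigns to it, giving one count per category. A minimal sketch in which both lexicons are tiny hypothetical stand-ins (the emotion entries mirror the fun/sun examples on the slide; the color entries are invented for illustration), not the resources the authors used.

```python
from collections import Counter

EMOTIONS = ["anger", "sadness", "disgust", "fear", "joy", "anticipation", "surprise", "trust"]
COLORS = ["black", "blue", "brown", "green", "grey", "orange", "pink", "purple",
          "red", "white", "yellow"]

# Toy word -> categories lexicons (hypothetical entries).
EMOTION_LEXICON = {"fun": {"anticipation", "joy"},
                   "sun": {"joy", "anticipation", "surprise", "trust"}}
COLOR_LEXICON = {"sun": {"yellow"}, "park": {"green"}}


def lexicon_features(tokens, lexicon, categories):
    """Count, for every category, how many tokens the lexicon associates with it."""
    counts = Counter()
    for token in tokens:
        counts.update(lexicon.get(token, ()))
    return [counts[c] for c in categories]


tokens = ["fun", "in", "the", "sun", "at", "the", "park"]
print(lexicon_features(tokens, EMOTION_LEXICON, EMOTIONS))  # [0, 0, 0, 0, 2, 2, 1, 1]
print(lexicon_features(tokens, COLOR_LEXICON, COLORS))      # [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1]
```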
Sentiment
Positive: "pos_0", "pos_.15", "pos_.20", "pos_.27", "pos_.4", "pos_above"
Negative: "neg_0", "neg_.15", "neg_.25", "neg_.35", "neg_.6", "neg_above"
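The bin names suggest that positive and negative sentiment scores are discretized into one-hot bins. A minimal sketch, assuming scores in [0, 1] (for example the `pos`/`neg` proportions a lexicon analyzer such as VADER returns; the slides do not name the source) and reading each label as the inclusive upper edge of its bin, with `*_above` catching everything larger. Both readings are assumptions.

```python
import bisect

# Upper edges inferred from the bin names; treating them as inclusive upper bounds is an assumption.
POS_EDGES = [0.0, 0.15, 0.20, 0.27, 0.4]
POS_LABELS = ["pos_0", "pos_.15", "pos_.20", "pos_.27", "pos_.4", "pos_above"]
NEG_EDGES = [0.0, 0.15, 0.25, 0.35, 0.6]
NEG_LABELS = ["neg_0", "neg_.15", "neg_.25", "neg_.35", "neg_.6", "neg_above"]


def sentiment_bin(score, edges, labels):
    """Return the label of the first bin whose upper edge is >= score."""
    return labels[bisect.bisect_left(edges, score)]


def one_hot(score, edges, labels):
    """One-hot encode the bin of a score as a {label: 0 or 1} dict."""
    hit = sentiment_bin(score, edges, labels)
    return {label: int(label == hit) for label in labels}


print(sentiment_bin(0.1, POS_EDGES, POS_LABELS))  # pos_.15
print(sentiment_bin(0.5, NEG_EDGES, NEG_LABELS))  # neg_.6
```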
Hierarchical Twitter Clusters
Cluster path | Words
010011000 | got qot gott g0t gotz qott gottt gawt ghot gotcho goht ggot
111010100010 | lmao lmfao lmaoo lmaooo lool rofl loool lmfaoo lmfaooo lmaoooo
111010100011 | haha hahaha hehe hahahaha hahah aha hehehe ahaha hah hahahah hahaa ahah
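Hierarchical clusters like these are commonly turned into features through prefixes of the bit-string path, so that words in neighboring clusters share coarse features. The sketch below assumes that approach; the prefix lengths are arbitrary choices, and only a few words from the rows above are loaded, whereas a real lookup would read the full cluster file.

```python
from collections import Counter

# A few entries from the clusters above (path -> words inverted to word -> path).
CLUSTERS = {
    "got": "010011000", "gawt": "010011000",
    "lmao": "111010100010", "rofl": "111010100010",
    "haha": "111010100011", "hehe": "111010100011",
}


def cluster_features(tokens, prefix_lengths=(4, 8, 12)):
    """Count cluster-path prefixes of each known token at several depths.

    "lmao" and "haha" differ only in the last bit of their paths, so they
    share every prefix up to length 11 and differ only at full depth.
    """
    features = Counter()
    for token in tokens:
        path = CLUSTERS.get(token)
        if path is None:
            continue  # unknown words contribute no cluster features
        for length in prefix_lengths:
            features[f"c{length}_{path[:length]}"] += 1
    return features


print(cluster_features(["lmao", "haha", "gawt", "xyz"]))
```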
Classification
Model | Precision | Recall | F1 Macro (%)
Naive Bayes | 1.00 | 0.21 | 1.763
SVM (non-linear) | 1.00 | 0.21 | 1.763
Random Forest | 0.57 | 0.27 | 14.979
MLP | 0.41 | 0.26 | 17.173
StarSpace + NN | | |
(10k train, 1k test)
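The F1 Macro column averages per-class scores with equal weight, so rare emojis count as much as frequent ones. A minimal dependency-free sketch of macro-averaged precision, recall and F1 (the official task scorer may differ in details):

```python
from collections import Counter


def macro_scores(gold, pred):
    """Macro-averaged precision, recall and F1: per-class scores averaged with equal weight."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[g] += 1  # true class g was missed
    labels = sorted(set(gold) | set(pred))
    precisions, recalls, f1s = [], [], []
    for c in labels:
        p = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        r = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(2 * p * r / (p + r) if p + r else 0.0)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


print(macro_scores([0, 0, 1, 1], [0, 1, 1, 1]))  # about (0.833, 0.75, 0.733)
```

Because F1 is averaged per class, macro F1 is generally not the harmonic mean of the macro precision and recall values.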
Ensemble Learning
- Random Forest
- Extra Trees
- AdaBoost
- Stacking
Deep Learning
LSTMs
Hierarchical Attention Networks
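In a Hierarchical Attention Network, the LSTM outputs at the word level (and again at the sentence level) are pooled with learned attention weights rather than by keeping only the last state. A dependency-free sketch of that pooling step for a single sequence; here `W`, `b` and the context vector are passed in by hand, whereas in a real model they are trained jointly with the LSTM.

```python
import math


def attention_pool(hidden, W, b, context):
    """Pool a sequence of hidden vectors h_i into one vector.

    u_i    = tanh(W h_i + b)
    a_i    = softmax_i(u_i . context)
    output = sum_i a_i h_i
    Returns (output, attention_weights).
    """
    scores = []
    for h in hidden:
        u = [math.tanh(sum(w * x for w, x in zip(row, h)) + bias)
             for row, bias in zip(W, b)]
        scores.append(sum(u_j * c_j for u_j, c_j in zip(u, context)))
    top = max(scores)  # subtract the max score for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    output = [sum(a * h[k] for a, h in zip(weights, hidden))
              for k in range(len(hidden[0]))]
    return output, weights


# Two 2-d "LSTM outputs"; the context vector favors the first dimension.
identity = [[1.0, 0.0], [0.0, 1.0]]
output, weights = attention_pool([[2.0, 0.0], [0.0, 2.0]], identity, [0.0, 0.0], [1.0, 0.0])
print(weights)  # the first state receives the larger weight
```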
Embeddings
StarSpace
GloVe
Word2Vec
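One common way to feed pretrained StarSpace, GloVe or word2vec embeddings to a non-sequential classifier is to average a tweet's word vectors into a single fixed-length vector; the slides do not say whether this was done, so treat it as an assumed baseline. The 3-dimensional vectors below are hypothetical; real ones would be loaded from a trained model file.

```python
# Toy 3-dimensional embeddings (hypothetical values).
EMBEDDINGS = {
    "sun": [0.9, 0.1, 0.0],
    "beach": [0.7, 0.3, 0.0],
    "christmas": [0.0, 0.2, 0.9],
}


def tweet_vector(tokens, embeddings=EMBEDDINGS, dim=3):
    """Average the vectors of in-vocabulary tokens; return zeros if none are known."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


print(tweet_vector(["sun", "beach", "unknownword"]))  # roughly [0.8, 0.2, 0.0]
```

The LSTM-based models instead consume the per-word vectors in order, which keeps the word-order information that averaging throws away.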
Results
Model | Precision | Recall | F1 Macro (%)
SVM (linear kernel, SGD) | 0.65 | 0.61 | 59.171
Standard BiLSTM (10 epochs) | 0.60 | 0.35 | 39.28
Convolutional LSTM | 0.60 | 0.30 | 40
Hierarchical Attention | 0.76 | 0.48 | 49
(488k train, 50k test)
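The best row, a linear SVM trained with stochastic gradient descent, can be sketched without libraries: SGD on the L2-regularized hinge loss over sparse features, with one binary classifier per emoji (one-vs-rest). The learning rate, regularization strength and epoch count are illustrative defaults, not the settings behind the table; in practice a library implementation such as scikit-learn's `SGDClassifier(loss="hinge")` would be used.

```python
import random


def train_binary_svm(X, y, epochs=20, lr=0.1, reg=1e-4, seed=0):
    """Linear SVM trained by SGD on the L2-regularized hinge loss.

    X: list of sparse {feature: value} dicts; y: labels in {-1, +1}.
    """
    w, b = {}, 0.0
    order = list(range(len(X)))
    rng = random.Random(seed)
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            score = sum(w.get(f, 0.0) * v for f, v in X[i].items()) + b
            for f in w:  # gradient step on the L2 penalty shrinks every weight
                w[f] *= 1 - lr * reg
            if y[i] * score < 1:  # hinge loss active: move toward the correct side
                for f, v in X[i].items():
                    w[f] = w.get(f, 0.0) + lr * y[i] * v
                b += lr * y[i]
    return w, b


def train_one_vs_rest(X, labels, **kwargs):
    """Train one binary SVM per class (e.g. one per emoji)."""
    return {c: train_binary_svm(X, [1 if lab == c else -1 for lab in labels], **kwargs)
            for c in sorted(set(labels))}


def predict(models, x):
    """Return the class whose binary SVM gives the highest score for x."""
    def score(model):
        w, b = model
        return sum(w.get(f, 0.0) * v for f, v in x.items()) + b
    return max(models, key=lambda c: score(models[c]))


X = [{"sun": 1.0}, {"beach": 1.0}, {"christmas": 1.0}, {"tree": 1.0}]
labels = ["sun", "sun", "xmas", "xmas"]
models = train_one_vs_rest(X, labels)
print(predict(models, {"tree": 1.0}))  # xmas
```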
valentine, loveofmylife, heart full, heart
cool kid, sunglasses, coolin, shade, cool, sunglass
ti season, christmastree, tree, christmas tree, merry christmas, merry, christmas
pretty pink, breast, pink, breast cancer
daze, beachin, sunshine state, fun sun, sunny day, sun, sunny, sunshine
veteran day, murica, veteran, america, ivoted, election, merica, vote, usa
Tools
Emoji Prediction
By Dimitrina Zlatkova
- 745