Димитрина Златкова (ИИ)

Даниел Копев (ИИ)

Атанас Атанасов (ИИ)

SemEval-2018

Emoji Prediction

Action Plan

  1. Get Data
  2. Process Text
  3. Extract Features
  4. Combine Features
  5. Classification
  6. Evaluation

Text Processing

Original:

So this happened :) with my girls @user and erinblonshine. #29 #sacredhearttattoo @ Sacred…

Text Processing (2)

Pattern replace:

So this happened smile with my girls @user and erinblonshine. #29 #sacredhearttattoo @ Sacred…

Char filter:

So this happened smile with my girls  and erinblonshine#sacredhearttattoo   Sacred

Tokenize:

['so', 'this', 'happened', 'smile', 'with', 'my', 'girls', 'and', 'erinblonshine', 'sacred', 'heart', 'tattoo', 'sacred']
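The three steps on this slide can be sketched with the standard library alone. This is a minimal illustration, not the pipeline used for the results: the emoticon map and the regular expressions are assumptions, and hashtag segmentation ("sacredhearttattoo" into "sacred", "heart", "tattoo") is not reproduced here.

```python
import re

# Hypothetical emoticon map; the slides only show ":)" becoming "smile".
EMOTICONS = {":)": "smile", ":(": "sad"}


def pattern_replace(text):
    """Swap known emoticons for a word so later filters do not drop them."""
    for emoticon, word in EMOTICONS.items():
        text = text.replace(emoticon, word)
    return text


def char_filter(text):
    """Drop @mentions (including a bare '@') and numeric hashtags,
    then every character that is not a letter, '#', or a space."""
    text = re.sub(r"@\w*", " ", text)    # "@user" and the "@" location marker
    text = re.sub(r"#\d+\b", " ", text)  # numeric hashtags such as "#29"
    return re.sub(r"[^A-Za-z# ]", "", text)


def tokenize(text):
    """Lowercase the text and keep runs of letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


tweet = ("So this happened :) with my girls @user and erinblonshine. "
         "#29 #sacredhearttattoo @ Sacred")
tokens = tokenize(char_filter(pattern_replace(tweet)))
```

Replacing emoticons before the character filter matters: otherwise ":)" would be stripped as punctuation and the signal lost.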

Text Processing (3)

Stop words:

['happened', 'smile', 'girls', 'erinblonshine', 'sacred', 'heart', 'tattoo', 'sacred']

Lemmatize:

['happen', 'smile', 'girl', 'erinblonshine', 'sacred', 'heart', 'tattoo', 'sacred']

POS Tagger:

[('so', 'RB'), ('this', 'DT'), ('happened', 'VBD'), ('smile', 'NN'), ('with', 'IN'), ('my', 'PRP$'), ('girls', 'NNS'), ('and', 'CC'), ('erinblonshine', 'NN'), ('sacred', 'JJ'), ('heart', 'NN'), ('tattoo', 'NN'), ('sacred', 'VBD')]
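Stop-word removal and lemmatization can be illustrated on the example tokens as follows. Both tables are hypothetical and hold only the entries needed for this tweet; the lookup dictionary stands in for a real lemmatizer, and POS tagging is not reproduced.

```python
# Hypothetical stop list: only the words needed for the slide's example.
STOP_WORDS = {"so", "this", "with", "my", "and"}

# Hypothetical lemma table standing in for a real lemmatizer.
LEMMAS = {"happened": "happen", "girls": "girl"}


def remove_stop_words(tokens):
    """Keep only tokens that are not in the stop list."""
    return [t for t in tokens if t not in STOP_WORDS]


def lemmatize(tokens):
    """Map each token to its lemma, leaving unknown tokens unchanged."""
    return [LEMMAS.get(t, t) for t in tokens]


tokens = ["so", "this", "happened", "smile", "with", "my", "girls", "and",
          "erinblonshine", "sacred", "heart", "tattoo", "sacred"]
content_words = remove_stop_words(tokens)
lemmas = lemmatize(content_words)
```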

Vectorization

  • n-grams
  • tf-idf
  • GloVe
  • word2vec
  • FB research (StarSpace)
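As a rough illustration of the first two items, word n-grams and tf-idf weights can be computed in a few lines. The weighting below (relative term frequency times the log of inverse document frequency, no smoothing) is one common variant and an assumption here; the slides do not say which variant was used.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous word n-grams, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


bigrams = ngrams(["merry", "christmas", "tree"], 2)
vectors = tfidf([["fun", "sun"], ["sun", "park"]])
```

A term that occurs in every document, such as "sun" here, gets weight zero, which is why tf-idf favours words that separate one class of tweets from another.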

Feature Engineering

Text Features

  • words
  • unique words
  • stopwords
  • @user
  • Words Title
  • WORDS UPPER
  • mean word length
  • #
  • a b c
  • 1 2 3
  • $
  • %
  • !!!
  • ???
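The surface features listed above are simple counts over the raw tweet. A minimal sketch follows, assuming whitespace tokenization and a small hypothetical stop list; the feature names are invented for illustration.

```python
import re

# Hypothetical stop list; a real system would use a full one.
STOP_WORDS = {"so", "this", "with", "my", "and", "the", "a"}


def text_features(text):
    """Count surface features of a raw (unprocessed) tweet."""
    words = text.split()
    return {
        "n_words": len(words),
        "n_unique_words": len({w.lower() for w in words}),
        "n_stopwords": sum(w.lower() in STOP_WORDS for w in words),
        "n_mentions": text.count("@user"),
        "n_title": sum(w.istitle() for w in words),
        "n_upper": sum(w.isupper() for w in words),
        "mean_word_len": sum(map(len, words)) / len(words) if words else 0.0,
        "n_hashtags": text.count("#"),
        "n_letters": sum(c.isalpha() for c in text),
        "n_digits": sum(c.isdigit() for c in text),
        "n_dollar": text.count("$"),
        "n_percent": text.count("%"),
        "n_exclaim_runs": len(re.findall(r"!{2,}", text)),    # "!!", "!!!", ...
        "n_question_runs": len(re.findall(r"\?{2,}", text)),  # "??", "???", ...
    }


features = text_features("So HAPPY today!!! Really??? @user #fun 100% $5")
```

These counts have to be taken before the character filter runs, since that step removes the punctuation, digits, and casing they depend on.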

Emotions

  • anger
  • sadness
  • disgust
  • fear
  • joy
  • anticipation
  • surprise
  • trust

- fun, sun

- fun, sun

- sun

- sun
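Emotion features of this kind are typically per-emotion counts of words found in an emotion lexicon. The sketch below assumes that setup; the two lexicon entries are toy values chosen for illustration, not data taken from the slides or from any specific lexicon.

```python
EMOTIONS = ["anger", "sadness", "disgust", "fear",
            "joy", "anticipation", "surprise", "trust"]

# Toy lexicon (hypothetical entries); a real system would load a full one.
TOY_LEXICON = {
    "fun": {"joy", "anticipation"},
    "sun": {"joy", "anticipation", "surprise", "trust"},
}


def emotion_counts(tokens):
    """Count, per emotion, how many tokens the lexicon links to it."""
    counts = dict.fromkeys(EMOTIONS, 0)
    for token in tokens:
        for emotion in TOY_LEXICON.get(token, ()):
            counts[emotion] += 1
    return counts


counts = emotion_counts(["fun", "sun", "park"])
```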

Colors

  • black
  • blue
  • brown
  • green
  • grey
  • orange
  • pink
  • purple
  • red
  • white
  • yellow

- sun

- sun

- park

Sentiment

Positive: "pos_0", "pos_.15", "pos_.20", "pos_.27", "pos_.4", "pos_above"

Negative: "neg_0", "neg_.15", "neg_.25", "neg_.35", "neg_.6", "neg_above"

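The feature names suggest that continuous sentiment scores are bucketed into bins. A minimal sketch, assuming each name denotes the inclusive upper edge of its bin and that scores are non-negative; how the scores themselves are computed is not shown on the slide.

```python
from bisect import bisect_left

# Bin edges read off the feature names; treating them as inclusive
# upper bounds is an assumption.
POS_EDGES = [0.0, 0.15, 0.20, 0.27, 0.4]
POS_NAMES = ["pos_0", "pos_.15", "pos_.20", "pos_.27", "pos_.4", "pos_above"]
NEG_EDGES = [0.0, 0.15, 0.25, 0.35, 0.6]
NEG_NAMES = ["neg_0", "neg_.15", "neg_.25", "neg_.35", "neg_.6", "neg_above"]


def bin_name(score, edges, names):
    """Return the first bin whose upper edge is >= score,
    or the last ('above') bin if the score exceeds every edge."""
    return names[bisect_left(edges, score)]


positive_bin = bin_name(0.18, POS_EDGES, POS_NAMES)
negative_bin = bin_name(0.7, NEG_EDGES, NEG_NAMES)
```

Each tweet then gets one active positive bin and one active negative bin, which can be one-hot encoded alongside the other features.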
Hierarchical Twitter Clusters

^010011000 got qot gott g0t gotz qott gottt gawt ghot gotcho goht ggot
^111010100010 lmao lmfao lmaoo lmaooo lool rofl loool lmfaoo lmfaooo lmaoooo
^111010100011 haha hahaha hehe hahahaha hahah aha hehehe ahaha hah hahahah hahaa ahah
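Because the clusters are hierarchical, prefixes of a word's bit string identify coarser clusters. One way to use them as features is sketched below; the prefix lengths and feature names are assumptions, and only one word per cluster from the slide is included.

```python
# Word -> cluster bit string, taken from the three clusters shown above.
CLUSTERS = {
    "got": "010011000",
    "lmao": "111010100010",
    "haha": "111010100011",
}


def cluster_features(tokens, prefix_lengths=(4, 8, 12)):
    """Emit one feature per known token and prefix length.
    Paths shorter than a prefix length are used whole."""
    features = []
    for token in tokens:
        path = CLUSTERS.get(token)
        if path is None:
            continue  # token not covered by the cluster table
        for n in prefix_lengths:
            features.append(f"cluster_{n}_{path[:n]}")
    return features


lmao_features = cluster_features(["lmao"])
haha_features = cluster_features(["haha"])
```

Here "lmao" and "haha" share their 4-bit and 8-bit prefix features and differ only at the full 12-bit path, so the classifier can treat them as related without merging them completely.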

Classification

Model               Precision   Recall   F1 Macro (%)
Naive Bayes         1.00        0.21      1.763
SVM (non-linear)    1.00        0.21      1.763
Random Forest       0.57        0.27     14.979
MLP                 0.41        0.26     17.173
StarSpace + NN

(10k train, 1k test)
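Macro F1 averages the per-class F1 scores, so every emoji counts equally regardless of how often it occurs. A minimal sketch of the metric follows; it is a generic implementation, not the official scoring script.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0
    pairs = list(zip(y_true, y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1_scores.append(2 * precision * recall / (precision + recall))
        else:
            f1_scores.append(0.0)
    return sum(f1_scores) / len(f1_scores)


score = macro_f1([0, 0, 1, 1], [0, 0, 0, 1])
majority_only = macro_f1([0, 0, 0, 1], [0, 0, 0, 0])
```

In the second call the classifier predicts only the majority class: accuracy is 0.75, but macro F1 drops to about 0.43 because the missed class contributes an F1 of zero.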

Ensemble Learning

  • Random Forest
  • Extra Trees
  • AdaBoost
  • Stacking

Deep Learning

LSTMs

Hierarchical Attention Networks

Embeddings

  • StarSpace
  • GloVe
  • word2vec

Results

Model                         Precision   Recall   F1 Macro (%)
SVM (linear kernel, SGD)      0.65        0.61     59.171
Standard BiLSTM (10 epochs)   0.60        0.35     39.28
Convolutional LSTM            0.60        0.30     40
Hierarchical Attention        0.76        0.48     49

(488k train, 50k test)

 valentine, loveofmylife, heart full, heart

cool kid, sunglasses, coolin, shade, cool, sunglass

ti season, christmastree, tree, christmas tree, merry christmas, merry, christmas

pretty pink, breast, pink, breast cancer

daze, beachin, sunshine state, fun sun, sunny day, sun, sunny, sunshine

veteran day, murica, veteran, america, ivoted, election, merica, vote, usa

Tools
