Attitude analysis and corpus analytics of 1M tweets about feminism

Zafarali Ahmed
Jerome Boisvert-Chouinard
Dave Gurnsey
Nancy Lin
Reda Lotfi
David Taylor

Big Data Week Montreal Hackathon

April 26, 2015

Attitude Analysis vs Sentiment Analysis

 

People use negative-sentiment words to support or oppose

 

 

"I hate feminism!"

vs.

"I hate it when people mock feminism!"

 

Ongoing problem: how do you read intentions?

Methodology summary

- retrieved last 100 tweets containing "feminism", "feminist" or "feminists" every 15 minutes for 5 months

- manually classified 500 tweets as pro-, anti-feminist or neither

- parsed without stopwords, with punctuation

- trained Naive Bayes classifier with 50-55% test set accuracy, predicted attitude of 391,000 original tweets

- calculated log-likelihood of each token (word, symbol, punctuation) appearing in the pro-feminist corpus or the anti-feminist corpus

 

Principal limitation:

small training set

 

Mechanical Turk

Active learning

 

 

Make method more robust, can turn it into a tool to predict attitudes, not just sentiments

Results

 

Did we succeed in differentiating the language used by pro- and anti-feminists?

SEARCH TERM KEYNESS:

 

"feminism" more likely to return pro-feminist tweets

"feminists" more likely to return anti-feminist tweets

Plot.ly

Let's explore the keyness differences between the two corpora!

 

                          Anti                                             Pro

Some initial patterns

Note: [joy] = emoji 'face with tears of joy' 

Ascribing

&

Predictive

Describing

&

Defining

 

 

vs

 

Running experiments

HPC

rocks!

Made with Slides.com