Predicting the Future With
Social Media
by
Sitaram Asur, Bernardo A. Huberman
2010 IEEE/WIC/ACM International Conference on
Web Intelligence and Intelligent Agent Technology (WI-IAT)
Group 5
Why Social Media?
Ubiquitous
Why Social Media?
- Social Networking has attracted everyone
- Large Content Sharing Medium
- This data source is largely untapped
Social Media Examples
Social Media has EVERYTHING
- Environment
- Politics
- Technology
- Sports
- Entertainment Industry (Movies)
Source and Spread of Content
- People create their own content
- People share others' content
- Either way, the content spreads among their friends
- This widely shared content (opinions/views) can be an important source of information for predictions.
How effective is social media for predicting the future?
- Can we correlate the tweet rate with a movie's success at the box office?
- Based on previous movies and the Twitter chatter about them, can we build a reliable classifier?
- Can we make better predictions by coupling the tweet rate with sentiment analysis?
Related Work
- Studied sales spike based on online chatter
- Found outcome of carefully constructed queries can predict market trends
- Other works on movie success prediction require metadata such as the movie's genre, MPAA rating, running time and release date
- Some works tried to correlate sentiments with box-office scores
Twitter
- Extremely Popular
- Microblogging Service
- Content is in the form of Tweets
- Tweet is a short message (140 characters)
- Tweet can consist of
- Text
- Links to Images, Video and Articles
- Retweet is a post originally made by one user that is forwarded by another user.
- Twitter is a potential market for viral marketing.
- Due to its huge reach, a number of businesses use Twitter to advertise products and disseminate information to stakeholders
Dataset
- 2.89 Million Tweets
- 1.2 Million Users
- 24 Different Movies
- Collected using Twitter Search API
- Contains Text, Timestamp and User Info
Dataset
- Movies are released mostly on Fridays and rarely on Wednesdays
- Average of 2 movie releases per week
- Data collected over 3 months (24 movies)
Data Consistency
- Only movies released on Fridays are considered
- Only movies released in a large number of theatres are considered
Movie List
- Sherlock Holmes
- Avatar
- Daybreakers
- Legion
- Leap Year
- Twilight: New Moon
- Spy Next Door
- When In Rome
Movie Resolution
- Movie titled "2012"
- It is hard to tell whether "2012" in a tweet refers to the movie title or to the year
- So, sanity checks have been performed to remove such conflicting movies
Critical Period
- The time from the week before a movie is released
- when the promotional campaigns are in full swing
- to two weeks after release
- when its initial popularity fades and opinions from people have been disseminated
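A minimal sketch of how the critical period could be computed for one movie. It assumes "the week before" means 7 days before release and "two weeks after" means 14 days after release; the function names and the release date are hypothetical.

```python
from datetime import date, timedelta


def critical_period(release_date):
    """Return (start, end) of the critical period for a release date.

    Assumption: the window runs from 7 days before release
    to 14 days after release, both ends inclusive.
    """
    start = release_date - timedelta(days=7)
    end = release_date + timedelta(days=14)
    return start, end


def in_critical_period(tweet_day, release_date):
    """True if a tweet posted on tweet_day falls inside the window."""
    start, end = critical_period(release_date)
    return start <= tweet_day <= end


# Hypothetical Friday release date, used only for illustration.
release = date(2010, 1, 8)
start, end = critical_period(release)
print(start, end)
```

Only tweets that pass in_critical_period would be kept for that movie.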
Time-series of tweets over the critical period for different movies
Number of tweets per unique authors for different movies
Log distribution of authors and tweets.
Linear Regression
- Linear Fitting
- Relating one known variable to an unknown variable
- R-square (coefficient of determination)
- p-value
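A minimal sketch of a one-variable least-squares fit and its R-square, written in plain Python. The data points are made up for illustration, and the p-value is left out because it needs a t-distribution that the standard library does not provide.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-square = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared


# Made-up data points, not taken from the paper.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept, r_squared = fit_line(xs, ys)
print(slope, intercept, r_squared)
```

An R-square close to 1 means the line explains most of the variation in y. In practice a statistics package such as scipy.stats.linregress would also report the p-value.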
Attention and Popularity
- Pre-release attention
- Includes promos, trailers, pictures
- Most tweets should be URL based
- Retweets should follow the same pattern
- Post release chatter
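A minimal sketch of how the share of URL-based tweets and retweets could be measured. It assumes URLs start with http:// or https:// and that retweets use the "RT @user" prefix; both patterns and the sample tweets are illustrative assumptions, not the paper's exact rules.

```python
import re

# Assumption: a tweet "has a URL" if it contains an http(s) link.
URL_PATTERN = re.compile(r"https?://\S+")
# Assumption: retweets follow the "RT @user" prefix convention.
RETWEET_PATTERN = re.compile(r"^RT\s+@\w+", re.IGNORECASE)


def has_url(text):
    return URL_PATTERN.search(text) is not None


def is_retweet(text):
    return RETWEET_PATTERN.match(text.strip()) is not None


def fraction(tweets, predicate):
    """Fraction of tweets for which predicate(tweet) is true."""
    if not tweets:
        return 0.0
    return sum(1 for t in tweets if predicate(t)) / len(tweets)


# Made-up tweets for illustration.
sample = [
    "New trailer is out http://example.com/trailer",
    "RT @fan: cannot wait for opening night",
    "Poster looks great https://example.com/poster",
    "Going to watch it on Friday",
]
url_share = fraction(sample, has_url)
retweet_share = fraction(sample, is_retweet)
print(url_share, retweet_share)
```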
Attention and Popularity
- Pre-release attention
- But is there a correlation between number of tweets with URLs and movie success?
Attention and Popularity
Some positive correlation, but the coefficient of determination is low
Prediction of Box-office revenues
- Using the tweets referring to movies prior to their release, can we accurately predict the box-office revenue generated by the movie in its opening weekend?
Tweet rate and box office gross
- The average tweet rate showed a strong positive correlation with the box-office gross for the 24 movies considered, with a correlation coefficient of 0.90
- Transylmania, with 2.75 tweets per hour, grossed only $263K
- Twilight and Avatar, each with more than 1,000 tweets per hour, grossed $142M and $72M respectively
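A minimal sketch of the two quantities on this slide: the tweet rate (tweets per hour) and the Pearson correlation coefficient between tweet rate and gross. The rates and grosses in the example are made up; only the formulas are standard.

```python
import math


def tweet_rate(num_tweets, hours):
    """Average number of tweets per hour over a time window."""
    return num_tweets / hours


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


# Made-up values: tweets per hour and opening gross (in $M) for four films.
rates = [3.0, 50.0, 400.0, 1200.0]
gross = [0.5, 9.0, 30.0, 110.0]
print(pearson(rates, gross))

# One week is 168 hours, so 462 tweets in a week is 2.75 tweets per hour.
print(tweet_rate(462, 168))
```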
Comparison with HSX
- Hollywood Stock Exchange is a virtual stock market for movies
- Players can buy "shares" in movies, actors, directors etc.
- Stock prices are adjusted based on the gross income
- Earlier studies have shown a correlation between the HSX index and movie success at the box office
Comparison with HSX
Predicting HSX index
Predicting for any week
Sentiment Analysis
- Sentiments to forecast the box-office values
- Positive
- Negative
- Neutral
- LingPipe Sentiment Analysis Classifier
Supervised Learning
- How do we obtain the class labels for training?
- Amazon Mechanical Turk (Manual Classification)
- Thousands of workers were employed to manually classify all the tweets
- Tweets with unanimous classification were taken to train the model
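A minimal sketch of the unanimous-label filter described above. The data layout (a dictionary from tweet text to the list of labels given by workers) and the function name are assumptions made for illustration.

```python
def unanimous_examples(annotations):
    """Keep only tweets on which every worker agreed.

    annotations: dict mapping tweet text to a list of worker labels.
    Returns a list of (tweet_text, label) training pairs.
    """
    training = []
    for text, labels in annotations.items():
        # A single distinct label means all workers agreed.
        if labels and len(set(labels)) == 1:
            training.append((text, labels[0]))
    return training


# Made-up annotations for illustration.
annotations = {
    "loved every minute": ["positive", "positive", "positive"],
    "it was fine I guess": ["neutral", "positive", "neutral"],
    "total waste of time": ["negative", "negative", "negative"],
}
training_set = unanimous_examples(annotations)
print(training_set)
```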
Preprocessing
- Elimination of stop-words
- Elimination of all special characters, except that exclamation marks are replaced by < EX > and question marks by < QM >
- Removal of urls and user-ids
- Replacing the movie title with < MOV >
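A minimal sketch of the preprocessing steps listed above. The stop-word list is a tiny illustrative set, the regular expressions are assumptions about what counts as a URL or user-id, and the placeholders are written without inner spaces.

```python
import re

# Tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "the", "is", "it", "to", "of", "and", "i", "was"}

# Temporary alphanumeric markers survive the special-character cleanup
# and are mapped to the final placeholders at the end.
PLACEHOLDERS = {"MOVTOKEN": "<MOV>", "EXTOKEN": "<EX>", "QMTOKEN": "<QM>"}


def preprocess(tweet, movie_title):
    text = re.sub(r"https?://\S+", " ", tweet)   # remove URLs
    text = re.sub(r"@\w+", " ", text)            # remove user-ids
    text = re.sub(re.escape(movie_title), " MOVTOKEN ", text,
                  flags=re.IGNORECASE)           # mark the movie title
    text = text.replace("!", " EXTOKEN ").replace("?", " QMTOKEN ")
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)  # drop other special chars
    tokens = []
    for word in text.split():
        if word in PLACEHOLDERS:
            tokens.append(PLACEHOLDERS[word])
        elif word.lower() not in STOP_WORDS:     # drop stop-words
            tokens.append(word)
    return " ".join(tokens)


cleaned = preprocess(
    "@bob Avatar was AMAZING!!! Is it the best? http://t.co/xyz #mustsee",
    "Avatar",
)
print(cleaned)
```

URLs and user-ids are removed first because they contain special characters that the later cleanup step would otherwise break into stray tokens.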
Movie Subjectivity
- More value for sentiments after the movie release.
- Positive sentiments - recommendations by people
- To capture this subjectivity we define Subjectivity = |Positive and Negative Tweets| / |Neutral Tweets|
Movie Subjectivity
Movie Polarity
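- A movie with more positive tweets than negative ones is likely to be successful. So we define PNratio = |Tweets with Positive Sentiment| / |Tweets with Negative Sentiment|
- The Blind Side: PNratio rose from 5.02 to 9.65, gross rose from $34M to $40.1M
- New Moon: PNratio fell from 6.29 to 5, gross fell from $142M to $42M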
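A minimal sketch of the two sentiment measures, assuming the paper's definitions: subjectivity is the number of positive and negative tweets divided by the number of neutral tweets, and the PN ratio is the number of positive tweets divided by the number of negative tweets. The label strings and the sample counts are made up for illustration.

```python
from collections import Counter


def subjectivity(labels):
    """(positive + negative) / neutral, over a list of sentiment labels."""
    counts = Counter(labels)
    if counts["neutral"] == 0:
        raise ValueError("subjectivity is undefined without neutral tweets")
    return (counts["positive"] + counts["negative"]) / counts["neutral"]


def pn_ratio(labels):
    """positive / negative, over a list of sentiment labels."""
    counts = Counter(labels)
    if counts["negative"] == 0:
        raise ValueError("PN ratio is undefined without negative tweets")
    return counts["positive"] / counts["negative"]


# Made-up labels: 6 positive, 2 negative, 4 neutral tweets.
labels = ["positive"] * 6 + ["negative"] * 2 + ["neutral"] * 4
print(subjectivity(labels), pn_ratio(labels))
```

A PN ratio above 1 means positive tweets outnumber negative ones.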
Movie Polarity
Results of Regression Experiments
Conclusion
- Social media can be utilized to forecast future outcomes.
- Constructed a linear regression model for predicting box-office revenues of movies in advance of their release.
- Analyzed the sentiments present in tweets and demonstrated their efficacy.
- This method can be extended to a large panoply of topics, e.g. future product ratings and election outcomes.
THANK YOU
Questions?
Predicting the Future With Social Media
By arvind ram