Title Text
Kishor Mohite
Rajat Jain
Sampath Kolachana
Outline
- Motivation
- Problem Statement
- System Architecture
- Research Demo
- Algorithms
- Novelty & Conclusion
- Future Work
Motivation
A Random Internet Article
Movie review blog
A financial portal
A news article
People with interests
Want to know
- What news is currently trending regarding the event discussed in the article or the event of interest?
- What are the views of readers regarding that news?
- What are the views of people on social networking sites regarding the same?
Problem Statement
Text
System Architecture
Text
Demo
Algorithms
- Clustering News Articles
- Ranking Algorithms
- Summarisation
- Tag Generation
- Trending Tags
- Categorisation
Clustering Articles
Text
Ranking Algorithms
- Comment Ranking
- Cluster Ranking
- Article Ranking
Comment Ranking
Flawed Ranking Systems
Current Approach - Wilson Score
The Wilson score, or more precisely the lower bound of the Wilson score confidence interval for a Bernoulli parameter, is used.
With 85% confidence, the real fraction of positive ratings will be at least this value.
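The lower bound described above can be sketched in a few lines of Python. This is a minimal illustration, not the deck's actual implementation: the function name, the `upvotes`/`downvotes` parameters, and the choice to read "85%" as a two-sided confidence level are all assumptions.

```python
from math import sqrt
from statistics import NormalDist


def wilson_lower_bound(upvotes, downvotes, confidence=0.85):
    """Lower bound of the Wilson score interval for the fraction of positive ratings.

    Assumption: `confidence` is a two-sided level, so z is the
    (1 - (1 - confidence) / 2) quantile of the standard normal.
    """
    n = upvotes + downvotes
    if n == 0:
        # No ratings yet: nothing can be said, so rank at the bottom.
        return 0.0
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_hat = upvotes / n  # observed fraction of positive ratings
    centre = p_hat + z * z / (2 * n)
    margin = z * sqrt((p_hat * (1 - p_hat) + z * z / (4 * n)) / n)
    return (centre - margin) / (1 + z * z / n)
```

With the same observed fraction, a comment with more ratings gets a higher bound: `wilson_lower_bound(90, 10)` is larger than `wilson_lower_bound(9, 1)`, which is why a single early upvote does not push a comment to the top.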
Wilson Score - Advantages
- With 85% confidence, the real fraction of positive ratings will be at least this value.
- Quality comments make it to the top regardless of when they were posted.
- Auto feedback system
- Applicable to comments from different platforms
Results
Cluster Ranking
Figure: Cluster 1, Cluster 2, Cluster 3; attributes A1, A2, A3, A4
Linear Regression
- Attribute Ai is normalized to value ai, and wi is the associated weight.
- Training data set -
1. Clusters older than 10 days are assumed to have no further comment activity.
2. The final comment counts on these clusters are used as the popularity measure, and thus as target values for the scores.
3. The comments made up to the time of cluster generation form A4.
- Typical values of weights- [w1,w2,w3,w4]
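A rough sketch of how such weights could be fitted, assuming the cluster score is the weighted sum w1*a1 + w2*a2 + w3*a3 + w4*a4 of the normalized attributes. The deck does not say how the regression was solved; plain gradient descent on squared error is used here only to keep the example dependency-free, and the function names are hypothetical.

```python
def cluster_score(attrs, weights):
    """Assumed scoring rule: weighted sum of normalized attributes a_i."""
    return sum(w * a for w, a in zip(weights, attrs))


def fit_weights(rows, targets, lr=0.5, epochs=500):
    """Fit weights by gradient descent on mean squared error.

    rows    -- one list of normalized attributes [a1, a2, a3, a4] per old cluster
    targets -- popularity target per cluster (e.g. normalized final comment count)
    """
    n_features = len(rows[0])
    m = len(rows)
    weights = [0.0] * n_features
    for _ in range(epochs):
        grad = [0.0] * n_features
        for attrs, target in zip(rows, targets):
            err = cluster_score(attrs, weights) - target
            for j, a in enumerate(attrs):
                grad[j] += 2 * err * a / m
        # Step against the gradient of the mean squared error.
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights
```

Once fitted on clusters older than 10 days, the same `cluster_score` can be applied to fresh clusters, whose final comment counts are not yet known.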
Sectioning
Sections based on-
1. Latest time
2. Number of news sources
Sections sorted based on number of headlines and number of comments.
Section thresholds chosen by trial and error.
Example -
1. Updated in last 3 hours, news sources reporting >= 5
2. Updated in last 7 hours, news sources reporting >= 5
3. Updated in last 3 hours, news sources reporting >= 3
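The sectioning idea can be sketched as follows, using the three example thresholds above. Several details are assumptions made for illustration: a cluster goes into the first section whose thresholds it meets, clusters matching no section fall into a trailing bucket, and within a section clusters are ordered by headline count and then comment count. The `Cluster` fields are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    name: str
    hours_since_update: float
    num_sources: int
    num_headlines: int
    num_comments: int


# (max hours since last update, min number of news sources), in priority order.
SECTIONS = [(3, 5), (7, 5), (3, 3)]


def section_index(cluster, sections=SECTIONS):
    """Index of the first section whose thresholds the cluster meets."""
    for i, (max_hours, min_sources) in enumerate(sections):
        if cluster.hours_since_update <= max_hours and cluster.num_sources >= min_sources:
            return i
    return len(sections)  # trailing bucket for clusters matching no section


def order_clusters(clusters, sections=SECTIONS):
    """Order by section first, then by headlines and comments (descending)."""
    ranked = sorted(
        clusters,
        key=lambda c: (section_index(c, sections), -c.num_headlines, -c.num_comments),
    )
    return [c.name for c in ranked]
```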
Novelty
Cluster ranking has not previously been done by making use of comments.
More suitable for cluster ranking than comment ranking.
Both the number of news sources and the number of news headlines covering a cluster are used as factors.
Article Ranking
Articles in a cluster
Factors used
1. Time at which article was written or last updated.
2. Normalized number of comments under article.
3. Time at which last comment was made.
Normalization done on -
News source + Category
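A minimal sketch of normalizing comment counts per news source and category. The deck does not state the normalization formula; dividing by the largest comment count seen in the same (source, category) group is an assumption, as are the dictionary keys `id`, `source`, `category`, and `comments`.

```python
from collections import defaultdict


def normalized_comment_counts(articles):
    """Scale each article's comment count by the maximum in its (source, category) group."""
    group_max = defaultdict(int)
    for art in articles:
        key = (art["source"], art["category"])
        group_max[key] = max(group_max[key], art["comments"])

    normalized = {}
    for art in articles:
        peak = group_max[(art["source"], art["category"])]
        # A group with no comments at all normalizes to 0.0 rather than dividing by zero.
        normalized[art["id"]] = art["comments"] / peak if peak else 0.0
    return normalized
```

Normalizing within a group keeps an article from a low-traffic source comparable with one from a source that routinely attracts many more comments.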