Title Text

Kishor Mohite

Rajat Jain 

Sampath Kolachana

Outline

  • Motivation
  • Problem Statement
  • System Architecture
  • Research Demo
  • Algorithms
  • Novelty & Conclusion
  • Future Work

Motivation

A Random Internet Article

Movie review blog

A financial portal

A news article

People with interests

Want to know

  • What news is currently trending regarding the event discussed in the article or the event of interest?
     
  • What are the views of readers regarding those news?
     
  • What are the views of people on social networking sites, regarding the same?

Problem Statement

System Architecture

Demo

Algorithms

  • Clustering News Articles
  • Ranking Algorithms
  • Summarisation
  • Tag Generation
  • Trending Tags
  • Categorisation

Clustering Articles

Ranking Algorithms

  1. Comment Ranking
  2. Cluster Ranking
  3. Article Ranking

Comment Ranking

Flawed Ranking Systems

Score = Upvotes - Downvotes
Score = Upvotes / Totalvotes
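
A minimal sketch of why these two scores mislead. The function names and vote counts below are made up for illustration and are not taken from the system.

```python
def difference_score(upvotes, downvotes):
    """Score = Upvotes - Downvotes."""
    return upvotes - downvotes


def ratio_score(upvotes, downvotes):
    """Score = Upvotes / Totalvotes (0.0 when there are no votes)."""
    total = upvotes + downvotes
    return upvotes / total if total else 0.0


# The difference favours volume: a 60%-positive comment with many votes
# outranks a 92%-positive comment with fewer votes.
busy_comment = difference_score(600, 400)
liked_comment = difference_score(55, 5)

# The ratio favours tiny samples: 2 votes, both positive, outranks
# 98 positive votes out of 100.
new_comment = ratio_score(2, 0)
proven_comment = ratio_score(98, 2)
```

In both cases the ordering contradicts intuition, which is the motivation for a score that accounts for sample size.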

Current Approach - Wilson Score

The Wilson score, or more precisely the lower bound of the Wilson score confidence interval for a Bernoulli parameter, is used.

 

With a confidence of 85%, the real fraction of positive ratings will be at least this value.
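
The lower bound can be sketched as follows. This is a standard textbook form of the Wilson interval, not necessarily the exact code used in the system. The function name is hypothetical, and treating the 85% figure as a two-sided confidence level is an assumption.

```python
import math
from statistics import NormalDist


def wilson_lower_bound(upvotes, downvotes, confidence=0.85):
    """Lower bound of the Wilson score interval for the upvote fraction.

    Assumption: `confidence` is the two-sided confidence level.
    """
    n = upvotes + downvotes
    if n == 0:
        return 0.0  # no votes yet, so no evidence of quality
    # z is the standard normal quantile for the chosen confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = upvotes / n  # observed fraction of positive ratings
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / (1 + z * z / n)
```

With this score, a comment with 98 upvotes out of 100 ranks above a comment with only 2 upvotes and no downvotes, because the small sample gives a much wider interval.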

Wilson Score - Advantages

  • With a confidence of 85%, the real fraction of positive ratings will be at least this value.
     
  • Quality comments make it to the top regardless of when they were posted.
     
  • Auto feedback system
     
  • Applicable to comments from different platforms

Results

Cluster Ranking

[Diagram: Clusters 1, 2 and 3 with attributes A1, A2, A3 and A4]

Linear Regression

  • Attribute Ai is normalized to value ai, and wi is the associated weight.
     
  • Training data set:
     1. Clusters older than 10 days are assumed to have no further comment activity.
     2. The final comment counts on these clusters are used as the popularity measure, and thus as the target values for the scores.
     3. The comments made up to the time of cluster generation form A4.
     
  • Typical values of weights: [w1, w2, w3, w4]
Score = w_1*a_1 + w_2*a_2 + w_3*a_3 + w_4*a_4
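
A minimal sketch of applying the weighted sum to rank clusters. The weights and attribute values below are hypothetical placeholders, not the trained values, and the function name is an assumption.

```python
def cluster_score(attributes, weights):
    """Score = w_1*a_1 + w_2*a_2 + w_3*a_3 + w_4*a_4 for normalized attributes."""
    if len(attributes) != len(weights):
        raise ValueError("attributes and weights must have the same length")
    return sum(w * a for w, a in zip(weights, attributes))


# Hypothetical weights [w1, w2, w3, w4]; real values would come from regression.
weights = [0.4, 0.3, 0.2, 0.1]

# Hypothetical normalized attributes [a1, a2, a3, a4] per cluster.
clusters = {
    "cluster_1": [0.9, 0.2, 0.5, 0.1],
    "cluster_2": [0.3, 0.8, 0.4, 0.6],
    "cluster_3": [0.1, 0.1, 0.2, 0.9],
}

# Highest score first.
ranked = sorted(
    clusters,
    key=lambda name: cluster_score(clusters[name], weights),
    reverse=True,
)
```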

Sectioning

Sections are based on:
1. Latest update time
2. Number of news sources

Sections are sorted by number of headlines and number of comments.

Sections were chosen by trial and error.


Example -

1. Updated in last 3 hours; news sources reporting >= 5
2. Updated in last 7 hours; news sources reporting >= 5
3. Updated in last 3 hours; news sources reporting >= 3
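
The example thresholds can be sketched as an ordered rule list. Two things here are assumptions rather than stated facts: that a cluster goes into the first section whose conditions it meets, and that clusters inside a section are ordered by headline count and then comment count. All names are hypothetical.

```python
# (maximum hours since last update, minimum number of news sources)
SECTIONS = [(3, 5), (7, 5), (3, 3)]


def assign_section(hours_since_update, source_count, sections=SECTIONS):
    """Return the index of the first matching section, or None if none match."""
    for index, (max_age_hours, min_sources) in enumerate(sections):
        if hours_since_update <= max_age_hours and source_count >= min_sources:
            return index
    return None


def order_within_section(clusters):
    """Sort clusters by headline count, then comment count, highest first."""
    return sorted(
        clusters,
        key=lambda c: (c["headlines"], c["comments"]),
        reverse=True,
    )
```

For instance, a cluster updated 5 hours ago and covered by 6 sources misses the first section but lands in the second.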

Novelty

Cluster ranking has not previously been done by making use of comments.

 

More suitable for cluster ranking than comment ranking.

 

Both the number of news sources and the number of news headlines covering a cluster are used as factors.

Article Ranking

Articles in a cluster

Factors used

1. Time at which article was written or last updated.

2. Normalized number of comments under article.

3. Time at which last comment was made.

 

Normalization is done per news source + category.

Normalized Value = 0.5 * (Actual Value / Average Value)
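
A minimal sketch of this normalization for comment counts. It assumes the average is taken over the articles passed in that share the same news source and category. The field names and function name are hypothetical.

```python
from collections import defaultdict


def normalize_comment_counts(articles):
    """Return 0.5 * (actual / average) per article, averaged per (source, category)."""
    totals = defaultdict(lambda: [0, 0])  # key -> [comment total, article count]
    for article in articles:
        key = (article["source"], article["category"])
        totals[key][0] += article["comments"]
        totals[key][1] += 1

    normalized = []
    for article in articles:
        total, count = totals[(article["source"], article["category"])]
        average = total / count
        # An article with exactly the average number of comments scores 0.5.
        normalized.append(0.5 * article["comments"] / average if average else 0.0)
    return normalized


# Made-up sample data for illustration.
sample = [
    {"source": "A", "category": "sports", "comments": 10},
    {"source": "A", "category": "sports", "comments": 30},
    {"source": "B", "category": "politics", "comments": 100},
]
values = normalize_comment_counts(sample)
```

Normalizing within a source and category keeps a heavily commented outlet from dominating articles from quieter ones.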
