Cross Platform News Exploration Engine
Kishor Mohite
Rajat Jain
Sampath Kolachana
Outline
- Motivation
- Problem Statement
- System Architecture
- Research Demo
- Algorithms
- Novelty & Conclusion
- Future Work
Motivation
A Random Internet Article
Problem Statement
- Provide users with other news articles describing the same event
- Aggregate user opinions from various platforms for the same event
- Generate an event timeline for each news event
- Summarise each news event from the articles in its cluster
System Architecture
Demo
Algorithms
- Clustering News Articles
- Ranking Algorithms
- Summarisation
- Tag Generation
- Trending Tags
- Categorisation
Clustering Articles
Background
- 7 Indian News Sources
- 900-1000 Articles to Cluster
Requirements
- Incremental clustering
- Single-pass algorithm
Proposed Approach
- Use news headlines instead of full articles
- Similarity-measure-based clustering
Solution
- Stopword Removal
- Word Stemming
- Bag-of-Words Representation
- Distance computation
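The clustering steps above can be sketched as a single-pass, incremental clusterer over headlines. This is a minimal sketch under stated assumptions, not the system's implementation: cosine similarity on bag-of-words counts stands in for the distance measure, the stopword list and the suffix-stripping `crude_stem` are toy stand-ins for a real stopword list and stemmer, and `SinglePassClusterer` with its `threshold` parameter is a hypothetical name and value.

```python
import re
from collections import Counter
from math import sqrt

# Toy stopword list (assumption); a real system would use a fuller list.
STOPWORDS = {"a", "an", "the", "in", "on", "of", "to", "for", "and",
             "is", "at", "with", "as", "by"}


def crude_stem(word):
    """Naive suffix stripping, a stand-in for a real stemmer (e.g. Porter)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def bag_of_words(headline):
    """Lowercase, drop stopwords, stem, and count the remaining terms."""
    tokens = re.findall(r"[a-z0-9]+", headline.lower())
    return Counter(crude_stem(t) for t in tokens if t not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity of two term-count vectors (1 - cosine distance)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class SinglePassClusterer:
    """Hypothetical single-pass clusterer: each headline joins the most similar
    existing cluster if similarity reaches `threshold`, else starts a new one."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.clusters = []  # list of (summed term counts, member headlines)

    def add(self, headline):
        vec = bag_of_words(headline)
        best, best_sim = None, 0.0
        for i, (centroid, _) in enumerate(self.clusters):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best is not None and best_sim >= self.threshold:
            centroid, members = self.clusters[best]
            centroid.update(vec)  # summed counts point the same way as the mean
            members.append(headline)
            return best
        self.clusters.append((Counter(vec), [headline]))
        return len(self.clusters) - 1
```

Each headline is compared once against the current cluster centroids and never revisited, which is what makes the approach incremental and single-pass; the threshold trades cluster purity against fragmentation.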
Results
- Gives better results when clusters contain a moderate number of headlines
- Limitations:
- Noisy headlines can end up in a cluster
- Struggles when an event gets heavy news coverage
- i.e., many headlines within a short time span
Results
Ranking Algorithms
Article Summarization
Overview
- Our system generates a summary for each news cluster.
- Generating a summary (abstraction) vs. extracting one (extraction).
- Problem: identifying the top-k sentences that summarize a news cluster or event.
- Multi-source vs. single-source summary generation.
Algorithm
- Stemming and stop-word removal
- Extracting a feature vector for each sentence
- Generating a complete graph over the sentences
- Scoring each sentence based on its distance from all other sentences
- Applying a correction factor based on the other headlines in the cluster
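One way to read these steps is as graph centrality: each sentence is a node in a complete graph, each edge carries the cosine similarity (one minus a distance) between two sentences, and a sentence scores higher the closer it is to all the others. The sketch below assumes that reading. The headline-similarity bonus weighted by `headline_weight` is a hypothetical stand-in for the correction factor, whose exact form is not given here, and stemming is omitted for brevity.

```python
import re
from collections import Counter
from math import sqrt

# Toy stopword list (assumption); stemming is omitted for brevity.
STOPWORDS = {"a", "an", "the", "in", "on", "of", "to", "for", "and", "is",
             "are", "was", "were", "from", "at", "with", "as", "by", "after"}


def vectorize(text):
    """Feature vector for a sentence: counts of its non-stopword terms."""
    return Counter(t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS)


def cosine(a, b):
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def summarize(sentences, headlines, k=2, headline_weight=0.5):
    """Return the top-k sentences of a cluster, in their original order."""
    vecs = [vectorize(s) for s in sentences]
    head_vecs = [vectorize(h) for h in headlines]
    scores = []
    for i, v in enumerate(vecs):
        # Complete graph: sum the edge weights (similarities) to every other sentence.
        centrality = sum(cosine(v, w) for j, w in enumerate(vecs) if j != i)
        # Hypothetical correction factor: mean similarity to the cluster's headlines.
        correction = (sum(cosine(v, h) for h in head_vecs) / len(head_vecs)) if head_vecs else 0.0
        scores.append(centrality + headline_weight * correction)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Returning the selected sentences in their original order keeps the extracted summary readable even though selection is by score.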
Representative Tags Generation
Background
- Form clusters out of news articles
- Generate Representative Tags
Solution
- Use Part of Speech Tagger
- Bigram Tagger
- Unigram Tagger
- Backoff Tagger (Custom Context-Free Grammar)
- Tokenize Headlines
- Choose every term that matches one of these patterns:
- Proper Noun
- Proper Noun + Common Noun
- Noun + Verb
- Count occurrences of the candidate tags and choose the 3 most frequent as the cluster's tags
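The tag-selection rules can be sketched over headlines that have already been POS-tagged (for instance by the bigram, unigram and backoff tagger chain listed above). This sketch assumes Penn Treebank tags (`NNP` for proper nouns, `NN`/`NNS` for common nouns, `VB*` for verbs); `candidate_tags` and `representative_tags` are hypothetical helper names, not the system's code.

```python
from collections import Counter


def candidate_tags(tagged):
    """Candidate tags from one POS-tagged headline, given as (word, tag) pairs.

    Assumes Penn Treebank tags: NNP* = proper noun, NN/NNS = common noun, VB* = verb.
    """
    candidates = []
    for i, (word, pos) in enumerate(tagged):
        nxt = tagged[i + 1] if i + 1 < len(tagged) else None
        if pos.startswith("NNP"):
            candidates.append(word)  # proper noun
            if nxt and nxt[1] in ("NN", "NNS"):
                candidates.append(f"{word} {nxt[0]}")  # proper noun + common noun
        if pos.startswith("NN") and nxt and nxt[1].startswith("VB"):
            candidates.append(f"{word} {nxt[0]}")  # noun + verb
    return candidates


def representative_tags(tagged_headlines, k=3):
    """Count candidates across a cluster's headlines and keep the k most frequent."""
    counts = Counter()
    for tagged in tagged_headlines:
        counts.update(candidate_tags(tagged))
    return [tag for tag, _ in counts.most_common(k)]
```

Counting across all headlines in a cluster favours terms that several sources agree on, so one-off phrasings from a single headline rarely make the top 3.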
Results & Uses
- Representative tags give an idea of what a cluster is about
- Generated tags are also used to generate the trends graph
- The trends graph shows the media coverage of a particular term over a given period, e.g.:
- How well an upcoming movie is being covered
- Tracking a product release and people's opinions of it
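A trends graph of this kind can be fed by a per-day count of articles carrying a given tag. The sketch assumes coverage is measured as the number of tagged articles published each day; `trend_series` and its `(date, tags)` input shape are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def trend_series(articles, term, start: date, end: date):
    """Count, for each day from `start` to `end` inclusive, the articles tagged with `term`.

    `articles` is an iterable of (publication_date, tags) pairs (assumed input shape).
    """
    per_day = Counter(day for day, tags in articles if term in tags)
    n_days = (end - start).days + 1
    # Days with no matching article get a count of 0, so the graph has no gaps.
    return [(start + timedelta(days=i), per_day[start + timedelta(days=i)])
            for i in range(n_days)]
```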
Tags shown for a news cluster
Trends graph for tag 'Panama'
List of articles with same tag
Conclusion and Novelty
- Proposed a novel system for cross-platform news exploration which aggregates news from various web sources.
- Proposed a novel news cluster ranking algorithm based on popularity prediction using comments.
- Proposed a novel news comment ranking algorithm which uses the Wilson score.
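For reference, the Wilson score ranks an item by the lower bound of the confidence interval for its true approval rate, so a comment with a single up-vote does not outrank one with many mostly positive votes. This sketch assumes each comment has up-vote and down-vote counts and uses z = 1.96 (about 95% confidence); the helper names are hypothetical and this is not the system's exact ranking function.

```python
from math import sqrt


def wilson_lower_bound(upvotes, downvotes, z=1.96):
    """Lower bound of the Wilson score interval for the fraction of up-votes."""
    n = upvotes + downvotes
    if n == 0:
        return 0.0  # no votes yet: rank at the bottom
    p = upvotes / n
    centre = p + z * z / (2 * n)
    margin = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / (1 + z * z / n)


def rank_comments(comments):
    """Sort (text, upvotes, downvotes) tuples by Wilson lower bound, best first."""
    return sorted(comments, key=lambda c: wilson_lower_bound(c[1], c[2]), reverse=True)
```

With these defaults a comment at 1 up / 0 down scores about 0.21, while one at 90 up / 10 down scores about 0.83, so the larger, mostly positive sample ranks first.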
Cross Platform News Exploration Engine
By Rajat Jain