Cross Platform News Exploration Engine

Kishor Mohite

Rajat Jain

Sampath Kolachana

Outline

  • Motivation
  • Problem Statement
  • System Architecture
  • Research Demo
  • Algorithms
  • Novelty & Conclusion
  • Future Work

Motivation

A Random Internet Article

Problem Statement

  • Show the user other news articles describing the same event
  • Aggregate user opinions on the same event from various platforms
  • Generate an event timeline for each news event
  • Summarise each news event from the articles in its cluster

System Architecture

Demo

Algorithms

  • Clustering News Articles
  • Ranking Algorithms
  • Summarisation
  • Tag Generation
  • Trending Tags
  • Categorisation

Clustering Articles

Background

  • 7 Indian News Sources
  • 900-1000 Articles to Cluster

Requirements

  • Incremental clustering
  • Single pass algorithm

Proposed Approach

  • Use news headlines instead of full articles
  • Similarity-measure-based clustering

Solution

  • Stopword Removal
  • Word Stemming
  • Bag-of-Words Representation
  • Distance computation
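
The four steps above, combined with the single-pass requirement, can be sketched roughly as follows. This is only an illustration under assumptions: the stopword list, the crude suffix-stripping `stem` (a stand-in for a real stemmer), cosine similarity as the distance measure, and the `threshold` value are not taken from the slides.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (assumption); a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "in", "on", "of", "to", "for", "and", "is", "at", "by", "with", "as"}


def stem(word):
    """Crude suffix stripping, standing in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def to_bag(headline):
    """Stopword removal, stemming, and bag-of-words representation."""
    tokens = re.findall(r"[a-z0-9]+", headline.lower())
    return Counter(stem(t) for t in tokens if t not in STOPWORDS)


def cosine_similarity(a, b):
    """Cosine similarity between two bags of words (1.0 means identical direction)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_headlines(headlines, threshold=0.3):
    """Single pass: each headline joins its most similar cluster or starts a new one."""
    clusters = []  # each cluster: {"centroid": Counter, "headlines": [str, ...]}
    for headline in headlines:
        bag = to_bag(headline)
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = cosine_similarity(bag, cluster["centroid"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["headlines"].append(headline)
            best["centroid"].update(bag)  # incremental centroid update
        else:
            clusters.append({"centroid": Counter(bag), "headlines": [headline]})
    return clusters
```

Because each headline is compared only against the current cluster centroids, new headlines can be added as they arrive without re-clustering earlier ones.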

Results

  • Gives better results when clusters hold a moderate number of headlines
  • Limitations
    • Noisy headlines in a cluster
    • Events with heavy news coverage, i.e., many headlines within a short time span

Results

Ranking Algorithms

Article Summarization

Overview

  • Our system generates a summary for each news cluster.
  • Generating a summary (abstraction) vs. extracting a summary (extraction).
  • Problem: identify the top-k sentences that summarize a news cluster or event.
  • Multiple-source vs. single-source summary generation.

Algorithm

  • Stemming and stop-word removal
  • Extract a feature vector for each sentence
  • Build a complete graph over the sentences
  • Score each sentence based on its distance from all other sentences
  • Apply a correction factor due to other headlines
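
A minimal sketch of this extractive scoring is given below, under several assumptions that the slides do not specify: bag-of-words feature vectors, cosine similarity as the edge weight of the complete graph, and a hypothetical `headline_weight` that models the correction factor as a boost for sentences similar to the cluster's headlines. Stemming is omitted for brevity.

```python
import math
import re
from collections import Counter

# Illustrative stopword list (assumption).
STOPWORDS = {"a", "an", "the", "in", "on", "of", "to", "for", "and", "is", "at", "by", "with", "as"}


def sentence_vector(sentence):
    """Bag-of-words feature vector after stopword removal."""
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def cosine(a, b):
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_sentences(sentences, headlines=(), k=2, headline_weight=0.5):
    """Score every sentence against all others and return the k best in original order."""
    vectors = [sentence_vector(s) for s in sentences]
    headline_vectors = [sentence_vector(h) for h in headlines]
    scores = []
    for i, vec in enumerate(vectors):
        # Complete graph: every other sentence contributes one weighted edge.
        centrality = sum(cosine(vec, other) for j, other in enumerate(vectors) if j != i)
        # Hypothetical correction factor: similarity to the cluster's headlines.
        boost = sum(cosine(vec, h) for h in headline_vectors)
        scores.append(centrality + headline_weight * boost)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]
```

Sentences that share vocabulary with many others in the cluster score highest, which is the intuition behind picking them as the summary.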

Representative Tags Generation

Background

  • Form clusters out of news articles
  • Generate Representative Tags

Solution

  • Use Part of Speech Tagger
    • Bigram Tagger
    • Unigram Tagger
    • Backoff Tagger (Custom Context Free Grammar)
  • Tokenize Headlines
  • Choose every token sequence that is a:
    • Proper Noun
    • Proper Noun + Common Noun
    • Noun + Verb
  • Count occurrences of the candidate tags and choose the 3 most frequent as tags for the headlines
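
The selection and counting steps can be sketched as below. The sketch assumes the headlines have already been tokenized and tagged by the tagger chain above, and that the tags follow Penn Treebank conventions (`NNP` for proper nouns, `NN`/`NNS` for common nouns, `VB*` for verbs); the slides do not state the tag set, and the function names are hypothetical.

```python
from collections import Counter


def candidate_tags(tagged_headline):
    """Collect proper nouns, proper+common noun pairs, and noun+verb pairs."""
    candidates = []
    for i, (word, tag) in enumerate(tagged_headline):
        if tag.startswith("NNP"):
            candidates.append(word)  # proper noun on its own
        if i + 1 < len(tagged_headline):
            next_word, next_tag = tagged_headline[i + 1]
            proper_plus_common = tag.startswith("NNP") and next_tag in ("NN", "NNS")
            noun_plus_verb = tag.startswith("NN") and next_tag.startswith("VB")
            if proper_plus_common or noun_plus_verb:
                candidates.append(f"{word} {next_word}")
    return candidates


def representative_tags(tagged_headlines, top_n=3):
    """Count candidates across a cluster's headlines and keep the most frequent ones."""
    counts = Counter()
    for headline in tagged_headlines:
        counts.update(candidate_tags(headline))
    return [tag for tag, _ in counts.most_common(top_n)]
```

Each headline is passed in as a list of `(word, tag)` pairs, so any tagger that produces that shape could feed this step.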

Results & Uses

  • Representative tags indicate what a cluster is about
  • Generated tags are also used to build the trends graph
  • The trends graph shows the media coverage of a particular term over a given period
    • How well an upcoming movie is being covered
    • Tracking a product release and people's opinions
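
One plausible way to derive the data behind such a trends graph is to count, per day, the articles carrying a given tag. The sketch below assumes each article is a dict with a `tags` set and a `published` date; these field names and the function name are hypothetical and not taken from the system.

```python
from collections import Counter
from datetime import date


def tag_trend(articles, tag, start, end):
    """Return (day, article_count) pairs for a tag within [start, end], sorted by day."""
    counts = Counter(
        article["published"]
        for article in articles
        if tag in article["tags"] and start <= article["published"] <= end
    )
    return sorted(counts.items())
```

Plotting the returned pairs with the day on the x-axis and the count on the y-axis gives a coverage curve for the tag.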

Tags shown for a news cluster

Trends graph for tag 'Panama'

List of articles with same tag

Conclusion and Novelty

  • Proposed a novel system for cross-platform news exploration that aggregates news from various web sources.
  • Proposed a novel news-cluster ranking algorithm based on popularity prediction using comments.
  • Proposed a novel news-comment ranking algorithm that uses the Wilson score.
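
For reference, the lower bound of the Wilson score interval can be computed as in the sketch below. The slides do not say which signals feed the comment ranking, so treating each comment as having positive and negative votes (`upvotes`, `downvotes`) is an assumption, as is the 95% confidence level (`z = 1.96`).

```python
import math


def wilson_lower_bound(upvotes, downvotes, z=1.96):
    """Lower bound of the Wilson score interval for the fraction of positive votes."""
    n = upvotes + downvotes
    if n == 0:
        return 0.0  # no votes yet: rank at the bottom
    p = upvotes / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt((p * (1 - p) + z * z / (4 * n)) / n)
    return (centre - margin) / (1 + z * z / n)


def rank_comments(comments):
    """Sort (text, upvotes, downvotes) tuples by Wilson lower bound, best first."""
    return sorted(comments, key=lambda c: wilson_lower_bound(c[1], c[2]), reverse=True)
```

Ranking by the lower bound rather than the raw positive fraction keeps a comment with a single positive vote from outranking one with many votes and a slightly lower ratio.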

Cross Platform News Exploration Engine

By Rajat Jain
