Text summarization

“Automatic summarization is the process of shortening a text document with software, in order to create a summary with the major points of the original document.”

Source: Wikipedia

APPLICATIONS

newspaper

time saving

Approaches

Cosine Similarity

Text Rank

K-Means Clustering
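The first two approaches can be combined in a few lines. The sketch below is only an illustration, not the code behind these slides: it assumes plain bag-of-words vectors, whitespace tokenization, and a damping factor of 0.85, and the names `cosine_similarity` and `text_rank` are made up for this example. Sentences are nodes, cosine similarity gives the edge weights, and a PageRank-style iteration scores each sentence.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two bag-of-words count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def text_rank(sentences, damping=0.85, iterations=50):
    """Score sentences with a PageRank-style iteration on a similarity graph."""
    bags = [Counter(s.lower().split()) for s in sentences]
    n = len(bags)
    # Edge weight = cosine similarity between two different sentences.
    weights = [[cosine_similarity(bags[i], bags[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Each sentence receives score from its neighbours, in proportion
        # to the weight of the edge that links them.
        scores = [
            (1 - damping) / n + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0)
            for i in range(n)
        ]
    return scores


# Toy input (assumption): two near-duplicate sentences and one unrelated one.
sentences = [
    "the cat sat on the mat",
    "the cat lay on the mat",
    "stock markets fell sharply today",
]
scores = text_rank(sentences)
best = max(range(len(sentences)), key=scores.__getitem__)
print(sentences[best])
```

The highest-scoring sentences would then be kept as the extractive summary. The unrelated sentence shares no words with the others, so it only receives the baseline score.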

Credits: Paek Jeongyeup

 


How to choose relevant sentences

based on distance to cluster centroid

with 3 clusters A, B, C
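A minimal sketch of the selection step, assuming the sentences are already embedded as vectors and already assigned to clusters A, B, and C. The 2-D vectors, the labels, and the helper names `centroid` and `pick_representatives` are hypothetical. For each cluster, the sentence closest to the centroid (Euclidean distance) is kept.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def pick_representatives(vectors, labels):
    """Return {cluster label: index of the vector nearest that centroid}."""
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    chosen = {}
    for label, members in clusters.items():
        center = centroid([vectors[i] for i in members])
        # Keep the member with the smallest Euclidean distance to the centroid.
        chosen[label] = min(members, key=lambda i: math.dist(vectors[i], center))
    return chosen


# Toy 2-D sentence vectors with hypothetical cluster assignments.
vectors = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0),
           (5.0, 5.0), (5.0, 6.0), (5.0, 10.0),
           (9.0, 1.0)]
labels = ["A", "A", "A", "B", "B", "B", "C"]

representatives = pick_representatives(vectors, labels)
# Emit the chosen sentences in their original document order.
summary_order = sorted(representatives.values())
print(summary_order)
```

Sorting the selected indices keeps the summary in the order of the source text, which is a design choice of this sketch rather than something the slides specify.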

Evaluation

ROUGE (Recall-Oriented Understudy for Gisting Evaluation) scores a generated summary by how much it overlaps with a reference summary. ROUGE-N counts overlapping n-grams.

An n-gram is a contiguous sequence of n items from a given sample of text or speech.
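As a small illustration, assuming the items are whitespace-separated words (the function name `ngrams` is made up for this example), n-grams can be extracted with a sliding window:

```python
def ngrams(tokens, n):
    """Return every contiguous run of n tokens, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "the cat sat on the mat".split()
unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)
print(bigrams)
```

A text of k tokens has k - n + 1 n-grams, so the six-word sentence above has six unigrams and five bigrams.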

RECALL

PRECISION

  • ROUGE-N recall = 40% means that 40% of the n-grams in the reference summary are also present in the generated summary.
  • ROUGE-N precision = 40% means that 40% of the n-grams in the generated summary are also present in the reference summary.
  • ROUGE-N F1-score = 40% is harder to interpret, like any F1-score: it is the harmonic mean of recall and precision.
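The three scores can be sketched as follows. This is a simplified illustration, not the official ROUGE package: it assumes lowercased whitespace tokenization and a single reference summary, and the function name `rouge_n` is made up for this example. Overlap is clipped, so an n-gram is matched at most as many times as it appears in both texts.

```python
from collections import Counter


def rouge_n(reference, candidate, n=1):
    """Return (recall, precision, f1) for n-gram overlap between two texts."""
    def count_ngrams(text):
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n])
                       for i in range(len(tokens) - n + 1))

    ref, cand = count_ngrams(reference), count_ngrams(candidate)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((ref & cand).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return recall, precision, f1


reference = "the cat sat on the mat"
generated = "the cat lay on the mat today"
recall, precision, f1 = rouge_n(reference, generated, n=1)
print(f"recall={recall:.2f} precision={precision:.2f} f1={f1:.2f}")
```

Here 5 of the 6 reference unigrams appear in the generated summary (recall 5/6), and 5 of the 7 generated unigrams appear in the reference (precision 5/7).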
 

Lin, Chin-Yew. "ROUGE: A package for automatic evaluation of summaries." Text Summarization Branches Out, 2004.

Results


K-means clustering

10 Clusters

input text: "Artificial Intelligence"

most relevant words per cluster (stemmed)

GUI

Areas for improvement

  • Acronym replacement

  • Abstractive approach

  • Topic-based

  • Different input data

Source code

Thank you for listening 

Any questions?
