Text summarization
“Automatic summarization is the process of shortening a text document with software, in order to create a summary with the major points of the original document.”
Source: Wikipedia
APPLICATIONS
newspaper
time saving
Approaches
Cosine Similarity
TextRank
K-Means Clustering
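As a rough illustration of the first two approaches, the sketch below scores sentences with a PageRank-style iteration over a graph whose edge weights are cosine similarities between bag-of-words vectors. The function names (`bag_of_words`, `cosine_similarity`, `textrank`), the regex tokenizer, the damping factor and the iteration count are all assumptions made for this example, not the presenter's implementation. The original TextRank paper uses a different sentence similarity measure, so this is a simplified variant.

```python
import math
import re
from collections import Counter

def bag_of_words(sentence):
    """Lowercase the sentence and count its word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def textrank(sentences, damping=0.85, iterations=50):
    """Score sentences by iterating a PageRank-style update on a similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    vectors = [bag_of_words(s) for s in sentences]
    # Edge weights: pairwise cosine similarity, with no self-loops.
    weights = [[cosine_similarity(vectors[i], vectors[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbour j passes on its score in proportion to the edge weight.
            rank = sum(weights[j][i] / out_sum[j] * scores[j]
                       for j in range(n) if out_sum[j] > 0)
            new_scores.append((1 - damping) / n + damping * rank)
        scores = new_scores
    return scores

# Usage: the highest-scoring sentences form the extractive summary.
sentences = [
    "The cat sat on the mat",
    "The cat lay on the mat",
    "Stock prices fell sharply today",
]
scores = textrank(sentences)
best = max(range(len(sentences)), key=lambda i: scores[i])
print(sentences[best])
```

A sentence that shares no words with the others receives only the baseline score, so it is unlikely to be picked for the summary.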
Credits: Paek Jeongyeup
How to choose relevant sentences
Sentences are selected based on their distance to the cluster centroid
Example with 3 clusters A, B, C
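The selection rule can be sketched as follows: cluster the sentence vectors, then keep, for each cluster, the sentence nearest to its centroid. This is a minimal sketch under several assumptions: sentences are already encoded as numeric vectors (toy 2-D points here), the first k points seed the centroids (real implementations usually use a smarter initialization), and the names `kmeans` and `pick_representatives` are hypothetical.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda c: euclidean(p, centroids[c]))
            for p in points]

def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        labels = assign(points, centroids)
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, assign(points, centroids)

def pick_representatives(points, centroids, labels):
    """For each non-empty cluster, return the index of the member nearest its centroid."""
    chosen = []
    for c, centroid in enumerate(centroids):
        members = [i for i, label in enumerate(labels) if label == c]
        if members:
            chosen.append(min(members, key=lambda i: euclidean(points[i], centroid)))
    return sorted(chosen)

# Usage: nine toy "sentence vectors" forming three groups A, B, C.
points = [
    (0.0, 0.0), (10.0, 0.0), (0.0, 10.0),   # one seed per group
    (2.0, 0.0), (1.0, 0.1),                 # rest of A
    (12.0, 0.0), (11.0, 0.1),               # rest of B
    (0.0, 12.0), (0.1, 11.0),               # rest of C
]
centroids, labels = kmeans(points, k=3)
print(pick_representatives(points, centroids, labels))  # → [4, 6, 8]
```

Returning the indices in sorted order keeps the chosen sentences in their original document order, which usually reads better in the final summary.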
Evaluation
ROUGE (Recall-Oriented Understudy for Gisting Evaluation) scores a generated summary by its word overlap with a reference summary. ROUGE-N measures the overlap of n-grams.
An n-gram is a contiguous sequence of n items from a given sample of text or speech.
RECALL
PRECISION
- ROUGE-N recall=40% means that 40% of the n-grams in the reference summary are also present in the generated summary.
- ROUGE-N precision=40% means that 40% of the n-grams in the generated summary are also present in the reference summary.
- ROUGE-N F1-score=40% is harder to interpret, like any F1-score: it is the harmonic mean of the recall and precision above.
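The three figures above can be computed from n-gram counts. The sketch below is a simplified illustration, assuming lowercase whitespace tokenization and a single reference summary. The names `ngrams` and `rouge_n` are hypothetical, and the official ROUGE package offers further options (such as stemming) that are not reproduced here.

```python
from collections import Counter

def ngrams(tokens, n):
    """Count every contiguous run of n tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(reference, candidate, n=1):
    """Return ROUGE-N recall, precision and F1 for one reference/candidate pair."""
    ref = ngrams(reference.lower().split(), n)
    cand = ngrams(candidate.lower().split(), n)
    # Counter intersection keeps the minimum count, so repeats are not over-credited.
    overlap = sum((ref & cand).values())
    ref_total = sum(ref.values())
    cand_total = sum(cand.values())
    recall = overlap / ref_total if ref_total else 0.0
    precision = overlap / cand_total if cand_total else 0.0
    if recall + precision == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"recall": recall, "precision": precision, "f1": f1}

# Usage: 3 of the 5 reference bigrams and 3 of the 6 generated bigrams match.
reference = "the cat sat on the mat"
generated = "the cat lay on the mat today"
print(rouge_n(reference, generated, n=2))
```

For this pair, ROUGE-2 recall is 3/5 and precision is 3/6; ROUGE-1 gives 5/6 and 5/7 respectively.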
Lin, Chin-Yew. "ROUGE: A Package for Automatic Evaluation of Summaries." Text Summarization Branches Out, 2004.
Results
K-means clustering
10 clusters
Input text: "Artificial Intelligence"
Most relevant words per cluster (stemmed)
GUI
Areas for improvement
- Acronym replacement
- Abstractive approach
- Topic-based summarization
- Different input data
Source code
Thank you for listening
Any questions?
By googo