Automatic Text Summarization

Luis Manuel Román García

ITAM

Contents

  • Problem description
  • Extractive/Abstractive text summarization
  • Progress

Problem Description

Automatic text summarization is the task of producing a concise, fluent summary that preserves the key information and overall meaning of the source text.

Automatic text summarization is challenging because humans usually read a text in full to build an understanding of it, and only then write a summary that highlights its main points.

Computers lack human background knowledge and language ability, which makes automatic text summarization a difficult, non-trivial task.

In general, there are two approaches to automatic summarization: extraction and abstraction.

Extractive summarization methods identify the important sections of the text and reproduce them verbatim; the summary therefore consists only of sentences taken from the original text.
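As a rough illustration of the extractive idea, the sketch below scores each sentence by the average frequency of its words and returns the top-scoring sentences unchanged. It is a minimal example under stated assumptions, not the method used in this project: the stopword list, the naive sentence splitter, and the names `split_sentences`, `tokenize`, and `extractive_summary` are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical, tiny stopword list used only for this illustration.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it", "on"}


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Lowercase alphabetic tokens with stopwords removed.
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]


def extractive_summary(text, k=1):
    """Return the k highest-scoring sentences, verbatim and in document order."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        # Average word frequency, so long sentences are not favored automatically.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]
```

Because the output is assembled only from sentences already present in the input, every line of the summary can be traced back to the source text.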

Abstractive summarization methods aim to convey the important material in a new way. They interpret and examine the text using advanced natural language techniques in order to generate a new, shorter text that conveys the most critical information from the original.

Abstractive Summarization

Extractive/Abstractive Text Summarization

Summarization Methods

  • Intermediate representations
    • Topic representation
      • Word frequency
      • Latent semantic analysis
    • Indicator representation
      • Sentence length
      • Position in text
      • Presence of named entities

Extractive Methods
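The indicator representation listed above can be sketched as a small feature extractor that computes sentence length, relative position, and a named-entity flag for each sentence. This is only an illustrative sketch: the function name `indicator_features` is hypothetical, and the named-entity check is a crude capitalization heuristic standing in for a real NER model.

```python
def indicator_features(sentences):
    """Return one dict of indicator features per sentence."""
    n = len(sentences)
    features = []
    for i, sentence in enumerate(sentences):
        tokens = sentence.split()
        # Crude named-entity proxy (assumption): any capitalized token
        # other than the first one in the sentence.
        has_entity = any(t[:1].isupper() for t in tokens[1:])
        features.append({
            "length": len(tokens),                       # sentence length in tokens
            "position": i / (n - 1) if n > 1 else 0.0,   # 0.0 = first, 1.0 = last
            "has_entity": has_entity,
        })
    return features
```

Features like these would typically be fed to a scoring rule or a classifier that decides which sentences to keep; how they are weighted is a design choice the slides do not specify.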

Summarization Methods

  • Complex NLP models
    • Bayesian networks
    • Pointer networks
    • Sequence to sequence

Abstractive Methods

Progress

  • Multiple text, multiple source approach

MODULES

  • Advanced location identifier
  • NER
  • Event disambiguation
  • Semantic similarity

{
    'geometry': {
        'type': 'Point',
        'coordinates': [-102.29171, 21.866713]
    },
    'type': 'Feature',
    'properties': {
        'box': 80,
        'doc_type': 'news',
        'dates': ['2017-11-01 03:56:22', '2017-11-01 00:00:00', '2017-11-03 00:00:00'],
        'url': [
            u'http://www.noticieroelcirco.mx/policias-municipales-impidieron-que-se-suicidara-un-joven-en-aguascalientes/',
            u'http://www.hidrocalidodigital.com/local/articulo.php?idnota=132067',
            u'http://www.noticieroelcirco.mx/ya-son-126-suicidios-en-el-ano-en-aguascalientes-hombre-se-ahorco-en-su-casa/'
        ],
        'sources': [u'Noticiero el Circo', u'Hidrocalido Digital', u'Noticiero el Circo'],
        'titles': [
            u'\xa1Polic\xedas municipales impidieron que se suicidara un joven en Aguascalientes! \u2013 Noticiero El Circo',
            u'Se supera la cifra hist\ufffdrica de suicidios en Aguascalientes',
            u'\xa1Ya son 126 suicidios en el a\xf1o en Aguascalientes: hombre se ahorc\xf3 en su casa! \u2013 Noticiero El Circo'
        ],
        'words': ['circo', 'hombre', 'ahorco', 'noticiero', 'suicidios', 'ano', 'supera', 'impidieron', 'histrica', 'cifra', 'policias', 'municipales', 'suicidara', 'joven']
    }
}
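
The slides do not say how the semantic similarity module works. As a rough stand-in, the sketch below compares two titles with cosine similarity over bag-of-words counts, which is one simple way to decide whether articles from different sources might describe the same event. It measures lexical overlap only, not meaning, and the names `bag_of_words` and `cosine_similarity` are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    # Lowercased word counts; \w+ also matches accented letters in Python 3.
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Usage: compare two of the titles from the feature above (accents omitted here).
t1 = bag_of_words("Se supera la cifra de suicidios en Aguascalientes")
t2 = bag_of_words("Ya son 126 suicidios en el ano en Aguascalientes")
similarity = cosine_similarity(t1, t2)
```

A threshold on this score could be used to group documents into one feature, but the right threshold depends on the corpus and would need to be tuned.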

State of the App

 
