Domain-relevant term extraction

Tasks done in the current week

1. Papers read

2. Tentative workflow designed

3. Code

Papers Read

Literature reviews:

  • Terminology extraction: an analysis of linguistic and statistical approaches
  • An overview of graph-based keyword extraction methods and approaches
  • Term extraction: A Review
  • Automatic Keyphrase Extraction: A Survey of the State of the Art

Statistical Approaches:

  • Domain-Specific Term Extraction and Its Application in Text Classification
  • Term extraction using non-technical corpora as a point of leverage
  • An Unsupervised Approach to Domain-Specific Term Extraction

Linguistic/Semantics-based approaches:

  • TextRank: Bringing Order into Texts
  • SemanticRank: Ranking Keywords and Sentences Using Semantic Graphs

Proposed Pipeline/Targets Achieved

  • Remove stopwords and extraneous characters such as punctuation (DONE)
  • POS-tag the text and keep only nouns, verbs and adjectives (DONE)
  • Treat the remaining words as candidate domain-relevant labels and manually remove the irrelevant ones; the result becomes the label set for evaluation (IN PROGRESS)
  • Extract domain terms and analyse the candidate approaches
  • Evaluate, then follow up with further work and corrections
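
The first two pipeline steps could be sketched roughly as below. This is a minimal illustration, not the project's actual code: the stopword list is a tiny placeholder, the function name filter_candidates is hypothetical, and the input is assumed to be already tagged with Penn Treebank tags (for example by a tagger such as NLTK's nltk.pos_tag).

```python
import string

# Tiny illustrative stopword list (an assumption; a real run would use a fuller list).
STOPWORDS = {"the", "a", "an", "of", "and", "is", "are", "to", "in", "for", "on", "with"}

# Penn Treebank tag prefixes for nouns, verbs and adjectives.
KEPT_TAG_PREFIXES = ("NN", "VB", "JJ")


def filter_candidates(tagged_tokens):
    """Return lowercased nouns, verbs and adjectives with stopwords and punctuation removed.

    tagged_tokens is a list of (word, tag) pairs.
    """
    candidates = []
    for word, tag in tagged_tokens:
        # Strip leading/trailing punctuation; internal hyphens are kept.
        cleaned = word.strip(string.punctuation).lower()
        if not cleaned or cleaned in STOPWORDS:
            continue
        if tag.startswith(KEPT_TAG_PREFIXES):
            candidates.append(cleaned)
    return candidates


tagged = [
    ("The", "DT"), ("parser", "NN"), ("extracts", "VBZ"),
    ("domain-specific", "JJ"), ("terms", "NNS"), ("quickly", "RB"), (".", "."),
]
print(filter_candidates(tagged))
```

The surviving words are the candidates that would then be pruned by hand to form the evaluation label set.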

Future Work

Explore graph-based, statistical, linguistic and aspect-based approaches
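
As a rough idea of what the graph-based direction might look like, here is a simplified TextRank-style sketch: words that co-occur within a small window are linked in an undirected graph, and a PageRank-like update scores each word. The function name textrank_keywords, the window size, the damping factor and the iteration count are all assumptions chosen for illustration; this is not the implementation from the TextRank paper.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=2, damping=0.85, iterations=50):
    """Rank words by a PageRank-style score over an unweighted co-occurrence graph."""
    # Link each word to the words that follow it within the window.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            other = tokens[j]
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # Start every node at 1.0 and apply the update a fixed number of times.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        new_scores = {}
        for word in neighbors:
            rank = sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            new_scores[word] = (1 - damping) + damping * rank
        scores = new_scores

    # Highest-scoring words first.
    return sorted(scores, key=scores.get, reverse=True)


tokens = ["graph", "ranking", "graph", "extraction", "graph", "term"]
ranked = textrank_keywords(tokens)
print(ranked[0])
```

In this toy input, "graph" is adjacent to every other word, so it ends up with the highest score; the input tokens would normally be the filtered candidates from the preprocessing steps.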

termex

By Anjali Bhavan