Tasks completed this week

1. Papers read

2. Tentative workflow designed

3. Code

Domain-Relevant Term Extraction

Papers Read

Literature Reviews:

Terminology extraction: an analysis of linguistic and statistical approaches

An overview of graph-based keyword extraction methods and approaches

Term extraction: A Review

Automatic Keyphrase Extraction: A Survey of the State of the Art

Statistical Approaches:


  1. Domain-Specific Term Extraction and Its Application in Text Classification

  2. Term extraction using non-technical corpora as a point of leverage

  3. An Unsupervised Approach to Domain-Specific Term Extraction
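Many statistical approaches rank a word by how much more frequent it is in the domain corpus than in a general corpus, which is the idea hinted at by the title of paper 2. The following is a minimal illustrative sketch of such a contrastive score, not the method of any paper listed above. The function name `domain_specificity`, the log-ratio formula and the add-one smoothing are all assumptions chosen for clarity.

```python
import math
from collections import Counter


def domain_specificity(domain_tokens, general_tokens, smoothing=1.0):
    """Rank domain words by log(P_domain(w) / P_general(w)).

    Illustrative assumption: relative frequencies with additive smoothing
    over the shared vocabulary, so unseen words never divide by zero.
    """
    dom = Counter(domain_tokens)
    gen = Counter(general_tokens)
    vocab = set(dom) | set(gen)
    dom_total = sum(dom.values()) + smoothing * len(vocab)
    gen_total = sum(gen.values()) + smoothing * len(vocab)

    scores = {}
    for word in dom:
        p_dom = (dom[word] + smoothing) / dom_total
        p_gen = (gen[word] + smoothing) / gen_total
        scores[word] = math.log(p_dom / p_gen)

    # Highest-scoring (most domain-specific) words first.
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


domain = "the kernel schedules threads the kernel manages memory".split()
general = "the cat sat on the mat the dog ran".split()
scores = domain_specificity(domain, general)
```

With these toy corpora, "kernel" ranks first and the function word "the" gets a negative score, since it is common in both corpora.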


Linguistic/Graph-based:

Proposed Pipeline and Targets Achieved

  1. Removal of stopwords and unnecessary characters such as punctuation (DONE)

  2. POS tagging to retain only nouns, verbs and adjectives (DONE)

  3. Treating the remaining words as candidate domain-relevant labels and manually removing any irrelevant ones; the result becomes the label set for evaluation (IN PROGRESS)

  4. Domain term extraction itself, with an analysis of the approaches

  5. Evaluation, further work/corrections
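Steps 1 to 3 of the pipeline can be sketched as follows. This is a minimal sketch under several assumptions: the input is already POS-tagged with Penn Treebank tags (for example by a tagger such as `nltk.pos_tag`), the stopword list is a deliberately tiny hypothetical one, and the helper names `candidate_terms` and `build_label_set` are illustrative, not the code used in this work.

```python
import string

# Hypothetical, deliberately tiny stopword list; a standard list
# (e.g. NLTK's English stopwords corpus) would normally replace it.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in", "on", "for", "with"}

# Penn Treebank tag prefixes: NN* nouns, VB* verbs, JJ* adjectives.
KEEP_TAG_PREFIXES = ("NN", "VB", "JJ")


def candidate_terms(tagged_tokens):
    """Steps 1-2: drop stopwords and punctuation, keep nouns, verbs, adjectives.

    `tagged_tokens` is a list of (word, Penn Treebank tag) pairs.
    """
    kept = []
    for word, tag in tagged_tokens:
        w = word.lower()
        # Skip stopwords and tokens made up entirely of punctuation.
        if w in STOPWORDS or all(ch in string.punctuation for ch in w):
            continue
        if tag.startswith(KEEP_TAG_PREFIXES):
            kept.append(w)
    return kept


def build_label_set(candidates, manually_rejected):
    """Step 3: unique remaining words, minus those removed by hand."""
    return set(candidates) - set(manually_rejected)


# Hand-tagged toy sentence, for illustration only.
tagged = [("The", "DT"), ("kernel", "NN"), ("quickly", "RB"),
          ("schedules", "VBZ"), ("concurrent", "JJ"), ("threads", "NNS"), (".", ".")]
candidates = candidate_terms(tagged)
labels = build_label_set(candidates, manually_rejected=["schedules"])
```

Here the adverb "quickly", the determiner "The" and the final period are filtered out, and the manual pass in step 3 is simulated by an explicit rejection list.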

Future Work

1. Exploring approaches: graph-based, statistical and linguistic, and aspect-based

2. Evaluation metrics (?)
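As a starting point for both items, the following sketch pairs a TextRank-style graph ranking (PageRank over a word co-occurrence graph) with set-based precision, recall and F1 against the manually built label set. It is an assumption-laden illustration, not a chosen method: the window size, damping factor, iteration count and the helper names `cooccurrence_graph`, `rank_words` and `precision_recall_f1` are all hypothetical.

```python
from collections import defaultdict


def cooccurrence_graph(tokens, window=2):
    """Undirected graph linking words that co-occur in a sliding window of
    `window` consecutive tokens (window=2 links adjacent words only)."""
    graph = defaultdict(set)
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            other = tokens[j]
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return graph


def rank_words(graph, damping=0.85, iterations=50):
    """Unweighted PageRank by power iteration; returns words, best first."""
    nodes = list(graph)
    score = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # The comprehension reads the previous iteration's scores throughout.
        score = {
            n: (1 - damping) + damping * sum(score[m] / len(graph[m]) for m in graph[n])
            for n in nodes
        }
    return sorted(nodes, key=lambda n: score[n], reverse=True)


def precision_recall_f1(predicted, gold):
    """Set-based precision, recall and F1 of extracted terms vs. a label set."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


tokens = ["kernel", "schedules", "threads", "kernel", "manages", "memory"]
ranked = rank_words(cooccurrence_graph(tokens, window=2))
p, r, f1 = precision_recall_f1(ranked[:2], {"kernel", "threads", "memory"})
```

In this toy run "kernel" ranks highest because it has the most co-occurrence neighbours. Comparing the top two words with a three-word label set gives precision 0.5, recall 1/3 and F1 0.4. Ranking-aware metrics such as precision at k would be a natural extension.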

Miscellaneous/Observations

By Anjali Bhavan