
Tasks completed this week

1. Papers read

2. Tentative workflow designed

3. Code

Domain Relevant Term Extraction

Papers Read

Review and survey papers:

Terminology extraction: an analysis of linguistic and statistical approaches

An overview of graph-based keyword extraction methods and approaches

Term extraction: A Review

Automatic Keyphrase Extraction: A Survey of the State of the Art

Statistical Approaches:

Linguistic/Graph-based:

Proposed pipeline and progress so far

Remove stopwords and unnecessary characters such as punctuation. (DONE)
POS-tag the text and keep only nouns, verbs and adjectives. (DONE)
Keep the remaining words as candidate domain-relevant labels and manually remove any irrelevant ones. The result becomes the label set for evaluation. (IN PROGRESS)
Run the actual domain term extraction and analyse the approaches.
Evaluate the results, then carry out further work and corrections.
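
The first two pipeline steps could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the stopword list is a tiny placeholder, the function names `clean_tokens` and `keep_content_words` are hypothetical, and the POS tags are assumed to be Penn Treebank tags produced by an external tagger (for example NLTK's `nltk.pos_tag`).

```python
import string

# Tiny illustrative stopword list (an assumption; a real run would use a fuller list).
STOPWORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "for", "are", "on"}

# Translation table that maps every ASCII punctuation character to a space.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean_tokens(text, stopwords=STOPWORDS):
    """Step 1: lowercase the text, strip punctuation, and drop stopwords."""
    cleaned = text.lower().translate(_PUNCT_TO_SPACE)
    return [tok for tok in cleaned.split() if tok not in stopwords]


def keep_content_words(tagged_tokens, prefixes=("NN", "VB", "JJ")):
    """Step 2: keep nouns, verbs and adjectives.

    `tagged_tokens` is a list of (token, tag) pairs with Penn Treebank tags,
    as an external POS tagger would return them (assumed, not shown here).
    """
    return [tok for tok, tag in tagged_tokens if tag.startswith(prefixes)]


if __name__ == "__main__":
    print(clean_tokens("The extraction of domain terms, in text!"))
    # Hand-tagged example standing in for real tagger output.
    print(keep_content_words([("extraction", "NN"), ("quickly", "RB"), ("relevant", "JJ")]))
```

The words that survive both filters are the candidates that step 3 would then prune manually into the evaluation label set.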

Future Work

1. Exploring approaches: graph-based, statistical, linguistic, and aspect-based

2. Evaluation metrics (?)
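
The metrics are still undecided. One common option, shown here only as a possible starting point, is set-based precision, recall and F1 of the extracted terms against the manually curated label set. The function name `precision_recall_f1` and the example terms are hypothetical.

```python
def precision_recall_f1(extracted, gold):
    """Compare extracted terms with the gold label set using exact matching."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)

    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up terms, for illustration only.
    extracted_terms = ["neural network", "gradient", "table", "loss function"]
    gold_terms = ["neural network", "gradient", "backpropagation"]
    print(precision_recall_f1(extracted_terms, gold_terms))
```

Exact matching is strict for multi-word terms, so a partial-match or stemmed variant may be worth considering once the label set is final.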

Miscellaneous/Observations