Domain-relevant term extraction

Tasks done in the current week

1. Papers read

2. Tentative workflow designed

3. Code

Papers Read

Literature Reviews:

Terminology extraction: an analysis of linguistic and statistical approaches
An overview of graph-based keyword extraction methods and approaches
Term extraction: A Review
Automatic Keyphrase Extraction: A Survey of the State of the Art

Statistical Approaches:

Domain-Specific Term Extraction and Its Application in Text Classification
Term extraction using non-technical corpora as a point of leverage
An Unsupervised Approach to Domain-Specific Term Extraction

Linguistic/Semantics-based approaches:

TextRank: Bringing Order into Texts
SemanticRank: Ranking Keywords and Sentences Using Semantic Graphs

Proposed Pipeline/Targets Achieved

Remove stopwords and unnecessary characters such as punctuation. (DONE)
POS-tag the text and keep only nouns, verbs and adjectives. (DONE)
Keep the remaining words as candidate domain-relevant labels and manually remove any irrelevant ones. The result becomes the label set for evaluation. (IN PROGRESS)
Perform the actual domain term extraction and analyse the approaches.
Evaluate the results; further work and corrections.
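
A minimal sketch of the pipeline steps above, not the project's actual code. It assumes the tokens already carry Penn Treebank POS tags from some tagger, so that the example does not depend on a particular library. The stopword list, the function names (clean_tokens, filter_by_pos, evaluate) and the sample sentence are all hypothetical and chosen only for illustration.

```python
import string

# Tiny illustrative stopword list (an assumption); a real run would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "is", "are", "in", "to", "for", "on", "with"}

# Penn Treebank tag prefixes for nouns, verbs and adjectives.
KEEP_TAG_PREFIXES = ("NN", "VB", "JJ")


def clean_tokens(tagged_tokens):
    """Drop stopwords and punctuation-only tokens from (word, tag) pairs."""
    cleaned = []
    for word, tag in tagged_tokens:
        lowered = word.lower()
        if lowered in STOPWORDS:
            continue
        if all(ch in string.punctuation for ch in lowered):
            continue
        cleaned.append((lowered, tag))
    return cleaned


def filter_by_pos(tagged_tokens):
    """Keep only words tagged as nouns, verbs or adjectives."""
    return [word for word, tag in tagged_tokens if tag.startswith(KEEP_TAG_PREFIXES)]


def evaluate(extracted, gold):
    """Precision, recall and F1 of extracted terms against a manual label set."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical pre-tagged sentence: "The gradient of the loss is computed quickly."
tagged = [("The", "DT"), ("gradient", "NN"), ("of", "IN"), ("the", "DT"),
          ("loss", "NN"), ("is", "VBZ"), ("computed", "VBN"),
          ("quickly", "RB"), (".", ".")]

candidates = filter_by_pos(clean_tokens(tagged))
# Hypothetical manually curated label set for this sentence.
precision, recall, f1 = evaluate(candidates, {"gradient", "loss"})
```

Comparing against a set of labels makes the evaluation independent of term order and of repeated occurrences, which fits a manually curated label list.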

Future Work

Exploring approaches: graph-based, statistical and linguistic, and aspect-based
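
As one illustration of the graph-based direction, here is a rough sketch in the spirit of TextRank (listed under the papers read): words become nodes, co-occurrence within a small window becomes an undirected edge, and a PageRank-style iteration scores each word. The function names, the window size, the damping factor of 0.85, the fixed iteration count and the sample tokens are assumptions for illustration, not settings taken from this project.

```python
from collections import defaultdict


def build_cooccurrence_graph(tokens, window=2):
    """Link two distinct words if they occur within a span of `window` tokens."""
    graph = defaultdict(set)
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            other = tokens[j]
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return graph


def textrank(graph, damping=0.85, iterations=50):
    """Unweighted PageRank-style scoring on an undirected graph."""
    scores = {node: 1.0 for node in graph}
    for _ in range(iterations):
        new_scores = {}
        for node in graph:
            # Each neighbour shares its score equally among its own neighbours.
            rank = sum(scores[nb] / len(graph[nb]) for nb in graph[node])
            new_scores[node] = (1 - damping) + damping * rank
        scores = new_scores
    return scores


# Hypothetical token sequence, assumed already cleaned and POS-filtered.
tokens = ["graph", "ranking", "keyword", "graph", "extraction", "keyword", "graph"]
scores = textrank(build_cooccurrence_graph(tokens))
ranked = sorted(scores, key=scores.get, reverse=True)
```

Words with more distinct neighbours (here "graph" and "keyword") end up with higher scores than words seen in fewer contexts, which is the intuition behind ranking candidate terms this way.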

termex

By Anjali Bhavan
