Tasks done in the current week
1. Papers read
2. Tentative workflow designed
3. Code written
Domain-Relevant Term Extraction
Papers Read
Literature Reviews:
Terminology extraction: an analysis of linguistic and statistical approaches
An overview of graph-based keyword extraction methods and approaches
Term extraction: A Review
Automatic Keyphrase Extraction: A Survey of the State of the Art
Statistical Approaches:
Domain-Specific Term Extraction and Its Application in Text Classification
Term extraction using non-technical corpora as a point of leverage
An Unsupervised Approach to Domain-Specific Term Extraction
Proposed pipeline and targets achieved so far
1. Remove stopwords and other unnecessary characters such as punctuation. (DONE)
2. POS-tag the text and keep only nouns, verbs and adjectives. (DONE)
3. Keep the remaining words as candidate domain-relevant labels and manually remove any irrelevant ones. The result becomes the label set for evaluation. (IN PROGRESS)
4. Perform the actual domain term extraction and analyse the approaches.
5. Evaluate the results, then plan further work and corrections.
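The first two pipeline steps can be sketched roughly as below. This is a minimal illustration, not the code written this week: the stopword list, the toy POS lexicon, and the function names (clean_tokens, filter_by_pos, candidate_terms) are all assumptions made up for the example. A real run would plug in a proper tagger and a full stopword list.

```python
import string

# Tiny illustrative stopword list (assumption; a real run would use a fuller list).
STOPWORDS = {"the", "a", "an", "of", "and", "is", "are", "in", "to", "with", "for", "on"}

# Toy POS lexicon standing in for a real tagger (assumption, for illustration only).
TOY_POS = {
    "deep": "ADJ",
    "neural": "ADJ",
    "network": "NOUN",
    "networks": "NOUN",
    "learn": "VERB",
    "features": "NOUN",
    "quickly": "ADV",
}

# Tags kept by the pipeline: nouns, verbs and adjectives.
KEEP_TAGS = {"NOUN", "VERB", "ADJ"}


def clean_tokens(text):
    """Step 1: lowercase, replace punctuation with spaces, drop stopwords."""
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    tokens = text.lower().translate(table).split()
    return [t for t in tokens if t not in STOPWORDS]


def toy_tagger(token):
    """Hypothetical single-word tagger; unknown words get the tag OTHER."""
    return TOY_POS.get(token, "OTHER")


def filter_by_pos(tokens, tagger):
    """Step 2: keep only tokens whose tag is in KEEP_TAGS."""
    return [t for t in tokens if tagger(t) in KEEP_TAGS]


def candidate_terms(text, tagger=toy_tagger):
    """Steps 1 and 2 combined; returns unique candidates in order of first appearance."""
    seen = []
    for tok in filter_by_pos(clean_tokens(text), tagger):
        if tok not in seen:
            seen.append(tok)
    return seen


candidates = candidate_terms("Deep neural networks learn features quickly, with the network.")
print(candidates)
```

The returned list is what step 3 would then prune by hand to form the label set. Passing the tagger in as a parameter keeps the sketch independent of any particular tagging library.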
Future Work
1. Explore approaches: graph-based, statistics- and linguistics-based, aspect-based
2. Decide on evaluation metrics (still open)
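The choice of metrics is still open, but one common option would be precision, recall and F1 of the extracted terms against the manually curated label set. The sketch below assumes exact string matching between extracted terms and labels. The function name evaluate and the example terms are hypothetical.

```python
def evaluate(extracted, gold):
    """Return (precision, recall, F1) of extracted terms against a gold label set.

    Assumes exact string matching; duplicates are ignored by converting to sets.
    """
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: 3 of the 4 extracted terms appear in the 4-term label set.
scores = evaluate(
    ["network", "features", "quickly", "deep"],
    ["network", "features", "neural", "deep"],
)
print(scores)
```

Exact matching is the strictest choice; partial or lemma-level matching would be a separate design decision.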
By Anjali Bhavan