Domain-relevant term extraction
Tasks done in the current week
1. Papers read
2. Tentative workflow designed
3. Code
Papers Read
Literature reviews:
- Terminology extraction: an analysis of linguistic and statistical approaches
- An overview of graph-based keyword extraction methods and approaches
- Term extraction: A Review
- Automatic Keyphrase Extraction: A Survey of the State of the Art
Statistical Approaches:
- Domain-Specific Term Extraction and Its Application in Text Classification
- Term extraction using non-technical corpora as a point of leverage
- An Unsupervised Approach to Domain-Specific Term Extraction
Linguistic/Semantics-based approaches:
- TextRank: Bringing Order into Texts
- SemanticRank: Ranking Keywords and Sentences Using Semantic Graphs
Proposed Pipeline/Targets Achieved
- Removal of stopwords and unnecessary characters such as punctuation (DONE)
- POS-tagging to get only nouns, verbs and adjectives. (DONE)
- Keeping the remaining words as candidate domain-relevant labels and manually removing the irrelevant ones. The result will be the label set for evaluation. (IN PROGRESS)
- Actual domain term extraction, analysis of approaches.
- Evaluation, further work/corrections
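The first two pipeline steps can be sketched roughly as below. This is a minimal illustration, not the project's actual code: it assumes the tokens have already been tagged with Penn Treebank tags (for example by a tagger such as NLTK's `nltk.pos_tag`), and the stopword set and the name `candidate_terms` are hypothetical placeholders.

```python
import string

# Illustrative stopword subset (assumption); a real run would use a fuller list.
STOPWORDS = {"the", "a", "an", "of", "and", "is", "are", "in", "to", "for", "on", "with"}

# Penn Treebank tag prefixes for nouns, verbs and adjectives.
KEPT_TAG_PREFIXES = ("NN", "VB", "JJ")


def candidate_terms(tagged_tokens):
    """Return lowercased tokens that survive stopword, punctuation and POS filtering."""
    candidates = []
    for token, tag in tagged_tokens:
        # Strip leading/trailing punctuation; pure-punctuation tokens become empty.
        word = token.lower().strip(string.punctuation)
        if not word or word in STOPWORDS:
            continue
        # Keep only nouns (NN*), verbs (VB*) and adjectives (JJ*).
        if tag.startswith(KEPT_TAG_PREFIXES):
            candidates.append(word)
    return candidates


# Hand-tagged toy sentence standing in for real tagger output.
tagged = [
    ("The", "DT"), ("classifier", "NN"), ("extracts", "VBZ"),
    ("relevant", "JJ"), ("terms", "NNS"), ("quickly", "RB"), (".", "."),
]
print(candidate_terms(tagged))
```

The surviving words would then go to the manual cleaning step that produces the evaluation label set.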
Future Work
Exploring graph-based, statistical, linguistic, and aspect-based approaches
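As a rough idea of what a graph-based approach could look like, the sketch below ranks words with a simplified, unweighted scoring scheme in the spirit of TextRank. It is not the method used in this project or the exact algorithm from the paper: the function name `textrank_scores`, the window size, the damping factor, and the fixed iteration count are all assumptions chosen for illustration.

```python
from collections import defaultdict


def textrank_scores(words, window=2, damping=0.85, iterations=30):
    """Score words by running a PageRank-style update on a co-occurrence graph."""
    # Undirected graph: two distinct words are linked if they occur
    # within `window` positions of each other (window=2 means adjacent).
    neighbours = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1 : i + window]:
            if other != word:
                neighbours[word].add(other)
                neighbours[other].add(word)

    scores = {word: 1.0 for word in neighbours}
    for _ in range(iterations):
        new_scores = {}
        for word in neighbours:
            # Each neighbour shares its score equally among its own neighbours.
            rank = sum(scores[n] / len(neighbours[n]) for n in neighbours[word])
            new_scores[word] = (1 - damping) + damping * rank
        scores = new_scores
    return scores


# Toy input: filtered candidate words in document order.
words = ["term", "extraction", "graph", "term", "ranking", "term", "corpus"]
scores = textrank_scores(words)
for word in sorted(scores, key=scores.get, reverse=True):
    print(word, round(scores[word], 3))
```

In this toy input, "term" co-occurs with every other word, so it ends up with the highest score; on real text the input would be the filtered candidates from the preprocessing steps.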
termex
By Anjali Bhavan