1. Dataset acquisition: GENIA corpus
2. Baseline using TF-IDF: Assumptions
3. Number of terms to extract from the scored and ranked candidates: a function of corpus size and the number of annotated domain-relevant terms?
4. Switched from the smaller version of the dataset to the larger, original one: observations
5. Tentative plan of action (PoA)
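
For reference on items 2 and 3, a minimal sketch of what a TF-IDF baseline with a top-k cutoff might look like. Everything here is an assumption for illustration, not the actual baseline: the function name `tfidf_rank`, whitespace tokenization, unigram candidates only, and summing per-document TF-IDF into one corpus-level score. The cutoff `top_k` is left as a plain parameter, since how to set it is the open question in item 3.

```python
import math
from collections import Counter

def tfidf_rank(docs, top_k):
    """Rank unigram candidates by summed TF-IDF (hypothetical baseline)."""
    # Assumption: lowercase + whitespace split stands in for real tokenization.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents that contain each token.
    df = Counter(tok for toks in tokenized for tok in set(toks))

    scores = Counter()
    for toks in tokenized:
        tf = Counter(toks)
        for tok, count in tf.items():
            idf = math.log(n_docs / df[tok])
            # Length-normalized term frequency times IDF, summed over documents.
            scores[tok] += (count / len(toks)) * idf

    # Highest score first; ties broken alphabetically so output is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]

if __name__ == "__main__":
    sample = [
        "the protein binds the receptor",
        "the receptor is a protein",
        "the kinase phosphorylates the protein",
    ]
    for term, score in tfidf_rank(sample, top_k=5):
        print(f"{term}\t{score:.4f}")
```

Tokens that occur in every document get an IDF of zero under this scoring, so they fall to the bottom of the ranking regardless of frequency; whether that is the desired behaviour for domain terms is one of the assumptions to check under item 2.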