A Comprehensive Study on
Medical Information Retrieval Systems
PhD Qualifying Exam
Saeid Balaneshinkordan
advisor:
Dr. Alexander Kotov
March 2016
Clinical Decision Support (CDS) Systems
benefits of CDS systems
- Improve Precision of Clinical Tasks
- Reduce Duration of Clinical Tasks
- Reduce Cost of Clinical Tasks
- ...
Medical information retrieval
- Is one of the tasks a CDS system can be designed to perform;
- Supports clinical decision-making;
- Overcomes the abundance of medical information;
- Deals with narrative and verbose queries instead of keyword-based queries;
- Favors precision over recall.
Medical information retrieval
an example of a medical query
A structured medical query and its corresponding relevant documents:
1. Query Type: Diagnosis
Query Description: A 26-year-old obese woman with a history of bipolar disorder complains that her recent struggles with her weight and eating have caused her to feel depressed. She states that she has recently had difficulty sleeping and feels excessively anxious and agitated. She also states that she has had thoughts of suicide. She often finds herself fidgety and unable to sit still for extended periods of time. Her family tells her that she is increasingly irritable. Her current medications include lithium carbonate and zolpidem.
Query Summary: 26-year-old obese woman with bipolar disorder, on zolpidem and lithium, with recent difficulty sleeping, agitation, suicidal ideation, and irritability.
Query Types:
- Diagnosis (What is the patient's diagnosis?)
- Test (What tests should the patient receive?)
- Treatment (How should the patient be treated?)
Medical information retrieval
general-purpose vs. domain-specific search engines
- General-purpose search engine
  - Retrieve more types of literature
  - Higher recall
- Domain-specific search engine
  - Retrieve documents mainly from Medline (22 million out of 25 million)
  - Higher precision
note: Medline is a bibliographic database concentrated on biomedicine; all of its records are indexed with U.S. National Library of Medicine (NLM®) Medical Subject Headings (MeSH®).
Traditional Medical information retrieval
Medical Subject Headings (MeSH®) in MEDLINE®/PubMed®
MeSH:
- a vocabulary that gives uniformity and consistency to the indexing and cataloging of biomedical literature,
- assists with subject indexing and subject searching.
MEDLINE Subject Indexing:
- determining the subject content of a journal article (or other material),
- describing that content using a controlled vocabulary.
purpose: facilitating search retrieval by eliminating the use of variant terminology for the same concept
Article Title:
The role of coenzyme Q10 in heart failure.
Abstract:
OBJECTIVE: To review the clinical data demonstrating the safety and efficacy of coenzyme Q10 (CoQ10) in heart failure (HF).
DATA SOURCES: Pertinent literature was identified through MEDLINE (1966-January 2005) using the search terms coenzyme Q10, heart failure, antioxidants, and oxidative stress. Only articles written in the English language and evaluating human subjects were used.
DATA SYNTHESIS: HF impairs the ability of the heart to maintain its normal cardiac output. Following an initial insult, cardiac remodeling ensues, resulting in left ventricular dilation and hypertrophy. Oxidative stress is also increased, while CoQ10 levels are decreased in patients with HF. This has led to the hypothesis that CoQ10, an antioxidant, may decrease oxidative stress, impair remodeling, and improve cardiac function.
CONCLUSIONS: Large, well-designed studies on this topic are lacking. The limited data from well-designed trials indicate there may be some minor benefits with CoQ10 therapy in ejection fraction and end diastolic volume. CoQ10 therapy has been shown to be relatively safe with a low incidence of adverse effects.
Publication Types:
Review
MeSH Terms:
Antioxidants/therapeutic use*
Coenzymes
Heart Failure/drug therapy*
Heart Failure/pathology
Heart Failure/physiopathology
Humans
Oxidative Stress/drug effects
Ubiquinone/analogs & derivatives*
Ubiquinone/therapeutic use
Ventricular Remodeling/drug effects
Substances:
Antioxidants
Coenzymes
Ubiquinone
coenzyme Q10
Traditional Medical information retrieval
an example of MeSH indexing
MeSH indexing
- Asterisks indicate a major topic of the article.
- Substances include supplementary concept terms and MeSH terms; the MeSH terms repeat those listed above.
Traditional Medical information retrieval
an example of query translation by PubMed
Information Need: [Blood] Cancer patients who have neutropenia have a greater risk of infection. Your risk increases when your white blood cell count gets low and stays low for a long time.
ref: http://www.chemotherapy.com/side_effects/neutropenia/
Original Query: blood cancer white blood cells
Query Translation by PubMed:
(
  "hematologic neoplasms"[MeSH Terms]
  OR ("hematologic"[All Fields] AND "neoplasms"[All Fields])
  OR "hematologic neoplasms"[All Fields]
  OR ("blood"[All Fields] AND "cancer"[All Fields])
  OR "blood cancer"[All Fields]
)
AND
(
  "leukocytes"[MeSH Terms]
  OR "leukocytes"[All Fields]
  OR ("white"[All Fields] AND "blood"[All Fields] AND "cells"[All Fields])
  OR "white blood cells"[All Fields]
  OR "leukocyte count"[MeSH Terms]
  OR ("leukocyte"[All Fields] AND "count"[All Fields])
  OR "leukocyte count"[All Fields]
  OR ("white"[All Fields] AND "blood"[All Fields] AND "cells"[All Fields])
)
Boolean Operators: AND, OR, NOT can be used to combine query terms
Parentheses: can be used for nesting individual concepts
Asterisk (wildcard): can be used for truncation
Quotation Marks: can be used for phrase searching
Square Brackets: can be used to specify the search field tags
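The syntax elements listed above can be combined programmatically. The following is a minimal sketch, not PubMed's own query builder: the helper names `tag`, `any_of`, and `all_of` are hypothetical, and the truncated term `leukocyt*` is an illustrative assumption.

```python
def tag(term, field):
    """Quote a term for phrase searching and attach a field tag in square brackets."""
    return f'"{term}"[{field}]'


def any_of(*parts):
    """Join alternatives with OR and nest them in parentheses."""
    return "(" + " OR ".join(parts) + ")"


def all_of(*parts):
    """Join required parts with AND and nest them in parentheses."""
    return "(" + " AND ".join(parts) + ")"


# Block for the concept "blood cancer".
cancer_block = any_of(
    tag("hematologic neoplasms", "MeSH Terms"),
    tag("blood cancer", "All Fields"),
)

# Block for the concept "white blood cells".
# The unquoted leukocyt* uses the asterisk for truncation (illustrative only).
wbc_block = any_of(
    tag("leukocytes", "MeSH Terms"),
    tag("white blood cells", "All Fields"),
    "leukocyt*[All Fields]",
)

# Both concept blocks must match, so they are combined with AND.
query = all_of(cancer_block, wbc_block)
print(query)
```

Building one block per concept and combining the blocks with AND mirrors the structure of the translated query for "blood cancer white blood cells".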
Traditional Medical information retrieval
building query blocks in PubMed
Sort options: Most Recent, Relevance, Publication Date, First Author, Last Author, Journal, Title
Sort by Relevance: sorts the retrieved documents based on the term frequency of query terms and MeSH terms in these documents
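As a rough illustration of sorting by term frequency, the sketch below scores each document by how often the query terms occur in its text, plus one point for each query term that is also one of its MeSH terms. This is an assumed toy scoring rule, not PubMed's actual relevance algorithm; the documents and the function name `tf_score` are hypothetical.

```python
from collections import Counter


def tf_score(query_terms, text, mesh_terms):
    """Count query-term occurrences in the text, plus one per matching MeSH term."""
    words = Counter(text.lower().split())
    mesh = {m.lower() for m in mesh_terms}
    return sum(words[t.lower()] + (t.lower() in mesh) for t in query_terms)


# Hypothetical documents: id -> (text, MeSH terms).
docs = {
    "d1": ("leukocyte count in leukemia patients", ["Leukocyte Count", "Leukemia"]),
    "d2": ("leukemia treatment outcomes", ["Leukemia"]),
    "d3": ("heart failure therapy", ["Heart Failure"]),
}

query_terms = ["leukemia", "leukocyte"]

# Sort document ids from highest to lowest score.
ranking = sorted(docs, key=lambda d: tf_score(query_terms, *docs[d]), reverse=True)
print(ranking)
```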
Traditional Medical information retrieval
sorting results in PubMed
1- Does not consider the importance of concepts
In the query "blood cancer white blood cells", the concept "blood cancer" is more important than the concept "white blood cells", but no weighting is applied because PubMed performs a Boolean search.
Traditional Medical information retrieval
drawbacks of PubMed
2- Is not able to deal with verbose free-text queries
For a verbose free-text query such as:
A 26-year-old obese woman with a history of bipolar disorder complains that her recent struggles with her weight and eating have caused her to feel depressed. She states that she has recently had difficulty sleeping and feels excessively anxious and agitated. She also states that she has had thoughts of suicide. She often finds herself fidgety and unable to sit still for extended periods of time. Her family tells her that she is increasingly irritable. Her current medications include lithium carbonate and zolpidem.
PubMed returns no results, even after stop-words are removed.
Traditional Medical information retrieval
drawbacks of PubMed
3- Does not efficiently consider dependency between terms
PubMed matches the exact phrase and the individual terms of each concept in the query, but it does not consider their proximity, and it does not weight phrase matches and term matches.
4- Does not consider relevance feedback documents
Concepts are identified only from the original query, using the MeSH knowledge base.
Traditional Medical information retrieval
drawbacks of PubMed
5- Does not consider concept semantics
For example, a concept with semantic meaning "Sign or Symptom" is considered the same way as a concept with semantic meaning "Social Behavior".
6- Does not consider concept relationships
In PubMed, MeSH concepts are identified in the query, but relationships between these concepts and other concepts are not considered.
What knowledge bases other than MeSH are used in Medical IR?
- Unified Medical Language System (UMLS)
- Systematized Nomenclature of Medicine--Clinical Terms (SNOMED-CT)
- International Classification of Diseases (ICD)
- ...
Unified Medical Language System (UMLS):
- the most comprehensive knowledge base in the medical domain
- contains three major components: the Metathesaurus, the Semantic Network, and the SPECIALIST Lexicon.
UMLS Metathesaurus
- Unifies different knowledge bases such as CPT®, ICD-10-CM, LOINC®, MeSH®, RxNorm, and SNOMED CT®
- Groups synonymous terms into concepts
- Provides a list of concepts with their
  - term representations
  - definitions
  - relationships
  - ...
UMLS Semantic Network
- Categorizes concepts by semantic types
- 113 semantic types (in 2015 version)
- Can be used to reduce the complexity of the Metathesaurus and also for dimension reduction in Medical IR
Semantic types "appropriate" for different medical tasks
ref: Limsopatham, Nut, Craig Macdonald, and Iadh Ounis. "Inferring conceptual relationships to improve medical records search." In Proceedings of the 10th Conference on Open Research Areas in Information Retrieval, pp. 1-8. 2013.
Mapping biomedical text to the UMLS Metathesaurus
- Identifying concepts drawn from a controlled vocabulary
- Different tools such as MetaMap and SemRep can be used.
- Performed in natural-language processing (NLP) and Medical IR applications
Mapping biomedical text to the UMLS Metathesaurus
Example
query: 26-year-old obese woman with bipolar disorder, on zolpidem and lithium, with recent difficulty sleeping, agitation, suicidal ideation, and irritability.
MetaMap Results:
| CUI | Concept Name | Concept Primary Name | Semantic Type |
|---|---|---|---|
| C0439508 | /year | per year | Temporal Concept |
| C0580836 | Old | Old | Temporal Concept |
| C0028754 | OBESE | Obesity | Disease or Syndrome |
| C0043210 | WOMAN | Woman | Population Group |
| C0005586 | Bipolar Disorder | Bipolar Disorder | Mental or Behavioral Dysfunction |
| C0078839 | ZOLPIDEM | zolpidem | Organic Chemical; Pharmacologic Substance |
| C0023870 | LITHIUM | Lithium | Element, Ion, or Isotope; Pharmacologic Substance |
| C0332185 | Recent | Recent | Temporal Concept |
| C0235162 | Difficulty sleeping | Difficulty sleeping | Sign or Symptom |
| C0085631 | AGITATION | Agitation | Sign or Symptom |
| C0424000 | Suicidal Ideation | Feeling suicidal (finding) | Finding |
| C0022107 | IRRITABILITY | Irritable Mood | Finding |
Concept Selection by using Semantic Types
Example
query: 26-year-old obese woman with bipolar disorder, on zolpidem and lithium, with recent difficulty sleeping, agitation, suicidal ideation, and irritability.
Assume the following semantic types are selected for this specific task:
- Disease or Syndrome
- Mental or Behavioral Dysfunction
- Sign or Symptom
List of selected concepts:
| CUI | Concept Name | Concept Primary Name | Semantic Type |
|---|---|---|---|
| C0028754 | OBESE | Obesity | Disease or Syndrome |
| C0005586 | Bipolar Disorder | Bipolar Disorder | Mental or Behavioral Dysfunction |
| C0235162 | Difficulty sleeping | Difficulty sleeping | Sign or Symptom |
| C0085631 | AGITATION | Agitation | Sign or Symptom |
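The selection step in this example can be sketched in a few lines. The data below is copied from the MetaMap results for the example query; the tuple layout and the function name `select_concepts` are assumptions made for illustration, not MetaMap's output format or API.

```python
# (CUI, concept primary name, semantic types) taken from the example MetaMap results.
metamap_results = [
    ("C0439508", "per year", ["Temporal Concept"]),
    ("C0580836", "Old", ["Temporal Concept"]),
    ("C0028754", "Obesity", ["Disease or Syndrome"]),
    ("C0043210", "Woman", ["Population Group"]),
    ("C0005586", "Bipolar Disorder", ["Mental or Behavioral Dysfunction"]),
    ("C0078839", "zolpidem", ["Organic Chemical", "Pharmacologic Substance"]),
    ("C0023870", "Lithium", ["Element, Ion, or Isotope", "Pharmacologic Substance"]),
    ("C0332185", "Recent", ["Temporal Concept"]),
    ("C0235162", "Difficulty sleeping", ["Sign or Symptom"]),
    ("C0085631", "Agitation", ["Sign or Symptom"]),
    ("C0424000", "Feeling suicidal (finding)", ["Finding"]),
    ("C0022107", "Irritable Mood", ["Finding"]),
]

# Semantic types assumed to be appropriate for this task.
selected_types = {
    "Disease or Syndrome",
    "Mental or Behavioral Dysfunction",
    "Sign or Symptom",
}


def select_concepts(concepts, allowed_types):
    """Keep concepts that have at least one semantic type in allowed_types."""
    return [c for c in concepts if allowed_types.intersection(c[2])]


selected = select_concepts(metamap_results, selected_types)
for cui, name, types in selected:
    print(cui, name, types)
```

A concept with several semantic types is kept as soon as one of them is in the selected set; requiring all types to match would be an equally plausible design choice.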
Concept-based Medical Retrieval Approach
Approach 1
- Identify concepts in all the documents in the collection
- Identify concepts in the queries
- Use the statistics of the concepts in the documents and in the query to compute the similarity of each document to the query
- Sort the documents in the collection by this similarity
Representation: bag of words → bag of concepts
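A minimal sketch of ranking with a bag-of-concepts representation follows. Cosine similarity over concept counts is used here only as a stand-in similarity measure; it is not the relevance model of the cited work. The document contents and the CUI `C9999999` are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bags (Counters) of concept identifiers."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Concepts identified in the query (CUIs from the example query).
query = Counter(["C0028754", "C0005586", "C0235162", "C0085631"])

# Hypothetical documents, each already mapped to a list of concept identifiers.
docs = {
    "doc_a": Counter(["C0005586", "C0005586", "C0085631"]),
    "doc_b": Counter(["C0028754"]),
    "doc_c": Counter(["C9999999"]),  # placeholder CUI that is unrelated to the query
}

# Sort the documents by their similarity to the query, highest first.
ranking = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
print(ranking)
```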
Concept-based Medical Retrieval Approach
Approach 1 - Example
ref: Wang, Chunye, and Ramakrishna Akella. "Concept-based relevance models for medical and semantic information retrieval." In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, pp. 173-182. ACM, 2015.
Concept-based Medical Retrieval Approach
Approach 2
- Identify concepts in the queries
- Use the Sequential Dependence Model (SDM) to get the statistics of the query concepts in the collection
- Use the statistics of the concepts in the documents and in the query to compute the similarity of each document to the query
- Sort the documents in the collection by this similarity
ref: Choi, Sungbin, Jinwook Choi, Sooyoung Yoo, Heechun Kim, and Youngho Lee. "Semantic concept-enriched dependence model for medical information retrieval." Journal of biomedical informatics 47 (2014): 18-27.
Indri Query Language:
Ranking function:
SDM assumption: dependency between adjacent query terms.
concept example: elderly patients ventilator associated pneumonia
weight parameters for single terms, ordered phrases and unordered phrases
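The SDM assumption can be made concrete by generating an Indri query for the concept example. The sketch below uses the Indri operators `#weight`, `#combine`, `#1` (exact ordered phrase), and `#uwN` (unordered window of size N). The function name `sdm_query`, the weights 0.85 / 0.10 / 0.05, and the window size 8 are assumptions chosen for illustration, not values taken from the slides.

```python
def sdm_query(terms, w_term=0.85, w_ordered=0.10, w_unordered=0.05, window=8):
    """Build an Indri query string that follows the sequential dependence assumption."""
    unigrams = " ".join(terms)
    if len(terms) < 2:
        # No adjacent pairs exist, so only single terms can be matched.
        return f"#combine({unigrams})"
    # SDM only models dependencies between adjacent query terms.
    pairs = list(zip(terms, terms[1:]))
    ordered = " ".join(f"#1({a} {b})" for a, b in pairs)
    unordered = " ".join(f"#uw{window}({a} {b})" for a, b in pairs)
    return (
        f"#weight( {w_term} #combine({unigrams}) "
        f"{w_ordered} #combine({ordered}) "
        f"{w_unordered} #combine({unordered}) )"
    )


terms = "elderly patients ventilator associated pneumonia".split()
indri_query = sdm_query(terms)
print(indri_query)
```

The three weights correspond to the weight parameters for single terms, ordered phrases, and unordered phrases; in practice they would be tuned rather than fixed.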
Concept Sources and Concept Identification for Medical Query Expansion
[Figure: concept sources and identification methods. Labels shown: Query, Collection, Top-ranked Documents, External Sources, Knowledge-base, Relationship Tables, MetaMap, Direct Identification.]
Parameterized Concept Weighting
Ranking function: linear weighted combination of the matches in document D of the concepts of all concept types in T
Concept weight: linear weighted combination of importance features
Matching function: log of the language modeling estimate for concept κ with Dirichlet smoothing
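One way to write down the three components described for parameterized concept weighting is given below. This is a sketch under assumptions, not the formulas from the original slides: the symbols $T_i$ (a concept type in $T$), $\Phi$ (the set of importance features $\varphi$ with weights $w_{\varphi}$), $tf$ (frequency of a concept in document $D$ or collection $C$), and $\mu$ (the Dirichlet smoothing parameter) are introduced here for illustration.

```latex
\begin{align}
  % Ranking function: weighted matches of all concepts of all concept types
  sc(Q, D) &= \sum_{T_i \in T} \; \sum_{\kappa \in T_i} \lambda(\kappa)\, f(\kappa, D) \\
  % Concept weight: linear weighted combination of importance features
  \lambda(\kappa) &= \sum_{\varphi \in \Phi} w_{\varphi}\, \varphi(\kappa) \\
  % Matching function: log of the Dirichlet-smoothed language model estimate
  f(\kappa, D) &= \log \frac{tf_{\kappa, D} + \mu \, \dfrac{tf_{\kappa, C}}{|C|}}{|D| + \mu}
\end{align}
```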
Concept importance features
Parameterized Concept Weighting
The probability of P being health-related over all the Wikipedia pages:
P: Wikipedia page corresponding to the concept
ref: Soldaini, Luca, Arman Cohan, Andrew Yates, Nazli Goharian, and Ophir Frieder. "Retrieving medical literature for clinical decision support." In Advances in Information Retrieval, pp. 538-549. Springer International Publishing, 2015.
General-purpose knowledge-bases
for query expansion
Optimization Techniques for Medical query expansion
Coordinate Ascent:
- Optimizes along one direction (coordinate) at a time
- Reduces a multivariable optimization problem to a sequence of univariate optimization problems
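The idea of coordinate ascent can be sketched as follows, assuming each univariate problem is solved by a simple search over a fixed grid of candidate values. The toy objective is hypothetical; in a retrieval setting it would be replaced by an evaluation measure computed with the current concept weights.

```python
def coordinate_ascent(objective, weights, candidates, sweeps=10):
    """Maximize objective(weights) by changing one coordinate at a time."""
    weights = list(weights)
    best = objective(weights)
    for _ in range(sweeps):
        improved = False
        for i in range(len(weights)):
            # Univariate problem: search over coordinate i, all others held fixed.
            for value in candidates:
                trial = weights[:i] + [value] + weights[i + 1:]
                score = objective(trial)
                if score > best:
                    best, weights, improved = score, trial, True
        if not improved:
            break  # A full sweep without improvement: stop early.
    return weights, best


def toy_objective(w):
    """Hypothetical objective with its maximum at (0.3, 0.7)."""
    return -((w[0] - 0.3) ** 2 + (w[1] - 0.7) ** 2)


grid = [i / 10 for i in range(11)]  # candidate values 0.0, 0.1, ..., 1.0
best_weights, best_score = coordinate_ascent(toy_objective, [0.0, 0.0], grid)
print(best_weights, best_score)
```

Coordinate ascent needs no gradients, which is convenient when the objective is a retrieval metric, but it can stop at a local optimum, so several starting points are often tried.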
Conclusions
- Medical-domain-specific IR systems generally have higher precision and lower recall.
- Traditional Medical IR systems have a number of drawbacks despite their popularity.
- UMLS, the most comprehensive medical knowledge base, was discussed.
- Relationships between concepts are a source of concept expansion.
- UMLS semantic types can be used to filter out redundant concepts.
- Optimization techniques such as coordinate ascent should be used to find the optimal weights of concepts.
Thank You!
By Saeid Balaneshin Kordan