Medical Information Retrieval
CSC 5710 "Design of Intelligent Information Retrieval Systems"
Saeid Balaneshin
saeid@wayne.edu
Wayne State University
Why is Medical Information Retrieval important?
People diagnosed with blood cancer per hour in the US
Deaths from blood cancer per hour in the US
Clinical Decision Support (CDS) Systems
- Assists clinicians, staff and patients by providing information to enhance health and health care.
- CDS consists of a variety of tools such as:
- alerts and reminders to be used either by care providers or patients;
- clinical guidelines;
- condition-specific order sets;
- focused patient data reports and summaries;
- documentation templates;
- support in the diagnosis process and contextually relevant reference information;
- ...
definition of CDS systems
Clinical Decision Support (CDS) Systems
- Improve Precision of Clinical Tasks
- Reduce Duration of Clinical Tasks
- Reduce Cost of Clinical Tasks
- ...
benefits of CDS systems
Medical information retrieval
- One of the tasks a CDS system can be designed for;
- Supports clinical decision-making;
- Overcomes the abundance of medical information;
- Deals with narrative, verbose queries instead of keyword-based queries;
- Favors precision over recall.
- A structured medical query and its corresponding relevant documents:
Query Type: 1. Diagnosis
Query Description: A 26-year-old obese woman with a history of bipolar disorder complains that her recent struggles with her weight and eating have caused her to feel depressed. She states that she has recently had difficulty sleeping and feels excessively anxious and agitated. She also states that she has had thoughts of suicide. She often finds herself fidgety and unable to sit still for extended periods of time. Her family tells her that she is increasingly irritable. Her current medications include lithium carbonate and zolpidem.
Query Summary: 26-year-old obese woman with bipolar disorder, on zolpidem and lithium, with recent difficulty sleeping, agitation, suicidal ideation, and irritability.
relevant documents
- Query Types:
- Diagnosis (What is the patient's diagnosis?)
- Test (What tests should the patient receive?)
- Treatment (How should the patient be treated?)
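A structured query of this kind can be held in a small record. The sketch below is only an illustration: the class and field names (`MedicalQuery`, `QueryType`, `number`, `description`, `summary`) are assumptions made for this example and do not come from any particular collection or toolkit.

```python
from dataclasses import dataclass
from enum import Enum


class QueryType(Enum):
    # The three query types listed on the slide, with their guiding questions.
    DIAGNOSIS = "What is the patient's diagnosis?"
    TEST = "What tests should the patient receive?"
    TREATMENT = "How should the patient be treated?"


@dataclass(frozen=True)
class MedicalQuery:
    number: int
    query_type: QueryType
    description: str  # verbose narrative form of the case
    summary: str      # condensed form of the same case


topic = MedicalQuery(
    number=1,
    query_type=QueryType.DIAGNOSIS,
    # Only the first sentence of the description is kept here for brevity.
    description=(
        "A 26-year-old obese woman with a history of bipolar disorder "
        "complains that her recent struggles with her weight and eating "
        "have caused her to feel depressed."
    ),
    summary=(
        "26-year-old obese woman with bipolar disorder, on zolpidem and "
        "lithium, with recent difficulty sleeping, agitation, suicidal "
        "ideation, and irritability."
    ),
)

print(topic.query_type.name, "-", topic.query_type.value)
```

A retrieval system can then choose to search with either `description` or `summary`, which is exactly the verbose versus condensed trade-off the slide points at.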
Medical information retrieval
an example of a medical query
Medical information retrieval
general-purpose vs. domain-specific search engines
- General-purpose search engine:
- retrieves more types of literature
- higher recall
- Domain-specific search engine:
- retrieves documents mainly from Medline (22 million out of 25 million citations)
- higher precision
Note: Medline is a bibliographic database concentrated on biomedicine; all of its records are indexed with U.S. National Library of Medicine (NLM®) Medical Subject Headings (MeSH®).
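The precision versus recall trade-off can be made concrete with a small worked example. The document identifiers and result lists below are made up purely for illustration; only the two standard definitions are being shown.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one result list."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical judgments: four documents are relevant to the query.
relevant = {"d1", "d2", "d3", "d4"}

# A broad result list: finds 3 of the 4 relevant documents among 8 results.
broad = ["d1", "d2", "d3", "d5", "d6", "d7", "d8", "d9"]

# A narrow result list: only 2 results, both relevant.
narrow = ["d1", "d2"]

print(precision_recall(broad, relevant))   # (0.375, 0.75)
print(precision_recall(narrow, relevant))  # (1.0, 0.5)
```

The broad list behaves like the general-purpose engine (higher recall, lower precision), the narrow list like the domain-specific one.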
Traditional Medical information retrieval
PubMed citations come from:
- Medline indexed journals,
- journals/manuscripts deposited in PubMed Central (PMC®),
- National Center for Biotechnology Information (NCBI®) Bookshelf.
biomedical search engines
Medical Subject Headings (MeSH®) in MEDLINE®/PubMed®
Traditional Medical information retrieval
MeSH:
- a vocabulary that gives uniformity and consistency to the indexing and cataloging of biomedical literature,
- assists with subject indexing and subject searching.
MEDLINE Subject Indexing:
- determining the subject content of a journal article (or other material),
- describing that content using a controlled vocabulary.
purpose: facilitating search retrieval by eliminating the use of variant terminology for the same concept
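The effect of a controlled vocabulary can be sketched as a lookup from variant wording to a single heading. The table below is a toy stand-in, not the real MeSH entry-term data; its two mappings are the ones that appear in the PubMed query translation example in these slides.

```python
# Toy entry-term table: variant wording -> one controlled heading.
# A real system would load this from the MeSH vocabulary files.
ENTRY_TERMS = {
    "blood cancer": "hematologic neoplasms",
    "hematologic neoplasms": "hematologic neoplasms",
    "white blood cells": "leukocytes",
    "leukocytes": "leukocytes",
}


def to_heading(term):
    """Map a free-text term to its controlled heading, or None if unknown."""
    return ENTRY_TERMS.get(term.strip().lower())


# Two different wordings land on the same index heading.
print(to_heading("Blood Cancer"))
print(to_heading("hematologic neoplasms"))
```

Because both wordings resolve to the same heading, a search on either one can reach documents indexed under that heading.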
Article Title:
The role of coenzyme Q10 in heart failure.
Abstract:
OBJECTIVE: To review the clinical data demonstrating the safety and efficacy of coenzyme Q10 (CoQ10) in heart failure (HF).
DATA SOURCES: Pertinent literature was identified through MEDLINE (1966-January 2005) using the search terms coenzyme Q10, heart failure, antioxidants, and oxidative stress. Only articles written in the English language and evaluating human subjects were used.
DATA SYNTHESIS: HF impairs the ability of the heart to maintain its normal cardiac output. Following an initial insult, cardiac remodeling ensues, resulting in left ventricular dilation and hypertrophy. Oxidative stress is also increased, while CoQ10 levels are decreased in patients with HF. This has led to the hypothesis that CoQ10, an antioxidant, may decrease oxidative stress, impair remodeling, and improve cardiac function.
CONCLUSIONS: Large, well-designed studies on this topic are lacking. The limited data from well-designed trials indicate there may be some minor benefits with CoQ10 therapy in ejection fraction and end diastolic volume. CoQ10 therapy has been shown to be relatively safe with a low incidence of adverse effects.
Publication Types:
Review
MeSH Terms:
Antioxidants/therapeutic use*
Coenzymes
Heart Failure/drug therapy*
Heart Failure/pathology
Heart Failure/physiopathology
Humans
Oxidative Stress/drug effects
Ubiquinone/analogs & derivatives*
Ubiquinone/therapeutic use
Ventricular Remodeling/drug effects
Substances:
Antioxidants
Coenzymes
Ubiquinone
coenzyme Q10
an example of MeSH indexing
Traditional Medical information retrieval
MeSH indexing
- An asterisk marks a major topic of the article.
- Substances include supplementary concept terms and MeSH terms, which are repeated above.
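The MeSH term lines in the example follow a simple pattern: a heading, an optional subheading after a slash, and an optional trailing asterisk for a major topic. A minimal parsing sketch, with the names `MeshTerm` and `parse_mesh_term` chosen only for this illustration:

```python
from typing import NamedTuple, Optional


class MeshTerm(NamedTuple):
    descriptor: str           # main heading, e.g. "Heart Failure"
    qualifier: Optional[str]  # subheading, e.g. "drug therapy"
    major: bool               # True when the line ends with an asterisk


def parse_mesh_term(text):
    """Split a 'Descriptor/qualifier*' line into its parts."""
    text = text.strip()
    major = text.endswith("*")
    if major:
        text = text[:-1]
    descriptor, _, qualifier = text.partition("/")
    return MeshTerm(descriptor, qualifier or None, major)


for line in ["Heart Failure/drug therapy*", "Heart Failure/pathology", "Humans"]:
    print(parse_mesh_term(line))
```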
an example of query translation by PubMed
Traditional Medical information retrieval
Information Need:
[Blood] Cancer patients who have neutropenia have a greater risk of infection. Your risk increases when your white blood cell count gets low and stays low for a long time.
ref: http://www.chemotherapy.com/side_effects/neutropenia/
Original Query:
blood cancer white blood cells
Query Translation by PubMed:
("hematologic neoplasms"[MeSH Terms] OR ("hematologic"[All Fields] AND "neoplasms"[All Fields]) OR "hematologic neoplasms"[All Fields] OR ("blood"[All Fields] AND "cancer"[All Fields]) OR "blood cancer"[All Fields])
AND
("leukocytes"[MeSH Terms] OR "leukocytes"[All Fields] OR ("white"[All Fields] AND "blood"[All Fields] AND "cells"[All Fields]) OR "white blood cells"[All Fields] OR "leukocyte count"[MeSH Terms] OR ("leukocyte"[All Fields] AND "count"[All Fields]) OR "leukocyte count"[All Fields] OR ("white"[All Fields] AND "blood"[All Fields] AND "cells"[All Fields]))
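The shape of this translation can be imitated in a few lines. This is a rough sketch and not PubMed's actual mapping algorithm: `MESH_LOOKUP` is a hand-written table holding only the headings visible in this example, and the function names are hypothetical. The sketch reproduces the "blood cancer" block exactly; for "white blood cells" it produces the same kinds of clauses but not necessarily PubMed's ordering.

```python
# Hand-written stand-in for PubMed's term mapping (assumption for this sketch).
MESH_LOOKUP = {
    "blood cancer": ["hematologic neoplasms"],
    "white blood cells": ["leukocytes", "leukocyte count"],
}


def expand_phrase(phrase):
    """All Fields clauses for a phrase: the AND of its words, then the exact phrase."""
    exact = f'"{phrase}"[All Fields]'
    words = phrase.split()
    if len(words) == 1:
        return [exact]
    anded = " AND ".join(f'"{w}"[All Fields]' for w in words)
    return [f"({anded})", exact]


def translate_concept(phrase):
    """Build one OR block: mapped MeSH headings first, then the original wording."""
    parts = []
    for heading in MESH_LOOKUP.get(phrase, []):
        parts.append(f'"{heading}"[MeSH Terms]')
        if heading != phrase:
            parts.extend(expand_phrase(heading))
    parts.extend(expand_phrase(phrase))
    return "(" + " OR ".join(parts) + ")"


def translate(concepts):
    """AND the per-concept blocks together."""
    return " AND ".join(translate_concept(c) for c in concepts)


print(translate(["blood cancer", "white blood cells"]))
```

Note that the sketch takes the concepts as an already segmented list; splitting the raw query "blood cancer white blood cells" into those two concepts is a separate step that is not shown.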
building query blocks in PubMed
Traditional Medical information retrieval
Boolean Operators: AND, OR, NOT can be used to combine query terms
Parentheses: can be used for nesting individual concepts
Asterisk (wildcard): can be used for truncation
Quotation Marks: can be used for phrase searching
Square Brackets: can be used to specify the search field tags
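These building blocks compose mechanically, which a few string helpers can show. The helper names (`field`, `any_of`, `all_of`, `but_not`) are invented for this sketch, and the final query, including the excluded "animals" heading, is only a hypothetical example of the syntax rather than a recommended search.

```python
def field(term, tag, phrase=True):
    """Attach a field tag in square brackets; quote the term for phrase searching."""
    text = f'"{term}"' if phrase else term
    return f"{text}[{tag}]"


def any_of(*parts):
    """Combine parts with OR, nested in parentheses."""
    return "(" + " OR ".join(parts) + ")"


def all_of(*parts):
    """Combine parts with AND, nested in parentheses."""
    return "(" + " AND ".join(parts) + ")"


def but_not(include, exclude):
    """Exclude one part from another with NOT."""
    return f"({include} NOT {exclude})"


query = but_not(
    all_of(
        any_of(
            field("hematologic neoplasms", "MeSH Terms"),
            field("blood cancer", "All Fields"),
        ),
        # Truncation: the asterisk is left unquoted so it acts as a wildcard.
        field("leukocyt*", "All Fields", phrase=False),
    ),
    field("animals", "MeSH Terms"),
)
print(query)
```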
sorting results in PubMed
Traditional Medical information retrieval
- Most Recent
- Relevance
- Publication Date
- First Author
- Last Author
- Journal
- Title
Sort by Relevance: ranks the retrieved documents by the frequency of the query terms and MeSH terms in those documents.
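The idea of ranking by term frequency can be sketched as follows. This is a toy illustration of the principle only, not PubMed's actual relevance algorithm; the documents are invented, and each document's text is assumed to already contain its title, abstract, and MeSH terms as plain words.

```python
from collections import Counter


def tf_score(query_terms, doc_text):
    """Sum of raw occurrence counts of the query terms in the document."""
    counts = Counter(doc_text.lower().split())
    return sum(counts[t.lower()] for t in query_terms)


def rank(query_terms, docs):
    """Return document ids sorted by descending term-frequency score."""
    return sorted(docs, key=lambda d: tf_score(query_terms, docs[d]), reverse=True)


# Hypothetical mini-collection.
docs = {
    "a": "blood cancer and blood cells",
    "b": "heart failure review",
    "c": "cancer therapy",
}
print(rank(["blood", "cancer"], docs))  # ['a', 'c', 'b']
```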
drawbacks of PubMed
Traditional Medical information retrieval
1- Does not consider the importance of concepts
In the query "blood cancer white blood cells", the concept "blood cancer" is more important than the concept "white blood cells", but no weighting is applied because PubMed performs a Boolean search.
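What concept weighting would add over a Boolean match can be shown with a small scoring function. The weights and documents below are hypothetical, chosen only to illustrate the contrast; the slides do not specify how weights should be obtained.

```python
def weighted_score(doc_text, concept_weights):
    """Add up the weights of the concepts that occur in the document."""
    text = doc_text.lower()
    return sum(w for concept, w in concept_weights.items() if concept in text)


def boolean_match(doc_text, concepts):
    """Boolean AND: the document matches only if every concept occurs."""
    text = doc_text.lower()
    return all(c in text for c in concepts)


# Hypothetical weights: "blood cancer" counts twice as much.
weights = {"blood cancer": 2, "white blood cells": 1}

docs = {
    "d1": "blood cancer outcomes",
    "d2": "white blood cells in healthy adults",
    "d3": "white blood cells in blood cancer patients",
}

ranking = sorted(docs, key=lambda d: weighted_score(docs[d], weights), reverse=True)
print(ranking)  # ['d3', 'd1', 'd2']
print([d for d in docs if boolean_match(docs[d], weights)])  # ['d3']
```

The Boolean search returns only the document containing both concepts, while the weighted score still ranks the partially matching documents and puts the one about the more important concept first.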
2- Cannot deal with verbose free-text queries
For a verbose free-text query such as:
A 26-year-old obese woman with a history of bipolar disorder complains that her recent struggles with her weight and eating have caused her to feel depressed. She states that she has recently had difficulty sleeping and feels excessively anxious and agitated. She also states that she has had thoughts of suicide. She often finds herself fidgety and unable to sit still for extended periods of time. Her family tells her that she is increasingly irritable. Her current medications include lithium carbonate and zolpidem.
PubMed returns no results, even after stop-words are removed.
drawbacks of PubMed
Traditional Medical information retrieval
3- Does not efficiently consider dependency between terms
PubMed matches the exact phrase and the individual terms of each concept in the query, but not their proximity. Exact phrases and individual terms are also matched without any weighting.
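Proximity sits between the two cases PubMed handles (exact phrase and scattered terms). One simple way to measure it, given here as an assumed illustration rather than a method from the slides, is the width of the smallest text window that contains all terms of a concept.

```python
def smallest_window(text, terms):
    """Width (in tokens) of the smallest window containing all terms, or None."""
    tokens = text.lower().split()
    wanted = {t.lower() for t in terms}
    best = None
    for start, tok in enumerate(tokens):
        if tok not in wanted:
            continue
        seen = set()
        for end in range(start, len(tokens)):
            if tokens[end] in wanted:
                seen.add(tokens[end])
            if seen == wanted:
                width = end - start + 1
                if best is None or width < best:
                    best = width
                break
    return best


print(smallest_window("white blood cells were counted", ["white", "cells"]))  # 3
print(smallest_window("white paint on prison cells", ["white", "cells"]))     # 5
```

Both texts contain the two terms, so a Boolean AND treats them alike; the window width separates the near match from the distant one.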
4- Does not consider relevance feedback documents
Concepts are identified only from the original query, using the MeSH knowledge base.
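Using feedback documents typically means adding terms found in them to the query. The sketch below shows one very simple form of this, frequency-based expansion; the stop-word list, the feedback texts, and the function name `expand_query` are all assumptions for the example.

```python
from collections import Counter

# Tiny stop-word list, just enough for this example.
STOPWORDS = {"the", "of", "in", "and", "a", "with", "is"}


def expand_query(query_terms, feedback_docs, k=2):
    """Append the k most frequent non-query, non-stop-word terms from feedback docs."""
    counts = Counter()
    for text in feedback_docs:
        counts.update(t for t in text.lower().split() if t not in STOPWORDS)
    for t in query_terms:
        counts.pop(t.lower(), None)
    new_terms = [t for t, _ in counts.most_common(k)]
    return list(query_terms) + new_terms


# Hypothetical feedback documents for the query "blood cancer".
feedback = [
    "neutropenia increases infection risk in blood cancer",
    "neutropenia and infection in leukemia",
    "neutropenia management",
]
print(expand_query(["blood", "cancer"], feedback))
```

Here the expansion picks up "neutropenia" and "infection", terms that never appear in the original query.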
drawbacks of PubMed
Traditional Medical information retrieval
5- Does not consider concept semantics
For example, a concept with semantic meaning "Sign or Symptom" is considered the same way as a concept with semantic meaning "Social Behavior".
6- Does not consider concept relationships
In PubMed, MeSH concepts are identified in the query, but relationships between these concepts and other concepts are not considered.
Which knowledge bases other than MeSH are used in Medical IR?
- Unified Medical Language System (UMLS)
- Systematized Nomenclature of Medicine--Clinical Terms (SNOMED-CT)
- International Classification of Diseases (ICD)
- ...
Unified Medical Language System (UMLS):
- the most comprehensive knowledge base in the medical domain
- contains three major components: the Metathesaurus, the Semantic Network, and the SPECIALIST Lexicon.
Unified Medical Language System (UMLS)
UMLS Metathesaurus:
- Unifies different knowledge bases such as CPT®, ICD-10-CM, LOINC®, MeSH®, RxNorm, and SNOMED CT®
- Groups synonymous terms into concepts
- Provides a list of concepts with their:
- term representations
- definitions
- relationships
- ...
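Grouping synonymous terms under one concept can be pictured as a two-way index: concept identifier to concept record, and term string to concept identifier. The classes below are a toy sketch with hypothetical names; the single entry uses the CUI, names, and semantic type of the "Obesity" row in the MetaMap example in these slides.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    cui: str                  # concept unique identifier
    primary_name: str
    terms: set = field(default_factory=set)  # synonymous term strings
    semantic_types: tuple = ()


class MiniMetathesaurus:
    """Toy index: term string -> CUI -> concept record."""

    def __init__(self):
        self._by_cui = {}
        self._by_term = {}

    def add(self, concept):
        self._by_cui[concept.cui] = concept
        for term in concept.terms | {concept.primary_name}:
            self._by_term[term.lower()] = concept.cui

    def lookup(self, term):
        """Return the concept for a term (case-insensitive), or None."""
        cui = self._by_term.get(term.lower())
        return self._by_cui.get(cui)


meta = MiniMetathesaurus()
meta.add(Concept("C0028754", "Obesity", {"obese"}, ("Disease or Syndrome",)))

print(meta.lookup("OBESE").cui)
print(meta.lookup("obesity").primary_name)
```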
Unified Medical Language System (UMLS)
UMLS Semantic Network:
- Categorizes concepts by semantic types
- 113 semantic types (in 2015 version)
- can be used to reduce the complexity of the Metathesaurus and also for dimension reduction in Medical IR
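Dimension reduction with semantic types amounts to representing a text by counts over the semantic types instead of over individual concepts. The following sketch assumes a small hand-copied mapping (seven concepts from the MetaMap example in these slides); a real system would read the mapping from the UMLS files.

```python
from collections import Counter

# CUI -> semantic types, copied from the MetaMap example in these slides.
CONCEPT_TYPES = {
    "C0439508": ["Temporal Concept"],
    "C0580836": ["Temporal Concept"],
    "C0332185": ["Temporal Concept"],
    "C0235162": ["Sign or Symptom"],
    "C0085631": ["Sign or Symptom"],
    "C0424000": ["Finding"],
    "C0022107": ["Finding"],
}


def semantic_type_vector(cuis):
    """Count semantic types over a list of concept identifiers."""
    counts = Counter()
    for cui in cuis:
        counts.update(CONCEPT_TYPES.get(cui, []))
    return counts


vec = semantic_type_vector(list(CONCEPT_TYPES))
print(dict(vec))
```

Seven concept dimensions collapse into three type dimensions here; over the whole Metathesaurus the same step maps the full concept inventory onto the fixed set of semantic types.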
Unified Medical Language System (UMLS)
Mapping biomedical text to the UMLS Metathesaurus:
- Identifying concepts drawn from a controlled vocabulary
- Different tools such as MetaMap and SemRep can be used.
- Performed in natural-language processing (NLP) and Medical IR applications
Unified Medical Language System (UMLS)
Mapping biomedical text to the UMLS Metathesaurus:
example
query: 26-year-old obese woman with bipolar disorder, on zolpidem and lithium, with recent difficulty sleeping, agitation, suicidal ideation, and irritability.
MetaMap Results:
CUI | Concept Name | Concept Primary Name | Semantic Type |
---|---|---|---|
C0439508 | /year | per year | Temporal Concept |
C0580836 | Old | Old | Temporal Concept |
C0028754 | OBESE | Obesity | Disease or Syndrome |
C0043210 | WOMAN | Woman | Population Group |
C0005586 | Bipolar Disorder | Bipolar Disorder | Mental or Behavioral Dysfunction |
C0078839 | ZOLPIDEM | zolpidem | Organic Chemical; Pharmacologic Substance |
C0023870 | LITHIUM | Lithium | Element, Ion, or Isotope; Pharmacologic Substance |
C0332185 | Recent | Recent | Temporal Concept |
C0235162 | Difficulty sleeping | Difficulty sleeping | Sign or Symptom |
C0085631 | AGITATION | Agitation | Sign or Symptom |
C0424000 | Suicidal Ideation | Feeling suicidal (finding) | Finding |
C0022107 | IRRITABILITY | Irritable Mood | Finding |
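Once a query has been mapped to concepts, the semantic types allow different concepts to be treated differently. As one assumed illustration, the sketch below keeps only concepts whose semantic type is in a chosen set. The rows are copied from the MetaMap results of this example; the set `KEEP_TYPES` is a hypothetical choice, not a recommendation from the slides, and no MetaMap API is called.

```python
# (CUI, primary name, semantic types), copied from the MetaMap results.
METAMAP_ROWS = [
    ("C0439508", "per year", ["Temporal Concept"]),
    ("C0580836", "Old", ["Temporal Concept"]),
    ("C0028754", "Obesity", ["Disease or Syndrome"]),
    ("C0043210", "Woman", ["Population Group"]),
    ("C0005586", "Bipolar Disorder", ["Mental or Behavioral Dysfunction"]),
    ("C0078839", "zolpidem", ["Organic Chemical", "Pharmacologic Substance"]),
    ("C0023870", "Lithium", ["Element, Ion, or Isotope", "Pharmacologic Substance"]),
    ("C0332185", "Recent", ["Temporal Concept"]),
    ("C0235162", "Difficulty sleeping", ["Sign or Symptom"]),
    ("C0085631", "Agitation", ["Sign or Symptom"]),
    ("C0424000", "Feeling suicidal (finding)", ["Finding"]),
    ("C0022107", "Irritable Mood", ["Finding"]),
]

# Hypothetical choice of semantic types considered clinically informative.
KEEP_TYPES = {
    "Disease or Syndrome",
    "Mental or Behavioral Dysfunction",
    "Pharmacologic Substance",
    "Sign or Symptom",
    "Finding",
}


def clinical_concepts(rows, keep=KEEP_TYPES):
    """Keep concepts that have at least one semantic type in the keep set."""
    return [(cui, name) for cui, name, types in rows if keep.intersection(types)]


for cui, name in clinical_concepts(METAMAP_ROWS):
    print(cui, name)
```

With this choice the temporal and population concepts ("per year", "Old", "Recent", "Woman") drop out and eight concepts remain.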
How can knowledge bases improve retrieval performance?
approach 1
How can knowledge bases improve retrieval performance?
approach 2
Conclusions
- The need for a domain-specific IR system in medical applications is discussed.
- Traditional medical IR systems are introduced.
- PubMed, the most popular medical IR system, is discussed in detail.
- Drawbacks of PubMed are presented.
- Medical knowledge bases are introduced.
- UMLS, the most comprehensive medical knowledge base, is discussed.
- Finally, two approaches to using UMLS in a medical IR system are discussed.
By Saeid Balaneshin Kordan