Medical Information Retrieval

CSC 5710 "Design of Intelligent Information Retrieval Systems"

Saeid Balaneshin

saeid@wayne.edu

Wayne State University

Why is Medical Information Retrieval important?

Diagnosed with Blood Cancer per Hour in the US

Deaths from Blood Cancer per Hour in the US

Clinical Decision Support (CDS) Systems

  • Assists clinicians, staff, and patients by providing information to enhance health and health care.
  • CDS consists of a variety of tools, such as:
    • alerts and reminders to be used either by care providers or patients;
    • clinical guidelines;
    • condition-specific order sets;
    • focused patient data reports and summaries;
    • documentation templates;
    • support in the diagnostic process;
    • contextually relevant reference information;
    • ...

definition of CDS systems

Clinical Decision Support (CDS) Systems

  • Improve Precision of Clinical Tasks
  • Reduce Duration of Clinical Tasks
  • Reduce Cost of Clinical Tasks
  • ...

benefits of CDS systems

Medical information retrieval

  • One of the tasks a CDS system can be designed to perform;

  • Supports clinical decision-making;
  • Helps clinicians cope with the abundance of medical information;

  • Deals with narrative, verbose queries instead of keyword-based queries;
  • Favors precision over recall;
  • a structured medical query and its corresponding relevant documents:
    1. Diagnosis
       Description: A 26-year-old obese woman with a history of bipolar disorder complains that her recent struggles with her weight and eating have caused her to feel depressed. She states that she has recently had difficulty sleeping and feels excessively anxious and agitated. She also states that she has had thoughts of suicide. She often finds herself fidgety and unable to sit still for extended periods of time. Her family tells her that she is increasingly irritable. Her current medications include lithium carbonate and zolpidem.
       Summary: 26-year-old obese woman with bipolar disorder, on zolpidem and lithium, with recent difficulty sleeping, agitation, suicidal ideation, and irritability.

Query Type

Query Description

Query Summary

relevant documents

  • Query Types:
    • Diagnosis (What is the patient's diagnosis?)
    • Test (What tests should the patient receive?)
    • Treatment (How should the patient be treated?)

Medical information retrieval

an example of a medical query

Medical information retrieval

general-purpose vs. domain-specific search engines

  • General-purpose search engines
    • retrieve more types of literature
    • higher recall
  • Domain-specific search engines
    • retrieve documents mainly from MEDLINE (22 million out of 25 million)
    • higher precision

note: MEDLINE is a bibliographic database concentrated on biomedicine; all of its records are indexed with U.S. National Library of Medicine (NLM®) Medical Subject Headings (MeSH®).
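The recall/precision trade-off between the two kinds of engines can be made concrete with the standard definitions: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that are retrieved. The sketch below uses hypothetical toy result lists, not measurements of any real engine: a broad, noisy result list versus a narrow, on-topic one.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for collections of document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical toy judgments, not measurements of any real search engine.
relevant = {"d1", "d2", "d3", "d4"}
general_results = ["d1", "d2", "d3", "x1", "x2", "x3", "x4", "x5"]  # broad, noisy
domain_results = ["d1", "d2"]  # narrow, on-topic

print(precision_recall(general_results, relevant))  # (0.375, 0.75)
print(precision_recall(domain_results, relevant))   # (1.0, 0.5)
```

A system that "favors precision over recall" would prefer the second result list, even though it misses half of the relevant documents.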


Traditional Medical information retrieval

PubMed citations come from:

  1. Medline indexed journals,
  2. journals/manuscripts deposited in PubMed Central (PMC®), 
  3. National Center for Biotechnology Information (NCBI®) Bookshelf.

Two main biomedical search engines are provided by the U.S. National Library of Medicine (NLM®):

PubMed and Entrez

biomedical search engines

Medical Subject Headings (MeSH®) in MEDLINE®/PubMed®

Traditional Medical information retrieval

MeSH:

  1. a vocabulary that gives uniformity and consistency to the indexing and cataloging of biomedical literature,
  2. assists with subject indexing and subject searching.

MEDLINE Subject Indexing:

  1. determining the subject content of a journal article (or other material),
  2. describing that content using a controlled vocabulary. 

purpose: facilitating retrieval by eliminating the use of variant terminology for the same concept
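The idea of eliminating variant terminology can be sketched as a lookup from variant (entry) terms to one preferred heading. This is a minimal sketch, assuming a hypothetical miniature table: the two mappings ("blood cancer" to Hematologic Neoplasms, "white blood cells" to Leukocytes) are taken from the PubMed query translation example in these slides, and `ENTRY_TERMS` and `normalize` are illustrative names, not part of any NLM tool.

```python
from typing import Optional

# Hypothetical miniature entry-term table. The mappings come from the PubMed
# query translation example in these slides, not from the MeSH files.
ENTRY_TERMS = {
    "blood cancer": "Hematologic Neoplasms",
    "hematologic neoplasms": "Hematologic Neoplasms",
    "white blood cells": "Leukocytes",
    "leukocytes": "Leukocytes",
}


def normalize(term: str) -> Optional[str]:
    """Map a variant term to its controlled-vocabulary heading (None if unknown)."""
    return ENTRY_TERMS.get(term.strip().lower())


print(normalize("Blood cancer"))       # Hematologic Neoplasms
print(normalize("White Blood Cells"))  # Leukocytes
print(normalize("neutropenia"))        # None (not in this toy table)
```

Because both variants resolve to the same heading, a document indexed under "Hematologic Neoplasms" can be found whichever wording the searcher uses.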

Article Title:
The role of coenzyme Q10 in heart failure.

Abstract:
OBJECTIVE: To review the clinical data demonstrating the safety and efficacy of coenzyme Q10 (CoQ10) in heart failure (HF).

DATA SOURCES: Pertinent literature was identified through MEDLINE (1966-January 2005) using the search terms coenzyme Q10, heart failure, antioxidants, and oxidative stress. Only articles written in the English language and evaluating human subjects were used.

DATA SYNTHESIS: HF impairs the ability of the heart to maintain its normal cardiac output. Following an initial insult, cardiac remodeling ensues, resulting in left ventricular dilation and hypertrophy. Oxidative stress is also increased, while CoQ10 levels are decreased in patients with HF. This has led to the hypothesis that CoQ10, an antioxidant, may decrease oxidative stress, impair remodeling, and improve cardiac function.

CONCLUSIONS: Large, well-designed studies on this topic are lacking. The limited data from well-designed trials indicate there may be some minor benefits with CoQ10 therapy in ejection fraction and end diastolic volume. CoQ10 therapy has been shown to be relatively safe with a low incidence of adverse effects.

Publication Types:
Review

MeSH Terms:
Antioxidants/therapeutic use*
Coenzymes
Heart Failure/drug therapy*
Heart Failure/pathology
Heart Failure/physiopathology
Humans
Oxidative Stress/drug effects
Ubiquinone/analogs & derivatives*
Ubiquinone/therapeutic use
Ventricular Remodeling/drug effects

Substances:
Antioxidants
Coenzymes
Ubiquinone
coenzyme Q10

an example of MeSH indexing

Traditional Medical information retrieval

MeSH indexing

  • Asterisks indicate a major topic of the article
  • Substances include supplementary concept terms and MeSH terms; the MeSH terms also appear in the MeSH Terms list above.
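The MeSH Terms list above follows a "Descriptor/qualifier" pattern, with a trailing asterisk marking a major topic. A minimal sketch of splitting such strings, assuming only the trailing-asterisk display form shown in this example (other display variants are not handled); `MeshHeading` and `parse_heading` are hypothetical names:

```python
from typing import NamedTuple, Optional


class MeshHeading(NamedTuple):
    descriptor: str
    qualifier: Optional[str]  # subheading after "/", if any
    major: bool               # True when the heading carries an asterisk


def parse_heading(raw: str) -> MeshHeading:
    """Parse a 'Descriptor/qualifier*' string as displayed in the example above."""
    text = raw.strip()
    major = text.endswith("*")
    descriptor, _, qualifier = text.rstrip("*").partition("/")
    return MeshHeading(descriptor, qualifier or None, major)


headings = [parse_heading(t) for t in [
    "Antioxidants/therapeutic use*", "Coenzymes", "Heart Failure/drug therapy*",
    "Heart Failure/pathology", "Humans", "Ubiquinone/analogs & derivatives*",
]]
major_topics = [h.descriptor for h in headings if h.major]
print(major_topics)  # ['Antioxidants', 'Heart Failure', 'Ubiquinone']
```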
("hematologic neoplasms"[MeSH Terms] 
    OR 
    (
        "hematologic"[All Fields] 
        AND 
        "neoplasms"[All Fields]
    ) 
    OR 
    "hematologic neoplasms"[All Fields] 
    OR 
    (
        "blood"[All Fields] 
        AND 
        "cancer"[All Fields]
    ) 
    OR 
    "blood cancer"[All Fields]
) 
AND 
(
    "leukocytes"[MeSH Terms] 
    OR 
    "leukocytes"[All Fields] 
    OR 
    (
        "white"[All Fields] 
        AND 
        "blood"[All Fields] 
        AND 
        "cells"[All Fields]
    ) 
    OR 
    "white blood cells"[All Fields] 
    OR 
    "leukocyte count"[MeSH Terms] 
    OR 
    (
        "leukocyte"[All Fields]
        AND
        "count"[All Fields]
    )
    OR
    "leukocyte count"[All Fields]
    OR
    (
        "white"[All Fields]
        AND
        "blood"[All Fields]
        AND
        "cells"[All Fields]
    )
)

Query Translation by PubMed

blood cancer white blood cells 

Information Need

an example of query translation by PubMed
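The structure of the translated query can be reproduced with a small sketch: each recognized concept becomes one OR-block (the MeSH heading as [MeSH Terms], the heading and the user's phrase as [All Fields], plus an AND of their individual words), and the blocks are combined with AND. This is a simplified illustration, not PubMed's actual Automatic Term Mapping; the `CONCEPTS` table is hypothetical and covers only this example, so it omits the extra "leukocyte count" branch that PubMed adds.

```python
import re

# Hypothetical phrase -> MeSH heading table for this one example.
CONCEPTS = {
    "blood cancer": "hematologic neoplasms",
    "white blood cells": "leukocytes",
}


def field(text, tag="All Fields"):
    """Quote a term and attach a PubMed-style field tag."""
    return f'"{text}"[{tag}]'


def words_and(text):
    """AND together the individual words of a multi-word term."""
    return "(" + " AND ".join(field(w) for w in text.split()) + ")"


def expand(phrase, heading):
    """OR the MeSH heading, its words, and the user's phrase and its words."""
    parts = [field(heading, "MeSH Terms")]
    for text in (heading, phrase):
        if " " in text:
            parts.append(words_and(text))
        parts.append(field(text))
    return "(" + " OR ".join(parts) + ")"


def translate(query):
    """AND one block per recognized concept, keeping the query's word order."""
    q = query.lower()
    blocks = []
    for phrase, heading in CONCEPTS.items():
        pos = q.find(phrase)
        if pos >= 0:
            blocks.append((pos, expand(phrase, heading)))
            q = q.replace(phrase, " " * len(phrase))  # blank it out, keep offsets
    # Any leftover word is searched in all fields as-is.
    blocks += [(m.start(), field(m.group())) for m in re.finditer(r"\S+", q)]
    return " AND ".join(block for _, block in sorted(blocks))


print(translate("blood cancer white blood cells"))
```

The first block printed matches the "hematologic neoplasms" block of the PubMed translation shown above, apart from whitespace.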

Traditional Medical information retrieval

Original Query

[Blood] Cancer patients who have neutropenia have a greater risk of infection.
Your risk increases when your white blood cell count gets low and stays low for a long time.

ref: http://www.chemotherapy.com/side_effects/neutropenia/


building query blocks in PubMed

Traditional Medical information retrieval

Boolean Operators: AND, OR, and NOT can be used to combine query terms

Parentheses: can be used to nest individual concepts

Asterisk (wildcard): can be used for truncation

Quotation Marks: can be used for phrase searching

Square Brackets: can be used to specify search field tags
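Boolean operators map directly onto set operations over an inverted index: AND is intersection, OR is union, NOT is difference, and truncation is the union over all terms that share a prefix. A minimal sketch over a hypothetical three-document collection (this illustrates Boolean retrieval in general, not PubMed's internals; `term` and `prefix` are illustrative helpers):

```python
from collections import defaultdict

# Hypothetical toy collection.
docs = {
    1: "leukemia lowers white blood cells",
    2: "blood cancer and infection risk",
    3: "white blood cells fight infection",
}

# Inverted index: term -> set of IDs of documents containing it.
index = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.lower().split():
        index[word].add(doc_id)


def term(t):
    """Posting set for a single term (empty if the term is absent)."""
    return index.get(t, set())


def prefix(p):
    """Truncation (like leuk*): union of postings of every term starting with p."""
    return set().union(*(ids for t, ids in index.items() if t.startswith(p)))


print(sorted((term("blood") & term("cancer")) | term("leukemia")))  # [1, 2]
print(sorted(term("infection") - term("cancer")))                    # [3]
print(sorted(prefix("leuk")))                                        # [1]
```

Python's own parentheses play the role of the nesting parentheses in a PubMed query.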


sorting results in PubMed

Traditional Medical information retrieval

ref: NLM technical bulletin

  • Most Recent
  • Relevance
  • Publication Date
  • First Author
  • Last Author
  • Journal
  • Title

Sort by Relevance: sorts the retrieved documents by the term frequency of the query terms and MeSH terms in these documents
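A term-frequency ranking of this kind can be sketched by summing, for each document, how often each query term (or MeSH term) occurs in it. This is a simplified illustration of the description above, not PubMed's actual ranking code; the documents and the `tf_score` and `sort_by_relevance` helpers are hypothetical.

```python
from collections import Counter


def tf_score(doc_terms, query_terms):
    """Sum of raw counts of each query term in the document's term list."""
    counts = Counter(doc_terms)
    return sum(counts[t] for t in query_terms)


def sort_by_relevance(docs, query_terms):
    """Return document IDs, highest term-frequency score first."""
    return sorted(docs, key=lambda d: tf_score(docs[d], query_terms), reverse=True)


# Hypothetical documents as lists of text words plus assigned MeSH terms.
docs = {
    "A": ["leukocytes", "infection"],
    "B": ["hematologic neoplasms", "leukocytes", "leukocytes"],
    "C": ["infection", "fever"],
}
query = ["hematologic neoplasms", "leukocytes"]
print(sort_by_relevance(docs, query))  # ['B', 'A', 'C']
```

Every query term counts equally here, which foreshadows the first drawback listed below.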

 

drawbacks of PubMed

Traditional Medical information retrieval

1- Does not consider the importance of concepts

In the query "blood cancer white blood cells", the concept "blood cancer" is more important than the concept "white blood cells", but no weighting is applied because PubMed performs a Boolean search.
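The difference between an unweighted Boolean match and a concept-weighted score can be sketched as follows. The weights (0.7 for "blood cancer", 0.3 for "white blood cells") are hypothetical values chosen only to encode the judgment that "blood cancer" matters more; they do not come from PubMed or any knowledge base.

```python
# Hypothetical concept weights: "blood cancer" judged more important.
WEIGHTS = {"blood cancer": 0.7, "white blood cells": 0.3}


def boolean_match(doc_concepts):
    """Strict AND over all query concepts; every concept counts equally."""
    return all(c in doc_concepts for c in WEIGHTS)


def weighted_score(doc_concepts):
    """Sum the weights of the query concepts the document mentions."""
    return sum(w for c, w in WEIGHTS.items() if c in doc_concepts)


doc1 = {"blood cancer"}
doc2 = {"white blood cells"}
print(boolean_match(doc1), boolean_match(doc2))     # False False
print(weighted_score(doc1) > weighted_score(doc2))  # True
```

Under Boolean AND, both partial matches are rejected identically; the weighted score still prefers the document about the more important concept.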

2- Cannot deal with verbose free-text queries

For a verbose free-text query such as:

A 26-year-old obese woman with a history of bipolar disorder complains that her recent struggles with her weight and eating have caused her to feel depressed. She states that she has recently had difficulty sleeping and feels excessively anxious and agitated. She also states that she has had thoughts of suicide. She often finds herself fidgety and unable to sit still for extended periods of time. Her family tells her that she is increasingly irritable. Her current medications include lithium carbonate and zolpidem.

PubMed returns no results, even after stop-words are removed.
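Part of the reason is that a strict conjunction of every remaining term demands that all of them co-occur in one document. A minimal sketch, assuming a hypothetical, deliberately small stop-word list and a hypothetical document; only the first sentence of the query is used:

```python
import re

# Hypothetical, deliberately small stop-word list.
STOP_WORDS = {"a", "an", "and", "the", "of", "with", "that", "her", "she",
              "has", "have", "had", "to", "feel", "is", "in", "on"}


def tokens(text):
    return re.findall(r"[a-z0-9-]+", text.lower())


def content_terms(text):
    """Query tokens left after stop-word removal."""
    return [w for w in tokens(text) if w not in STOP_WORDS]


def and_match(doc_text, terms):
    """Strict Boolean AND: the document must contain every term."""
    words = set(tokens(doc_text))
    return all(t in words for t in terms)


query = ("A 26-year-old obese woman with a history of bipolar disorder complains that "
         "her recent struggles with her weight and eating have caused her to feel depressed.")
terms = content_terms(query)
doc = "Bipolar disorder in obese women: weight, eating, and depression."
print(len(terms), and_match(doc, terms))  # 13 False
```

Even this clearly related document fails because it lacks words such as "26-year-old", "complains", and "struggles", and uses "women" and "depression" rather than "woman" and "depressed".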

drawbacks of PubMed

Traditional Medical information retrieval

3- Does not efficiently consider dependency between terms

PubMed matches the exact phrase and the individual terms of each query concept, but not how close those terms are to each other. Exact phrases and individual terms are also matched without weighting them.
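Proximity can be checked by comparing token positions: two terms are "near" if some occurrence of one lies within k tokens of some occurrence of the other. A minimal sketch with hypothetical sentences and a hypothetical window size; both sentences satisfy the Boolean query "blood" AND "cancer", but only one uses the words together.

```python
def positions(tokens, term):
    """All token offsets at which the term occurs."""
    return [i for i, t in enumerate(tokens) if t == term]


def within_window(text, a, b, k):
    """True if terms a and b occur within k tokens of each other."""
    tokens = text.lower().split()
    return any(abs(i - j) <= k for i in positions(tokens, a) for j in positions(tokens, b))


near = "blood cancer patients have a greater risk of infection"
far = "blood tests were normal but the cancer had spread"
print(within_window(near, "blood", "cancer", 2))  # True
print(within_window(far, "blood", "cancer", 2))   # False
```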

4- Does not consider relevance feedback documents

Concepts are identified only from the original query, using the MeSH knowledge base.
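One common way to use feedback documents is query expansion: add terms that occur frequently in documents judged (or assumed) relevant. The sketch below shows simple pseudo-relevance feedback under stated assumptions: the top-ranked documents are hypothetical, are simply assumed to be relevant, and no stop-word filtering or term weighting is applied. It is an illustration of the general idea, not a method taken from this lecture.

```python
from collections import Counter


def expand_with_feedback(query_terms, feedback_docs, n_terms=2):
    """Append the n most frequent new terms found in the feedback documents."""
    counts = Counter(
        word
        for doc in feedback_docs
        for word in doc.lower().split()
        if word not in query_terms
    )
    return list(query_terms) + [word for word, _ in counts.most_common(n_terms)]


query = ["blood", "cancer"]
# Hypothetical top-ranked documents, treated as relevant (pseudo-relevance feedback).
top_docs = ["leukemia neutropenia infection", "neutropenia risk", "leukemia neutropenia"]
print(expand_with_feedback(query, top_docs))
# ['blood', 'cancer', 'neutropenia', 'leukemia']
```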

drawbacks of PubMed

Traditional Medical information retrieval

5- Does not consider concept semantics

For example, a concept with semantic meaning "Sign or Symptom" is considered the same way as a concept with semantic meaning "Social Behavior".
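Taking semantics into account can be as simple as weighting each query concept by its semantic type. In the sketch below, the (concept, semantic type) pairs come from the MetaMap output in this lecture, but the weights in `TYPE_WEIGHTS` are hypothetical values chosen for illustration, not values from UMLS or from any published system.

```python
# Hypothetical weights per UMLS semantic type (not from the slides or UMLS).
TYPE_WEIGHTS = {
    "Sign or Symptom": 2.0,
    "Mental or Behavioral Dysfunction": 2.0,
    "Social Behavior": 0.5,
    "Temporal Concept": 0.2,
}

# (concept, semantic type) pairs taken from the MetaMap output in these slides.
query_concepts = [
    ("Agitation", "Sign or Symptom"),
    ("Bipolar Disorder", "Mental or Behavioral Dysfunction"),
    ("Recent", "Temporal Concept"),
    ("Woman", "Population Group"),
]


def concept_weight(semantic_type, default=1.0):
    """Look up a weight for a semantic type; unknown types get the default."""
    return TYPE_WEIGHTS.get(semantic_type, default)


weights = {name: concept_weight(st) for name, st in query_concepts}
print(weights)
# {'Agitation': 2.0, 'Bipolar Disorder': 2.0, 'Recent': 0.2, 'Woman': 1.0}
```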

6- Does not consider concept relationships

In PubMed, MeSH concepts are identified in the query, but the relationships between these concepts and other concepts are not considered.

What knowledge bases other than MESH are used in Medical IR?

  • Unified Medical Language System (UMLS)
  • Systematized Nomenclature of Medicine--Clinical Terms (SNOMED-CT)
  • International Classification of Diseases (ICD)
  • ...

Unified Medical Language System (UMLS):

  • the most comprehensive knowledge base in the medical domain
  • contains three major components: the Metathesaurus, the Semantic Network, and the SPECIALIST Lexicon.

Unified Medical Language System (UMLS)

UMLS Metathesaurus

  • Unifies different knowledge bases such as CPT®, ICD-10-CM, LOINC®, MeSH®, RxNorm, and SNOMED CT®
  • Groups synonymous terms into concepts
  • provides a list of concepts with their
    • term representations
    • definitions
    • relationships
    • ...
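Grouping synonymous terms into concepts amounts to collecting every term string under its concept unique identifier (CUI). In the sketch below, the (string, CUI) pairs are taken from the MetaMap example in this lecture, where "Suicidal Ideation" maps to the concept with primary name "Feeling suicidal (finding)" and "Irritability" to "Irritable Mood". A real system would read such pairs from the Metathesaurus concept-names file (MRCONSO.RRF); `group_by_concept` is a hypothetical helper.

```python
from collections import defaultdict

# (string, CUI) pairs taken from the MetaMap example in these slides.
ATOMS = [
    ("Suicidal Ideation", "C0424000"),
    ("Feeling suicidal (finding)", "C0424000"),
    ("Irritability", "C0022107"),
    ("Irritable Mood", "C0022107"),
]


def group_by_concept(atoms):
    """Collect every term string under the concept (CUI) it belongs to."""
    concepts = defaultdict(list)
    for string, cui in atoms:
        concepts[cui].append(string)
    return dict(concepts)


concepts = group_by_concept(ATOMS)
print(concepts["C0424000"])  # ['Suicidal Ideation', 'Feeling suicidal (finding)']
```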

 

Unified Medical Language System (UMLS)

UMLS Semantic Network

  • Categorizes concepts by semantic types 
  • 113 semantic types (in 2015 version)
  • can be used to reduce the complexity of the Metathesaurus and for dimension reduction in Medical IR
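Dimension reduction with semantic types can be sketched by replacing concept features with counts of their semantic types. The CUIs and types below are copied from the MetaMap example in this lecture; keeping only the first type for concepts that have two (zolpidem, lithium) is a simplifying assumption of this sketch.

```python
from collections import Counter

# CUI -> semantic type, copied from the MetaMap example in these slides;
# concepts with two types keep only the first (a simplification).
CONCEPT_TYPES = {
    "C0439508": "Temporal Concept", "C0580836": "Temporal Concept",
    "C0028754": "Disease or Syndrome", "C0043210": "Population Group",
    "C0005586": "Mental or Behavioral Dysfunction", "C0078839": "Organic Chemical",
    "C0023870": "Element, Ion, or Isotope", "C0332185": "Temporal Concept",
    "C0235162": "Sign or Symptom", "C0085631": "Sign or Symptom",
    "C0424000": "Finding", "C0022107": "Finding",
}


def to_type_vector(cuis):
    """Replace concept features with counts of their semantic types."""
    return Counter(CONCEPT_TYPES[c] for c in cuis)


vec = to_type_vector(CONCEPT_TYPES)  # all 12 concepts from the example query
print(len(CONCEPT_TYPES), "concepts ->", len(vec), "semantic types")
# 12 concepts -> 8 semantic types
```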

Unified Medical Language System (UMLS)

Mapping biomedical text to the UMLS Metathesaurus

  • Identifies concepts drawn from a controlled vocabulary
  • Different tools, such as MetaMap and SemRep, can be used
  • Used in natural-language processing (NLP) and Medical IR applications
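The core of text-to-concept mapping is finding vocabulary strings in the text and returning their CUIs. The sketch below is a toy greedy longest-match lookup over word n-grams; it is not MetaMap's algorithm (which also handles lexical variants, candidate scoring, and disambiguation). The small `LEXICON` is built from the MetaMap output in this lecture and is hypothetical in scope.

```python
import re

# Tiny lexicon (string -> CUI) built from the MetaMap output in these slides.
LEXICON = {
    "bipolar disorder": "C0005586",
    "zolpidem": "C0078839",
    "lithium": "C0023870",
    "difficulty sleeping": "C0235162",
    "agitation": "C0085631",
    "suicidal ideation": "C0424000",
    "irritability": "C0022107",
}
MAX_LEN = max(len(k.split()) for k in LEXICON)  # longest entry, in words


def map_concepts(text):
    """Greedy longest-match lookup of word n-grams; a toy stand-in for MetaMap."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    found, i = [], 0
    while i < len(words):
        for n in range(min(MAX_LEN, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in LEXICON:
                found.append((phrase, LEXICON[phrase]))
                i += n
                break
        else:  # no entry starts at this word
            i += 1
    return found


print(map_concepts("on zolpidem and lithium, with recent difficulty sleeping, agitation"))
```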

Unified Medical Language System (UMLS)

Mapping biomedical text to the UMLS Metathesaurus

example

query: 26-year-old obese woman with bipolar disorder, on zolpidem and lithium, with recent difficulty sleeping, agitation, suicidal ideation, and irritability.

MetaMap Results:

CUI | Concept Name | Concept Primary Name | Semantic Type
C0439508 | /year | per year | Temporal Concept
C0580836 | Old | Old | Temporal Concept
C0028754 | OBESE | Obesity | Disease or Syndrome
C0043210 | WOMAN | Woman | Population Group
C0005586 | Bipolar Disorder | Bipolar Disorder | Mental or Behavioral Dysfunction
C0078839 | ZOLPIDEM | zolpidem | Organic Chemical; Pharmacologic Substance
C0023870 | LITHIUM | Lithium | Element, Ion, or Isotope; Pharmacologic Substance
C0332185 | Recent | Recent | Temporal Concept
C0235162 | Difficulty sleeping | Difficulty sleeping | Sign or Symptom
C0085631 | AGITATION | Agitation | Sign or Symptom
C0424000 | Suicidal Ideation | Feeling suicidal (finding) | Finding
C0022107 | IRRITABILITY | Irritable Mood | Finding

How can knowledge bases improve retrieval performance?

approach 1

How can knowledge bases improve retrieval performance?

approach 2

Conclusions

  • Why a domain-specific IR system is required for medical applications is discussed.

  • Traditional Medical IR systems are introduced.

  • PubMed as the most popular Medical IR system is discussed in detail.

  • Drawbacks of PubMed are presented.

  • Medical knowledge bases are introduced.

  • UMLS, the most comprehensive medical knowledge base, is discussed.

  • Finally, two approaches to using UMLS in medical IR systems are discussed.


By Saeid Balaneshin Kordan
