A Comprehensive Study on 

Medical Information Retrieval Systems

PhD Qualifying Exam

Saeid Balaneshinkordan

advisor:

Dr. Alexander Kotov

March 2016

Why is Medical Information Retrieval important?

People diagnosed with blood cancer per hour in the US

Deaths from blood cancer per hour in the US

Clinical Decision Support (CDS) Systems

definition of CDS systems

  • Assists clinicians, staff and patients by providing information to enhance health and health care.
  • CDS consists of a variety of tools such as: 
    • alerts and reminders to be used either by care providers or patients;
    • clinical guidelines;
    • condition-specific order sets;
    • focused patient data reports and summaries;
    • documentation templates;
    • support in diagnosis process, and contextually relevant reference information
    • ...

Clinical Decision Support (CDS) Systems

  • Improve Precision of Clinical Tasks
  • Reduce Duration of Clinical Tasks
  • Reduce Cost of Clinical Tasks
  • ...

benefits of CDS systems

Medical information retrieval

  • One of the tasks a CDS system can be designed to perform;

  • Supports clinical decision-making;
  • Overcomes abundance of medical information; 

  • Deals with narrative and verbose queries instead of keyword-based queries;
  • Favors precision over recall
  • a structured medical query and its corresponding relevant documents:
    1. Diagnosis Description: A 26-year-old obese woman with a history of bipolar disorder complains that her recent struggles with her weight and eating have caused her to feel depressed. She states that she has recently had difficulty sleeping and feels excessively anxious and agitated. She also states that she has had thoughts of suicide. She often finds herself fidgety and unable to sit still for extended periods of time. Her family tells her that she is increasingly irritable. Her current medications include lithium carbonate and zolpidem.
    Summary: 26-year-old obese woman with bipolar disorder, on zolpidem and lithium, with recent difficulty sleeping, agitation, suicidal ideation, and irritability.

Query Type

Query Description

Query Summary

relevant documents

  • Query Types:
    • Diagnosis (What is the patient's diagnosis?)
    • Test (What tests should the patient receive?)
    • Treatment (How should the patient be treated?)

Medical information retrieval

an example of a medical query

Medical information retrieval

general-purpose vs. domain-specific search engines

  • General-purpose search engine
    • Retrieves more types of literature
    • Higher recall
  • Domain-specific search engine
    • Retrieves documents mainly from Medline (22 million out of 25 million)
    • Higher precision

note: Medline is a bibliographic database concentrated on biomedicine; all of its records are indexed with U.S. National Library of Medicine (NLM®) Medical Subject Headings (MeSH®).

Traditional Medical information retrieval

biomedical search engines

  PubMed citations come from:

  1. Medline indexed journals,
  2. journals/manuscripts deposited in PubMed Central (PMC®), 
  3. National Center for Biotechnology Information (NCBI®) Bookshelf.

Two main biomedical search engines provided by U.S. National Library of Medicine (NLM®):

PubMed and Entrez

Traditional Medical information retrieval

Medical Subject Headings (MeSH®) in MEDLINE®/PubMed®

  MeSH:

  1. a vocabulary that gives uniformity and consistency to the indexing and cataloging of biomedical literature,
  2. assists with subject indexing and subject searching.

MEDLINE Subject Indexing:

  1. determining the subject content of a journal article (or other material),
  2. describing that content using a controlled vocabulary. 

purpose: facilitating search retrieval by eliminating the use of variant terminology for the same concept

Article Title:
The role of coenzyme Q10 in heart failure.

Abstract:
OBJECTIVE: To review the clinical data demonstrating the safety and efficacy of coenzyme Q10 (CoQ10) in heart failure (HF).

DATA SOURCES: Pertinent literature was identified through MEDLINE (1966-January 2005) using the search terms coenzyme Q10, heart failure, antioxidants, and oxidative stress. Only articles written in the English language and evaluating human subjects were used.

DATA SYNTHESIS: HF impairs the ability of the heart to maintain its normal cardiac output. Following an initial insult, cardiac remodeling ensues, resulting in left ventricular dilation and hypertrophy. Oxidative stress is also increased, while CoQ10 levels are decreased in patients with HF. This has led to the hypothesis that CoQ10, an antioxidant, may decrease oxidative stress, impair remodeling, and improve cardiac function.

CONCLUSIONS: Large, well-designed studies on this topic are lacking. The limited data from well-designed trials indicate there may be some minor benefits with CoQ10 therapy in ejection fraction and end diastolic volume. CoQ10 therapy has been shown to be relatively safe with a low incidence of adverse effects.

Publication Types:
Review

MeSH Terms:
Antioxidants/therapeutic use*
Coenzymes
Heart Failure/drug therapy*
Heart Failure/pathology
Heart Failure/physiopathology
Humans
Oxidative Stress/drug effects
Ubiquinone/analogs & derivatives*
Ubiquinone/therapeutic use
Ventricular Remodeling/drug effects

Substances:
Antioxidants
Coenzymes
Ubiquinone
coenzyme Q10

Traditional Medical information retrieval

an example of MeSH indexing

MeSH indexing

  • Asterisks indicate a major topic of article
  • Substances include supplementary concept terms and MeSH terms, which are repeated above.
("hematologic neoplasms"[MeSH Terms] 
    OR 
    (
        "hematologic"[All Fields] 
        AND 
        "neoplasms"[All Fields]
    ) 
    OR 
    "hematologic neoplasms"[All Fields] 
    OR 
    (
        "blood"[All Fields] 
        AND 
        "cancer"[All Fields]
    ) 
    OR 
    "blood cancer"[All Fields]
) 
AND 
(
    "leukocytes"[MeSH Terms] 
    OR 
    "leukocytes"[All Fields] 
    OR 
    (
        "white"[All Fields] 
        AND 
        "blood"[All Fields] 
        AND 
        "cells"[All Fields]
    ) 
    OR 
    "white blood cells"[All Fields] 
    OR 
    "leukocyte count"[MeSH Terms] 
    OR 
    (
        "leukocyte"[All Fields] 
        AND 
        "count"[All Fields]
    ) 
    OR 
    "leukocyte count"[All Fields] 
    OR 
    (
        "white"[All Fields] 
        AND 
        "blood"[All Fields] 
        AND 
        "cells"[All Fields]
    )
)

Query Translation

by PubMed

blood cancer white blood cells 

Information Need

Traditional Medical information retrieval

an example of query translation by PubMed

Original Query

[Blood] Cancer patients who have neutropenia have a greater risk of infection.
Your risk increases when your white blood cell count gets low and stays low for a long time.

ref: http://www.chemotherapy.com/side_effects/neutropenia/

Boolean Operators: AND, OR, NOT can be used to combine query terms

Parentheses: can be used for nesting individual concepts

Asterisk (wildcard): can be used for truncation

Quotation Marks: can be used for phrase searching

Square Brackets: can be used to specify search field tags
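The building blocks above can be sketched in a few lines of Python. This is only an illustration of how quotation marks, square-bracket field tags, parentheses and Boolean operators compose into a query block. The helper names (`field`, `any_of`, `all_of`, `concept_block`) are hypothetical, and the mapping from a phrase to its MeSH heading is supplied by hand here rather than looked up the way PubMed does.

```python
def field(term: str, tag: str) -> str:
    """Quote a term (phrase search) and attach a field tag in square brackets."""
    return f'"{term}"[{tag}]'


def any_of(*clauses: str) -> str:
    """Nest alternative clauses in parentheses, combined with OR."""
    return "(" + " OR ".join(clauses) + ")"


def all_of(*clauses: str) -> str:
    """Nest required clauses in parentheses, combined with AND."""
    return "(" + " AND ".join(clauses) + ")"


def concept_block(mesh_heading: str, phrase: str) -> str:
    """One block per concept: MeSH heading OR all single words OR exact phrase.

    The MeSH heading is passed in by the caller (an assumption of this sketch).
    """
    words = phrase.split()
    return any_of(
        field(mesh_heading, "MeSH Terms"),
        all_of(*(field(w, "All Fields") for w in words)),
        field(phrase, "All Fields"),
    )


# Two concept blocks from the example query, combined with AND.
query = " AND ".join([
    concept_block("hematologic neoplasms", "blood cancer"),
    concept_block("leukocytes", "white blood cells"),
])
print(query)
```

The result is a simplified version of the translated query shown earlier: each concept becomes one parenthesized OR block, and the blocks are joined with AND.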


Traditional Medical information retrieval

building query blocks in PubMed

ref: NLM technical bulletin

Most Recent

Relevance

Publication Date

First Author

Last Author

Journal

Title

Sort by Relevance: sorts the retrieved documents based on the term frequency of query terms and MeSH terms in these documents
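A toy sketch of term-frequency sorting is given below. It only illustrates the idea stated above (counting occurrences of query terms in each retrieved document) and does not reproduce PubMed's actual relevance computation. The three documents and the function name `tf_score` are made up for illustration.

```python
from collections import Counter


def tf_score(query_terms, doc_tokens):
    """Sum of the raw term frequencies of the query terms in one document."""
    counts = Counter(doc_tokens)
    return sum(counts[t] for t in query_terms)


# Hypothetical retrieved documents, already tokenized and lower-cased.
docs = {
    "d1": "blood cancer patients with low white blood cell count".split(),
    "d2": "blood pressure and diet".split(),
    "d3": "cancer of the blood forming cells".split(),
}
query_terms = ["blood", "cancer"]

# Sort document ids by descending term-frequency score.
ranked = sorted(docs, key=lambda d: tf_score(query_terms, docs[d]), reverse=True)
print(ranked)
```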


Traditional Medical information retrieval

sorting results in PubMed

1- Does not consider the importance of concepts

In the query "blood cancer white blood cells", the concept "blood cancer" is more important than the concept "white blood cells", but no weighting is applied because PubMed performs a Boolean search.

Traditional Medical information retrieval

drawbacks of PubMed

2- Is not able to deal with verbose free-text queries

For a verbose free-text query such as:

A 26-year-old obese woman with a history of bipolar disorder complains that her recent struggles with her weight and eating have caused her to feel depressed. She states that she has recently had difficulty sleeping and feels excessively anxious and agitated. She also states that she has had thoughts of suicide. She often finds herself fidgety and unable to sit still for extended periods of time. Her family tells her that she is increasingly irritable. Her current medications include lithium carbonate and zolpidem.

PubMed returns no results, even after stop-words are removed.

Traditional Medical information retrieval

drawbacks of PubMed

3- Does not efficiently consider dependency between terms

PubMed matches the exact phrase and the individual terms of each query concept, but it does not consider the proximity of those terms, and it applies no weighting to phrase matches versus term matches.

4- Does not consider relevance feedback documents

Concepts are identified only from the original query, using the MeSH knowledge base.

Traditional Medical information retrieval

drawbacks of PubMed

5- Does not consider concept semantics

For example, a concept with semantic meaning "Sign or Symptom" is considered the same way as a concept with semantic meaning "Social Behavior".

6- Does not consider concept relationships

In PubMed, MeSH concepts are identified in the query, but relationships between these concepts and other concepts are not considered.

What knowledge bases other than MeSH are used in Medical IR?

  • Unified Medical Language System (UMLS)
  • Systematized Nomenclature of Medicine--Clinical Terms (SNOMED-CT)
  • International Classification of Diseases (ICD)
  • ...

Unified Medical Language System (UMLS)

Unified Medical Language System (UMLS):

  • the most comprehensive knowledge base in the medical domain
  • contains three major components: the Metathesaurus, the Semantic Network, and the SPECIALIST Lexicon.

UMLS Metathesaurus

  • Unifies different knowledge bases such as CPT®, ICD-10-CM, LOINC®, MeSH®, RxNorm, and SNOMED CT®
  • Groups synonymous terms into concepts
  • provides a list of concepts with their
    • term representations
    • definitions
    • relationships
    • ...

UMLS - a concept example

UMLS Semantic Network

  • Categorizes concepts by semantic types 
  • 113 semantic types (in 2015 version)
  • Can be used to reduce the complexity of the Metathesaurus and also for dimension reduction in Medical IR

semantic types "appropriate" for different medical tasks

ref: Limsopatham, Nut, Craig Macdonald, and Iadh Ounis. "Inferring conceptual relationships to improve medical records search." In Proceedings of the 10th Conference on Open Research Areas in Information Retrieval, pp. 1-8. 2013.

Mapping biomedical text to the UMLS Metathesaurus

  • Identifying concepts drawn from a controlled vocabulary
  • Different tools such as MetaMap and SemRep can be used.
  • Performed in natural-language processing (NLP) and Medical IR applications

Mapping biomedical text to the UMLS Metathesaurus

Example

query: 26-year-old obese woman with bipolar disorder, on zolpidem and lithium, with recent difficulty sleeping, agitation, suicidal ideation, and irritability.

MetaMap Results:

CUI      | Concept Name        | Concept Primary Name       | Semantic Type
C0439508 | /year               | per year                   | Temporal Concept
C0580836 | Old                 | Old                        | Temporal Concept
C0028754 | OBESE               | Obesity                    | Disease or Syndrome
C0043210 | WOMAN               | Woman                      | Population Group
C0005586 | Bipolar Disorder    | Bipolar Disorder           | Mental or Behavioral Dysfunction
C0078839 | ZOLPIDEM            | zolpidem                   | Organic Chemical; Pharmacologic Substance
C0023870 | LITHIUM             | Lithium                    | Element, Ion, or Isotope; Pharmacologic Substance
C0332185 | Recent              | Recent                     | Temporal Concept
C0235162 | Difficulty sleeping | Difficulty sleeping        | Sign or Symptom
C0085631 | AGITATION           | Agitation                  | Sign or Symptom
C0424000 | Suicidal Ideation   | Feeling suicidal (finding) | Finding
C0022107 | IRRITABILITY        | Irritable Mood             | Finding

Concept Selection by using Semantic Types

Example

query: 26-year-old obese woman with bipolar disorder, on zolpidem and lithium, with recent difficulty sleeping, agitation, suicidal ideation, and irritability.

assume the following list of selected semantic types for this specific task:

  • Disease or Syndrome
  • Mental or Behavioral Dysfunction
  • Sign or Symptom

List of selected concepts:

CUI      | Concept Name        | Concept Primary Name | Semantic Type
C0028754 | OBESE               | Obesity              | Disease or Syndrome
C0005586 | Bipolar Disorder    | Bipolar Disorder     | Mental or Behavioral Dysfunction
C0235162 | Difficulty sleeping | Difficulty sleeping  | Sign or Symptom
C0085631 | AGITATION           | Agitation            | Sign or Symptom
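The selection step in this example amounts to a filter over the MetaMap output. A minimal sketch follows, assuming the MetaMap results have already been parsed into records. The `Concept` record type and the `select_concepts` function are hypothetical names, and the data is copied from the example query.

```python
from typing import NamedTuple


class Concept(NamedTuple):
    cui: str
    name: str
    primary_name: str
    semantic_types: tuple  # a concept may carry more than one semantic type


# MetaMap output for the example query, entered by hand for this sketch.
metamap_results = [
    Concept("C0439508", "/year", "per year", ("Temporal Concept",)),
    Concept("C0580836", "Old", "Old", ("Temporal Concept",)),
    Concept("C0028754", "OBESE", "Obesity", ("Disease or Syndrome",)),
    Concept("C0043210", "WOMAN", "Woman", ("Population Group",)),
    Concept("C0005586", "Bipolar Disorder", "Bipolar Disorder",
            ("Mental or Behavioral Dysfunction",)),
    Concept("C0078839", "ZOLPIDEM", "zolpidem",
            ("Organic Chemical", "Pharmacologic Substance")),
    Concept("C0023870", "LITHIUM", "Lithium",
            ("Element, Ion, or Isotope", "Pharmacologic Substance")),
    Concept("C0332185", "Recent", "Recent", ("Temporal Concept",)),
    Concept("C0235162", "Difficulty sleeping", "Difficulty sleeping",
            ("Sign or Symptom",)),
    Concept("C0085631", "AGITATION", "Agitation", ("Sign or Symptom",)),
    Concept("C0424000", "Suicidal Ideation", "Feeling suicidal (finding)",
            ("Finding",)),
    Concept("C0022107", "IRRITABILITY", "Irritable Mood", ("Finding",)),
]

selected_types = {
    "Disease or Syndrome",
    "Mental or Behavioral Dysfunction",
    "Sign or Symptom",
}


def select_concepts(concepts, allowed_types):
    """Keep concepts that have at least one of the allowed semantic types."""
    return [c for c in concepts if allowed_types.intersection(c.semantic_types)]


selected = select_concepts(metamap_results, selected_types)
for c in selected:
    print(c.cui, c.primary_name)
```

With this choice of semantic types, drug concepts such as zolpidem and lithium are dropped. A treatment-oriented task would likely use a different list.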

Concept-based Medical Retrieval Approach

Approach 1

Identify concepts in all the documents in the collection

Identify concepts in the queries

Use the statistics of concepts in the collection documents and in the query to compute the similarity of each document to the query

Sort the documents in the collection by similarity

bag of words

bag of concepts
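A minimal bag-of-concepts sketch of Approach 1 is given below. It assumes the concepts have already been identified in both the documents and the query (for example with MetaMap), and it uses TF-IDF weighting with cosine similarity as one possible similarity measure. The approach described here does not fix a particular measure, so this choice, the function names and the three-document collection are illustrative assumptions.

```python
import math
from collections import Counter


def tfidf_vector(concepts, doc_freq, n_docs):
    """Weight each concept by its count times a smoothed inverse document frequency."""
    counts = Counter(concepts)
    return {
        c: tf * math.log((n_docs + 1) / (doc_freq.get(c, 0) + 1))
        for c, tf in counts.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * \
        math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


# Hypothetical collection: each document is a bag of UMLS concept identifiers.
collection = {
    "d1": ["C0005586", "C0235162", "C0085631"],  # bipolar disorder, sleep, agitation
    "d2": ["C0028754"],                          # obesity
    "d3": ["C0023870", "C0078839"],              # lithium, zolpidem
}
query_concepts = ["C0028754", "C0005586", "C0235162", "C0085631"]

# Collection statistics: in how many documents each concept occurs.
doc_freq = Counter(c for concepts in collection.values() for c in set(concepts))
n_docs = len(collection)

query_vec = tfidf_vector(query_concepts, doc_freq, n_docs)
scores = {
    doc_id: cosine(query_vec, tfidf_vector(concepts, doc_freq, n_docs))
    for doc_id, concepts in collection.items()
}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

The same code works for a bag of words if tokens are used in place of concept identifiers. The only difference is the unit being counted.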

Concept-based Medical Retrieval Approach

Approach 1 - Example

ref: Wang, Chunye, and Ramakrishna Akella. "Concept-based relevance models for medical and semantic information retrieval." In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, pp. 173-182. ACM, 2015.

Concept-based Medical Retrieval Approach

Approach 2

Identify concepts in the queries

Use the Sequential Dependence Model (SDM) to get statistics of the query concepts in the collection

Use the statistics of concepts in the collection documents and in the query to compute the similarity of each document to the query

Sort the documents in the collection by similarity

ref: Choi, Sungbin, Jinwook Choi, Sooyoung Yoo, Heechun Kim, and Youngho Lee. "Semantic concept-enriched dependence model for medical information retrieval." Journal of biomedical informatics 47 (2014): 18-27.

Indri Query Language:

Ranking function:

SDM assumption: dependency between adjacent query terms.

concept example: elderly patients ventilator associated pneumonia

weight parameters for single terms, ordered phrases and unordered phrases

Use the Sequential Dependence Model (SDM) to get statistics of the query concepts in the collection
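As a rough illustration, the sketch below generates an Indri query string in the usual SDM form: a weighted combination of single terms, ordered phrases (`#1`) over adjacent term pairs, and unordered windows (`#uwN`) over the same pairs. The function name is hypothetical, and the default weights (0.85, 0.10, 0.05) and window size 8 are values commonly used with SDM, not values taken from these slides.

```python
def sdm_indri_query(terms, w_term=0.85, w_ordered=0.10, w_unordered=0.05, window=8):
    """Build an Indri query string following the sequential dependence model.

    Only adjacent query terms are treated as dependent (the SDM assumption).
    Weights and window size are assumed defaults, to be tuned per collection.
    """
    if len(terms) < 2:
        # A single term has no adjacent pair, so only the unigram part remains.
        return "#combine(" + " ".join(terms) + ")"
    pairs = [f"{a} {b}" for a, b in zip(terms, terms[1:])]
    unigrams = " ".join(terms)
    ordered = " ".join(f"#1({p})" for p in pairs)            # exact adjacent phrases
    unordered = " ".join(f"#uw{window}({p})" for p in pairs)  # both terms within a window
    return (
        f"#weight( {w_term} #combine({unigrams}) "
        f"{w_ordered} #combine({ordered}) "
        f"{w_unordered} #combine({unordered}) )"
    )


query = sdm_indri_query("elderly patients ventilator associated pneumonia".split())
print(query)
```

For the five-term concept example, this yields four adjacent pairs, each appearing once as an ordered phrase and once as an unordered window.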

Collection

External Sources

Top-ranked Documents

Knowledge-base
 

Relationship Tables

MetaMap

Direct Identification

Query

Concept Sources:

Identification Methods:

Concept Sources and Concept Identification
for Medical Query Expansion

Ranking function:

concept weight:

Matching function:

linear weighted combination of importance features

log of the language modeling estimate for concept κ with Dirichlet smoothing

linear weighted combination of matches in document D of all concepts types in T

Parameterized Concept Weighting
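The matching function described above can be sketched with the textbook form of Dirichlet smoothing, log((tf + mu * cf / |C|) / (|D| + mu)), where tf is the concept's frequency in the document, cf its frequency in the collection, |D| the document length and |C| the collection length. The exact formula on the original slide is not reproduced here, so treat this as an assumption. The function names, the value mu = 2500 and the example statistics are illustrative only.

```python
import math


def dirichlet_log_prob(tf_in_doc, doc_len, cf_in_collection, collection_len, mu=2500.0):
    """Log of the Dirichlet-smoothed language model estimate for one concept.

    cf_in_collection must be positive, otherwise a concept absent from the
    document would get probability zero.
    """
    background = cf_in_collection / collection_len
    return math.log((tf_in_doc + mu * background) / (doc_len + mu))


def score_document(concept_stats, concept_weights, doc_len, collection_len, mu=2500.0):
    """Weighted sum of per-concept matches in one document.

    concept_stats maps a concept id to (tf in document, cf in collection).
    concept_weights maps a concept id to its importance weight, assumed here
    to be given (in the slides it is a weighted combination of features).
    """
    return sum(
        concept_weights[k] * dirichlet_log_prob(tf, doc_len, cf, collection_len, mu)
        for k, (tf, cf) in concept_stats.items()
    )


# Hypothetical statistics for two query concepts in one document.
concept_stats = {"C0005586": (3, 1200), "C0235162": (0, 800)}
concept_weights = {"C0005586": 0.7, "C0235162": 0.3}
doc_score = score_document(concept_stats, concept_weights,
                           doc_len=250, collection_len=5_000_000)
print(doc_score)
```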

Concept importance features 

Parameterized Concept Weighting

the probability of P being health-related over all the Wikipedia pages:

P: Wikipedia page corresponding to the concept

ref: Soldaini, Luca, Arman Cohan, Andrew Yates, Nazli Goharian, and Ophir Frieder. "Retrieving medical literature for clinical decision support." In Advances in Information Retrieval, pp. 538-549. Springer International Publishing, 2015.

General-purpose knowledge-bases
for query expansion

minimizing along one direction at a time

Coordinate Ascent

multivariable minimization problem

univariate optimization problem

univariate optimization problem

univariate optimization problem

Optimization Techniques
for Medical query expansion
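A minimal coordinate ascent sketch is given below: the multivariable problem is reduced to a sequence of univariate steps, changing one parameter at a time while the others stay fixed. The sketch maximizes an objective, and a minimization problem can be handled by maximizing its negative. The toy objective stands in for a retrieval metric evaluated on training queries, and the function name, step size and stopping rule are assumptions of this illustration.

```python
def coordinate_ascent(objective, start, step=0.5, sweeps=200, min_step=1e-6):
    """Maximize objective(params) by adjusting one coordinate at a time."""
    params = list(start)
    best = objective(params)
    for _ in range(sweeps):
        improved = False
        for i in range(len(params)):
            # Univariate subproblem: try moving only coordinate i.
            for delta in (step, -step):
                candidate = params.copy()
                candidate[i] += delta
                value = objective(candidate)
                if value > best:
                    params, best = candidate, value
                    improved = True
        if not improved:
            # No single-coordinate move helps at this scale, so refine the step.
            step /= 2
            if step < min_step:
                break
    return params, best


def toy_objective(p):
    """Hypothetical stand-in for a retrieval metric, maximal at (1, -2)."""
    x, y = p
    return -(x - 1.0) ** 2 - (y + 2.0) ** 2


weights, value = coordinate_ascent(toy_objective, start=[0.0, 0.0])
print(weights, value)
```

In a query expansion setting, `params` would hold the weights being tuned (for example the weights of concept importance features), and each call to the objective would run retrieval and compute a metric, which makes every evaluation comparatively expensive.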

  • Why a domain-specific IR system is required for medical applications was discussed.

  • Traditional medical IR systems were introduced.

  • PubMed, the most popular medical IR system, was discussed in detail.

  • Drawbacks of PubMed were presented.

  • Medical knowledge bases were introduced.

  • UMLS, the most comprehensive medical knowledge base, was discussed.

  • Finally, two approaches to using UMLS in a medical IR system were discussed.

Conclusions

Thank You!
