Deep NLP for Adverse Drug Event Extraction

Adverse Drug Events

Problem

Motivation

  • Adverse reactions caused by drugs are potentially dangerous and lead to mortality and morbidity in patients.
  • Adverse Drug Event (ADE) extraction is a significant and unsolved problem in biomedical research.

Data Source

PubMed Abstracts

I have been on Methotrexate since a year ago. It seemed to be helping and under care of my doctor.  I have developed an inflammed stomach lining and two ulcers due to this drug.  Other meds I am on do not leave me with any side affects.  I have had an ultrasound..am waiting for treatment from the Endoscopy doctor that did the tests.  It will be a type of medicine to heal my stomach.  I have been very sick and vomiting, dry heaves, and am limited to what I can eat.  Please make sure if you have any of these side affects, you inform your doctor immediately.  I am off the Methotrexate for good.  Not a good experience for me.  Thank You.

Problem Definition

Given a sequence of words \( \langle w_1, w_2, w_3, \ldots, w_n \rangle \):

  1. Entity extraction: label each word in the sequence as a drug, a disease, or neither
  2. Relationship extraction: extract the relationship between drug–disease pairs

Example - Relationship Extraction

<methotrexate, sever side effects> - YES

I have suffered sever side effects from the oral methotrexate and have not been able to remain on this medication.

Example - Entity Extraction

I/O have/O suffered/O sever/B-Disease side/I-Disease effects/L-Disease from/O the/O
oral/O methotrexate/U-Drug and/O have/O not/O been/O able/O to/O remain/O on/O this/O medication/O

BILOU - Begin, Inside, Last, Outside, Unit
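
A minimal sketch of converting entity spans to BILOU tags (the (start, end, type) span format and function name are illustrative, not from the original):

    def bilou_tags(n_tokens, spans):
        """spans: list of (start, end, type) token ranges, end inclusive."""
        tags = ["O"] * n_tokens
        for start, end, etype in spans:
            if start == end:                     # single-token entity -> Unit
                tags[start] = "U-" + etype
            else:
                tags[start] = "B-" + etype       # Begin
                for i in range(start + 1, end):  # Inside
                    tags[i] = "I-" + etype
                tags[end] = "L-" + etype         # Last
        return tags

    tokens = ("I have suffered sever side effects from the oral "
              "methotrexate and have not been able to remain on "
              "this medication").split()
    # "sever side effects" spans tokens 3..5, "methotrexate" is token 9
    print(list(zip(tokens, bilou_tags(len(tokens), [(3, 5, "Disease"), (9, 9, "Drug")]))))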

Existing Architectures

Joint Models for Extracting Adverse Drug Events from Biomedical Text

https://www.ijcai.org/Proceedings/16/Papers/403.pdf

  • Uses convolution
  • Models entity extraction and relationship extraction jointly as a state-transition problem

Fei Li, Yue Zhang, Meishan Zhang, Donghong Ji, 2016

End-to-End Relation Extraction using LSTMs on Sequences and Tree Structures

  • Using the shortest dependency path (SDP) provides more context
  • Uses Tree-LSTM

http://arxiv.org/abs/1601.00770

Miwa, M. and Bansal, M., 2016.

A neural joint model for entity and relation extraction from biomedical text

https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-017-1609-9

Fei Li, Yue Zhang, Meishan Zhang, Donghong Ji, 2017

  • Everything from the above, plus
  • Character embeddings

Our Model

Performance comparison

Model                Entity Extraction (F1)   ADE Extraction (F1)
Li et al., 2016      79.5                     63.4
Miwa & Bansal, 2016  83.4                     55.6
Li et al., 2017      84.6                     71.4
Our model            85.30                    86.78

EEAP Framework for NLP

Embed, Encode, Attend, Predict


Embed

word-level representation

Frequency based

  • TF-IDF (see the sketch after this list)
    • TF - Term Frequency
    • IDF - Inverse Document Frequency
    • Penalizes common words
  • Co-occurrence Matrix
    • \( V \times N \)
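
A minimal sketch of TF-IDF from the definitions above (toy corpus; raw term frequency times log inverse document frequency is one common variant):

    import math
    from collections import Counter

    docs = [["drug", "causes", "nausea"],
            ["drug", "treats", "disease"],
            ["nausea", "and", "vomiting"]]

    def tf_idf(docs):
        n = len(docs)
        # document frequency: how many docs contain each term
        df = Counter(t for doc in docs for t in set(doc))
        # terms appearing in many documents get IDF near zero (the penalty)
        return [{t: (c / len(doc)) * math.log(n / df[t])
                 for t, c in Counter(doc).items()}
                for doc in docs]

    print(tf_idf(docs)[0])  # "causes" (rare) outweighs "drug" (common)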

Word embedding

  • Distributed Representation
    • Captures semantic meaning
    • meaning is relative to co-occurring words (distributional hypothesis)
  • Fundamentally based on co-occurrence
  • Prediction-based vectorization
    • predict neighboring words

Word embedding

Word2vec: CBOW

Word2vec: Skipgram
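
A minimal sketch of training both variants with gensim (assumes gensim >= 4.0; the toy corpus and hyperparameters are illustrative):

    from gensim.models import Word2Vec

    # Each sentence is a list of tokens; a real corpus would be far larger.
    sentences = [["methotrexate", "caused", "stomach", "ulcers"],
                 ["patient", "developed", "nausea", "after", "methotrexate"]]

    # sg=0 -> CBOW: predict the center word from its context window
    cbow = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=0)

    # sg=1 -> Skip-gram: predict context words from the center word
    skipgram = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1)

    print(skipgram.wv["methotrexate"].shape)  # (100,)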

Pre-trained word embeddings

  • Word2Vec, GloVe
    • Wikipedia
    • Common Crawl
  • Word vectors induced from
    • PubMed, PMC
    • Uses word2vec

Flaws

  • Out-of-Vocabulary (OOV) Tokens
    • Large vocabulary size
    • Rare words are left out
    • Possible solution
      • Average the vectors of neighbouring words

character-level word representation

  • Vocabulary of unique characters
    • fixed and small
  • Morphological Features
  • Word as a sequence of characters (RNN)
  • Word as a 2D image [char count × embedding dim] (CNN)
  • Jointly trained with the model objective

Hybrid embedding

  • Combines
    • Morphological features
    • Semantic Features
  • Combination Method
    • Concatenation
      • \( embedding(w_i) = [\, W_{w_i} ; C_{w_i} \,] \)
    • Gated Mixing (see the sketch after this list)
      • \( cg_{w_i} = f(W_{w_i}, C_{w_i}) \)
      • \( wg_{w_i} = g(W_{w_i}, C_{w_i}) \)
      • \( embedding(w_i) = cg_{w_i} \cdot C_{w_i} + wg_{w_i} \cdot W_{w_i} \)
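
A minimal PyTorch sketch of the gated mixing above, assuming the gates f and g are sigmoid-activated linear layers over the two inputs (one common choice; dimensions and names are illustrative):

    import torch
    import torch.nn as nn

    class GatedHybridEmbedding(nn.Module):
        """embedding(w) = cg * C_w + wg * W_w, with learned gates cg, wg."""
        def __init__(self, word_dim, char_dim, out_dim):
            super().__init__()
            # Project both inputs to a shared dimension so the
            # element-wise gated sum is well-defined.
            self.proj_w = nn.Linear(word_dim, out_dim)
            self.proj_c = nn.Linear(char_dim, out_dim)
            self.gate_c = nn.Linear(2 * out_dim, out_dim)  # f(W, C) -> cg
            self.gate_w = nn.Linear(2 * out_dim, out_dim)  # g(W, C) -> wg

        def forward(self, w, c):
            w, c = self.proj_w(w), self.proj_c(c)
            h = torch.cat([w, c], dim=-1)
            cg = torch.sigmoid(self.gate_c(h))  # how much char-level signal
            wg = torch.sigmoid(self.gate_w(h))  # how much word-level signal
            return cg * c + wg * w

    emb = GatedHybridEmbedding(word_dim=200, char_dim=50, out_dim=100)
    print(emb(torch.randn(8, 200), torch.randn(8, 50)).shape)  # torch.Size([8, 100])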

Encode

sequence-level representation

Feed-forward Neural Network

Recurrence

Unfolding


Forward Propagation

Bidirectional RNN

Vanilla RNN

Gating mechanism

LSTM - Long Short-Term Memory
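
A minimal PyTorch sketch of a bidirectional LSTM encoder over an embedded sequence (all dimensions illustrative):

    import torch
    import torch.nn as nn

    embed_dim, hidden_dim = 100, 128
    encoder = nn.LSTM(input_size=embed_dim, hidden_size=hidden_dim,
                      batch_first=True, bidirectional=True)

    x = torch.randn(4, 20, embed_dim)  # (batch, seq_len, embed_dim)
    outputs, (h_n, c_n) = encoder(x)
    # Each position carries forward and backward context, concatenated.
    print(outputs.shape)               # torch.Size([4, 20, 256])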

Attend

reduction by attention pooling

Attention Mechanism

  • Reference: an array of units (e.g., encoder states)
  • Query
  • Attention weights
    • signify which parts of the reference are relevant to the query
  • Which parts of the context are relevant to the query?
  • Weighted or Blended Representation

Attention Mechanism

  • Multiplicative attention
    • \( a_{ij} = h_i^T W_a s_j \)
    • \( a_{ij} = h_i^T s_j \)
  • Additive attention (see the sketch after this list)
    • \( a_{ij} = v_a^T \tanh(W_1 h_i + W_2 s_j) \)
    • \( a_{ij} = v_a^T \tanh(W_a [\, h_i ; s_j \,]) \)
  • Blended Representation
    • normalize: \( \alpha_{ij} = \mathrm{softmax}_j(a_{ij}) \)
    • \( c_i = \sum_j \alpha_{ij} s_j \)
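
A minimal PyTorch sketch of the additive variant with softmax-normalized weights (module name and shapes are illustrative):

    import torch
    import torch.nn as nn

    class AdditiveAttention(nn.Module):
        """a_ij = v^T tanh(W1 h_i + W2 s_j); c_i = sum_j softmax_j(a_ij) * s_j"""
        def __init__(self, query_dim, ref_dim, attn_dim):
            super().__init__()
            self.W1 = nn.Linear(query_dim, attn_dim, bias=False)
            self.W2 = nn.Linear(ref_dim, attn_dim, bias=False)
            self.v = nn.Linear(attn_dim, 1, bias=False)

        def forward(self, query, reference):
            # query: (batch, query_dim); reference: (batch, seq_len, ref_dim)
            scores = self.v(torch.tanh(
                self.W1(query).unsqueeze(1) + self.W2(reference)))
            weights = torch.softmax(scores, dim=1)   # relevance of each unit
            return (weights * reference).sum(dim=1)  # blended representation

    attn = AdditiveAttention(query_dim=128, ref_dim=256, attn_dim=64)
    print(attn(torch.randn(4, 128), torch.randn(4, 20, 256)).shape)  # [4, 256]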

Attention Mechanism

Predict

sequence labelling, classification

Classification

  • Final/Target Representation
  • Affine Transformation
  • Optional Non-linearity
  • Softmax
    • Probability distribution across classes
  • Trained with a log-likelihood objective (see the sketch after this list)
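
A minimal PyTorch sketch of this prediction step (class count and dimensions are illustrative):

    import torch
    import torch.nn as nn

    num_classes = 3                   # e.g., drug / disease / neither
    final_repr = torch.randn(4, 256)  # final/target representation

    head = nn.Sequential(
        nn.Linear(256, 128),          # affine transformation
        nn.Tanh(),                    # optional non-linearity
        nn.Linear(128, num_classes),  # class scores (logits)
    )

    logits = head(final_repr)
    probs = torch.softmax(logits, dim=-1)  # distribution across classes
    gold = torch.tensor([0, 2, 1, 1])
    # cross_entropy = softmax + negative log-likelihood in one call
    loss = nn.functional.cross_entropy(logits, gold)
    print(probs.shape, loss.item())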

Our Architecture

attentive sequence model for ADE extraction

Redefining the problem

  • Model ADE Extraction as a Question Answering problem
  • Inspired by the Reading Comprehension literature
  • Given a sequence and a drug:
    • Is the \( t \)-th word in the sequence an Adverse Drug Event?

Architecture

embedding

  • Word Embedding
    • Fixed (pre-trained, kept frozen)
    • Variable (fine-tuned during training)
  • Character-level Word Representation (see the sketch after this list)
    • CharCNN
    • Multiple filters of different widths
    • Max-pooling across the word-length dimension
  • PoS and Label Embedding
    • PoS embedding helps when learning from a small dataset
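
A minimal PyTorch sketch of such a CharCNN (filter widths, counts, and dimensions are illustrative, not the exact configuration used here):

    import torch
    import torch.nn as nn

    class CharCNN(nn.Module):
        """Character-level word representation via multi-width convolutions."""
        def __init__(self, n_chars, char_dim=16, widths=(2, 3, 4), n_filters=25):
            super().__init__()
            self.char_emb = nn.Embedding(n_chars, char_dim)
            self.convs = nn.ModuleList(
                nn.Conv1d(char_dim, n_filters, kernel_size=w) for w in widths)

        def forward(self, char_ids):
            # char_ids: (n_words, max_word_len) character indices
            x = self.char_emb(char_ids).transpose(1, 2)  # (n_words, char_dim, len)
            # Max-pool each feature map across the word-length dimension.
            pooled = [conv(x).max(dim=2).values for conv in self.convs]
            return torch.cat(pooled, dim=1)  # (n_words, n_filters * len(widths))

    cnn = CharCNN(n_chars=70)
    print(cnn(torch.randint(0, 70, (8, 12))).shape)  # torch.Size([8, 75])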

interaction layer

entity recognition

ADE extraction

state of the art

Feature augmentation

F1 histogram

ER F1 vs ADE F1

heatmap
