Deep NLP for Adverse Drug Event Extraction

Adverse Drug Events

Problem

Motivation

Adverse reactions caused by drugs are potentially dangerous and lead to mortality and morbidity in patients.
Adverse Drug Event (ADE) extraction is a significant and unsolved problem in biomedical research.

Data Source

PubMed Abstracts

I have been on Methotrexate since a year ago. It seemed to be helping and under care of my doctor. I have developed an inflammed stomach lining and two ulcers due to this drug. Other meds I am on do not leave me with any side affects. I have had an ultrasound..am waiting for treatment from the Endoscopy doctor that did the tests. It will be a type of medicine to heal my stomach. I have been very sick and vomiting, dry heaves, and am limited to what I can eat. Please make sure if you have any of these side affects, you inform your doctor immediately. I am off the Methotrexate for good. Not a good experience for me. Thank You.

Problem Definition

Given a sequence of words <w1, w2, w3, ..., wn> :

entity extraction: label each word in the sequence as a drug, a disease, or neither
relationship extraction: extract the relationship between drug-disease pairs

Example - Relationship extraction

<methotrexate, sever side effects> - YES

I have suffered sever side effects from the oral methotrexate and have not been able to remain on this medication.

Example - Entity extraction

I	have	suffered	sever	side
O	O	O	B-Disease	I-Disease
effects	from	the	oral	methotrexate
L-Disease	O	O	O	U-Drug
and	have	not	been	able
O	O	O	O	O
to	remain	on	this	medication
O	O	O	O	O

BILOU - Begin, Inside, Last, Outside, Unit
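
The tagging scheme above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the talk: the function name `bilou_tags` and the span format `(start, end, label)` with an exclusive `end` are assumptions made for this example.

```python
def bilou_tags(num_tokens, spans):
    """Return one BILOU tag per token.

    `spans` is a list of (start, end, label) tuples, `end` exclusive
    (a hypothetical input format chosen for this sketch).
    """
    tags = ["O"] * num_tokens  # Outside by default
    for start, end, label in spans:
        if end - start == 1:
            tags[start] = f"U-{label}"  # Unit: single-token entity
        else:
            tags[start] = f"B-{label}"  # Begin
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{label}"  # Inside
            tags[end - 1] = f"L-{label}"  # Last
    return tags


tokens = ("I have suffered sever side effects from the oral methotrexate "
          "and have not been able to remain on this medication").split()
spans = [(3, 6, "Disease"), (9, 10, "Drug")]
tags = bilou_tags(len(tokens), spans)

for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Run on the example sentence, this reproduces the token/tag pairs shown in the entity extraction example.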

Existing Architectures

Joint Models for Extracting Adverse Drug Events from Biomedical Text

https://www.ijcai.org/Proceedings/16/Papers/403.pdf

Uses Convolution
Models entity extraction and relationship extraction as a state transition problem

Fei Li, Yue Zhang, Meishan Zhang, Donghong Ji, 2016

End-to-End Relation Extraction using LSTMs on Sequences and Tree Structures

Using SDP provides more context
Uses TreeLSTM

http://arxiv.org/abs/1601.00770

Miwa, M. and Bansal, M., 2016.

A neural joint model for entity and relation extraction from biomedical text

https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-017-1609-9

Fei Li, Yue Zhang, Meishan Zhang, Donghong Ji, 2017

Everything from above, and
Character embedding

Our Model

Performance comparison

Model	Entity Extraction	ADE Extraction
Li et al., 2016	79.5	63.4
Miwa & Bansal, 2016	83.4	55.6
Li et al., 2017	84.6	71.4
Our model	85.30	86.78

EEAP Framework for NLP

Embed, Encode, Attend, Predict

Embed

word-level representation

Frequency based

TF-IDF
- TF - Term Frequency
- IDF - Inverse Document Frequency
- Penalty for common words
Co-occurrence Matrix
- V x N
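
As a rough illustration of the TF-IDF weighting above, the sketch below uses one common variant: term frequency as count divided by document length, and IDF as log(N / document frequency). The toy documents and the function name `tf_idf` are made up for this example; other TF-IDF variants differ in smoothing and normalization.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus.
docs = [
    "the drug caused nausea".split(),
    "the drug caused ulcers".split(),
    "the patient stopped the drug".split(),
]
weights = tf_idf(docs)
print(weights[0])
```

Words that occur in every document ("the", "drug") get a weight of zero, which is the penalty for common words; a word unique to one document ("nausea") gets the largest weight.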

Word embedding

Distributed Representation
- Captures semantic meaning
- meaning is relative
Fundamentally based on co-occurrence
Prediction based vectorization
- predict neighboring words
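
To make "predict neighboring words" concrete, here is a small sketch of how skip-gram style training pairs can be generated from a sentence. The window size and the function name `skipgram_pairs` are assumptions for illustration; real word2vec implementations add subsampling and negative sampling on top of this.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every token and its neighbours."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


tokens = "oral methotrexate caused side effects".split()
pairs = skipgram_pairs(tokens, window=1)
for center, context in pairs:
    print(center, "->", context)
```

Each pair is one training example: the model sees the center word and is trained to predict the context word.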

Word embedding

Word2vec : CBOW

Word2vec : Skipgram

Pre-trained word embeddings

Word2Vec, GloVe
- Wikipedia
- Common Crawl
Word vectors induced from
- PubMed, PMC
- Uses word2vec

flaws

Out-of-vocabulary (OOV) Tokens
- Large Vocabulary size
- Rare words are left out
- Possible Solution
  - Average of neighbours
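
One way to read "average of neighbours" is to represent an out-of-vocabulary token by the mean of the vectors of the in-vocabulary words around it. The sketch below assumes that reading; the window size, the zero-vector fallback, and the toy vectors are all choices made for this example.

```python
def embed_with_oov(tokens, vectors, window=2):
    """Look up each token; replace OOV tokens by the mean of nearby known vectors."""
    dim = len(next(iter(vectors.values())))
    result = []
    for i, token in enumerate(tokens):
        if token in vectors:
            result.append(list(vectors[token]))
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        neighbours = [vectors[tokens[j]] for j in range(lo, hi)
                      if j != i and tokens[j] in vectors]
        if neighbours:
            avg = [sum(v[d] for v in neighbours) / len(neighbours)
                   for d in range(dim)]
        else:
            avg = [0.0] * dim  # no known neighbour: fall back to zeros
        result.append(avg)
    return result


# Hypothetical 2-dimensional vectors; "methotrexate" is out of vocabulary.
vectors = {"oral": [1.0, 0.0], "caused": [0.0, 1.0], "ulcers": [1.0, 1.0]}
tokens = ["oral", "methotrexate", "caused", "ulcers"]
embedded = embed_with_oov(tokens, vectors, window=1)
print(embedded[1])
```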

character-level word representation

Vocabulary of unique characters
- fixed and small
Morphological Features
Word as a sequence (RNN)
Word as a 2D image [count x dim] (CNN)
Jointly trained with the model objective

Hybrid embedding

Combines
- Morphological features
- Semantic Features
Combination Method
- Concatenation
  - \( embedding(w_i) = [ W_{w_i} ; C_{w_i} ] \)
- Gated Mixing
  - \( cg_{w_i} = f(W_{w_i}, C_{w_i})\)
  - \( wg_{w_i} = g(W_{w_i}, C_{w_i})\)
  - \( embedding(w_i) = cg_{w_i} \cdot C_{w_i} + wg_{w_i} \cdot W_{w_i} \)
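
The two combination methods can be sketched as follows. The slide leaves the gate functions f and g unspecified, so this example assumes a single scalar sigmoid gate for the character vector and ties the word gate to its complement (wg = 1 - cg); it also assumes the word and character vectors have the same dimension for gated mixing. All names here are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def concat_embedding(word_vec, char_vec):
    """Concatenation: [W ; C]."""
    return list(word_vec) + list(char_vec)


def gated_embedding(word_vec, char_vec, gate_weights, gate_bias=0.0):
    """Gated mixing: cg * C + wg * W, with wg = 1 - cg (an assumption)."""
    features = list(word_vec) + list(char_vec)
    score = sum(w * x for w, x in zip(gate_weights, features)) + gate_bias
    cg = sigmoid(score)  # gate on the character-level vector
    wg = 1.0 - cg        # gate on the word-level vector
    return [cg * c + wg * w for w, c in zip(word_vec, char_vec)]


word_vec = [1.0, 0.0]
char_vec = [0.0, 1.0]
concatenated = concat_embedding(word_vec, char_vec)
mixed = gated_embedding(word_vec, char_vec, gate_weights=[0.0] * 4)
print(concatenated, mixed)
```

With all-zero gate weights the gate sits at 0.5, so the mixed vector is the plain average of the two inputs; in a trained model the gate weights would be learned.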

Encode

sequence-level representation

feed forward neural network

Recurrence

Unfolding

Forward Propagation

Bidirectional RNN

Vanilla RNN
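
For reference, forward propagation in a vanilla RNN and its bidirectional variant can be sketched as below, using the standard update h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h). The parameter names and the tiny one-dimensional example are assumptions for illustration, not values from the talk.

```python
import math


def matvec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def rnn_forward(inputs, W_xh, W_hh, b_h):
    """Unfold the recurrence over the input sequence; return all hidden states."""
    hidden = [0.0] * len(b_h)
    states = []
    for x in inputs:
        pre = [a + b + c for a, b, c in
               zip(matvec(W_xh, x), matvec(W_hh, hidden), b_h)]
        hidden = [math.tanh(p) for p in pre]
        states.append(hidden)
    return states


def bidirectional(inputs, fwd_params, bwd_params):
    """Run one RNN left-to-right, one right-to-left, and concatenate per step."""
    forward = rnn_forward(inputs, *fwd_params)
    backward = rnn_forward(inputs[::-1], *bwd_params)[::-1]
    return [f + b for f, b in zip(forward, backward)]


params = ([[1.0]], [[1.0]], [0.0])  # (W_xh, W_hh, b_h), hypothetical values
inputs = [[1.0], [0.0]]
states = rnn_forward(inputs, *params)
both = bidirectional(inputs, params, params)
print(states, both)
```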

Gating mechanism

LSTM - Long Short-Term Memory

Attend

reduction by attention pooling

Attention Mechanism

Reference, an array of units
Query
Attention weights
- signify which parts of reference are relevant to query
Which parts of the context are relevant to the query?
Weighted or Blended Representation

Attention Mechanism

Multiplicative attention
- \(a_{ij} = h_i^T W_a s_j\)
- \(a_{ij} = h_i^T s_j\)
Additive attention
- \(a_{ij} = v_a^T \tanh(W_1 h_i + W_2 s_j)\)
- \(a_{ij} = v_a^T \tanh(W_a [h_i ; s_j])\)
Blended Representation
- \(c_i = \sum_j a_{ij} s_j\)
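
A minimal sketch of the dot-product form of multiplicative attention follows. It assumes the scores are normalized with a softmax before blending, which the formulas above do not spell out; the function names and the toy vectors are hypothetical.

```python
import math


def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attend(query, reference):
    """Score each reference unit against the query, then blend the units."""
    scores = [dot(query, s) for s in reference]  # h_i^T s_j
    weights = softmax(scores)                    # attention weights (assumed softmax)
    dim = len(reference[0])
    context = [sum(w * s[d] for w, s in zip(weights, reference))
               for d in range(dim)]              # c_i = sum_j a_ij s_j
    return weights, context


query = [1.0, 0.0]
reference = [[2.0, 0.0], [0.0, 2.0], [0.0, -2.0]]
weights, context = attend(query, reference)
print(weights, context)
```

The first reference unit points in the same direction as the query, so it receives the largest weight and dominates the blended representation.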

Attention Mechanism

predict

sequence labelling, classification

Classification

Final/Target Representation
Affine Transformation
Optional Non-linearity
Log-Likelihood
Softmax
- Probability distribution across classes
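
The prediction steps listed above (affine transformation, softmax, log-likelihood) can be sketched as follows. The optional non-linearity is omitted, and the weights, the three-class setup, and the function names are made up for this example.

```python
import math


def affine(W, b, x):
    """Affine transformation: W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + bias
            for row, bias in zip(W, b)]


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def negative_log_likelihood(probs, target):
    """Loss for one example: minus the log-probability of the true class."""
    return -math.log(probs[target])


# Hypothetical 3-class classifier over a 2-dimensional target representation.
W = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
b = [0.0, 0.0, 0.0]
representation = [2.0, 0.0]

logits = affine(W, b, representation)
probs = softmax(logits)
loss = negative_log_likelihood(probs, target=0)
print(probs, loss)
```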

our architecture

attentive sequence model for ADE extraction

Redefining the problem

Model ADE Extraction as a Question Answering Problem
Inspired by Reading Comprehension Literature
Given a sequence and a drug
- Is the t-th word in the sequence an Adverse Drug Event for that drug?
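
Under this formulation, each (sequence, drug) pair yields one yes/no question per token. The sketch below shows how such per-token labels could be built from annotated data; the input format (disease spans plus a set of drug/mention pairs) and the name `ade_labels` are assumptions for illustration, not the talk's actual preprocessing.

```python
def ade_labels(tokens, disease_spans, ade_pairs, drug):
    """Return 1 for each token inside a disease mention that is annotated
    as an adverse event of `drug`, and 0 for every other token."""
    labels = [0] * len(tokens)
    for start, end in disease_spans:  # `end` is exclusive
        mention = " ".join(tokens[start:end])
        if (drug, mention) in ade_pairs:
            for t in range(start, end):
                labels[t] = 1
    return labels


tokens = ("I have suffered sever side effects from the oral methotrexate "
          "and have not been able to remain on this medication").split()
disease_spans = [(3, 6)]
ade_pairs = {("methotrexate", "sever side effects")}

labels = ade_labels(tokens, disease_spans, ade_pairs, "methotrexate")
other = ade_labels(tokens, disease_spans, ade_pairs, "ibuprofen")
print(list(zip(tokens, labels)))
```

Querying the same sentence with a different drug (here a hypothetical "ibuprofen") gives all-zero labels, which is what lets one model handle sentences that mention several drugs.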

Architecture

embedding

Word Embedding
- Fixed
- Variable
Character-level Word Representation
- CharCNN
- Multiple filters of different widths
- Max-pooling across word length dimension
PoS and Label Embedding
- PoS embedding helps when learning from a small dataset
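
The CharCNN idea (several filters of different widths, each max-pooled over the word length) can be sketched in plain Python as below. The filter values, the two-character alphabet, and the zero-padding of short words are assumptions for this example; a real model would learn the character vectors and filters.

```python
def char_cnn(word, char_vectors, filters):
    """Return one feature per filter: the max response over all positions.

    `filters` is a list of kernels; each kernel is a list of `width`
    weight vectors, one per character position in the window.
    Assumes `word` is non-empty.
    """
    chars = [char_vectors[c] for c in word]
    dim = len(chars[0])
    features = []
    for kernel in filters:
        width = len(kernel)
        # Pad short words with zero vectors so at least one window exists.
        padded = chars + [[0.0] * dim] * max(0, width - len(chars))
        responses = []
        for start in range(len(padded) - width + 1):
            window = padded[start:start + width]
            responses.append(sum(
                w * x
                for weights, vec in zip(kernel, window)
                for w, x in zip(weights, vec)
            ))
        features.append(max(responses))  # max-pooling across word length
    return features


# Hypothetical one-hot character vectors and hand-set filters.
char_vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
filters = [
    [[1.0, 0.0]],              # width 1: fires on "a"
    [[1.0, 0.0], [0.0, 1.0]],  # width 2: fires on "ab"
]
features = char_cnn("bab", char_vectors, filters)
print(features)
```

Because of max-pooling, the output size depends only on the number of filters, not on the length of the word.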

interaction layer

entity recognition

ADE extraction

state of the art

Feature augmentation

F1 histogram

ER F1 vs ADE F1

heatmap

Deep NLP for ADE Extraction in Biomedical Text

More from Suriyadeepan R