Deep NLP for Adverse Drug Event Extraction
Adverse Drug Events
Problem
Motivation
- Adverse reactions caused by drugs are potentially dangerous,
- leading to morbidity and mortality in patients.
- Adverse Drug Event (ADE) extraction remains a significant, unsolved problem in biomedical research.
Data Source
PubMed Abstracts
I have been on Methotrexate since a year ago. It seemed to be helping and under care of my doctor. I have developed an inflammed stomach lining and two ulcers due to this drug. Other meds I am on do not leave me with any side affects. I have had an ultrasound..am waiting for treatment from the Endoscopy doctor that did the tests. It will be a type of medicine to heal my stomach. I have been very sick and vomiting, dry heaves, and am limited to what I can eat. Please make sure if you have any of these side affects, you inform your doctor immediately. I am off the Methotrexate for good. Not a good experience for me. Thank You.
Problem Definition
Given a sequence of words <w1, w2, w3, ..., wn>:
- Entity extraction: label each word in the sequence as drug, disease, or neither
- Relationship extraction: extract the relationships between drug-disease pairs
Example - Relationship Extraction
<methotrexate, sever side effects> - YES
I have suffered sever side effects from the oral methotrexate and have not been able to remain on this medication.
Example - Entity Extraction

| I | have | suffered | sever | side | effects | from | the | oral | methotrexate |
|---|---|---|---|---|---|---|---|---|---|
| O | O | O | B-Disease | I-Disease | L-Disease | O | O | O | U-Drug |

| and | have | not | been | able | to | remain | on | this | medication |
|---|---|---|---|---|---|---|---|---|---|
| O | O | O | O | O | O | O | O | O | O |
BILOU - Begin, Inside, Last, Outside, Unit
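The BILOU encoding above can be produced mechanically from entity spans. A minimal sketch (the helper `bilou_tags` and the span format are illustrative, not from the slides):

```python
def bilou_tags(tokens, spans):
    """Convert (start, end, label) token spans into BILOU tags.

    spans: list of (start_idx, end_idx_inclusive, label),
    e.g. label = "Disease" or "Drug".
    """
    tags = ["O"] * len(tokens)            # Outside by default
    for start, end, label in spans:
        if start == end:
            tags[start] = f"U-{label}"    # Unit: single-token entity
        else:
            tags[start] = f"B-{label}"    # Begin
            for i in range(start + 1, end):
                tags[i] = f"I-{label}"    # Inside
            tags[end] = f"L-{label}"      # Last
    return tags

sentence = ("I have suffered sever side effects from the oral "
            "methotrexate and have not been able to remain on this "
            "medication").split()
tags = bilou_tags(sentence, [(3, 5, "Disease"), (9, 9, "Drug")])
# "sever side effects" -> B/I/L-Disease, "methotrexate" -> U-Drug
```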
Existing Architectures
Joint Models for Extracting Adverse Drug Events from Biomedical Text
https://www.ijcai.org/Proceedings/16/Papers/403.pdf

- Uses Convolution
- Models entity extraction and relationship extraction as a state transition problem
Fei Li, Yue Zhang, Meishan Zhang, Donghong Ji, 2016
End-to-End Relation Extraction using LSTMs on Sequences and Tree Structures
- Shortest Dependency Path (SDP) provides more context
- Uses Tree-LSTM

http://arxiv.org/abs/1601.00770
Miwa, M. and Bansal, M., 2016.
A neural joint model for entity and relation extraction from biomedical text
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-017-1609-9

Fei Li, Yue Zhang, Meishan Zhang, Donghong Ji, 2017
- Builds on the above models, and
- adds character embeddings
Our Model

Performance comparison
| Model | Entity Extraction (F1) | ADE Extraction (F1) |
|---|---|---|
| Li et al., 2016 | 79.5 | 63.4 |
| Miwa & Bansal, 2016 | 83.4 | 55.6 |
| Li et al., 2017 | 84.6 | 71.4 |
| Our model | 85.30 | 86.78 |
EEAP Framework for NLP
Embed, Encode, Attend, Predict
Embed
word-level representation
Frequency based
- TF-IDF
  - TF: Term Frequency
  - IDF: Inverse Document Frequency
  - penalizes common words
- Co-occurrence matrix
  - V x N
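The TF-IDF weighting above can be sketched in a few lines of Python (a minimal sketch; the toy corpus and the exact `log(N / df)` IDF variant are assumptions, since several variants exist):

```python
import math
from collections import Counter

def tf_idf(docs):
    """TF-IDF weights for a tokenized corpus.

    TF  = term count / document length
    IDF = log(N / document frequency)  -> penalizes common words
    """
    n = len(docs)
    df = Counter()                        # document frequency per term
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({w: (tf[w] / len(doc)) * math.log(n / df[w])
                        for w in tf})
    return weights

docs = [["drug", "causes", "ulcers"],
        ["drug", "helps", "patients"],
        ["patients", "report", "ulcers"]]
w = tf_idf(docs)
# "drug" occurs in 2 of 3 documents, so its weight in doc 0 is
# lower than that of "causes", which occurs in only one.
```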
Word embedding
- Distributed representation
- Captures semantic meaning
  - meaning is relative
  - fundamentally based on co-occurrence
- Prediction-based vectorization
  - predicts neighboring words
Word embedding

Word2vec : CBOW

Word2vec : Skipgram

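Both word2vec variants train on (center, context) windows: skip-gram predicts each context word from the center word, while CBOW predicts the center word from its context. A minimal sketch of the pair generation (the helper name and toy sentence are illustrative):

```python
def skipgram_pairs(tokens, window=2):
    """Generate (center, context) training pairs for skip-gram.

    CBOW would use the same windows but predict the center word
    from the set of context words instead.
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

pairs = skipgram_pairs("methotrexate causes severe side effects".split(),
                       window=1)
# e.g. ("causes", "methotrexate"), ("causes", "severe"), ...
```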
Pre-trained word embeddings
- Word2Vec, GloVe
  - trained on Wikipedia, Common Crawl
- Word vectors induced from PubMed, PMC
  - uses word2vec
Flaws
- Out-of-vocabulary (OOV) tokens
- Large vocabulary size
  - rare words are left out
- Possible solution
  - average the vectors of neighbouring words
Character-level word representation
- Vocabulary of unique characters
  - fixed and small
- Captures morphological features
- Word as a sequence (RNN)
- Word as a 2D image [count x dim] (CNN)
- Jointly trained with the model objective
Hybrid embedding
- Combines
  - morphological features
  - semantic features
- Combination methods
  - Concatenation
    - embedding(w_i) = [W_wi ; C_wi]
  - Gated mixing
    - cg_wi = f(W_wi, C_wi)
    - wg_wi = g(W_wi, C_wi)
    - embedding(w_i) = cg_wi · C_wi + wg_wi · W_wi
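Both combination methods can be sketched in NumPy. Here W_wi is the word embedding, C_wi the character-derived embedding; the sigmoid gates and weight shapes are one common choice for f and g, not prescribed by the slides:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hybrid_embedding(w_vec, c_vec, Wc, Ww, mode="gated"):
    """Combine a word embedding (w_vec) and a character-level
    embedding (c_vec) by concatenation or gated mixing."""
    if mode == "concat":
        return np.concatenate([w_vec, c_vec])          # [W_wi ; C_wi]
    both = np.concatenate([w_vec, c_vec])
    cg = sigmoid(Wc @ both)        # gate on the character embedding
    wg = sigmoid(Ww @ both)        # gate on the word embedding
    return cg * c_vec + wg * w_vec

rng = np.random.default_rng(0)
d = 4
w_vec, c_vec = rng.normal(size=d), rng.normal(size=d)
Wc, Ww = rng.normal(size=(d, 2 * d)), rng.normal(size=(d, 2 * d))
concat = hybrid_embedding(w_vec, c_vec, Wc, Ww, "concat")  # shape (8,)
gated = hybrid_embedding(w_vec, c_vec, Wc, Ww)             # shape (4,)
```

Concatenation doubles the embedding width, while gated mixing keeps the dimension fixed and lets the model decide, per word, how much to trust each source.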
Encode
sequence-level representation
Feed-forward neural network

Recurrence

Unfolding

Forward Propagation

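Forward propagation through an unfolded RNN reuses the same weights at every time step, with each hidden state depending on the previous one. A minimal sketch (tanh and the weight shapes are common choices, not specified by the slides):

```python
import numpy as np

def rnn_forward(xs, Wxh, Whh, bh):
    """Run a vanilla RNN over a sequence of input vectors,
    returning the hidden state at every step."""
    h = np.zeros(Whh.shape[0])
    states = []
    for x in xs:
        h = np.tanh(Wxh @ x + Whh @ h + bh)   # the recurrence
        states.append(h)
    return states

rng = np.random.default_rng(1)
d_in, d_h = 3, 5
xs = [rng.normal(size=d_in) for _ in range(4)]
states = rnn_forward(xs, rng.normal(size=(d_h, d_in)),
                     rng.normal(size=(d_h, d_h)), np.zeros(d_h))
print(len(states), states[0].shape)  # 4 (5,)
```

A bidirectional RNN simply runs this twice, once over the reversed sequence, and concatenates the two hidden states per position.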
Bidirectional RNN

Vanilla RNN

Gating mechanism

LSTM - Long Short-Term Memory

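The LSTM's gating mechanism can be sketched as a single step in NumPy: input, forget, and output gates decide what to write into, keep in, and expose from the cell state (a minimal sketch with one stacked weight matrix; bias and initialization details are assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, b):
    """One LSTM step over input x with previous hidden state h
    and cell state c; W stacks all four gate weight matrices."""
    d = h.shape[0]
    z = W @ np.concatenate([x, h]) + b
    i = sigmoid(z[:d])           # input gate
    f = sigmoid(z[d:2*d])        # forget gate
    o = sigmoid(z[2*d:3*d])      # output gate
    g = np.tanh(z[3*d:])         # candidate cell update
    c = f * c + i * g            # gated cell state
    h = o * np.tanh(c)           # gated hidden state
    return h, c

rng = np.random.default_rng(2)
d_in, d_h = 3, 4
W = rng.normal(size=(4 * d_h, d_in + d_h))
h, c = np.zeros(d_h), np.zeros(d_h)
h, c = lstm_step(rng.normal(size=d_in), h, c, W, np.zeros(4 * d_h))
print(h.shape, c.shape)  # (4,) (4,)
```

The additive update `c = f * c + i * g` is what lets gradients flow over long sequences, unlike the repeated matrix multiplication of a vanilla RNN.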
Attend
reduction by attention pooling
Attention Mechanism
- Reference: an array of units
- Query
- Attention weights
  - signify which parts of the reference are relevant to the query
- Weighted or blended representation
Attention Mechanism
- Multiplicative attention
  - a_ij = h_i^T W_a s_j
  - a_ij = h_i^T s_j
- Additive attention
  - a_ij = v_a^T tanh(W_1 h_i + W_2 s_j)
  - a_ij = v_a^T tanh(W_a [h_i ; s_j])
- Blended representation
  - c_i = Σ_j a_ij s_j
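Additive attention and the blended representation can be sketched in NumPy (a minimal sketch; the softmax normalization of the raw scores is a standard step, assumed here):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def additive_attention(h, S, W1, W2, v):
    """Score each reference unit s_j against the query h, normalize,
    and blend:  a_j = v^T tanh(W1 h + W2 s_j),  c = sum_j a_j s_j."""
    scores = np.array([v @ np.tanh(W1 @ h + W2 @ s) for s in S])
    a = softmax(scores)     # attention weights over the reference
    return a, a @ S         # weights and blended representation

rng = np.random.default_rng(3)
d = 4
S = rng.normal(size=(6, d))      # reference: an array of 6 units
h = rng.normal(size=d)           # query
a, c = additive_attention(h, S, rng.normal(size=(d, d)),
                          rng.normal(size=(d, d)), rng.normal(size=d))
print(a.shape, c.shape, round(a.sum(), 6))  # (6,) (4,) 1.0
```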
Attention Mechanism

Predict
sequence labelling, classification
Classification
- Final/target representation
- Affine transformation
  - optional non-linearity
- Softmax
  - probability distribution across classes
- Log-likelihood loss
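The prediction head amounts to a few lines (a minimal sketch; the weight shapes are assumptions, and training would minimize the negative log-likelihood of the true class):

```python
import numpy as np

def predict(h, W, b):
    """Affine transformation of the target representation followed
    by softmax, giving a probability distribution over classes."""
    logits = W @ h + b                  # affine transformation
    e = np.exp(logits - logits.max())   # numerically stable softmax
    return e / e.sum()

rng = np.random.default_rng(4)
d, n_classes = 8, 3
p = predict(rng.normal(size=d), rng.normal(size=(n_classes, d)),
            np.zeros(n_classes))
print(p.shape, round(p.sum(), 6))  # (3,) 1.0
```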
Our Architecture
Attentive sequence model for ADE extraction
Redefining the problem
- Model ADE extraction as a question answering problem
- Inspired by the reading comprehension literature
- Given a sequence and a drug:
  - is the t-th word in the sequence an Adverse Drug Event?


Architecture

Embedding
- Word embedding
  - fixed
  - variable
- Character-level word representation
  - CharCNN
  - multiple filters of different widths
  - max-pooling across the word-length dimension
- PoS and label embedding
  - PoS embedding helps when learning from a small dataset
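The CharCNN component can be sketched in NumPy: the word is treated as a [length x dim] matrix of character embeddings, filters of several widths slide over it, and max-pooling over positions yields one feature per filter (a minimal sketch; random embeddings stand in for learned ones):

```python
import numpy as np

def char_cnn(char_embs, filters):
    """Character-level word representation: convolve each filter
    over the character matrix and max-pool across the length
    dimension, producing one feature per filter."""
    feats = []
    for F in filters:                       # F: [width x dim]
        width = F.shape[0]
        convs = [np.sum(char_embs[t:t + width] * F)
                 for t in range(len(char_embs) - width + 1)]
        feats.append(max(convs))            # max-pool over positions
    return np.array(feats)

rng = np.random.default_rng(5)
dim = 4
word = rng.normal(size=(9, dim))            # e.g. a 9-character word
filters = [rng.normal(size=(w, dim)) for w in (2, 3, 4)]
rep = char_cnn(word, filters)               # shape (3,)
```

Using several filter widths lets the representation pick up character n-grams of different lengths (prefixes, suffixes, stems), which is where the morphological signal comes from.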

Interaction Layer

Entity Recognition

ADE Extraction

State of the Art

Feature Augmentation

F1 Histogram

ER F1 vs ADE F1

Heatmap


Deep NLP for ADE Extraction in Biomedical Text
By Suriyadeepan R