Deep NLP for Adverse Drug Event Extraction
Adverse Drug Events
- Adverse reactions caused by drugs are potentially dangerous
- They lead to mortality and morbidity in patients.
- Adverse Drug Event (ADE) extraction is a significant and unsolved problem in biomedical research.
Data Source
PubMed Abstracts
I have been on Methotrexate since a year ago. It seemed to be helping and under care of my doctor. I have developed an inflammed stomach lining and two ulcers due to this drug. Other meds I am on do not leave me with any side affects. I have had an waiting for treatment from the Endoscopy doctor that did the tests. It will be a type of medicine to heal my stomach. I have been very sick and vomiting, dry heaves, and am limited to what I can eat. Please make sure if you have any of these side affects, you inform your doctor immediately. I am off the Methotrexate for good. Not a good experience for me. Thank You.
Problem Definition
Given a sequence of words <w1, w2, w3, ..., wn> :
- entity extraction: label each word in the sequence as a drug, a disease, or neither
- relationship extraction: extract the relationship between drug-disease pairs
Example - Relationship extraction
<methotrexate, sever side effects> - YES
I have suffered sever side effects from the oral methotrexate and have not been able to remain on this medication.
Example - Entity extraction
I | have | suffered | sever | side |
O | O | O | B-Disease | I-Disease |
effects | from | the | oral | methotrexate |
L-Disease | O | O | O | U-Drug |
and | have | not | been | able |
O | O | O | O | O |
to | remain | on | this | medication |
O | O | O | O | O |
BILOU - Begin, Inside, Last, Outside, Unit
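The BILOU scheme above can be sketched as a small tagging function. This is only an illustration: the function name `bilou_tags` and the span format (start index, exclusive end index, label) are assumptions made for the example, not part of the original slides.

```python
def bilou_tags(tokens, spans):
    """Tag tokens with BILOU labels.

    spans is a list of (start, end, label) token indices, end exclusive.
    This span format is an assumption made for the example.
    """
    tags = ["O"] * len(tokens)           # Outside by default
    for start, end, label in spans:
        if end - start == 1:
            tags[start] = f"U-{label}"   # Unit: single-token entity
        else:
            tags[start] = f"B-{label}"   # Begin
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{label}"   # Inside
            tags[end - 1] = f"L-{label}" # Last
    return tags


sentence = ("I have suffered sever side effects from the oral methotrexate "
            "and have not been able to remain on this medication").split()
spans = [(3, 6, "Disease"), (9, 10, "Drug")]
tags = bilou_tags(sentence, spans)
for token, tag in zip(sentence, tags):
    print(f"{token}\t{tag}")
```

A multi-token entity always gets distinct first and last tags, which lets a decoder recover span boundaries without looking at neighbouring labels.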
Existing Architectures
Joint Models for Extracting Adverse Drug Events from Biomedical Text
- Uses Convolution
- Models entity extraction and relationship extraction as a state transition problem
Fei Li, Yue Zhang, Meishan Zhang, Donghong Ji, 2016
End-to-End Relation Extraction using LSTMs on Sequences and Tree Structures
- Using the shortest dependency path (SDP) provides more context
- Uses TreeLSTM
Miwa, M. and Bansal, M., 2016.
A neural joint model for entity and relation extraction from biomedical text
Fei Li, Yue Zhang, Meishan Zhang, Donghong Ji, 2017
- Everything from above, and
- Character embedding
Our Model
Performance comparison
Model | Entity Extraction | ADE Extraction |
Li, et al, 2016 | 79.5 | 63.4 |
Miwa & Bansal | 83.4 | 55.6 |
Li, et al, 2017 | 84.6 | 71.4 |
Our model | 85.30 | 86.78 |
EEAP Framework for NLP
Embed, Encode, Attend, Predict
word-level representation
Frequency based
- TF - Term Frequency
- IDF - Inverse Document Frequency
- Penalty for common words
- Co-occurrence Matrix
- V x N
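The TF and IDF bullets above can be illustrated with a few lines of plain Python. This is a minimal sketch that assumes the common weighting tf * log(N / df); other TF-IDF variants exist, and the toy documents are made up for the example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {word: weight} dict per tokenised document."""
    n_docs = len(docs)
    # Document frequency: in how many documents does each word occur?
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            # Term frequency times inverse document frequency.
            word: (count / len(doc)) * math.log(n_docs / df[word])
            for word, count in tf.items()
        })
    return weights


docs = [
    "the drug caused nausea".split(),
    "the drug helped".split(),
    "the patient reported nausea".split(),
]
weights = tf_idf(docs)
print(weights[0])
```

A word that appears in every document, such as "the", gets an IDF of log(1) = 0, which is the penalty for common words mentioned above.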
Word embedding
- Distributed Representation
- Captures semantic meaning
- meaning is relative
- Fundamentally based on co-occurrence
- Prediction based vectorization
- predict neighboring words
Word embedding
Word2vec : CBOW
Word2vec : Skipgram
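The difference between the two Word2vec objectives is easiest to see in the training examples each one consumes. The sketch below only builds those examples; it does not train any vectors, and the function names and the `window` parameter are assumptions made for the illustration.

```python
def skipgram_pairs(tokens, window=2):
    """Skip-gram: predict each neighbour from the centre word."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens, window=2):
    """CBOW: predict the centre word from its surrounding context."""
    examples = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + window + 1]
        examples.append((context, target))
    return examples


tokens = ["side", "effects", "from", "methotrexate"]
print(skipgram_pairs(tokens, window=1))
print(cbow_examples(tokens, window=1))
```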
Pre-trained word embeddings
- Word2Vec, GloVe
- Wikipedia
- Common Crawl
Word vectors induced from
- PubMed, PMC
- Uses word2vec
- Out-of-vocabulary (OOV) Tokens
- Large Vocabulary size
- Rare words are left out
- Possible Solution
- Average of neighbours
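One reading of "average of neighbours" is to replace an OOV token's vector with the mean of the in-vocabulary vectors around it in the sentence. The sketch below follows that reading, which is an assumption; the helper name `oov_vector`, the `window` parameter, and the zero-vector fallback are all hypothetical.

```python
def oov_vector(tokens, index, vectors, window=2):
    """Approximate the vector of tokens[index] from its in-vocabulary neighbours."""
    dim = len(next(iter(vectors.values())))
    lo = max(0, index - window)
    hi = min(len(tokens), index + window + 1)
    neighbours = [
        vectors[tokens[j]]
        for j in range(lo, hi)
        if j != index and tokens[j] in vectors
    ]
    if not neighbours:
        return [0.0] * dim  # fallback when no neighbour is known (assumption)
    # Average each dimension across the neighbouring vectors.
    return [sum(column) / len(neighbours) for column in zip(*neighbours)]


vectors = {"oral": [1.0, 0.0], "and": [0.0, 1.0]}
tokens = ["oral", "methotrexate", "and"]
print(oov_vector(tokens, 1, vectors, window=1))
```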
character-level word representation
- Vocabulary of unique characters
- fixed and small
- Morphological Features
- Word as a sequence (RNN)
- Word as a 2D image [count x dim] (CNN)
- Jointly trained with the model objective
Hybrid embedding
- Combines
- Morphological features
- Semantic Features
- Combination Method
- Concatenation
- \( embedding(w_i) = [ W_{w_i} ; C_{w_i} ] \)
- Gated Mixing
- \( cg_{w_i} = f(W_{w_i}, C_{w_i})\)
- \( wg_{w_i} = g(W_{w_i}, C_{w_i})\)
- \( embedding(w_i) = cg_{w_i} \cdot C_{w_i} + wg_{w_i} \cdot W_{w_i} \)
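Both combination methods can be sketched in a few lines. The slides leave the gate functions f and g unspecified, so this example assumes a scalar sigmoid gate over the concatenated vectors for f and takes g = 1 - f. The parameter names `gate_weights` and `gate_bias` are hypothetical, and gated mixing here requires the word and character vectors to share a dimension.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def concat_embedding(word_vec, char_vec):
    """embedding(w) = [W_w ; C_w]"""
    return word_vec + char_vec


def gated_embedding(word_vec, char_vec, gate_weights, gate_bias=0.0):
    """embedding(w) = cg * C_w + wg * W_w, with wg = 1 - cg (assumption)."""
    assert len(word_vec) == len(char_vec), "gated mixing needs equal dimensions"
    features = word_vec + char_vec
    # Character gate cg = f(W_w, C_w): a sigmoid over a linear score (assumption).
    cg = sigmoid(sum(w * x for w, x in zip(gate_weights, features)) + gate_bias)
    wg = 1.0 - cg
    return [cg * c + wg * w for w, c in zip(word_vec, char_vec)]


word_vec = [1.0, 0.0]   # semantic features
char_vec = [0.0, 1.0]   # morphological features
print(concat_embedding(word_vec, char_vec))
print(gated_embedding(word_vec, char_vec, gate_weights=[0.0, 0.0, 0.0, 0.0]))
```

Concatenation doubles the dimension and leaves the weighting to later layers, while gated mixing keeps the dimension fixed and lets the model lean on character features for rare words.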
sequence-level representation
feed forward neural network
Forward Propagation
Bidirectional RNN
Vanilla RNN
Gating mechanism
LSTM - Long Short-Term Memory
reduction by attention pooling
Attention Mechanism
- Reference, an array of units
- Query
- Attention weights
- signify which parts of reference are relevant to query
- Which parts of the context are relevant to the query?
- Weighted or Blended Representation
Attention Mechanism
Multiplicative attention
- \(a_{ij} = h_i^\top W_a s_j\)
- \(a_{ij} = h_i^\top s_j\)
Additive attention
- \(a_{ij} = v_a^\top \tanh(W_1 h_i + W_2 s_j)\)
- \(a_{ij} = v_a^\top \tanh(W_a [h_i ; s_j])\)
Blended Representation
- \(\alpha_{ij} = \mathrm{softmax}_j(a_{ij})\)
- \(c_i = \sum_j \alpha_{ij} s_j\)
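The scoring functions and the blended representation can be sketched for a single query h against a reference s_1 ... s_n. Vectors are plain Python lists, the scores are normalised with a softmax (the usual choice) before blending, and all function names are assumptions made for this example.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(M, x):
    return [dot(row, x) for row in M]


def softmax(xs):
    m = max(xs)                       # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def multiplicative_scores(h, refs, W_a=None):
    # a_j = h^T W_a s_j, or the plain dot product h^T s_j when W_a is None
    return [dot(h, matvec(W_a, s) if W_a is not None else s) for s in refs]


def additive_scores(h, refs, W_1, W_2, v_a):
    # a_j = v_a^T tanh(W_1 h + W_2 s_j)
    proj_h = matvec(W_1, h)
    return [
        dot(v_a, [math.tanh(p + q) for p, q in zip(proj_h, matvec(W_2, s))])
        for s in refs
    ]


def blend(refs, scores):
    # Attention weights, then the weighted sum of the reference units.
    weights = softmax(scores)
    dim = len(refs[0])
    context = [sum(w * s[k] for w, s in zip(weights, refs)) for k in range(dim)]
    return weights, context


query = [1.0, 0.0]
reference = [[2.0, 0.0], [0.0, 2.0]]
weights, context = blend(reference, multiplicative_scores(query, reference))
print(weights, context)
```

The reference unit that aligns with the query receives most of the weight, so the blended vector is pulled towards it.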
sequence labelling, classification
- Final/Target Representation
- Affine Transformation
- Optional Non-linearity
- Log-Likelihood
- Softmax
- Probability distribution across classes
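The prediction step listed above amounts to an affine transformation followed by a softmax, trained with a negative log-likelihood loss. The sketch below uses hand-picked weights and a made-up three-class label set purely for illustration, and it omits the optional non-linearity.

```python
import math


def affine(x, W, b):
    """W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bias for row, bias in zip(W, b)]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def negative_log_likelihood(probs, gold):
    """Loss for one example: -log p(gold class)."""
    return -math.log(probs[gold])


classes = ["O", "B-Disease", "U-Drug"]       # illustrative label subset
x = [1.0, 2.0]                               # final/target representation
W = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]     # hand-picked weights (assumption)
b = [0.0, 0.0, 0.0]

probs = softmax(affine(x, W, b))             # probability distribution across classes
predicted = classes[probs.index(max(probs))]
print(predicted, negative_log_likelihood(probs, gold=1))
```

For sequence labelling the same step is applied at every position; for classification it is applied once to a pooled representation.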
our architecture
attentive sequence model for ADE extraction
Redefining the problem
- Model ADE Extraction as a Question Answering Problem
- Inspired by Reading Comprehension Literature
- Given a sequence and a drug
- Is the t-th word in the sequence an Adverse Drug Event?
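Under this formulation, one annotated sentence expands into one yes/no question per token. The sketch below shows a possible data layout; the tuple format and the helper name `qa_examples` are assumptions made for the illustration, not the authors' actual preprocessing.

```python
def qa_examples(tokens, drug, ade_indices):
    """Build one (sequence, drug, t, label) example per token position.

    label is 1 when the t-th word is part of an ADE for the given drug.
    """
    return [
        (tokens, drug, t, int(t in ade_indices))
        for t in range(len(tokens))
    ]


tokens = ("I have suffered sever side effects from the oral methotrexate "
          "and have not been able to remain on this medication").split()
examples = qa_examples(tokens, drug="methotrexate", ade_indices={3, 4, 5})
for _, drug, t, label in examples[:7]:
    print(drug, t, tokens[t], label)
```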
Word Embedding
Character-level Word Representation
Multiple filters of different widths
Max-pooling across word length dimension
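The character-level encoder described above can be sketched without any framework: each filter slides over the character embeddings of a word, and max-pooling over positions keeps one feature per filter. The character table, the filter values, and the zero-padding of short words are assumptions made for this example; a real model would learn the embeddings and filters.

```python
def conv1d_max(char_vecs, filt):
    """Slide one filter ([width][dim]) over the characters, then max-pool."""
    width = len(filt)
    outputs = []
    for start in range(len(char_vecs) - width + 1):
        window = char_vecs[start:start + width]
        outputs.append(sum(
            f * c
            for filt_row, char_row in zip(filt, window)
            for f, c in zip(filt_row, char_row)
        ))
    return max(outputs)  # max-pooling across the word length dimension


def char_cnn(word, char_table, filters):
    """Return one pooled feature per filter for the given word."""
    dim = len(next(iter(char_table.values())))
    zero = [0.0] * dim
    vecs = [char_table.get(ch, zero) for ch in word]   # unknown chars -> zeros
    widest = max(len(f) for f in filters)
    vecs += [zero] * max(0, widest - len(vecs))        # pad short words
    return [conv1d_max(vecs, f) for f in filters]


char_table = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
filters = [
    [[1.0, 0.0]],               # width 1: fires on "a"
    [[1.0, 0.0], [0.0, 1.0]],   # width 2: fires on "ab"
]
print(char_cnn("bab", char_table, filters))
```

Because the pooled output has one entry per filter, words of any length map to a fixed-size vector.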
PoS and Label Embedding
PoS embedding helps when learning from a small dataset
interaction layer
entity recognition
ADE extraction
state of the art
Feature augmentation
F1 histogram
ER F1 vs ADE F1
Deep NLP for ADE Extraction in Biomedical Text
By Suriyadeepan R