hello, world!

tushaar gangavarapu

hello, health!

tushaar gangavarapu

language and healthcare?

80% of medical data is unstructured!

rich patient information

(de-identification)

Mr. Tushaar, a ?-year-old male with a history of migraines, underwent Toric Collamer surgery in 2020.

Post surgery, he developed halo rings in his vision (two overlapping rings around lights) and dry eyes. The patient was given hyaluronic acid eye drops.

clinical forecasting

is the healthcare setting different?

cardiac arrest vs. heart attack?

myocardial infarction, MI?

hospital in Michigan?

pains vs. aches?

hello, MIMIC-III!

nursing notes

radiology reports

nutrition

rehab services

consultation

echo and ECG

discharge summaries

nursing notes

acronyms (consistency)?

duplicate notes with additions

176.49 nursing notes per patient

(4,183 patients having more than 100 nursing notes, composed of over 17,890 words)
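A per-patient figure like the one above can be computed by grouping notes by patient. The sketch below is only an illustration: the `notes` records and the helper names are hypothetical, and the numbers are toy values, not MIMIC-III data.

```python
from collections import Counter
from statistics import mean

# Hypothetical (patient_id, note_text) records standing in for nursing notes.
notes = [
    (1, "pt resting comfortably"),
    (1, "pt c/o pain, meds given"),
    (1, "vitals stable overnight"),
    (2, "family at bedside"),
]


def notes_per_patient(records):
    """Count how many notes each patient has."""
    return Counter(patient_id for patient_id, _ in records)


def patients_above(counts, threshold):
    """Return the patients with strictly more than `threshold` notes."""
    return sorted(p for p, n in counts.items() if n > threshold)


counts = notes_per_patient(notes)
average = mean(counts.values())   # mean number of notes per patient
heavy = patients_above(counts, 2)  # patients with more than 2 notes
```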

"aggregate" nursing note \eta^{(p)}

\text{Monge-Elkan}_\text{Jaro}(\eta_t^{(p)}, \eta_{t'}^{(p)}) = \frac{1}{|\eta_t^{(p)}|} \sum\limits_{i=1}^{|\eta_t^{(p)}|}\max\left\{\text{Jaro}(c_i^{(t)}, c_j^{(t')})\right\}_{j = 1}^{|\eta_{t'}^{(p)}|}
\text{Jaro}(c_i^{(t)}, c_j^{(t')}) = \begin{cases}0, & m = 0 \\ \frac{1}{3}\left(\frac{m}{|c_i^{(t)}|} + \frac{m}{|c_j^{(t')}|} + \frac{2m - \tau}{2m}\right),& \text{otherwise}\end{cases}
m\text{: number of matching characters; }\tau\text{: number of matching characters that appear in a different order}

\text{Monge-Elkan}_\text{Jaro}(\eta_t^{(p)}, \eta_{t'}^{(p)}) > \theta = 0.825 \implies \text{purge}(\eta_t^{(p)})
\text{Monge-Elkan}_\text{Jaro}(\eta_t^{(p)}, \eta_{t'}^{(p)}) \leq \theta = 0.825 \implies \text{merge}(\eta_t^{(p)} \oplus \eta_{t'}^{(p)})
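The purge-or-merge rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the talk's implementation: it uses the standard Jaro similarity, whitespace tokenization, and it assumes a note is purged when any later note of the same patient covers it above the threshold (the deck does not say which pairs are compared). The names `jaro`, `monge_elkan`, and `aggregate` are hypothetical.

```python
THETA = 0.825  # similarity threshold from the slide


def jaro(a, b):
    """Standard Jaro similarity between two strings."""
    if not a or not b:
        return 0.0
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_hit, b_hit = [False] * len(a), [False] * len(b)
    m = 0
    for i, ch in enumerate(a):
        for j in range(max(0, i - window), min(len(b), i + window + 1)):
            if not b_hit[j] and b[j] == ch:
                a_hit[i] = b_hit[j] = True
                m += 1
                break
    if m == 0:
        return 0.0
    a_matched = [ch for ch, hit in zip(a, a_hit) if hit]
    b_matched = [ch for ch, hit in zip(b, b_hit) if hit]
    # Matched characters that line up in a different order.
    out_of_order = sum(x != y for x, y in zip(a_matched, b_matched))
    return (m / len(a) + m / len(b) + (2 * m - out_of_order) / (2 * m)) / 3


def monge_elkan(tokens_a, tokens_b):
    """Average, over tokens of A, of the best Jaro match in B (asymmetric)."""
    if not tokens_a or not tokens_b:
        return 0.0
    return sum(max(jaro(x, y) for y in tokens_b) for x in tokens_a) / len(tokens_a)


def aggregate(notes, theta=THETA):
    """Drop notes covered by a later note; concatenate the rest in order."""
    tokenized = [note.lower().split() for note in notes]
    kept = []
    for t, tokens in enumerate(tokenized):
        superseded = any(
            monge_elkan(tokens, later) > theta for later in tokenized[t + 1:]
        )
        if not superseded:
            kept.append(notes[t])
    return " ".join(kept)


notes = [
    "pt stable on room air",
    "pt stable on room air, afebrile overnight",
    "family updated at bedside",
]
aggregated = aggregate(notes)
```

Because Monge-Elkan is asymmetric, scoring the earlier note against the later one asks whether the later note already contains the earlier one, which suits duplicates that only add text.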

representing \eta^{(p)}

\eta^{(p)}_t = \left(c_1^{(t)}, c_2^{(t)}, \dots, c_m^{(t)}\right)\text{; }c_i^{(t)} \in \mathbb{V}
\eta^{(p)} = \oplus\left\{\eta^{(p)}_t\right\}_{t=1}^{n_p}
\mathcal{T}: \eta^{(p)}_t \rightarrow \mathbb{R}^d\text{; }d \ll |\mathbb{V}|

bag-of-words → capturing rare terms

W(c^{(t)}_i, \eta^{(p)}_t) = \begin{cases}\left(1 + \log_2 f(c^{(t)}_i, \eta^{(p)}_t)\right)\left(\log_2 \frac{n_p}{d(c^{(t)}_i)}\right), & d(c^{(t)}_i) > 0 \\ 0, & \text{otherwise}\end{cases}
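The weighting W can be sketched directly from its definition. The reading of the symbols is an assumption: f is taken as the count of a term in a note, d as the number of notes that contain the term, and n_p as the number of notes. The function name `term_weights` and the whitespace tokenization are hypothetical.

```python
import math
from collections import Counter


def term_weights(notes):
    """Return one {term: weight} dict per note, following W above."""
    tokenized = [note.lower().split() for note in notes]
    n_p = len(tokenized)
    # d(c): number of notes in which each term occurs at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)  # f(c, note): term count within this note
        weights.append({
            term: (1 + math.log2(f)) * math.log2(n_p / doc_freq[term])
            for term, f in counts.items()
        })
    return weights


weights = term_weights(["pt stable pt resting", "pt febrile"])
```

A term that occurs in every note (here `pt`) gets weight zero, while a term confined to one note keeps a positive weight, which is how the scheme favors rare terms.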

word2vec (skipgram) → sentence2vec → doc2vec 
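One simple way to move from word vectors to a fixed-size note vector is mean pooling, sketched below. This is an assumption for illustration only: the deck does not say how the sentence and document vectors were built, and doc2vec proper learns document vectors during training rather than averaging. The toy `word_vectors` table is hypothetical; real skip-gram vectors would be learned from the corpus.

```python
# Hypothetical 3-dimensional word vectors (stand-ins for learned skip-gram vectors).
word_vectors = {
    "pt": [0.5, 0.0, 1.0],
    "stable": [0.0, 1.0, 0.0],
    "febrile": [1.0, 0.5, 0.0],
}


def mean_pool(vectors, dim):
    """Element-wise mean of a list of vectors; zero vector if the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def sentence_vector(sentence, dim=3):
    """Average the vectors of the known words in a sentence."""
    known = [word_vectors[w] for w in sentence.lower().split() if w in word_vectors]
    return mean_pool(known, dim)


def note_vector(sentences, dim=3):
    """Average the sentence vectors of a note into one d-dimensional vector."""
    return mean_pool([sentence_vector(s, dim) for s in sentences], dim)


vector = note_vector(["pt stable", "pt febrile"])
```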

the "essence" of the note? 

topic modeling (Dirichlet, multinomial)

\text{topics per note} \sim \text{Dirichlet}(\alpha)
\text{terms per topic} \sim \text{Dirichlet}(\beta)

+ Poisson
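The Dirichlet, multinomial, and Poisson ingredients fit together as a generative story in the style of latent Dirichlet allocation. The sketch below assumes that reading of the slide: a Poisson draw sets the note length, one Dirichlet draw gives the note's topic mixture, and another gives each topic's distribution over terms. All names and parameter values are hypothetical, and only the standard library is used.

```python
import math
import random


def sample_dirichlet(concentration, size, rng):
    """Symmetric Dirichlet draw via normalized gamma variates."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def sample_poisson(lam, rng):
    """Poisson draw using Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def generate_note(vocab, n_topics, alpha, beta, mean_length, rng):
    """Sample one note: topic mixture, length, then a topic and a term per position."""
    topic_term = [sample_dirichlet(beta, len(vocab), rng) for _ in range(n_topics)]
    note_topic = sample_dirichlet(alpha, n_topics, rng)
    length = sample_poisson(mean_length, rng)
    words = []
    for _ in range(length):
        # Categorical (single-trial multinomial) draws for topic, then term.
        topic = rng.choices(range(n_topics), weights=note_topic)[0]
        words.append(rng.choices(vocab, weights=topic_term[topic])[0])
    return note_topic, words


rng = random.Random(0)
vocab = ["pain", "fever", "stable", "family"]
mix, words = generate_note(vocab, n_topics=2, alpha=0.5, beta=0.5,
                           mean_length=8, rng=rng)
```

Fitting a topic model inverts this story: given the observed words, it infers the topic mixture of each note, and that mixture can then serve as the note's low-dimensional representation.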

attention, transformer encodings, ...

ICD-9 coding

multi-label classification

code range   diagnosis
001-139      parasitic and infectious diseases
140-239      neoplasms
240-279      endocrine, immunity, metabolic, and nutritional
280-289      blood-forming organs and blood
\vdots       \vdots
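Treating code ranges as labels turns ICD-9 coding into a multi-label problem: each patient gets one binary indicator per range. The sketch below covers only the four ranges listed above and assumes codes arrive as zero-padded strings such as `"0389"` or `"250.00"`; the helper names are hypothetical.

```python
# The four code ranges from the table above; the full scheme has more groups.
CODE_GROUPS = [
    (1, 139, "parasitic and infectious diseases"),
    (140, 239, "neoplasms"),
    (240, 279, "endocrine, immunity, metabolic, and nutritional"),
    (280, 289, "blood-forming organs and blood"),
]


def code_group(icd9_code):
    """Return the index of the range containing the code, or None."""
    prefix = icd9_code.strip()[:3]
    if not prefix.isdigit():
        return None  # e.g. V and E codes, which these ranges do not cover
    number = int(prefix)
    for index, (low, high, _) in enumerate(CODE_GROUPS):
        if low <= number <= high:
            return index
    return None


def multi_hot(icd9_codes):
    """Encode a patient's codes as one 0/1 indicator per code range."""
    labels = [0] * len(CODE_GROUPS)
    for code in icd9_codes:
        index = code_group(code)
        if index is not None:
            labels[index] = 1
    return labels


labels = multi_hot(["0389", "25000", "2859"])
```

A patient can carry several indicators at once, which is why the task is multi-label rather than multi-class.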

 


 

multi-label classification

bi-directional?

multi-channel

performance

interpretability?

what next?

NLP for healthcare

By Tushaar Gangavarapu
