ML | Data Science Biomedical Informatics | Social Science | Assistant Professor
Ishanu Chattopadhyay, PhD
Assistant Professor of Biomedical Informatics & Computer Science
University of Kentucky
ishanu_ch@uky.edu
Stamping Out the Next Pandemic **Before** The First Human Infection
BioNorad
Chattopadhyay, Ishanu, Kevin Wu, Jin Li, and Aaron Esser-Kahn. "Emergenet: Fast Scalable Pandemic Risk Assessment of Influenza A Strains Circulating In Non-human Hosts." (2023). Under Review
PREEMPT
Predicting Future Mutations for Viral Genomes in the Wild
predict future emergence risk
Hemagglutinin
Neuraminidase
Mediates Cellular Entry
Surface structures involved in host interaction
Mediates Cellular Exit
*Hothorn, Torsten, Kurt Hornik, and Achim Zeileis. "Unbiased recursive partitioning: A conditional inference framework." Journal of Computational and Graphical Statistics 15, no. 3 (2006): 651-674.
emergent macro-structure
Component predictor (Conditional Inference Tree*)
Example: Influenza A HA protein
Recursive
LSM
forest
Revealing Emergent Cross-talk from observed sequence variations
>200,000 HA sequences
H3
Northern Hemisphere
2021
Influenza A: HA
H5
2013
Influenza A: HA
| | 222 | 223 | 224 | ... | 560 |
|---|---|---|---|---|---|
| strain 1 | | | | | |
| strain 2 | | | | | |
| ... | | | | | |
| strain m | | | | | |
observables
samples
Distributions over alphabet \(\Sigma^i\)
Individual Predictor (CIT)
cross-talk
Tension between predicted and observed distribution drives change
Example: HA Site 223 on Influenza A
\(\psi^i\)
| K | G | Y | S | T |
\(\phi\) estimates \(\psi\)
population
individual
estimate is always a non-empty non-degenerate distribution
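A minimal sketch of what a component predictor returns, assuming a smoothed conditional-frequency estimator as a stand-in for the conditional inference tree (the tree itself is not reproduced here). The function name `predict_site`, the toy alignment, and the pseudocount `alpha` are hypothetical and only illustrate why the estimate stays non-empty and non-degenerate even when observations are missing.

```python
from collections import Counter

ALPHABET = "KGYST"  # residues listed for the HA site 223 example

def predict_site(samples, target, observed, alphabet=ALPHABET, alpha=0.5):
    """Estimate a distribution over `alphabet` at column `target`.

    samples  : list of equal-length strings (rows = strains, columns = sites)
    observed : dict {site_index: residue}; sites absent from it are missing
    alpha    : pseudocount (an assumption) that keeps every residue possible
    """
    # Keep the strains that agree with every observed site except the target.
    matches = [s for s in samples
               if all(s[i] == r for i, r in observed.items() if i != target)]
    counts = Counter(s[target] for s in matches)
    total = sum(counts[a] for a in alphabet) + alpha * len(alphabet)
    return {a: (counts[a] + alpha) / total for a in alphabet}

samples = ["KGY", "KGS", "KSY", "TGY"]          # toy data, not real HA sequences
dist = predict_site(samples, target=1, observed={0: "K"})
uniform = predict_site([], target=0, observed={})  # nothing observed at all
```

With no matching samples the estimate falls back to the uniform distribution, so a fully missing observation still yields a usable prediction.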
missing observation
where \(D_{JS}(P\vert \vert Q)\) is the Jensen-Shannon divergence.
This bound connects the "closeness" of samples to the odds of perturbing from one to the other, bridging geometry to dynamics
(Sanov's Theorem, Pinsker's Inequality)
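The Jensen-Shannon divergence itself is straightforward to compute. The sketch below implements it in bits for two discrete distributions. The helper `strain_distance`, which averages the square root of the per-site divergence, is an assumption about how site-level divergences might be combined into a distance between two strains; the slide's own formula did not survive extraction.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence D_JS(P || Q), bounded in [0, 1] bits."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def strain_distance(dists_x, dists_y):
    """Assumed combination rule: mean over sites of sqrt(D_JS)."""
    roots = [math.sqrt(max(js_divergence(p, q), 0.0))
             for p, q in zip(dists_x, dists_y)]
    return sum(roots) / len(roots)

# Two sites: the first predicted very differently, the second identically.
x = [[1.0, 0.0], [0.5, 0.5]]
y = [[0.0, 1.0], [0.5, 0.5]]
d = strain_distance(x, y)
```

The square root is used because the square root of the JS divergence is a metric, which the divergence itself is not.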
\(\psi\)
\(\psi'\)
\(\theta\)
"spatial average": average of all plausible worldviews or states
* Sizemore, Nicholas, Kaitlyn Oliphant, Ruolin Zheng, Camilia R. Martin, Erika C. Claud, and Ishanu Chattopadhyay. "A digital twin of the infant microbiome to predict neurodevelopmental deficits." Science Advances 10, no. 15 (2024): eadj0400. https://www.science.org/doi/full/10.1126/sciadv.adj0400
persistence probability
Central to Model Drift Quantification
Start with opinion vector with all entries missing
This is a standard Physics construct, quantifying curvature of the underlying latent geometry
Easily computable in LSM framework!
Apply \(\phi^i\)
Random variable quantifying dispersion around the spatial average of worldviews
const. scaling as \(N^2\)
Influenza Risk Assessment Tool (IRAT) scoring for animal strains
slow (months), quasi-subjective, expensive
*https://www.cdc.gov/flu/pandemic-resources/monitoring/irat-virus-summaries.htm
24 scores in 14 years
~10,000 strains collected annually
CDC
Emergenet time: 1 second
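A back-of-the-envelope comparison of the throughput figures above, assuming the 1-second Emergenet time applies per strain and that strains are scored one after another (both are assumptions, not stated on the slide).

```python
irat_scores, irat_years = 24, 14      # IRAT: 24 scores in 14 years
strains_per_year = 10_000             # ~10,000 strains collected annually
seconds_per_strain = 1                # assumption: Emergenet time is per strain

irat_per_year = irat_scores / irat_years               # about 1.7 scores per year
irat_coverage = irat_per_year / strains_per_year       # fraction of strains scored
emergenet_hours = strains_per_year * seconds_per_strain / 3600

print(f"IRAT: {irat_per_year:.2f} scores/year, {irat_coverage:.4%} of collected strains")
print(f"Emergenet: {emergenet_hours:.1f} hours for a full year of strains")
```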
Sample predicted distributions
perturbed state within \(\epsilon\) of \(\psi\)
Definition
Sample neighborhood to impute missing data
LSM sampling: sampling the \(\epsilon\)-neighborhood of a strain reveals local "valid perturbations"
Null state (all missing observations)
Valid perturbations / simulations
Note: The LSM can evolve too (the rules can change over time)
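The sampling idea can be sketched as a Gibbs-style loop: start from the null state, pick a site, ask its predictor for a distribution given the currently filled-in sites, and draw a residue. This is a sketch under stated assumptions; `predict_site` is a hypothetical frequency-based stand-in for the trained component predictors, and the toy alignment is not real data.

```python
import random
from collections import Counter

def predict_site(samples, target, state, alphabet, alpha=0.5):
    """Smoothed distribution at `target`, conditioned on the filled-in sites."""
    matches = [s for s in samples
               if all(r is None or i == target or s[i] == r
                      for i, r in enumerate(state))]
    counts = Counter(s[target] for s in matches)
    total = sum(counts[a] for a in alphabet) + alpha * len(alphabet)
    return {a: (counts[a] + alpha) / total for a in alphabet}

def sample_from_null(samples, alphabet, extra_steps=20, seed=0):
    """Draw one sequence, starting with every observation missing."""
    rng = random.Random(seed)
    n = len(samples[0])
    state = [None] * n                      # null state: all sites missing
    order = list(range(n))
    rng.shuffle(order)
    # The first pass fills each site once; the extra steps keep perturbing.
    schedule = order + [rng.randrange(n) for _ in range(extra_steps)]
    for i in schedule:
        dist = predict_site(samples, i, state, alphabet)
        state[i] = rng.choices(list(dist), weights=list(dist.values()))[0]
    return "".join(state)

samples = ["KGY", "KGS", "KSY", "TGY"]      # toy alignment
draws = [sample_from_null(samples, "KGYST", seed=s) for s in range(5)]
```

The same loop imputes missing data if it starts from a partially observed strain instead of the null state. Restricting draws to the \(\epsilon\)-neighborhood of a strain would additionally need a distance check, which is not shown here.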
Define Lagrangian*
Via the Euler-Lagrange Equations\(^\dag\):
Over-damped Gradient flow Equation*
where \(-g^{km}\) is the inverse metric tensor
kinetic energy
potential energy
* Einstein notation used
\(^\dag\) Goldstein, Herbert, et al. Classical Mechanics. 3rd ed., Pearson, 2002.
Principle of stationary action
Local potential field eqn
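The equations on this slide did not survive extraction. For orientation, the standard textbook forms are sketched below in Einstein notation. The coordinates \(x^k\), the metric \(g_{km}\), and the potential \(V\) are assumed notation, and how \(V\) is constructed from the LSM is not reproduced here. Passing from the Euler-Lagrange equations to the over-damped flow also assumes a friction term that dominates inertia, with the friction coefficient absorbed into the time scale.

```latex
% Lagrangian: kinetic energy minus potential energy
L(x,\dot{x}) = \tfrac{1}{2}\, g_{km}(x)\, \dot{x}^k \dot{x}^m - V(x)

% Euler-Lagrange equations (principle of stationary action)
\frac{d}{dt}\frac{\partial L}{\partial \dot{x}^k} - \frac{\partial L}{\partial x^k} = 0

% Over-damped gradient flow (inertia neglected, friction absorbed into t)
\dot{x}^k = -\, g^{km}\, \frac{\partial V}{\partial x^m}
```

In the last line the state moves down the gradient of \(V\), with the inverse metric converting the gradient into a direction on the latent geometry.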
Question:
Why has the Mississippi lineage of Influenza C vanished from human circulation recently, while other lineages continue to exist?
H0
H1
M0
The three bovine sequences are not part of these clusters, which contain only human ICV HE sequences. We can still compute the distance from each human sequence to each of the three bovine strains and identify the cluster that lies closest to them: it is clearly the one labeled M0. The other two clusters are labeled H0 and H1.
Distance of bovine sequences to M0 cluster
'C/Miyagi/2/94', 'C/Saitama/2/2000', 'C/Yamagata/3/2000', 'C/Miyagi/7/93', 'C/Miyagi/4/96', 'C/Saitama/1/2004', 'C/Miyagi/7/96', 'C/Greece/1/79', 'C/Yamagata/5/92', 'C/Miyagi/3/93', 'C/Miyagi/4/93', 'C/Kyoto/41/82', 'C/Nara/82', 'C/Hyogo/1/83', 'C/Miyagi/1/94', 'C/Miyagi/6/93', 'C/Miyagi/3/94', 'C/Mississippi/80', 'C/Yamagata/26/2004', 'C/Mississippi/80'
Suggests movement from M0 to H0 to H1
| Cluster | Fitness (estimated log-likelihood) |
|---|---|
| M0 | -64.251 |
| H0 | -32.586 |
| H1 | -15.964 |
Fitness calculations are based on the Emergenet model and correspond to the estimated log-likelihood of a strain NOT perturbing out of its cluster. The H1 cluster is therefore the most "fit": strains have moved toward it over time, and it is also the largest cluster in the data. The overlap in collection times between H0 and H1 implies that the difference in cluster sizes is not simply a collection-bias effect. This shift has resulted in the strain disappearing from humans, as the virus found a more fit niche on the landscape.
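A minimal sketch of a persistence log-likelihood, assuming it factorizes over sites as the sum of log-probabilities that each site keeps its current residue. The function `log_persistence` and the toy per-site distributions are hypothetical; Emergenet's exact computation is not shown in this deck. The ranking at the end uses only the three cluster values reported above.

```python
import math

def log_persistence(strain, site_dists):
    """Sum over sites of log P(site keeps its current residue).

    site_dists[i] is the predicted distribution (dict residue -> prob)
    for site i given the rest of the strain; factorization is assumed.
    """
    return sum(math.log(d[r]) for r, d in zip(strain, site_dists))

# Toy example with two sites (illustrative numbers only).
toy = log_persistence("KG", [{"K": 0.5, "T": 0.5}, {"G": 0.25, "S": 0.75}])

# Cluster fitness values reported above, ranked from most to least fit.
fitness = {"M0": -64.251, "H0": -32.586, "H1": -15.964}
ranking = sorted(fitness, key=fitness.get, reverse=True)
```

A value closer to zero means the strains are less likely to perturb out of the cluster, which is why H1 ranks first and M0 last.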
8 75 87 97 141 154 165 178 181 183 203 205 211 216 230 252 327 361 506 588
Local potential fields can be computed given the LSM and dynamical considerations, which reveal future evolution
Stable
(captured by local extrema)
Free to move locally towards extrema
Observation: The Mississippi lineage has been extinct since 2022/23
stable lineage
Professor, William Robert Mills Chair in Equine Infectious Diseases
Influenza
HIV
COVID
CCHF
ishanu_ch@uky.edu
By Ishanu Chattopadhyay
Emergenet Discussion