ML | Data Science Biomedical Informatics | Social Science | Assistant Professor
Ishanu Chattopadhyay, PhD
Assistant Professor of Biomedical Informatics & Computer Science
University of Kentucky
| | reliten | gunlaw | abany | ... | grass |
|---|---|---|---|---|---|
| Person 1 | | | | | |
| Person 2 | | | | | |
| ... | | | | | |
| Person m | | | | | |
observables
samples
Distributions over alphabet \(\Sigma^i\)
Individual Predictor (CIT)
cross-talk
Tension between predicted and observed distribution drives change
Example
GSS topic: There should be more gun-control
\(\psi^i\)
| strongly agree | agree | neutral | disagree | strongly disagree |
\(\phi\) estimates \(\psi\)
Examples: GSS, ANES, WVS, ESS, Eurobarometer, Afrobarometer, Asian Barometer, etc.
group
individual
estimate is always a non-empty non-degenerate distribution
missing observation
where \(D_{JS}(P\vert \vert Q)\) is the Jensen-Shannon divergence.
This bound connects the "closeness" of samples to the odds of perturbing from one to the other, bridging geometry and dynamics
(Sanov's Theorem, Pinsker's Inequality)
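As a minimal sketch of the quantity this bound is stated in, the Jensen-Shannon divergence between two response distributions can be computed directly from its standard definition. The two distributions over the five Likert categories below are hypothetical illustrations, not GSS estimates, and the function names are chosen here for readability.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence D_KL(p || q) in bits; terms with p_k = 0 contribute 0."""
    return sum(pk * math.log2(pk / qk) for pk, qk in zip(p, q) if pk > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: mean KL divergence of p and q to their midpoint."""
    m = [(pk + qk) / 2 for pk, qk in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Hypothetical distributions over: strongly agree, agree, neutral, disagree, strongly disagree
predicted = [0.50, 0.30, 0.10, 0.07, 0.03]
observed = [0.05, 0.10, 0.15, 0.30, 0.40]

print(js_divergence(predicted, predicted))  # identical distributions
print(js_divergence(predicted, observed))   # distributions that disagree
```

With base-2 logarithms the divergence is symmetric and lies between 0 (identical distributions) and 1 (disjoint support), which makes it convenient as a notion of closeness between samples.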
\(\psi\)
\(\psi'\)
\(\theta\)
"spatial average": average of all plausible worldviews or states
* Sizemore, Nicholas, Kaitlyn Oliphant, Ruolin Zheng, Camilia R. Martin, Erika C. Claud, and Ishanu Chattopadhyay. "A digital twin of the infant microbiome to predict neurodevelopmental deficits." Science Advances 10, no. 15 (2024): eadj0400. https://www.science.org/doi/full/10.1126/sciadv.adj0400
persistence probability
Central to Model Drift Quantification
Start with opinion vector with all entries missing
This is a standard Physics construct, quantifying curvature of the underlying latent geometry
Easily computable in LSM framework!
Apply \(\phi^i\)
Random variable quantifying dispersion around the spatial average of worldviews
const. scaling as \(N^2\)
Sample predicted distributions
perturbed state within \(\epsilon\) of \(\psi\)
| Variable | Masked | Reconstructed |
|---|---|---|
| spkcom | allowed | allowed |
| colcom | not fired | not fired |
| spkmil | allowed | allowed |
| colmil | allowed | not allowed |
| libmil | not remove | not remove |
| libhomo | not remove | not remove |
| reliten | strong | no religion |
| pray | once a day | once a day |
| bible | inspired word | word of god |
| abhlth | yes | yes |
| abpoor | no | no |
| pillok | agree | agree |
| intmil | very interested | very interested |
| abpoorw | always wrong | not wrong at all |
| godchnge | believe now, always have | believe now, always have |
| prayfreq | several times a week | several times a week |
| religcon | strongly disagree | disagree |
| religint | disagree | disagree |
| Variable | Masked | Reconstructed |
|---|---|---|
| spkcom | allowed | allowed |
| colcom | not fired | not fired |
| libmil | not remove | not remove |
| libhomo | not remove | not remove |
| gunlaw | favor | favor |
| reliten | no religion | no religion |
| prayer | approve | approve |
| bible | book of fables | inspired word |
| abnomore | yes | yes |
| abhlth | yes | yes |
| abpoor | yes | yes |
| abany | yes | yes |
| owngun | no | no |
| intmil | moderately interested | moderately interested |
| abpoorw | not wrong at all | not wrong at all |
| godchnge | believe now, didn't used to | believe now, always have |
| prayfreq | several times a week | several times a week |
2018 GSS individual samples
Definition
Sample neighborhood to impute missing data
2018 GSS out-of-sample reconstruction
post-reconstruction error ratio (%)
LSM sampling: sampling the \(\epsilon\)-neighborhood of a state or worldview allows reconstruction of censored opinions
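The reconstruction idea can be sketched as follows: leave observed entries fixed, apply a per-item predictor to each missing entry, and sample from the predicted distribution. Everything concrete here is an assumption made for illustration: `predict` is a hypothetical frequency-based stand-in for the individual predictors \(\phi^i\) (it is not a conditional inference tree), and the three-item data set is a toy example, not GSS data.

```python
import random
from collections import Counter

def predict(item, state, data, alphabet, alpha=1.0):
    """Stand-in for phi^i: smoothed distribution of `item` among training rows
    that agree with every observed (non-None) entry of `state` other than `item`."""
    known = {k: v for k, v in state.items() if v is not None and k != item}
    matches = [row for row in data if all(row[k] == v for k, v in known.items())]
    counts = Counter(row[item] for row in matches)
    total = len(matches) + alpha * len(alphabet[item])
    # Additive smoothing keeps every category at nonzero probability,
    # so the estimate is never empty or degenerate.
    return {a: (counts[a] + alpha) / total for a in alphabet[item]}

def reconstruct(state, data, alphabet, rng):
    """Fill each missing entry in turn by sampling from its predicted distribution.
    Entries filled earlier in the pass condition the predictions made later."""
    filled = dict(state)
    for item, value in state.items():
        if value is None:
            dist = predict(item, filled, data, alphabet)
            cats = list(dist)
            filled[item] = rng.choices(cats, weights=[dist[c] for c in cats])[0]
    return filled

# Toy alphabet and training rows (hypothetical)
alphabet = {
    "gunlaw": ["favor", "oppose"],
    "owngun": ["yes", "no"],
    "grass": ["legal", "not legal"],
}
data = [
    {"gunlaw": "favor", "owngun": "no", "grass": "legal"},
    {"gunlaw": "favor", "owngun": "no", "grass": "legal"},
    {"gunlaw": "oppose", "owngun": "yes", "grass": "not legal"},
    {"gunlaw": "oppose", "owngun": "yes", "grass": "legal"},
]

masked = {"gunlaw": None, "owngun": "yes", "grass": None}
print(predict("gunlaw", masked, data, alphabet))
print(reconstruct(masked, data, alphabet, random.Random(0)))
```

Passing a state in which every entry is `None` corresponds to starting from the null state; repeated calls with different seeds then yield different sampled worldviews.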
examples
Predictive ability of LSM quantified as ability to reconstruct censored out-of-sample opinions**
Null state (all missing observations)
Valid perturbations/simulations
LSM sampling allows simulating opinion perturbations
Both individuals and groups may be modeled as digital twins\(^\dag\)
2018 GSS
Polar separation over time
2016 Presidential Election Vote Prediction
2004
| Variable | Conservative pole | Liberal pole |
|---|---|---|
| abany | no | yes |
| abdefctw | always wrong | not wrong at all |
| abdefect | no | yes |
| abhlth | no | yes |
| abnomore | no | yes |
| abpoor | no | yes |
| abpoorw | always wrong | not wrong at all |
| abrape | no | yes |
| absingle | no | yes |
| bible | inspired word | book of fables |
| colcom | fired | not fired |
| colmil | not fired | not allowed |
| comfort | strongly agree | strongly disagree |
| conlabor | hardly any | a great deal |
| godchnge | believe now, always have | don't believe now, never have |
| grass | not legal | legal |
| gunlaw | oppose | favor |
| intmil | very interested | not at all interested |
| libcom | remove | not remove |
| libmil | not remove | remove |
| maboygrl | true | false |
| owngun | yes | no |
| pillok | agree | strongly agree |
| pilloky | strongly disagree | strongly agree |
| polabuse | no | yes |
| pray | several times a day | never |
| prayer | disapprove | approve |
| prayfreq | several times a day | never |
| religcon | strongly disagree | strongly agree |
| religint | strongly disagree | strongly agree |
| reliten | strong | no religion |
| rowngun | yes | no |
| shotgun | yes | no |
| spkcom | not allowed | allowed |
| spkmil | allowed | not allowed |
| taxrich | about right | much too low |
conservative pole
liberal pole
Clustering LSM distance \(\theta(x,y)\) between out-of-sample individuals
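A pairwise distance of this kind can be sketched as below. The functional form is an assumption made for illustration, since the slides do not state it: \(\theta(x,y)\) is taken to be the average, over survey items, of the square root of the Jensen-Shannon divergence between the distributions predicted for the two individuals. The per-item distributions for the three hypothetical individuals `a`, `b`, and `c` are invented.

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    def kl(u, v):
        return sum(uk * math.log2(uk / vk) for uk, vk in zip(u, v) if uk > 0)
    m = [(pk + qk) / 2 for pk, qk in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def theta(x, y):
    """Assumed form of the distance: mean over items of sqrt(JS divergence)
    between the per-item distributions predicted for x and for y."""
    # max(..., 0.0) guards against tiny negative rounding errors before the sqrt
    return sum(math.sqrt(max(js_divergence(x[i], y[i]), 0.0)) for i in x) / len(x)

# Hypothetical predicted distributions, one per item, for three individuals
a = {"gunlaw": [0.9, 0.1], "grass": [0.8, 0.2]}
b = {"gunlaw": [0.8, 0.2], "grass": [0.7, 0.3]}
c = {"gunlaw": [0.1, 0.9], "grass": [0.2, 0.8]}

people = {"a": a, "b": b, "c": c}
for u in people:
    print(u, [round(theta(people[u], people[v]), 3) for v in people])
```

The resulting distance matrix is the input that any standard clustering routine would need; here `a` and `b` end up closer to each other than either is to `c`.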
conservative
liberal
poles:
partial states aligning with extreme opposing worldviews
Predict 2016 votes using ideology index
Emergent global structure
Define Lagrangian*
Via the Euler-Lagrange Equations\(^\dag\):
Over-damped Gradient flow Equation*
where \(g^{km}\) is the inverse metric tensor
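As a hedged sketch of how these three steps usually fit together, in Einstein notation: the forms below are the textbook ones for a particle moving on a curved manifold, and the symbols \(x^k\) (latent coordinates), \(g_{km}\) (metric), \(V\) (potential), and \(\gamma\) (friction coefficient) are assumptions introduced for illustration rather than taken from the slides.

\[
L = \tfrac{1}{2}\, g_{km}(x)\, \dot{x}^k \dot{x}^m - V(x)
\]

The first term is the kinetic energy and the second the potential energy. The Euler-Lagrange equations \(\frac{d}{dt}\frac{\partial L}{\partial \dot{x}^k} - \frac{\partial L}{\partial x^k} = 0\) then give

\[
g_{km}\, \ddot{x}^m + \Gamma_{k,mn}\, \dot{x}^m \dot{x}^n = -\partial_k V,
\qquad
\Gamma_{k,mn} = \tfrac{1}{2}\left(\partial_m g_{kn} + \partial_n g_{km} - \partial_k g_{mn}\right).
\]

If a linear friction term \(\gamma\, g_{km}\, \dot{x}^m\) is added to the left-hand side and friction dominates the inertial terms, what remains is \(\gamma\, g_{km}\, \dot{x}^m = -\partial_k V\). Contracting with the inverse metric and setting \(\gamma = 1\) gives the over-damped gradient flow

\[
\dot{x}^k = -g^{km}\, \partial_m V .
\]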
kinetic energy
state collapse
| strongly agree | agree | neutral | disagree | strongly disagree |
\(X_i\)
potential energy
* Einstein notation used
\(^\dag\) Goldstein, Herbert, et al. Classical Mechanics. 3rd ed., Pearson, 2002.
Principle of stationary action
Local potential field equation
Stable
(captured by local extrema)
Free to move locally towards extrema
Why propaganda works so well
* Bail, Christopher A., et al. "Exposure to opposing views on social media can increase political polarization." PNAS 115, no. 37 (September 2018): 9216–9221. DOI: 10.1073/pnas.1804840115
GSS 2018 individuals and neighborhoods
Influenza C : strains and their neighborhoods
Even random perturbations tend to move individuals towards local extrema, increasing polarization*
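The effect of random perturbations under an over-damped gradient flow can be illustrated with a toy simulation. This is a sketch under stated assumptions, not the model from the slides: the latent space is one-dimensional with a flat metric, the potential is a hypothetical double well \(V(x) = (x^2 - 1)^2/4\) with minima at \(x = \pm 1\) standing in for the two poles, and the noise level and step size are arbitrary.

```python
import math
import random

def grad_potential(x):
    """Gradient of the double-well potential V(x) = (x**2 - 1)**2 / 4."""
    return x * (x * x - 1.0)

def simulate(x0, rng, steps=2000, dt=0.01, noise=0.1):
    """Euler-Maruyama integration of dx = -V'(x) dt + noise * dW."""
    x = x0
    for _ in range(steps):
        x += -grad_potential(x) * dt + noise * math.sqrt(dt) * rng.gauss(0.0, 1.0)
    return x

rng = random.Random(0)
# Start every trajectory in the undecided middle region between the two wells
starts = [rng.uniform(-0.5, 0.5) for _ in range(200)]
finals = [simulate(x0, rng) for x0 in starts]

# Count trajectories that end within 0.5 of either minimum
near_minimum = sum(abs(abs(x) - 1.0) < 0.5 for x in finals)
print(f"{near_minimum} of {len(finals)} trajectories end near x = -1 or x = +1")
```

Because the midpoint \(x = 0\) is an unstable equilibrium, even unbiased noise is enough to push a trajectory into one of the two wells, where the gradient then holds it.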
Observation: This lineage (Mississippi lineage) has been extinct since 2022/23
stable lineage
The LSM tells the latent opinion "space-time" how to curve, the curved "space-time" tells opinions how to change.
Local potential fields can be computed given the LSM and dynamical considerations, which reveal future evolution
The No-cheating Theorem: Generative models cannot cheat on complexity
Kolmogorov Complexity
Optimal Generative Model
compressed data representation
compressed model representation
Theorem
Conservation Law arising from the continuous symmetry of typicality*
Saturation relation:
Data Sufficiency Statistic \(\mu_0\)
We need LSM-sampling to calculate this
* Noether's Theorem: For every continuous symmetry of a physical system, there exists a corresponding conserved quantity
How much more data do we need?
Data saturation
Data deficient
Needed
Current
Empirical Validation
By Ishanu Chattopadhyay
DARPA-EA-25-02-05-MAGICS-PA-025