Detect adversarial responses ("lying") in structured interviews:

A Generative AI To Read Your Mind


Figure: properties of true responses in a structured interview (high complexity, low surprise, responses reflective of symptoms)




Existing tools are unreliable, can be defeated with coaching, and have poor performance

Hidden structure of cross-talk between responses to interview items

PTSD diagnostic interview


Three parameters: naive diagnostic risk, complexity (linked to Kolmogorov complexity), and surprise

304 VA participants with physician-validated PTSD (possibility of malingering not considered?)

310 online participants with no mental-health diagnosis, asked to intentionally malinger

~5% successfully beat the test

~89% of PTSD-positive patients pass the test

complexity is high for "truthful" response patterns

"Positive" and "negative" Qnets inferred from the case and control cohorts can be used as a diagnostic

surprise is low for "truthful" response patterns

27 forensic psychiatrists recruited to take the challenge

~3.8% successfully beat the test

Identical Correlation Structure in three independent populations

Figure panels: complexity and surprise; score and diagnosis (estimate or ground truth)


Minimum AUC = \(0.95 \pm 0.005\)

Cannot be coached or memorized

Number of possible responses
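The "cannot be memorized" claim rests partly on the size of the response space \(|\Sigma| = \prod_{i=1}^n |\Sigma_i|\). A minimal Python sketch, assuming (hypothetically, since the answer scale is not stated here) that each of the 20 items offers 5 answer options:

```python
from math import prod

# Hypothetical: 20 items, each with an assumed 5-point answer scale.
options_per_item = [5] * 20

# |Sigma| = prod_i |Sigma_i|: every combination of answers is a distinct response vector.
n_patterns = prod(options_per_item)
print(f"{n_patterns:,} possible response vectors")
```

Under these assumptions there are 5^20, roughly 9.5 × 10^13, distinct response vectors, far too many to enumerate or memorize.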


Minimum Performance (n=624)

  • Average time: 3.5 min
  • No. of items: 20
  • AUC > 0.95
  • PPV > 0.86
  • NPV > 0.92
  • At least 83.3% sensitivity at 94% specificity
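Unlike AUC, PPV and NPV depend on how common malingering is in the screened population. The sketch below applies Bayes' rule to the 83.3% sensitivity / 94% specificity operating point, treating a malingered response as the positive class (an assumption) and trying a few prevalences, including the Cook County estimate of 0.34. It is an illustration of the arithmetic, not a reproduction of the reported figures.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value: P(malingering | flagged), via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Negative predictive value: P(not malingering | not flagged), via Bayes' rule."""
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


sens, spec = 0.833, 0.94  # operating point listed above
for prev in (0.10, 0.34, 0.50):  # assumed prevalences of malingering
    print(f"prevalence={prev:.2f}  PPV={ppv(sens, spec, prev):.3f}  NPV={npv(sens, spec, prev):.3f}")
```

At a prevalence of 0.34 this gives PPV ≈ 0.877 and NPV ≈ 0.916; at lower prevalences PPV falls quickly, which is why a single PPV figure is only meaningful together with its base rate.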

Beat the test!

Substance Abuse Disorder





Cook County Data

Estimated malingering rate 0.34

Let the responses to the \(n\) interview items form a random vector

$$ X = (X_1, \ldots, X_n) \sim P \textrm{ where } \operatorname{supp}(X) = \Sigma = \displaystyle\prod_{i=1}^n \Sigma_i \textrm{ with } |\Sigma| < \infty. $$

For \(i = 1, \ldots, n\), let \(P_i := P(X_i\,|\,X_j=x_j \text{ for } j \neq i)\) denote the conditional distribution of \(X_i\) given the values of the other components of \(X\).

Finally, for each \(i = 1, \ldots, n\), let \(\Phi^P_i\) denote an estimate of the distribution \(P_i\).

Then the set \(\Phi^P := \{\Phi^P_i\}_{i=1}^n\) is called a Quasinet (Qnet).
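This section does not say how each \(\Phi^P_i\) is estimated. As a stand-in, the hypothetical `fit_toy_qnet` below estimates each conditional by counting exact contexts \(x_{-i}\) with additive (Laplace) smoothing. This is a swapped-in, deliberately crude estimator: with many items, contexts rarely repeat and the estimates fall back toward uniform, so in practice some generalizing conditional model would be needed.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Response = Tuple[int, ...]
Conditional = Callable[[Response], Dict[int, float]]


def fit_toy_qnet(
    data: Sequence[Response],
    alphabets: Sequence[Sequence[int]],
    alpha: float = 1.0,
) -> List[Conditional]:
    """Return one estimator Phi_i per item (hypothetical toy, not the authors' method).

    Phi_i(context) estimates P(X_i | X_{-i} = context) by counting how often each
    value of item i occurs with that exact context in `data`, plus a pseudo-count
    `alpha` for every value in the item's alphabet Sigma_i.
    """
    n = len(alphabets)
    counts: List[Dict[Response, Counter]] = [defaultdict(Counter) for _ in range(n)]
    for x in data:
        for i in range(n):
            context = x[:i] + x[i + 1:]  # x_{-i}: every answer except item i
            counts[i][context][x[i]] += 1

    def make_phi(i: int) -> Conditional:
        def phi(context: Response) -> Dict[int, float]:
            c = counts[i].get(context, Counter())  # unseen context -> pseudo-counts only
            total = sum(c.values()) + alpha * len(alphabets[i])
            return {v: (c[v] + alpha) / total for v in alphabets[i]}
        return phi

    return [make_phi(i) for i in range(n)]


# Usage: three binary items, four respondents.
data = [(0, 0, 1), (0, 0, 1), (1, 1, 0), (0, 1, 1)]
qnet = fit_toy_qnet(data, alphabets=[(0, 1)] * 3)
print(qnet[2]((0, 0)))  # item 3 given x_1 = 0, x_2 = 0 -> {0: 0.25, 1: 0.75}
```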


Persistence function:

$$ \omega^P_x \triangleq \operatorname{Pr}(x \rightarrow x) = \prod_{i=1}^n \Phi^P_i(X_i = x_i\,|\,X_j = x_j, j \neq i) $$
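Because \(\omega^P_x\) is a product of per-item conditionals, one reading of \(\operatorname{Pr}(x \rightarrow x)\) is the probability that redrawing every item at once from its conditional, given the current answers to the other items, reproduces \(x\). A short sketch with a hypothetical two-item Qnet (`agree_with_other` is invented for illustration); the product is accumulated in log space so long interviews do not underflow.

```python
import math
from typing import Callable, Dict, Sequence, Tuple

Response = Tuple[int, ...]
Conditional = Callable[[Response], Dict[int, float]]


def persistence(x: Response, qnet: Sequence[Conditional]) -> float:
    """omega_x = prod_i Phi_i(X_i = x_i | X_j = x_j, j != i), summed in log space."""
    log_omega = 0.0
    for i, phi in enumerate(qnet):
        context = x[:i] + x[i + 1:]  # x_{-i}
        log_omega += math.log(phi(context)[x[i]])
    return math.exp(log_omega)


def agree_with_other(context: Response) -> Dict[int, float]:
    """Hypothetical conditional for a 2-item binary survey: copy the other answer w.p. 0.8."""
    (other,) = context
    return {other: 0.8, 1 - other: 0.2}


toy_qnet = [agree_with_other, agree_with_other]
print(persistence((1, 1), toy_qnet))  # approximately 0.8 * 0.8 = 0.64
print(persistence((1, 0), toy_qnet))  # approximately 0.2 * 0.2 = 0.04
```

Consistent answers persist far more readily than contradictory ones under this toy model, which is the intuition behind using \(\omega\) in the complexity and score definitions.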

q-distance function:

$$ \theta_{P,Q}(x,y) := \mathbf{E}_i \left[ \mathbb{J}^{\frac{1}{2}} \left(\Phi_i^P(x_{-i}),\Phi_i^Q(y_{-i})\right) \right] $$

Jensen-Shannon Divergence

"Physics"-informed, adaptive distance between response vectors

$$ \theta(x,y) \triangleq \mathbf{E}_i \left( \mathbb{J}^{\frac{1}{2}} \left(\Phi_i^P(x_{-i}), \Phi_i^Q(y_{-i})\right) \right) $$

where \(\mathbb{J}\) is the Jensen-Shannon divergence.

This distance is "special": smaller distances imply a quantitatively higher probability of a spontaneous jump between the two response vectors.
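A sketch of \(\theta\), assuming \(\mathbf{E}_i\) is a uniform average over items and that \(\mathbb{J}\) uses natural logarithms (the base is not specified here; base 2 would bound \(\mathbb{J}\) by 1). The caller supplies item \(i\)'s predicted distributions already evaluated at \(x_{-i}\) and \(y_{-i}\); the function names are hypothetical.

```python
import math
from typing import Dict, Sequence

Dist = Dict[int, float]


def js_divergence(p: Dist, q: Dist) -> float:
    """Jensen-Shannon divergence (natural log) between two finite distributions."""
    support = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in support}

    def kl_to_m(a: Dist) -> float:
        # Terms with a[k] == 0 contribute 0; a[k] > 0 implies m[k] > 0.
        return sum(a[k] * math.log(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


def q_distance(phis_x: Sequence[Dist], phis_y: Sequence[Dist]) -> float:
    """theta(x, y): mean over items of sqrt(JSD(Phi_i^P(x_{-i}), Phi_i^Q(y_{-i})))."""
    if len(phis_x) != len(phis_y) or not phis_x:
        raise ValueError("need one predicted distribution per item for both vectors")
    # max(0, .) guards against tiny negative values from floating-point rounding.
    total = sum(math.sqrt(max(0.0, js_divergence(a, b))) for a, b in zip(phis_x, phis_y))
    return total / len(phis_x)


# Usage: identical predictions give 0; opposite one-hot predictions give sqrt(ln 2).
same = [{0: 0.3, 1: 0.7}, {0: 0.9, 1: 0.1}]
print(q_distance(same, same))
print(q_distance([{0: 1.0, 1: 0.0}], [{0: 0.0, 1: 1.0}]))
```

Taking the square root matters: \(\sqrt{\mathbb{J}}\) is a metric on distributions, so averaging it over items keeps \(\theta\) well behaved as a distance.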

Sanov's Theorem & Pinsker's Inequality

$$ \left\vert \ln \frac{Pr(x \rightarrow y)}{Pr(y \rightarrow y)} \right\vert \leqq \beta\, \theta(x,y) $$

$$ \textrm{score: } \ \mu \triangleq \frac{\ln Pr(x \rightarrow x \vert M^+)}{\ln Pr(x \rightarrow x \vert M^0)} = \frac{\ln \omega_x^{M^+}}{\ln \omega_x^{M^0}} $$

$$ \textrm{complexity: } \ \kappa \triangleq - \frac{1}{\vert x \vert} \ln Pr(x \rightarrow x \vert M^+) = - \frac{\ln \omega_x^{M^+}}{\vert x \vert} $$

$$ \textrm{surprise: } \ \nu \triangleq \mathbf{E}_i \left( 1 - \Phi_i^{M^+} (x_{-i}) \vert_{x_i} \right) $$

where \(M^+\) and \(M^0\) denote the Qnets inferred from the positive (case) and negative (control) cohorts.

Malingering Condition

$$ \chi(x) \triangleq \big( \mu(x) \geqq \mu_0 \big) \bigwedge \bigg( \big(\kappa(x) \leqq \kappa_0\big) \vee \big(\nu(x) \geqq \nu_0\big) \bigg) $$

with decision thresholds \(\mu_0\), \(\kappa_0\), \(\nu_0\).
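The three parameters and the malingering condition only need, for each item, the probability that each Qnet assigns to the observed answer. The sketch below assumes \(|x| = n\) and a uniform average for \(\mathbf{E}_i\). The thresholds \(\mu_0, \kappa_0, \nu_0\) are not given in this section, so the values in the usage line are hypothetical, as are the input probabilities.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Scores:
    mu: float     # score: ln omega^{M+} / ln omega^{M0}
    kappa: float  # complexity: -ln omega^{M+} / |x|
    nu: float     # surprise: mean over items of (1 - Phi_i^{M+}(x_{-i})|_{x_i})


def compute_scores(p_pos: Sequence[float], p_neg: Sequence[float]) -> Scores:
    """p_pos[i] (p_neg[i]): probability the M+ (M0) Qnet assigns to the observed
    answer x_i given the other answers x_{-i}. All values must lie in (0, 1]."""
    if len(p_pos) != len(p_neg) or not p_pos:
        raise ValueError("need one probability per item under each model")
    n = len(p_pos)
    log_omega_pos = sum(math.log(p) for p in p_pos)  # ln omega_x^{M+}
    log_omega_neg = sum(math.log(p) for p in p_neg)  # ln omega_x^{M0}
    if log_omega_neg == 0.0:
        raise ValueError("mu is undefined when omega_x^{M0} == 1")
    return Scores(
        mu=log_omega_pos / log_omega_neg,
        kappa=-log_omega_pos / n,
        nu=sum(1.0 - p for p in p_pos) / n,
    )


def malingering_flag(s: Scores, mu0: float, kappa0: float, nu0: float) -> bool:
    """chi(x) = (mu >= mu0) AND ((kappa <= kappa0) OR (nu >= nu0))."""
    return s.mu >= mu0 and (s.kappa <= kappa0 or s.nu >= nu0)


# Usage with made-up probabilities and hypothetical thresholds (not values from the study).
s = compute_scores(p_pos=[0.6, 0.5, 0.7], p_neg=[0.3, 0.4, 0.2])
print(s)
print(malingering_flag(s, mu0=0.3, kappa0=0.4, nu0=0.35))
```

Here \(\mu \approx 0.42\), \(\kappa \approx 0.52\) and \(\nu = 0.4\), so the response is flagged through the surprise branch: low complexity or high surprise, the opposite of the "truthful" profile, triggers the condition once the score threshold is met.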

Connection to Kolmogorov Complexity

Lemma 1: The algorithmic complexity of a response \(x\), conditional on the number of survey items \(n\), is at most \(\kappa(x) + O(1)\).

Lemma 2:

$$ \nu(x) \leqq 1 - e^{-\kappa(x)} $$
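Lemma 2 follows from the inequality of arithmetic and geometric means, assuming \(|x| = n\) and that \(\mathbf{E}_i\) is a uniform average over items. Write \(p_i := \Phi_i^{M^+}(x_{-i})\vert_{x_i}\), so that

$$ \kappa(x) = -\frac{1}{n}\sum_{i=1}^n \ln p_i, \qquad \nu(x) = 1 - \frac{1}{n}\sum_{i=1}^n p_i . $$

By AM-GM,

$$ \frac{1}{n}\sum_{i=1}^n p_i \geqq \Big(\prod_{i=1}^n p_i\Big)^{1/n} = \exp\Big(\frac{1}{n}\sum_{i=1}^n \ln p_i\Big) = e^{-\kappa(x)}, $$

hence \(\nu(x) \leqq 1 - e^{-\kappa(x)}\), with equality exactly when all \(p_i\) are equal. Low complexity therefore forces low surprise; only high-complexity responses can also be highly surprising.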

DoD Applications

  • VA mental health
  • Personality Evaluation
  • Applications in general structured interviews
  • An algorithmic lie detector with validated, coach-proof performance