VeRITAS

Ishanu Chattopadhyay

Assistant Professor

ishanu@uchicago.edu

VeRITAS AI

Detect adversarial responses ("lying") in structured interviews:

A Generative AI To Read Your Mind

https://paraknowledge.ai/veritas


Figure: properties of true responses in a structured interview (responses that are reflective of symptoms): high complexity, low surprise.

Existing tools are unreliable, can be defeated with coaching, and have poor performance

Hidden structure of cross-talk between responses to interview items

Figure: a Q-net inferred from the PTSD diagnostic interview yields three parameters: $\kappa$ (Kolmogorov complexity), $\nu$ (surprise), and $\mu$ (naive diagnostic risk).

304 VA participants with physician-validated PTSD (possibility of malingering not considered?)

310 online participants with no mental health diagnosis, asked to intentionally malinger

~5% successfully beat the test

~89% of PTSD-positive patients pass the test

Complexity is high for "truthful" response patterns

Surprise is low for "truthful" response patterns

"Positive" and "negative" Q-nets inferred from the case and control cohorts can be used as a diagnostic tool

27 forensic psychiatrists recruited to take the challenge

~3.8% successfully beat the test

Figure: identical correlation structure in three independent populations (panels: complexity vs. surprise; score vs. diagnosis, estimated or ground truth; $\chi$)

Minimum AUC = $0.95 \pm 0.005$

Cannot be coached or memorized

Number of possible responses: $10^{25}$

Minimum Performance (n=624)

Average Time: 3.5 min

No. of Items: 20

AUC > 0.95

PPV > 0.86

NPV > 0.92

At least 83.3% sensitivity at 94% specificity
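PPV and NPV, unlike sensitivity and specificity, depend on how common malingering is in the tested population. The sketch below applies Bayes' rule to make that dependence explicit; the function names ppv and npv and the prevalence values one would plug in are illustrative assumptions, not the study's cohort figures.

```python
def ppv(sens: float, spec: float, prev: float) -> float:
    """Positive predictive value P(condition | test positive) via Bayes' rule."""
    tp = sens * prev                    # true-positive mass
    fp = (1.0 - spec) * (1.0 - prev)    # false-positive mass
    return tp / (tp + fp)


def npv(sens: float, spec: float, prev: float) -> float:
    """Negative predictive value P(no condition | test negative) via Bayes' rule."""
    tn = spec * (1.0 - prev)            # true-negative mass
    fn = (1.0 - sens) * prev            # false-negative mass
    return tn / (tn + fn)


# Operating point from the poster; the prevalence of 0.5 is hypothetical.
example_ppv = ppv(0.833, 0.94, 0.5)
example_npv = npv(0.833, 0.94, 0.5)
```

Sweeping prev over a range of values shows why a single PPV/NPV pair is only meaningful together with the case/control mix it was measured on.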

Beat the test!

paraknowledge.ai/veritas

Substance Use Disorder (SUD)

Figure (Cook County data): $\kappa$ vs. $\nu$ for the malingering, SUD, and no-SUD groups

Estimated malingering rate: 0.34

Q-nets

Let $X = (X_1, \ldots, X_n) \sim P$, where $\operatorname{supp}(X) = \Sigma = \prod_{i=1}^n \Sigma_i$ with $|\Sigma| < \infty$.

For $i = 1, \ldots, n$, let $P_i := P(X_i \,|\, X_j = x_j \text{ for } j \neq i)$ denote the conditional distribution of $X_i$ given the values of the other components of $X$.

Finally, for each $i = 1, \ldots, n$, let $\Phi^P_i$ denote an estimate of the distribution $P_i$. Then the set $\Phi^P := \{\Phi^P_i\}_{i=1}^n$ is called a Quasinet (Q-net).
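To make the definition concrete, the sketch below stores one conditional estimator $\Phi_i$ per item. The estimator is a swapped-in assumption (a Laplace-smoothed frequency table keyed on the exact context $x_{-i}$, falling back to the item's marginal for unseen contexts); it is not the method used to build the VeRITAS Q-nets, and the names ToyQnet, phi and alpha are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Response = Tuple[Hashable, ...]


class ToyQnet:
    """Hypothetical Q-net: one estimate Phi_i of P(X_i | x_{-i}) per item i.

    Each Phi_i is a Laplace-smoothed frequency table keyed on the exact
    context x_{-i}; unseen contexts fall back to the smoothed marginal of X_i.
    Illustrative stand-in only, not the VeRITAS estimator.
    """

    def __init__(self, data: Sequence[Response], alpha: float = 1.0):
        self.n = len(data[0])
        self.alpha = alpha
        # Sigma_i: the alphabet of item i observed in the training data
        self.alphabets: List[List[Hashable]] = [
            sorted({row[i] for row in data}, key=repr) for i in range(self.n)
        ]
        self.marginals = [Counter(row[i] for row in data) for i in range(self.n)]
        # tables[i][x_{-i}] counts the values of X_i seen in that context
        self.tables: List[Dict[Response, Counter]] = [
            defaultdict(Counter) for _ in range(self.n)
        ]
        for row in data:
            for i in range(self.n):
                self.tables[i][self._context(row, i)][row[i]] += 1

    @staticmethod
    def _context(x: Sequence[Hashable], i: int) -> Response:
        """x_{-i}: the response vector with item i removed."""
        return tuple(x[:i]) + tuple(x[i + 1:])

    def phi(self, i: int, x: Sequence[Hashable]) -> Dict[Hashable, float]:
        """Estimated distribution of X_i given x_{-i} (the value x_i is ignored)."""
        counts = self.tables[i].get(self._context(x, i))
        if counts is None:
            counts = self.marginals[i]
        total = sum(counts.values()) + self.alpha * len(self.alphabets[i])
        return {s: (counts[s] + self.alpha) / total for s in self.alphabets[i]}


# Usage on toy binary responses to three items
data = [(0, 1, 1), (0, 1, 0), (1, 0, 0), (1, 0, 1)]
qnet = ToyQnet(data)
dist = qnet.phi(0, (0, 1, 1))       # Phi_0 given x_{-0} = (1, 1)
fallback = qnet.phi(0, (9, 9, 9))   # unseen context -> smoothed marginal
```

Exact-context tables become sparse quickly as the number of items grows, which is why a real Q-net needs a learned conditional model per item rather than a lookup table.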

Persistence function:

$$\omega^P_x \triangleq \Pr(x \rightarrow x) = \prod_{i=1}^n \Phi^P_i(X_i = x_i \,|\, X_j = x_j,\ j \neq i)$$
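The persistence function is a product of per-item conditional probabilities, so it underflows quickly for long interviews; a minimal sketch computes it in log-space. Here a Q-net is assumed to be any callable phi(i, x) returning a dict from values of $X_i$ to probabilities, and toy_phi is a hypothetical binary model used only for illustration.

```python
import math
from typing import Callable, Dict, Hashable, Sequence

# phi(i, x) returns the estimated distribution of X_i given x_{-i}
Phi = Callable[[int, Sequence[Hashable]], Dict[Hashable, float]]


def log_persistence(phi: Phi, x: Sequence[Hashable]) -> float:
    """ln omega_x = sum_i ln Phi_i(X_i = x_i | x_{-i}), summed in log-space."""
    total = 0.0
    for i, xi in enumerate(x):
        p = phi(i, x).get(xi, 0.0)
        if p <= 0.0:
            return -math.inf  # x_i has zero probability under the estimate
        total += math.log(p)
    return total


def persistence(phi: Phi, x: Sequence[Hashable]) -> float:
    """omega_x = Pr(x -> x)."""
    return math.exp(log_persistence(phi, x))


def toy_phi(i: int, x: Sequence[int]) -> Dict[int, float]:
    """Hypothetical binary model: X_i equals the majority of the other items w.p. 0.8."""
    others = [v for j, v in enumerate(x) if j != i]
    majority = 1 if 2 * sum(others) > len(others) else 0  # ties resolve to 0
    return {majority: 0.8, 1 - majority: 0.2}


omega_consistent = persistence(toy_phi, (1, 1, 1))
omega_mixed = persistence(toy_phi, (1, 0, 0))
```

A response pattern that agrees with the learned cross-talk between items (here, (1, 1, 1)) persists with higher probability than one that contradicts it.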

q-distance: a "physics"-informed, adaptive distance between response vectors (abbreviated $\theta(x,y)$)

$$\theta_{P,Q}(x,y) \triangleq \mathbf{E}_i \left[ \mathbb{J}^{\frac{1}{2}} \left( \Phi_i^P(x_{-i}), \Phi_i^Q(y_{-i}) \right) \right]$$

where $\mathbb{J}$ is the Jensen-Shannon divergence.

This distance is "special": smaller distances imply a quantitatively higher probability of a spontaneous jump between the two response vectors.
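A minimal sketch of the q-distance, assuming $\mathbf{E}_i$ is a uniform average over items and using base-2 logarithms so that each $\mathbb{J}$ term lies in $[0, 1]$ (the log base is an assumption; it only rescales $\theta$). The Q-net interface phi(i, x) and the model toy_phi are hypothetical.

```python
import math
from typing import Callable, Dict, Hashable, Sequence

Dist = Dict[Hashable, float]
Phi = Callable[[int, Sequence[Hashable]], Dist]


def jsd(p: Dist, q: Dist) -> float:
    """Jensen-Shannon divergence in bits between two finite distributions."""
    support = set(p) | set(q)
    m = {s: 0.5 * (p.get(s, 0.0) + q.get(s, 0.0)) for s in support}

    def kl_to_m(a: Dist) -> float:
        # KL(a || m); m[s] > 0 wherever a[s] > 0
        return sum(a[s] * math.log2(a[s] / m[s]) for s in a if a[s] > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


def q_distance(phi_p: Phi, phi_q: Phi, x: Sequence, y: Sequence) -> float:
    """theta_{P,Q}(x, y): mean over items of sqrt(JSD(Phi_i^P(x_{-i}), Phi_i^Q(y_{-i})))."""
    n = len(x)
    if len(y) != n:
        raise ValueError("response vectors must have the same length")
    # max(0, .) guards against tiny negative round-off before the square root
    return sum(math.sqrt(max(0.0, jsd(phi_p(i, x), phi_q(i, y)))) for i in range(n)) / n


def toy_phi(i: int, x: Sequence[int]) -> Dist:
    """Hypothetical binary model: X_i equals the majority of the other items w.p. 0.8."""
    others = [v for j, v in enumerate(x) if j != i]
    majority = 1 if 2 * sum(others) > len(others) else 0
    return {majority: 0.8, 1 - majority: 0.2}


d_same = q_distance(toy_phi, toy_phi, (1, 1, 1), (1, 1, 1))
d_flip = q_distance(toy_phi, toy_phi, (1, 1, 1), (0, 0, 0))
```

Because the distance compares predicted distributions rather than raw answers, two responses that differ item by item can still be close if the Q-net expects similar answers in their contexts.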

Theorem (via Sanov's theorem and Pinsker's inequality):

$$\left\vert \ln \frac{\Pr(x \rightarrow y)}{\Pr(y \rightarrow y)} \right\vert \leqq \beta\, \theta(x,y)$$

Here $M^+$ and $M^0$ denote the Q-nets inferred from the case and control cohorts.

Score (naive diagnostic risk):

$$\mu \triangleq \frac{\ln \Pr(x \rightarrow x \,|\, M^+)}{\ln \Pr(x \rightarrow x \,|\, M^0)} = \frac{\ln \omega_x^{M^+}}{\ln \omega_x^{M^0}}$$

Complexity:

$$\kappa \triangleq -\frac{1}{|x|} \ln \Pr(x \rightarrow x \,|\, M^+) = -\frac{\ln \omega_x^{M^+}}{|x|}$$

Surprise:

$$\nu \triangleq \mathbf{E}_i \left( 1 - \Phi_i^{M^+}(x_{-i})\big|_{x_i} \right)$$

Malingering condition:

$$\chi(x) \triangleq \big( \mu(x) \geqq \mu_0 \big) \bigwedge \Big( \big( \kappa(x) \leqq \kappa_0 \big) \vee \big( \nu(x) \geqq \nu_0 \big) \Big)$$
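The four quantities above can be sketched directly from the per-item probabilities a Q-net assigns to the observed answers. The sketch assumes $|x| = n$ (the number of items), that $\mathbf{E}_i$ is a uniform average over items, and that all probabilities are strictly positive. The poster does not give values for $\mu_0$, $\kappa_0$, $\nu_0$, so the thresholds are placeholders, and all function names are hypothetical.

```python
import math
from typing import Sequence


def complexity(p_pos: Sequence[float]) -> float:
    """kappa = -(1/|x|) ln omega_x^{M+}, where p_pos[i] = Phi_i^{M+}(x_{-i}) at x_i."""
    return -sum(math.log(p) for p in p_pos) / len(p_pos)


def surprise(p_pos: Sequence[float]) -> float:
    """nu = E_i(1 - Phi_i^{M+}(x_{-i}) at x_i), with E_i a uniform average."""
    return sum(1.0 - p for p in p_pos) / len(p_pos)


def score(p_pos: Sequence[float], p_neg: Sequence[float]) -> float:
    """mu = ln omega_x^{M+} / ln omega_x^{M0}; undefined if omega_x^{M0} == 1."""
    log_pos = sum(math.log(p) for p in p_pos)
    log_neg = sum(math.log(p) for p in p_neg)
    return log_pos / log_neg


def malingering(mu: float, kappa: float, nu: float,
                mu0: float, kappa0: float, nu0: float) -> bool:
    """chi(x) = (mu >= mu0) and ((kappa <= kappa0) or (nu >= nu0))."""
    return mu >= mu0 and (kappa <= kappa0 or nu >= nu0)


# Hypothetical per-item probabilities of the observed answers under M+ and M0
p_pos = [0.9, 0.6, 0.3]
p_neg = [0.5, 0.4, 0.2]
kappa = complexity(p_pos)
nu = surprise(p_pos)
mu = score(p_pos, p_neg)
flag = malingering(mu, kappa, nu, mu0=0.5, kappa0=1.0, nu0=0.9)  # placeholder thresholds
```

Computing $\kappa$ and $\mu$ from sums of logarithms mirrors the persistence function and avoids forming the (tiny) product $\omega_x$ explicitly.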

Connection to Kolmogorov Complexity

Lemma 1. The algorithmic complexity of a response $x$, conditional on the number of survey items $n$, is at most $\kappa(x) + O(1)$.

Lemma 2.

$$\nu(x) \leqq 1 - e^{-\kappa(x)}$$
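One way to see Lemma 2, assuming $|x| = n$, that $\mathbf{E}_i$ is a uniform average over the $n$ items, and that every $p_i$ below is strictly positive, is a single application of Jensen's inequality to the concave function $\ln$:

```latex
\begin{aligned}
p_i &:= \Phi_i^{M^+}(x_{-i})\big|_{x_i}, \quad
\kappa(x) = -\tfrac{1}{n}\textstyle\sum_{i=1}^n \ln p_i, \quad
\nu(x) = 1 - \tfrac{1}{n}\textstyle\sum_{i=1}^n p_i \\
-\kappa(x) &= \tfrac{1}{n}\textstyle\sum_{i=1}^n \ln p_i
  \leqq \ln\!\Big(\tfrac{1}{n}\textstyle\sum_{i=1}^n p_i\Big) = \ln\big(1 - \nu(x)\big)
  && \text{(Jensen, } \ln \text{ concave)} \\
\Longrightarrow\ e^{-\kappa(x)} &\leqq 1 - \nu(x)
\ \Longrightarrow\ \nu(x) \leqq 1 - e^{-\kappa(x)}
\end{aligned}
```

Equality holds exactly when all $p_i$ are equal, so low complexity forces low surprise; a response can only combine low complexity with high surprise by violating the bound.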

DoD Applications

VA Mental Health
Personality Evaluation
Applications in general structured interviews
An Algorithmic Lie Detector with validated and coach-proof performance