Costly Belief Elicitation
Brandon Williams
Alistair Wilson
Barcelona Summer Forum, June 2025


An experimental testbed for understanding effort and incentives in belief elicitation
Intro / Basic Idea
- Experimental economists commonly provide incentives when eliciting beliefs. Why?
- The hope is that incentives yield better, more-accurate beliefs, since reporting well is costly:
  - Understanding what is asked requires effort
  - Personal motives to distort must be overcome
  - The calculations involved are burdensome
- Therefore, if belief elicitation is an effortful exercise, how do we best increase the precision of the expressed belief?
Motivation
Want to understand what incentives produce honest, deliberative beliefs
[Figure: belief report scale from 0 to 100, with reports marked at 20 and 80]
Project Roadmap
- Create a task that mirrors forming a probabilistic belief, requires effort, and responds to incentives
- Use experiments on Prolific to understand the relationship between cost, effort, and output
- Vary the task primitives and measure how long it takes to complete (effort), willingness to accept (cost), and accuracy (output)
- Understand how hard this task is to guess (zero-effort output)
- Then use the task with different incentives
- Bring this back to a common lab task
Literature
Some examples of recent papers in belief elicitation:
- Testing incentive compatibility:
  - Danz, Vesterlund, and Wilson, 2022
  - Healy and Kagel, 2023
- "Close enough" payments:
  - Enke, Graeber, Oprea, and Young, 2024
  - Ba, Bohren, and Imas, 2024
  - Settele, 2022
- QSR or BSR:
  - Hoffman and Burks, 2020
  - Radzevick and Moore, 2010
  - Harrison et al., 2022
- Others (exact or quartile):
  - Huffman, Raymond, and Shvets, 2022
  - Bullock, Gerber, Hill, and Huber, 2015
  - Prior, Sood, and Khanna, 2015
  - Peterson and Iyengar, 2020
Treatments
So far we have the following treatments:
- A calibration treatment where we measure how long it takes you to complete the problem (Effort) and how accurate you are (Output), then elicit your willingness to accept payment for the task (Cost)
- An initial guess treatment where we measure your first instinct (low-effort output)
- Incentives treatments where we change the reward structure (horserace across incentives)
Basic Task
- Create a task that mirrors forming a probabilistic belief that requires effort
- What is the proportion of blue tokens in this urn?
[Figure: urn of blue and non-blue tokens]
Ans: 56.25%
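(To illustrate the arithmetic with assumed counts: an urn with 45 blue tokens out of 80 total gives \(\tfrac{45}{80} = 56.25\%\).)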
- Create a task that mirrors forming a probabilistic belief that requires effort
- Use experiments on Prolific to understand the relationship between cost, effort, and precision
- Vary the cost for precision and calibrate on how long it takes to complete and willingness to accept (calibration treatment)
- Understand how hard this problem is to guess (initial guess treatment)
- Vary the reward structure (incentive treatments):
  - BSR with no information
  - BSR with qualitative information
  - BSR with quantitative information
  - A "close enough" incentive
Experimental task

How do we get you to exert effort when formulating your belief?
We start by paying $0.50 if you exactly count:
- Number of blue tokens
- Number of total tokens
Measure accuracy and time taken
Vary the difficulty over 5 tasks
Calibration: Training
Task Variance
[Figure: four example urns: small no gaps, small gaps, larger no gaps, larger gaps]
Task Variance

Each problem characterized by a tuple \(\left(\theta^\star,N,\delta_\text{Gaps}\right) \):
Core parameter:
- \(\theta^\star\) the true proportion of blue tokens
Cost Parameters to find \(\theta^\star\):
- \(N\): total number of tokens in urn
- \(\delta_{\text{Gaps}}\): indicator for gaps
Example: \(\left(\theta^\star, N, \delta_\text{Gaps}\right) = \left(\tfrac{81}{139},\, 139,\, 1\right)\) (generation sketched below)
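A minimal sketch of generating one such instance (hypothetical: the slides do not specify the visual layout, so the gap mechanics here are illustrative):

```python
import numpy as np

def make_urn(theta_star, N, gaps, seed=0):
    """Build one counting task from the tuple (theta*, N, delta_Gaps)."""
    rng = np.random.default_rng(seed)
    n_blue = round(theta_star * N)            # e.g. 81 blue of N = 139
    colors = np.array([1] * n_blue + [0] * (N - n_blue))
    rng.shuffle(colors)                       # on-screen token order
    # delta_Gaps = 1 scatters the tokens over a sparser grid with empty
    # cells, an assumed way of making exact counting costlier.
    n_cells = 2 * N if gaps else N
    positions = np.sort(rng.choice(n_cells, size=N, replace=False))
    return colors, positions
```

For instance, `make_urn(81/139, 139, gaps=True)` reproduces the example tuple above.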
Willingness to Accept: Oprea (2020)

Ten rounds: an Easy task, or a Hard one plus an amount \(\$X\)
[Figure: choice list. LHS: constant difficulty, always pays $0.50 if correct; RHS: varying difficulty, pays $X if correct; the participant chooses the $X threshold]
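A minimal sketch of reading the threshold off the ten choices (assuming a single crossing; the slides do not say how multiple switches are handled):

```python
def wta_threshold(x_values, chose_hard):
    """x_values: bonus amounts $X in ascending order; chose_hard: the
    corresponding choices (True = took the hard task). Returns the
    switch point, a bound on WTA, or None if the hard task is never taken."""
    for x, hard in zip(x_values, chose_hard):
        if hard:
            return x
    return None
```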
Calibration Results (\(N=250\))
From models over \(\left(N,\ \delta_\text{Gaps}\times N\right)\):
[Figure: \(\text{Logit(Correct)}\), \(\text{OLS(Effort)}\), and \(\text{Tobit(WTA)}\) estimates, shown as Output, Effort, and WTA panels]
Initial Guesses
Before we move to incentives, we also assess how hard this problem is to guess.
- Give them 15 or 45 seconds to form and enter a guess on the proportion
- Higher-powered rewards (payoff sketch after this list):
  - $2.50 if within 1%
  - $1.00 if within 5%
  - $0.50 if within 10%
- Ask them about 10 proportions (pay three decisions)
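A sketch of that tiered reward (amounts from the slide; whether exact boundaries pay the higher tier is an assumption):

```python
def guess_reward(guess_pct, truth_pct):
    """Reward for an initial guess; both arguments in percentage points."""
    err = abs(guess_pct - truth_pct)
    if err <= 1:
        return 2.50   # within 1%
    if err <= 5:
        return 1.00   # within 5%
    if err <= 10:
        return 0.50   # within 10%
    return 0.0
```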
Initial Guess Results (\(N=200\))
[Figure: guess accuracy after 15 seconds vs. after 45 seconds]
Task Conclusions
- So we have an experimental task:
  - That responds to effort and incentives, and where we can scale the difficulty
  - We understand the effort required to succeed, and the amount we need to pay people
  - We can quantify output at low effort
Incentives
Use four incentives to ask about beliefs in ten different urns:
- BSR-Desc: $1.50 prize with only qualitative information on the details
  - Text description of the payoff structure (Vespa & Wilson, 2018)
- BSR-Inf: as above, but with quantitative information
  - Full information on the quantitative incentives (Danz et al., 2022)
- BSR-NoInf: participants only know there is a $1.50 prize, with no other information on the incentives
- A "close enough" incentive (payoff rules sketched after this list)
  - $1.50 if within 1%; $0.50 if within 5%
  - Currently used in several papers (e.g., Ba et al., 2024)
- Pay three of the ten rounds
- \(N=100\) in each incentive treatment
- No time limit
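As a sketch, the two payoff rules compare as follows (the slides specify only the $1.50 prize, so the standard Hossain-Okui binarized quadratic form and the band boundary handling are assumptions):

```python
import random

def bsr_payment(q, theta, prize=1.50):
    """Binarized scoring rule: win the prize with probability
    1 - (q - theta)^2, with q and theta as proportions in [0, 1]."""
    return prize if random.random() < 1 - (q - theta) ** 2 else 0.0

def close_enough_payment(q, theta):
    """'Close enough' bands from the slide: $1.50 within 1pp, $0.50 within 5pp."""
    err = abs(q - theta)
    if err <= 0.01:
        return 1.50
    if err <= 0.05:
        return 0.50
    return 0.0
```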
Model
- Agents know that there are \(N\) tokens in the urn, but are unsure about the blue proportion
- Initial prior on the number of blue tokens is \[\text{BetaBinomial}(1,1,N)\]
- Can choose to sample \(0 \leq n \leq N\) tokens without replacement from the urn (hypergeometric signal)
- Counting \(k\) blue and \(n-k\) non-blue tokens leads to the posterior \[\text{BetaBinomial}(1+k,1+n-k,N-n)\]
- With this model we can calculate the expected return from counting \(n\) of the \(N\) tokens under a mechanism that incentivizes a report \(q\) with payment \(\phi(q)\), as sketched below
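A minimal sketch of that calculation for a binarized quadratic \(\phi\) (the slides leave \(\phi\) general; the specific form and risk-neutral optimal reporting are assumptions):

```python
from scipy.stats import betabinom

def expected_return(N, n, prize=1.50):
    """Expected payment from counting n of N tokens under
    phi(q) = prize * (1 - (q - theta)^2) with an optimal report,
    starting from the uniform BetaBinomial(1, 1, N) prior."""
    # Under the uniform prior, the blue count k in a size-n sample is
    # marginally BetaBinomial(1, 1, n), i.e. uniform on {0, ..., n}.
    p_k = 1.0 / (n + 1)
    total = 0.0
    for k in range(n + 1):
        # Posterior over the uncounted blue tokens: BetaBinomial(1+k, 1+n-k, N-n)
        rest = betabinom(N - n, 1 + k, 1 + n - k)
        # theta = (k + rest) / N; the optimal report is E[theta], so the
        # expected quadratic loss equals Var(theta) = Var(rest) / N^2.
        total += p_k * prize * (1.0 - rest.var() / N**2)
    return total
```

Scanning \(n = 0, \dots, N\) and netting out a counting cost then gives the effort level each incentive should induce.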
Incentives to Exert Effort
Results
Output by difficulty
Using our calibration treatments to construct instruments for difficulty:
Effort by difficulty
Using our calibration treatments to construct instruments for difficulty:
Output: Accuracy
[Figure: accuracy by incentive treatment, with differences of -15%, +16%, +56%, and +43%]
Effort compared to Calibration
Aside: Cognitive Uncertainty
A common self-reported measure, which we validate here against effort and output in the BSR-NoInf treatment...
Incentive effects
- "Close enough" outperforms BSR on both accuracy and time spent
- The better incentive effects stemming from a richer outcome space
- Also cheaper for the experimenter, incentive payments to participants reduced by ~50% over BSR
- With a fixed budget, how much more effort could be induced?
- (But the gains here are dwarfed by the gains when we explicitly tell them what to do)
Incentive effects
Bayesian Updating

Participants told:
- Number of tokens
- Proportion blue
- Proportion Dots | Blue
- Proportion Dots | Red
They are then asked for the proportion of blue tokens given a dot:
- The calculation is identical to standard Bayesian updating experiments
- But they can also just count...
After gaining experience, we elicit their WTA against the standard counting task
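The target calculation, with illustrative numbers (assumed, not the actual task parameters):
\[
P(\text{Blue}\mid\text{Dot}) = \frac{P(\text{Dot}\mid\text{Blue})\,P(\text{Blue})}{P(\text{Dot}\mid\text{Blue})\,P(\text{Blue}) + P(\text{Dot}\mid\text{Red})\,P(\text{Red})}
\]
For example, with \(P(\text{Blue}) = 0.4\), \(P(\text{Dot}\mid\text{Blue}) = 0.5\), and \(P(\text{Dot}\mid\text{Red}) = 0.25\): \(\tfrac{0.5 \times 0.4}{0.5 \times 0.4 + 0.25 \times 0.6} = \tfrac{0.20}{0.35} \approx 57\%\). Counting instead, dotted blue tokens over all dotted tokens gives the same number.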
Bayesian Updating
[Figure: reported proportion of blue dotted tokens, Calc-or-Count vs. Calc-only. Can they do it?]
Bayesian Updating
Elicit how much they require to do the task:

Bayesian Updating: WTA
[Figure: WTA for those who can do the calculation (avg. $0.82) vs. those who can't (avg. $0.99)]
Conclusions (so far...)
- The "close enough" incentive works best for inducing effort, but requires binomial/cardinal realizations
- Varying the incentives:
  - No substantive differences across BSR presentations
  - Offering incentives and letting participants choose their effort is dominated by the authority of telling them what to do
- For a Bayesian updating task:
  - ~50% of Prolific subjects can perform the calculation when given a frequentist version
  - They need to be paid ~$0.55 to offset costs