Costly Belief Elicitation

Brandon Williams

Alistair Wilson

NHH: Incorporating Psychology into Economic Models

August 12, 2025

Experimental economists often incentivize belief elicitation. Why?
We hope that incentives lead participants to report better, more accurate beliefs, because reporting a belief requires them to:
- Exert effort to understand what is asked
- Overcome personal motives to distort
- Do burdensome calculations
If belief elicitation is an effortful exercise, how do we best increase the precision of the expressed belief?

Motivation

[Figure: belief scale running from 0 to 100, with 20 and 80 marked]

We want to understand what incentives produce honest, deliberative beliefs
- We need to understand the psychological costs within a testing environment
- The scale and structure of the marginal costs become important
We draw a distinction between revealing a true belief and exerting the effort required to form a deliberative belief

Motivation

Create a task that mirrors forming a probabilistic belief, requires effort, and responds to incentives
Use experiments on Prolific to understand the relationship between cost, effort, and precision
- Vary the cost of precision, and calibrate using completion time and willingness to accept
- Understand how well the answer can be guessed with zero effort
- Test different incentive structures
Apply the results back to a traditional lab experiment

Roadmap

Create a task that mirrors forming a probabilistic belief that requires effort

Task

What is the proportion of blue tokens to total tokens in this urn?

81 blue, 63 non-blue

True proportion = 81 / (81 + 63) = 56.25%
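As a quick check of the arithmetic on this slide, the sketch below computes the true proportion from the two counts and the gap between a guess and the truth in percentage points. The function names and the example guess of 60 are illustrative only and are not taken from the experiment software.

```python
from fractions import Fraction


def blue_share(blue: int, non_blue: int) -> Fraction:
    """Exact share of blue tokens among all tokens in the urn."""
    total = blue + non_blue
    if total <= 0:
        raise ValueError("urn must contain at least one token")
    return Fraction(blue, total)


def error_in_points(guess_percent: float, blue: int, non_blue: int) -> float:
    """Absolute gap between a guess (in percent) and the true percentage."""
    truth_percent = float(blue_share(blue, non_blue) * 100)
    return abs(guess_percent - truth_percent)


share = blue_share(81, 63)          # counts from the slide
print(float(share * 100))           # 56.25
print(error_in_points(60, 81, 63))  # 3.75 (hypothetical guess of 60%)
```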

- Vary the cost of precision, and calibrate using completion time and willingness to accept

Calibration

How do we get you to exert effort when forming your belief?

We start by paying $0.50 if you count both of these exactly:

- Number of blue tokens
- Number of total tokens

We measure accuracy, and time taken as a proxy for effort

Vary the difficulty over 5 tasks, with urn layouts including:

- Small with gaps
- Larger with no gaps
- Larger with gaps
- Largest with no gaps

- Ten rounds choosing between an easy task and the hard task, with the hard task paying base pay plus $X (Oprea, 2020)

Calibration: WTP

- LHS: constant difficulty; always pays $0.50 if correct
- RHS: varying difficulty; pays $X if correct
- Choose $X

Calibration: Results (n=250)

Model over (tokens, gaps)

Initial Guesses

- Understand how hard this problem is to guess (initial guess treatment)
- Participants have 15 or 45 seconds to form and enter a guess on the proportion
- High-powered rewards:
  - $2.50 if within 1%
  - $1.00 if within 5%
  - $0.50 if within 10%
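The tiered reward above can be sketched as a small payment function. This is a minimal illustration under two assumptions the slide does not state: "within X%" is read as an absolute gap in percentage points, and the band edges are inclusive. The function name and example guesses are hypothetical.

```python
def initial_guess_reward(guess_percent: float, truth_percent: float) -> float:
    """Pay the highest tier whose band contains the guess.

    Assumption: bands are absolute gaps in percentage points, edges inclusive.
    """
    error = abs(guess_percent - truth_percent)
    tiers = [(1.0, 2.50), (5.0, 1.00), (10.0, 0.50)]  # (max error, dollars)
    for max_error, payment in tiers:
        if error <= max_error:
            return payment
    return 0.0  # outside every band


truth = 56.25  # true percentage from the example urn
for guess in (56, 60, 50, 40):
    print(guess, initial_guess_reward(guess, truth))
```

With a true value of 56.25%, the four hypothetical guesses fall in the 1-point band, the 5-point band, the 10-point band, and outside all bands, respectively.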

Initial Guesses: Results (N=200)

[Figure: share of guesses that are within 10%, within 5%, within 1%, and exact, shown for the 15-second and the 45-second conditions. Values annotated on the 15-second panel: 4.8% and 95.1%.]

Taking Stock

So we have an experimental task where:

- Performance responds to effort and incentives, and we can scale the difficulty
- We understand the effort required to succeed, and the amount we need to pay people
- We can quantify the output at low effort

Incentives

- Vary the reward structure
  - BSR-Desc: $1.50 prize with only a text description of the payoff structure (Vespa & Wilson, 2018)
  - BSR-Inf: as above, but with quantitative information
  - BSR-NoInf: participants know only that there is a $1.50 prize, with no other information on the incentives
  - A "close enough" incentive: $1.50 if within 1%; $0.50 if within 5%
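To make the BSR treatments concrete, here is a minimal sketch of expected payment under a binarized scoring rule. It assumes the standard textbook form, which the slides do not spell out: the report is scored against a binary event (for example, whether a randomly drawn token is blue), and the $1.50 prize is won with probability 1 - (report - outcome)^2. The function names are hypothetical.

```python
PRIZE = 1.50  # dollars, from the slide


def bsr_win_probability(report: float, truth: float) -> float:
    """Chance of winning the prize for a given report.

    Assumption: the prize is won with probability 1 - (report - outcome)^2,
    where outcome is 1 if the event occurs (probability `truth`) and 0 otherwise.
    """
    if not (0.0 <= report <= 1.0 and 0.0 <= truth <= 1.0):
        raise ValueError("report and truth must lie in [0, 1]")
    win_if_event = 1.0 - (1.0 - report) ** 2
    win_if_no_event = 1.0 - report ** 2
    return truth * win_if_event + (1.0 - truth) * win_if_no_event


def bsr_expected_payment(report: float, truth: float) -> float:
    """Expected dollars earned under the assumed BSR."""
    return PRIZE * bsr_win_probability(report, truth)


truth = 0.5625                                    # example urn: 81 of 144 blue
exact = bsr_expected_payment(truth, truth)        # truthful report
off_by_ten = bsr_expected_payment(0.6625, truth)  # 10 points too high
print(exact)                                      # 1.130859375
print(round(exact - off_by_ten, 3))               # about 0.015
```

Under this assumed form, the expected loss from misreporting by e is PRIZE * e^2, so a 10-point error costs about 1.5 cents in expectation. Under the "close enough" rule on the slide, the same error moves the payment from $1.50 to nothing.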

Incentives

How should effort affect expected reward?

Incentives

Results

Incentives: Accuracy

Use our calibration to construct instruments for difficulty

[Figure: accuracy results, with annotated changes of +56%, +47%, -15%, and +16%]

Incentives: Research costs

"Close enough" outperformed BSR on both accuracy and time spent
It was also cheaper than BSR in payments to participants (~50%)
With a fixed budget, how much more effort could be induced?
Interesting caveat: the gains here are dwarfed by the gains from explicitly telling participants what to do

Related task

Consider a task similar to what we commonly run in the lab

Among the tokens with a dot, what proportion are blue?

The calculation is the same as in Bayesian updating experiments

...except that participants could count?

Tell them:

- Total number of tokens
- Proportion of blue tokens
- Proportion of dots | red
- Proportion of dots | blue
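The calculation participants face can be sketched in two equivalent ways: Bayes' rule on the stated proportions, and a frequency version that first converts the proportions into token counts. The sketch assumes a two-color urn (blue and red), as the listed information suggests. The function names and example numbers are hypothetical and chosen so that the counts come out whole.

```python
from fractions import Fraction


def posterior_blue_given_dot(p_blue, p_dot_given_blue, p_dot_given_red):
    """Bayes' rule for P(blue | dot) in a two-color urn (blue and red)."""
    blue_with_dot = p_blue * p_dot_given_blue
    red_with_dot = (1 - p_blue) * p_dot_given_red
    total_with_dot = blue_with_dot + red_with_dot
    if total_with_dot == 0:
        raise ValueError("no tokens carry a dot")
    return blue_with_dot / total_with_dot


def posterior_from_counts(total_tokens, p_blue, p_dot_given_blue, p_dot_given_red):
    """Frequency version: turn the proportions into counts, then divide."""
    blue = total_tokens * p_blue
    red = total_tokens - blue
    blue_with_dot = blue * p_dot_given_blue
    red_with_dot = red * p_dot_given_red
    if blue_with_dot + red_with_dot == 0:
        raise ValueError("no tokens carry a dot")
    return blue_with_dot / (blue_with_dot + red_with_dot)


# Hypothetical urn: 144 tokens, 81 blue; 1/3 of blue and 3/7 of red carry a dot.
p_blue = Fraction(81, 144)
print(posterior_blue_given_dot(p_blue, Fraction(1, 3), Fraction(3, 7)))    # 1/2
print(posterior_from_counts(144, p_blue, Fraction(1, 3), Fraction(3, 7)))  # 1/2
```

In this example, 27 blue tokens and 27 red tokens carry a dot, so half of the dotted tokens are blue even though blue tokens are the majority in the urn.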

Bayesian updating

Can they do the Bayesian task?

Conclusions and future work

The "close enough" incentive works well for inducing effort, but it requires binomial/cardinal realizations
Varying the incentives:
- No substantive differences across BSR presentations
- Offering incentives and letting participants choose their effort is dominated by the authority of telling them what to do
For a Bayesian updating task:
- 50% of Prolific subjects can perform the calculation if given the frequentist version
- They need to be paid ~$0.55 to offset costs

Can we apply this calibration to subjective beliefs?

Thank you!

Alistair Wilson: alistair@pitt.edu

Brandon Williams: brandon.williams@pitt.edu

Some examples of recent papers in belief elicitation:
- Testing incentive compatibility:
  - Danz, Vesterlund, and Wilson, 2022
  - Healy and Kagel, 2023
- "Close enough" payments:
  - Enke, Graeber, Oprea, and Young, 2024
  - Ba, Bohren, and Imas, 2024
  - Settele, 2022
- QSR or BSR:
  - Hoffman and Burks, 2020
  - Radzevick and Moore, 2010
  - Harrison et al., 2022
- Others (exact or quartile):
  - Huffman, Raymond, and Shvets, 2022
  - Bullock, Gerber, Hill, and Huber, 2015
  - Prior, Sood, and Khanna, 2015
  - Peterson and Iyengar, 2020

Literature

Incentives

- Vary the reward structure
  - BSR with only qualitative information
    - Text description of payoff structure (Vespa & Wilson, 2018)
  - BSR with quantitative information
    - Full information on the quantitative incentives (Danz et al., 2022)
  - A "close enough" incentive
    - $1.50 if within 1%; $0.50 if within 5%
    - Currently used in several papers (e.g., Ba et al., 2024)
