Costly Belief Elicitation

Brandon Williams
Alistair Wilson
Barcelona Summer Forum
June 2025

An experimental testbed for understanding effort and incentive in belief elicitation

Intro / Basic Idea

  • Experimental economists commonly offer incentives when eliciting beliefs. Why?
  • The hope is that incentives yield better, more accurate beliefs by motivating participants to:
    • Put in the effort to understand what is being asked
    • Overcome personal motives to distort
    • Carry out burdensome calculations
  • So if belief elicitation is an effortful exercise, how do we best increase the precision of the expressed belief?

Motivation

Want to understand what incentives produce honest, deliberative beliefs

[Figure: axis labels 0, 100, 20, 80]


Project Roadmap

  • Create a task that mirrors forming a probabilistic belief that requires effort and responds to incentives
  • Use experiments on Prolific to understand the relationship between cost, effort, and output
    • Vary the task primitives and measure how long it takes to complete (effort), willingness to accept (cost) and accuracy (output)
    • Understand how hard this task is to guess (zero effort output)
    • Then use the task with different incentives
    • Bring this back to a common lab task

Literature

Some examples of recent papers in belief elicitation:

  • Testing incentive compatibility:
    • Danz, Vesterlund, and Wilson, 2022
    • Healy and Kagel, 2023
  • "Close enough" payments:
    • Enke, Graeber, Oprea, and Young, 2024
    • Ba, Bohren, and Imas, 2024
    • Settele, 2022
  • QSR or BSR:
    • Hoffman and Burks, 2020
    • Radzevick and Moore, 2010
    • Harrison et al., 2022
  • Others (exact or quartile):
    • Huffman, Raymond, and Shvets, 2022
    • Bullock, Gerber, Hill, and Huber, 2015
    • Prior, Sood, and Khanna, 2015
    • Peterson and Iyengar, 2020

Treatments

So far we have the following treatments:

  1. A calibration treatment, where we measure how long participants take to complete the problem (Effort) and how accurate they are (Output), then elicit their willingness to accept payment for the task (Cost)
  2. An initial-guess treatment, where we measure participants' first instinct (low-effort output)
  3. Incentive treatments, where we change the reward structure (horserace across incentives)

Basic Task

  • Create a task that mirrors forming a probabilistic belief that requires effort
  • What is the proportion of blue tokens in this urn?

Ans: 56.25%

  • Create a task that mirrors forming a probabilistic belief that requires effort
  • Use experiments on Prolific to understand the relationship between cost, effort, and precision
    • Vary the cost for precision and calibrate on how long it takes to complete and willingness to accept (calibration treatment)
    • Understand how hard this problem is to guess (initial guess treatment)
    • Vary the reward structure (incentive treatments):
      • BSR with no information
      • BSR with qualitative information
      • BSR with quantitative information
      • A "close enough" incentive

Experimental task

How to get you to exert effort when formulating your belief?

We start by paying $0.50 if you exactly count:

  1. Number of blue tokens
  2. Number of total tokens

Measure accuracy and time taken 

Vary the difficulty over 5 tasks

Calibration: Training

Task Variance

[Figure: example urns; panels: Small no gaps, Small gaps, Larger no gaps, Larger gaps]

Task Variance

Each problem is characterized by a tuple \(\left(\theta^\star,N,\delta_\text{Gaps}\right) \):

Core parameter:

  • \(\theta^\star\) the true proportion of blue tokens

Cost Parameters to find \(\theta^\star\):

  • \(N\): total number of tokens in urn
  • \(\delta_{\text{Gaps}}\): indicator for gaps

\(\left(\theta^\star,N,\delta_\text{Gaps}\right) = (\tfrac{81}{139}, 139, 1) \)
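As an illustration, a problem can be stored as a small record from which \(\theta^\star\) is derived. This is only a minimal sketch: the class and field names (`UrnProblem`, `blue`, `total`, `gaps`) are hypothetical and are not taken from the experiment's code.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class UrnProblem:
    """One counting problem; all names here are illustrative assumptions."""
    blue: int    # number of blue tokens in the urn
    total: int   # N: total number of tokens in the urn
    gaps: bool   # delta_Gaps: whether the token layout contains gaps

    @property
    def theta_star(self) -> Fraction:
        """True proportion of blue tokens, kept exact as a fraction."""
        return Fraction(self.blue, self.total)

    def as_tuple(self):
        """Return (theta*, N, delta_Gaps)."""
        return (self.theta_star, self.total, int(self.gaps))


# The example from the slide: 81 blue tokens out of 139, with gaps.
example = UrnProblem(blue=81, total=139, gaps=True)
print(example.as_tuple())
```

Keeping \(\theta^\star\) as an exact fraction avoids rounding when a report is later compared against the truth.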

Willingness to Pay: Oprea (2020)

Ten rounds: An easy task, or a Hard one plus amount \(\$X\)

  • LHS: constant difficulty; always pays $0.50 if correct
  • RHS: varying difficulty; pays \(\$X\) if correct
  • Choose the \(\$X\) threshold

Calibration Results (\(N=250\))

From models over \(\left(N,\delta_\text{Gaps}\times N\right)  \):

  • Output: \(\text{Logit }(\text{Correct})\)
  • Effort: \(\text{OLS }(\text{Effort})\)
  • WTA: \(\text{Tobit}(\text{WTA})\)

Initial Guesses

Before we move to incentives, we also assess how hard this problem is to guess.

  • Give participants 15 or 45 seconds to form and enter a guess of the proportion
  • Higher-powered rewards:
    • $2.50 if within 1%
    • $1.00 if within 5%
    • $0.50 if within 10%
  • Ask them about 10 proportions (pay three decisions)
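The tiered reward above can be written as a simple step function. The sketch below assumes that "within x%" means an absolute error of at most x percentage points; the function and constant names are hypothetical, and the 56.25% truth is the answer from the basic-task slide, used here purely as an example.

```python
from fractions import Fraction

# (maximum absolute error in percentage points, payment in dollars), tightest first
REWARD_TIERS = [(Fraction(1), 2.50), (Fraction(5), 1.00), (Fraction(10), 0.50)]


def guess_reward(guess_pct, truth_pct):
    """Payment for one guess; both arguments are percentages (0 to 100).

    Assumes "within x%" means an absolute error of at most x percentage
    points. Pass decimals as strings (e.g. "56.25") so they stay exact.
    """
    error = abs(Fraction(guess_pct) - Fraction(truth_pct))
    for max_error, payment in REWARD_TIERS:
        if error <= max_error:
            return payment
    return 0.0


truth = "56.25"  # example urn
for guess in ("57", "60", "65", "70"):
    print(guess, guess_reward(guess, truth))
```

Exact fractions are used so that a guess sitting right on a tier boundary is not misclassified by floating-point rounding.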

Initial Guess Results (\(N=200\))

After 15 seconds

After 45 seconds

Task Conclusions

  • So we have an experimental task:
    • That responds to effort and incentives, and whose difficulty we can scale
    • Where we understand the effort required to succeed and the amount we need to pay people
    • Where we can quantify the output at low effort

Incentives

Use four incentives to ask about beliefs in ten different urns:

  1. BSR-Desc: $1.50 prize with only qualitative information on the details
    • Text description of payoff structure (Vespa & Wilson, 2018)
  2. BSR-Inf: as above but with quantitative information
    • Full information on the quantitative incentives (Danz et al., 2022)
  3. BSR-NoInf: only know there is a $1.50 prize, no other information on the incentives
  4. A "close enough" incentive
    • $1.50 if within 1%; $0.50 if within 5%
    • Current use in several papers (e.g. Ba et al., 2024)
  • Pay three of the ten rounds
  • \(N=100\) each incentive treatment
  • No time limit
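To give a sense of how steep the BSR incentive is around the truth, here is a rough sketch of a binarized quadratic scoring rule. The setup is an assumption rather than the exact implementation used in the treatments: one token is drawn from the urn, and the $1.50 prize is won with probability one minus the squared error between the report and the drawn outcome. All function names are hypothetical.

```python
PRIZE = 1.50  # prize size from the incentive treatments


def bsr_win_probability(report, theta):
    """Chance of winning the prize under a binarized quadratic rule.

    Assumed setup: one token is drawn from the urn; it is blue with
    probability theta, and the prize is won with probability
    1 - (outcome - report)^2, where outcome is 1 for blue and 0 otherwise.
    """
    win_if_blue = 1 - (1 - report) ** 2
    win_if_not_blue = 1 - report ** 2
    return theta * win_if_blue + (1 - theta) * win_if_not_blue


def bsr_expected_payment(report, theta, prize=PRIZE):
    """Expected dollar payment for a given report and true proportion."""
    return prize * bsr_win_probability(report, theta)


theta = 0.5625  # example true proportion
truthful = bsr_expected_payment(theta, theta)
for miss in (0.01, 0.05, 0.10):
    loss = truthful - bsr_expected_payment(theta + miss, theta)
    print(f"report off by {miss:.2f}: expected loss ${loss:.5f}")
```

Under this formulation the expected loss from misreporting equals the prize times the squared error, so a report that is off by five points costs less than half a cent in expectation. By contrast, the "close enough" payment falls from $1.50 to $0.50 once the report is more than one point off.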

Model

  • Agents know that there are \(N\) tokens in the urn, but are unsure of the proportion that are blue
  • The initial prior on the number of blue tokens is \[\text{BetaBinomial}(1,1,N)\]
  • Agents can choose to sample \(0 \leq n \leq N \) tokens without replacement from the urn (a hypergeometric signal)
    • Counting \(k\) blue tokens and \(n-k\) non-blue tokens leads to the posterior \[\text{BetaBinomial}(1+k,1+n-k,N-n)\] over the number of blue tokens among the \(N-n\) uncounted
  • With this model we can calculate the expected return from counting \(n\) tokens out of \(N\) under a mechanism that incentivizes a report \(q\) with payment \(\phi(q)\)
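The model can be sketched numerically as follows. This is an illustrative implementation under stated assumptions, not the authors' code: the payment is written as `phi(q, theta)` because it depends on the realized proportion, reports are restricted to the grid \(q = j/N\), the example payment is the "close enough" rule read as percentage-point tolerances, and \(N = 20\) is a hypothetical urn size. All function names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial


def beta_fn(a, b):
    """Beta function B(a, b) for positive integers a and b."""
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))


def betabinom_pmf(x, a, b, m):
    """P(X = x) for X ~ BetaBinomial(a, b, m), with integer a and b."""
    return comb(m, x) * beta_fn(x + a, m - x + b) / beta_fn(a, b)


def posterior_and_marginal(k, n, N):
    """Bayes' rule on the total blue count K after seeing k blue in n draws.

    Uses the uniform BetaBinomial(1, 1, N) prior and a hypergeometric
    likelihood. Returns (posterior over K = 0..N, marginal probability of k).
    """
    prior = Fraction(1, N + 1)
    joint = [prior * Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))
             for K in range(N + 1)]
    marginal = sum(joint)
    return [j / marginal for j in joint], marginal


def close_enough(q, theta):
    """Example phi: $1.50 within 1 point, $0.50 within 5 points (assumed reading)."""
    error = abs(q - theta)
    if error <= Fraction(1, 100):
        return Fraction(3, 2)
    if error <= Fraction(5, 100):
        return Fraction(1, 2)
    return Fraction(0)


def expected_return(n, N, phi):
    """Expected payment from counting n of N tokens, then reporting optimally.

    Simplification: reports are restricted to the grid q = j / N.
    """
    total = Fraction(0)
    for k in range(n + 1):
        posterior, p_k = posterior_and_marginal(k, n, N)
        best = max(
            sum(p * phi(Fraction(j, N), Fraction(K, N))
                for K, p in enumerate(posterior))
            for j in range(N + 1)
        )
        total += p_k * best
    return total


N = 20  # hypothetical urn size
for n in (0, 5, 10, 20):
    print(n, float(expected_return(n, N, close_enough)))
```

The posterior computed by direct Bayes' rule can be checked against the closed form: for instance, `posterior_and_marginal(3, 8, 20)[0][3 + x]` should equal `betabinom_pmf(x, 4, 6, 12)` for each `x` from 0 to 12. Swapping in a different `phi` gives the corresponding return-to-counting curve for another mechanism.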

Incentives to Exert Effort

Results

Output by difficulty

Using our calibration treatments to construct instruments for difficulty:

Effort by difficulty

Using our calibration treatments to construct instruments for difficulty:

Output: Accuracy

[Figure: accuracy; annotations: -15%, +16%, +56%, +43%]

Effort compared to Calibration

Aside: Cognitive Uncertainty

A common self-reported measure, which we validate here against effort and output in the BSR-NoInf treatment...

Incentive effects

  • "Close enough" outperforms BSR on both accuracy and time spent
    • The better incentive effects stem from a richer outcome space
  • It is also cheaper for the experimenter: incentive payments to participants fall by ~50% relative to BSR
  • With a fixed budget, how much more effort could be induced?
  • (But the gains here are dwarfed by the gains when we explicitly tell participants what to do)

Incentive effects

Bayesian Updating

Participants told:

  • Number of tokens
  • Proportion blue
  • Proportion Dots | Blue
  • Proportion Dots | Red

Participants are then asked for the proportion of blue tokens among those with a dot:

  • Calculation identical to standard Bayesian updating experiments
  • But can also just count...
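The equivalence between calculating and counting can be illustrated with a short sketch. The urn below (100 tokens, 60% blue, dots on 50% of blue and 25% of red tokens) is a hypothetical example rather than one of the experimental problems, and the function names are likewise made up for illustration.

```python
from fractions import Fraction


def posterior_by_formula(p_blue, p_dot_given_blue, p_dot_given_red):
    """P(Blue | Dot) from Bayes' rule, assuming tokens are either blue or red."""
    dotted_blue = p_dot_given_blue * p_blue
    dotted_red = p_dot_given_red * (1 - p_blue)
    return dotted_blue / (dotted_blue + dotted_red)


def posterior_by_counting(n_tokens, p_blue, p_dot_given_blue, p_dot_given_red):
    """Same quantity, obtained by counting dotted tokens of each color."""
    n_blue = p_blue * n_tokens
    n_red = n_tokens - n_blue
    blue_dotted = p_dot_given_blue * n_blue
    red_dotted = p_dot_given_red * n_red
    return blue_dotted / (blue_dotted + red_dotted)


# Hypothetical urn: 60% blue, dots on 50% of blue and 25% of red tokens.
args = (Fraction(60, 100), Fraction(1, 2), Fraction(1, 4))
print(posterior_by_formula(*args))
print(posterior_by_counting(100, *args))
```

In this example there are 30 dotted blue tokens and 10 dotted red tokens, so both routes give 3/4.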

 

After participants gain experience, we elicit their WTP against the standard counting task

Bayesian Updating

Can they do it?

[Figure: proportion of blue dotted tokens; panels: Calc or Count, Calc only]

Bayesian Updating

Elicit how much they require to do the task:

Bayesian Updating: WTA

[Figure: WTA; panels: For those who can do the calculation, For those who can't; annotated averages: $0.82, $0.99]

Conclusions (so far...)

  • The "close enough" incentive works best for inducing effort, but requires binomial/cardinal realizations
  • Varying the incentives:
    • No substantive differences across BSR presentations
    • Offering incentives and letting participants choose effort is dominated by the authority of telling them what to do
  • For a Bayesian updating task:
    • 50% of Prolific subjects can perform the calculation if given the frequentist version
    • They need to be given ~$0.55 to offset costs

