Costly Belief Elicitation

Brandon Williams
Alistair Wilson
ASSA Beliefs Session
January 2025

An experimental testbed for understanding effort and incentive in belief elicitation

Intro / Basic Idea

  • Experimental economists commonly provide incentives when eliciting beliefs. Why?
  • Hope that by providing incentives we collect better, more accurate beliefs, since reporting one involves:
    • Understanding what is asked
    • Overcoming personal motives to distort
    • Doing burdensome calculations
  • So, if belief elicitation is an effortful exercise, how do we best increase the precision of the expressed belief?

Motivation

Want to understand what incentives produce honest, deliberative beliefs.

[Figure: 0-100 elicitation scale with marks at 20 and 80]
Project Roadmap

  • Create a task that mirrors forming a probabilistic belief and requires effort
  • Use experiments on Prolific to understand the relationship between cost, effort, and output
    • Vary the task primitives and measure how long tasks take to complete (effort), willingness to accept (cost), and accuracy (output)
    • Understand how hard this task is to guess (zero effort output)
    • Then use the task with different incentives
  • To do: Measure the psychological costs of forming objective Bayesian posteriors

Literature

Some examples of recent papers in belief elicitation:

  • Testing incentive compatibility:
    • Danz, Vesterlund, and Wilson, 2022
    • Healy and Kagel, 2023
  • "Close enough" payments:
    • Enke, Graeber, Oprea, and Young, 2024
    • Ba, Bohren, and Imas, 2024
    • Settele, 2022
  • QSR or BSR:
    • Hoffman and Burks, 2020
    • Radzevick and Moore, 2010
    • Harrison et al., 2022
  • Others (exact or quartile):
    • Huffman, Raymond, and Shvets, 2022
    • Bullock, Gerber, Hill, and Huber, 2015
    • Prior, Sood, and Khanna, 2015
    • Peterson and Iyengar, 2020

Treatments

So far we have the following treatments:

  1. A calibration treatment where we measure how long it takes to complete the problem (effort) and how accurate participants are (output), then elicit their willingness to accept payment for the task (cost)
  2. An initial-guess treatment where we measure participants' first instinct (low-effort output)
  3. Incentive treatments where we change the reward structure (a horserace across incentives)

Basic Task

  • Create a task that mirrors forming a probabilistic belief and requires effort
  • What is the proportion of blue tokens in this urn?

[Figure: example urn]

Ans: 56.25%

Project Roadmap

  • Create a task that mirrors forming a probabilistic belief and requires effort
  • Use experiments on Prolific to understand the relationship between cost, effort, and precision
    • Vary the cost of precision and calibrate how long tasks take to complete and participants' willingness to accept (calibration treatment)
    • Understand how hard this problem is to guess (initial-guess treatment)
    • Vary the reward structure:
      • BSR with no information
      • BSR with qualitative information
      • BSR with quantitative information
      • A "close enough" incentive

Experimental Task

How do we get participants to exert effort when formulating their beliefs?

We start by paying $0.50 if they exactly count:

  1. Number of blue tokens
  2. Number of total tokens

Measure accuracy and time taken 

Vary the difficulty over 5 tasks

Calibration: Training

Task Variance

[Figures: four example urns: small without gaps, small with gaps, larger without gaps, larger with gaps]

Task Variance

Each problem is characterized by a tuple \(\left(N,\theta^\star,\delta_{\text{Gaps}}\right)\):

  • \(N\): total number of tokens in the urn
  • \(\theta^\star\): the true proportion of blue tokens
  • \(\delta_{\text{Gaps}}\): indicator for gaps

\(\left(N,\theta^\star,\delta_{\text{Gaps}}\right) = (139, \tfrac{81}{139}, 1)\) (a generation sketch follows)
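To make the parameterization concrete, here is a minimal Python sketch of generating a task instance from its tuple; the grid layout, function name, and colors are our assumptions, not the experiment's actual interface:

```python
import numpy as np

def make_urn(N, theta_star, gaps, grid=20, seed=0):
    """Generate a counting task from (N, theta*, delta_Gaps): place N tokens,
    a theta* share of them blue, on a grid, optionally leaving empty cells."""
    rng = np.random.default_rng(seed)
    n_blue = round(theta_star * N)
    tokens = np.array(["blue"] * n_blue + ["red"] * (N - n_blue))
    rng.shuffle(tokens)
    if gaps:  # scatter tokens, leaving some grid cells empty
        positions = rng.choice(grid * grid, size=N, replace=False)
    else:     # pack tokens contiguously, no gaps
        positions = np.arange(N)
    return dict(zip(positions.tolist(), tokens.tolist()))

urn = make_urn(139, 81 / 139, gaps=1)  # the example tuple above
print(sum(v == "blue" for v in urn.values()) / len(urn))  # ~0.583
```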

Willingness to Pay: Oprea (2020)

Ten rounds: an easy task, or a hard one plus an amount \(\$X\)

  • LHS: constant difficulty; always pays $0.50 if correct
  • RHS: varying difficulty; pays \(\$X\) if correct
  • Participants choose the \(\$X\) threshold (switch-point sketch below)
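A minimal sketch of how the \(\$X\) threshold translates into a willingness-to-accept measure; the bonus ladder and the monotone-switching assumption are illustrative, not the experiment's parameters:

```python
def wta_switch_point(bonuses, chose_hard):
    """Return the smallest bonus $X at which the subject takes the hard task;
    assumes choices are monotone in X (a single switch point)."""
    for x, hard in zip(bonuses, chose_hard):
        if hard:
            return x
    return None  # never switched: WTA exceeds the largest bonus offered

# Hypothetical choice list over ten rounds
bonuses = [0.10, 0.25, 0.50, 0.75, 1.00, 1.25, 1.50, 1.75, 2.00, 2.50]
chose_hard = [False, False, False, True, True, True, True, True, True, True]
print(wta_switch_point(bonuses, chose_hard))  # -> 0.75
```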

Calibration Results (\(N=250\))

From models over \(\left(N,\log(N),\delta_{\text{Gaps}}\right)\):

  • Effort (time spent): \(\text{OLS on }\log(\text{Effort})\)
  • Cost (WTA): \(\text{Tobit on WTA}\)
  • Output (within 1%): \(\text{Logit on within-1\% accuracy}\)

[Figures: estimated effects for each model; a sketch of these specifications follows]
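A hedged sketch of these specifications on synthetic data; all variable names and the data-generating process are placeholders, and since statsmodels has no built-in Tobit, that model is only noted:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
# Placeholder stand-in for the calibration data
df = pd.DataFrame({"N": rng.integers(40, 250, 500),
                   "gaps": rng.integers(0, 2, 500)})
df["effort"] = np.exp(0.5 * np.log(df["N"]) + 0.2 * df["gaps"]
                      + rng.normal(0, 0.3, 500))
df["within1"] = (rng.random(500) < 1 / (1 + 0.02 * df["N"])).astype(int)

# OLS on log effort and a logit on within-1% accuracy, over (N, log N, gaps)
ols = smf.ols("np.log(effort) ~ N + np.log(N) + gaps", data=df).fit()
logit = smf.logit("within1 ~ N + np.log(N) + gaps", data=df).fit()
print(ols.params, logit.params, sep="\n")
# A Tobit for WTA (censored at $0) would require a custom likelihood
```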

Initial Guesses

Need to get a sense for how hard this problem is to guess.

  • Give them 15 or 45 seconds to form and enter a guess of the proportion
  • Higher-powered rewards:
    • $2.50 if within 1%
    • $1.00 if within 5%
    • $0.50 if within 10%
  • Ask about ten proportions (three decisions paid)

Initial Guess Results (\(N=200\))

[Figures: guess accuracy after 15 seconds and after 45 seconds]

Task Conclusions

  • So we have an experimental task:
    • whose difficulty we can scale
    • where we understand the broad costs and effort required to succeed
    • where we understand the output level at low effort

Incentives

Use four incentives to ask about beliefs in ten different urns:

  1. BSR-Desc: $1.50 prize with only qualitative information on the details
    • Text description of the payoff structure (Vespa & Wilson, 2018)
  2. BSR-Inf: as above but with quantitative information
    • Full information on the quantitative incentives (Danz et al., 2022)
  3. BSR-NoInf: participants only know there is a $1.50 prize, with no other information on the incentives
  4. A "close enough" incentive (see the payment-rule sketch after this list)
    • $1.50 if within 1%; $0.50 if within 5%
    • Currently used in several papers (e.g., Ba et al., 2024)
  • Pay three of the rounds
  • \(N=100\) each incentive treatment
  • No time limit
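For concreteness, a sketch of the two payment rules; the BSR implementation follows the standard Hossain and Okui (2013) binarization and is our assumption about details the slides leave implicit:

```python
import random

def bsr_payment(report, truth, prize=1.50):
    """Binarized scoring rule: win the prize when the squared error beats an
    independent uniform draw (report and truth are proportions in [0, 1])."""
    return prize if (report - truth) ** 2 < random.random() else 0.0

def close_enough_payment(report, truth):
    """'Close enough' rule from the slides: $1.50 within 1%, $0.50 within 5%."""
    err = abs(report - truth)
    return 1.50 if err <= 0.01 else (0.50 if err <= 0.05 else 0.0)

print(close_enough_payment(0.58, 81 / 139))  # within 1% -> 1.5
```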

Model

  • Agents know that there are \(N\) tokens in the urn, but are unsure about the blue proportion
  • Initial prior on the number of blue tokens is \[\text{BetaBinomial}(1,1,N)\]
  • Can choose to sample \(0 \leq n \leq N\) tokens without replacement from the urn (hypergeometric signal)
    • Counting \(k\) blue and \(n-k\) non-blue tokens leads to the posterior \[\text{BetaBinomial}(1+k,\,1+n-k,\,N-n)\] over the unseen tokens
  • With this model we can calculate the expected return from counting \(n\) of the \(N\) tokens under a mechanism that incentivizes a report \(q\) with payment \(\phi(q)\); a computational sketch follows
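A brute-force sketch of that expected-return calculation under our reading of the model; restricting the report to the posterior's support grid is a simplification, and effort costs are omitted:

```python
import numpy as np
from scipy.stats import betabinom

def posterior_over_blue(N, n, k):
    """Posterior over the true blue proportion after counting k blue in a
    sample of n drawn without replacement (BetaBinomial(1,1,N) prior)."""
    m = np.arange(N - n + 1)                 # unseen blue tokens
    pm = betabinom.pmf(m, N - n, 1 + k, 1 + n - k)
    return (k + m) / N, pm                   # proportions and probabilities

def expected_payment(N, n, phi):
    """Expected payment from optimally reporting q after counting n of N
    tokens, under a payment rule phi(q, theta)."""
    total = 0.0
    for k in range(n + 1):
        pk = betabinom.pmf(k, n, 1, 1)       # marginal chance of k blue draws
        thetas, pm = posterior_over_blue(N, n, k)
        # best report, searched over the posterior support (a simplification)
        best = max(np.dot(pm, [phi(q, t) for t in thetas]) for q in thetas)
        total += pk * best
    return total

close = lambda q, t: 1.5 * (abs(q - t) <= 0.01) + 0.5 * (0.01 < abs(q - t) <= 0.05)
print(expected_payment(100, 20, close))  # expected prize from counting 20 of 100
```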

Incentives to Exert Effort

Results

Output by difficulty

Using our calibration treatments to construct instruments for difficulty:

[Figure: output by instrumented difficulty]

Effort by difficulty

Using our calibration treatments to construct instruments for difficulty:

[Figure: effort by instrumented difficulty]

Output: Accuracy

[Figure: accuracy by incentive treatment, with annotated effects of -15%, +16%, +56%, and +43%]

Effort compared to Calibration

Aside: Cognitive Uncertainty

Common self-reported measure, which we validate here against effort and output in the BSR-NoInf treatment...

Incentive effects

  • "Close enough" outperforms BSR on both accuracy and time spent
    • The better incentive effects stemming from a richer outcome space
  • Also cheaper for the experimenter, payments to participants reduced by ~50% over BSR
  • With a fixed budget, how much more effort could be induced?
  • (But the gains here are dwarfed by the gains when we explicitly tell them what to do)


Conclusions (so far...)

  • Have a well-behaved task that scales in cost/effort/output
  • The close-enough incentive works best for inducing effort
  • Varying the incentives:
    • For BSR, no substantive differences
    • Close enough produces a substantial increase in effort/output, mirroring the theoretical predictions
    • Still, offering incentives and letting participants choose effort is swamped by the authority of telling them what to do

To do:

  • Examine and measure the effective costs of Bayesian-updating elicitations in experiments

Bayesian Updating

Participants told:

  • Number of tokens
  • Proportion blue
  • Proportion Dots | Blue
  • Proportion Dots | Red

Participants are then asked for the probability that a token is blue, given that it has a dot (Bayes' rule; a sketch follows):

  • Calculation identical to standard Bayesian updating experiments
  • But can also just count...
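The implied calculation, as a small sketch (the example numbers are made up):

```python
def prob_blue_given_dot(p_blue, p_dot_given_blue, p_dot_given_red):
    """Bayes' rule for the probability that a dotted token is blue, computed
    from the quantities participants are told."""
    num = p_blue * p_dot_given_blue
    return num / (num + (1 - p_blue) * p_dot_given_red)

# e.g. 60% blue, 50% of blue tokens dotted, 25% of red tokens dotted
print(prob_blue_given_dot(0.60, 0.50, 0.25))  # -> 0.75
```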

 

After gaining experience, we will ask for WTP against the standard counting task.