Preferences over Experimental Populations

Neeraja Gupta \(\textcircled{r}\) Luca Rigotti \(\textcircled{r}\) Alistair Wilson

Understanding an Economic Phenomenon

Resources as an Academic

Budget Constraints:

  • While we may want to separate resources from our ability to answer economic questions, in experimental studies the two are linked
  • Even with a large research budget, you would still want to spend it efficiently

Understanding an Economic Phenomenon

Resources as an Academic

Choices over Population:

  • Given options over where we collect experimental data, we want to get the greatest bang for our buck
  • Where other methodological studies have examined whether an effect replicates across populations, we instead ask which population maximizes our inferential power

Basic Idea

  • You have a budget \(\$Y\) for an experiment
  • You are trying to uncover a qualitative difference between:
    • Treatment (A)
    • Control (B)
  • Allow the populations to differ over:
    • Costs per observation, \(c\)
    • Attenuation in effect size, \(\gamma\)

We use a t-statistic to formulate a preference over populations

Observation Costs

Populations differ in the cost per observation \(\$c\):

  • Physical Lab subjects tend to have a high \(\$c\)
  • Online platforms like MTurk tend to have a low \(\$c\)

Costs are set in some larger equilibrium that we take as given; we calibrate the incentive levels to be ecologically valid

Observation Costs

  • Given budget \(\$Y\) for the project, different observation costs \(\$c\) will translate into different sample sizes:
    • \( N(c;Y)= \frac{Y}{c}\)
  • All else equal, we prefer a lower observation cost, as it yields a larger sample
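As a minimal sketch of this budget-to-sample mapping (the budget and per-observation costs below are placeholders, not the study's values):

```python
# Minimal sketch: sample sizes implied by a fixed budget, N(c; Y) = Y / c.
# Budget and per-observation costs are placeholders, not the study's values.
budget_Y = 3000.0
cost_per_obs = {"lab": 22.0, "online_panel": 3.0}

samples = {pop: int(budget_Y // c) for pop, c in cost_per_obs.items()}
print(samples)  # {'lab': 136, 'online_panel': 1000}
```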

Effect Attenuation

  • Suppose treatments \(A\) and \(B\) exhibit an average response difference in baseline population \(P_0\) of
\begin{array}{rcl} \Delta^0_{AB}&=&\text{Treatment Avg}-\text{Control Avg} \\ &=&\mu^0_A-\mu^0_B\end{array}
  • We use changes to the size of the treatment effect as a quality measure:
    • Noise (parameter \(\gamma_\epsilon\) )
    • Reduced elasticity of response (parameter \(\gamma_\Delta\) )

Effect Attenuation

  • Data from an alternative population \(\tilde{P}\) has a proportion \(\gamma_\epsilon\) of participants who act randomly (independent of condition) with mean \(\epsilon\):
\begin{array}{rcl} \tilde{\Delta}_{AB} &=& \tilde{\mu}_A -\tilde{\mu}_B\\ &=& (\gamma_\epsilon\cdot \epsilon + (1-\gamma_\epsilon)\cdot \mu^0_A)-(\gamma_\epsilon\cdot \epsilon + (1-\gamma_\epsilon)\cdot \mu^0_B) \\ &=& (1-\gamma_\epsilon)\cdot\Delta^0_{AB} \end{array}
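As an illustration of the noise channel, a stylized simulation (all parameter values hypothetical) in which a fraction \(\gamma_\epsilon\) of the sample responds at \(\epsilon\) regardless of condition recovers the \((1-\gamma_\epsilon)\) attenuation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stylized simulation of the noise channel; all parameter values are hypothetical.
gamma_eps, eps = 0.3, 0.5        # share of random responders and their mean response
mu_A, mu_B = 0.60, 0.40          # baseline treatment and control means
n = 200_000                      # participants per arm

def arm_mean(mu):
    noisy = rng.random(n) < gamma_eps          # who responds randomly
    return np.where(noisy, eps, mu).mean()     # noisy types center on eps, the rest on mu

delta_tilde = arm_mean(mu_A) - arm_mean(mu_B)
print(delta_tilde)                             # ~0.14
print((1 - gamma_eps) * (mu_A - mu_B))         # 0.14, the (1 - gamma_eps) attenuation
```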

Effect Attenuation

Baseline: \(\Delta^0_{AB}\)

Noise: \(\tilde{\Delta}_{AB} = (1-\gamma_\epsilon)\cdot\Delta^0_{AB}\)

Alternatively, the population may just have a smaller response, where \(\gamma_\Delta\) indicates the effect-size reduction:

\begin{array}{rcl} \tilde{\Delta}_{AB} &=& \tilde{\mu}_A-\tilde{\mu}_B\\ &=& \left(\tilde{\mu}_B +(1-\gamma_\Delta)\cdot\Delta^0_{AB} \right)-\tilde{\mu}_B\\ &=& (1-\gamma_\Delta)\cdot\Delta^0_{AB} \end{array}

Effect Attenuation

Baseline: \(\Delta^0_{AB}\)

Noise: \(\tilde{\Delta}_{AB} = (1-\gamma_\epsilon)\cdot\Delta^0_{AB}\)

Reduced Effect: \(\tilde{\Delta}_{AB} = (1-\gamma_\Delta)\cdot\Delta^0_{AB}\)

Though we will attempt to separate them, the compound effect of noise and reduced effect size is an overall reduction \(\gamma\):

\begin{array}{rcl} \tilde{\Delta}_{AB} &=& (1-\gamma)\cdot\Delta^0_{AB}\\ &=& (1-\gamma_\epsilon)\cdot(1-\gamma_\Delta)\cdot\Delta^0_{AB} \end{array}
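For a numerical illustration (values chosen for exposition, not estimates from the data): with \(\gamma_\epsilon = 0.2\) and \(\gamma_\Delta = 0.25\),

\[ 1-\gamma = (1-\gamma_\epsilon)(1-\gamma_\Delta) = 0.8 \times 0.75 = 0.6, \qquad \text{so } \gamma = 0.4 \]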

Experimenter Preference

\[ U(c,\gamma; Y) = \text{Pr}\left( \left|T\right| > 1.96 \right), \qquad T \propto \frac{(1-\gamma)\Delta^0_{AB}}{\sqrt{c}} \]

where

\[ T \propto \frac{\text{Effect size}}{\sqrt{\text{Obs. cost}}} \]
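A minimal sketch of this preference as a power calculation, using a normal approximation and placeholder values for \(\Delta^0_{AB}\), the response standard deviation, the budget, and the costs (none are the paper's estimates):

```python
import numpy as np
from scipy.stats import norm

def power(budget, cost, gamma, delta0, sigma, alpha=0.05):
    """Pr(|T| > 1.96): two-sided power of a two-sample comparison, normal
    approximation, with N = budget / cost observations split equally across arms."""
    n_per_arm = (budget / cost) / 2.0
    effect = (1.0 - gamma) * delta0              # attenuated effect size
    se = sigma * np.sqrt(2.0 / n_per_arm)        # std. error of the difference in means
    z = norm.ppf(1.0 - alpha / 2.0)              # 1.96 for alpha = 0.05
    return norm.cdf(effect / se - z) + norm.cdf(-effect / se - z)

# Placeholder comparison: an expensive, unattenuated population vs. a cheap one
# with half the effect size.
print(power(budget=3000, cost=22, gamma=0.0, delta0=0.17, sigma=0.45))
print(power(budget=3000, cost=3,  gamma=0.5, delta0=0.17, sigma=0.45))
```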

Dual Problems

  • Iso-Budget: maximize power subject to the budget
  • Iso-Power: minimize the budget subject to a power requirement
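A sketch of the iso-power dual under the same normal approximation (again with placeholder parameters): solve the standard sample-size formula for the total \(N\) that reaches a target power, then price it at the population's cost per observation.

```python
from scipy.stats import norm

def min_budget(target_power, cost, gamma, delta0, sigma, alpha=0.05):
    """Iso-power dual: smallest budget reaching target_power for a two-sample test,
    using the standard normal-approximation sample-size formula (equal arms)."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(target_power)
    effect = (1 - gamma) * delta0
    n_per_arm = 2 * (sigma * (z_alpha + z_beta) / effect) ** 2
    return 2 * n_per_arm * cost       # total observations times cost per observation

# Placeholder parameters: the cheap, attenuated population still needs less money.
print(min_budget(0.8, cost=22, gamma=0.0, delta0=0.17, sigma=0.45))   # ~4,800
print(min_budget(0.8, cost=3,  gamma=0.5, delta0=0.17, sigma=0.45))   # ~2,600
```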

Treatments

Our environment varies:

  • Four strategic games presented to the participants (random order at participant level)
  • A presentation effect: \(C\) action listed first vs. \(D\) action listed first
  • The population/mode from which the sample was drawn:
    • Standard Physical Lab (undergrads)
    • Virtual Lab sample (undergrads)
    • MTurk
    • Prolific
    • CloudResearch (Approved List)

Four Games

In both DOM games, \(C\) is individually dominant and socially efficient:

Dom 1:
\[\begin{array}{c|cc} & C & D\\ \hline C & (17,17) & (12,16)\\ D & (16,12) & (10,10)\end{array}\]

Dom 2:
\[\begin{array}{c|cc} & C & D\\ \hline C & (15,15) & (16,10)\\ D & (10,16) & (11,11)\end{array}\]

The PD games differ in tension (PD1 has more temptation):

PD 1:
\[\begin{array}{c|cc} & C & D\\ \hline C & (21,21) & (2,28)\\ D & (28,2) & (8,8)\end{array}\]

PD 2:
\[\begin{array}{c|cc} & C & D\\ \hline C & (19,19) & (8,22)\\ D & (22,8) & (9,9)\end{array}\]

  • We say that a participant makes a \(\Sigma\)-dominated choice if they choose an action that is dominated both according to their own payoff and according to the joint payoff
  • We measure inattentive response via the proportion of participants who chose \(D\) in either of the two DOM games

Dom 1:
\[\begin{array}{c|cc} & C & D\\ \hline C & (17,17) & (12,16)\\ D & (16,12) & (10,10)\end{array}\]

Dom 2:
\[\begin{array}{c|cc} & C & D\\ \hline C & (15,15) & (16,10)\\ D & (10,16) & (11,11)\end{array}\]

The two games differ in both the temptation to defect and the size of the gain from joint cooperation:

  • PD1: High gain but high temptation
  • PD2: Moderate gain but small temptation

PD 1:
\[\begin{array}{c|cc} & C & D\\ \hline C & (21,21) & (2,28)\\ D & (28,2) & (8,8)\end{array}\]

PD 2:
\[\begin{array}{c|cc} & C & D\\ \hline C & (19,19) & (8,22)\\ D & (22,8) & (9,9)\end{array}\]

Behavioral Prediction: Cooperation in these PD games has been predicted by the Rapoport ratio:

\[\rho = \frac{\pi(C,C)-\pi(D,D)}{\pi(D,C)-\pi(C,D)} \]

The one-shot lab literature predicts (Charness et al. 2016):

\[\Delta^0_{1\rightarrow 2} =\text{Coop}(PD1)-\text{Coop}(PD2)= -17.2\%\]
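As a quick check, a short computation of \(\rho\) from the payoff matrices above (this reproduces the values \(\rho = 0.50\) and \(\rho \approx 0.71\) listed for PD1 and PD2 in the appendix slides):

```python
def rapoport_ratio(pi_cc, pi_dd, pi_dc, pi_cd):
    """Rapoport cooperation index: rho = (pi(C,C) - pi(D,D)) / (pi(D,C) - pi(C,D))."""
    return (pi_cc - pi_dd) / (pi_dc - pi_cd)

# Row player's payoffs from the PD matrices above.
print(rapoport_ratio(21, 8, 28, 2))   # PD1: 0.50
print(rapoport_ratio(19, 9, 22, 8))   # PD2: ~0.714
```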

Dom 1 as presented to participants under the framing \(A=C\), \(B=D\):

\[\begin{array}{ccccc} \text{Your Action} & \text{Partner Action} & \text{Your Payoff} & \text{Partner Payoff} & \text{Profile}\\ \hline A & A & \$17 & \$17 & (C,C)\\ A & B & \$12 & \$16 & (C,D)\\ B & A & \$16 & \$12 & (D,C)\\ B & B & \$10 & \$10 & (D,D) \end{array}\]

Under the alternative framing \(A=D\), \(B=C\), the same Dom 1 payoffs are shown with the dominated action listed first, so the rows of the table correspond to the profiles \((D,D)\), \((D,C)\), \((C,D)\), \((C,C)\).

Population Costs

Populations: Physical Lab (Lab), Virtual Lab (VLab), Mechanical Turk (M-Turk), CloudResearch (Cloud-R), and Prolific.

\[\begin{array}{lccccc} & \text{Lab} & \text{VLab} & \text{M-Turk} & \text{Cloud-R} & \text{Prolific}\\ \hline \text{Fixed} & \$6 & \$6 & \$1 & \$1 & \$1.60\\ \text{Incentive} & 1/4 & 1/4 & 1/4\times 1/10 & 1/4\times 1/10 & 1/4\times 1/10\\ \text{Obs. Cost} & \$22.08 & \$21.75 & \$3.01 & \$3.23 & \$4.36\\ \text{Sample} & 74 & 74 & 548 & 541 & 385 \end{array}\]

Assessment

Setting the fixed and variable payments to match typical levels (and minimums for each population), we recruited a sample from each population

 

Main outcome measures:

  • Fraction of participants choosing a dominated action in either DOM game
  • Difference in behavior across the game reframing
  • Cooperation difference between the two PD games

Dominated Response (Noise)

  • Fraction of participants choosing \(D\) in either DOM game
  • M-Turk exhibits substantial noise, with sensitivity to the first listed option
  • Other platforms have much lower noise

[Figure: dominated-response rates by population; arrows show the response to action order]

Dominated Response (Noise)

Estimate a mixture model over three types:

  1. Those choosing \(C\) in both DOM games
  2. Those randomizing (uniformly)
  3. Those choosing the first listed action

\[\begin{array}{lccccc} \text{Type} & \text{Lab} & \text{VLab} & \text{M-Turk} & \text{Cloud-R} & \text{Prolific}\\ \hline 1 & 0.86 & 0.78 & 0.44 & 0.84 & 0.83\\ 2 & 0.14 & 0.22 & 0.45 & 0.16 & 0.15\\ 3 & 0.00 & 0.00 & 0.11 & 0.00 & 0.02 \end{array}\]
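A minimal sketch of how such a three-type mixture could be estimated by maximum likelihood, assuming each participant is summarized by their two DOM-game choices and whether \(D\) was listed first (the data, helper names, and estimator below are illustrative, not the paper's code):

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical rows: (chose_D_in_DOM1, chose_D_in_DOM2, D_listed_first).
data = np.array([
    [0, 0, 0], [0, 0, 1], [1, 0, 0], [1, 1, 1], [0, 0, 0], [0, 0, 1],
])

def type_likelihoods(row):
    d1, d2, d_first = row
    f1 = 1.0 if (d1 == 0 and d2 == 0) else 0.0               # type 1: always chooses C
    f2 = 0.25                                                 # type 2: uniform randomizer
    f3 = 1.0 if (d1 == d_first and d2 == d_first) else 0.0   # type 3: picks first-listed action
    return [f1, f2, f3]

F = np.array([type_likelihoods(r) for r in data])  # participant-by-type likelihoods

def neg_loglik(theta):
    w = np.exp(theta - theta.max())    # softmax keeps mixture weights on the simplex
    p = w / w.sum()
    return -np.sum(np.log(F @ p + 1e-12))

res = minimize(neg_loglik, x0=np.zeros(3), method="Nelder-Mead")
w = np.exp(res.x - res.x.max())
p_hat = w / w.sum()
# Note: types 1 and 3 behave identically when C is listed first, so identification
# of type 3 comes from the D-first presentation condition.
print(dict(zip(["always-C", "random", "first-listed"], p_hat.round(2))))
```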

Population Power \(\gamma_\epsilon\)

  • If dominated choices were the only problem, Prolific and CloudResearch would be far superior
    • Higher power due to low observation costs combined with low noise

[Figure: power comparison across populations under pure noise effects]

PD Cooperation Static

  • Both M-Turk and Prolific exhibit a null effect on the PD comparative static
  • There are significant effects in the other populations

[Figure: PD cooperation rates by population; arrows show the change from PD-2 to PD-1]

Observed PD-Static 

  • Prolific's small response elasticity in the PD static makes it worse than the Lab
  • Even though the Cloud-R effect size is smaller than the lab's, cheaper observation costs more than compensate

[Figure: power comparison across populations under total attenuation]

Conclusions

  • Unfiltered M-Turk is unfit for purpose: noise in the responses leads to a substantial reduction in power
  • Both Prolific and CloudResearch offer curated populations with rates of dominated response only slightly higher than in the lab samples, at a much lower cost per observation
  • With respect to our study-specific behavioral comparative static (PD-game cooperation):
    • Prolific shows very low elasticity of response (participants are overly cooperative, regardless of tension)
    • CloudResearch exhibits about half the effect size of the lab samples, but cheaper observations lead to substantially increased power

Rapoport ratio:

\[\rho = \frac{\pi(C,C)-\pi(D,D)}{\pi(D,C)-\pi(C,D)} \]

PD 1 (\(\rho = 0.50\)):
\[\begin{array}{c|cc} & C & D\\ \hline C & (21,21) & (2,28)\\ D & (28,2) & (8,8)\end{array}\]

PD 2 (\(\rho = 0.71\)):
\[\begin{array}{c|cc} & C & D\\ \hline C & (19,19) & (8,22)\\ D & (22,8) & (9,9)\end{array}\]

PD 3 (\(\rho = 0.05\)):
\[\begin{array}{c|cc} & C & D\\ \hline C & (14,14) & (5,25)\\ D & (25,5) & (13,13)\end{array}\]

PD 4 (\(\rho = 0.25\)):
\[\begin{array}{c|cc} & C & D\\ \hline C & (18,18) & (3,27)\\ D & (27,3) & (12,12)\end{array}\]

Comparing the two extremes, PD 2 (\(\rho = 0.71\)) and PD 3 (\(\rho = 0.05\)):

  • We look at the difference between these two extremes to examine whether a less subtle treatment can detect an effect
  • That is, we look at the cooperation difference from PD2 to PD3