Preferences over Experimental Populations
Neeraja Gupta \( \text{\textcircled{r}} \) Luca Rigotti \( \text{\textcircled{r}} \) Alistair Wilson
Understanding an Economic Phenomenon
Resources as an Academic
Budget Constraints:
- While we may want to separate resources from our ability to answer economic questions, for experimental studies, these will be related
- Even if you have a huge research budget, you still might want to use that budget more efficiently
Choices over Population:
- Given options over where we collect experimental data, we want to get the greatest bang for our buck
- Where other methodological studies have examined whether an effect replicates across populations, we instead want to understand which population maximizes our inferential power
Basic Idea
- You have a budget \(\$Y\) for an experiment
- Trying to uncover a qualitative difference between:
  - Treatment (A)
  - Control (B)
- Allow the populations to differ over:
  - Costs per observation, \(c\)
  - Attenuation in effect size, \(\gamma\)
Use a T-stat to formulate a preference:
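A rough sketch of how a t-stat could induce such a preference, not necessarily the exact form used here. It assumes a baseline effect \(\Delta_0\), a common outcome standard deviation \(\sigma\) (both symbols are assumptions), and a sample of \(N = Y/c\) split evenly across A and B:

\[ t \approx \frac{(1-\gamma)\,\Delta_0}{\sigma\sqrt{\frac{2}{N/2}}} = \frac{(1-\gamma)\,\Delta_0}{2\sigma}\sqrt{\frac{Y}{c}} \]

Under these assumptions, for a fixed budget \(Y\) a population ranks higher when \((1-\gamma)/\sqrt{c}\) is larger: halving the effect size costs as much as quadrupling the price per observation.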
Observation Costs
Populations differ in the cost per observation \(\$c\):
- Physical Lab subjects tend to have a high \(\$c\)
- Online platforms like MTurk tend to have a low \(\$c\)
Costs are set in some larger equilibrium that we take as given; we calibrate the incentive levels to be ecologically valid
- Given budget \(\$Y\) for the project, different observation costs \(\$c\) will translate into different sample sizes:
  - \( N(c ;Y)= \frac{Y}{c}\)
- All else equal, we prefer a lower observation cost, as it yields a bigger sample
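As a minimal sketch of \(N(c;Y)\), the snippet below converts a budget into sample sizes. The per-observation costs are the ones reported in the Population Costs table; the $1,000 budget and the rounding down to whole participants are illustrative assumptions.

```python
def sample_size(budget, cost_per_obs):
    """N(c; Y) = Y / c, rounded down to whole participants (rounding is an assumption)."""
    return int(budget // cost_per_obs)

# Per-observation costs in dollars, taken from the Population Costs table.
costs = {"Lab": 22.08, "VLab": 21.75, "M-Turk": 3.01, "Cloud-R": 3.23, "Prolific": 4.36}

budget = 1000.0  # hypothetical budget Y, for illustration only

sizes = {pop: sample_size(budget, c) for pop, c in costs.items()}
for pop, n in sizes.items():
    print(f"{pop}: {n} observations")
```

The same budget buys several times more observations online than in the lab, which is the sense in which lower \(c\) is preferred, all else equal.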
Effect Attenuation
- Suppose treatments \(A\) and \(B\) exhibit an average response difference in baseline population \(P_0\) of
- Changes to the size of the treatment effect serve as a quality measure:
  - Noise (parameter \(\gamma_\epsilon\))
  - Reduced elasticity of response (parameter \(\gamma_\Delta\))
- Data from an alternative population \(\tilde{P}\) has a proportion of participants \(\gamma_\epsilon\) that act randomly (independent of condition) with mean \(\epsilon\):
Alternatively, the population may just have a smaller response, where \(\gamma_\Delta\) indicates the effect-size reduction:
Effect Attenuation
Baseline:
Noise:
Reduc. Effect:
Though we'll attempt to separate them, noise and reduced effect size compound into an overall reduction \(\gamma\)
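One way the three cases could be written out, as a sketch. The notation is assumed: \(\mu_A, \mu_B\) are the baseline mean responses, \(\Delta_0\) is their difference, \(\tilde{\Delta}\) is the difference in the alternative population, and \(\gamma_\Delta\) is read as a proportional reduction:

\[ \text{Baseline:}\quad \Delta_0 = \mu_A - \mu_B \]

\[ \text{Noise:}\quad \tilde{\Delta} = \left[(1-\gamma_\epsilon)\mu_A + \gamma_\epsilon \epsilon\right] - \left[(1-\gamma_\epsilon)\mu_B + \gamma_\epsilon \epsilon\right] = (1-\gamma_\epsilon)\,\Delta_0 \]

\[ \text{Reduced effect:}\quad \tilde{\Delta} = (1-\gamma_\Delta)\,\Delta_0 \]

\[ \text{Compound:}\quad \tilde{\Delta} = (1-\gamma_\epsilon)(1-\gamma_\Delta)\,\Delta_0 = (1-\gamma)\,\Delta_0, \qquad \gamma = 1-(1-\gamma_\epsilon)(1-\gamma_\Delta) \]

The noise term \(\epsilon\) cancels because random responders behave the same way in both conditions, so noise shrinks the measured difference without shifting it.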
Experimenter Preference
where
Dual Problems
- Maximize power subject to budget
- Minimize budget subject to power

Figure panels: Iso-Power, Iso-Budget
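The two problems can be sketched numerically. This is an illustration under stated assumptions, not the deck's own calculation: a two-sided z-test on a difference in means, a sample split evenly across two arms, a hypothetical standard deviation `sigma`, and a baseline effect of 0.172 (the magnitude of the lab prediction quoted for the PD games). The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal

def power(budget, cost, gamma, delta0=0.172, sigma=0.5, alpha=0.05):
    """Approximate power when N = budget / cost is split evenly over two arms."""
    n_per_arm = budget / cost / 2
    se = sigma * sqrt(2 / n_per_arm)        # std. error of the difference in means
    shift = (1 - gamma) * delta0 / se       # attenuated effect in std. error units
    z_crit = Z.inv_cdf(1 - alpha / 2)
    return Z.cdf(shift - z_crit)            # ignores the negligible opposite tail

def budget_for_power(target, cost, gamma, delta0=0.172, sigma=0.5, alpha=0.05):
    """Dual problem: smallest budget that reaches the target power."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    z_pow = Z.inv_cdf(target)
    n_per_arm = 2 * (sigma * (z_crit + z_pow) / ((1 - gamma) * delta0)) ** 2
    return 2 * n_per_arm * cost

# Hypothetical comparison: an expensive, unattenuated population against a cheap
# one with half the effect size, both on the same $1,000 budget.
print(power(1000, cost=22.08, gamma=0.0), power(1000, cost=3.23, gamma=0.5))
print(budget_for_power(0.8, cost=22.08, gamma=0.0), budget_for_power(0.8, cost=3.23, gamma=0.5))
```

Both functions rank populations the same way, since each depends on the population only through \((1-\gamma)/\sqrt{c}\).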
Treatments
Our environment varies:
- Four strategic games presented to the participants (random order at participant level)
- A presentation effect (\(C\) action first vs. \(D\) action first)
- The population/mode from which the sample was drawn:
  - Standard Physical Lab (undergrads)
  - Virtual Lab sample (undergrads)
  - MTurk
  - Prolific
  - CloudResearch (Approved List)
Four Games
In both games, \(C\) is individually dominant and socially efficient

Dom 1:

| | \(C\) | \(D\) |
|---|---|---|
| \(C\) | (17,17) | (12,16) |
| \(D\) | (16,12) | (10,10) |

Dom 2:

| | \(C\) | \(D\) |
|---|---|---|
| \(C\) | (15,15) | (16,10) |
| \(D\) | (10,16) | (11,11) |

Games differ in PD tension (PD1 has more temptation)

PD 1:

| | \(C\) | \(D\) |
|---|---|---|
| \(C\) | (21,21) | (2,28) |
| \(D\) | (28,2) | (8,8) |

PD 2:

| | \(C\) | \(D\) |
|---|---|---|
| \(C\) | (19,19) | (8,22) |
| \(D\) | (22,8) | (9,9) |
- We say that a participant makes a \(\Sigma\)-dominated choice if they choose an action that is dominated both according to their own payoff and according to the joint payoff
- Measure inattentive response via the proportion of participants that chose \(D\) in either of the two DOM games
Dom 1:

| | \(C\) | \(D\) |
|---|---|---|
| \(C\) | (17,17) | (12,16) |
| \(D\) | (16,12) | (10,10) |

Dom 2:

| | \(C\) | \(D\) |
|---|---|---|
| \(C\) | (15,15) | (16,10) |
| \(D\) | (10,16) | (11,11) |
The two games differ in both the temptation to defect and the size of the gain from joint cooperation:
- PD1: High gain but high temptation
- PD2: Moderate gain but small temptation
PD 1:

| | \(C\) | \(D\) |
|---|---|---|
| \(C\) | (21,21) | (2,28) |
| \(D\) | (28,2) | (8,8) |

PD 2:

| | \(C\) | \(D\) |
|---|---|---|
| \(C\) | (19,19) | (8,22) |
| \(D\) | (22,8) | (9,9) |
Behavioral Prediction: Cooperation in these PD games has been predicted by the Rapoport ratio:
\[\rho = \frac{\pi(C,C)-\pi(D,D)}{\pi(D,C)-\pi(C,D)} \]
One-shot lab literature predicts (Charness et al 2016):
\[\Delta^0_{1\rightarrow 2} =\text{Coop}(PD1)-\text{Coop}(PD2)= -17.2\%\]
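As a quick check on the direction of the prediction, the Rapoport ratio can be computed from the PD 1 and PD 2 payoffs. This is a minimal sketch; the function and argument names are hypothetical, with `cd` denoting the row player's payoff from playing \(C\) against \(D\).

```python
def rapoport_ratio(cc, cd, dc, dd):
    """rho = (pi(C,C) - pi(D,D)) / (pi(D,C) - pi(C,D)), using own payoffs."""
    return (cc - dd) / (dc - cd)

# Own payoffs read off the PD 1 and PD 2 matrices.
rho_pd1 = rapoport_ratio(cc=21, cd=2, dc=28, dd=8)   # 13 / 26
rho_pd2 = rapoport_ratio(cc=19, cd=8, dc=22, dd=9)   # 10 / 14

print(f"PD1: {rho_pd1:.3f}, PD2: {rho_pd2:.3f}")
```

PD 2 has the higher ratio, which lines up with the negative sign on \(\text{Coop}(PD1)-\text{Coop}(PD2)\).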
Dom 1:

| | \(C\) | \(D\) |
|---|---|---|
| \(C\) | (17,17) | (12,16) |
| \(D\) | (16,12) | (10,10) |

Table (\(A=C, B=D\)):

| Your Action | Partner Action | Your Payoff | Partner Payoff |
|---|---|---|---|
| A | A | $17 | $17 |
| A | B | $12 | $16 |
| B | A | $16 | $12 |
| B | B | $10 | $10 |
Dom 1:

| | \(C\) | \(D\) |
|---|---|---|
| \(C\) | (17,17) | (12,16) |
| \(D\) | (16,12) | (10,10) |

Table (\(A=D, B=C\)):

| Your Action | Partner Action | Your Payoff | Partner Payoff |
|---|---|---|---|
| A | A | $10 | $10 |
| A | B | $16 | $12 |
| B | A | $12 | $16 |
| B | B | $17 | $17 |
Population Costs
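| | Lab | VLab | M-Turk | Cloud-R | Prolific |
|---|---|---|---|---|---|
| Population | Physical Lab | Virtual Lab | Mech Turk | CloudResearch | Prolific |
| Obs. Cost | $22.08 | $21.75 | $3.01 | $3.23 | $4.36 |
| Fixed | $6 | $6 | $1 | $1 | $1.60 |
| Incentive | 1/4 | 1/4 | 1/4 x 1/10 | 1/4 x 1/10 | 1/4 x 1/10 |
| Sample | 74 | 74 | 548 | 541 | 385 |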
Assessment
Setting the fixed and variable payments to match typical levels (and minimums for each population), we recruited a sample from each population
Main outcome measures:
- Fraction of participants choosing a dominated action in either DOM game
- Difference in behavior across the game reframing
- Cooperation difference between the two PD games
Dominated Response (Noise)
- Fraction of participants choosing D in either DOM game
- M-Turk exhibits substantial noise, with sensitivity to the first listed option
- Other platforms have much lower noise
Arrows show response to action order
Dominated Response (Noise)
Estimate a mixture model over three types:
- Type 1: those choosing C in both DOM games
- Type 2: those randomizing (uniformly)
- Type 3: those choosing the first listed action

| Type | Lab | VLab | M-Turk | Cloud-R | Prolific |
|---|---|---|---|---|---|
| 1 | 0.86 | 0.78 | 0.44 | 0.84 | 0.83 |
| 2 | 0.14 | 0.22 | 0.45 | 0.16 | 0.15 |
| 3 | 0.00 | 0.00 | 0.11 | 0.00 | 0.02 |
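To see what the estimated shares imply for the dominated-response rate, here is a small sketch. It assumes randomizers flip an independent fair coin in each of the two DOM games and that first-listed choosers face the same action order in both games; neither detail is stated above, and the names are hypothetical.

```python
# Estimated type shares from the table: (always C, randomizer, first-listed).
shares = {
    "Lab":      (0.86, 0.14, 0.00),
    "VLab":     (0.78, 0.22, 0.00),
    "M-Turk":   (0.44, 0.45, 0.11),
    "Cloud-R":  (0.84, 0.16, 0.00),
    "Prolific": (0.83, 0.15, 0.02),
}

def p_dominated(type_shares, d_first):
    """Implied probability of choosing D in at least one of the two DOM games."""
    _always_c, randomizer, first_listed = type_shares
    p_random = 1 - 0.5 ** 2              # at least one D in two fair coin flips
    p_first = 1.0 if d_first else 0.0    # first-listed types pick D only if D is listed first
    return randomizer * p_random + first_listed * p_first

for pop, s in shares.items():
    print(pop, round(p_dominated(s, d_first=False), 4), round(p_dominated(s, d_first=True), 4))
```

Under these assumptions, the gap between the two orderings equals the Type 3 share, which is why order sensitivity shows up mainly on M-Turk.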
Population Power \(\gamma_\epsilon\)
- If dominated choices were the only problem, Prolific and CloudResearch would be far superior
- Higher power due to low observation costs combined with low noise
Pure noise effects
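A rough, noise-only comparison can be sketched by combining observation costs with the mixture-model shares. The index \((1-\gamma_\epsilon)/\sqrt{c}\) is an assumption (it follows from a t-stat that scales with effect size times \(\sqrt{Y/c}\)), as is proxying \(\gamma_\epsilon\) by one minus the Type 1 share. It also ignores any extra variance that random responses add.

```python
from math import sqrt

# Observation costs (dollars) and Type 1 (always-C) shares reported above.
cost = {"Lab": 22.08, "VLab": 21.75, "M-Turk": 3.01, "Cloud-R": 3.23, "Prolific": 4.36}
attentive = {"Lab": 0.86, "VLab": 0.78, "M-Turk": 0.44, "Cloud-R": 0.84, "Prolific": 0.83}

def noise_only_index(pop):
    """(1 - gamma_eps) / sqrt(c), with gamma_eps proxied by 1 - Type 1 share (assumption)."""
    gamma_eps = 1 - attentive[pop]
    return (1 - gamma_eps) / sqrt(cost[pop])

index = {pop: noise_only_index(pop) for pop in cost}
ranking = sorted(index, key=index.get, reverse=True)
print(ranking)
```

The index is only meaningful as a ranking for a fixed budget; its level has no interpretation on its own.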
PD Cooperation Static
- Both M-Turk and Prolific exhibit a null effect on the PD static
- There are significant effects in the other populations
Arrows show PD-2 to PD-1
Observed PD-Static
- Prolific's small response elasticity in the PD static makes it worse than the Lab
- Even though Cloud-R effect size is smaller than the lab, cheaper observation costs more than compensate
Total attenuation
Conclusions
- Unfiltered M-Turk is unfit for purpose: noise in the response leads to a substantial reduction in power
- Both Prolific and CloudResearch offer curated populations with low rates of dominated response, only slightly higher than those in the lab samples (but much cheaper per observation)
- With respect to our study-specific behavioral comparative static (PD-game cooperation):
  - Prolific shows very low elasticity of response (participants are overly cooperative, regardless of tension)
  - CloudResearch exhibits about half the effect size of the lab samples, but cheaper observations lead to substantially increased power
Rapoport ratio:
\[\rho = \frac{\pi(C,C)-\pi(D,D)}{\pi(D,C)-\pi(C,D)} \]
PD 1:

| | \(C\) | \(D\) |
|---|---|---|
| \(C\) | (21,21) | (2,28) |
| \(D\) | (28,2) | (8,8) |

PD 2:

| | \(C\) | \(D\) |
|---|---|---|
| \(C\) | (19,19) | (8,22) |
| \(D\) | (22,8) | (9,9) |

PD 3:

| | \(C\) | \(D\) |
|---|---|---|
| \(C\) | (14,14) | (5,25) |
| \(D\) | (25,5) | (13,13) |

PD 4:

| | \(C\) | \(D\) |
|---|---|---|
| \(C\) | (18,18) | (3,27) |
| \(D\) | (27,3) | (12,12) |
Rapoport ratio:
\[\rho = \frac{\pi(C,C)-\pi(D,D)}{\pi(D,C)-\pi(C,D)} \]
PD 2:

| | \(C\) | \(D\) |
|---|---|---|
| \(C\) | (19,19) | (8,22) |
| \(D\) | (22,8) | (9,9) |

PD 3:

| | \(C\) | \(D\) |
|---|---|---|
| \(C\) | (14,14) | (5,25) |
| \(D\) | (25,5) | (13,13) |
Look at the difference between these two extremes to examine whether a less subtle treatment can detect an effect
Look at Cooperation difference from PD2 to PD3
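Plugging the four payoff matrices into the Rapoport ratio shows why PD2 and PD3 are the extremes (a worked calculation from the payoffs above; the subscripts on \(\rho\) are added for labeling):

\[ \rho_{PD1} = \frac{21-8}{28-2} = \frac{13}{26} = 0.50, \qquad \rho_{PD2} = \frac{19-9}{22-8} = \frac{10}{14} \approx 0.71 \]

\[ \rho_{PD3} = \frac{14-13}{25-5} = \frac{1}{20} = 0.05, \qquad \rho_{PD4} = \frac{18-12}{27-3} = \frac{6}{24} = 0.25 \]

PD2 has the highest ratio and PD3 the lowest, so the PD2 to PD3 comparison spans the widest gap in predicted cooperation among the four games.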