Taylor Weidman
Alistair Wilson
Marta Boczon
Emanuel Vespa
When will equilibrium selection matter?
Broadly matching our initial treatment variation, consider:
Net directional effect from consolidation unclear:
With competing effects, we need a theory capable of assessing their relative scale to determine the net direction
One candidate for predicting selection is the Basin of Attraction:
An index measure of strategic uncertainty, serving as a summary of the competing forces.
But to what extent can we think of calibrated (and opposing) shifts in these variables as direct substitutes?
A large experimental literature has examined how participants behave in the canonical environment for implicit coordination/cooperation:
While many experiments find that implicit cooperation is commonplace and predictable, implicit collusion is thought to be less common in IO settings (where explicit collusion is more typical)
The broad conclusions from the repeated prisoner's dilemma literature (summarized in the meta-study):
While much of the experimental meta-study is descriptive, one thrust is outlining how a theoretic measure is predictive of observed cooperation levels:
Theoretical measures are useful here as:
The size of the basin of attraction for Always Defect provides a balance between tractability and extensibility. It looks at two simple (and extreme) strategic responses:
Assuming the world is captured entirely by the tradeoff across these two extreme responses, we can go from any stage game to a critical probability \(p^\star\) for the other player cooperating
|   | C      | D      |
|---|--------|--------|
| C | (R, R) | (S, T) |
| D | (T, S) | (P, P) |
|      | Grim          | AllD          |
|------|---------------|---------------|
| Grim | \((R,R)\)     | \((V_S,V_T)\) |
| AllD | \((V_T,V_S)\) | \((P,P)\)     |
So the theory is a constructible function of the primitives, and \(p^\star\) serves as a decreasing index for cooperation
|      | Grim          | AllD          |
|------|---------------|---------------|
| Grim | \((1,1)\)     | \((V_S,V_T)\) |
| AllD | \((V_T,V_S)\) | \((0,0)\)     |
For the RPD the Basin size \(p^\star\) is a function of three parameters:
1. Temptation \(t\); 2. Sucker cost \(s\); 3. Discount \(\delta\)
|   | C         | D         |
|---|-----------|-----------|
| C | (1, 1)    | (-s, 1+t) |
| D | (1+t, -s) | (0, 0)    |
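A sketch of where \(p^\star\) comes from (my reconstruction of the standard indifference calculation; any algebraically equivalent presentation gives the same index). Working with discounted sums, mutual Grim is worth \(1/(1-\delta)\), \(V_T = 1+t\), and \(V_S = -s\); a player who believes the other plays Grim with probability \(p\) is indifferent between Grim and AllD when
\[
\frac{p}{1-\delta} - (1-p)\,s \;=\; p\,(1+t)
\qquad\Longrightarrow\qquad
p^\star \;=\; \frac{s}{\,s - t + \frac{\delta}{1-\delta}\,}.
\]
So \(p^\star\) is increasing in \(t\) and \(s\), decreasing in \(\delta\), and hits 1 exactly at the existence threshold \(\delta = t/(1+t)\).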
Dal Bo and Frechette meta-study comparative statics:
Main empirical outcomes considered:
(Thresholds marked: \(\leftarrow\) cooperation risk dominant; \(\leftarrow\) existence of a cooperative equilibrium)
|   | C   | D  |
|---|-----|----|
| C | 1   | -s |
| D | 1+t | 0  |
You (rows) vs. the \((N-1)\) others (columns):

|   | All C | Not All C |
|---|-------|-----------|
| C | 1     | -x        |
| D | 1+x   | 0         |
If your model of the others is that they act monolithically (the \((N-1)\) others move together as a single other), then the formula is identical to the RPD:
If your model of the others is that they act independently, then the basin size gets larger with \(N\) (see the sketch after the matrix below):
You (rows) vs. the \((N-1)\) others (columns):

|   | All C | Not All C |
|---|-------|-----------|
| C | 1     | -X        |
| D | 1+X   | 0         |
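A sketch of why the threshold rises with \(N\) under the independent model (assuming, for this illustration, that each of the \(N-1\) others plays Grim independently with probability \(p\) and that any single defection triggers the punishment): the probability of the All C column is \(p^{N-1}\), so the RPD indifference condition applies with \(p^{N-1}\) in place of \(p\):
\[
\frac{p^{N-1}}{1-\delta} - \bigl(1 - p^{N-1}\bigr)X \;=\; p^{N-1}\,(1+X)
\qquad\Longrightarrow\qquad
p^\star_{\text{ind}} \;=\; \bigl(p^\star_{\text{corr}}\bigr)^{\frac{1}{N-1}},
\]
where \(p^\star_{\text{corr}} = X(1-\delta)/\delta\) is the monolithic (RPD) threshold with \(t = s = X\). Since \(p^\star_{\text{corr}} \in (0,1)\), taking the \((N-1)\)-th root pushes the threshold up, so the basin for Always Defect grows with \(N\).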
|   | C   | D  |
|---|-----|----|
| C | 1   | -S |
| D | 1+T | 0  |
You (rows) vs. the \((N-1)\) others (columns):

|       | All Green | One or more Red |
|-------|-----------|-----------------|
| Green | $20       | $11 - $x        |
| Red   | $20 + $x  | $11             |
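Reading the treatment parameter off this matrix (my arithmetic, not stated explicitly above): subtracting the mutual-Red payoff of $11 and dividing by the $9 cooperation gain normalizes the game to
\[
R = 1,\qquad P = 0,\qquad T = \frac{(20+x)-11}{9} = 1 + \frac{x}{9},\qquad S = \frac{(11-x)-11}{9} = -\frac{x}{9},
\]
so \(t = s = x/9\): the $29/$2 matrix below corresponds to \(x=9\) (\(t=s=1\)) and the $21/$10 matrix to \(x=1\) (\(t=s=1/9\)).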
You (rows) vs. the \((N-1)\) others (columns):

|       | All Green | One or more Red |
|-------|-----------|-----------------|
| Green | $20       | $2              |
| Red   | $29       | $11             |
You (rows) vs. the \((N-1)\) others (columns):

|       | All Green | One or more Red |
|-------|-----------|-----------------|
| Green | $20       | $10             |
| Red   | $21       | $11             |
Start with a baseline level of strategic uncertainty \(p_0\)
Using variation in both \(X\) and \(N\) we then separately manipulate the correlated and independent basins:
Effectively, the correlated basin is the null hypothesis over movement in \(N\); focusing just on the independent basin, we have:
As such, the independent basin gives us a \(2\times 2\) design over two basin sizes
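To make the design concrete, a small Python sketch (my own illustration: the continuation probability `delta` and the group sizes `N` below are placeholders, not the experiment's parameters) computing the correlated and independent basin sizes implied by the formulas above for the two payoff treatments:

```python
# Correlated vs. independent basin sizes for Always Defect, using the
# normalized payoffs t = s = x/9 read off the dollar matrices above.
# NOTE: delta and the group sizes N below are illustrative placeholders.

def correlated_basin(x, delta):
    """RPD threshold p* with t = s = x/9 (others treated as one player)."""
    s = t = x / 9
    return s / (s - t + delta / (1 - delta))  # = (x/9) * (1 - delta) / delta

def independent_basin(x, delta, N):
    """Threshold when each of the N-1 others cooperates independently."""
    return correlated_basin(x, delta) ** (1 / (N - 1))

if __name__ == "__main__":
    delta = 0.75                # assumed continuation probability
    for x in (1, 9):            # the $21/$10 and $29/$2 treatments
        for N in (2, 4):        # assumed group sizes
            print(f"x={x}, N={N}: corr={correlated_basin(x, delta):.3f}, "
                  f"ind={independent_basin(x, delta, N):.3f}")
```

Varying \(X\) moves both basins, while varying \(N\) moves only the independent basin, which is what lets the two indices be manipulated separately.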
Each treatment runs for 20 supergame repetitions
Split into two identical 10-supergame halves
Three sessions of each treatment
20/24 University of Pittsburgh undergraduate subjects
Paid for last round in two randomly selected supergames
So we validate the extended theory within our game only for ongoing cooperation
You (rows) vs. the \((N-1)\) others (columns):

|       | All Green | One+ Red |
|-------|-----------|----------|
| Green | $20       | $2       |
| Red   | $29       | $11      |
You (rows) vs. the \((N-1)\) others (columns):

|       | One+ Green | All Red |
|-------|------------|---------|
| Green | $20        | $2      |
| Red   | $29        | $11     |
Initial cooperation rate:
A type of reinforcement learning allowing for dynamic responses across the available actions \(a_1,\ldots,a_n\) and across states \(\omega_1,\ldots,\omega_n\)
Chosen actions are determined by maximal weight; weights on realized outcomes are then scaled up/down according to the payoffs. The system (with multiple independent agents) is then allowed to converge to a steady state.
The form of the algorithm's update rule is important for these conclusions:
Asynchronous: only the chosen action is updated
Synchronous: unchosen actions are also updated, using the counterfactual payoffs
Synchronous updating requires a model of the world over which to do the additional learning. The Asker et al. papers demonstrate that even relatively simple models (say, downward-sloping demand) lead to substantial reductions in collusive behavior.
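To fix ideas on the asynchronous/synchronous distinction, a minimal single-state sketch in Python (my own illustration: the learning rate, initial weights, exploration rate, and update form are assumptions, and the algorithms discussed above also learn over states \(\omega\)):

```python
import random

ACTIONS = ["C", "D"]

def stage_payoff(mine, other, t=1.0, s=1.0):
    """Normalized PD payoffs: R = 1, P = 0, T = 1 + t, S = -s."""
    return {("C", "C"): 1.0, ("C", "D"): -s,
            ("D", "C"): 1.0 + t, ("D", "D"): 0.0}[(mine, other)]

class Learner:
    def __init__(self, synchronous=False, alpha=0.1, epsilon=0.05):
        self.w = {a: 0.5 for a in ACTIONS}      # one weight per action
        self.synchronous = synchronous
        self.alpha, self.epsilon = alpha, epsilon

    def choose(self):
        if random.random() < self.epsilon:      # occasional exploration
            return random.choice(ACTIONS)
        return max(self.w, key=self.w.get)      # maximal-weight action

    def update(self, mine, other):
        if self.synchronous:
            # synchronous/counterfactual: score every action against what
            # the other actually did (requires a model of the payoffs)
            for a in ACTIONS:
                self.w[a] += self.alpha * (stage_payoff(a, other) - self.w[a])
        else:
            # asynchronous: only the realized action is reinforced
            self.w[mine] += self.alpha * (stage_payoff(mine, other) - self.w[mine])

# Two independent agents repeatedly matched until the weights settle.
p1, p2 = Learner(synchronous=False), Learner(synchronous=False)
for _ in range(5000):
    a1, a2 = p1.choose(), p2.choose()
    p1.update(a1, a2)
    p2.update(a2, a1)
print(p1.w, p2.w)
```

Toggling `synchronous=True` switches both agents to counterfactual scoring; where the pair settles depends on the exploration and learning-rate choices above, which is the sense in which the update rule matters for the conclusions.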
Though the relationship shifts if you let the AIs learn counterfactually
So within the experimental PD parameterizations we do achieve parallel results (at least on average) with AI subjects using simple updating rules:
Going forward, we think these simple AIs can be useful in helping us understand how to extend our theories for human subjects to alternative environments with richer strategy spaces:
We can also explore where selection theories are weakest for the AIs, and use this to design parallel experiments for humans.