yuan meng
april 25, 2022
exit talk 🐻
"a painting of 20 athletes playing a brand new sport"
dall·e 2 (openai, 2022)
palm (google ai, 2022)
trained on broad data at scale + adaptable to a wide range of downstream tasks
[figure: two candidate causal structures linking anti-depressants, mood, and thoughts]
🤔
intervene on mood (e.g., exercise) without anti-depressants 👉 are thoughts affected?
no
yes
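The thought experiment can be sketched as a toy simulation. It assumes two hypothetical binary structures (chain: anti-depressants → mood → thoughts; common cause: anti-depressants → mood and anti-depressants → thoughts) with made-up probabilities. Setting mood directly, written do(mood), cuts mood's link to anti-depressants, so thoughts should respond only under the chain.

```python
import random

def sample(structure, rng, do_mood=None):
    """Draw one (anti_dep, mood, thoughts) triple from a toy binary causal model.

    structure: "chain"  -> anti_dep -> mood -> thoughts
               "common" -> anti_dep -> mood and anti_dep -> thoughts
    do_mood:   if given, mood is set by intervention, which cuts the
               anti_dep -> mood link (all probabilities are made up).
    """
    anti_dep = rng.random() < 0.5
    if do_mood is None:
        mood = rng.random() < (0.9 if anti_dep else 0.1)
    else:
        mood = do_mood
    # thoughts listen to mood in the chain, to anti_dep in the common-cause model
    parent = mood if structure == "chain" else anti_dep
    thoughts = rng.random() < (0.9 if parent else 0.1)
    return anti_dep, mood, thoughts

def effect_of_mood(structure, n=20_000, seed=0):
    """Estimate P(thoughts | do(mood=1)) - P(thoughts | do(mood=0))."""
    rng = random.Random(seed)
    rate = {}
    for m in (True, False):
        rate[m] = sum(sample(structure, rng, do_mood=m)[2] for _ in range(n)) / n
    return rate[True] - rate[False]

print("chain: ", round(effect_of_mood("chain"), 3))
print("common:", round(effect_of_mood("common"), 3))
```

Under these assumptions, the estimate should land near 0.8 for the chain and near 0 for the common-cause structure. This is the "yes" vs. "no" split, and no new observational data are needed to see it.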
data isn't always sufficient or clear...
image credit: unsplash @maxsaeling
two well-known facts
explanation
the night sky is not filled with starlight; there's darkness in between
if light had been travelling for infinitely long, it would have filled the night sky
why?🤨
light has only travelled for a finite amount of time 👉 the universe may have a beginning
don't always need new data!
intervention
explaining (cause: why, mechanism: how...)
where the quest began...
walker et al. (2014): inductively rich properties
walker and lombrozo (2017): the moral of the story
williams and lombrozo (2010): subtle yet broad patterns
explaining helps children and adults acquire generalizable abstractions useful for future scenarios (lombrozo, 2016)
explaining may help learners pinpoint their own uncertainty or grasp the difference-making principle behind effective interventions
"trouble-shooting"
intervention design
legare (2012)
why is it suddenly working?
why does it stop working?
explanations generated by a reinforcement learning agent
lampinen et al. (2021)
other domains: question-asking (ruggeri et al., 2019), choosing to observe more in a multi-armed bandit problem (liquin & lombrozo, 2017)...
task desiderata
adapted from coenen et al. (2015)
😵‍💫
expected information gain (EIG)
1. surprisal of each hypothesis
2. average surprisal = information entropy (shannon, 1948)
3. reduction in entropy after an intervention (e.g., turning on green) and its outcome = information gain
expected information gain over all outcomes 👉 guides intervention
don't know what's gonna happen!
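For reference, here are the standard definitions behind steps 1 to 3 and the final expectation. $\mathcal{H}$ is the set of candidate structures, $a$ an intervention, and $o$ its outcome; the notation is ours, not the slides'. Because the outcome is unknown in advance, information gain is averaged over outcomes weighted by how likely each one is.

```latex
\begin{aligned}
I(h) &= -\log_2 P(h) && \text{1. surprisal of hypothesis } h\\
H(\mathcal{H}) &= \sum_{h \in \mathcal{H}} P(h)\, I(h) = -\sum_{h \in \mathcal{H}} P(h) \log_2 P(h) && \text{2. entropy}\\
\mathrm{IG}(a, o) &= H(\mathcal{H}) - H(\mathcal{H} \mid a, o) && \text{3. information gain}\\
\mathrm{EIG}(a) &= \sum_{o} P(o \mid a)\, \mathrm{IG}(a, o), \qquad P(o \mid a) = \sum_{h \in \mathcal{H}} P(o \mid h, a)\, P(h)
\end{aligned}
```

With two equally likely hypotheses the prior entropy is 1 bit. An intervention with EIG = 1 therefore always reveals which structure is true, and one with EIG = 0 never does.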
difference making [figure: EIG of the three interventions: 1, 0, 0]
positive test strategy (PTS)
test one hypothesis at a time
(coenen et al., 2015; mccormack et al., 2016; meng et al., 2018; meng & xu, 2019; nussenbaum et al., 2020; steyvers et al., 2003)
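[figure: scores of the three interventions under the hypothesis being tested: 1, 0.5, 0; the other hypothesis is not being considered yet]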
why suboptimal?
take the maximum across hypotheses: yellow
[figure: scores of the three interventions: 1, 0.5, 0 and 1, 0, 0]
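A small sketch can reproduce scores of this shape. It makes several assumptions: two hypothetical candidate structures over three bulbs (a chain yellow → green → red vs. yellow as a common cause of green and red), deterministic links, and a uniform prior. The study's actual puzzles may differ. The PTS score here is our reading of the slides: the share of the other bulbs an intervention switches on under one hypothesis, then maximized across hypotheses. `outcome`, `eig`, and `pts` are illustrative names.

```python
import math

NODES = ("yellow", "green", "red")

# Hypothetical candidate structures: each bulb maps to the bulbs it directly turns on.
HYPOTHESES = {
    "chain": {"yellow": ["green"], "green": ["red"], "red": []},
    "common": {"yellow": ["green", "red"], "green": [], "red": []},
}

def outcome(graph, node):
    """Set of bulbs that light up when `node` is switched on (deterministic links)."""
    on, stack = set(), [node]
    while stack:
        cur = stack.pop()
        if cur not in on:
            on.add(cur)
            stack.extend(graph[cur])
    return frozenset(on)

def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def eig(node, prior):
    """Expected information gain (bits) of switching on `node`."""
    # With deterministic links, hypotheses predicting the same outcome stay tied.
    by_outcome = {}
    for h, p in prior.items():
        by_outcome.setdefault(outcome(HYPOTHESES[h], node), []).append(p)
    expected_posterior = 0.0
    for ps in by_outcome.values():
        p_o = sum(ps)  # probability of this outcome
        expected_posterior += p_o * entropy(p / p_o for p in ps)
    return entropy(prior.values()) - expected_posterior

def pts(node, h):
    """PTS score under one hypothesis: share of the other bulbs `node` turns on."""
    return (len(outcome(HYPOTHESES[h], node)) - 1) / (len(NODES) - 1)

prior = {"chain": 0.5, "common": 0.5}
for node in NODES:
    per_h = [pts(node, h) for h in HYPOTHESES]
    print(node, "EIG:", eig(node, prior), "PTS per hypothesis:", per_h, "max:", max(per_h))
```

Under these assumptions, yellow gets the top PTS score under both hypotheses (1 and 1) but an EIG of 0 bits, because both structures predict that every bulb lights up. Green is the only intervention whose outcome differs between the structures, so it is worth 1 bit.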
1. each person's weight of EIG: θ
2. value of an intervention
3. probability choosing intervention
infer each person's θ and τ from interventions they chose
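A minimal sketch of steps 1 to 3. It assumes the value of an intervention is a θ-weighted mixture of its EIG and PTS scores and that choices follow a softmax with temperature τ. This is our reading of the slide labels; the exact equations are not shown. Here $i$ indexes people, $p$ puzzles, and $a$ interventions, and $c_{i,p}$ is person $i$'s choice in puzzle $p$. A larger $\tau_i$ means noisier choices.

```latex
\begin{aligned}
V_{i,p,a} &= \theta_i\, \mathrm{EIG}_{p,a} + (1 - \theta_i)\, \mathrm{PTS}_{p,a}\\
P(c_{i,p} = a) &= \frac{\exp(V_{i,p,a} / \tau_i)}{\sum_{a'} \exp(V_{i,p,a'} / \tau_i)}
\end{aligned}
```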
n × 6 × 3 tensor (unobserved): probability of each of n people choosing each of 3 interventions in each of 6 puzzles
n-vector (unobserved): each of n people’s EIG weight
n-vector (unobserved): each of n people’s "temperature"
6 × 3 matrix (observed): EIG score of each of 3 interventions in each of 6 puzzles
6 × 3 matrix (observed): PTS score of each of 3 interventions in each of 6 puzzles
n × 6 matrix (observed): each of n people’s choices in each of 6 puzzles
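A NumPy sketch of these tensors follows. `choice_probs` builds the n × 6 × 3 probability tensor from per-person θ and τ and the 6 × 3 EIG and PTS matrices. It assumes each intervention's value is θ·EIG + (1 − θ)·PTS and that choices follow a softmax with temperature τ. The fitting step swaps in a simple per-person grid-search maximum likelihood, because the talk does not say how θ and τ were inferred. Treat `fit_person`, the grid ranges, and the toy score matrices as hypothetical.

```python
import numpy as np

def choice_probs(theta, tau, eig, pts):
    """n x 6 x 3 tensor of choice probabilities.

    theta, tau: shape (n,), each person's EIG weight and softmax temperature
    eig, pts:   shape (6, 3), score of each intervention in each puzzle
    """
    theta = theta[:, None, None]
    tau = tau[:, None, None]
    value = theta * eig[None] + (1 - theta) * pts[None]  # (n, 6, 3)
    z = value / tau
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    expz = np.exp(z)
    return expz / expz.sum(axis=-1, keepdims=True)

def fit_person(choices, eig, pts):
    """Hypothetical grid-search MLE of (theta, tau) from one person's 6 choices.

    choices: int array of shape (6,), index of the chosen intervention per puzzle
    """
    th, ta = np.meshgrid(np.linspace(0, 1, 101), np.geomspace(0.01, 10, 100))
    th, ta = th.ravel(), ta.ravel()
    p = choice_probs(th, ta, eig, pts)  # (grid points, 6, 3)
    loglik = np.log(p[:, np.arange(len(choices)), choices]).sum(axis=1)
    best = np.argmax(loglik)
    return th[best], ta[best]

# Toy scores (hypothetical): the same puzzle six times, interventions ordered
# yellow, green, red.
eig = np.tile([0.0, 1.0, 0.0], (6, 1))
pts = np.tile([1.0, 0.5, 0.0], (6, 1))
theta_eig, _ = fit_person(np.array([1, 1, 1, 1, 1, 0]), eig, pts)  # mostly green
theta_pts, _ = fit_person(np.array([0, 0, 0, 0, 0, 1]), eig, pts)  # mostly yellow
print(theta_eig, theta_pts)
```

With these toy scores, green and yellow have equal value at θ = 1/3. The mostly-green chooser's estimate lands above that point and the mostly-yellow chooser's below it. With only six choices per person, the estimates are rough.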
adapted from coenen et al. (2015)
5- to 7-year-olds: n = 39
adults: n = 29
meng, bramley, and xu (2018)
mccormack et al. (2016)
tl;dr: adults mainly used EIG vs. children mainly PTS
[figure values: 0.86, 0.38, 0.77, 0.24]
averaged across all participants and all puzzles: chance = 1/3
tl;dr: children didn't learn from informative interventions as adults did
[figure values: 0.91, 0.45]
only included trials where EIG = 1 (same later)
5- to 7-year-olds: n = 59 (explain); n = 58 (report)
adults: n = 30 (explain); n = 27 (report)
meng and xu (2019)
"after deciding on which light bulb you wanna use, don't turn it on yet! point to it and I'll ask you…"
"cool, can you tell me…"
if a participant changed their choice, the last one was used for modeling
tl;dr: explaining had no significant impact on intervention strategies
[figure values: 0.86, 0.38, 0.95, 0.94, 0.44, 0.39]
baseline 👉 explanation 👉 report
0.76, 0.86, 0.84
0.24, 0.28, 0.21
tl;dr: children who either explained or reported were able to learn from interventions
[figure values: 0.91, 0.97, 0.94, 0.45, 0.80, 0.65; * marks significant differences]
hypothesis: does the quality of explanations matter?
wrong explanations likely relevant to the task
wrong explanations don't target intervention selection at all
confounding! a third factor (intelligence, education...) may drive both explanation type and intervention strategy
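A toy simulation with entirely hypothetical numbers shows why this matters. A latent "ability" variable raises both the chance of giving an EIG-style explanation and the chance of choosing the informative intervention, while explanation type itself has no causal effect. Observationally the two still look linked. Assigning explanation type at random removes the link, which is why manipulating explanations is more telling than correlating them. `association` and `ability` are illustrative names.

```python
import random

def association(n=50_000, randomize=False, seed=1):
    """P(EIG choice | EIG explanation) - P(EIG choice | other explanation).

    Hypothetical model: a latent 'ability' score drives the intervention
    choice; explanation type has no causal effect on the choice.
    """
    rng = random.Random(seed)
    counts = {True: [0, 0], False: [0, 0]}  # explanation type -> [EIG choices, total]
    for _ in range(n):
        ability = rng.random()  # hypothetical confounder, uniform on [0, 1]
        if randomize:
            eig_expl = rng.random() < 0.5  # explanation type assigned at random
        else:
            eig_expl = rng.random() < ability  # abler children explain via EIG more often
        eig_choice = rng.random() < ability  # choice depends on ability only
        counts[eig_expl][0] += eig_choice
        counts[eig_expl][1] += 1
    rate = {k: c / t for k, (c, t) in counts.items()}
    return rate[True] - rate[False]

print("observational:", round(association(), 3))
print("randomized:   ", round(association(randomize=True), 3))
```

Under this model, the observational difference is about 2/3 − 1/3 = 1/3 even though explanations do nothing. The randomized difference is about 0.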
7- to 8-year-olds: n = 31 (explain), n = 27 (training)
9- to 11-year-olds: n = 30 (explain), n = 30 (training)
(control condition in progress)
same as in study 2
first trial used as training
"if alex turns on the {yellow, green, red} light bulb, can they find out the answer?" (correct if wrong)
tl;dr: without training, PTS is the most common type in 7- to 8-year-olds and EIG in 9- to 11-year-olds
true rationale unknown
tl;dr: EIG was the dominant type during training; it only remained so in 9- to 11-year-olds after training
training trials (first puzzle)
non-training trials (last 5)
explanation condition
tl;dr: EIG-based explanations 👉 informative interventions; PTS-based explanations 👉 uninformative interventions
condition: explanation
condition: training
no 9- to 11-year-olds explained this way
tl;dr: 9- to 11-year-olds relied far more on EIG after training; 7- to 8-year-olds weren't affected as much
[figure values: 0.44, 0.86, 0.64, 0.59]
on a par with adult baseline (study 1)
explanation 👉 training
0.26, 0.35
0.45, 0.71
(training trials excluded from all analyses; * marks significant differences)
tl;dr: all children chose correct structures after informative interventions (EIG = 1) most of the time
[figure values: 0.90, 0.89, 0.94, 0.95]
6 years later...
thinking alone doesn't lead to better experimentation, but training the prepared mind to think might
open question: why use PTS at all?
# of structures:
# of costly computations (hold all structures in mind)
future research: manipulate cost-quality balance
worst case 👉 test one edge at a time: 6 × 2 = 12 interventions
gonna be complicated...
each edge: 2 interventions at most
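The "# of structures" point can be made concrete by counting possible causal structures. This sketch assumes the hypothesis space is every directed acyclic graph over n labeled variables and uses Robinson's recurrence. The slides do not say which hypothesis space the study used.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def num_dags(n: int) -> int:
    """Number of directed acyclic graphs on n labeled nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * num_dags(n - k)
        for k in range(1, n + 1)
    )

for n in range(1, 6):
    print(n, num_dags(n))
```

The counts grow from 1, 3, and 25 to 543 and 29,281 for one to five variables. An EIG learner must, in principle, score every intervention against all of these. A PTS learner holds one structure in mind at a time, which is one way to read the cost-quality trade-off above.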
open question: what does PTS really mean in causal learning?
make everything happen? e.g., "yellow is most useful because it may turn on every single one."
make same thing happen? e.g., "green always goes to red, so it helps in both examples."
future research: unpack PTS
examples of PTS-based explanations
[figure values: 1 (0.5), 1 (1), 2 (1), 0 (0), 0 (0), 0 (0)]
open question: why did older children benefit more from training?
nussenbaum et al. (2020) [figure: EIG weight and decision noise; ~0.38]
future research: more elaborate training for younger children?
charles s. peirce (1882)
"i procrastinated more than i had ever done in my life thus far: i watched lots of tv shows, took many naps, and wasted countless hours messing around online.
unlike my friends with nine-to-five jobs, there was no boss to look over my shoulder day to day, so i let my mind roam free without any structure in my life.
having full intellectual freedom was actually a curse, since i was not yet prepared to handle it." (pp. 16-17)
"so, was it fun?
i'll answer using another f-word: it was fun at times, but more importantly, it was fulfilling.
fun is often frivolous, ephemeral, and easy to obtain, but true fulfillment comes only after overcoming significant and meaningful challenges." (p. 107)
well into my 3rd year: hooked by data science 👉 re-realized the coolness & importance of self-supervised learning + bayes + information theory... or just scientific computing in general (post on data science)