Shayan Doroudi
Thesis Defense
April 11, 2019
UN Millennium Development Goals (2000-2015)
UN Sustainable Development Goals (2016-2030)
[Timeline: 1920 to 2020]
Photos from suppes-corpus.stanford.edu
Can be used to develop successful ITSs with high quality content.
Content development is costly (25+ hours for each instructional hour)
Limited form of adaptive instructional sequencing.
Underconstrained
Cannot be used to create high quality content.
Can be used to automatically infer how to sequence instruction.
I demonstrate several techniques for impactful, cost-effective, semi-automated curriculum design that combine machine learning, human computation, and principles from the learning sciences.
Properties of Student Models
Sequencing Instruction
Learner-Generated Content
Doroudi, Aleven, & Brunskill - L@S '17
Doroudi & Brunskill - LAK '19
Doroudi, Aleven, & Brunskill - Under Review
Doroudi et al. - EDM '15
Doroudi et al. - EDM '16
Doroudi, Thomas, and Brunskill - UAI '17 *Best Paper*
Doroudi & Brunskill - EDM '17 *Best Paper Nominee*
Doroudi et al. - CHI '16
Doroudi et al. - ICLS '18
Doroudi et al. - Under Preparation
My Thesis
Want workers to become more skilled
No existing curriculum
Tasks changing over time
Doroudi, Kamar, Brunskill, and Horvitz, CHI 2016
Sig. difference between example and control
(Mann-Whitney U test, \(p < 0.05\))
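As a minimal sketch of the kind of test reported above, the Mann-Whitney U statistic can be computed by counting, over all cross-group pairs, how often a score in one group exceeds a score in the other. The scores below are made up for illustration and are not data from the study; the p-value uses a normal approximation without tie or continuity correction, which is an assumption of this sketch (exact tables are preferable for small samples).

```python
import math


def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b): pairwise wins for each group, ties counted as 0.5."""
    u_a = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    u_b = len(sample_a) * len(sample_b) - u_a
    return u_a, u_b


def approximate_two_sided_p(sample_a, sample_b):
    """Two-sided p-value from a normal approximation (no tie correction)."""
    n_a, n_b = len(sample_a), len(sample_b)
    u_a, _ = mann_whitney_u(sample_a, sample_b)
    mean_u = n_a * n_b / 2.0
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12.0)
    z = (u_a - mean_u) / sd_u
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical task scores for an "example" condition and a control condition.
example_scores = [7, 8, 9, 6, 8, 9, 7, 10]
control_scores = [5, 6, 4, 7, 5, 6, 3, 5]
u_example, u_control = mann_whitney_u(example_scores, control_scores)
p_value = approximate_two_sided_p(example_scores, control_scores)
```

Because the test only uses the ordering of scores, it does not assume normally distributed outcomes, which suits skewed crowdworker performance data.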
LASSO regression: predict which solutions lead to the highest future performance when validated
Number of characters in the solution was the only feature with a non-zero coefficient
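The way an L1 penalty drives weak coefficients to exactly zero can be sketched with a small coordinate-descent LASSO. This is an illustrative implementation, not the analysis code from the study: the data are made up, the two columns stand for a hypothetical informative feature (such as number of characters) and a hypothetical weak feature, and the inputs are assumed to be centered so no intercept is fit.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(rows, targets, alpha, n_sweeps=100):
    """Minimize (1/2n)*||y - Xw||^2 + alpha*||w||_1 by cyclic coordinate descent.

    Assumes centered features and targets, and no all-zero feature column.
    """
    n = len(rows)
    d = len(rows[0])
    weights = [0.0] * d
    for _ in range(n_sweeps):
        for j in range(d):
            rho = 0.0
            norm = 0.0
            for row, y in zip(rows, targets):
                # Prediction using every feature except feature j.
                partial = sum(weights[k] * row[k] for k in range(d) if k != j)
                rho += row[j] * (y - partial)
                norm += row[j] ** 2
            weights[j] = soft_threshold(rho / n, alpha) / (norm / n)
    return weights


# Made-up, centered data: column 0 is strongly predictive, column 1 only weakly.
features = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
outcomes = [2.1, -1.9, 1.9, -2.1]  # equals 2.0 * column0 + 0.1 * column1
weights = lasso_coordinate_descent(features, outcomes, alpha=0.5)
```

With this penalty the weak feature's coefficient is set to exactly zero while the strong feature's coefficient is only shrunk, which is the behavior that lets LASSO act as a feature selector.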
* Workers see one short (<800 char) and one long (>800 char) solution
** Post hoc analysis of workers who saw one medium (500-800 char) and one extra long (>1000 char) solution
No significant differences between conditions
3 Highest Quality Examples
Significant difference between two good examples and control on Task A'
(Mann-Whitney U test, \(p < 0.01\))
Peer Work
ML Model
Quality for each student
Can lead to new insights about how people learn.
What makes a pedagogically effective example? Length, spacing, word choice
Seeing peer work that is curated with simple rules can be an effective form of training.
However, our results suggest we can do better!
Future direction: use more sophisticated machine learning models to curate better solutions and to learn more about how peer work leads to learning.
Peer-generated work can be used for cost-effective content generation
Model/Theory-Driven: Overconstrained, Biased
Data-Driven: Underconstrained, High Variance
Cognitive Mastery Learning
Model-Based RL
Deep RL
Importance Sampling
Corbett and Anderson, 1995
“The [BKT] model overestimates the true learning and performance parameters for below-average students who make many errors. While these students receive more remedial exercises than the above average students, they nevertheless receive less remedial practice than they need and perform worse on the test than expected.”
Even though mastery learning is better than a one-size-fits-all approach, it may not always give students enough practice.
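One way to see how mastery can be declared too early is a minimal sketch of the standard Bayesian Knowledge Tracing (BKT) update with a mastery threshold. The parameter values here are illustrative assumptions, not fitted values from the cited work, and the 0.95 threshold is a conventional choice assumed for this sketch. When the fitted slip probability is high, the model's own predicted P(Correct) at the moment of mastery is well below what "mastery" suggests.

```python
def bkt_update(p_know, correct, p_learn, p_guess, p_slip):
    """One BKT step: Bayesian posterior on knowing the skill, then learning."""
    if correct:
        evidence = p_know * (1.0 - p_slip)
        posterior = evidence / (evidence + (1.0 - p_know) * p_guess)
    else:
        evidence = p_know * p_slip
        posterior = evidence / (evidence + (1.0 - p_know) * (1.0 - p_guess))
    return posterior + (1.0 - posterior) * p_learn


def predicted_p_correct(p_know, p_guess, p_slip):
    """Model-predicted probability of a correct response."""
    return p_know * (1.0 - p_slip) + (1.0 - p_know) * p_guess


def opportunities_until_mastery(responses, p_init, p_learn, p_guess, p_slip,
                                threshold=0.95):
    """Number of practice opportunities before P(known) reaches the threshold."""
    p_know = p_init
    for count, correct in enumerate(responses, start=1):
        p_know = bkt_update(p_know, correct, p_learn, p_guess, p_slip)
        if p_know >= threshold:
            return count
    return None  # mastery never declared on this response sequence


# Illustrative parameters with a high slip probability (assumed, not fitted).
P_INIT, P_LEARN, P_GUESS, P_SLIP = 0.3, 0.2, 0.3, 0.4

steps = opportunities_until_mastery([True] * 20, P_INIT, P_LEARN, P_GUESS, P_SLIP)
p_correct_at_threshold = predicted_p_correct(0.95, P_GUESS, P_SLIP)
```

With these assumed parameters, four consecutive correct answers are enough to cross the threshold, yet the predicted P(Correct) at a knowledge estimate of 0.95 is only about 0.585.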
[Plot: P(Correct) over 20 practice opportunities for 500 students, marking where mastery learning declares mastery]
High P(slip)!
Average P(Correct) at mastery: 0.54
Doroudi and Brunskill, Educational Data Mining 2017, Best Paper Nominee
[Plots: mastery learning for fast learners and slow learners; 200 students, 20 practice opportunities]
Average P(Correct) at mastery: 0.56
Average P(Correct) at mastery: 0.45
Doroudi and Brunskill, Learning Analytics & Knowledge 2019
Theoretically Plausible Models: Productively Constrained
Doroudi, Aleven, and Brunskill, Under Review
Robustness to Many Models
Not Constrained by a Single Model
Doroudi, Aleven, and Brunskill, L@S 2017
Using theories of learning to inform automated instructional sequencing
Using learner contributions to generate new content
+
machine learning to curate new content
Using learner input to inform instructional sequencing
Using teacher input to inform instructional sequencing
Artificial Intelligence
Rule-Based + Logical AI
Statistical AI
Rule-Based
Theory-Driven
Machine Learning
Data-Driven
Automated Instruction
Semi-Automated Instruction
The research reported here was supported in part by the Institute of Education Sciences, U.S. Department of Education, through Grants R305A130215 and R305B150008 to Carnegie Mellon University. The opinions expressed are those of the authors and do not represent views of the Institute or the U.S. Dept. of Education. Some of the research was also funded with the generous support of Microsoft Research and Google.
I am fortunate to have worked on the research presented here with a number of collaborators, including Emma Brunskill, Vincent Aleven, Ece Kamar, Eric Horvitz, and Phil Thomas. I also acknowledge the support of my thesis committee, Emma Brunskill, Vincent Aleven, Ken Koedinger, Chinmay Kulkarni, and Eric Horvitz, as well as Sharon Carver and David Klahr.
Instructional Policy
Model (MDP)
Chi et al., 2011
Rowe et al., 2014
Doroudi, Aleven, and Brunskill, Learning @ Scale 2017
Estimator that gives unbiased and consistent estimates for a policy!
Can have very high variance when policy is different from prior data.
Example: Worked example or problem-solving?
20 sequential decisions ⇒ need over \(2^{20}\) students!
Importance sampling can prefer the worse of two policies more often than not (Doroudi, Thomas, and Brunskill, 2017).
Doroudi, Thomas, and Brunskill, Uncertainty in Artificial Intelligence 2017, Best Paper
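The variance problem can be made concrete with a minimal sketch of the basic importance sampling estimator for off-policy evaluation. Everything named here is an assumption for illustration: prior data is taken to come from a uniformly random choice between a worked example and problem solving at each step, the policy being evaluated always picks worked examples, and the outcomes are made up.

```python
from itertools import product

ACTIONS = ("worked_example", "problem_solving")


def importance_sampling_estimate(trajectories, eval_prob, behavior_prob):
    """Average of outcomes reweighted by the product of per-step policy ratios."""
    total = 0.0
    for actions, outcome in trajectories:
        weight = 1.0
        for step, action in enumerate(actions):
            weight *= eval_prob(step, action) / behavior_prob(step, action)
        total += weight * outcome
    return total / len(trajectories)


def uniform_behavior(step, action):
    """Data-collection policy: each activity type chosen with probability 0.5."""
    return 0.5


def always_worked_example(step, action):
    """Deterministic policy to evaluate: always assign a worked example."""
    return 1.0 if action == "worked_example" else 0.0


# Toy data with 2 decisions: every action sequence observed once (made-up outcomes).
toy_data = [
    (sequence, 10.0 if all(a == "worked_example" for a in sequence) else 4.0)
    for sequence in product(ACTIONS, repeat=2)
]
estimate = importance_sampling_estimate(toy_data, always_worked_example,
                                        uniform_behavior)

# With 20 decisions, a random trajectory matches the deterministic policy with
# probability 0.5 ** 20, so only about one student in 2 ** 20 gets nonzero weight.
match_probability = 0.5 ** 20
students_per_match = round(1.0 / match_probability)
```

In the toy case only one of the four trajectories has nonzero weight, and the estimate rests entirely on it; at 20 decisions the same effect means nearly all collected data is discarded.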
Significant advances in computer vision, natural language processing, and game playing
Recent work on instructional sequencing, but only simulations
Not enough data
Learning is fundamentally different from images, language, and games
Baselines are much stronger for instructional sequencing
Fractions Tutor (around 1000 students)
New Model
New Instructional Policy vs. Baseline Policy
Doroudi, Aleven, and Brunskill, Learning @ Scale 2017
| Student Models | Policy 1 | Policy 2 | Policy 3 |
|---|---|---|---|
| Student Model 1 | \(V_{SM_1,P_1}\) | \(V_{SM_1,P_2}\) | \(V_{SM_1,P_3}\) |
| Student Model 2 | \(V_{SM_2,P_1}\) | \(V_{SM_2,P_2}\) | \(V_{SM_2,P_3}\) |
| Student Model 3 | \(V_{SM_3,P_1}\) | \(V_{SM_3,P_2}\) | \(V_{SM_3,P_3}\) |
Doroudi, Aleven, and Brunskill, Learning @ Scale 2017
Posttest Scores (out of 16 points)
| Student Models | Baseline Policy | Adaptive Policy |
|---|---|---|
| New Model | 5.9 ± 0.9 | 9.1 ± 0.8 |
| Bayesian Knowledge Tracing | 6.5 ± 0.8 | 7.0 ± 1.0 |
| Deep Knowledge Tracing | 9.9 ± 1.5 | 8.6 ± 2.1 |
Doroudi, Aleven, and Brunskill, Learning @ Scale 2017
Posttest Scores (out of 16 points)
| Student Models | Baseline Policy | Adaptive Policy | Awesome Policy |
|---|---|---|---|
| New Model | 5.9 ± 0.9 | 9.1 ± 0.8 | 16 |
| Bayesian Knowledge Tracing | 6.5 ± 0.8 | 7.0 ± 1.0 | 16 |
| Deep Knowledge Tracing | 9.9 ± 1.5 | 8.6 ± 2.1 | 16 |
Doroudi, Aleven, and Brunskill, Learning @ Scale 2017
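Reading the matrix above can be sketched as a simple check: a candidate policy counts as robustly better only if every student model predicts it beats the baseline. The function names are hypothetical, and the sketch compares the point estimates from the table only, ignoring the ± intervals, which is a simplifying assumption.

```python
# Point estimates copied from the table above (posttest scores out of 16).
POSTTEST_MATRIX = {
    "New Model": {
        "Baseline Policy": 5.9, "Adaptive Policy": 9.1, "Awesome Policy": 16.0},
    "Bayesian Knowledge Tracing": {
        "Baseline Policy": 6.5, "Adaptive Policy": 7.0, "Awesome Policy": 16.0},
    "Deep Knowledge Tracing": {
        "Baseline Policy": 9.9, "Adaptive Policy": 8.6, "Awesome Policy": 16.0},
}


def models_favoring(matrix, candidate, baseline):
    """Student models under which the candidate's score exceeds the baseline's."""
    return [model for model, scores in matrix.items()
            if scores[candidate] > scores[baseline]]


def is_robustly_better(matrix, candidate, baseline):
    """True only if every student model favors the candidate policy."""
    return len(models_favoring(matrix, candidate, baseline)) == len(matrix)


adaptive_supporters = models_favoring(
    POSTTEST_MATRIX, "Adaptive Policy", "Baseline Policy")
adaptive_is_robust = is_robustly_better(
    POSTTEST_MATRIX, "Adaptive Policy", "Baseline Policy")
awesome_is_robust = is_robustly_better(
    POSTTEST_MATRIX, "Awesome Policy", "Baseline Policy")
```

On these numbers the adaptive policy is favored by two of the three models but not by Deep Knowledge Tracing, so it fails the check, while the hypothetical policy scoring 16 under every model passes.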
Instructional Policy
Student Model
vs.
Baseline Policy
Doroudi, Aleven, and Brunskill, In Submission
leer
to read
Paired-Association Tasks
Concept Learning Tasks
Sequencing Activity Types
Sequencing Interdependent Content
Doroudi, Aleven, and Brunskill, In Submission
reading
Worked Example
Solve for \(x\): \(x^2 - 4 = 12\)
\(x^2 = 4 + 12\)
\(x^2 = 16\)
\(x = \pm\sqrt{16} = \pm 4\)
Problem Solving
Solve for \(x\): \(x^2 - 4 = 12\)
| | RL Policy Outperformed Baseline | Mixed Results | RL Policy Did Not Outperform Baseline |
|---|---|---|---|
| Paired-Association Tasks | 10 | 0 | 3 |
| Concept Learning Tasks | 2 | 3 | 0 |
| Sequencing Activity Types | 4 | 4 | 0 |
| Sequencing Interdependent Content | 0 | 0 | 6 |
Doroudi, Aleven, and Brunskill, In Submission
Paired-Association Tasks
Concept Learning Tasks
Sequencing Activity Types
Sequencing Interdependent Content
Theoretical Basis: More to Less
Use Psychologically-Inspired Models: Spacing Effect, Expertise Reversal Effect
Use Data-Driven Models
We attempt to treat the same problem with several alternative models each with different simplifications but with a common...assumption. Then, if these models, despite their different assumptions, lead to similar results, we have what we can call a robust theorem that is relatively free of the details of the model.
Hence, our truth is the intersection of independent lies.
- Richard Levins, 1966