All Student Models are Wrong
But Some are Useful
Shayan Doroudi
December 3, 2018
University of California, Irvine


[Diagram: Student, Model, Data, Instructional Policy, Activity, Response]
Millions of students
Model-Based Instructional Sequencing


Over 500,000 students/year
25 million active monthly users
~12 million active monthly users
[Diagram: Student, Model, Data]
Model-Based Instructional Sequencing in 1960s
Photos from suppes-corpus.stanford.edu
“We assume that a mathematical model of learning will provide an approximate description of the student's learning, and the task for a theory of instruction is then to settle the question of how the instructional sequence of concepts, skills, and facts should be organized to optimize for a given student his rate of learning.”
Suppes (1974)
The Place of Theory in Educational Research
AERA Presidential Address
“It would be my prediction that we will see increasingly sophisticated theories of instruction in the near future.”
Student
Model
Data
Instructional Policy
We haven’t seen “increasingly sophisticated theories of instruction”
Student
Model
Data
Instructional Policy





“All Models are Wrong
But Some are Useful”
George Box, 1979
Statistical Models of Student Learning
≠
How Students Learn
I will demonstrate...
1. Undesirable behaviors of student models can be explained by using a wrong model.
2. Relying on a wrong model can have adverse consequences for students.
Three Consequences of a Wrong Model
Misguided Inferences about Learning
Misguided Instructional Decisions
Inequitable Outcomes
All Student Models are Wrong
But Some are Useful
Data-Driven Models + Learning Theory
Misguided Instructional Decisions
Inequitable Outcomes
Misguided Inferences about Learning
Robustness to How Students Learn
Student Models
Bayesian Knowledge Tracing (BKT)
Corbett and Anderson, 1994
Bayesian Knowledge Tracing (BKT)
[Diagram: Data (Student A, Student B, Student C, ...), Machine Learning]
Bayesian Knowledge Tracing (BKT)
Addition
Subtraction
Multiplication
Mastery Learning
Keep giving practice opportunities on a skill/concept until student reaches mastery:
Then move onto the next skill/concept
Corbett and Anderson, 1994
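The mastery rule above can be sketched in a few lines. This is a minimal sketch of the standard BKT update, not the implementation used in any particular tutor. The function names `bkt_update` and `opportunities_until_mastery`, the parameter values, and the 0.95 mastery threshold are assumptions chosen for illustration.

```python
def bkt_update(p_known, correct, p_learn, p_guess, p_slip):
    """One BKT step: condition P(known) on the response, then apply learning."""
    if correct:
        evidence_known = p_known * (1 - p_slip)
        evidence_unknown = (1 - p_known) * p_guess
    else:
        evidence_known = p_known * p_slip
        evidence_unknown = (1 - p_known) * (1 - p_guess)
    posterior = evidence_known / (evidence_known + evidence_unknown)
    # The student may transition from "not learned" to "learned" after practice.
    return posterior + (1 - posterior) * p_learn


def opportunities_until_mastery(responses, p_init, p_learn, p_guess, p_slip,
                                threshold=0.95):
    """Return how many responses are consumed before mastery is declared,
    or None if the (assumed) threshold is never reached."""
    p_known = p_init
    for n, correct in enumerate(responses, start=1):
        p_known = bkt_update(p_known, correct, p_learn, p_guess, p_slip)
        if p_known >= threshold:
            return n
    return None


# Hypothetical parameters for a single skill (not fitted to any data).
params = dict(p_init=0.5, p_learn=0.2, p_guess=0.2, p_slip=0.1)
print(opportunities_until_mastery([True, True, True, True], **params))  # 2
```

With these assumed values, two consecutive correct responses already push the estimated P(known) above 0.95, which shows how strongly the mastery decision depends on the fitted parameters.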
Data
Mastery
Learning







Rosen et al., 2018
Ritter et al., 2007
Corbett and Anderson, 1994
Additive Factor Model (AFM)
Cen, 2009

\(\theta\) - Student Ability \(\sim \mathcal{N}(0, 1)\)
\(\beta\) - Item Difficulty
\(\gamma\) - Learning Rate
P(Correct)
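As a rough illustration of how the three AFM quantities combine into P(Correct), here is a minimal single-skill sketch. The sign convention (subtracting \(\beta\) because the slide calls it a difficulty) and all numeric values are assumptions; some formulations add an easiness term instead.

```python
import math


def afm_p_correct(theta, beta, gamma, opportunities):
    """P(Correct) under a single-skill AFM: logistic in ability, difficulty,
    and number of prior practice opportunities."""
    logit = theta - beta + gamma * opportunities  # assumed sign convention
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical student and skill: average ability, fairly hard item, slow learning.
curve = [afm_p_correct(theta=0.0, beta=1.0, gamma=0.1, opportunities=t)
         for t in range(20)]
print([round(p, 2) for p in curve[:5]])
```

Unlike BKT, there is no discrete "learned" state here: P(Correct) rises smoothly with every practice opportunity whenever \(\gamma > 0\).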
All Student Models are Wrong
But Some are Useful
Data-Driven Models + Learning Theory
Misguided Instructional Decisions
Inequitable Outcomes
Misguided Inferences about Learning
Robustness to How Students Learn
Semantics of a BKT Model
Baker, Corbett, and Aleven, 2008:
75% of skills in a middle school mathematics tutoring system had P(guess) > 0.5 or P(slip) > 0.5.
Beck and Chang, 2007:
High guess and slip parameters are the result of BKT being unidentifiable.
Doroudi and Brunskill, 2017:
The BKT model actually is identifiable.
High guess and slip parameters could be due to fitting the wrong model.
Doroudi and Brunskill, Educational Data Mining 2017, Best Paper Nominee
Data



“Analysis of specification error relates to a rhetorical strategy in which we suggest a model as the “true” one for sake of argument, determine how our working model differs from it and what the consequences of the difference(s) are, and thereby get some sense of how important the mistakes we will inevitably make may be.”
Otis Dudley Duncan, 1975
Semantics of a Wrong Model
P(Correct)
Doroudi and Brunskill, Educational Data Mining 2017, Best Paper Nominee
Semantics of a Wrong Model
500 students
20 practice opportunities
High P(slip)!
P(Correct)
Doroudi and Brunskill, Educational Data Mining 2017, Best Paper Nominee
Not
Learned
Learned
Semantics of a Wrong Model
100 students
200 practice opportunities
High P(guess)!
Doroudi and Brunskill, Educational Data Mining 2017, Best Paper Nominee
P(Correct)
Not
Learned
Learned
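The experiment on these slides (generate data from one model, fit another) can be sketched as follows. This is only an illustration under stated assumptions: the AFM parameters are hypothetical, the sign convention for difficulty is assumed, and a coarse grid search stands in for whatever fitting procedure the original study used. Whether the fitted P(slip) or P(guess) comes out high depends on those choices.

```python
import itertools
import math
import random


def simulate_afm(n_students, n_opps, beta, gamma, rng):
    """Sample response sequences from a single-skill AFM with theta ~ N(0, 1)."""
    data = []
    for _ in range(n_students):
        theta = rng.gauss(0.0, 1.0)
        seq = []
        for t in range(n_opps):
            p = 1.0 / (1.0 + math.exp(-(theta - beta + gamma * t)))
            seq.append(rng.random() < p)
        data.append(seq)
    return data


def bkt_log_likelihood(data, p_init, p_learn, p_guess, p_slip):
    """Log-likelihood of the sequences under BKT (forward algorithm)."""
    total = 0.0
    for seq in data:
        p_known = p_init
        for correct in seq:
            like_known = (1 - p_slip) if correct else p_slip
            like_unknown = p_guess if correct else (1 - p_guess)
            p_obs = p_known * like_known + (1 - p_known) * like_unknown
            total += math.log(p_obs)
            posterior = p_known * like_known / p_obs
            p_known = posterior + (1 - posterior) * p_learn
    return total


def fit_bkt_grid(data, grid):
    """Coarse maximum-likelihood fit by exhaustive search over a grid."""
    return max(itertools.product(grid, repeat=4),
               key=lambda params: bkt_log_likelihood(data, *params))


rng = random.Random(0)
# Hypothetical "true" model: students actually learn according to an AFM.
data = simulate_afm(n_students=100, n_opps=20, beta=1.0, gamma=0.1, rng=rng)
grid = [0.1, 0.3, 0.5, 0.7, 0.9]
p_init, p_learn, p_guess, p_slip = fit_bkt_grid(data, grid)
print(f"P(init)={p_init}, P(learn)={p_learn}, P(guess)={p_guess}, P(slip)={p_slip}")
```

Reading the fitted P(guess) and P(slip) literally would be a mistake here, because the generating model has no guessing or slipping at all.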
Takeaway Message
Researcher

Incorrect inference
(e.g., throw out questions)
Data



Relying on the semantics of a wrong model can lead to incorrect inferences about student learning.
Takeaway Message
All Student Models are Wrong
But Some are Useful
Data-Driven Models + Learning Theory
Misguided Instructional Decisions
Inequitable Outcomes
Misguided Inferences about Learning
Robustness to How Students Learn
Data
Mastery
Learning







Misguided Notion of Mastery
Average P(Correct)
at Mastery:
0.54
P(Correct)


Mastery
Learning
Declare
Mastery
Data
Declare mastery early



Takeaway Message
Declaring mastery early could have cascading effects that impede future learning.


Data
New Instructional Policy





Going Beyond Mastery Learning
New
Model

Fractions
Tutor
Fractions
Tutor

Around 1000 students
Data
New Instructional Policy





Experiment
New
Model
Fractions
Tutor
Fractions
Tutor


Fractions
Tutor
vs.
Baseline Policy
Doroudi, Aleven, and Brunskill, Learning @ Scale 2017
Simulated Experiment
New
Model
New
Model
Doroudi, Aleven, and Brunskill, Learning @ Scale 2017
Data
New Instructional Policy





New
Model
Fractions
Tutor


vs.
Baseline Policy
| | Baseline Policy | Adaptive Policy |
|---|---|---|
| Simulated Results | 5.9 ± 0.9 | 9.1 ± 0.8 |
Doroudi, Aleven, and Brunskill, Learning @ Scale 2017
Posttest Scores (out of 16 points)
Simulation Results
| | Baseline Policy | Adaptive Policy |
|---|---|---|
| Simulated Results | 5.9 ± 0.9 | 9.1 ± 0.8 |
| Experimental Results | 5.5 ± 2.6 | 4.9 ± 1.8 |
Posttest Scores (out of 16 points)
Doroudi, Aleven, and Brunskill, Learning @ Scale 2017
Experiment Results
Single Model Simulation
Fitted
Model
Fitted
Model
Chi et al., 2011
Rowe et al., 2014
Doroudi, Aleven, and Brunskill, Learning @ Scale 2017


Instructional
Policy
Robust Simulation
Fitted
Model


Instructional
Policy
“True”
Model
Doroudi, Aleven, and Brunskill, Learning @ Scale 2017
Robust Evaluation
| | Baseline Policy | Adaptive Policy |
|---|---|---|
| New Model | 5.9 ± 0.9 | 9.1 ± 0.8 |
Doroudi, Aleven, and Brunskill, Learning @ Scale 2017
Posttest Scores (out of 16 points)
Robust Evaluation
| | Baseline Policy | Adaptive Policy |
|---|---|---|
| New Model | 5.9 ± 0.9 | 9.1 ± 0.8 |
| Bayesian Knowledge Tracing | 6.5 ± 0.8 | 7.0 ± 1.0 |
Doroudi, Aleven, and Brunskill, Learning @ Scale 2017
Posttest Scores (out of 16 points)
Robust Evaluation
| | Baseline Policy | Adaptive Policy |
|---|---|---|
| New Model | 5.9 ± 0.9 | 9.1 ± 0.8 |
| Bayesian Knowledge Tracing | 6.5 ± 0.8 | 7.0 ± 1.0 |
| Deep Knowledge Tracing | 9.9 ± 1.5 | 8.6 ± 2.1 |
Doroudi, Aleven, and Brunskill, Learning @ Scale 2017
Posttest Scores (out of 16 points)
Takeaway Message
Using a wrong student model can lead to incorrect inferences about the efficacy of an instructional policy.
All Student Models are Wrong
Semantics of a Wrong Model
But Some are Useful
Robust Models
Data-Driven Models + Learning Theory
Misguided Instructional Decisions
Inequitable Outcomes
Mastery learning intends to give each student
the right amount of instruction.
Equity of Mastery Learning
“The [BKT] model overestimates the true learning and performance parameters for below-average students who make many errors. While these students receive more remedial exercises than the above average students, they nevertheless receive less remedial practice than they need and perform worse on the test than expected.”
Corbett and Anderson, 1994
Equity of Mastery Learning
Corbett and Anderson, 1994:
Solution: Individualize BKT parameters for different students.
Even after individualizing BKT parameters, they found that low-performing students do worse on the test.
Doroudi and Brunskill, 2019:
This inequity could be due to fitting the wrong model.
Doroudi and Brunskill, Learning Analytics & Knowledge 2019
Equity of Mastery Learning
Fast Learners: 200 students, 20 practice opportunities
Slow Learners: 200 students, 20 practice opportunities
Doroudi and Brunskill, Learning Analytics & Knowledge 2019
Equity of Mastery Learning
Fast Learners: Average P(Correct) at Mastery: 0.56
Slow Learners: Average P(Correct) at Mastery: 0.45
[Diagram: Mastery Learning applied to each group]
Doroudi and Brunskill, Learning Analytics & Knowledge 2019
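The comparison between fast and slow learners can be imitated with a small simulation. This is a sketch under loud assumptions: students follow a single-skill AFM whose learning rate differs by group, while the tutor runs BKT mastery learning with hand-picked (not fitted) parameters. The numbers it prints therefore will not reproduce the 0.56 and 0.45 reported on the slide; it only shows how such a quantity could be computed.

```python
import math
import random


def bkt_update(p_known, correct, p_learn, p_guess, p_slip):
    """Standard BKT step: Bayesian update on the response, then learning."""
    like_known = (1 - p_slip) if correct else p_slip
    like_unknown = p_guess if correct else (1 - p_guess)
    posterior = p_known * like_known / (
        p_known * like_known + (1 - p_known) * like_unknown)
    return posterior + (1 - posterior) * p_learn


def afm_p_correct(theta, beta, gamma, t):
    """True P(Correct) after t practice opportunities (assumed sign convention)."""
    return 1.0 / (1.0 + math.exp(-(theta - beta + gamma * t)))


def mean_p_correct_at_mastery(gamma, bkt, beta=2.0, n_students=200,
                              max_opps=200, threshold=0.95, seed=0):
    """Average true (AFM) P(Correct) at the moment BKT declares mastery."""
    rng = random.Random(seed)
    finals = []
    for _ in range(n_students):
        theta = rng.gauss(0.0, 1.0)
        p_known = bkt["p_init"]
        t = 0
        while p_known < threshold and t < max_opps:
            correct = rng.random() < afm_p_correct(theta, beta, gamma, t)
            p_known = bkt_update(p_known, correct, bkt["p_learn"],
                                 bkt["p_guess"], bkt["p_slip"])
            t += 1
        finals.append(afm_p_correct(theta, beta, gamma, t))
    return sum(finals) / len(finals)


# Hypothetical BKT parameters shared by both groups.
bkt = dict(p_init=0.2, p_learn=0.2, p_guess=0.2, p_slip=0.1)
results = {
    "fast": mean_p_correct_at_mastery(gamma=0.3, bkt=bkt),   # assumed learning rate
    "slow": mean_p_correct_at_mastery(gamma=0.05, bkt=bkt),  # assumed learning rate
}
print(results)
```

A gap between the two printed values would indicate that the same mastery rule leaves one group less prepared than the other at the moment practice stops.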

Consider how
(1) algorithms,
(2) machine learning,
(3) technology design, and
(4) socio-cultural forces
combine to affect equity in
learning technologies.
Equity of Learning Technologies
All Student Models are Wrong
But Some are Useful
Data-Driven Models + Learning Theory
Misguided Instructional Decisions
Inequitable Outcomes
Misguided Inferences about Learning
Robustness to How Students Learn
Robust Evaluation Matrix
Doroudi, Aleven, and Brunskill, Learning @ Scale 2017
| Student Models | Policy 1 | Policy 2 | Policy 3 |
|---|---|---|---|
| Student Model 1 | \(V_{SM_1,P_1}\) | \(V_{SM_1,P_2}\) | \(V_{SM_1,P_3}\) |
| Student Model 2 | \(V_{SM_2,P_1}\) | \(V_{SM_2,P_2}\) | \(V_{SM_2,P_3}\) |
| Student Model 3 | \(V_{SM_3,P_1}\) | \(V_{SM_3,P_2}\) | \(V_{SM_3,P_3}\) |
Robust Evaluation Matrix
| Student Models | Baseline Policy | Adaptive Policy |
|---|---|---|
| New Model | 5.9 ± 0.9 | 9.1 ± 0.8 |
| Bayesian Knowledge Tracing | 6.5 ± 0.8 | 7.0 ± 1.0 |
| Deep Knowledge Tracing | 9.9 ± 1.5 | 8.6 ± 2.1 |
Doroudi, Aleven, and Brunskill, Learning @ Scale 2017
Posttest Scores (out of 16 points)
Robust Evaluation Matrix
| Student Models | Baseline Policy | Adaptive Policy | Awesome Policy |
|---|---|---|---|
| New Model | 5.9 ± 0.9 | 9.1 ± 0.8 | 16 |
| Bayesian Knowledge Tracing | 6.5 ± 0.8 | 7.0 ± 1.0 | 16 |
| Deep Knowledge Tracing | 9.9 ± 1.5 | 8.6 ± 2.1 | 16 |
Doroudi, Aleven, and Brunskill, Learning @ Scale 2017
Posttest Scores (out of 16 points)
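One way to read a robust evaluation matrix is to ask whether a policy beats another under every simulated student model. The sketch below encodes the posttest means from the table above and checks exactly that. The helper names are hypothetical, and the comparison uses the means only, ignoring the ± intervals, which a careful analysis would not do.

```python
# Posttest means (out of 16) for each simulated student model and policy,
# copied from the table above; the intervals are deliberately omitted.
matrix = {
    "New Model": {"Baseline": 5.9, "Adaptive": 9.1, "Awesome": 16.0},
    "Bayesian Knowledge Tracing": {"Baseline": 6.5, "Adaptive": 7.0, "Awesome": 16.0},
    "Deep Knowledge Tracing": {"Baseline": 9.9, "Adaptive": 8.6, "Awesome": 16.0},
}


def robustly_better(matrix, policy, other):
    """True if `policy` scores higher than `other` under every student model."""
    return all(row[policy] > row[other] for row in matrix.values())


def disagreeing_models(matrix, policy, other):
    """Student models under which `policy` does not beat `other`."""
    return [model for model, row in matrix.items() if row[policy] <= row[other]]


print(robustly_better(matrix, "Adaptive", "Baseline"))     # False
print(disagreeing_models(matrix, "Adaptive", "Baseline"))  # ['Deep Knowledge Tracing']
print(robustly_better(matrix, "Awesome", "Adaptive"))      # True
```

The adaptive policy looks better under two of the three models, but one dissenting model is enough to flag that its advantage may not survive contact with real students.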
Robust Evaluation Matrix
Doroudi, Aleven, and Brunskill, Learning @ Scale 2017
| Student Models | Policy 1 | Policy 2 | Policy 3 |
|---|---|---|---|
| Demographic 1 | \(V_{SM_1,P_1}\) | \(V_{SM_1,P_2}\) | \(V_{SM_1,P_3}\) |
| Demographic 2 | \(V_{SM_2,P_1}\) | \(V_{SM_2,P_2}\) | \(V_{SM_2,P_3}\) |
| Demographic 3 | \(V_{SM_3,P_1}\) | \(V_{SM_3,P_2}\) | \(V_{SM_3,P_3}\) |
Can tell us which policies are equitable
Robust Evaluation Matrix
| Student Models | Mastery Learning BKT |
|---|---|
| AFM - Fast Learners | 56% |
| AFM - Slow Learners | 45% |
Doroudi and Brunskill, Learning Analytics & Knowledge 2019
Robust Evaluation Matrix
| Student Models | Mastery Learning BKT |
|---|---|
| AFM - Fast Learners | 56% |
| AFM - Slow Learners | 45% |
| BKT - Fast Learners | 98%* |
| BKT - Slow Learners | 97.3%* |
*Percent of students who are in learned state.
Doroudi and Brunskill, Learning Analytics & Knowledge 2019
Robust Evaluation Matrix
| Student Models | Mastery Learning BKT | Mastery Learning AFM |
|---|---|---|
| AFM - Fast Learners | 56% | 96% |
| AFM - Slow Learners | 45% | 95% |
| BKT - Fast Learners | 98%* | |
| BKT - Slow Learners | 97.3%* | |
*Percent of students who are in learned state.
Doroudi and Brunskill, Learning Analytics & Knowledge 2019
Robust Evaluation Matrix
| Student Models | Mastery Learning BKT | Mastery Learning AFM |
|---|---|---|
| AFM - Fast Learners | 56% | 96% |
| AFM - Slow Learners | 45% | 95% |
| BKT - Fast Learners | 98%* | 99.8%* |
| BKT - Slow Learners | 97.3%* | 99.5%* |
*Percent of students who are in learned state.
Doroudi and Brunskill, Learning Analytics & Knowledge 2019
All Student Models are Wrong
Misguided Inferences about Learning
But Some are Useful
Robustness to How Students Learn
Data-Driven Models + Learning Theory
Misguided Instructional Decisions
Inequitable Outcomes
Student
Model
Big
Data





Student
Model
Data





Theory
Doroudi, Aleven, and Brunskill, In Submission
Integrating Data with Theory
Cognitive
(Information Processing)
Distributed Cognition
Constructivism
Socio-Cultural
Situated Cognition

“It can be argued that there is a trade-off between accounting for the subjective experience of doing mathematics and the precision inherent in expressing models in the syntax of computer formalisms.”
Paul Cobb, 1987
“It is desirable to formulate situative models that are specific enough to implement them as simulation programs”
James Greeno, 1998
Theory-Model Gap
Socio-Cultural
Model
Cognitive
Model
Data
Robustness to Learning Theories
| | Policy 1 | Policy 2 | Policy 3 |
|---|---|---|---|
| Cognitive Model | \(V_{SM_1,P_1}\) | \(V_{SM_1,P_2}\) | \(V_{SM_1,P_3}\) |
| Constructivist Model | \(V_{SM_2,P_1}\) | \(V_{SM_2,P_2}\) | \(V_{SM_2,P_3}\) |
| Socio-Cultural Model | \(V_{SM_3,P_1}\) | \(V_{SM_3,P_2}\) | \(V_{SM_3,P_3}\) |
The Bigger Picture
Research Landscape
Properties of Models of Learning
Sequencing Instruction
Learner-Generated Content
Doroudi, Aleven, & Brunskill - L@S '17
Doroudi & Brunskill - LAK '19
Doroudi, Aleven, & Brunskill - In Submission
Doroudi et al. - EDM '15
Doroudi et al. - EDM '16
Doroudi, Thomas, and Brunskill - UAI '17
*Best Paper*
Doroudi & Brunskill - EDM '17
*Best Paper Nominee*
Doroudi et al. - CHI '16
Doroudi et al. - ICLS '18
This Talk
Assess the robustness of various student models and instructional policies
Future Directions
Study the equitability of learning technologies, including how algorithms interact with socio-cultural factors
Work with online education providers to study how the
consequences in this talk affect actual students
Build student models for settings that we care about
by bridging the theory-model gap
To build more robust data-driven learning technologies while advancing the science of learning
Vision
Acknowledgements
The research reported here was supported, in whole or in part, by the Institute of Education Sciences, U.S. Department of Education, through Grants R305A130215 and R305B150008 to Carnegie Mellon University. The opinions expressed are those of the authors and do not represent views of the Institute or the U.S. Dept. of Education.
Some of the work reported here was written in papers with co-authors Emma Brunskill and Vincent Aleven. I thank Emma Brunskill, Ken Holstein, and Petr Johanes for discussions that influenced this work.
References
Box, G. E. (1979). Robustness in the strategy of scientific model building. In Robustness in statistics (pp. 201-236).
Cen, H. (2009). Generalized learning factors analysis: improving cognitive models with machine learning (Doctoral dissertation). Carnegie Mellon University, Pittsburgh, PA.
Chi, M., VanLehn, K., Litman, D., & Jordan, P. (2011). Empirically evaluating the application of reinforcement learning to the induction of effective and adaptive pedagogical strategies. User Modeling and User-Adapted Interaction, 21(1-2), 137-180.
Cobb, P. (1990). A constructivist perspective on information-processing theories of mathematical activity. International Journal of Educational Research, 14(1), 67-92.
Corbett, A. T., & Anderson, J. R. (1994). Knowledge tracing: Modeling the acquisition of procedural knowledge. User Modeling and User-Adapted Interaction, 4(4), 253-278.
Doroudi, S., & Brunskill, E. (2017, June). The misidentified identifiability problem of Bayesian Knowledge Tracing. In Proceedings of the 10th International Conference on Educational Data Mining. International Educational Data Mining Society.
Doroudi, S. & Brunskill, E. (2019, March). Fairer but not fair enough: On the equitability of knowledge tracing. To appear in Proceedings of the 9th International Learning Analytics & Knowledge Conference. ACM.
Doroudi, S., Aleven, V., & Brunskill, E. (2017, April). Robust evaluation matrix: Towards a more principled offline exploration of instructional policies. In Proceedings of the Fourth (2017) ACM Conference on Learning @ Scale (pp. 3-12). ACM.
Doroudi, S., Aleven, V. & Brunskill, E. (2018). Where's the reward? A review of reinforcement learning for instructional sequencing. Manuscript in submission.
Duncan, O. D. (1975). Introduction to structural equation models. Elsevier.
Greeno, J. G. (1998). The situativity of knowing, learning, and research. American Psychologist, 53(1), 5.
Rowe, J. P., Mott, B. W., & Lester, J. C. (2014). Optimizing Player Experience in Interactive Narrative Planning: A Modular Reinforcement Learning Approach. AIIDE, 3, 2.
Backup Slides
Data
New Instructional Policy





Review of Data-Driven Instruction
Student
Model


vs.
Baseline Policy
Doroudi, Aleven, and Brunskill, In Submission
Better understand researchers' beliefs about learning and computational modeling via interviews (ongoing work).
Bridging the Theory-Model Gap
Use agent-based modeling and social simulation to model socio-cultural and situative theories.
Assess robustness of models under different conceptions of learning.
At least 95% of students learn the skill


Mastery
Learning
Four Clusters of Studies
Doroudi, Aleven, and Brunskill, In Submission
aprender

to learn
Paired-Association Tasks
Concept Learning Tasks
Sequencing Activity Types
Sequencing Interdependent Content
Four Clusters of Studies
Doroudi, Aleven, and Brunskill, In Submission


reading
Paired-Association Tasks
Concept Learning Tasks
Sequencing Activity Types
Sequencing Interdependent Content
Four Clusters of Studies
Doroudi, Aleven, and Brunskill, In Submission
Worked Example
Solve for \(x\): \(x^2 - 4 = 12\)
\(x^2 = 4 + 12\)
\(x^2 = 16\)
\(x = \pm\sqrt{16} = \pm 4\)
Problem Solving
Solve for \(x\): \(x^2 - 4 = 12\)
Paired-Association Tasks
Concept Learning Tasks
Sequencing Activity Types
Sequencing Interdependent Content
Four Clusters of Studies
Doroudi, Aleven, and Brunskill, In Submission


Paired-Association Tasks
Concept Learning Tasks
Sequencing Activity Types
Sequencing Interdependent Content
Four Clusters of Studies
Doroudi, Aleven, and Brunskill, In Submission
| | Data-Driven Policy Outperformed Baseline | Mixed Results | Data-Driven Policy Did Not Outperform Baseline |
|---|---|---|---|
| Paired-Association Tasks | 10 | 0 | 3 |
| Concept Learning Tasks | 2 | 3 | 0 |
| Sequencing Activity Types | 4 | 4 | 0 |
| Sequencing Interdependent Content | 0 | 0 | 6 |
Four Clusters of Studies
Doroudi, Aleven, and Brunskill, In Submission
Paired-Association Tasks
Concept Learning Tasks
Sequencing Activity Types
Sequencing Interdependent Content
Use Psychologically-Inspired Models
Spacing Effect
Worked-Example Effect
Use Data-Driven
Models
We attempt to treat the same problem with several alternative models each with different simplifications but with a common...assumption. Then, if these models, despite their different assumptions, lead to similar results, we have what we can call a robust theorem that is relatively free of the details of the model.
Hence, our truth is the intersection of independent lies.
- Richard Levins, 1966
Importance Sampling
Estimator that gives unbiased and consistent estimates for a policy!
Can have very high variance when policy is different from prior data.
Example: Worked example or problem-solving?
20 sequential decisions ⇒ need over \(2^{20}\) students!
Importance sampling can prefer the worse of two policies more often than not (Doroudi, Thomas, and Brunskill, 2017).
Doroudi, Thomas, and Brunskill, Uncertainty in Artificial Intelligence 2017, Best Paper
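The \(2^{20}\) figure can be made concrete with a small sketch of ordinary importance sampling. It assumes, for illustration only, that the prior data were collected by choosing uniformly at random between the two activity types, that the policy being evaluated is deterministic, and that both policies ignore student history. All function names are hypothetical.

```python
def importance_weight(trajectory, target_policy, behavior_policy):
    """Product over steps of pi_target(action) / pi_behavior(action)."""
    weight = 1.0
    for step, action in enumerate(trajectory):
        weight *= target_policy(step, action) / behavior_policy(step, action)
    return weight


def is_estimate(trajectories, rewards, target_policy, behavior_policy):
    """Ordinary importance sampling estimate of the target policy's value."""
    weights = [importance_weight(t, target_policy, behavior_policy)
               for t in trajectories]
    return sum(w * r for w, r in zip(weights, rewards)) / len(trajectories)


ACTIONS = ("worked_example", "problem_solving")


def uniform_behavior(step, action):
    """Assumed data-collection policy: pick either activity with probability 1/2."""
    return 1.0 / len(ACTIONS)


def always_worked_example(step, action):
    """Hypothetical deterministic policy to evaluate."""
    return 1.0 if action == "worked_example" else 0.0


matching = ["worked_example"] * 20
off_by_one = ["worked_example"] * 19 + ["problem_solving"]
print(importance_weight(matching, always_worked_example, uniform_behavior))    # 1048576.0
print(importance_weight(off_by_one, always_worked_example, uniform_behavior))  # 0.0
```

Under these assumptions only one random trajectory in \(2^{20}\) receives a nonzero weight, and that weight is \(2^{20}\), which is where both the data requirement and the very high variance come from.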
200 students
20 practice opportunities
Fast Learners
Slow Learners
Doroudi and Brunskill, Learning Analytics & Knowledge 2019
Invited talk given at UCI on December 3rd, 2018.