What Might Machine Learning Teach Us About Human Learning?


Shayan Doroudi

March 5, 2025

That hasn't been the case for the past three decades.

My Takeaways from the History of AI and Education

Artificial intelligence (AI) research was fundamentally intertwined with education research from the early days of AI.

Normative Claim:
We should revive this interdisciplinary mode of thinking.

But what reason do we have to believe this would actually be fruitful?

Have decades of interdisciplinary research on learning resulted in any insights?

AI Has Influenced How We Think About Learning—Even If We Don't Realize It

Example:

Etymology of feedback

Cybernetics (an AI-adjacent discipline) introduced the term feedback to the social sciences.

AI Has Influenced How We Think About Learning—Even If We Don't Realize It

Another Example:

AI Has Influenced How We Think About Learning—Even If We Don't Realize It

Étienne Wenger received his PhD from the Department of Information and Computer Science at UCI!

But in what specific ways can machine learning offer insight on human learning?

Perception and Perceptrons

Case Studies

Language Learning Limits

Individual Differences in Learning and the Bias-Variance Tradeoff

Impossibility result

Possibility result

Providing an explanation for empirically observed phenomena

Perception and Perceptrons

Case Studies

Language Learning Limits

Individual Differences in Learning and the Bias-Variance Tradeoff

Chomsky (1980) claims there is a poverty of the stimulus: children are simply not exposed to enough verbal data in the environment to learn language from scratch.

Language Learning

Is our ability to learn grammar due to some innate linguistic capability, or can we learn it from exposure alone?

Chomsky (1956) characterized several different kinds of formal languages. The simplest of these classes is the regular languages. He also showed that regular languages are not powerful enough to express all English sentences.

Chomsky, N. (1980). Rules and representations. Behavioral and Brain Sciences, 3(1), 1-15.

Chomsky, N. (1956). Three models for the description of language. IRE Transactions on Information Theory, 2(3), 113-124.
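A standard illustration of the kind of structure regular languages cannot capture (a textbook example, not from the original slides) is the language of matched nestings:

\[
L \;=\; \{\, a^n b^n : n \ge 0 \,\}
\]

Recognizing \(L\) requires keeping count of arbitrarily many a's, which no finite-state (regular) device can do; center-embedded English clauses exhibit a similar nested-dependency structure.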

Machine Learning Theory

Valiant (1984): Probably Approximately Correct (PAC) Learning.
To successfully learn a function means to:

  • Learn a reasonable approximation of the function
  • In a limited* amount of time and with a limited* amount of data
  • With high probability

*Limited means a polynomial function of the input size N and some other parameters (such as the desired accuracy and confidence).

For example, suppose we want to learn a regular language where we keep seeing sentences of length N and are told whether or not each sentence belongs to the language.

Valiant, L. (1984). A theory of the learnable. Communications of the ACM, 27(11), 1134-1142.
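Stated slightly more formally (my paraphrase of the standard definition, not the slide's wording): for any target \(f\) in the class, any accuracy \(\epsilon > 0\), and any confidence \(\delta > 0\), the learner must output a hypothesis \(\hat{f}\) such that

\[
\Pr\big[\operatorname{error}(\hat{f}) \le \epsilon\big] \;\ge\; 1 - \delta,
\]

while using an amount of data and computation that is polynomial in \(1/\epsilon\), \(1/\delta\), and the size parameter N.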

Language Learning Limits

Are regular languages PAC-learnable?

Kearns and Valiant (1994): If regular languages are PAC-learnable, then the algorithm used to learn them can be used to break RSA cryptography!

So was Chomsky right?

The issue lies not in the number of samples needed, but in the amount of time needed! That is, there may not be a poverty of the stimulus, but a lack of time in childhood to break the linguistic code from the data we have.

Are regular languages PAC-learnable?

Kearns and Valiant (1994): If regular languages are PAC-learnable, then the algorithm used to learn them can be used to break RSA cryptography!

Possible Interpretations:

  • There is a universal grammar à la Chomsky.
  • There is more to language learning than seeing a bunch of examples—language learning is interactive!
  • The result is only relevant in the worst case (i.e., for the most complicated languages possible)—natural languages could be much simpler.
  • P = NP — a possibility that is widely believed to be false, but that no computer scientist has been able to rule out.
  • Our minds are quantum computers!

Language Learning Limits

Kearns, M., & Valiant, L. (1994). Cryptographic limitations on learning boolean formulae and finite automata. Journal of the ACM (JACM), 41(1), 67-95.
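The logical structure of the result (my paraphrase) helps explain why some of these interpretations are on the list:

\[
\text{an efficient PAC-learner for regular languages exists} \;\Longrightarrow\; \text{RSA can be broken efficiently}
\]

\[
\text{RSA is hard to break} \;\Longrightarrow\; \text{regular languages are not efficiently PAC-learnable}
\]

In other words, the impossibility is conditional on a cryptographic hardness assumption; if that assumption fails (for instance, if P = NP, or if our minds could run quantum algorithms that break RSA), the result no longer rules out learning from examples.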

Are regular languages PAC-learnable?

Kearns and Vazirani (1994): Regular languages are PAC-learnable if we can ask questions of the form "does sentence X belong to the language?".

Possible Interpretations:

  • If language learning is interactive, then language is learnable (and Chomsky was wrong)!
  • But remember, this is only for regular languages. Other results suggest that formal languages closer to natural languages are not PAC-learnable under reasonable assumptions.
    • Chomsky is vindicated again!

Language Learning Limits

Kearns, M. J., & Vazirani, U. (1994). An introduction to computational learning theory. MIT Press.
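To make the role of membership queries concrete, here is a minimal toy sketch (my illustration, not the algorithm presented in Kearns and Vazirani): membership queries let a learner discover which prefixes behave like the same state of the unknown language, which is the core idea behind query-based learning of regular languages. The target language, alphabet, and query strings below are all made up for illustration.

```python
# Toy sketch: membership queries reveal the "states" of an unknown regular
# language. Here the (hypothetical) target is "strings over {a, b} with an
# even number of a's".

def member(s: str) -> bool:
    """Membership oracle: answers 'does sentence s belong to the language?'"""
    return s.count("a") % 2 == 0

prefixes = ["", "a", "b", "aa", "ab"]   # candidate access strings
suffixes = ["", "a"]                    # distinguishing experiments

def row(prefix: str) -> tuple:
    """Behavior of a prefix under the chosen experiments, via membership queries."""
    return tuple(member(prefix + e) for e in suffixes)

# Prefixes with identical rows are indistinguishable by these experiments,
# so they can be treated as the same state of a hypothesized automaton.
states = {}
for p in prefixes:
    states.setdefault(row(p), []).append(p)

for behavior, group in states.items():
    print(behavior, group)
# Expected output: two groups, {"", "b", "aa"} and {"a", "ab"} -- the two
# states of the minimal automaton for "even number of a's".
```

The full algorithm (Angluin-style query learning, as presented in Kearns and Vazirani) grows the sets of prefixes and experiments until the hypothesized automaton agrees with the oracle.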

Perception and Perceptrons

Case Studies

Language Learning Limits

Individual Differences in Learning and the Bias-Variance Tradeoff

Perceptrons is remembered (and criticized) for being the book that halted neural network research for >15 years!

Perceptrons

Perceptron

Minsky, M., & Papert, S. (1969). Perceptrons: An introduction to computational geometry. MIT Press.

Connectedness

Limitations of Perceptrons

Minsky and Papert (1969): No perceptron where each "neuron" has access to a small number of pixels can detect if an image is fully connected.

Minsky, M., & Papert, S. (1969). Perceptrons: An introduction to computational geometry. MIT Press.
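For readers unfamiliar with the terminology, the device in these results (my gloss of Minsky and Papert's setup) is a perceptron of limited order: a threshold unit applied to a weighted sum of local feature detectors, each of which sees only a few pixels.

\[
\psi(x) \;=\;
\begin{cases}
1 & \text{if } \sum_i w_i\,\varphi_i(x) > \theta \\
0 & \text{otherwise,}
\end{cases}
\qquad \text{where each } \varphi_i \text{ depends on at most } k \text{ pixels.}
\]

The limitation above says that no such device with small \(k\) can compute connectedness in general; the capability result that follows says \(k = 4\) suffices when images are guaranteed to have no holes.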

Capabilities of Perceptrons

Minsky and Papert (1969): If an image is guaranteed to have no holes, a perceptron where each "neuron" has access to at most four pixels can detect if an image is fully connected.

This shows that there is a global property of an image that a perceptron can compute with only local evidence!

But Papert used this example in a debate with Chomsky on whether language needs to be innate!

Minsky, M., & Papert, S. (1969). Perceptrons: An introduction to computational geometry. MIT Press.

I think this result has been overlooked in the literature.

Capabilities of Human Perception?

We are not able to detect connectedness without serial processing.

Why?

  • The images on the previous slides provide anecdotal evidence that connectedness without holes is easier to detect without serial processing.
  • Perceptrons are the simplest kinds of artificial neural networks. If neural networks exist in our brain, surely perceptrons do!
  • Perceptrons have a simple learning algorithm (a minimal sketch follows this list).
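For concreteness, the classic perceptron learning rule is only a few lines. This sketch is the standard textbook version with made-up toy data, not anything from the original slides: it nudges the weights toward misclassified examples until all examples are classified correctly.

```python
# Classic perceptron learning rule (Rosenblatt): adjust weights toward
# misclassified examples; on linearly separable data it converges.
import numpy as np

# Toy, linearly separable data: label is 1 when x1 + x2 > 1, else 0.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [0.9, 0.9], [0.1, 0.2]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 0])

w = np.zeros(X.shape[1])   # weights
b = 0.0                    # bias (threshold)
lr = 0.1                   # learning rate

for _ in range(1000):                      # passes over the data
    errors = 0
    for xi, target in zip(X, y):
        pred = 1 if xi @ w + b > 0 else 0  # threshold unit
        update = lr * (target - pred)      # zero when the prediction is right
        w += update * xi
        b += update
        errors += int(update != 0)
    if errors == 0:                        # converged: every example correct
        break

print("weights:", w, "bias:", b)
```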

But when an image has no holes, perhaps we are!

If true...

We can learn to subitize large numbers
(i.e., answer whether there are ≥ k objects in an image)!

We can learn to solve 2D mazes at a glance!

These possibilities are highly counterintuitive and, as far as I know, have not been conjectured before!


Perception and Perceptrons

Case Studies

Language Learning Limits

Individual Differences in Learning and the Bias-Variance Tradeoff

Machine Learning

Trying to learn some function \(f\) using dataset \(D = \{(\mathbf{x}_i, y_i)\}_{i=1}^{n} \sim \mathcal{P}_D\) where \(y_i = f(\mathbf{x}_i)\).

We use a machine learning algorithm that selects a function \(\hat{f}\) given \(D\).

Bias

Opposite of accuracy and validity

Variance

Opposite of precision and reliability

Mean Squared Error

Mean Squared Error = Bias Squared + Variance

Bias-Variance Decomposition
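In the noiseless setting sketched above (my restatement in the slide's notation, with the expectation taken over the draw of the training set \(D\)), the decomposition at a fixed input \(\mathbf{x}\) is exact:

\[
\mathbb{E}_D\big[(\hat{f}(\mathbf{x}) - f(\mathbf{x}))^2\big]
\;=\; \underbrace{\big(\mathbb{E}_D[\hat{f}(\mathbf{x})] - f(\mathbf{x})\big)^2}_{\text{Bias}^2}
\;+\; \underbrace{\mathbb{E}_D\big[(\hat{f}(\mathbf{x}) - \mathbb{E}_D[\hat{f}(\mathbf{x})])^2\big]}_{\text{Variance}}
\]

(With noisy labels there is an additional irreducible-noise term.)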

Bias-Variance Tradeoff

In order to minimize mean squared error,
some algorithms tend to minimize bias
and others tend to minimize variance.
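A minimal simulation of this point (illustrative only; the target function, dataset size, and noise level are all made up, and a little label noise is added to make the variance visible): a low-capacity learner ends up with mostly bias, while a high-capacity learner ends up with mostly variance, even though both are trying to minimize error.

```python
# Compare a high-bias learner (degree-1 polynomial) with a high-variance
# learner (degree-7 polynomial) on many small datasets from the same source.
import numpy as np

rng = np.random.default_rng(0)
true_f = np.sin          # the function to be learned (made up for illustration)
x_test = 0.5             # a fixed test point at which we track predictions

def predictions_at_test_point(degree, n_datasets=500):
    preds = []
    for _ in range(n_datasets):
        x = rng.uniform(-np.pi, np.pi, size=10)
        y = true_f(x) + rng.normal(0, 0.3, size=10)   # small label noise
        coeffs = np.polyfit(x, y, deg=degree)
        preds.append(np.polyval(coeffs, x_test))
    return np.array(preds)

for degree in (1, 7):
    p = predictions_at_test_point(degree)
    bias_sq = (p.mean() - true_f(x_test)) ** 2
    variance = p.var()
    print(f"degree {degree}: bias^2 = {bias_sq:.3f}, variance = {variance:.3f}")
# Typical outcome: the degree-1 fit shows larger bias^2 and small variance,
# while the degree-7 fit shows small bias^2 and much larger variance.
```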

Bias-Variance Tradeoff

In order to minimize mean squared error,
some people tend to minimize bias
and others tend to minimize variance.

Doroudi, S., & Rastegar, S. A. (2023). The bias–variance tradeoff in cognitive science. Cognitive Science, 47(1).

Doroudi, S. (2020). The bias-variance tradeoff: How data science can inform educational debates. AERA Open, 6(4), 2332858420977208.

This could potentially explain different
approaches to developing scientific theories,
teaching strategies, and
learning strategies.

Function Learning

McDaniel, M. A., Cahill, M. J., Robbins, M., & Wiener, C. (2014). Individual differences in learning and transfer: stable tendencies for learning exemplars versus abstracting rules. Journal of Experimental Psychology: General, 143(2), 668.


Machine learning can generate insights for human learning by:

  • showing what's not possible
  • showing what may be possible
  • providing novel explanations for empirical results

SoE Faculty Seminar

By Shayan Doroudi