Statistical learning theory and the VC dimension
Book club, 30.07.2021
Claudia Merger, Alexandre René
The Nature of Statistical Learning Theory
- ideas, examples, context

Statistical Learning Theory
- proofs, definitions, "clean" textbook

Chapters 1-3: today's topics
Main topics
- Empirical Risk Minimization Inductive Principle
- Consistency of Learning Process
- Falsifiability
- VC Entropy of a set of functions and the VC dimension
Empirical Risk Minimization Inductive Principle

Learning machine: given samples z_1, ..., z_l, fit some function Q(z, α), α ∈ Λ, to minimize some risk.

ERM: choose the function that minimizes the empirical risk

    R_emp(α) = (1/l) Σ_{i=1}^l Q(z_i, α)

If the learning machine obeys the ERM inductive principle for any given set of observations, call it a learning process.

Only minimize the empirical risk? NO
Example

Task: learn the data p.d.f.
"Solution": mixture with one component per data point,

    p(x) = (1/l) Σ_{i=1}^l N(x; x_i, σ²),   σ → 0:

the likelihood on the training data diverges, yet the estimate is just spikes on the sample.
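This failure mode is easy to see numerically. A minimal sketch (assumptions not in the slides: a standard-normal data distribution, a Gaussian rather than delta component per point, and a held-out sample standing in for the true p.d.f.):

```python
import numpy as np

rng = np.random.default_rng(0)
train = rng.normal(size=20)    # 20 samples from the (here known) true p.d.f.
test = rng.normal(size=1000)   # held-out proxy for the true distribution

def mixture_loglik(x, centers, sigma):
    """Log-likelihood of x under a mixture with one Gaussian per data point."""
    d2 = (x[:, None] - centers[None, :]) ** 2
    comp = np.exp(-d2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
    dens = np.maximum(comp.mean(axis=1), 1e-300)  # floor avoids log(0)
    return float(np.log(dens).sum())

for sigma in (1.0, 0.1, 0.001):
    print(f"sigma={sigma}: "
          f"train loglik={mixture_loglik(train, train, sigma):9.1f}, "
          f"test loglik={mixture_loglik(test, train, sigma):12.1f}")
```

As σ shrinks, the training log-likelihood grows without bound (the empirical risk goes to its infimum) while the log-likelihood of fresh data collapses: the empirical-risk minimizer is not a good density estimate.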
Empirical Risk Minimization Inductive Principle

What we really want to minimize: the expected risk

    R(α) = ∫ Q(z, α) dF(z)

according to the true p.d.f. F(z). But we don't have F(z).

How do we know when we're close?
Main topics
- Empirical Risk Minimization Inductive Principle
- Consistency of Learning Process
- Falsifiability
- VC Entropy of a set of functions and the VC dimension
Consistency of the learning process

For each sample of size l pick α_l such that

    R_emp(α_l) = min_{α ∈ Λ} R_emp(α)

The learning process is consistent iff, in probability as l → ∞,

    R(α_l) → inf_{α ∈ Λ} R(α)   and   R_emp(α_l) → inf_{α ∈ Λ} R(α)

First limit: we choose the right function. Second limit: we get the correct risk.
Example

Task: learn the data p.d.f.
"Solution": mixture with one component per data point:
the empirical risk is driven to its infimum, but the true risk does not converge to inf_α R(α) → inconsistent.
Example

Take any set of functions Q(z, α), α ∈ Λ, and add one function φ that minorizes the whole set:

    Q(z, φ) < inf_{α ∈ Λ} Q(z, α)   for all z.

ERM then trivially selects φ for every sample → trivial consistency: it hinges on one special element, not on the structure of the set.
Exclude trivial consistency

Non-trivial consistency: consistency on any subset Λ(c) = {α : R(α) ≥ c}, i.e. the process stays consistent after removing the best functions.

In the following, consistent = nontrivially consistent.
(proof: Statistical Learning Theory, pp. 89-92)
What are the necessary and sufficient conditions?

Key theorem: ERM is (nontrivially) consistent iff the empirical risks converge to the expected risks uniformly over the set of functions (one-sided): for all ε > 0,

    P{ sup_{α ∈ Λ} (R(α) - R_emp(α)) > ε } → 0   as l → ∞.

How does this probability scale with l? Which properties must the set of functions have?
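The uniform one-sided convergence can be watched directly for a small class. A toy sketch (assumptions not in the slides: threshold classifiers f_θ(x) = 1[x ≥ θ], x uniform on [0, 1], labels y = 1[x ≥ 0.5], so the true risk is exactly |θ − 0.5|):

```python
import numpy as np

rng = np.random.default_rng(1)

def sup_deviation(l):
    """sup_theta (R(theta) - R_emp(theta)) for threshold classifiers, one sample."""
    x = rng.uniform(0.0, 1.0, size=l)
    y = (x >= 0.5).astype(int)
    sup = -np.inf
    # candidate thresholds covering (essentially) all distinct behaviours
    for th in np.concatenate(([0.0], np.sort(x), [1.0])):
        r_emp = np.mean((x >= th).astype(int) != y)
        r_true = abs(th - 0.5)   # exact risk under the uniform distribution
        sup = max(sup, r_true - r_emp)
    return sup

def avg_sup(l, trials=30):
    return float(np.mean([sup_deviation(l) for _ in range(trials)]))

for l in (10, 100, 1000):
    print(l, round(avg_sup(l), 4))
```

The supremum is taken over the whole class, yet it still shrinks with l, which is the behaviour the VC conditions below characterize.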
Main topics
- Empirical Risk Minimization Inductive Principle
- Consistency of Learning Process
- Falsifiability
- VC Entropy of a set of functions and the VC dimension
Falsifiability

K. Popper: a theory is scientific if it is falsifiable.

Example: "What goes up, must come down." - Falsifiable? Yes: a single object that never comes down refutes it.
Example: "Whatever will be, will be." - Falsifiable? No: it is compatible with every observation.
Example

Task: learn the data p.d.f.
"Solution": mixture with one component per data point:
it can fit any data set perfectly → not falsifiable.
For consistency, the set of functions must not be too "flexible".
Main topics
- Empirical Risk Minimization Inductive Principle
- Consistency of Learning Process
- Falsifiability
- VC Entropy of a set of functions and the VC dimension
VC Entropy of a set of functions

Given a set of functions Q(z, α), α ∈ Λ, and samples z_1, ..., z_l, construct the set of vectors

    q(α) = (Q(z_1, α), ..., Q(z_l, α)),   α ∈ Λ.

N^Λ(ε; z_1, ..., z_l): minimal number of vectors such that all q(α) have at maximum distance ε to one of them (a minimal ε-net of the set of vectors).
VC Entropy of a set of functions

    H^Λ(ε, l) = E[ ln N^Λ(ε; z_1, ..., z_l) ]

Expectation of the diversity of the set of functions on a sample of size l.

Sufficient condition for consistency: lim_{l→∞} H^Λ(ε, l) / l = 0 for all ε > 0.
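For indicator functions, N^Λ(z_1, ..., z_l) is simply the number of distinct binary vectors the class realizes on the sample. A sketch for an assumed threshold class f_θ(z) = 1[z ≥ θ], which realizes only l + 1 of the 2^l possible vectors, so ln N / l → 0:

```python
import numpy as np

def n_behaviours(x):
    """Count distinct vectors (Q(z_1,a), ..., Q(z_l,a)) for f_theta(z)=1[z>=theta]."""
    # thresholds at each sample point (plus +inf for the all-zeros vector)
    patterns = {tuple((x >= th).astype(int)) for th in np.append(x, np.inf)}
    return len(patterns)

rng = np.random.default_rng(0)
for l in (2, 5, 10, 20):
    x = rng.uniform(size=l)
    n = n_behaviours(x)
    print(l, n, round(np.log(n) / l, 3))   # N = l + 1, so ln N / l -> 0
```

Contrast this with a class that reaches all 2^l vectors: there ln N / l = ln 2 never vanishes.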
Example: indicator functions

Indicator functions Q(z, α) ∈ {0, 1} on the input space: the vectors q(α) are corners of the l-dimensional hypercube.

Worst case: for some sets of functions all 2^l corners are reached, so N = 2^l and H(l)/l = ln 2 does not vanish.
Not falsifiable + inconsistent.
Problems
- VC entropy depends on distribution
- How fast is the convergence of (2.6)? How many samples are needed?
VC dimension of a set of functions

The VC dimension h of a set of indicator functions is the maximal number of points z_1, ..., z_h in the input space that can be shattered, i.e. separated into two classes in all 2^h possible ways by functions of the set.

For any distribution, if h is finite, then for all ε > 0

    P{ sup_{α ∈ Λ} (R(α) - R_emp(α)) > ε } → 0,

with a rate governed by h/l: the learning process is consistent, i.e.

    R(α_l) → inf_{α ∈ Λ} R(α)   and   R_emp(α_l) → inf_{α ∈ Λ} R(α):

we choose the right function and get the correct risk, independent of the distribution.
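The definition can be checked by brute force for a small class. A sketch (assumption not in the slides: the class of 2-D linear classifiers sign(w·z + b), whose VC dimension is 3); the separability test relies on the fact that if two finite planar point sets are strictly linearly separable, some separating direction is the difference of two sample points or perpendicular to it:

```python
import itertools
import numpy as np

def separable(points, labels):
    """Check strict 2-D linear separability via candidate directions."""
    pts = np.asarray(points, dtype=float)
    mask = np.array(labels) == 1
    pos, neg = pts[mask], pts[~mask]
    if len(pos) == 0 or len(neg) == 0:
        return True
    cands = []
    for p, q in itertools.combinations(pts, 2):
        d = q - p
        cands.append(d)
        cands.append(np.array([-d[1], d[0]]))  # perpendicular direction
    for w in cands:
        if np.linalg.norm(w) == 0:
            continue
        proj_pos, proj_neg = pos @ w, neg @ w
        if proj_pos.min() > proj_neg.max() or proj_neg.min() > proj_pos.max():
            return True
    return False

def shattered(points):
    """True iff every one of the 2^l labelings is linearly separable."""
    l = len(points)
    return all(separable(points, [(k >> i) & 1 for i in range(l)])
               for k in range(2 ** l))

three = [(0, 0), (1, 0), (0, 1)]
square = [(0, 0), (1, 0), (0, 1), (1, 1)]
print(shattered(three))   # 3 points in general position: shattered
print(shattered(square))  # the XOR labeling fails: not shattered, so h = 3
```

The same brute-force pattern works for any small class where membership of a labeling can be decided exactly.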
All we need to know about learning?

No: with probability 1 - η,

    R(α_l) ≤ R_emp(α_l) + Φ(l/h, η),

and the confidence term Φ is only small if h/l is small → tradeoff between empirical risk minimization and VC dimension → add regularization (restrict the flexibility of the set of functions).
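The tradeoff is the familiar capacity curve. A sketch with polynomial least squares standing in for nested sets of increasing capacity (assumed data: y = sin(3x) + noise, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(2)
x_tr = rng.uniform(-1, 1, 30)
y_tr = np.sin(3 * x_tr) + 0.3 * rng.normal(size=30)
x_te = rng.uniform(-1, 1, 1000)
y_te = np.sin(3 * x_te) + 0.3 * rng.normal(size=1000)

def fit_eval(deg):
    """Least-squares polynomial of given degree: (training MSE, test MSE)."""
    coef = np.polyfit(x_tr, y_tr, deg)
    tr = float(np.mean((np.polyval(coef, x_tr) - y_tr) ** 2))
    te = float(np.mean((np.polyval(coef, x_te) - y_te) ** 2))
    return tr, te

for deg in (1, 3, 5, 9, 15):
    tr, te = fit_eval(deg)
    print(f"degree {deg:2d}: train {tr:.3f}, test {te:.3f}")
```

The training error only decreases with capacity, while the error on fresh data is typically smallest at an intermediate capacity: choosing that capacity from the data is what structural risk minimization (and, in practice, regularization) formalizes.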
Thank you!