Intro to Machine Learning
Spring 24 Final Review
Shen Shen
May 14, 2024
Course Evaluations
We'd love to hear your thoughts on the course: your feedback is valuable for us and for future students and semesters. Thank you! 🙏
Outline
- Rundown
- Past Exam
- Q&A
Week 1 - IntroML
- Terminology
- Training, validation, testing
- Identifying overfitting, underfitting
- Concrete process
- Learning algorithm
- Cross-validation
- Concept of hyperparameter
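A minimal sketch of the k-fold cross-validation bookkeeping. The "model" here is just a placeholder that predicts the training-fold mean, and the data and fold count are illustrative assumptions:

```python
import numpy as np

def k_fold_scores(X, y, k=3):
    idx = np.arange(len(X))
    folds = np.array_split(idx, k)
    scores = []
    for i in range(k):
        val = folds[i]                                            # held-out fold
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        prediction = y[train].mean()                              # "train" the placeholder model
        scores.append(np.mean((y[val] - prediction) ** 2))        # validation MSE
    return scores

y = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
print(k_fold_scores(np.zeros((6, 1)), y))
```

Each fold takes a turn as validation data; averaging the per-fold scores gives the cross-validation estimate used to compare hyperparameters.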
Week 2 - Regression
- Problem Setup
- Analytical solution formula \(\theta^*=\left(\tilde{X}^{\top} \tilde{X}\right)^{-1} \tilde{X}^{\top} \tilde{Y}\) (what's \(\tilde{X}\))
- When \(\tilde{X}^{\top} \tilde{X}\) not invertible (solutions still exist; just not via the "formula")
- Practically (two scenarios)
- Visually (objective function is no longer a "bowl"; it has a flat "half-pipe" direction)
- Mathematically (loss of solution uniqueness)
- Regularization
- Motivation, how to, when to.
- Cross-validation
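A minimal numpy sketch of the analytical solution, on a tiny assumed dataset. \(\tilde{X}\) is the data matrix with a column of ones appended for the offset:

```python
import numpy as np

# Toy data (assumed, for illustration): y = 2x exactly.
X = np.array([[1.0], [2.0], [3.0]])
Y = np.array([[2.0], [4.0], [6.0]])
X_tilde = np.hstack([X, np.ones((X.shape[0], 1))])   # append offset column

# theta* = (X~^T X~)^{-1} X~^T Y
theta = np.linalg.inv(X_tilde.T @ X_tilde) @ X_tilde.T @ Y
print(theta.ravel())  # slope 2, offset 0
```

Note the formula only applies when \(\tilde{X}^{\top}\tilde{X}\) is invertible, as discussed above.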
Week 3 - Gradient Descent
- Gradient vector
- The algorithm, gradient-descent formula
- How does "stochastic" gradient descent differ
- (Convex + small-enough step-size + gradient descent) guarantees convergence to global min (when global min exists)
- If not convex, can e.g. get stuck in local min
- If step-size too big, can diverge
- If stochastic gradient descent, can be "wild"
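A sketch of the gradient-descent update on a convex 1-D quadratic; the step size and iteration count are illustrative choices, not course-mandated values:

```python
# Plain gradient descent on f(x) = (x - 3)^2, which is convex with global min at x = 3.
def gradient_descent(grad, x0, eta=0.1, steps=100):
    x = x0
    for _ in range(steps):
        x = x - eta * grad(x)   # the gradient-descent update rule
    return x

x_min = gradient_descent(grad=lambda x: 2 * (x - 3), x0=0.0)
print(x_min)  # converges near the global minimum x = 3
```

With this small step size the iterates contract toward the minimum; a much larger eta would make them diverge, matching the bullet above.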
Week 4 - Classification
- (Binary) linear classifier (sign based)
- (Binary) Linear logistic classifier
- Sigmoid
- NLL loss
- Linear separator (equation form, pictorial form with normal vector)
- Linear separability
- How to handle multiple classes
- Softmax generalization
- Multiple sigmoids
- One-vs-one, one-vs-all
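A sketch of the two building blocks of the linear logistic classifier, the sigmoid and the NLL loss (example scores assumed):

```python
import numpy as np

def sigmoid(z):
    # squashes a linear score into a probability in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def nll(g, y):
    # negative log-likelihood for guess g in (0, 1) and label y in {0, 1}
    return -(y * np.log(g) + (1 - y) * np.log(1 - g))

g = sigmoid(np.array([0.0, 2.0, -2.0]))
print(g)             # 0.5, ~0.88, ~0.12
print(nll(g[1], 1))  # small loss: confident and correct
```

A confident wrong guess (e.g. `nll(g[2], 1)`) incurs a much larger loss, which is what drives the separator toward correctly classifying the data.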
Week 5 - Features
- Feature transformations
- Applying a fixed feature transformation
- Hand-design a good feature transformation (e.g. towards getting linear separability)
- Interplay between number of features, quality of features, and quality of learning algorithms
- Feature encoding
- One-hot, thermometer, factored, numerical, standardization
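A sketch of one-hot encoding for a categorical feature (the category list is an assumed example):

```python
categories = ["red", "green", "blue"]

def one_hot(value, categories):
    # one coordinate per category; exactly one is hot
    return [1 if c == value else 0 for c in categories]

print(one_hot("green", categories))  # [0, 1, 0]
```

Unlike a raw numerical encoding (red=0, green=1, blue=2), one-hot imposes no spurious ordering among the categories.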
Week 6 - Neural Networks
- Forward-pass (for evaluation)
- Backward-pass (via backpropagation, for optimization)
- Source of expressiveness
- Output layer design
- dimension, activation, loss
- Hand-design weights
- to match some given function form
- achieve some goal (e.g. separate a given data set)
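A sketch of the forward pass through one hidden layer; the weights below are hand-picked stand-ins, not learned values:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, 2.0])
W1, b1 = np.array([[1.0, -1.0], [0.5, 0.5]]), np.array([0.0, 0.0])  # hidden layer
W2, b2 = np.array([1.0, 1.0]), 0.0                                  # output layer

a1 = relu(W1.T @ x + b1)           # pre-activation z = W^T x + b, then activation
out = sigmoid(W2 @ a1 + b2)        # sigmoid output: suitable for binary classification
print(out)
```

The output layer's dimension, activation, and loss are chosen together to match the task (here: one unit, sigmoid, NLL for binary classification).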
Week 7 - CNN
- The convolution operation
- various hyper-parameters (filter size, padding size, stride) in spatial dimension
- the 3rd channel/depth dimension
- reason about in/out shapes.
- Tailored for vision problems: can be viewed as fully-connected nets constrained by local connections + weight sharing.
- Convolutional filters: "Pattern matching" template
- Forward pass: convolution operation; backward: backprop to learn filter weights/bias as usual.
- Independent and parallel processing.
- Max-pooling and typical "pyramid" stack.
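The in/out shape reasoning can be sketched with the standard output-size formula (square inputs and filters assumed):

```python
def conv_out_size(n, k, padding=0, stride=1):
    # spatial output size for input n, filter k: (n - k + 2p) // s + 1
    return (n - k + 2 * padding) // stride + 1

print(conv_out_size(32, 5))             # 28
print(conv_out_size(32, 3, padding=1))  # 32 ("same" padding)
print(conv_out_size(28, 2, stride=2))   # 14 (like a 2x2 max-pool)
```

The depth dimension is handled separately: each filter spans the full input depth, and the number of filters sets the output depth.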
Week 8 - Transformers
- parallel-processing machines
- a single input (sentence, image) is tokenized into a sequence: \(n\) tokens, each token in \(d\) dimensional
- attention mechanism
- learn three weights \(W_q, W_k, W_v\) to turn raw inputs into query, key, value
- the mechanics; shapes, softmax, number-crunching
- the idea of masking
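A sketch of the single-head, unmasked attention number-crunching; the weight matrices are random stand-ins for learned parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 4, 8                                  # n tokens, each d-dimensional
X = rng.normal(size=(n, d))
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))

Q, K, V = X @ W_q, X @ W_k, X @ W_v          # query, key, value: each (n, d)
scores = Q @ K.T / np.sqrt(d)                # (n, n) scaled attention scores
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)  # softmax over each row
out = weights @ V                            # (n, d): one output vector per token
print(out.shape)
```

Masking would zero out (set to \(-\infty\) before the softmax) the scores for tokens a position is not allowed to attend to.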
Week 9 - Non-parametric methods
- Decision trees:
- Flow chart; if/else statement; human-understandable
- Split dimension, split value, tree structure (root/decision node and leaf)
- For classification, weighted-average-entropy/accuracy; for regression, MSE.
- \(k-\)nearest neighbors:
- memorizes data
- inefficient in test/prediction time
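A sketch of \(k\)-nearest-neighbor classification on an assumed toy dataset: memorize the data, then predict by majority vote among the \(k\) closest points:

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    dists = np.linalg.norm(X_train - x, axis=1)   # distance to every training point
    nearest = np.argsort(dists)[:k]               # indices of the k closest
    return np.bincount(y_train[nearest]).argmax() # majority vote

X_train = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                    [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.2, 0.2])))  # 0
print(knn_predict(X_train, y_train, np.array([5.5, 5.5])))  # 1
```

The full distance scan at every query is exactly the test-time inefficiency the bullet above refers to.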
Week 10 - MDPs
- Definition (the five tuple)
- \(\pi\), \(V,\) and \(Q:\) definition and interpretation
- Policy evaluation: given \(\pi(s)\), calculate \(V(s)\)
- via summation, or
- via Bellman recursion (for finite-horizon) or equation (for infinite horizon)
- Policy optimization: finding optimal policy \(\pi^*(s)\)
- Toy setup: solve via heuristics
- More generally: Q value-iteration
- Interpretation of optimal policy:
- how changes in the setup (\(R\), \(\gamma\), \(h\)) change the optimal policy
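A sketch of Q value-iteration on a tiny assumed 2-state, 2-action MDP, iterating \(Q(s,a) \leftarrow R(s,a) + \gamma \sum_{s'} T(s,a,s') \max_{a'} Q(s',a')\):

```python
import numpy as np

n_s, n_a, gamma = 2, 2, 0.9
T = np.zeros((n_s, n_a, n_s))    # transition probabilities T[s, a, s']
T[0, 0, 0] = 1.0                 # action 0 in state 0: stay
T[0, 1, 1] = 1.0                 # action 1 in state 0: move to state 1
T[1, :, 1] = 1.0                 # state 1 absorbs
R = np.array([[0.0, 1.0],        # R[s, a]: only (s=0, a=1) pays off
              [0.0, 0.0]])

Q = np.zeros((n_s, n_a))
for _ in range(100):
    Q = R + gamma * (T @ Q.max(axis=1))   # Bellman backup for every (s, a) at once
pi_star = Q.argmax(axis=1)
print(Q)
print(pi_star)   # optimal action in state 0 is 1 (grab the reward)
```

Rerunning with a different \(\gamma\) or \(R\) shows directly how the setup changes the optimal policy.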
Week 11 - Reinforcement Learning
- How RL setup differs from MDP
- Q-learning algorithm
- Forward thinking: given experiences, work out Q-values.
- Backward thinking: given realized Q-values, work out the experiences that produced them.
- Two new hyper-parameters (compared with MDP):
- \(\epsilon-\)greedy action selection
- \(\alpha\) the learning rate
- The idea of fitting parameterized Q-functions via regression
- can handle larger/continuous state/action space
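A sketch of the tabular Q-learning update on a stream of assumed experience tuples \((s, a, r, s')\); \(\epsilon\)-greedy action selection (not shown) would be what generates these experiences in a real run:

```python
# Q(s, a) <- (1 - alpha) * Q(s, a) + alpha * (r + gamma * max_a' Q(s', a'))
gamma, alpha = 0.9, 0.5
Q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}

# Assumed experiences, repeated: only (s=0, a=1) yields reward.
experiences = [(0, 1, 1.0, 1), (1, 0, 0.0, 0), (0, 1, 1.0, 1)] * 200

for s, a, r, s_next in experiences:
    target = r + gamma * max(Q[(s_next, 0)], Q[(s_next, 1)])
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * target
print(Q)
```

Unlike MDP value iteration, no transition model is used: the learning rate \(\alpha\) blends each observed experience into the running Q estimate.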
Week 12 - Unsupervised Learning
- Unsupervised learning setup
- Clustering:
- The k-means algorithm (cluster assignment; cluster-center updates)
- Initialization matters
- The choice of hyper-parameter \(k\) matters
- Auto-encoder:
- The idea of compression->reconstruction
- Mechanically, exactly the same as any vanilla neural architecture.
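The clustering loop can be sketched as alternating assignment and center-update steps; the dataset is a toy assumption, and the initialization below is a deliberate choice (since initialization matters):

```python
import numpy as np

X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [8.0, 8.0], [8.0, 9.0], [9.0, 8.0]])
centers = X[[0, 3]].copy()   # k = 2, initialized on two actual data points

for _ in range(10):
    # assignment step: each point joins its nearest center
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    assignments = dists.argmin(axis=1)
    # update step: each center moves to the mean of its assigned points
    centers = np.array([X[assignments == j].mean(axis=0) for j in range(2)])
print(centers)
```

With a bad initialization (e.g. both centers in the same blob) the same loop can converge to a worse clustering, which is why initialization and the choice of \(k\) both matter.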
(The demo won't embed in PDF. But the direct link below works.)
import random

# Sample a random past final exam question to practice.
terms = ["fall2023", "spring2023", "fall2022", "spring2022",
         "fall2021", "fall2019", "fall2018"]
qunums = range(1, 9)
base_URL = "https://introml.mit.edu/_static/spring24/final/review/final-"

term = random.choice(terms)
num = random.choice(qunums)
print("term:", term)
print("question number:", num)
print(f"Link: {base_URL + term}.pdf")
Resources
- All the released materials Week 1 - Week 12
- Review Question Sampler
General problem-solving tips
More detailed CliffsNotes
General exam tips
- Arrive 5min early to get settled in.
- Bring a watch.
- Bring a pencil (and an eraser).
- Look over whole exam and strategize for the order you do problems.
- Bring some water.
Best of luck! 🍀
Thanks for the Sp24 semester!