Intro to Machine Learning

https://introml.mit.edu/

Midterm Review

Shen Shen

March 15, 2024

Outline

Rundown
Past Exam
Q&A

Week 1 - IntroML

Terminology
- Training, validation, testing
- Identifying overfitting, underfitting
Concrete process
- Learning algorithm
- Cross-validation
- Concept of a hyperparameter
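
As a refresher on the cross-validation bullet, here is a minimal sketch of k-fold cross-validation. The toy "learning algorithm" (`fit_mean`, which just predicts the training mean) and the helper names are assumptions made for illustration, not the course's reference code. To choose a hyperparameter, one would typically repeat this for each candidate value and keep the one with the lowest average held-out error.

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_mean(ys):
    """Toy learning algorithm (hypothetical): always predict the training mean."""
    return sum(ys) / len(ys)


def cross_validate(ys, k):
    """Average held-out squared error over k folds."""
    scores = []
    for held_out in k_fold_indices(len(ys), k):
        held = set(held_out)
        # Train on everything except the held-out fold.
        train = [ys[i] for i in range(len(ys)) if i not in held]
        h = fit_mean(train)
        # Evaluate only on the held-out fold.
        err = sum((ys[i] - h) ** 2 for i in held_out) / len(held_out)
        scores.append(err)
    return sum(scores) / k


ys = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
cv_error = cross_validate(ys, k=3)
print("cross-validation error:", cv_error)
```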

Week 2 - Regression

Problem Setup
Analytical solution formula $\theta^*=\left(\tilde{X}^{\top} \tilde{X}\right)^{-1} \tilde{X}^{\top} \tilde{Y}$ (what's $\tilde{X}$ )
When $\tilde{X}^{\top} \tilde{X}$ not invertible (solutions still exist; just not via the "formula")
- Practically (two scenarios)
- Visually (objective function no longer "bowl" shaped; "half-pipe" shaped instead)
- Mathematically (loss of solution uniqueness)
Regularization
- Motivation, how to, when to.
Cross-validation
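
A small sketch of the non-invertible case and how regularization repairs it. It assumes two features with no offset term, and a ridge objective of the form $\frac{1}{n}\|\tilde{X}\theta-\tilde{Y}\|^2+\lambda\|\theta\|^2$, whose minimizer solves $(\tilde{X}^{\top}\tilde{X}+n\lambda I)\theta=\tilde{X}^{\top}\tilde{Y}$. The helper names are hypothetical. The second feature duplicates the first, so $\tilde{X}^{\top}\tilde{X}$ is singular and infinitely many $\theta$ fit the data equally well; a small $\lambda>0$ picks out one of them.

```python
def gram_2x2(X):
    """Return X^T X for an n-by-2 matrix X given as a list of rows."""
    a = sum(r[0] * r[0] for r in X)
    b = sum(r[0] * r[1] for r in X)
    d = sum(r[1] * r[1] for r in X)
    return [[a, b], [b, d]]


def det_2x2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


def ridge_2d(X, Y, lam):
    """Solve (X^T X + n*lam*I) theta = X^T Y for two features, no offset."""
    n = len(X)
    G = gram_2x2(X)
    A = [[G[0][0] + n * lam, G[0][1]],
         [G[1][0], G[1][1] + n * lam]]
    rhs = [sum(r[j] * y for r, y in zip(X, Y)) for j in range(2)]
    det = det_2x2(A)
    if det == 0:
        raise ValueError("matrix is singular; the closed-form formula does not apply")
    # Explicit 2x2 inverse applied to the right-hand side.
    t0 = (A[1][1] * rhs[0] - A[0][1] * rhs[1]) / det
    t1 = (A[0][0] * rhs[1] - A[1][0] * rhs[0]) / det
    return [t0, t1]


# Second feature duplicates the first, so X^T X is singular.
X = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
Y = [2.0, 4.0, 6.0]
singular_det = det_2x2(gram_2x2(X))
theta = ridge_2d(X, Y, lam=0.01)
print("det of X^T X:", singular_det)
print("ridge solution:", theta)
```

With `lam=0` the same call raises `ValueError`, which mirrors the point that solutions still exist but cannot be obtained from the formula.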

Week 3 - Gradient Descent

Gradient vector
The algorithm, gradient-descent formula
How does "stochastic" gradient descent differ
(Convex + small-enough step-size + gradient descent) guarantees convergence to global min (when global min exists)
- If not convex, can e.g. get stuck in local min
- If step-size too big, can diverge
- If stochastic gradient descent, can be "wild"
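
To make the step-size conditions concrete, here is a minimal sketch on the convex function $f(\theta)=(\theta-3)^2$, chosen purely for illustration. Each update multiplies the distance to the minimizer by $(1-2\eta)$, so the iterates converge when $0<\eta<1$ and diverge when $\eta>1$.

```python
def grad(theta):
    """Gradient of f(theta) = (theta - 3)^2."""
    return 2.0 * (theta - 3.0)


def gradient_descent(theta0, step_size, num_steps):
    """Plain gradient descent: theta <- theta - step_size * grad(theta)."""
    theta = theta0
    for _ in range(num_steps):
        theta = theta - step_size * grad(theta)
    return theta


small = gradient_descent(0.0, step_size=0.1, num_steps=100)  # approaches 3
large = gradient_descent(0.0, step_size=1.1, num_steps=100)  # blows up
print("small step size:", small)
print("large step size:", large)
```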

Week 4 - Classification

(Binary) linear classifier (sign based)
(Binary) Linear logistic classifier
- Sigmoid
- NLL loss
Linear separator (equation form, pictorial form with normal vector)
Linear separability
How to handle multiple classes
- Softmax generalization
- Multiple sigmoids
- One-vs-one, one-vs-all
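
The pieces of the linear logistic classifier can be sketched in a few lines. The function names and the example parameters below are assumptions for illustration. The last two lines check that a two-class softmax over logits $[0, z]$ assigns the second class the probability $\sigma(z)$, which is one way to see softmax as a generalization of the sigmoid.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def nll_loss(guess, label):
    """Negative log-likelihood for a guess in (0, 1) and a label in {0, 1}."""
    return -(label * math.log(guess) + (1 - label) * math.log(1 - guess))


def softmax(zs):
    m = max(zs)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def predict(theta, theta0, x):
    """Linear logistic classifier: sigmoid(theta . x + theta0)."""
    z = sum(t * xi for t, xi in zip(theta, x)) + theta0
    return sigmoid(z)


# The point (2, 2) lies on the separator x1 - x2 = 0, so the guess is 0.5.
g = predict([1.0, -1.0], 0.0, [2.0, 2.0])
loss = nll_loss(g, 1)

probs = softmax([0.0, 1.0])
print(g, loss, probs[1], sigmoid(1.0))
```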

Week 5 - Features

Feature transformations
- Applying a fixed feature transformation
- Hand-design a good feature transformation (e.g. towards getting linear separability)
- Interplay between number of features, quality of features, and quality of learning algorithms
Feature encoding
- One-hot, thermometer, factored, numerical, standardization
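
Three of the encodings above, sketched under simple assumptions: categories are given as a list, ordered levels are numbered from 1, and standardization uses the population standard deviation. The function names are illustrative.

```python
import math


def one_hot(value, categories):
    """1 in the slot of the matching category, 0 elsewhere."""
    return [1 if value == c else 0 for c in categories]


def thermometer(level, num_levels):
    """For an ordered level in 1..num_levels, set the first `level` slots to 1."""
    return [1 if i < level else 0 for i in range(num_levels)]


def standardize(xs):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mean = sum(xs) / len(xs)
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))
    return [(x - mean) / std for x in xs]


print(one_hot("b", ["a", "b", "c"]))
print(thermometer(3, 5))
print(standardize([1.0, 2.0, 3.0]))
```

One-hot suits unordered categories, while the thermometer code preserves order without implying that the gaps between levels are equal.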

Week 6 - Neural Networks

Forward-pass (for evaluation)
Backward-pass (via backpropagation, for optimization)
Source of expressiveness
Output layer design
- dimension, activation, loss
Hand-design weights
- to match some given function form
- achieve some goal (e.g. separate a given data set)
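
As one illustration of hand-designing weights, the sketch below builds a two-layer network with two ReLU hidden units and a linear output that computes XOR on binary inputs. XOR is not linearly separable, so the nonlinearity in the hidden layer is what makes this possible. The particular weights are one choice among many and are assumptions of this example.

```python
def relu(z):
    return max(0.0, z)


def forward(x, W1, b1, W2, b2):
    """Forward pass: ReLU hidden layer followed by a linear output unit."""
    hidden = [relu(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return sum(w * h for w, h in zip(W2, hidden)) + b2


# Hidden unit 1 computes relu(x1 + x2); hidden unit 2 computes relu(x1 + x2 - 1).
W1 = [[1.0, 1.0], [1.0, 1.0]]
b1 = [0.0, -1.0]
# Output = h1 - 2 * h2, which is 1 exactly when one input is on.
W2 = [1.0, -2.0]
b2 = 0.0

outputs = {x: forward(x, W1, b1, W2, b2)
           for x in [(0, 0), (0, 1), (1, 0), (1, 1)]}
print(outputs)
```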

import random

# Past midterm terms to sample from.
terms = ["fall2023", "spring2023", "fall2022", "spring2022",
         "fall2021", "fall2019", "spring2019", "fall2018"]
qunums = range(1, 9)  # question numbers 1 through 8
base_URL = "https://introml.mit.edu/_static/spring24/midterm/review/midterm-"

# Pick a random past exam and a random question to practice.
term = random.choice(terms)
num = random.choice(qunums)
print("term:", term)
print("question number:", num)
print(f"Link: {base_URL + term}.pdf")

Review Question Sampler

Past exams
Annotated Lecture Notes Study Questions
Annotated Lab takeaway
Piazza

General problem-solving tips

More detailed CliffsNotes

General exam tips

Arrive 5min early to get settled in.
Bring a watch.
Bring a pencil (and eraser).
Look over whole exam and strategize for the order you do problems.
Bring some water.

Good luck!
