Fall 24 Final Review

Shen Shen

December 10, 2024

Intro to Machine Learning

Outline

  • Rundown
  • Q&A
  • Past Exams Walk-through

Week 1 - IntroML

  • Terminology
    • Training, validation, testing
    • Identifying overfitting and underfitting
  • Concrete processes
    • Learning algorithm
    • Validation and Cross-validation
    • Concept of hyperparameter
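
To make the validation workflow concrete, here is a minimal k-fold cross-validation sketch in NumPy; the train and evaluate callables are hypothetical placeholders standing in for any learning algorithm and scoring function:

import numpy as np

def cross_validate(X, Y, train, evaluate, k=5):
    # Shuffle the data indices and split them into k roughly equal folds.
    folds = np.array_split(np.random.permutation(len(X)), k)
    scores = []
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        model = train(X[train_idx], Y[train_idx])               # fit on the other k-1 folds
        scores.append(evaluate(model, X[val_idx], Y[val_idx]))  # score on the held-out fold
    return np.mean(scores)  # average validation score, e.g. for hyperparameter selection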

Week 2 - Regression

  • Problem Setup
  • Analytical solution formula \(\theta^*=\left(\tilde{X}^{\top} \tilde{X}\right)^{-1} \tilde{X}^{\top} \tilde{Y}\) (and what \(\tilde{X}\) is; see the sketch after this list)
  • When \(\tilde{X}^{\top} \tilde{X}\) is not invertible (optimal solutions still exist; they just can't be found via the "formula")
    • Practically (two scenarios)
    • Visually (the objective function is no longer "bowl"-shaped; instead it has a "half-pipe" shape)
    • Mathematically (loss of solution uniqueness)
  • Regularization
    • Motivation, how to, when to
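
A NumPy sketch of both the closed-form solution and its ridge-regularized variant. For simplicity this version appends the offset feature and penalizes the offset too, which some conventions avoid:

import numpy as np

def ols_theta(X, Y):
    # theta* = (X~^T X~)^{-1} X~^T Y, where X~ appends a constant-1 offset feature
    Xt = np.hstack([X, np.ones((len(X), 1))])
    return np.linalg.solve(Xt.T @ Xt, Xt.T @ Y)   # fails if X~^T X~ is singular

def ridge_theta(X, Y, lam):
    # Adding lam * I makes the matrix invertible for any lam > 0,
    # restoring a unique solution even in the "half-pipe" case.
    # (Simplification: this sketch regularizes the offset as well.)
    Xt = np.hstack([X, np.ones((len(X), 1))])
    return np.linalg.solve(Xt.T @ Xt + lam * np.eye(Xt.shape[1]), Xt.T @ Y)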

Week 3 - Gradient Descent

  • The gradient vector (both analytically and conceptually)
  • The gradient-descent algorithm and the key update formula
  • (Convex + small-enough step-size + gradient descent + global min exists + run long enough) guarantee convergence to a global min
    • What happens when any of these conditions is violated
  • How the stochastic variant differs (setup, run-time behavior, and conclusion)
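
A minimal sketch of the key update rule, together with a convex one-dimensional example where all the convergence conditions above hold:

import numpy as np

def gradient_descent(grad_f, theta_init, eta=0.1, steps=1000):
    # Key update: theta_t = theta_{t-1} - eta * grad_f(theta_{t-1})
    theta = np.array(theta_init, dtype=float)
    for _ in range(steps):
        theta = theta - eta * grad_f(theta)
    return theta

# f(theta) = (theta - 3)^2 is convex with gradient 2 * (theta - 3);
# with a small enough step size, we converge to the global min at 3.
print(gradient_descent(lambda th: 2 * (th - 3), theta_init=[0.0]))  # ~ [3.]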

Week 4 - Classification  

  • (Binary) linear classifier (sign-based)
  • (Binary) logistic classifier (sigmoid, NLL loss)
  • Linear separator (the equation form; the visual form with its normal vector)
  • Linear separability (interplay with features)
  • How to handle multiple classes
    • Softmax generalization (Softmax, cross-entropy)
    • Multiple sigmoids
    • One-vs-one, one-vs-all
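
A sketch of the sigmoid/NLL pieces for a binary logistic classifier, assuming labels in {0, 1}:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def nll_loss(X, y, theta, theta0):
    # Negative log-likelihood of labels y under the predicted probabilities g
    g = sigmoid(X @ theta + theta0)            # probability of class 1 per point
    return -np.mean(y * np.log(g) + (1 - y) * np.log(1 - g))

def predict(X, theta, theta0):
    # The linear separator is theta^T x + theta0 = 0; the sign decides the class
    return (X @ theta + theta0 > 0).astype(int)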

Week 5 - Features

  • Feature transformations
    • Apply a fixed feature transformation
    • Hand-design a feature transformation (e.g., toward achieving linear separability)
    • Interplay between the number of features, the quality of features, and the quality of learning algorithms
  • Feature encoding
    • One-hot, thermometer, factored, numerical, standardization
    • When and why to use any of those
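
Quick sketches of two of the encodings, under one common convention (the raw value is an integer level in {0, ..., k-1}):

import numpy as np

def one_hot(value, k):
    # Unordered categorical -> indicator vector; no spurious ordering implied
    v = np.zeros(k)
    v[value] = 1.0
    return v

def thermometer(value, k):
    # Ordered categorical -> leading ones, preserving "at least level i" structure
    v = np.zeros(k)
    v[:value + 1] = 1.0
    return v

print(one_hot(2, 5))      # [0. 0. 1. 0. 0.]
print(thermometer(2, 5))  # [1. 1. 1. 0. 0.]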

Week 6 - Neural Networks

  • Forward-pass (for evaluation)
  • Backward-pass (via backpropagation, for optimization)
  • Source of expressiveness
  • Output layer design
    • dimension, activation, loss
  • Hand-designing weights
    • to match some given function form
    • achieve some goal (e.g. separate a given data set)
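
A forward-pass sketch for a one-hidden-layer network, plus a hand-designed-weights example computing f(x) = |x| via ReLU(x) + ReLU(-x):

import numpy as np

def relu(z):
    return np.maximum(z, 0)

def forward(x, W1, b1, W2, b2):
    # Hidden layer a1 = ReLU(W1 x + b1); linear output layer
    a1 = relu(W1 @ x + b1)
    return W2 @ a1 + b2

# Hand-designed weights: |x| = ReLU(x) + ReLU(-x)
W1, b1 = np.array([[1.0], [-1.0]]), np.zeros(2)
W2, b2 = np.array([[1.0, 1.0]]), np.zeros(1)
print(forward(np.array([-3.0]), W1, b1, W2, b2))  # -> [3.]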

Week 7 - Auto-encoders

  • Unsupervised learning setup
  • Auto-encoder:
    • The idea of compression and reconstruction
    • Mechanically, the encoder and decoder can use any vanilla classical or neural architecture
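
As one minimal instance, a linear autoencoder sketch: compress to a k-dimensional code, decode, and measure reconstruction error (W_enc and W_dec would be learned, e.g. by gradient descent):

import numpy as np

def reconstruction_loss(X, W_enc, W_dec):
    # X: (n, d) data; W_enc: (d, k) with k < d; W_dec: (k, d)
    Z = X @ W_enc                        # encode: compress each point to k dimensions
    X_hat = Z @ W_dec                    # decode: attempt to reconstruct the input
    return np.mean((X - X_hat) ** 2)     # unsupervised objective: no labels needed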

Week 8 - CNN

  • Forward pass: convolution operation; max-pooling and the typical "pyramid" stack
  • Backward pass: back-propagation to learn filter weights/biases
  • The convolution/max-pooling operations
    • various hyper-parameters (filter size, padding size, stride) in the spatial dimensions
    • the 3rd channel/depth dimension
    • reasoning about in/out shapes (see the sketch after this list)
  • Conceptually: weight sharing, "pattern matching" templates, independent and parallel processing
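
The in/out shape bookkeeping in one line (a sketch; assumes square inputs and filters):

def conv_out_size(n, f, p=0, s=1):
    # Output spatial size: floor((n + 2*p - f) / s) + 1
    return (n + 2 * p - f) // s + 1

print(conv_out_size(32, 5))            # 28: 5x5 filter, no padding, stride 1
print(conv_out_size(28, 2, p=0, s=2))  # 14: 2x2 max-pooling with stride 2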

Week 9 - Transformers

  • A single input (think one sentence), tokenized into a sequence: \(n\) tokens, each token \(x\) is \(d\) dimensional
  • the attention mechanism (one head)
    • learn weights \(W_q, W_k, W_v\) to turn raw \(x\) inputs into (query, key, value)
    • the mechanics: softmax over the raw attention scores; the shapes involved
    • masking: why and how
  • parallel-processing machines
    • each head is processed in parallel
    • inside a head, each token is processed in parallel
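
A single-head attention sketch in NumPy, with optional causal masking; the 1/sqrt(d_k) scaling follows the standard convention:

import numpy as np

def softmax(A, axis=-1):
    A = A - A.max(axis=axis, keepdims=True)   # for numerical stability
    e = np.exp(A)
    return e / e.sum(axis=axis, keepdims=True)

def attention_head(X, Wq, Wk, Wv, causal=False):
    # X: (n, d) tokens; Wq, Wk: (d, d_k); Wv: (d, d_v)
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])    # (n, n) raw attention scores
    if causal:
        # Mask future tokens: -inf above the diagonal becomes weight 0 after softmax
        scores = scores + np.triu(np.full(scores.shape, -np.inf), k=1)
    return softmax(scores) @ V                # each output row mixes the values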

Week 10 - Clustering

  • Unsupervised learning setup
  • The \(k\)-means algorithm
    • cluster assignment; cluster center updates
    • convergence criterion
  • The initialization matters
  • The choice of hyper-parameter \(k\) matters
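
A compact k-means sketch (assumes no cluster ever empties out; note how the result depends on both the initialization and the choice of k):

import numpy as np

def kmeans(X, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # initialization matters!
    for _ in range(iters):
        # Assignment step: each point joins its nearest center
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        # Update step: each center moves to the mean of its assigned points
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):  # converged: centers stopped moving
            break
        centers = new_centers
    return centers, labels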

Week 11 - MDPs

  • Definition (the five-tuple)
    • \(\pi\), \(V\), and \(Q\): definition and interpretation
  • Policy evaluation: given \(\pi(s)\), calculate \(V(s)\)
    • via summation, or via the Bellman recursion or Bellman equation
  • Policy optimization: finding optimal policy \(\pi^*(s)\)
    • toy setup: solve via heuristics; more generally: Q value-iteration
  • Interpretation of optimal policy
    • how various parts of the setup (\(R\), \(\gamma\), \(h\)) change the optimal policy
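
An infinite-horizon Q value-iteration sketch (T holds transition probabilities, R the rewards; a fixed iteration count stands in for a proper convergence check):

import numpy as np

def q_value_iteration(T, R, gamma, iters=100):
    # T: (S, A, S) transition probabilities; R: (S, A) rewards; 0 <= gamma < 1
    Q = np.zeros(R.shape)
    for _ in range(iters):
        # Bellman recursion: Q(s,a) = R(s,a) + gamma * sum_s' T(s,a,s') max_a' Q(s',a')
        Q = R + gamma * (T @ Q.max(axis=1))
    return Q  # then pi*(s) = argmax_a Q(s,a) and V*(s) = max_a Q(s,a)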

Week 12 - Reinforcement Learning

  • How RL setup differs from MDP
  • Q-learning algorithm
    • Forward thinking: given experiences, work out Q-values.
    • Backward thinking: given the resulting Q-values, work out which experiences produced them.
    • Two new hyper-parameters (compared with MDP value iteration):
      • \(\epsilon\)-greedy action selection
      • \(\alpha\) the learning rate
  • The idea of fitting a parameterized Q-function via regression, which can handle large or continuous state/action spaces
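
Sketches of the two new knobs plus the core tabular update (Q is an (S, A) array of estimates; rng is a NumPy Generator):

import numpy as np

def epsilon_greedy(Q, s, epsilon, rng):
    # Explore a uniformly random action with probability epsilon, else exploit
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))
    return int(np.argmax(Q[s]))

def q_learning_update(Q, s, a, r, s_next, alpha, gamma):
    # Learning rate alpha blends the old estimate with the bootstrapped target
    target = r + gamma * Q[s_next].max()
    Q[s, a] = (1 - alpha) * Q[s, a] + alpha * target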

Week 13 - Non-parametric methods

  • Decision trees:
    • Flow chart; if/else statement; human-understandable
    • Split dimension, split value, tree structure (root/decision node and leaf)
    • Largest leaf size \(k\) matters
    • For classification: weighted-average entropy or accuracy; for regression: MSE (see the sketches after this list)
  • \(k\)-nearest neighbors:
    • memorizes data
    • scaling matters, \(k\) matters
    • inefficient in test/prediction time
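
Two small sketches: the weighted-average entropy used to score a candidate split, and a bare-bones k-NN prediction (note where feature scaling enters):

import numpy as np

def weighted_average_entropy(y_left, y_right):
    # Split quality for classification trees: lower is better
    def entropy(y):
        _, counts = np.unique(y, return_counts=True)
        p = counts / counts.sum()
        return -(p * np.log2(p)).sum()
    n_l, n_r = len(y_left), len(y_right)
    return (n_l * entropy(y_left) + n_r * entropy(y_right)) / (n_l + n_r)

def knn_predict(X_train, y_train, x, k):
    # k-NN memorizes the data; all the work happens at prediction time
    dists = ((X_train - x) ** 2).sum(axis=1)   # this is where feature scaling matters
    nearest = np.argsort(dists)[:k]
    values, counts = np.unique(y_train[nearest], return_counts=True)
    return values[np.argmax(counts)]           # majority vote (use the mean for regression)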

Course Evaluations

We'd love to hear your thoughts on the course: this provides valuable feedback for us and for other students in future semesters! Thank you! 🙏

Past Exams Walk-through

(The interactive demo won't embed in the PDF, but the direct link below works.)

import random

# Pick a random question from a random past final exam to practice with
terms = ["spring2024", "fall2023", "spring2023", "fall2022", "spring2022",
         "fall2021", "fall2019", "fall2018"]
qunums = range(1, 10)  # question numbers 1 through 9

base_URL = "https://introml.mit.edu/_static/fall24/final/review/final-"

term = random.choice(terms)
num = random.choice(qunums)
print("term:", term)
print("question number:", num)
print(f"Link: {base_URL + term}.pdf")

Resources

General problem-solving tips

More detailed CliffsNotes

Exam-taking tips

  • Arrive 5 minutes early to get settled in.
  • Bring a pencil (and eraser), a watch, and some water.
  • Look over the whole exam and strategize about the order in which you do the problems.

Good luck!

Thanks for the Fall24 semester!
