Intro to Machine Learning

Spring 24 Final Review

Shen Shen

May 14, 2024

We'd love to hear your thoughts on the course: this provides valuable feedback for us and other students, for future semesters!  Thank you!🙏

Course Evaluations

Outline

  • Rundown
  • Past Exam
  • Q&A

Week 1 - IntroML

  • Terminologies
    • Training, validation, testing
    • Identifying overfitting, underfitting
  • Concrete process
    • Learning algorithm
    • Cross-validation
    • Concept of hyperparamter

Week 2 - Regression

 

  • Problem Setup
  • Analytical solution formula \(\theta^*=\left(\tilde{X}^{\top} \tilde{X}\right)^{-1} \tilde{X}^{\top} \tilde{Y}\) (what's \(\tilde{X}\))
  • When \(\tilde{X}^{\top} \tilde{X}\) not invertible (solutions still exist; just not via the "formula")
    • Practically (two scenarios)
    • Visually (obj fun no longer "bowl" shape, "half-pipe" shape)
    • Mathematically (loss of solution uniqueness)
  • Regularization
    • Motivation, how to, when to.
  • Cross-validation

Week 3 - Gradient Descent

 

  • Gradient vector
  • The algorithm, gradient-descent formula
  • How does "stochastic" gradient descent differ
  • (Convex + small-enough step-size + gradient descent) guarantees convergence to global min (when global min exists)
    • If not convex, can e.g. get stuck in local min
    • If step-size too big, can diverge
    • If stochastic gradient descent, can be "wild"

Week 4 - Classification  

 

  • (Binary) linear classifier (sign based)
  • (Binary) Linear logistic classifier
    • Sigmoid
    • NLL loss
  • Linear separator (equation form, pictorial form with normal vector)​
  • Linear separability
  • How to handle multiple classes
    • Softmax generalization
    • Multiple sigmoids
    • One-vs-one, one-vs-all

Week 5 - Features

 

  • Feature transformations
    • Applying a fixed feature transformation
    • Hand-design a good feature transformation (e.g. towards getting linear separability)
    • Interplay between number of features, quality of features, and quality of learning algorithms
  • Feature encoding
    • One-hot, thermometer, factored, numerical, standardization

Week 6 - Neural Networks

  • Forward-pass (for evaluation)
  • Backward-pass (via backpropogation, for optimization)
  • Source of expressiveness
  • Output layer design
    • dimension, activation, loss
  • Hand-design weights
    • to match some given function form
    • achieve some goal (e.g. separate a given data set)

Week 7 - CNN

  • The convolution operation
    • various hyper-parameters (filter size, padding size, stride) in spatial dimension
    • the 3rd channel/depth dimension
    • reason about in/out shapes.
  • Tailored for vision problems: fully-connected nets + weight sharing.
  • Convolutional filters: "Pattern matching" template
  • Forward pass: convolution operation; backward: backprop to learn filter weights/bias as usual.
  • Independent and parallel processing.
  • Max-pooling and typical "pyramid" stack.

Week 8 - Transformers

  • parallel-processing machines
  • a single input (sentence, image) is tokenized into a sequence: \(n\) tokens, each token in \(d\) dimensional
  • attention mechanism
    • learn three weights \(W_q, W_k, W_v\) to turn raw inputs into query, key, value
    • the mechanics; shapes, softmax, number-crunching
    • the idea of masking

Week 9 - Non-parametric methods

 

  • Decision trees:
    • Flow chart; if/else statement; human-understandable
    • Split dimension, split value, tree structure (root/decision node and leaf)
    • For classification, weighted-average-entropy/accuracy; for regression, MSE.
  • \(k-\)nearest neighbors:
    • memorizes data
    • inefficient in test/prediction time

Week 10 - MDPs

    • Definition (the five tuple)
    • \(\pi\), \(V,\) and \(Q:\) definition and interpretation
    • Policy evaluation: given \(\pi(s)\), calculate \(V(s)\)
      • via summation, or 
      • via Bellman recession (for finite-horizon) or equation (for infinite horizon)
    • Policy optimization: finding optimal policy \(\pi^*(s)\)
      • Toy setup: solve via heuristics
      • More generally: Q value-iteration
    • Interpretation of optimal policy:
      • how various setup changes optimal policy \(\mathrm{R}, \gamma, h\)

Week 11 - Reinforcement Learning

  • How RL setup differs from MDP
  • Q-learning algorithm
    • Forward thinking: given experiences, work out Q-values.
    • Backward thinking: given realized Q-values, work out experiences.
    • Two new hyper-parameters (compared with MDP):
      • \(\epsilon-\)greedy action selection
      • \(\alpha\) the learning rate
  • The idea of fitting parameterized Q-functions via regression
    • can handle larger/continuous state/action space

Week 12 - Unsupervised Learning

  • Unsupervised learning setup
  • Clustering:
    • The clustering algorithm (cluster assignment; cluster center updates)
    • Initialization matters
    • The choice of hyper-parameter \(k\) matters
  • Auto-encoder:
    • The idea of compression->reconstruction
    • Mechanically, exactly the same as any vanilla neural architecture. 

(The demo won't embed in PDF. But the direct link below works.)

import random
terms= ["fall2023", "spring2023", "fall2022", "spring2022", 
        "fall2021", "fall2019", "fall2018", "fall2018"]

qunums = range(1,9)
base_URL = "https://introml.mit.edu/_static/spring24/final/review/final-"

term = random.choice(terms)
num = random.choice(qunums)
print("term:", term)
print("question number:", num)
print(f"Link: {base_URL+term}.pdf")

Resources

General problem-solving tips

More detailed CliffsNotes

General exam tips

  • Arrive 5min early to get settled in.
  • Bring a watch.
  • Bring a pencil (and an eraser).
  • Look over whole exam and strategize for the order you do problems.
  • Bring some water.

 Best of luck! 🍀

Thanks for the Sp24 semester!

introml-sp24-final-review

By Shen Shen

introml-sp24-final-review

  • 103