Shen Shen
Sept 6, 2024
(many slides adapted from Tamara Broderick)
6.390-personal@mit.edu
Logistical issues? Personal concerns?
We’d love to help out!
plus ~40 awesome LAs
Optimization + first-principles physics
Recall lab1 intro
Recall lab1 Q1
import numpy as np

def random_regress(X, Y, k):
    d, n = X.shape
    # generate k random hypotheses
    ths = np.random.randn(d, k)
    th0s = np.random.randn(1, k)
    # compute the mean squared error of each hypothesis on the data set
    # (lin_reg_err is the helper from lab1)
    errors = lin_reg_err(X, Y, ths, th0s.T)
    # find the index of the hypothesis with the lowest error
    i = np.argmin(errors)
    # return the theta and theta0 parameters that define that hypothesis
    theta, theta0 = ths[:, i:i+1], th0s[:, i:i+1]
    return (theta, theta0), errors[i]
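For reference, here is a minimal sketch of what a helper like lin_reg_err might look like (the exact lab1 implementation may differ): it scores each of the k hypotheses by its mean squared error on the full data set, followed by a purely illustrative usage example.

import numpy as np

def lin_reg_err(X, Y, ths, th0s):
    # X: (d, n) data, Y: (1, n) labels, ths: (d, k) hypotheses, th0s: (k, 1) offsets
    preds = np.dot(ths.T, X) + th0s           # (k, n): one row of predictions per hypothesis
    return np.mean((preds - Y) ** 2, axis=1)  # (k,): mean squared error of each hypothesis

# hypothetical usage on synthetic data (names and values are illustrative)
d, n = 2, 50
X = np.random.randn(d, n)
Y = np.dot(np.array([[1.0, -2.0]]), X) + 0.5  # "true" theta = [1, -2], theta0 = 0.5
(theta, theta0), err = random_regress(X, Y, k=1000)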
Now, the training error: \[ J(\theta) = \frac{1}{n} \sum_{i=1}^n\left(\theta^{\top} x^{(i)}-y^{(i)}\right)^2\]
Define
\[ \tilde{X}=\begin{bmatrix} \left(x^{(1)}\right)^{\top} \\ \vdots \\ \left(x^{(n)}\right)^{\top} \end{bmatrix} \in \mathbb{R}^{n \times d}, \qquad \tilde{Y}=\begin{bmatrix} y^{(1)} \\ \vdots \\ y^{(n)} \end{bmatrix} \in \mathbb{R}^{n \times 1} \]
so the training error can be written in vectorized form:
\[ J(\theta)=\frac{1}{n}\left(\tilde{X} \theta-\tilde{Y}\right)^{\top}\left(\tilde{X} \theta-\tilde{Y}\right) \]
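A minimal sketch of this vectorized form in numpy (variable names here are illustrative, not from the lab code):

import numpy as np

def training_error(X_tilde, Y_tilde, theta):
    # X_tilde: (n, d) stacked data, Y_tilde: (n, 1) labels, theta: (d, 1) parameters
    residual = X_tilde @ theta - Y_tilde       # (n, 1)
    n = X_tilde.shape[0]
    return (residual.T @ residual / n).item()  # scalar J(theta)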
Set the gradient \(\nabla_\theta J(\theta)=\frac{2}{n} \tilde{X}^{\top}\left(\tilde{X} \theta-\tilde{Y}\right) \stackrel{\text { set }}{=} 0\), which gives \(\tilde{X}^{\top} \tilde{X}\, \theta=\tilde{X}^{\top} \tilde{Y}\)
The beauty of \( \theta^*=\left(\tilde{X}^{\top} \tilde{X}\right)^{-1} \tilde{X}^{\top} \tilde{Y}\): simple, general, unique minimizer
Caveat: if \(\tilde{X}\) is not full column rank, then \(\tilde{X}^{\top} \tilde{X}\) is not invertible, the closed form does not apply, and the minimizer is not unique
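A minimal sketch of the closed form on synthetic data (all names and values are illustrative); np.linalg.solve is used rather than forming the inverse explicitly, and a rank-deficient case is included for contrast:

import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 3
X_tilde = rng.standard_normal((n, d))
theta_true = np.array([[1.0], [-2.0], [0.5]])
Y_tilde = X_tilde @ theta_true + 0.1 * rng.standard_normal((n, 1))

# theta* = (X~^T X~)^{-1} X~^T Y~, computed via solve instead of an explicit inverse
theta_star = np.linalg.solve(X_tilde.T @ X_tilde, X_tilde.T @ Y_tilde)

# caveat in action: duplicating a column makes X~ lose full column rank
X_bad = np.hstack([X_tilde, X_tilde[:, :1]])
print(np.linalg.matrix_rank(X_bad.T @ X_bad))  # prints 3, not 4: X~^T X~ is singular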
Quick Summary:
Typically, \(\tilde{X}^{\top} \tilde{X}\) is invertible, and \(\theta^*=\left(\tilde{X}^{\top} \tilde{X}\right)^{-1} \tilde{X}^{\top} \tilde{Y}\) is the unique minimizer 🥰
When \(\tilde{X}\) is not full column rank, \(\tilde{X}^{\top} \tilde{X}\) is not invertible and the closed form cannot be used 🥺
Cross-validation: a way to "reuse" data (good idea to shuffle the data first)
It's not a way to evaluate a hypothesis
Rather, it's a way to evaluate a learning algorithm (e.g. the choice of hypothesis class or hyperparameters)
Could e.g. have an outer loop for picking a good hyperparameter or hypothesis class (a minimal sketch follows below)
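A minimal sketch of k-fold cross-validation, assuming hypothetical train_fn / eval_fn callables supplied by the learning algorithm being evaluated (all names are illustrative):

import numpy as np

def cross_validate(X, Y, train_fn, eval_fn, n_folds=5, seed=0):
    # X: (d, n) data, Y: (1, n) labels
    # train_fn(X_train, Y_train) returns a hypothesis
    # eval_fn(hypothesis, X_test, Y_test) returns its error on held-out data
    d, n = X.shape
    idx = np.random.default_rng(seed).permutation(n)  # shuffle the data first
    folds = np.array_split(idx, n_folds)
    scores = []
    for i in range(n_folds):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        hyp = train_fn(X[:, train_idx], Y[:, train_idx])
        scores.append(eval_fn(hyp, X[:, test_idx], Y[:, test_idx]))
    return np.mean(scores)  # evaluates the learning algorithm, not any single hypothesis

An outer loop could call cross_validate once per candidate hyperparameter or hypothesis class and keep whichever scores best.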
We'd love to hear your thoughts.