introml-sp24-lec5

new

input $x$

new

prediction $y$

Testing

(predicting)

Recap:

- OLS can have analytical formula and "easy" prediction mechanism

- Regularization

- Cross-validation

- Gradient descent

z = \theta^{\top} x+\theta_0

z = \theta^{\top} x+\theta_0

\{x: \theta^{\top} x+\theta_0>0\}

\{x: \theta^{\top} x+\theta_0>0\}

\{x: \theta^{\top} x+\theta_0<0\}

\{x: \theta^{\top} x+\theta_0<0\}

\{x: \sigma(\theta^{\top} x+\theta_0)>0.5\}

\{x: \sigma(\theta^{\top} x+\theta_0)>0.5\}

\{x: \sigma(\theta^{\top} x+\theta_0)<0.5\}

\{x: \sigma(\theta^{\top} x+\theta_0)<0.5\}

(vanilla, sign-based)

linear classifier

linear

logistic regression (classifier)

Polynomial features for classification

Linearly separable in $\Phi(x) = x^2$ space (e.g., sign( $1.5-\Phi(x)$ ) is one such perfectly-separating classifier)

Not linearly separable in $x$ space

z = \theta^{\top} x+\theta_0

z = \theta^{\top} x+\theta_0

\{x: \theta^{\top} x+\theta_0>0\}

\{x: \theta^{\top} x+\theta_0>0\}

\{x: \theta^{\top} x+\theta_0<0\}

\{x: \theta^{\top} x+\theta_0<0\}

(vanilla, sign-based)

linear classifier

\{x: f(x)<0\}

\{x: f(x)<0\}

z = f(\Phi(x)) \\=\Phi_1+ \Phi_2 \\ = x_1^2 + x_2^2

z = f(\Phi(x)) \\=\Phi_1+ \Phi_2 \\ = x_1^2 + x_2^2

\{x: f(x)>0\}

\{x: f(x)>0\}

using

polynomial feature transformation

Polynomial features for regression

x

x

y

y

9 data points.
Each data has one-dimensional feature $x \in \mathbb{R}$
Label $y \in \mathbb{R}$

Choose $k = 1$
$h_{\theta}(x) = \theta_0 + \theta_1 x$
How many scalar parameters to learn?
2 scalar parameters

x

x

y

y

Choose $k = 4$
$h_{\theta}(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \theta_4 x^4$
How many scalar parameters to learn?
5 scalar parameters

9 data points.
Each data has one-dimensional feature $x \in \mathbb{R}$
Label $y \in \mathbb{R}$

Polynomial features for regression

x

x

y

y

Choose $k = 10$
$h_{\theta}(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \dots \theta_k x^k$
How many scalar parameters to learn?
11 scalar parameters

The fit is perfect but "wild" (compared with the true function).
Overfitting.
It occurs when we have too expressive of a model (e.g., too many learnable parameters, too few data points to pin these parameters down).

Polynomial features for regression

First, need goal & data. E.g. diagnose whether people have heart disease based on their available information

y^{(1)}

y^{(1)}

y^{(2)}

y^{(2)}

x^{(1)}

x^{(1)}

x^{(2)}

x^{(2)}

\dots

\dots

\dots

\dots

)

)

Resting heart rate and income are real numbers already
Can directly use, but may not want to (see next slide)

\theta_{\substack{\text { heart } \\ \text { rate } }}x_{\substack{\text { heart } \\ \text { rate }}}

\theta_{\substack{\text { heart } \\ \text { rate } }}x_{\substack{\text { heart } \\ \text { rate }}}

\theta_{\substack{\text {pain} \\ \text {} }}x_{\substack{\text {pain } \\ \text {}}}

\theta_{\substack{\text {pain} \\ \text {} }}x_{\substack{\text {pain } \\ \text {}}}

\theta_{\substack{\text {job} \\ \text {} }}x_{\substack{\text {job} \\ \text {}}}

\theta_{\substack{\text {job} \\ \text {} }}x_{\substack{\text {job} \\ \text {}}}

\theta_{\substack{\text {pill} \\ \text {} }}x_{\substack{\text {pill} \\ \text {}}}

\theta_{\substack{\text {pill} \\ \text {} }}x_{\substack{\text {pill} \\ \text {}}}

\theta_{\substack{\text {age} \\ \text {} }}x_{\substack{\text {age} \\ \text {}}}

\theta_{\substack{\text {age} \\ \text {} }}x_{\substack{\text {age} \\ \text {}}}

\theta_{\substack{\text {income} \\ \text {} }}x_{\substack{\text {income} \\ \text {}}}

\theta_{\substack{\text {income} \\ \text {} }}x_{\substack{\text {income} \\ \text {}}}

+

+

+

+

+

+

+

+

+

+

y_{\substack{\text { heart } \\ \text {disease}}} = \text{sign}(

y_{\substack{\text { heart } \\ \text {disease}}} = \text{sign}(

)

)

\theta_{\substack{\text { heart } \\ \text { rate } }}x_{\substack{\text { heart } \\ \text { rate }}}

\theta_{\substack{\text { heart } \\ \text { rate } }}x_{\substack{\text { heart } \\ \text { rate }}}

\theta_{\substack{\text {pain} \\ \text {} }}x_{\substack{\text {pain } \\ \text {}}}

\theta_{\substack{\text {pain} \\ \text {} }}x_{\substack{\text {pain } \\ \text {}}}

\theta_{\substack{\text {job} \\ \text {} }}x_{\substack{\text {job} \\ \text {}}}

\theta_{\substack{\text {job} \\ \text {} }}x_{\substack{\text {job} \\ \text {}}}

\theta_{\substack{\text {pill} \\ \text {} }}x_{\substack{\text {pill} \\ \text {}}}

\theta_{\substack{\text {pill} \\ \text {} }}x_{\substack{\text {pill} \\ \text {}}}

\theta_{\substack{\text {age} \\ \text {} }}x_{\substack{\text {age} \\ \text {}}}

\theta_{\substack{\text {age} \\ \text {} }}x_{\substack{\text {age} \\ \text {}}}

\theta_{\substack{\text {income} \\ \text {} }}x_{\substack{\text {income} \\ \text {}}}

\theta_{\substack{\text {income} \\ \text {} }}x_{\substack{\text {income} \\ \text {}}}

+

+

+

+

+

+

+

+

+

+

y_{\substack{\text { heart } \\ \text {disease}}} = \text{sign}(

y_{\substack{\text { heart } \\ \text {disease}}} = \text{sign}(

)

)

\theta_{\substack{\text { heart } \\ \text { rate } }}x_{\substack{\text { heart } \\ \text { rate }}}

\theta_{\substack{\text { heart } \\ \text { rate } }}x_{\substack{\text { heart } \\ \text { rate }}}

\theta_{\substack{\text {pain} \\ \text {} }}x_{\substack{\text {pain } \\ \text {}}}

\theta_{\substack{\text {pain} \\ \text {} }}x_{\substack{\text {pain } \\ \text {}}}

\theta_{\substack{\text {job} \\ \text {} }}x_{\substack{\text {job} \\ \text {}}}

\theta_{\substack{\text {job} \\ \text {} }}x_{\substack{\text {job} \\ \text {}}}

\theta_{\substack{\text {pill} \\ \text {} }}x_{\substack{\text {pill} \\ \text {}}}

\theta_{\substack{\text {pill} \\ \text {} }}x_{\substack{\text {pill} \\ \text {}}}

\theta_{\substack{\text {age} \\ \text {} }}x_{\substack{\text {age} \\ \text {}}}

\theta_{\substack{\text {age} \\ \text {} }}x_{\substack{\text {age} \\ \text {}}}

\theta_{\substack{\text {income} \\ \text {} }}x_{\substack{\text {income} \\ \text {}}}

\theta_{\substack{\text {income} \\ \text {} }}x_{\substack{\text {income} \\ \text {}}}

+

+

+

+

+

+

+

+

+

+

y_{\substack{\text { heart } \\ \text {disease}}} = \text{sign}(

y_{\substack{\text { heart } \\ \text {disease}}} = \text{sign}(

What about jobs?

\theta_{\substack{\text { heart } \\ \text { rate } }}x_{\substack{\text { heart } \\ \text { rate }}}

\theta_{\substack{\text { heart } \\ \text { rate } }}x_{\substack{\text { heart } \\ \text { rate }}}

\theta_{\substack{\text {pain} \\ \text {} }}x_{\substack{\text {pain } \\ \text {}}}

\theta_{\substack{\text {pain} \\ \text {} }}x_{\substack{\text {pain } \\ \text {}}}

\theta_{\substack{\text {job} \\ \text {} }}x_{\substack{\text {job} \\ \text {}}}

\theta_{\substack{\text {job} \\ \text {} }}x_{\substack{\text {job} \\ \text {}}}

\theta_{\substack{\text {pill} \\ \text {} }}x_{\substack{\text {pill} \\ \text {}}}

\theta_{\substack{\text {pill} \\ \text {} }}x_{\substack{\text {pill} \\ \text {}}}

\theta_{\substack{\text {age} \\ \text {} }}x_{\substack{\text {age} \\ \text {}}}

\theta_{\substack{\text {age} \\ \text {} }}x_{\substack{\text {age} \\ \text {}}}

\theta_{\substack{\text {income} \\ \text {} }}x_{\substack{\text {income} \\ \text {}}}

\theta_{\substack{\text {income} \\ \text {} }}x_{\substack{\text {income} \\ \text {}}}

+

+

+

+

+

+

+

+

+

+

y_{\substack{\text { heart } \\ \text {disease}}} = \text{sign}(

y_{\substack{\text { heart } \\ \text {disease}}} = \text{sign}(

)

)

Problem with this idea:
- Ordering would matter
- Incremental in the "job" would matter (by a fixed $\theta_{\text {job }}$ amount)

\theta_{\substack{\text { heart } \\ \text { rate } }}x_{\substack{\text { heart } \\ \text { rate }}}

\theta_{\substack{\text { heart } \\ \text { rate } }}x_{\substack{\text { heart } \\ \text { rate }}}

\theta_{\substack{\text {pain} \\ \text {} }}x_{\substack{\text {pain } \\ \text {}}}

\theta_{\substack{\text {pain} \\ \text {} }}x_{\substack{\text {pain } \\ \text {}}}

\theta_{\substack{\text {job} \\ \text {} }}x_{\substack{\text {job} \\ \text {}}}

\theta_{\substack{\text {job} \\ \text {} }}x_{\substack{\text {job} \\ \text {}}}

\theta_{\substack{\text {pill} \\ \text {} }}x_{\substack{\text {pill} \\ \text {}}}

\theta_{\substack{\text {pill} \\ \text {} }}x_{\substack{\text {pill} \\ \text {}}}

\theta_{\substack{\text {age} \\ \text {} }}x_{\substack{\text {age} \\ \text {}}}

\theta_{\substack{\text {age} \\ \text {} }}x_{\substack{\text {age} \\ \text {}}}

\theta_{\substack{\text {income} \\ \text {} }}x_{\substack{\text {income} \\ \text {}}}

\theta_{\substack{\text {income} \\ \text {} }}x_{\substack{\text {income} \\ \text {}}}

+

+

+

+

+

+

+

+

+

+

y_{\substack{\text { heart } \\ \text {disease}}} = \text{sign}(

y_{\substack{\text { heart } \\ \text {disease}}} = \text{sign}(

)

)

\theta_{\text {job1}} \phi_{\text {job1}} + \theta_{\text {job2}} \phi_{\text {job2}} + \theta_{\text {job3}} \phi_{\text {job3}} + \theta_{\text {job4}} \phi_{\text {job4}} +\theta_{\text {job5}} \phi_{\text {job5}}

\theta_{\text {job1}} \phi_{\text {job1}} + \theta_{\text {job2}} \phi_{\text {job2}} + \theta_{\text {job3}} \phi_{\text {job3}} + \theta_{\text {job4}} \phi_{\text {job4}} +\theta_{\text {job5}} \phi_{\text {job5}}

Better idea: One-hot encoding

)

)

\theta_{\substack{\text { heart } \\ \text { rate } }}x_{\substack{\text { heart } \\ \text { rate }}}

\theta_{\substack{\text { heart } \\ \text { rate } }}x_{\substack{\text { heart } \\ \text { rate }}}

\theta_{\substack{\text {pain} \\ \text {} }}x_{\substack{\text {pain } \\ \text {}}}

\theta_{\substack{\text {pain} \\ \text {} }}x_{\substack{\text {pain } \\ \text {}}}

\theta_{\substack{\text {job} \\ \text {} }}x_{\substack{\text {job} \\ \text {}}}

\theta_{\substack{\text {job} \\ \text {} }}x_{\substack{\text {job} \\ \text {}}}

\theta_{\substack{\text {pill} \\ \text {} }}x_{\substack{\text {pill} \\ \text {}}}

\theta_{\substack{\text {pill} \\ \text {} }}x_{\substack{\text {pill} \\ \text {}}}

\theta_{\substack{\text {age} \\ \text {} }}x_{\substack{\text {age} \\ \text {}}}

\theta_{\substack{\text {age} \\ \text {} }}x_{\substack{\text {age} \\ \text {}}}

\theta_{\substack{\text {income} \\ \text {} }}x_{\substack{\text {income} \\ \text {}}}

\theta_{\substack{\text {income} \\ \text {} }}x_{\substack{\text {income} \\ \text {}}}

+

+

+

+

+

+

+

+

+

+

y_{\substack{\text { heart } \\ \text {disease}}} = \text{sign}(

y_{\substack{\text { heart } \\ \text {disease}}} = \text{sign}(

\theta_{\text {job1}} \phi_{\text {job1}} + \theta_{\text {job2}} \phi_{\text {job2}} + \theta_{\text {job3}} \phi_{\text {job3}} + \theta_{\text {job4}} \phi_{\text {job4}} +\theta_{\text {job5}} \phi_{\text {job5}}

\theta_{\text {job1}} \phi_{\text {job1}} + \theta_{\text {job2}} \phi_{\text {job2}} + \theta_{\text {job3}} \phi_{\text {job3}} + \theta_{\text {job4}} \phi_{\text {job4}} +\theta_{\text {job5}} \phi_{\text {job5}}

\theta_{\text {combo1}} \phi_{\text {combo1}} + \theta_{\text {combo2}} \phi_{\text {combo2}} + \theta_{\text {combo3}} \phi_{\text {combo3}} + \theta_{\text {combo4}} \phi_{\text {combo4}}

\theta_{\text {combo1}} \phi_{\text {combo1}} + \theta_{\text {combo2}} \phi_{\text {combo2}} + \theta_{\text {combo3}} \phi_{\text {combo3}} + \theta_{\text {combo4}} \phi_{\text {combo4}}

\theta_{\text {combo1}} \phi_{\text {combo1}} + \theta_{\text {combo2}} \phi_{\text {combo2}} + \theta_{\text {combo3}} \phi_{\text {combo3}} + \theta_{\text {combo4}} \phi_{\text {combo4}}

\theta_{\text {combo1}} \phi_{\text {combo1}} + \theta_{\text {combo2}} \phi_{\text {combo2}} + \theta_{\text {combo3}} \phi_{\text {combo3}} + \theta_{\text {combo4}} \phi_{\text {combo4}}

Better idea: factored encoding

\theta_{\text {pain-pill}} \phi_{\text {pain-pill}} + \theta_{\text {beta-pill}} \phi_{\text {beta-pill}}

\theta_{\text {pain-pill}} \phi_{\text {pain-pill}} + \theta_{\text {beta-pill}} \phi_{\text {beta-pill}}

Recall, if used one-hot, need exact combo in data to learn corresponding parameter

Thermometer encoding

Numerical data: order on data values, and differences in value are meaningful
Categorical data: no order on data values, one-hot
Ordinal data: order on data values, but differences not meaningful

Strongly disagree	Disagree	Neutral	Agree	Strongly agree
1	2	3	4	5

Strongly disagree	Disagree	Neutral	Agree	Strongly agree
1,0,0,0,0	1,1,0,0,0	1,1,1,0,0	1,1,1,1,0	1,1,1,1,1

\theta_{\text {strong-disagree-base}} \phi_{\text {strong-disagree-base}} +

\theta_{\text {strong-disagree-base}} \phi_{\text {strong-disagree-base}} +

\theta_{\text {slightly-more-agreement}} \phi_{\text {slightly-more-agreement}} +

\theta_{\text {slightly-more-agreement}} \phi_{\text {slightly-more-agreement}} +

\theta_{\text {from-disagree-to-neutral}} \phi_{\text {from-disagree-to-neutral}} +

\theta_{\text {from-disagree-to-neutral}} \phi_{\text {from-disagree-to-neutral}} +

\theta_{\text {from-neutral-to-agree}} \phi_{\text {from-neutral-to-agree}} +

\theta_{\text {from-neutral-to-agree}} \phi_{\text {from-neutral-to-agree}} +

\theta_{\text {from-agree-to-strongly-agree}} \phi_{\text {from-agree-to-strongly-agree}}

\theta_{\text {from-agree-to-strongly-agree}} \phi_{\text {from-agree-to-strongly-agree}}

\theta_{\text {how-agreed}} \phi_{\text {how-agreed}}

\theta_{\text {how-agreed}} \phi_{\text {how-agreed}}

Intro to Machine Learning

Lecture 5: Features

Midterm exam heads-up

Outline

Outline

Polynomial features for classification

Polynomial Basis Construction

Polynomial features for regression

Polynomial features for regression

Polynomial features for regression

Polynomial features for regression

Polynomial features for regression

Polynomial features for regression

Polynomial features for regression

Polynomial features for regression

Polynomial features for regression

Polynomial features for regression

Quick summary

Outline

A more-complete/realistic ML analysis

A more-complete/realistic ML analysis

Encode data in usable form

Encoding numerical data

Encoding numerical data

Encoding numerical data

Better idea: One-hot encoding

Better idea: factored encoding

Thermometer encoding

Thermometer encoding

Thermometer encoding

Summary

Thanks!

introml-sp24-lec5

introml-sp24-lec5

Shen Shen

Intro to Machine Learning

Lecture 5: Features

introml-sp24-lec5

More from Shen Shen