Lecture 5: Features

 

Shen Shen

Sept 27, 2024

Intro to Machine Learning

Outline

  • Recap, linear models and beyond
  • Systematic feature transformations
    • Polynomial features
    • Expressive power
  • Hand-crafting features
    • One-hot
    • Factored
    • Standardization/normalization
    • Thermometer

Recap: linear regressor $y = \theta^{\top} x+\theta_0$

The regressor is linear in the feature $x$.

Recap: linear (sign-based) classifier, with $z = \theta^{\top} x+\theta_0$

  • Predict positive on $\{x: \theta^{\top} x+\theta_0>0\}$
  • Predict negative on $\{x: \theta^{\top} x+\theta_0<0\}$
  • Separator: $\{x: \theta^{\top} x+\theta_0 = 0\}$

The separator is linear in the feature $x$.

Recap: linear logistic classifier $g(x)=\sigma\left(\theta^{\top} x+\theta_0\right)$

  • Predict positive on $\{x: \sigma(\theta^{\top} x+\theta_0)>0.5\}$
  • Predict negative on $\{x: \sigma(\theta^{\top} x+\theta_0)<0.5\}$
  • Separator: $\{x: \theta^{\top} x+\theta_0 = 0\}$

The separator is linear in the feature $x$.

Image classification played a pivotal role in kicking off the current wave of AI enthusiasm.

Linear classification played a pivotal role in kicking off the first wave of AI enthusiasm.

Not linearly separable.

Linear tools cannot, by themselves, solve interesting tasks.

Many cool ideas can "help out" linear tools. We'll focus on one today.

old features $x \in \mathbb{R}^d$ $\longrightarrow$ non-linear transformation $\longrightarrow$ new features $\phi(x) \in \mathbb{R}^{d'}$

The model below is linear in $\phi$ but non-linear in $x$:

$\theta_1\phi_1(x) + \theta_2\phi_2(x) + \dots + \theta_{d'}\phi_{d'}(x)$

Not linearly separable in $x$ space.

Linearly separable in $\phi(x) = x^2$ space.

[Figure: the data on an $x$ axis (ticks from $-3$ to $9$) and, after the transform via $\phi(x) = x^2$, on a $\phi(x)$ axis (ticks from $-3$ to $9$).]

Linearly separated in $\phi(x) = x^2$ space, e.g. predict positive if $\phi \geq 3$.

Non-linearly separated in $x$ space, e.g. predict positive if $x^2 \geq 3$.

[Figure: the same data on the $x$ axis and, after the transform via $\phi(x) = x^2$, on the $\phi(x)$ axis, with the separator drawn in both spaces.]
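
The threshold rule above can be sketched in a few lines. The data points and labels here are made up for illustration (they are not read off the figure), and `phi` and `predict` are hypothetical names; only the transform $\phi(x) = x^2$ and the threshold 3 come from the slide.

```python
def phi(x):
    """Feature transformation from the slide: phi(x) = x^2."""
    return x ** 2


def predict(x, threshold=3.0):
    """Linear rule in phi space: predict +1 if phi(x) >= threshold, else -1."""
    return 1 if phi(x) >= threshold else -1


# Made-up 1-D data: negatives sit near the origin, positives on both sides,
# so no single threshold on x itself can separate them.
xs = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [1, 1, -1, -1, -1, 1, 1]

predictions = [predict(x) for x in xs]
print(predictions == ys)
```

The rule is a single threshold on $\phi$, yet in $x$ space it carves out two disconnected positive regions.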

$z = \phi_1 + \phi_2$, with $\phi_1 = x_1^2$ and $\phi_2 = x_2^2$, i.e. $z = x_1^2 + x_2^2$

[Figure: data in the $(x_1, x_2)$ plane, with regions labeled $\{ x: x_1^2+x_2^2>0\}$ and $\{ x: x_1^2+x_2^2<0\}$.]

Systematic polynomial features construction

| | $d = 1$ | $d = 2$ | $\dots$ |
|---|---|---|---|
| $k = 0$ | $1$ | $1$ | $\dots$ |
| $k = 1$ | $1, x_{1}$ | $1, x_{1}, x_{2}$ | $\dots$ |
| $k = 2$ | $1, x_{1}, x_{1}^{2}$ | $1, x_{1}, x_{2}, x_{1}^{2}, x_{1}x_{2}, x_{2}^{2}$ | $\dots$ |
| $k = 3$ | $1, x_{1}, x_{1}^{2}, x_{1}^{3}$ | $1, x_{1}, x_{2}, x_{1}^{2}, x_{1}x_{2}, x_{2}^{2}, x_{1}^{3}, x_{1}^{2}x_{2}, x_{1}x_{2}^{2}, x_{2}^{3}$ | $\dots$ |

  • Elements in the basis are the monomials of the original features with total degree up to $k$.
  • With a given $d$ and a fixed $k$, the basis is fixed.
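
One way to generate this basis programmatically is sketched below. The function name `polynomial_features` and the sample input are assumptions made for illustration; the ordering (by increasing degree) is chosen to match the table above.

```python
from itertools import combinations_with_replacement
from math import comb, prod


def polynomial_features(x, k):
    """Return all monomials of the entries of x with total degree <= k.

    Features are ordered by degree: the constant 1 first, then the
    degree-1 terms, then the degree-2 terms, and so on.
    """
    features = []
    for degree in range(k + 1):
        # Each multiset of `degree` indices picks one monomial,
        # e.g. indices (0, 1) give x_1 * x_2.
        for idx in combinations_with_replacement(range(len(x)), degree):
            features.append(prod(x[i] for i in idx))
    return features


x = [2.0, 3.0]                      # d = 2
print(polynomial_features(x, 2))    # order: 1, x1, x2, x1^2, x1*x2, x2^2

# The basis size is the number of monomials of total degree <= k in d variables.
d, k = 2, 3
print(len(polynomial_features([1.0] * d, k)), comb(d + k, k))
```

The count $\binom{d+k}{k}$ grows quickly in both $d$ and $k$, which is one reason the order $k$ has to be chosen with care.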

9 data points; each has feature $x \in \mathbb{R}$ and label $y \in \mathbb{R}$.

  • Choose $k = 1$
  • New features $\phi=[1; x]$
  • $h(x; \theta) = \theta_0 + \theta_1 x$
  • Learn 2 parameters for a linear function

  • Choose $k = 2$
  • New features $\phi=[1; x; x^2]$
  • $h(x; \theta) = \theta_0 + \theta_1 x + \theta_2 x^2$
  • Learn 3 parameters for a quadratic function

  • Choose $k = 5$
  • New features $\phi=[1; x; x^2; x^3; x^4; x^5]$
  • $h(x; \theta) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \theta_4 x^4 + \theta_5 x^5$
  • Learn 6 parameters for a degree-5 polynomial function

[Figures: each fit plotted as $y$ against $x$; further fits are shown for $k=7$, $k=8$, and $k=10$.]

  • Underfitting ($k=1$): high error on train set, high error on test set
  • Appropriate model ($k=2$): low error on train set, low error on test set
  • Overfitting ($k=10$): very low error on train set, very high error on test set

  • $k$ is a hyperparameter that controls the capacity (expressiveness) of the hypothesis class.
  • Complex models with many rich features and free parameters have high capacity.
  • How to choose $k$? Validation/cross-validation.
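
To see why high capacity drives training error down: with 9 data points, a polynomial of order $k = 8$ has 9 free parameters and can pass through every training point exactly. The sketch below shows this using the Lagrange form of the interpolating polynomial, not a least-squares fit on monomial features; the data values are made up for illustration and `interpolate` is a hypothetical helper.

```python
def interpolate(xs, ys, x):
    """Evaluate at x the unique polynomial of degree <= len(xs) - 1
    that passes through all points (xs[i], ys[i]); xs must be distinct."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        weight = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                weight *= (x - xj) / (xi - xj)
        total += yi * weight
    return total


# Made-up training set: 9 points, roughly quadratic plus noise.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
train_y = [0.3, 0.8, 4.5, 8.6, 16.4, 24.7, 36.5, 48.6, 64.4]

train_errors = [abs(interpolate(train_x, train_y, x) - y)
                for x, y in zip(train_x, train_y)]
print(max(train_errors))  # training error is zero up to floating-point rounding
```

Zero training error says nothing about test error, which is why $k$ is chosen by validation and not by training fit.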

Similar overfitting can happen in classification

Using polynomial features of order 3

Quick summary

  • Linear models are mathematically and algorithmically convenient but not expressive enough -- by themselves -- for most jobs.
  • We can express really rich hypothesis classes by performing a fixed non-linear feature transformation first, then applying our linear regression or classification methods.
  • We can think of fixed transformations as "adapters", enabling us to use old tools in broader situations.
  • Standard feature transformations: polynomials, radial basis functions, absolute-value functions.
  • Historically, for a period of time, the gist of ML boiled down to "feature engineering".
  • Nowadays, neural networks can automatically extract features.

A more realistic ML analysis

1. Establish a high-level goal, and find good data.

2. Encode data in useful form for the ML algorithm. 

3. Choose a loss, and a regularizer. Write an objective function to optimize.

4. Optimize the objective function & return a hypothesis.

5. Evaluate, validate, interpret, revisit or revise previous steps as needed. 

So far we've focused on steps 3 and 4 only.

Step 2: encode data in a useful form for the ML algorithm.

  • Identify relevant info and encode it as real numbers.
  • Encode in a way that's reasonable for the task.

Example: diagnose whether people have heart disease based on their available info.

Each row is one data point. For p1, the first column is the label $y^{(1)}$ and the remaining columns are the features $x^{(1)}$.

| | has heart disease? | pain? | job | medicines | resting heart rate (bpm) | family income (USD) |
|---|---|---|---|---|---|---|
| p1 | no | no | nurse | aspirin | 55 | 133000 |
| p2 | no | no | admin | beta blockers, aspirin | 71 | 34000 |
| p3 | yes | yes | nurse | beta blockers | 89 | 40000 |
| p4 | no | no | doctor | none | 67 | 120000 |
  • Go collect training data.
  • Turn binary labels into {0,1}; save the mapping to recover predictions for new points.

encoding = {"yes": 1, "no": 0}

| | has heart disease? | pain? | job | medicines | resting heart rate (bpm) | family income (USD) |
|---|---|---|---|---|---|---|
| p1 | 0 | no | nurse | aspirin | 55 | 133000 |
| p2 | 0 | no | admin | beta blockers, aspirin | 71 | 34000 |
| p3 | 1 | yes | nurse | beta blockers | 89 | 40000 |
| p4 | 0 | no | doctor | none | 67 | 120000 |
Risk factor:

$z = \theta_{\text{heart rate}} x_{\text{heart rate}} + \theta_{\text{pain}} x_{\text{pain}} + \theta_{\text{job}} x_{\text{job}} + \theta_{\text{pill}} x_{\text{pill}} + \theta_{\text{income}} x_{\text{income}}$

prob(heart disease): $\sigma(z)$

  • Encode binary feature answers as {0,1}; this has a nice interpretation.

encoding = {"yes": 1, "no": 0}

| | pain? | job | medicines | resting heart rate (bpm) | family income (USD) |
|---|---|---|---|---|---|
| p1 | 0 | nurse | aspirin | 55 | 133000 |
| p2 | 0 | admin | beta blockers, aspirin | 71 | 34000 |
| p3 | 1 | nurse | beta blockers | 89 | 40000 |
| p4 | 0 | doctor | none | 67 | 120000 |

$z = \theta_{\text{heart rate}} x_{\text{heart rate}} + \theta_{\text{pain}} x_{\text{pain}} + \theta_{\text{job}} x_{\text{job}} + \theta_{\text{pill}} x_{\text{pill}} + \theta_{\text{income}} x_{\text{income}}$

A person feeling pain has

$z = \theta_{\text{heart rate}} x_{\text{heart rate}} + \theta_{\text{pain}} + \theta_{\text{job}} x_{\text{job}} + \theta_{\text{pill}} x_{\text{pill}} + \theta_{\text{income}} x_{\text{income}}$

A person not feeling pain has

$z = \theta_{\text{heart rate}} x_{\text{heart rate}} + \theta_{\text{job}} x_{\text{job}} + \theta_{\text{pill}} x_{\text{pill}} + \theta_{\text{income}} x_{\text{income}}$

So $\theta_{\text{pain}}$ is exactly the change in $z$ associated with feeling pain, all else being equal.

😍
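
A minimal numeric sketch of this interpretation follows. The parameter values are hypothetical, chosen only for illustration, and `theta_rest` is an assumed stand-in for the sum of all the other $\theta x$ terms, held fixed between the two people.

```python
import math


def sigmoid(z):
    """Logistic function, mapping a real-valued score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical parameter values, chosen only for illustration.
theta_pain = 1.2
theta_rest = -0.5   # stands in for the sum of all the other theta * x terms

encoding = {"yes": 1, "no": 0}

z_pain = theta_rest + theta_pain * encoding["yes"]
z_no_pain = theta_rest + theta_pain * encoding["no"]

print(z_pain - z_no_pain)                    # the gap is theta_pain (up to rounding)
print(sigmoid(z_no_pain), sigmoid(z_pain))   # corresponding predicted probabilities
```

With a positive `theta_pain`, the person feeling pain gets the higher predicted probability; a negative value would reverse that.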

For "jobs", if we use natural number encoding:

encoding = {"nurse": 1, "admin": 2, "pharmacist": 3, "doctor": 4, "social worker": 5}

$z = \theta_{\text{heart rate}} x_{\text{heart rate}} + \theta_{\text{pain}} x_{\text{pain}} + \theta_{\text{job}} x_{\text{job}} + \theta_{\text{pill}} x_{\text{pill}} + \theta_{\text{income}} x_{\text{income}}$

A nurse has

$z = \theta_{\text{heart rate}} x_{\text{heart rate}} + \theta_{\text{pain}} x_{\text{pain}} + \theta_{\text{job}} + \theta_{\text{pill}} x_{\text{pill}} + \theta_{\text{income}} x_{\text{income}}$

An admin has

$z = \theta_{\text{heart rate}} x_{\text{heart rate}} + \theta_{\text{pain}} x_{\text{pain}} + 2\theta_{\text{job}} + \theta_{\text{pill}} x_{\text{pill}} + \theta_{\text{income}} x_{\text{income}}$

A pharmacist has

$z = \theta_{\text{heart rate}} x_{\text{heart rate}} + \theta_{\text{pain}} x_{\text{pain}} + 3\theta_{\text{job}} + \theta_{\text{pill}} x_{\text{pill}} + \theta_{\text{income}} x_{\text{income}}$

Problem with this idea: 🥺

  • Ordering matters.
  • Each increment in job category changes $z$ by a fixed amount $\theta_{\text{job}}$.
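
A small sketch of the issue: with a single parameter, the job term of $z$ is forced to grow in equal steps along an arbitrary ordering of the jobs. The value of `theta_job` is hypothetical; any single number shows the same effect.

```python
encoding = {"nurse": 1, "admin": 2, "pharmacist": 3, "doctor": 4, "social worker": 5}

theta_job = 0.4   # hypothetical value, for illustration only

# Contribution of the job term to z for each job.
contribution = {job: theta_job * code for job, code in encoding.items()}

# Difference in contribution between consecutive jobs in the encoding order.
jobs = list(encoding)
steps = [contribution[b] - contribution[a] for a, b in zip(jobs, jobs[1:])]
print(steps)  # every step between consecutive jobs is theta_job, up to rounding
```

No choice of `theta_job` can express, say, that admins and doctors share one risk level while nurses have another.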

one_hot_encoding = {
  "nurse":         [1, 0, 0, 0, 0], # Φ{job1}
  "admin":         [0, 1, 0, 0, 0], # Φ{job2}
  "pharmacist":    [0, 0, 1, 0, 0], # Φ{job3}
  "doctor":        [0, 0, 0, 1, 0], # Φ{job4}
  "social_worker": [0, 0, 0, 0, 1]} # Φ{job5}
A nurse has

$z = \theta_{\text{heart rate}} x_{\text{heart rate}} + \theta_{\text{pain}} x_{\text{pain}} + \theta_{\text{job1}} + \theta_{\text{pill}} x_{\text{pill}} + \theta_{\text{income}} x_{\text{income}}$

An admin has

$z = \theta_{\text{heart rate}} x_{\text{heart rate}} + \theta_{\text{pain}} x_{\text{pain}} + \theta_{\text{job2}} + \theta_{\text{pill}} x_{\text{pill}} + \theta_{\text{income}} x_{\text{income}}$

A pharmacist has

$z = \theta_{\text{heart rate}} x_{\text{heart rate}} + \theta_{\text{pain}} x_{\text{pain}} + \theta_{\text{job3}} + \theta_{\text{pill}} x_{\text{pill}} + \theta_{\text{income}} x_{\text{income}}$

😍

In general,

$z = \theta_{\text{heart rate}} x_{\text{heart rate}} + \theta_{\text{pain}} x_{\text{pain}} + \theta_{\text{job}}^{\top} x_{\text{job}} + \theta_{\text{pill}} x_{\text{pill}} + \theta_{\text{income}} x_{\text{income}}$

where

$\theta_{\text{job}}^{\top} x_{\text{job}} = \theta_{\text{job1}} \phi_{\text{job1}} + \theta_{\text{job2}} \phi_{\text{job2}} + \theta_{\text{job3}} \phi_{\text{job3}} + \theta_{\text{job4}} \phi_{\text{job4}} + \theta_{\text{job5}} \phi_{\text{job5}}$

| | pain? | job | medicines | resting heart rate (bpm) | family income (USD) |
|---|---|---|---|---|---|
| p1 | 0 | [1,0,0,0,0] | aspirin | 55 | 133000 |
| p2 | 0 | [0,1,0,0,0] | beta blockers, aspirin | 71 | 34000 |
| p3 | 1 | [1,0,0,0,0] | beta blockers | 89 | 40000 |
| p4 | 0 | [0,0,0,1,0] | none | 67 | 120000 |
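
The one-hot mapping can also be computed on the fly from a fixed category list, as in this sketch. The helper name `one_hot` and the parameter values in `theta_job` are assumptions made for illustration.

```python
JOBS = ["nurse", "admin", "pharmacist", "doctor", "social_worker"]


def one_hot(value, categories):
    """Return a list with a single 1 at the position of `value` in `categories`."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if c == value else 0 for c in categories]


# Hypothetical learned parameters, one per job.
theta_job = [0.3, -0.1, 0.0, -0.4, 0.2]

phi = one_hot("admin", JOBS)
job_term = sum(t * p for t, p in zip(theta_job, phi))
print(phi, job_term)  # the dot product picks out theta_job[1], the admin parameter
```

Raising an error on unknown categories is one design choice; another is to reserve an extra "other" slot, at the cost of one more parameter.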

For medicines, it is hopefully obvious why natural number encoding isn't a good idea.

What about one-hot encoding?

one_hot_encoding = {
  "aspirin":      [1, 0, 0, 0], #Φ{combo1}
  "aspirin & bb": [0, 1, 0, 0], #Φ{combo2}
  "bb":           [0, 0, 1, 0], #Φ{combo3}
  "none":         [0, 0, 0, 1]} #Φ{combo4}

$z = \theta_{\text{heart rate}} x_{\text{heart rate}} + \theta_{\text{pain}} x_{\text{pain}} + \theta_{\text{job}}^{\top} x_{\text{job}} + \theta_{\text{pill}}^{\top} x_{\text{pill}} + \theta_{\text{income}} x_{\text{income}}$

where

$\theta_{\text{pill}}^{\top} x_{\text{pill}} = \theta_{\text{combo1}} \phi_{\text{combo1}} + \theta_{\text{combo2}} \phi_{\text{combo2}} + \theta_{\text{combo3}} \phi_{\text{combo3}} + \theta_{\text{combo4}} \phi_{\text{combo4}}$

Problems: 🥺

  • The natural "association" among combo1, combo2, and combo3 is lost.
  • Also, if a combo is very rare (which happens), say only 1 out of 1k surveyed people took combo2, then it is very hard to learn a meaningful $\theta_{\text{combo2}}$.

factored_encoding = {
  # encode as answer to
  # [taking aspirin?, taking bb?]
  # [Φ{aspirin}, Φ{bb}]
    "aspirin":      [1, 0],
    "aspirin & bb": [1, 1], 
    "bb":           [0, 1], 
    "none":         [0, 0]}
$z = \theta_{\text{heart rate}} x_{\text{heart rate}} + \theta_{\text{pain}} x_{\text{pain}} + \theta_{\text{job}}^{\top} x_{\text{job}} + \theta_{\text{pill}}^{\top} x_{\text{pill}} + \theta_{\text{income}} x_{\text{income}}$

where

$\theta_{\text{pill}}^{\top} x_{\text{pill}} = \theta_{\text{aspirin}} \phi_{\text{aspirin}} + \theta_{\text{beta-blockers}} \phi_{\text{beta-blockers}}$

Each medicine now gets its own parameter, shared across every combo that includes it.

😍

| | pain? | job | medicines | resting heart rate (bpm) | family income (USD) |
|---|---|---|---|---|---|
| p1 | 0 | [1,0,0,0,0] | [1,0] | 55 | 133000 |
| p2 | 0 | [0,1,0,0,0] | [1,1] | 71 | 34000 |
| p3 | 1 | [1,0,0,0,0] | [0,1] | 89 | 40000 |
| p4 | 0 | [0,0,0,1,0] | [0,0] | 67 | 120000 |
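
The medicines column above can be produced by asking one yes/no question per medicine, as in this sketch. The helper name `factored` and the use of Python sets to hold each patient's medicines are assumptions made for illustration.

```python
MEDICINES = ["aspirin", "beta blockers"]


def factored(taken, options=MEDICINES):
    """One 0/1 entry per medicine: is this medicine in the set `taken`?"""
    return [1 if m in taken else 0 for m in options]


# Medicines per patient, as listed in the original table.
rows = {
    "p1": {"aspirin"},
    "p2": {"beta blockers", "aspirin"},
    "p3": {"beta blockers"},
    "p4": set(),
}

encoded = {pid: factored(meds) for pid, meds in rows.items()}
print(encoded)
```

Adding a third medicine adds one feature here, whereas one-hot encoding over combos would need a slot for every possible combination.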

🥺 The numerical features are on very different scales:

| | resting heart rate (bpm) | family income (USD) |
|---|---|---|
| p1 | 55 | 133000 |
| p2 | 71 | 34000 |
| p3 | 89 | 40000 |
| p4 | 67 | 120000 |

[Figure: two axes, one with ticks from 30k to 34k and one with ticks from $-2$k to $2$k.]

  • Idea: standardize numerical data. For the $i$th feature and data point $j$:

$\phi_i^{(j)}=\frac{x_i^{(j)}-\operatorname{mean}_i}{\operatorname{std dev}_i}$

  • It may also be easier to visualize and interpret learned parameters if we standardize data.

😍

| | pain? | job | medicines | resting heart rate (bpm) | family income (USD) |
|---|---|---|---|---|---|
| p1 | 0 | [1,0,0,0,0] | [1,0] | -1.5 | 2.075 |
| p2 | 0 | [0,1,0,0,0] | [1,1] | 0.1 | -0.4 |
| p3 | 1 | [1,0,0,0,0] | [0,1] | 1.9 | -0.25 |
| p4 | 0 | [0,0,0,1,0] | [0,0] | -0.3 | 1.75 |
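
The standardized values in this table are consistent with a mean of 70 bpm and a standard deviation of 10 bpm for heart rate, and a mean of 50,000 USD and a standard deviation of 40,000 USD for income. Those statistics are inferred from the numbers, not stated in the slides, so treat them as assumptions in the sketch below; `standardize` is a hypothetical helper.

```python
def standardize(values, mean, std_dev):
    """Apply phi = (x - mean) / std_dev to each value."""
    return [(v - mean) / std_dev for v in values]


heart_rate = [55, 71, 89, 67]
income = [133000, 34000, 40000, 120000]

# Assumed population statistics (not stated in the slides); they are
# consistent with the standardized values shown in the table.
phi_heart_rate = standardize(heart_rate, mean=70, std_dev=10)
phi_income = standardize(income, mean=50000, std_dev=40000)

print(phi_heart_rate)
print(phi_income)
```

In practice the mean and standard deviation are typically computed from the training data and then reused, unchanged, for any new point.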

Imagine we added another question to the survey: "How much do you agree that exercising could help prevent heart disease?"

| | pain? | job | medicines | resting heart rate (bpm) | family income (USD) | agree exercising helps? |
|---|---|---|---|---|---|---|
| p1 | 0 | [1,0,0,0,0] | [1,0] | -1.5 | 2.075 | strongly disagree |
| p2 | 0 | [0,1,0,0,0] | [1,1] | 0.1 | -0.4 | disagree |
| p3 | 1 | [1,0,0,0,0] | [0,1] | 1.9 | -0.25 | neutral |
| p4 | 0 | [0,0,0,1,0] | [0,0] | -0.3 | 1.75 | agree |

θ heart  rate x heart  rate \theta_{\substack{\text { heart } \\ \text { rate } }}x_{\substack{\text { heart } \\ \text { rate }}}
\theta_{\substack{\text { heart } \\ \text { rate } }}x_{\substack{\text { heart } \\ \text { rate }}}
θpainxpain \theta_{\substack{\text {pain} \\ \text {} }}x_{\substack{\text {pain } \\ \text {}}}
\theta_{\substack{\text {pain} \\ \text {} }}x_{\substack{\text {pain } \\ \text {}}}
θjobTxjob\theta^T_{\substack{\text {job} \\ \text {} }}x_{\substack{\text {job} \\ \text {}}}
\theta^T_{\substack{\text {job} \\ \text {} }}x_{\substack{\text {job} \\ \text {}}}
θpillTxpill\theta^T_{\substack{\text {pill} \\ \text {} }}x_{\substack{\text {pill} \\ \text {}}}
\theta^T_{\substack{\text {pill} \\ \text {} }}x_{\substack{\text {pill} \\ \text {}}}
θincomexincome\theta_{\substack{\text {income} \\ \text {} }}x_{\substack{\text {income} \\ \text {}}}
\theta_{\substack{\text {income} \\ \text {} }}x_{\substack{\text {income} \\ \text {}}}
++
+
++
+
++
+
++
+
z=z =
z =
+θdeg ofagreementxdeg ofagreement+ \theta_{\substack{\text{deg of} \\ \text{agreement}}}x_{\substack{\text{deg of} \\ \text{agreement}}}
+ \theta_{\substack{\text{deg of} \\ \text{agreement}}}x_{\substack{\text{deg of} \\ \text{agreement}}}

problem with this idea (again):

  • Ordering matters
  • Incremental in job category affects zz by a fixed θdeg ofagreement\theta_{\substack{\text{deg of} \\ \text{agreement}}}amount

For "degree of agreemenet", if use natural number encoding:

encoding = {"strongly agree": 1, "agree": 2, "neutral": 3, "disagree": 4, "strongly disagree": 5}
θ heart  rate x heart  rate \theta_{\substack{\text { heart } \\ \text { rate } }}x_{\substack{\text { heart } \\ \text { rate }}}
\theta_{\substack{\text { heart } \\ \text { rate } }}x_{\substack{\text { heart } \\ \text { rate }}}
θpainxpain \theta_{\substack{\text {pain} \\ \text {} }}x_{\substack{\text {pain } \\ \text {}}}
\theta_{\substack{\text {pain} \\ \text {} }}x_{\substack{\text {pain } \\ \text {}}}
θjobTxjob\theta^T_{\substack{\text {job} \\ \text {} }}x_{\substack{\text {job} \\ \text {}}}
\theta^T_{\substack{\text {job} \\ \text {} }}x_{\substack{\text {job} \\ \text {}}}
θpillTxpill\theta^T_{\substack{\text {pill} \\ \text {} }}x_{\substack{\text {pill} \\ \text {}}}
\theta^T_{\substack{\text {pill} \\ \text {} }}x_{\substack{\text {pill} \\ \text {}}}
θincomexincome\theta_{\substack{\text {income} \\ \text {} }}x_{\substack{\text {income} \\ \text {}}}
\theta_{\substack{\text {income} \\ \text {} }}x_{\substack{\text {income} \\ \text {}}}
++
+
++
+
++
+
++
+
z=z =
z =

🥺

+θdeg ofagreementxdeg ofagreement+ \theta_{\substack{\text{deg of} \\ \text{agreement}}}x_{\substack{\text{deg of} \\ \text{agreement}}}
+ \theta_{\substack{\text{deg of} \\ \text{agreement}}}x_{\substack{\text{deg of} \\ \text{agreement}}}

disagreed has

+4θdeg ofagreement+ 4 \theta_{\substack{\text{deg of} \\ \text{agreement}}}
+ 4 \theta_{\substack{\text{deg of} \\ \text{agreement}}}
θ heart  rate x heart  rate \theta_{\substack{\text { heart } \\ \text { rate } }}x_{\substack{\text { heart } \\ \text { rate }}}
\theta_{\substack{\text { heart } \\ \text { rate } }}x_{\substack{\text { heart } \\ \text { rate }}}
θpainxpain \theta_{\substack{\text {pain} \\ \text {} }}x_{\substack{\text {pain } \\ \text {}}}
\theta_{\substack{\text {pain} \\ \text {} }}x_{\substack{\text {pain } \\ \text {}}}
θjobTxjob\theta^T_{\substack{\text {job} \\ \text {} }}x_{\substack{\text {job} \\ \text {}}}
\theta^T_{\substack{\text {job} \\ \text {} }}x_{\substack{\text {job} \\ \text {}}}
θpillTxpill\theta^T_{\substack{\text {pill} \\ \text {} }}x_{\substack{\text {pill} \\ \text {}}}
\theta^T_{\substack{\text {pill} \\ \text {} }}x_{\substack{\text {pill} \\ \text {}}}
θincomexincome\theta_{\substack{\text {income} \\ \text {} }}x_{\substack{\text {income} \\ \text {}}}
\theta_{\substack{\text {income} \\ \text {} }}x_{\substack{\text {income} \\ \text {}}}
++
+
++
+
++
+
++
+
z=z =
z =

neutral has

z = \theta_{\text{heart rate}} x_{\text{heart rate}} + \theta_{\text{pain}} x_{\text{pain}} + \theta^{\top}_{\text{job}} x_{\text{job}} + \theta^{\top}_{\text{pill}} x_{\text{pill}} + \theta_{\text{income}} x_{\text{income}} + 3 \theta_{\text{deg of agreement}}

agreed has

z = \theta_{\text{heart rate}} x_{\text{heart rate}} + \theta_{\text{pain}} x_{\text{pain}} + \theta^{\top}_{\text{job}} x_{\text{job}} + \theta^{\top}_{\text{pill}} x_{\text{pill}} + \theta_{\text{income}} x_{\text{income}} + \theta_{\text{deg of agreement}}
\theta_{\text{level1}} \phi_{\text{level1}} + \theta_{\text{level2}} \phi_{\text{level2}} + \theta_{\text{level3}} \phi_{\text{level3}} + \theta_{\text{level4}} \phi_{\text{level4}} + \theta_{\text{level5}} \phi_{\text{level5}}
one_hot_encoding = {
  "strongly disagree":[1, 0, 0, 0, 0], # Φ{level1}
  "disagree":         [0, 1, 0, 0, 0], # Φ{level2}
  "neutral":          [0, 0, 1, 0, 0], # Φ{level3}
  "agree":            [0, 0, 0, 1, 0], # Φ{level4}
  "strongly agree":   [0, 0, 0, 0, 1]} # Φ{level5}
z = \theta_{\text{heart rate}} x_{\text{heart rate}} + \theta_{\text{pain}} x_{\text{pain}} + \theta^{\top}_{\text{job}} x_{\text{job}} + \theta^{\top}_{\text{pill}} x_{\text{pill}} + \theta_{\text{income}} x_{\text{income}} + \theta_{\text{deg of agreement}} x_{\text{deg of agreement}}

disagreed has

z = \theta_{\text{heart rate}} x_{\text{heart rate}} + \theta_{\text{pain}} x_{\text{pain}} + \theta^{\top}_{\text{job}} x_{\text{job}} + \theta^{\top}_{\text{pill}} x_{\text{pill}} + \theta_{\text{income}} x_{\text{income}} + \theta_{\text{level2}}

neutral has

z = \theta_{\text{heart rate}} x_{\text{heart rate}} + \theta_{\text{pain}} x_{\text{pain}} + \theta^{\top}_{\text{job}} x_{\text{job}} + \theta^{\top}_{\text{pill}} x_{\text{pill}} + \theta_{\text{income}} x_{\text{income}} + \theta_{\text{level3}}

agreed has

z = \theta_{\text{heart rate}} x_{\text{heart rate}} + \theta_{\text{pain}} x_{\text{pain}} + \theta^{\top}_{\text{job}} x_{\text{job}} + \theta^{\top}_{\text{pill}} x_{\text{pill}} + \theta_{\text{income}} x_{\text{income}} + \theta_{\text{level4}}

🥺
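
A minimal sketch of what the one-hot slides compute, assuming NumPy. The weight values in `theta_levels` and the helper names `one_hot` and `agreement_term` are made up for illustration and are not from the lecture. With one-hot features, the agreement term of z picks out exactly one weight per answer, and nothing ties the weights of neighbouring levels together, so the ordering of the answers is not encoded.

```python
import numpy as np

LEVELS = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]

def one_hot(answer):
    """Return a length-5 vector with a single 1 at the answer's level."""
    phi = np.zeros(len(LEVELS))
    phi[LEVELS.index(answer)] = 1.0
    return phi

# Hypothetical weights theta_level1..theta_level5 (illustrative values only).
theta_levels = np.array([-2.0, 0.5, 0.0, 3.0, 1.0])

def agreement_term(answer):
    """Contribution of the agreement feature to z under one-hot encoding."""
    return float(theta_levels @ one_hot(answer))

for answer in LEVELS:
    # Each answer contributes its own weight, independent of the others.
    print(f"{answer:>17}: {agreement_term(answer):+.1f}")
```

Here "disagree" contributes `theta_levels[1]` and "agree" contributes `theta_levels[3]`, matching the θ_level2 and θ_level4 terms on the slides.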

thermometer_encoding = {
  "strongly disagree":[1, 0, 0, 0, 0], # Φ{level1}
  "disagree":         [1, 1, 0, 0, 0], # Φ{level2}
  "neutral":          [1, 1, 1, 0, 0], # Φ{level3}
  "agree":            [1, 1, 1, 1, 0], # Φ{level4}
  "strongly agree":   [1, 1, 1, 1, 1]} # Φ{level5}

😍

z = \theta_{\text{heart rate}} x_{\text{heart rate}} + \theta_{\text{pain}} x_{\text{pain}} + \theta^{\top}_{\text{job}} x_{\text{job}} + \theta^{\top}_{\text{pill}} x_{\text{pill}} + \theta_{\text{income}} x_{\text{income}} + \theta_{\text{deg of agreement}} x_{\text{deg of agreement}}

\theta_{\text{level1}} \phi_{\text{level1}} + \theta_{\text{level2}} \phi_{\text{level2}} + \theta_{\text{level3}} \phi_{\text{level3}} + \theta_{\text{level4}} \phi_{\text{level4}} + \theta_{\text{level5}} \phi_{\text{level5}}

disagreed has

z = \theta_{\text{heart rate}} x_{\text{heart rate}} + \theta_{\text{pain}} x_{\text{pain}} + \theta^{\top}_{\text{job}} x_{\text{job}} + \theta^{\top}_{\text{pill}} x_{\text{pill}} + \theta_{\text{income}} x_{\text{income}} + (\theta_{\text{level1}} + \theta_{\text{level2}})

neutral has

z = \theta_{\text{heart rate}} x_{\text{heart rate}} + \theta_{\text{pain}} x_{\text{pain}} + \theta^{\top}_{\text{job}} x_{\text{job}} + \theta^{\top}_{\text{pill}} x_{\text{pill}} + \theta_{\text{income}} x_{\text{income}} + (\theta_{\text{level1}} + \theta_{\text{level2}} + \theta_{\text{level3}})

agreed has

z = \theta_{\text{heart rate}} x_{\text{heart rate}} + \theta_{\text{pain}} x_{\text{pain}} + \theta^{\top}_{\text{job}} x_{\text{job}} + \theta^{\top}_{\text{pill}} x_{\text{pill}} + \theta_{\text{income}} x_{\text{income}} + (\theta_{\text{level1}} + \theta_{\text{level2}} + \theta_{\text{level3}} + \theta_{\text{level4}})
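
The thermometer sums above can be checked with a short sketch, assuming NumPy. The weights in `theta_levels` and the helper names `thermometer` and `agreement_term` are hypothetical, chosen only for illustration. Under thermometer encoding, the agreement term for level k is the sum of the first k weights, so each weight acts as the increment from one level to the next. The order of the answers is built in, while the spacing between levels is left for the model to learn.

```python
import numpy as np

LEVELS = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]

def thermometer(answer):
    """Return a length-5 vector with 1s up to and including the answer's level."""
    k = LEVELS.index(answer) + 1
    phi = np.zeros(len(LEVELS))
    phi[:k] = 1.0
    return phi

# Hypothetical weights theta_level1..theta_level5 (illustrative values only).
theta_levels = np.array([0.2, 0.3, 0.1, 0.9, 0.4])

def agreement_term(answer):
    """Contribution of the agreement feature to z under thermometer encoding."""
    return float(theta_levels @ thermometer(answer))

terms = [agreement_term(a) for a in LEVELS]

# The contribution at level k equals the running sum of the first k weights.
prefix_sums = np.cumsum(theta_levels)
for answer, term, total in zip(LEVELS, terms, prefix_sums):
    print(f"{answer:>17}: term={term:.1f}  prefix sum={total:.1f}")
```

With these made-up weights all increments are non-negative, so the contribution never decreases as agreement increases. A learned model is not forced to have that property; it simply has the freedom to express it.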

Summary

  • Linear models are mathematically and algorithmically convenient but, by themselves, not expressive enough for most jobs.
  • We can express really rich hypothesis classes by performing a fixed non-linear feature transformation first, then applying our linear (regression or classification) methods.
  • When we “set up” a problem to apply ML methods to it, it’s important to encode the inputs in a way that makes it easier for the ML method to exploit the structure.
  • Foreshadowing of neural networks, in which we will learn complicated continuous feature transformations.
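
As a small illustration of the second point, here is a sketch, assuming NumPy, of a fixed polynomial feature transformation followed by ordinary least squares. The toy quadratic data and the helper name `poly_features` are made up for this example. The model stays linear in the new features φ(x) = [1, x, x²], even though it is non-linear in the original x.

```python
import numpy as np

def poly_features(x, degree):
    """Map scalar inputs x (shape (n,)) to rows [1, x, x^2, ..., x^degree]."""
    return np.vander(x, N=degree + 1, increasing=True)

# Toy data generated from a made-up quadratic, with no noise.
x = np.linspace(-2.0, 2.0, 21)
y = 1.0 - 3.0 * x + 2.0 * x**2

# Linear regression on the transformed features; the constant column
# plays the role of the offset theta_0.
Phi = poly_features(x, degree=2)
theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print("quadratic-feature fit:", np.round(theta, 3))

# The same method on the raw feature alone cannot fit the curve.
Phi_lin = poly_features(x, degree=1)
theta_lin, *_ = np.linalg.lstsq(Phi_lin, y, rcond=None)
max_err_lin = np.abs(Phi_lin @ theta_lin - y).max()
print("max error with linear features only:", round(float(max_err_lin), 3))
```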

Thanks!

We'd love to hear your thoughts.

6.390 IntroML (Fall24) - Lecture 5 Features

By Shen Shen
