Speaker: 羅右鈞
Date: 2016.10.27
Venue: National Tsing Hua University
Let's say we've collected some data with the following four attributes,
and aim to build a model to perform some prediction task...
| instance | height | dhuwur | altuera | teitei |
|---|---|---|---|---|
| 0 | 159 | 4.77 | 5.247 | 62.5983 |
| 1 | 168 | 5.04 | 5.544 | 66.1416 |
| ... | ... | ... | ... | ... |
| N | 173 | 5.189 | 5.709 | 68.1101 |
In fact ... all four attributes are the same quantity: "height", written in different languages and converted to different units.
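A quick sanity check (my snippet, using the three rows shown above) makes the redundancy explicit: every pair of columns is perfectly correlated, so the four attributes carry a single underlying variable.

```python
import numpy as np

# Rows of the table above: the same heights in four different units.
X = np.array([[159, 4.77, 5.247, 62.5983],
              [168, 5.04, 5.544, 66.1416],
              [173, 5.189, 5.709, 68.1101]])
# Correlation matrix of the columns: all entries are ~1.0,
# i.e. the four attributes are linear rescalings of one quantity.
print(np.corrcoef(X, rowvar=False).round(3))
```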
As dimensionality grows, there are fewer observations per region (sparse data). Splitting each axis into 3 bins gives 3^d regions (tallied in the sketch below):
1-D: 3 regions
2-D: 9 regions
3-D: 27 regions
1000-D: 3^1000 regions (game over)
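A back-of-the-envelope sketch of the count above (the 3-bins-per-axis setup is from the list; the snippet itself is my illustration):

```python
# Bin each axis into 3 intervals and count the resulting regions.
for d in (1, 2, 3):
    print(f"{d}-D: {3 ** d} regions")
# 3**1000 is astronomically large; print its digit count instead.
print(f"1000-D: 3**1000 regions ({len(str(3 ** 1000))} digits)")
```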
That's why dimensionality reduction is important.
Goal: represent instances with fewer variables. Two common approaches (contrasted in the sketch below):
Feature selection: keep a subset of the original attributes.
Feature extraction: construct new attributes from combinations of the originals.
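A hedged illustration of the contrast, using scikit-learn on stand-in data (my example; the talk does not prescribe a library):

```python
# Feature selection keeps a subset of the original columns;
# feature extraction constructs new columns from all of them.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.decomposition import PCA

X, y = np.random.randn(100, 4), np.random.randn(100)
X_sel = SelectKBest(f_regression, k=2).fit_transform(X, y)  # selection: 2 original columns
X_ext = PCA(n_components=2).fit_transform(X)                # extraction: 2 new combined columns
```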
Define a set of principal components;
the first m << d components become the m new dimensions.
e1 preserves more structure (variance) than e2.
Sample variance: s^2 = (1/(N-1)) * Σ_i (x_i - x̄)^2.
Dividing by N-1 gives the unbiased sample estimate; if computing the population variance, you can just divide by N.
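A small sketch of the N-1 vs N point, using numpy's ddof switch on the height column from the table (my snippet):

```python
import numpy as np

x = np.array([159.0, 168.0, 173.0])   # height column from the table above
print(x.var(ddof=1))  # sample variance: divides by N-1 (unbiased estimate)
print(x.var(ddof=0))  # population variance: divides by N
```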
Interesting observation:
[Figure: vectors in the (x1, x2) plane under repeated application of Σ; the slope converges 0.400 → 0.450 → 0.454 → 0.454..., settling on the direction of e1 (with e2 orthogonal).]
We want vectors e that Σ does not rotate (only scales): Σe = λe, i.e., the eigenvectors of the covariance matrix Σ with eigenvalues λ.
In fact ... these eigenvectors are exactly the principal components we want (see Read more for the proof).
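A minimal power-iteration sketch of the "not rotated" idea (the 2-D covariance matrix here is my stand-in, not the talk's data): repeatedly applying Σ drags any starting vector onto the top eigenvector, which is why the slope in the figure above converges.

```python
import numpy as np

Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])          # assumed example covariance matrix

v = np.array([1.0, 0.0])                # arbitrary starting vector
for _ in range(10):
    v = Sigma @ v                       # apply Sigma repeatedly
    v /= np.linalg.norm(v)              # renormalize to keep the scale bounded
print("power iteration:", v)
# Compare with the top eigenvector from a direct eigendecomposition
# (np.linalg.eigh sorts eigenvalues ascending; sign may be flipped).
print("top eigenvector:", np.linalg.eigh(Sigma)[1][:, -1])
```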
Example points: (-1.0, +1.0), (-1.2, -0.2), (-14.1, -6.4), (-33.3, -15.1), (-6.0, -2.7)
Finally, project the original data points onto the new dimensions by taking dot products with the principal components.
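Putting the whole pipeline together, a minimal sketch (variable names are mine; it follows the steps above: center, covariance, eigenvectors, dot-product projection):

```python
import numpy as np

def pca_project(X, m):
    """Project N x d data onto its top-m principal components."""
    Xc = X - X.mean(axis=0)                   # center each attribute
    Sigma = np.cov(Xc, rowvar=False)          # sample covariance (divides by N-1)
    eigvals, eigvecs = np.linalg.eigh(Sigma)  # eigenvalues in ascending order
    E = eigvecs[:, ::-1][:, :m]               # keep the top-m eigenvectors
    return Xc @ E                             # dot-product projection

X = np.random.randn(100, 4)  # stand-in data with d = 4 attributes
Z = pca_project(X, m=2)      # reduced to m = 2 new dimensions
```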
[Diagram: two encoder-decoder pipelines sharing an encoded-vector representation.
L1 sentence → Encoder → L1 encoded vector → L2 Decoder → L2 sentence
L2 sentence → Encoder → L2 encoded vector → L1 Decoder (→ L1 sentence)
One pipeline is labeled Training, the other Inference.]
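The encoded vector in this diagram plays the same role as the PCA projection: a compact representation of the input. As a minimal sketch of the idea (my example, a plain fully connected autoencoder in Keras, not the talk's sentence encoder-decoder):

```python
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

d, m = 4, 2                            # original and reduced dimensionality
inp = Input(shape=(d,))
code = Dense(m, name="encoded_vector")(inp)   # Encoder: compress to m dims
out = Dense(d)(code)                          # Decoder: reconstruct the input
autoencoder = Model(inp, out)
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(X, X, epochs=50)     # training: reconstruct X from itself
encoder = Model(inp, code)             # inference: keep only the encoder
```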