Deep Multi-task Representation Learning on Tabular Data
Presented by: Jacobo G. González León
7th PDTA
Thesis advisors:
- PhD. Miguel Félix Mata Rivera
- PhD. Rolando Menchaca Méndez
![](https://www.upiita.ipn.mx/images/upiita-logo.png)
![](https://vignette.wikia.nocookie.net/doblaje/images/8/89/LOGOTIPO_IPN.png/revision/latest?cb=20160108235248&format=original&path-prefix=es)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/848972/images/5612173/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/848972/images/8667617/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100099/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100104/pasted-from-clipboard.png)
(Mixed) Tabular Data
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100105/pasted-from-clipboard.png)
\(X\) : features
\(y^T\) : target
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100106/pasted-from-clipboard.png)
4 categorical, 30 numerical
2 categorical
Correlation
Preprocessed Data (VSM)
\(X\) : features
296 numerical (from categorical), 30 numerical
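The jump from 4 categorical columns to 296 numerical ones is consistent with a one-hot (indicator) expansion, although the slides do not name the encoder. A minimal sketch, assuming one-hot encoding; the column names and values are hypothetical:

```python
def one_hot_encode(rows, categorical_keys):
    """Expand each categorical column into one 0/1 column per observed category."""
    # Collect the sorted set of categories seen in each categorical column.
    vocab = {k: sorted({row[k] for row in rows}) for k in categorical_keys}
    encoded = []
    for row in rows:
        # Numerical columns pass through unchanged.
        out = {k: v for k, v in row.items() if k not in categorical_keys}
        for k in categorical_keys:
            for cat in vocab[k]:
                out[f"{k}={cat}"] = 1.0 if row[k] == cat else 0.0
        encoded.append(out)
    return encoded


# Hypothetical toy rows: two categorical columns and one numerical column.
rows = [
    {"protocol": "tcp", "state": "open", "duration": 0.5},
    {"protocol": "udp", "state": "closed", "duration": 1.2},
    {"protocol": "tcp", "state": "closed", "duration": 0.1},
]
encoded = one_hot_encode(rows, ["protocol", "state"])
```

Each categorical column contributes as many new columns as it has distinct values, which is how a handful of categorical features can grow into hundreds of numerical ones.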
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100128/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100131/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100143/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100144/pasted-from-clipboard.png)
Long-tail distribution
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100149/pasted-from-clipboard.png)
Correlation
Preprocessed Data (VSM)
\(y\) : targets
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100396/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100397/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10099990/JCR_diagrams-T0.drawio_1_.png)
Isolated Learning
Autoencoder
\(f(X)\)
\(g(h)\)
\( f(\hat{y}_i) = \frac{e^{\hat{y}_i}}{\sum_{j=1}^{C} e^{\hat{y}_j}} \)
\( \mathcal{L} = -\sum_{i=1}^{C} y_i \log f(\hat{y}_i) \)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100045/JCR_diagrams-T1.drawio_1_.png)
Multi-class Classifier
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100047/JCR_diagrams-T2.drawio.png)
Loss function
Loss function
softmax
cross entropy loss
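Softmax followed by categorical cross-entropy can be sketched for a single sample as below. This is a minimal sketch assuming a one-hot target, passed here as a class index; the function names and toy logits are illustrative and not taken from the thesis code.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of C logits."""
    m = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Categorical cross-entropy for one sample with a one-hot target."""
    probs = softmax(logits)
    # With a one-hot target only the true-class term of the sum survives.
    return -math.log(probs[target_index])


logits = [2.0, 0.5, -1.0]
probs = softmax(logits)
loss = cross_entropy(logits, 0)
```

The loss is small when the true class receives most of the probability mass and grows without bound as that probability approaches zero.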
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100051/pasted-from-clipboard.png)
\( \mathcal{L} = \lVert X - \hat{X} \rVert \)
reconstruction error
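The slides do not say which norm the reconstruction error uses. A minimal sketch, assuming the Frobenius norm over a batch stored as a list of rows:

```python
import math


def reconstruction_error(X, X_hat):
    """Frobenius norm of X - X_hat for two equally shaped lists of rows."""
    squared = 0.0
    for row, row_hat in zip(X, X_hat):
        for x, x_hat in zip(row, row_hat):
            squared += (x - x_hat) ** 2
    return math.sqrt(squared)


# Toy batch: the second row is reconstructed with one wrong entry.
X = [[1.0, 0.0], [0.0, 1.0]]
X_hat = [[1.0, 0.0], [0.0, 0.0]]
err = reconstruction_error(X, X_hat)
```

A perfect autoencoder gives an error of zero; any other norm (for example mean squared error) could be swapped in without changing the structure of the loss.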
TASK 0
TASK 1
TASK 2
Research question
1) Can we design a centralized architecture that learns multiple tasks simultaneously?
Problem definition
2) Design a low-dimensional multi-task representation for mixed preprocessed data
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100063/JCR_diagrams-MTL.drawio_1_.png)
Loss function
weighted cross entropy loss
Multi-task Learning
TASK 1
TASK 2
\( \mathcal{L} = -\psi_1 \sum_{i=1}^{C} y_i^1 \log f(\hat{y}_i^1) - \psi_2 \sum_{i=1}^{C} y_i^2 \log f(\hat{y}_i^2) \)
TASK 1 + TASK 2
\({\psi_1}\)
\({\psi_2}\)
Deep Multi-task Learning
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100065/JCR_diagrams-DMTL.drawio.png)
ENCODER + TASK 1 + TASK 2
Loss function
weighted cross entropy loss
\( \mathcal{L} = -\psi_1 \sum_{i=1}^{C} y_i^1 \log f(\hat{y}_i^1) - \psi_2 \sum_{i=1}^{C} y_i^2 \log f(\hat{y}_i^2) \)
\({\psi_1}\)
\({\psi_2}\)
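The ENCODER + TASK 1 + TASK 2 layout can be read as hard parameter sharing: one encoder feeds both classifier heads. The slides give no layer sizes at this point, so the sketch below uses made-up dimensions and random weights purely to show the data flow; `dense`, `relu`, `init_layer`, and `forward` are illustrative names.

```python
import random


def dense(x, weights, bias):
    """Fully connected layer: `weights` holds one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, a) for a in v]


def init_layer(n_in, n_out, rng):
    """Random weights in [-0.5, 0.5] and zero biases."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
# Illustrative sizes only: 6 inputs, 3 latent units, 2 and 4 classes.
encoder = init_layer(6, 3, rng)
head_1 = init_layer(3, 2, rng)
head_2 = init_layer(3, 4, rng)


def forward(x):
    """Shared encoder followed by one linear head per task (returns logits)."""
    h = relu(dense(x, *encoder))
    return h, dense(h, *head_1), dense(h, *head_2)


h, logits_1, logits_2 = forward([1.0, 0.0, 0.0, 1.0, 0.3, -0.7])
```

Because both heads read the same `h`, gradients from either task's loss would update the encoder, which is what lets the tasks shape a common representation.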
Deep Multi-task Representation Learning
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100066/JCR_diagrams-DMTRL.drawio.png)
TASK 0 + TASK 1 + TASK 2
Loss function
weighted reconstruction error with weighted cross entropy loss
\( \mathcal{L} = \psi_0 \lVert X - \hat{X} \rVert - \psi_1 \sum_{i=1}^{C} y_i^1 \log f(\hat{y}_i^1) - \psi_2 \sum_{i=1}^{C} y_i^2 \log f(\hat{y}_i^2) \)
\({\psi_1}\)
\({\psi_2}\)
\({\psi_0}\)
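The combined objective weights the reconstruction term by \(\psi_0\) and the two classification terms by \(\psi_1\) and \(\psi_2\). A minimal per-sample sketch, assuming a Euclidean reconstruction norm and targets given as class indices; all function names and toy values are illustrative:

```python
import math


def cross_entropy(logits, target_index):
    """Softmax cross-entropy for one sample, via a stable log-sum-exp."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target_index]


def l2_error(x, x_hat):
    """Euclidean norm of x - x_hat for one sample."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_hat)))


def dmtrl_loss(x, x_hat, logits_1, y_1, logits_2, y_2, psi_0, psi_1, psi_2):
    """Weighted sum of the reconstruction term and the two classification terms."""
    return (psi_0 * l2_error(x, x_hat)
            + psi_1 * cross_entropy(logits_1, y_1)
            + psi_2 * cross_entropy(logits_2, y_2))


# Toy sample; the psi values mirror the "F" configuration (1, 0.1, 0.1).
sample = ([1.0, 0.0], [0.8, 0.1], [2.0, -1.0], 0, [0.0, 0.0, 0.0], 2)
loss = dmtrl_loss(*sample, psi_0=1.0, psi_1=0.1, psi_2=0.1)
```

Setting \(\psi_1 = \psi_2 = 0\) recovers the isolated autoencoder objective, and setting \(\psi_0 = 0\) recovers the weighted cross-entropy loss of the two-task model.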
Validation (Methodology)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100108/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100110/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100111/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100112/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100113/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100114/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100115/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100117/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100118/pasted-from-clipboard.png)
80 %
10 %
10 %
40 %
40 %
20 %
20 %
10 %
10 %
Warm-up
Training
Test/Val
60 %
20 %
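The exact mapping between these percentages and the warm-up, training, validation, and test partitions lives in the figures. As a generic illustration only, the sketch below shuffles sample indices and cuts them by a list of fractions; the 80/10/10 call is an assumed reading, and `split_indices` is a hypothetical helper.

```python
import random


def split_indices(n, fractions, seed=0):
    """Shuffle range(n) and cut it into consecutive parts sized by `fractions`."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    parts, start = [], 0
    for i, frac in enumerate(fractions):
        # The last part takes the remainder so rounding never drops a sample.
        end = n if i == len(fractions) - 1 else start + round(n * frac)
        parts.append(idx[start:end])
        start = end
    return parts


train, val, test = split_indices(1000, [0.8, 0.1, 0.1])
```

The same helper can be applied again to one of the parts, for example to carve a warm-up subset out of the training indices.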
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100400/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100401/pasted-from-clipboard.png)
Targets
Model Validation Results
SIMILARITIES
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100351/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100353/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100354/pasted-from-clipboard.png)
TASK 1
TASK 2
Model similarities
Train/Val/Test Results
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100162/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100165/pasted-from-clipboard.png)
Warm-up
Training
Loss function
Warm-up
Training
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100170/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100171/pasted-from-clipboard.png)
Accuracy TASK 1
Train/Val/Test Results
Warm-up
Training
Accuracy TASK 2
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100183/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100185/pasted-from-clipboard.png)
Train/Val/Test Results
Representation Results
ISOLATED
T0 \(\psi_0=1\)
\(R2:0.99\)
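The R2 score reported for Task 0 is the coefficient of determination of the reconstruction. How it is aggregated across features is not stated in the slides; the sketch below computes it for a single feature column, with illustrative names and toy values.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


column = [0.0, 1.0, 2.0, 3.0]
perfect = r2_score(column, column)        # exact reconstruction
baseline = r2_score(column, [1.5] * 4)    # always predicting the column mean
```

A score of 1 means exact reconstruction, and a score of 0 means the reconstruction is no better than predicting the column mean.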
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100326/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100335/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100341/pasted-from-clipboard.png)
T1 \(\psi_1=1\)
\(ACC:0.9997\)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100344/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100345/pasted-from-clipboard.png)
T2 \(\psi_2=1\)
\(ACC:0.9998\)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100212/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100217/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100224/pasted-from-clipboard.png)
DMTL
F \(\psi_1=1 , \psi_2=1\)
E \(\psi_1=0.25, \psi_2=0.25\)
mE \(\psi_1=1E-5, \psi_2=1E-5\)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100256/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100258/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100260/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100261/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100262/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100263/pasted-from-clipboard.png)
Representation Results
DMTRL
T0 \(\psi_0=1\)
\(R2:0.986\)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100267/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100268/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100270/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100272/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100274/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100275/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100276/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100278/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100280/pasted-from-clipboard.png)
E \(\psi_0=0.1, \psi_1=0.1 , \psi_2=0.1\)
\(R2:0.66\)
F \(\psi_0=1, \psi_1=0.1 , \psi_2=0.1\)
\(R2:0.79\)
Representation Results
DMTRL
mE2 \(\psi_0=0.1, \psi_1=1E-5 , \psi_2=0.0001\)
\(R2:0.90\)
mE3 \(\psi_0=0.01, \psi_1=0.0001 , \psi_2=1E-5\)
\(R2:0.75\)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100288/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100289/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100290/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100292/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100293/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100295/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100296/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100298/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100300/pasted-from-clipboard.png)
mF \(\psi_0=1, \psi_1=1E-5 , \psi_2=1E-5\)
\(R2:0.980\)
Representation Results
Validation Results
CORRELATIONS
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100347/pasted-from-clipboard.png)
Global features
- 'TOTAL TRAINNING SECONDS', 'BEST TRIAL SECONDS',
- 'TASK 0 PSI 0', 'TASK 1 PSI 1', 'TASK 2 PSI 2',
- 'TASK 0 TOTAL PARAMETERS', 'TASK 1 TOTAL PARAMETERS', 'TASK 2 TOTAL PARAMETERS', 'TOTAL TRAINABLE PARAMETERS',
- 'BATCH SIZE', 'OPTIMIZER', 'LR',
Models features
- 'TASK 0 ACTIVATION', 'TASK 1 ACTIVATION', 'TASK 2 ACTIVATION',
- 'TASK 0 FC1 ENCODER', 'TASK 0 FC2 ENCODER', 'TASK 0 FC3 ENCODER', 'LATENT SPACE', 'TASK 0 FC1 DECODER', 'TASK 0 FC2 DECODER', 'TASK 0 FC3 DECODER',
- 'TASK 1 FC1', 'TASK 1 FC2', 'TASK 1 FC3',
- 'TASK 2 FC1', 'TASK 2 FC2', 'TASK 2
Global features
- 'TOTAL TRAINNING SECONDS', 'BEST TRIAL SECONDS',
- 'TASK 0 PSI 0', 'TASK 1 PSI 1', 'TASK 2 PSI 2',
- 'TASK 0 TOTAL PARAMETERS', 'TASK 1 TOTAL PARAMETERS', 'TASK 2 TOTAL PARAMETERS', 'TOTAL TRAINABLE PARAMETERS',
- 'BATCH SIZE', 'OPTIMIZER', 'LR',
Warm-up features
- 'TASK 0 TRAIN LOSS WARM-UP', 'TASK 0 VALID LOSS WARM-UP',
- 'TASK 1 TRAIN ACC WARM-UP', 'TASK 1 VALID ACC WARM-UP',
- 'TASK 2 TRAIN ACC WARM-UP', 'TASK 2 VALID ACC WARM-UP',
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100379/pasted-from-clipboard.png)
Validation Results
CORRELATIONS
Global features
- 'TOTAL TRAINNING SECONDS', 'BEST TRIAL SECONDS',
- 'TASK 0 PSI 0', 'TASK 1 PSI 1', 'TASK 2 PSI 2',
- 'TASK 0 TOTAL PARAMETERS', 'TASK 1 TOTAL PARAMETERS', 'TASK 2 TOTAL PARAMETERS', 'TOTAL TRAINABLE PARAMETERS',
- 'BATCH SIZE', 'OPTIMIZER', 'LR',
Task 0 results features
- 'TASK 0 XTEST RMSE', 'TASK 0 XTEST MSE', 'TASK 0 XTEST R2',
- 'TASK 0 XVAL RMSE', 'TASK 0 XVAL MSE', 'TASK 0 XVAL R2',
- 'TASK 0 XTRAIN RMSE', 'TASK 0 XTRAIN MSE', 'TASK 0 XTRAIN R2',
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100384/pasted-from-clipboard.png)
Validation Results
CORRELATIONS
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2058487/images/10100350/pasted-from-clipboard.png)
Global features
- 'TASK 0 XTEST RMSE', 'TASK 0 XTEST MSE', 'TASK 0 XTEST R2',
- 'TASK 0 XVAL RMSE', 'TASK 0 XVAL MSE', 'TASK 0 XVAL R2',
- 'TASK 0 XTRAIN RMSE', 'TASK 0 XTRAIN MSE', 'TASK 0 XTRAIN R2',
- 'TASK 1 YTEST ACC', 'TASK 1 YTEST PRCS', 'TASK 1 YTEST RCLL',
- 'TASK 1 YVAL ACC', 'TASK 1 YVAL PRCS', 'TASK 1 YVAL RCLL',
- 'TASK 1 YTRAIN ACC', 'TASK 1 YTRAIN PRCS', 'TASK 1 YTRAIN RCLL',
- 'TASK 2 YTEST ACC', 'TASK 2 YTEST PRCS', 'TASK 2 YTEST RCLL',
- 'TASK 2 YVAL ACC', 'TASK 2 YVAL PRCS', 'TASK 2 YVAL RCLL',
- 'TASK 2 YTRAIN ACC', 'TASK 2 YTRAIN PRCS', 'TASK 2 YTRAIN RCLL'
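The correlation figures relate per-run features such as the ones listed above. The slides do not state which coefficient was used; the sketch below assumes Pearson correlation. The two columns reuse feature names from the lists, but the numbers are made up for illustration and are not experimental results.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical run log: one entry per trained model (values are made up).
runs = {
    "TASK 0 PSI 0": [1.0, 0.1, 0.01, 1.0],
    "TASK 0 XTEST R2": [0.9, 0.6, 0.5, 0.8],
}
r = pearson(runs["TASK 0 PSI 0"], runs["TASK 0 XTEST R2"])
```

Computing this for every pair of columns in the run log yields the kind of matrix usually drawn as a correlation heatmap.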
Validation Results
CORRELATIONS
Conclusions
- Contributions:
  - Models:
    - Isolated, Multi-task, Deep Multi-task, and Deep Multi-task Representation Learning approaches
    - Shared-weight contributions
  - Methodologies:
    - Train/test/val data split strategy
    - Model warm-up/training strategy
    - Metrics and validation strategy
    - Result correlation and similarity strategy