[Figure: an ML pipeline. Input data flows through a chain of transforms (Transform1, Transform2, ..., TransformN) to produce a result and a model.]
[Figure: a model's training, from the data scientist's perspective, is a long sequence of such iterations; each iteration takes time, and reaching a satisfactory result (✔️) takes an enormous amount of time overall.]
VGG16 for image classification
House price prediction
VisML: Visualizing ML Models
Key: Model Intermediates
But: storing intermediates for ten variants of the popular VGG16 network on a dataset with 50K examples requires 350GB of compressed storage. Regenerating these intermediates on demand instead would require running the model >200 times over the full dataset.
Examples for DNN and traditional ML
Observation: only relative values matter.
Observation: access frequency is non-uniform across intermediates.
Solution: adaptively decide which intermediates to materialize, using the cost model below.
\(t_{diag} = t_{fetch} + t_{compute}\)
\(t_{diag}\): time to run a diagnosis.
\(t_{fetch}\): time to fetch the data, either by recomputing it or by reading a materialized copy.
\(t_{compute}\): time to run the computation; it is the same either way, so the choice hinges on \(t_{fetch}\).
For intermediate \(i\), \(t_{i,fetch}\) is the cheaper of two options: reading a materialized copy (\(t_{i,read}\)) vs. re-running the pipeline up to stage \(i\) (\(t_{i,\text{re-run}}\)).
\(t_{i,\text{re-run}} = \sum_{s=0}^i \{t_{read\_xformer}(s) + t_{read\_xformer\_input}(s) + t_{exec\_xformer}(s)\}\): replaying every stage up to \(i\) means reading each transformer, reading its input, and executing it.
\(t_{i,read} = \frac{n_{ex} \cdot sizeof(ex)}{\rho_d}\): the number of examples times the size of each example, divided by the disk read bandwidth \(\rho_d\).
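To make the comparison concrete, here is a minimal Python sketch of the two fetch costs, assuming hypothetical per-stage timing records; the Stage fields and function names are illustrative, not the system's actual API.

```python
from dataclasses import dataclass

@dataclass
class Stage:
    t_read_xformer: float        # time to load the transformer itself (s)
    t_read_xformer_input: float  # time to load the transformer's input (s)
    t_exec_xformer: float        # time to execute the transformer (s)

def t_rerun(stages: list[Stage], i: int) -> float:
    """Recompute intermediate i by replaying stages 0..i."""
    return sum(s.t_read_xformer + s.t_read_xformer_input + s.t_exec_xformer
               for s in stages[:i + 1])

def t_read(n_ex: int, bytes_per_ex: int, rho_d: float) -> float:
    """Read a materialized copy: total bytes / disk bandwidth rho_d (bytes/s)."""
    return n_ex * bytes_per_ex / rho_d

def t_fetch(stages: list[Stage], i: int,
            n_ex: int, bytes_per_ex: int, rho_d: float) -> float:
    """Fetch intermediate i via whichever path is cheaper."""
    return min(t_read(n_ex, bytes_per_ex, rho_d), t_rerun(stages, i))
```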
Whether to store a model intermediate? Weigh its benefit per unit of storage:
\(\gamma = \frac{(t_{i,\text{re-run}}-t_{i,read}) \cdot n_{query}(i)}{S(i)}\)
where \(n_{query}(i)\) is the number of queries expected to touch intermediate \(i\) and \(S(i)\) is its storage size. The unit of \(\gamma\) is sec/GB, indicating the willingness to spend storage in exchange for time: e.g., 0.5 sec/GB means Bob is willing to spend an additional 1GB of storage to save 0.5 seconds of execution time.
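A minimal sketch of the resulting storage decision, assuming the per-intermediate quantities above are already measured or estimated; the function name and the exact form of the threshold test are our reading of the slide, not a verbatim API.

```python
def should_materialize(t_rerun_i: float, t_read_i: float,
                       n_query_i: int, size_gb: float,
                       gamma: float = 0.5) -> bool:
    """Store intermediate i iff the seconds saved per GB of storage
    meet the user's willingness threshold gamma (sec/GB)."""
    saved_seconds = (t_rerun_i - t_read_i) * n_query_i
    return saved_seconds / size_gb >= gamma

# e.g. a 2GB intermediate queried 10 times, saving 3s per query:
# (3 * 10) / 2 = 15 sec/GB >= 0.5, so it is worth materializing.
should_materialize(t_rerun_i=3.5, t_read_i=0.5, n_query_i=10, size_gb=2.0)  # True
```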
We want to verify the approach on two workloads:
TRAD: Kaggle Zestimate competition (house price prediction)
DNN: CIFAR10 image recognition
Query speedups: TRAD 2.5X-390X, CIFAR10 2X-210X
TRAD: input 168MB; Store All: 67GB. Traditional pipelines share many columns across pipeline variants.
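One plausible way to exploit that sharing is content-hash deduplication, storing each distinct column only once. This sketch is an assumption about how such a store could work, not the system's actual design.

```python
import hashlib
import numpy as np

class ColumnStore:
    def __init__(self):
        self._cols = {}   # content hash -> the stored column
        self._refs = {}   # (pipeline_id, col_name) -> content hash

    def put(self, pipeline_id: str, col_name: str, col: np.ndarray) -> None:
        h = hashlib.sha1(col.tobytes()).hexdigest()
        self._cols.setdefault(h, col)            # identical columns stored once
        self._refs[(pipeline_id, col_name)] = h

    def get(self, pipeline_id: str, col_name: str) -> np.ndarray:
        return self._cols[self._refs[(pipeline_id, col_name)]]
```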
DNN: input 170MB; Store All over 10 epochs: 242/350GB. Reducing the storage footprint is even more essential for DNNs.
There is no visual difference between full precision, LP_QT, KBIT_QT(k=8), and POOL_QT, but KBIT_QT(k=3) and THRESHOLD_QT show clear discrepancies.
While 8Bit_QT shows nearly no difference compared to full precision, it needs 6X more time and 1.5X more storage compared to pool(2).
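For reference, a minimal sketch of k-bit uniform quantization in the spirit of KBIT_QT, leaning on the earlier observation that only relative values matter; the exact binning scheme here is an assumption for illustration.

```python
import numpy as np

def kbit_quantize(x: np.ndarray, k: int = 8):
    """Encode x as integer codes in [0, 2**k - 1]; keep (lo, scale) to decode."""
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / (2**k - 1) or 1.0        # guard against constant tensors
    codes = np.round((x - lo) / scale).astype(np.uint8 if k <= 8 else np.uint16)
    return codes, lo, scale

def kbit_dequantize(codes: np.ndarray, lo: float, scale: float) -> np.ndarray:
    """Recover approximate values; relative ordering is preserved."""
    return codes.astype(np.float32) * scale + lo
```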
We set γ to 0.5s/KB (i.e., trade 1KB of storage for a 0.5s speedup in query time).
Pipeline: logging speed is correlated with the amount of data written.
DNN: compared to the training time (>30mins), the overhead of logging is acceptable.