Once-for-All: Train One Network and Specialize it for Efficient Deployment
SysDL Reading Group
Two ways of doing Hardware-aware NAS
Once for All: Overview
Once for All: Challenges
Progressive Shrinking: Overview
Problem Formalization
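\( \min_{W_o} \sum_{arch_i} L_{val}(C(W_o, arch_i)) \)
Here \( W_o \) denotes the weights of the once-for-all network, and \( C(W_o, arch_i) \) selects the part of \( W_o \) that forms the sub-network with configuration \( arch_i \).
The overall objective is to optimize \( W_o \) so that each supported sub-network reaches the same accuracy as the same architecture trained independently.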
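To make the objective concrete, the following toy sketch evaluates the sum of validation losses over a few sub-networks that share one weight vector. Everything in it is an illustrative assumption rather than the method from the slides: the "network" is a linear model, an architecture is just a width (how many shared weights are kept), and the names `select`, `val_loss`, and `ofa_objective` are hypothetical.

```python
import random

def select(shared_weights, width):
    """Toy C(W_o, arch): keep the first `width` entries of the shared weights."""
    return shared_weights[:width]

def val_loss(weights, val_data):
    """Toy L_val: mean squared error of a linear model using only the kept weights."""
    total = 0.0
    for features, target in val_data:
        # zip stops at the shorter sequence, so unused features are ignored.
        pred = sum(w * x for w, x in zip(weights, features))
        total += (pred - target) ** 2
    return total / len(val_data)

def ofa_objective(shared_weights, archs, val_data):
    """Sum of validation losses over all listed sub-network configurations."""
    return sum(val_loss(select(shared_weights, a), val_data) for a in archs)

random.seed(0)
W_o = [0.5, -0.2, 0.1, 0.3]   # shared weights (assumed values)
archs = [2, 3, 4]             # toy architectures: number of weights kept
val_data = [([random.random() for _ in range(4)], random.random())
            for _ in range(8)]

objective = ofa_objective(W_o, archs, val_data)
print(objective)
```

In this sketch every sub-network reads from the same `W_o`, so changing one shared weight changes the loss of every architecture that uses it. That coupling is what makes the joint objective harder than training each sub-network on its own.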
Architecture Space
Units
Gradually reducing feature size and increasing channel numbers
Arbitrary depth, width and kernel size
Depth: {2, 3, 4}
Expansion ratio: {3, 4, 6}
Kernel size: {3, 5, 7}
About \( 2 \times 10^{19} \) possible sub-networks
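The size of the search space can be checked with a short calculation. The sketch below assumes 5 units, which the slide does not state (it is the figure used in the Once-for-All paper), and assumes that every layer independently picks one of 3 kernel sizes and one of 3 expansion ratios while each unit picks a depth of 2, 3, or 4 layers.

```python
# Per-layer choices: 3 kernel sizes x 3 expansion ratios.
choices_per_layer = 3 * 3

# A unit with depth d has choices_per_layer ** d configurations;
# the depth itself is chosen from {2, 3, 4}.
per_unit = sum(choices_per_layer ** d for d in (2, 3, 4))

# Assumption: 5 units, each configured independently.
num_units = 5
total = per_unit ** num_units

print(f"{per_unit} configurations per unit, {total:.2e} sub-networks in total")
```

Under these assumptions the total is on the order of \( 2 \times 10^{19} \), which matches the figure on the slide.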
Training the Once-for-all network
706 extra parameters per layer
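The figure of 706 is consistent with two kernel transformation matrices per layer: a 25 x 25 matrix that maps the centre 5 x 5 of a 7 x 7 kernel to the 5 x 5 kernel, and a 9 x 9 matrix that maps the centre 3 x 3 of that result to the 3 x 3 kernel. The sketch below illustrates that idea in plain Python. It is a minimal illustration under these assumptions and not the authors' implementation: the helper names are hypothetical, and the matrices are set to the identity here, whereas in training they would be learned.

```python
def center_crop(kernel, size):
    """Return the centred size x size sub-kernel of a square kernel (list of rows)."""
    start = (len(kernel) - size) // 2
    return [row[start:start + size] for row in kernel[start:start + size]]

def transform(kernel, matrix):
    """Flatten the kernel, multiply by a (k*k) x (k*k) matrix, reshape to k x k."""
    size = len(kernel)
    flat = [v for row in kernel for v in row]
    out = [sum(m * v for m, v in zip(mrow, flat)) for mrow in matrix]
    return [out[r * size:(r + 1) * size] for r in range(size)]

def identity(n):
    """n x n identity matrix, used as a stand-in for a learned transformation."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

# Hypothetical per-layer transformation matrices (identity-initialised here).
T_7_to_5 = identity(25)   # 25 x 25 = 625 parameters
T_5_to_3 = identity(9)    # 9 x 9  = 81 parameters

# A 7 x 7 kernel for a single channel, filled with distinct values 0..48.
kernel7 = [[float(r * 7 + c) for c in range(7)] for r in range(7)]
kernel5 = transform(center_crop(kernel7, 5), T_7_to_5)
kernel3 = transform(center_crop(kernel5, 3), T_5_to_3)

extra_params = 25 * 25 + 9 * 9
print(extra_params)
```

Because the smaller kernels are derived from the largest one, only the 7 x 7 weights and the two matrices need to be stored for each layer.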
Results Once-for-all network
Results Once-for-all Comparisons
Results Once-for-all network Co-Designed Solutions