We chose to build a DSA supercomputer rather than clustering CPU hosts with DSA chips, for two reasons:
- Training time is huge: a single TPUv2 chip would take two to 16 months to train one Google production application, so a typical application wants to use hundreds of chips.
- DNN wisdom holds that bigger datasets plus bigger machines lead to bigger breakthroughs. Moreover, techniques like AutoML use 50x more computation.
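The first point is simple arithmetic. As a rough sketch (the 256-chip count and 30-day month are illustrative assumptions, and real speedups are sub-linear because of communication overhead), ideal linear scaling turns months of single-chip training into days:

```python
# Back-of-the-envelope: ideal (linear) strong scaling of training time.
# Assumes a 30-day month and perfect speedup; real scaling is sub-linear
# due to communication and synchronization overhead.

def scaled_days(months_on_one_chip: float, n_chips: int) -> float:
    """Training time in days assuming perfect linear speedup."""
    return months_on_one_chip * 30 / n_chips

for months in (2, 16):
    days = scaled_days(months, 256)
    print(f"{months} months on 1 chip -> {days:.1f} days on 256 chips")
# 2 months on 1 chip -> 0.2 days on 256 chips
# 16 months on 1 chip -> 1.9 days on 256 chips
```

Even with hundreds of chips, the hard end of that range only drops to about two days, which is why per-application chip counts in the hundreds are the working assumption.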