Gokulan R - CS15B033
Prof. Pratyush Kumar
30 May 2020
Shakti Multiplier-Accumulate Accelerator Network
Figure: a single neuron with inputs \(x_1, x_2, x_3\), weights \(w_1, w_2, w_3\) and bias \(b\)
$$ y = \sigma\left(\sum_{i} w_i \cdot x_i + b\right) $$
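As a minimal sketch of the neuron equation above, the weighted sum and activation can be written in a few lines of Python. The slide does not say which function \(\sigma\) is; the logistic sigmoid used here is an assumption, and the input values are made up for illustration.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function; an assumed choice for sigma, not stated in the slides."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(x, w, b, activation=sigmoid):
    """Return activation(sum_i w_i * x_i + b) for equal-length x and w."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return activation(z)


# Three inputs and three weights, as in the figure; the values are illustrative.
x = [1.0, 2.0, 3.0]
w = [0.5, -1.0, 0.5]
b = 0.0
y = neuron(x, w, b)  # weighted sum: 0.5 - 2.0 + 1.5 + 0.0 = 0.0
```

The multiply-and-add inside `neuron` is the multiply-accumulate operation that the accelerator is built around; the activation is applied once per output.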
Figure: fully connected network with an Input Layer (\(x_1\) to \(x_3\)), a Hidden Layer (\(h_1\) to \(h_5\)) and an Output Layer (\(y_1\), \(y_2\)); labelled weights include \(w_{111}\), \(w_{153}\), \(w_{211}\) and \(w_{225}\)
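Input of size m, output of size n
Output computed as vector-matrix multiplication
$$ \begin{bmatrix} y_1 & \cdots & y_n \end{bmatrix} = \begin{bmatrix} x_1 & \cdots & x_m & 1 \end{bmatrix} \begin{bmatrix} w_{11} & \cdots & w_{1n} \\ \vdots & \ddots & \vdots \\ w_{m1} & \cdots & w_{mn} \\ b_1 & \cdots & b_n \end{bmatrix} $$
Left: output vector, transposed. Right: input vector, transposed (with a constant 1 appended), times the weight matrix (with the bias row appended).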
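The vector-matrix view of a fully connected layer can be sketched as follows. This is an illustrative pure-Python version, assuming the bias is folded in as an extra row of the weight matrix and a constant 1 is appended to the input; the function name `fc_layer` and the sample values are hypothetical.

```python
def fc_layer(x, W, b):
    """Compute y = [x, 1] * [[W], [b]] for x of size m, W of shape m x n, b of size n."""
    m, n = len(W), len(b)
    if len(x) != m or any(len(row) != n for row in W):
        raise ValueError("shape mismatch between x, W and b")
    x_aug = list(x) + [1]                            # input with constant 1 appended
    W_aug = [list(row) for row in W] + [list(b)]     # weights with bias as last row
    # One dot product (a chain of multiply-accumulates) per output element.
    return [sum(x_aug[k] * W_aug[k][j] for k in range(m + 1)) for j in range(n)]


x = [1, 2, 3]                      # m = 3
W = [[1, 0], [0, 1], [2, -1]]      # 3 x 2 weight matrix
b = [10, 20]                       # n = 2
y = fc_layer(x, W, b)              # [1 + 0 + 6 + 10, 0 + 2 - 3 + 20]
```

Folding the bias into the matrix means the whole layer is a single matrix product, which is the operation a GEMM unit executes.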
Source: ISCA 2019 Tutorial
Simple way to subsample
Max Pooling: 2 x 2, stride 2
Average Pooling: 2 x 2, stride 2
Input (4 x 4):
1 | 4 | 2 | -20 |
-2 | 7 | 31 | 11 |
41 | -8 | -6 | 0 |
0 | 3 | -11 | -1 |
Max pooling output:
7 | 31 |
41 | 0 |
Average pooling output:
2 | 6 |
9 | -5 |
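The 2 x 2, stride-2 pooling example above can be reproduced with a short sketch. The 4 x 4 input below assumes the four 2 x 2 windows on the slide are laid out top-left, top-right, bottom-left, bottom-right. The slide's average-pooling values (2 and -5) match rounding down, so the sketch uses floor division; that rounding rule is inferred from the numbers, not stated in the slides.

```python
def pool2d(grid, size, stride, reduce_fn):
    """Slide a size x size window over grid with the given stride and reduce each window."""
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(0, rows - size + 1, stride):
        out_row = []
        for c in range(0, cols - size + 1, stride):
            window = [grid[r + dr][c + dc] for dr in range(size) for dc in range(size)]
            out_row.append(reduce_fn(window))
        out.append(out_row)
    return out


def floor_mean(window):
    """Integer mean rounded toward negative infinity (assumed rounding rule)."""
    return sum(window) // len(window)


grid = [
    [1, 4, 2, -20],
    [-2, 7, 31, 11],
    [41, -8, -6, 0],
    [0, 3, -11, -1],
]
max_out = pool2d(grid, size=2, stride=2, reduce_fn=max)
avg_out = pool2d(grid, size=2, stride=2, reduce_fn=floor_mean)
```

With this layout, `max_out` and `avg_out` agree with the values shown on the slide.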
AlexNet (2012) - Breakthrough on the ImageNet dataset
Krizhevsky et al., ImageNet Classification with Deep Convolutional Neural Networks
H. T. Kung, Why Systolic Architectures?
$$ C = A \times B$$
Image: Samajdar et al., SCALE-Sim: Systolic CNN Accelerator Simulator
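To make the systolic computation of C = A × B concrete, here is a small cycle-by-cycle sketch. It assumes an output-stationary dataflow (each processing element keeps one element of C, rows of A stream in from the left, columns of B stream in from the top, each skewed by one cycle per row or column). The slides do not state which dataflow the design uses, so this is only one possible illustration, and the function name is hypothetical.

```python
def systolic_matmul(A, B):
    """Simulate an n x m output-stationary systolic array computing A (n x k) times B (k x m)."""
    n, k_dim, m = len(A), len(B), len(B[0])
    if any(len(row) != k_dim for row in A):
        raise ValueError("inner dimensions of A and B do not match")
    acc = [[0] * m for _ in range(n)]      # one accumulator per processing element
    a_reg = [[0] * m for _ in range(n)]    # A values moving left to right
    b_reg = [[0] * m for _ in range(n)]    # B values moving top to bottom
    cycles = k_dim + n + m - 2             # cycles until the last PE sees its last operands
    for t in range(cycles):
        new_a = [[0] * m for _ in range(n)]
        new_b = [[0] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                if j == 0:                 # left edge: row i of A enters, delayed by i cycles
                    k = t - i
                    a_in = A[i][k] if 0 <= k < k_dim else 0
                else:                      # otherwise take the left neighbour's previous value
                    a_in = a_reg[i][j - 1]
                if i == 0:                 # top edge: column j of B enters, delayed by j cycles
                    k = t - j
                    b_in = B[k][j] if 0 <= k < k_dim else 0
                else:                      # otherwise take the upper neighbour's previous value
                    b_in = b_reg[i - 1][j]
                acc[i][j] += a_in * b_in   # multiply-accumulate in place
                new_a[i][j], new_b[i][j] = a_in, b_in
        a_reg, b_reg = new_a, new_b
    return acc, cycles


A = [[1, 2, 3], [4, 5, 6]]
B = [[7, 8], [9, 10], [11, 12]]
C, cycles = systolic_matmul(A, B)
```

Each processing element only ever reads from its left and upper neighbours, which is the locality property that makes systolic arrays attractive for hardware.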
Human-made choices
Exploration performed using task-level simulator
Compile-time exploration by compiler
$$ C = A \times B $$
LOAD A
LOAD B
GEMM: C = A*B
STORE C
Figure labels: steps 1 to 4, with dependency flags push next, pop prev, push next, pop prev
Chen et al., TVM: An Automated End-to-End Optimizing Compiler for Deep Learning
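The push next / pop prev flags can be illustrated with a small token-passing simulation. This sketch assumes a three-stage pipeline (load, compute, store) in which an instruction flagged "push next" sends a token to the following stage and one flagged "pop prev" waits for a token from the preceding stage. The assignment of flags to the four instructions below is an assumption made for illustration, and all names are hypothetical.

```python
from collections import deque

PIPELINE = ["load", "compute", "store"]  # a stage's "next" is the stage after it


def simulate(program, issue_order):
    """Return the order in which instructions execute.

    program: dict stage -> list of (label, pop_prev, push_next)
    issue_order: order in which stages are polled in every cycle
    """
    pending = {stage: deque(program.get(stage, [])) for stage in PIPELINE}
    tokens = {stage: 0 for stage in PIPELINE[:-1]}  # tokens sent by a stage to its successor
    trace = []
    while any(pending.values()):
        progressed = False
        for stage in issue_order:
            if not pending[stage]:
                continue
            label, pop_prev, push_next = pending[stage][0]
            idx = PIPELINE.index(stage)
            if (pop_prev and idx == 0) or (push_next and idx == len(PIPELINE) - 1):
                raise ValueError("flag refers to a stage that does not exist")
            if pop_prev:
                prev_stage = PIPELINE[idx - 1]
                if tokens[prev_stage] == 0:
                    continue              # stall until the producer pushes a token
                tokens[prev_stage] -= 1
            pending[stage].popleft()
            trace.append(label)
            if push_next:
                tokens[stage] += 1
            progressed = True
        if not progressed:
            raise RuntimeError("deadlock: every stage is waiting for a token")
    return trace


program = {
    "load": [("LOAD A", False, False), ("LOAD B", False, True)],
    "compute": [("GEMM: C = A*B", True, True)],
    "store": [("STORE C", True, False)],
}
# Poll the stages in the least favourable order; the tokens still enforce correctness.
trace = simulate(program, issue_order=["store", "compute", "load"])
```

Even though the store stage is polled first in every cycle, the tokens force the trace to be LOAD A, LOAD B, GEMM, STORE C, while stages with no dependency between them remain free to overlap.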
for i=1 to 256/64
    for j=1 to 512/64
        LOAD(input, j)
        for l=1 to 3, for m=1 to 3
            LOAD(weight, i, j, l, m)
            GEMM(input', weight', output')
    ALU(output, i)
    STORE(output, i)
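As a rough sketch, the loop nest above can be unrolled into the instruction stream it generates. The reading of the bounds (256 output channels and 512 input channels tiled by 64, with a 3 x 3 kernel) and the placement of ALU and STORE once per output tile are assumptions inferred from the loop indices; the tuple format is hypothetical.

```python
from collections import Counter


def conv_instruction_stream(out_ch=256, in_ch=512, tile=64, kh=3, kw=3):
    """Yield the instructions issued by the tiled loop nest, in program order."""
    for i in range(1, out_ch // tile + 1):           # output-channel tiles
        for j in range(1, in_ch // tile + 1):        # input-channel tiles
            yield ("LOAD", "input", j)
            for l in range(1, kh + 1):               # kernel rows
                for m in range(1, kw + 1):           # kernel columns
                    yield ("LOAD", "weight", i, j, l, m)
                    yield ("GEMM", i, j, l, m)
        yield ("ALU", "output", i)                   # post-processing of the finished tile
        yield ("STORE", "output", i)


stream = list(conv_instruction_stream())
counts = Counter(instr[0] for instr in stream)
# 4 * 8 input loads plus 4 * 8 * 9 weight loads; one GEMM per weight tile.
```

Under these assumptions the stream holds 320 LOAD, 288 GEMM, 4 ALU and 4 STORE instructions, which shows how heavily the load and GEMM modules dominate the instruction mix.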
Module | Status |
---|---|
fetch-decode | Completed |
dependency resolver | Completed |
load | Final stages |
store | Final stages |
GEMM (16x16) | Completed |
ALU (vec_size=16) | Completed |
Custom compiler | Work-in-progress |
Task level simulator | Work-in-progress |
Module | LUTs | FIFOs |
---|---|---|
fetch-decode | 823 | 1317 |
dependency resolver | 1427 | 858 |
load | * | * |
store | * | * |
GEMM (16x16) | 90464 | 0 |
ALU (vec_size=16) | 1280 | 0 |
*Work-in-progress
Vinod Ganesan
Neel Gala
Arjun Menon
Mohan Prasath
Rohan Kaulgekar
Sadhana
Sujay Pandit
Surya Selvam
Anand Uday Gokhale
Nidesh
Sundar Raman
Shilpa
Selvaraj
Rishabh Jain