[Figure: network architecture diagram. Backbone: Image; 7x7, 16, /2; conv1, 64, /2; conv2, 128, /2; conv3, 256, /2; conv4, 384, /2, producing feature maps f_1 to f_4. Merging branch: three stages of unpool x2 and concat, with merged maps h_1 to h_4, using conv 1x1 and conv 3x3 pairs with 128, 64, and 32 channels, followed by a final conv 3x3, 32. Outputs: conv 1x1, 1 (confidence score); conv 1x1, 4 (4 distances); conv 1x1, 1 (1 rotation angle).]
[Figure: network architecture diagram. Backbone layers: Input Image, conv1_2, pool_1, conv2_2, pool_2, conv3_3, pool_3, conv4_3, pool_4, conv5_3, pool_5, fc_6, fc_7. Fusion path: upsample and concat steps fed by five conv 1x1, 2(/16) layers. Output layers: conv 1x1, 2 and conv 1x1, 16. Outputs: text/non-text classification and link prediction.]