The main contributions discussed in the paper are:
2. ADG (Attention-guided DropGraph):
Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition
Reference: https://arxiv.org/pdf/1801.07455.pdf
The receptive field in Convolutional Neural Networks (CNN) is the region of the input space that affects a particular unit of the network.
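To make the definition concrete, here is a minimal sketch that computes the receptive field size of one output unit after a stack of 1-D convolution layers. It assumes no dilation and uses the standard recurrence (each layer widens the field by `(kernel_size - 1)` times the current spacing between outputs). The function name `receptive_field` and the example layer stack are illustrative, not taken from the paper.

```python
def receptive_field(layers):
    """Return the receptive field size (in input units) of one output unit.

    `layers` is a list of (kernel_size, stride) pairs applied in order.
    Dilation is assumed to be 1 for every layer.
    """
    rf = 1    # a single input unit only sees itself
    jump = 1  # distance, in input units, between adjacent outputs of the current layer
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump
        jump *= stride
    return rf


# Hypothetical stack: three 3-wide convolutions, one of them strided, then one more.
stack = [(3, 1), (3, 1), (3, 2), (3, 1)]
print(receptive_field(stack))
```

The same idea carries over to graph convolutions, where each additional layer lets a node see one more hop of its neighborhood.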
DropBlock: A regularization method for convolutional networks
Reference: https://arxiv.org/pdf/1810.12890.pdf
The main idea of DropGraph is that when one node is dropped, its set of neighboring nodes is dropped together with it.
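The drop rule above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the authors' implementation. Root nodes are sampled independently with a probability `gamma`, and every node within `k` hops of a root is dropped. The helper names (`k_hop_dropped`, `dropgraph_mask`) and the 5-joint chain graph are hypothetical.

```python
import numpy as np


def k_hop_dropped(adjacency, roots, k):
    """Return a boolean vector marking every node within k hops of a root."""
    n = adjacency.shape[0]
    # One hop, including staying in place (self-loops added via the identity).
    hop = ((adjacency + np.eye(n)) > 0).astype(int)
    # reach[u, v] is True when v is at most k hops away from u.
    reach = np.linalg.matrix_power(hop, k) > 0
    # A node is dropped if at least one root reaches it.
    return (roots[:, None] & reach).any(axis=0)


def dropgraph_mask(adjacency, gamma, k, rng):
    """Sample roots with probability gamma and return a 0/1 keep-mask over nodes."""
    roots = rng.random(adjacency.shape[0]) < gamma
    return (~k_hop_dropped(adjacency, roots, k)).astype(float)


# Hypothetical skeleton: five joints connected in a chain 0-1-2-3-4.
chain = np.zeros((5, 5))
for a, b in [(0, 1), (1, 2), (2, 3), (3, 4)]:
    chain[a, b] = chain[b, a] = 1

rng = np.random.default_rng(0)
mask = dropgraph_mask(chain, gamma=0.2, k=1, rng=rng)
features = np.ones((4, 5))   # (channels, joints), placeholder activations
print(features * mask)       # dropped joints are zeroed in every channel
```

In practice the surviving activations are usually rescaled so that their expected sum is unchanged, as in Dropout and DropBlock; that step is left out here to keep the neighborhood expansion visible.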
For example, take i = 3 and a graph whose average degree is d_avg = 3.
The expected number of nodes in the i-th order neighborhood of a randomly sampled node is given by:
where
The average expanded drop size is estimated as:
For conventional Dropout:
For DropGraph:
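The relationship between these quantities can be sketched numerically. The closed forms used below are assumptions, not confirmed by the text here, and should be checked against the paper: the i-th order neighborhood is estimated as `d_avg * (d_avg - 1) ** (i - 1)`, the expanded drop size is 1 plus the sum of those estimates up to order `k`, conventional Dropout uses `gamma = 1 - keep_prob`, and DropGraph divides that value by the drop size. All function names are illustrative.

```python
def neighborhood_size(d_avg, i):
    """Assumed estimate of the number of nodes in the i-th order neighborhood."""
    return d_avg * (d_avg - 1) ** (i - 1)


def drop_size(d_avg, k):
    """Assumed average number of nodes removed per root: the root plus k orders of neighbors."""
    return 1 + sum(neighborhood_size(d_avg, i) for i in range(1, k + 1))


def dropout_gamma(keep_prob):
    """Conventional Dropout: each unit is dropped independently."""
    return 1.0 - keep_prob


def dropgraph_gamma(keep_prob, d_avg, k):
    """DropGraph: sample fewer roots, since each root also removes its neighborhood."""
    return (1.0 - keep_prob) / drop_size(d_avg, k)


# Example values: average degree 3, neighborhoods up to order 3, keep probability 0.9.
print(neighborhood_size(3, 3))
print(drop_size(3, 3))
print(dropout_gamma(0.9), dropgraph_gamma(0.9, 3, 3))
```

Under these assumptions the root-sampling probability shrinks as the neighborhood grows, so the overall fraction of dropped activations stays close to `1 - keep_prob`.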
Attention-guided drop mechanism:
NTU-RGBD | NTU-RGBD-120 | Northwestern-UCLA |
---|---|---|
56,880 action samples in 60 action classes performed by 40 distinct subjects | 114,480 action samples in 120 action classes performed by 106 distinct subjects | 1,494 video clips covering 10 categories performed by 10 different subjects |
Kinect V2 | Kinect V2 | Kinect |
3 cameras at different horizontal angles: −45°, 0°, 45° | 32 setups, each with a specific location and background | Captured by three Kinect cameras |
Two protocols: 1) Cross-Subject (X-sub): training data comes from 20 subjects, and the remaining 20 subjects are used for validation. 2) Cross-View (X-view): training data comes from the cameras at 0° and 45°, and validation data comes from the camera at −45° | Two protocols: 1) Cross-Subject (X-sub): training data comes from 53 subjects, and the remaining 53 subjects are used for validation. 2) Cross-Setup (X-setup): all samples with even setup IDs are used for training, and the remaining samples with odd setup IDs for validation | One protocol: training data comes from the first two cameras, and samples from the other camera are used for validation |
1) Datasets:
2) Model Setting:
Comparison of regularization methods:
Visualization of learned adjacency matrices in coupling GCN and decoupling GCN
Ablation Study:
Keep_Probability vs Accuracy:
Decoupling groups vs Accuracy:
Comparison with state-of-the-art methods:
a) Using NTU-RGBD:
b) Using NW-UCLA:
c) Using NTU-RGBD-120:
More info on attention based dropout layer: