Hema Swetha Koppula, Rudhir Gupta, Ashutosh Saxena
Okan Yıldıran - 2015700153
In this work, the authors presented learning algorithms that detect human activities and label them over long time scales. They also detected the affordances of the objects in view.
Segment
Sub-activity
Objects
object nodes
subactivity nodes
object to object interactions
object to sub-activity interactions
object to object between segments
sub-activity to sub-activity between segments
Any MRF can be written as a log-linear model
label
weight
features
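A generic log-linear form over node and edge terms (a sketch of the standard formulation, not necessarily the paper's exact energy; \(\mathbf{w}\) are the weights, \(\phi\) the features, \(y\) the labels):

```latex
P_{\mathbf{w}}(\mathbf{y} \mid \mathbf{x}) \;=\; \frac{1}{Z(\mathbf{x})}
\exp\!\Big( \sum_{i} \mathbf{w}_n \cdot \phi_n(i, y_i)
          \;+\; \sum_{(i,j)} \mathbf{w}_e \cdot \phi_e(i, j, y_i, y_j) \Big)
```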
Training: labels and features are known; find the best weights (SSVM)
Inference: weights and features are known; find the best labels (MIP solver)
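The training side can be illustrated with a toy structured-SVM subgradient loop (margin rescaling). This is a hedged sketch of the general SSVM idea, not the paper's cutting-plane solver; the multiclass joint feature map and the toy data are assumptions for illustration.

```python
# Toy margin-rescaled SSVM via subgradient descent (illustrative only):
# find weights w so the true label scores higher than every other label
# by a margin equal to the 0/1 loss.
import numpy as np

def joint_feature(x, y, n_classes):
    """Standard multiclass feature map: stack x into the block for class y."""
    phi = np.zeros(n_classes * len(x))
    phi[y * len(x):(y + 1) * len(x)] = x
    return phi

def ssvm_train(X, Y, n_classes, epochs=50, lr=0.1):
    w = np.zeros(n_classes * X.shape[1])
    for _ in range(epochs):
        for x, y in zip(X, Y):
            # loss-augmented inference: find the most violated labeling
            scores = [w @ joint_feature(x, yb, n_classes) + (yb != y)
                      for yb in range(n_classes)]
            y_hat = int(np.argmax(scores))
            if y_hat != y:  # subgradient step toward the true labeling
                w += lr * (joint_feature(x, y, n_classes)
                           - joint_feature(x, y_hat, n_classes))
    return w

def predict(w, x, n_classes):
    return int(np.argmax([w @ joint_feature(x, yb, n_classes)
                          for yb in range(n_classes)]))

# Tiny separable two-class example
X = np.array([[1.0, 0.0], [0.0, 1.0]])
Y = [0, 1]
w = ssvm_train(X, Y, n_classes=2)
```

In the paper the loss-augmented inference step is itself a structured problem solved as a mixed-integer program; here it is a trivial enumeration over classes.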
They performed temporal segmentation in order to represent the atomic movements of the human skeleton within an activity.
They used three merging methods
Beginning with one node per frame, they iteratively merge adjacent nodes using one of these methods.
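The bottom-up merging can be sketched as follows. This is a minimal illustration under assumptions, not the paper's algorithm: the Euclidean distance between mean descriptors stands in for whichever of the three merging criteria is used.

```python
# Illustrative bottom-up temporal segmentation: start with one segment (node)
# per frame and greedily merge the most similar adjacent pair until a target
# segment count is reached. The distance criterion here is an assumption.
import numpy as np

def segment(frames, n_segments):
    """frames: (T, D) array of per-frame skeleton descriptors."""
    segs = [[t] for t in range(len(frames))]          # one node per frame
    while len(segs) > n_segments:
        means = [frames[s].mean(axis=0) for s in segs]  # per-segment mean descriptor
        dists = [np.linalg.norm(means[i] - means[i + 1])
                 for i in range(len(segs) - 1)]         # adjacent-pair distances
        i = int(np.argmin(dists))                       # most similar neighbors
        segs[i:i + 2] = [segs[i] + segs[i + 1]]         # merge them
    return segs

frames = np.array([[0.0], [0.1], [5.0], [5.1]])
segs = segment(frames, 2)
```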
Cumulative binning into 10 bins.
A total of 2010 features for each segment
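Cumulative binning can be sketched as follows, under an assumed interpretation: histogram a feature's values into 10 bins, then take the running sum so bin k counts all values up to its upper edge.

```python
# Minimal sketch of cumulative binning (interpretation assumed): a 10-bin
# histogram followed by a cumulative sum, so bin k counts values <= its edge.
import numpy as np

def cumulative_bin(values, n_bins=10, lo=0.0, hi=1.0):
    counts, _ = np.histogram(values, bins=n_bins, range=(lo, hi))
    return np.cumsum(counts)

feats = cumulative_bin([0.05, 0.15, 0.95])
```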
They computed histograms of sub-activity and affordance labels and used them as features.
They trained a multi-class SVM classifier on the training data.
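The high-level feature construction can be sketched as below. The label-set sizes and the length normalization are assumptions for illustration; the resulting vector is what a multi-class SVM would be trained on (the classifier itself is omitted).

```python
# Hedged sketch of the high-level activity features: per-video histograms of
# predicted sub-activity and affordance labels, concatenated into one vector.
import numpy as np

N_SUBACT, N_AFF = 10, 12   # label-set sizes assumed for illustration

def video_features(subact_labels, affordance_labels):
    h_sub = np.bincount(subact_labels, minlength=N_SUBACT)
    h_aff = np.bincount(affordance_labels, minlength=N_AFF)
    # normalize so videos of different lengths are comparable (an assumption)
    return np.concatenate([h_sub / max(1, len(subact_labels)),
                           h_aff / max(1, len(affordance_labels))])

x = video_features([0, 0, 3], [1, 1])
```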
Cornell Activity Dataset - 60
Cornell Activity Dataset - 120
With object context, activity detection precisions increased.
With sub-activity context, affordance detection precisions increased.
With object-object interactions modeled, affordance detection improved.
With temporal interactions modeled, affordance and sub-activity precisions increased.
With their object tracking algorithm, precisions were lower than with ground-truth tracks.
Assisting humans
Using affordances
Labeling activities in RGB-D videos over long time scales
Formulated the model as an MRF, learned parameters with SSVM
Affordance labeling by using activities