Deep Video

Daniel Yukimura

yukimura@impa.br

A guide for handling video data

Class Overview

  • Video Processing
  • Video Datasets
  • Challenges in video processing
  • Decord basics
  • Decord with PyTorch
  • Deep Video resources: GluonCV & PySlowFast

Video Processing

Tasks:

  • Action classification: What?
  • Temporal action detection: When?
  • Spatio-temporal action detection: Where?
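
To make the difference between the three tasks concrete, the sketch below models one annotation per task. The class and field names are hypothetical and chosen only for illustration; they do not follow any particular dataset format.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ActionLabel:
    """Action classification: one label for the whole clip (what?)."""
    label: str


@dataclass
class TemporalSegment:
    """Temporal action detection: a label plus start and end times (when?)."""
    label: str
    start_sec: float
    end_sec: float

    def length(self) -> float:
        return self.end_sec - self.start_sec


@dataclass
class SpatioTemporalBox:
    """Spatio-temporal detection: a label plus a box on one frame (where?)."""
    label: str
    frame_index: int
    box_xyxy: Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


# Illustrative values only
what = ActionLabel("javelin throw")
when = TemporalSegment("javelin throw", start_sec=3.8, end_sec=4.9)
where = SpatioTemporalBox("javelin throw", frame_index=116,
                          box_xyxy=(40.0, 30.0, 200.0, 220.0))
print(what, when, where, sep="\n")
```

Each task adds information to the previous one: a label, then a time span, then a location per frame.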

Video Datasets

Data Structures:

  • Video
  • Image References
  • Sensory Data
  • Skeleton, Boxes, Masks, Keypoints, etc...

Video Datasets

Datasets:

Video Datasets

Even more variety

  • Trimmed vs Untrimmed
  • Online vs Offline
  • Single person vs group interactions
  • Supervision variety
  • Labels vs Natural Language

Challenges

  • Data collection and annotation
    • Fine-grained labels
  • Efficiency:
    • Loading and pre-processing
    • Data storage is huge
    • IO bottleneck
    • Random access
  • Generalization
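
A quick back-of-the-envelope calculation shows why storage and IO dominate. The clip length, frame rate, resolution, and dataset size below are assumed values for illustration, not figures from a specific dataset.

```python
def raw_video_bytes(num_frames: int, height: int, width: int, channels: int = 3) -> int:
    """Size in bytes of uncompressed uint8 frames (one byte per channel value)."""
    return num_frames * height * width * channels


# Hypothetical clip: 10 seconds at 30 fps, 256x320 RGB
clip_bytes = raw_video_bytes(num_frames=10 * 30, height=256, width=320)
print(f"{clip_bytes / 1e6:.1f} MB per decoded clip")  # 73.7 MB

# Hypothetical dataset of 100,000 such clips
dataset_bytes = 100_000 * clip_bytes
print(f"{dataset_bytes / 1e12:.2f} TB if stored as raw frames")  # 7.37 TB
```

This is why videos are usually kept compressed on disk and decoded on the fly, which in turn makes efficient decoding and random access important.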

Decord

pip install decord

A video reader for deep learning:

  • Pythonic interface
  • Efficient access
  • Dedicated methods for video processing

Decord

import decord
from decord import VideoReader

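# Open a video at its original resolution
vr = VideoReader('video.mp4')
# Or resize the frames while decoding
vr = VideoReader('video.mp4', width=320, height=256)

# len(vr) is the number of frames, not the length in seconds
duration = len(vr)
print(duration)

# Random access by frame index; each frame is (height, width, 3)
frame = vr[10]
print(frame.shape)

frame = vr[100]
print(frame.shape)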

Decord

import decord
from decord import VideoReader

vr = VideoReader('video.mp4')

frames = vr.get_batch([10, 100])

frame_id_list = range(0, 64, 2)  # 32 frames, taking every other frame
frames = vr.get_batch(frame_id_list)
print(frames.shape)

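# Read the whole video in consecutive batches
batch_size = 30
duration = len(vr)  # total number of frames
for i in range(0, duration, batch_size):
  # The end of a range is exclusive, so cap it at duration to keep the last frame
  batch = vr.get_batch(range(i, min(i + batch_size, duration)))
  print(batch.shape)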
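
A common alternative to reading every frame is to sample a fixed number of frames spread evenly over the video. The helper below is a minimal sketch of that idea; `uniform_frame_indices` is a hypothetical name and not part of Decord.

```python
def uniform_frame_indices(num_frames: int, num_samples: int) -> list:
    """Return num_samples indices spread evenly over [0, num_frames - 1]."""
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    if num_samples == 1:
        return [num_frames // 2]  # a single sample: take the middle frame
    step = (num_frames - 1) / (num_samples - 1)
    # Indices may repeat when the video has fewer frames than requested samples
    return [min(round(i * step), num_frames - 1) for i in range(num_samples)]


indices = uniform_frame_indices(num_frames=300, num_samples=8)
print(indices)
```

The resulting list could then be passed to `vr.get_batch(indices)`, with `num_frames` taken from `len(vr)`.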
  

Decord

import decord
from decord import VideoReader

vr = VideoReader('Javelin_standing_throw_drill.mkv')

key_indices = vr.get_key_indices() # [0, 116, 150, 299] --> [0s, 3.8s, 4.9s, 9.8s]
key_frames = vr.get_batch(key_indices)
print(key_frames.shape)
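
Frame indices can be mapped to approximate timestamps by dividing by the frame rate. The sketch below assumes a constant frame rate; `indices_to_seconds` is a hypothetical helper and the frame rate is an assumed value, so its output will not necessarily match the timestamps quoted in the comment above. With Decord, the average frame rate could be read from `vr.get_avg_fps()`.

```python
def indices_to_seconds(frame_indices, fps: float) -> list:
    """Approximate timestamps in seconds, assuming a constant frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return [round(i / fps, 2) for i in frame_indices]


fps = 30.0  # assumed value for illustration
print(indices_to_seconds([0, 116, 150, 299], fps))
```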

Decord

import decord
from decord import VideoLoader
from decord import cpu

video_collection = ['video.mp4', 'flipping_a_pancake.mkv',
                    'Javelin_standing_throw_drill.mkv']

vl = VideoLoader(video_collection,
                 ctx=[cpu(0)],
                 shape=(10, 320, 240, 3),  # (batch size, height, width, channels)
                 interval=1, skip=5,       # intra-batch and inter-batch frame intervals
                 shuffle=1)

print('Total batches:', len(vl))

# Shuffle modes:
#   shuffle = -1  smart shuffle mode, based on video properties (not implemented yet)
#   shuffle = 0   all sequential, no seeking, following initial filename order
#   shuffle = 1   random filename order, no random access for each video, very efficient
#   shuffle = 2   random order
#   shuffle = 3   random frame access in each video only

Decord

PyTorch bridge:

import decord
from decord import VideoReader

from decord.bridge import set_bridge
# Make Decord return torch tensors instead of its native array type
set_bridge('torch')

vr = VideoReader('flipping_a_pancake.mkv', width=320, height=256)

frame = vr[10]
print(type(frame), frame.shape)

# Other available bridges:
# decord.bridge.set_bridge('mxnet')
# decord.bridge.set_bridge('tensorflow')
# decord.bridge.set_bridge('native')
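
Decoded frames arrive as uint8 values in (frames, height, width, channels) order, while many video models expect floating-point input in (channels, frames, height, width) order. The sketch below shows that conversion with NumPy on random stand-in data so that it runs without a video file. The function name `to_model_input` and the target layout are assumptions; the layout a given model expects should be checked in its documentation.

```python
import numpy as np


def to_model_input(frames_thwc: np.ndarray) -> np.ndarray:
    """Convert uint8 frames (T, H, W, C) to float32 (C, T, H, W) in [0, 1]."""
    if frames_thwc.ndim != 4:
        raise ValueError("expected a 4-D array shaped (T, H, W, C)")
    clip = frames_thwc.astype(np.float32) / 255.0
    # Move the channel axis to the front and make the memory contiguous
    return np.ascontiguousarray(clip.transpose(3, 0, 1, 2))


# Stand-in for a decoded batch: 8 random RGB frames of size 256x320
fake_frames = np.random.randint(0, 256, size=(8, 256, 320, 3), dtype=np.uint8)
clip = to_model_input(fake_frames)
print(clip.shape, clip.dtype)
```

With the torch bridge enabled, the same reordering could be done directly on the returned tensor with `permute(3, 0, 1, 2)`.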

Deep Video resources

GluonCV:

  • Computer Vision toolkit
  • Pre-trained models
  • State-of-the-art DL algorithms
  • Pre-processed datasets
  • Tutorials
  • Standard: MXNet
  • PyTorch encoding

Deep Video resources

PySlowFast

  • Codebase specialized in deep video
  • PyTorch
  • SOTA video backbones
  • Other datasets
  • Audio and Visual perception
  • Flexible
  • Scheduling: Multigrid method

Thanks