Deep Video

Daniel Yukimura

yukimura@impa.br

A guide for handling video data

Class Overview

Video Processing
Video Datasets
Challenges on video processing
Decord basics
Decord with PyTorch
Deep Video resources: GluonCV & PySlowFast

Video Processing

Tasks:

Action classification: What?
Temporal action detection: When?
Spatio-temporal action detection: Where?
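One way to see how the three tasks differ is to look at what a model has to output for a video. The dataclasses below are a hypothetical sketch (the names and fields are assumptions, not taken from any library): classification returns one label, temporal detection adds a time interval, and spatio-temporal detection adds a bounding box at a given time.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ActionClassification:
    """What? A single action label for a whole (typically trimmed) clip."""
    label: str


@dataclass
class TemporalDetection:
    """When? An action label plus start/end times (seconds) in an untrimmed video."""
    label: str
    start: float
    end: float

    def duration(self) -> float:
        return self.end - self.start


@dataclass
class SpatioTemporalDetection:
    """Where? An action label, the time it is observed and the actor's box (x1, y1, x2, y2) in pixels."""
    label: str
    time: float
    box: Tuple[float, float, float, float]


# Hypothetical predictions for one video
what = ActionClassification("javelin throw")
when = TemporalDetection("javelin throw", start=3.5, end=6.0)
where = SpatioTemporalDetection("javelin throw", time=4.0, box=(40.0, 60.0, 180.0, 250.0))
```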

Video Datasets

Data Structures:

Video
Image References
Sensory Data
Skeletons, bounding boxes, masks, keypoints, etc.
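Several of these structures are often stored side by side for each sample. The container below is a hypothetical sketch (field names are assumptions, not a standard format): the video is kept as a path to be decoded on demand, other sensors are stored as named signals, and per-frame annotations are checked against the frame count. Masks are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2) in pixels
Keypoint = Tuple[float, float, float]     # (x, y, confidence)


@dataclass
class VideoSample:
    video_path: str                        # the video itself, decoded on demand
    num_frames: int
    fps: float
    # Other sensors, e.g. {"accel_x": [...]}; may be sampled at their own rate
    sensors: Dict[str, List[float]] = field(default_factory=dict)
    # Per frame: one box per person
    boxes: Optional[List[List[Box]]] = None
    # Per frame: one skeleton (list of keypoints) per person
    skeletons: Optional[List[List[List[Keypoint]]]] = None

    def check(self) -> None:
        """Raise ValueError if a per-frame annotation does not have one entry per frame."""
        for name in ("boxes", "skeletons"):
            value = getattr(self, name)
            if value is not None and len(value) != self.num_frames:
                raise ValueError(f"{name}: expected {self.num_frames} entries, got {len(value)}")
```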

Video Datasets

Datasets:

In the wild:
- Movies:
  - HMDB51
- YouTube:
- Surveillance footage
  - HiEve

Controlled collection:
- Weizmann
- KTH

Video Datasets

Even more variety

Trimmed vs Untrimmed
Online vs Offline
Single person vs group interactions
Supervision variety
Labels vs Natural Language
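A trimmed clip can be fed to a model whole, while an untrimmed video is commonly processed as a sequence of fixed-length windows, with one prediction per window. A minimal sketch, where the function name, window length and stride are assumptions; each returned [start, end) range can be turned into frame indices for `VideoReader.get_batch`.

```python
from typing import List, Tuple


def sliding_windows(num_frames: int, window: int, stride: int) -> List[Tuple[int, int]]:
    """Return [start, end) frame ranges of length `window`, one every `stride` frames.

    A final window is aligned to the end of the video so the tail is covered.
    Videos shorter than `window` yield a single (shorter) range.
    """
    if num_frames <= window:
        return [(0, num_frames)]
    starts = list(range(0, num_frames - window + 1, stride))
    if starts[-1] + window < num_frames:
        starts.append(num_frames - window)
    return [(s, s + window) for s in starts]


# e.g. 64-frame windows without overlap over a 300-frame untrimmed video
print(sliding_windows(300, 64, 64))
```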

Challenges

Data collection and annotation
- Fine-grained labels
Efficiency:
- Loading and pre-processing
- Huge storage requirements
- IO bottleneck
- Random access
Generalization
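The storage and IO points can be made concrete with a back-of-the-envelope calculation of the size of decoded (uncompressed) frames. The clip parameters below (10 s at 30 fps, 256x320 RGB) are assumptions chosen for illustration. Compressed files on disk are much smaller than this, and the gap between files on disk and decoded tensors in memory helps explain why loading and pre-processing are expensive.

```python
def raw_video_bytes(num_frames: int, height: int, width: int,
                    channels: int = 3, bytes_per_value: int = 1) -> int:
    """Size in bytes of decoded frames held in memory (no compression)."""
    return num_frames * height * width * channels * bytes_per_value


# Assumed clip: 10 s at 30 fps, 256x320 RGB
frames = 10 * 30
as_uint8 = raw_video_bytes(frames, 256, 320)                        # 8-bit pixels
as_float32 = raw_video_bytes(frames, 256, 320, bytes_per_value=4)   # after conversion to float32
print(as_uint8 / 2**20, as_float32 / 2**20)  # → 70.3125 281.25
```

Both values are in MiB: about 70 MiB for 8-bit frames and about 281 MiB once converted to 32-bit floats, for only ten seconds of low-resolution video.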

Decord

pip install decord

A video reader for deep learning:

Pythonic interface
Efficient access
Dedicated methods for video processing

Decord

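import decord
from decord import VideoReader

# Open a video; frames are decoded on demand
vr = VideoReader('video.mp4')
# Alternatively, resize frames while decoding
vr = VideoReader('video.mp4', width=320, height=256)

# Number of frames in the video
duration = len(vr)
print(duration)

# Random access by frame index; each frame has shape (height, width, 3)
frame = vr[10]
print(frame.shape)

frame = vr[100]
print(frame.shape)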
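`VideoReader` is indexed by frame, while annotations are often given in seconds. The hypothetical helper below converts a timestamp to the nearest frame index and clamps it to the valid range. It assumes a constant frame rate, which is only an approximation for variable-frame-rate files; in decord, `vr.get_avg_fps()` returns an average frame rate that can be passed as `fps`.

```python
def time_to_index(t_seconds: float, fps: float, num_frames: int) -> int:
    """Map a timestamp (seconds) to the closest frame index, clamped to [0, num_frames - 1].

    Assumes a constant frame rate.
    """
    idx = int(round(t_seconds * fps))
    return max(0, min(idx, num_frames - 1))
```

With decord this could be used as `frame = vr[time_to_index(3.0, vr.get_avg_fps(), len(vr))]`.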

Decord

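import decord
from decord import VideoReader

vr = VideoReader('video.mp4')

# Fetch several (possibly non-contiguous) frames in one call
frames = vr.get_batch([10, 100])

# 32 frames: every other frame among the first 64
frame_id_list = range(0, 64, 2)
frames = vr.get_batch(frame_id_list)
print(frames.shape)  # (32, height, width, 3)

# Read the whole video in chunks of batch_size frames
batch_size = 30
duration = len(vr)
for i in range(0, duration, batch_size):
    # range() excludes its end, so use duration (not duration - 1) to keep the last frame
    batch = vr.get_batch(range(i, min(i + batch_size, duration)))
    print(batch.shape)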
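For classification, a fixed number of frames is often sampled from the whole video instead of reading it all. A common strategy (segment-based sampling, in the style of Temporal Segment Networks) splits the video into equal segments and takes one frame from each: the center frame for deterministic evaluation, a random one for training. The function below is a hypothetical sketch of that idea, not part of decord; its output is a list of indices that can be passed to `vr.get_batch`.

```python
import random
from typing import List, Optional


def segment_indices(num_frames: int, num_segments: int,
                    rng: Optional[random.Random] = None) -> List[int]:
    """Split [0, num_frames) into num_segments equal chunks and pick one frame per chunk.

    rng=None picks the center of each chunk; otherwise a random frame inside it.
    Short videos (num_frames < num_segments) repeat frames.
    """
    if num_frames <= 0 or num_segments <= 0:
        raise ValueError("num_frames and num_segments must be positive")
    seg_len = num_frames / num_segments
    indices = []
    for k in range(num_segments):
        lo = int(seg_len * k)
        hi = max(lo + 1, int(seg_len * (k + 1)))  # at least one frame per chunk
        idx = lo + (hi - lo) // 2 if rng is None else rng.randrange(lo, hi)
        indices.append(min(idx, num_frames - 1))  # guard against float rounding
    return indices
```

With decord: `clip = vr.get_batch(segment_indices(len(vr), 8))` for evaluation, or `segment_indices(len(vr), 8, random.Random())` for training.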

Decord

import decord
from decord import VideoReader

vr = VideoReader('Javelin_standing_throw_drill.mkv')

# Indices of the key frames, which can be decoded without any earlier frame
key_indices = vr.get_key_indices()  # e.g. [0, 116, 150, 299] --> [0s, 3.8s, 4.9s, 9.8s]
key_frames = vr.get_batch(key_indices)
print(key_frames.shape)
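Key frames matter for random access: to decode an arbitrary frame, a decoder generally has to start at the nearest preceding key frame and decode forward. The hypothetical helpers below find that key frame with `bisect` and count the frames decoded along the way, a rough proxy for seek cost (it ignores details such as B-frame reordering). The key indices reuse the example from the comment above.

```python
import bisect
from typing import Sequence


def preceding_keyframe(index: int, key_indices: Sequence[int]) -> int:
    """Return the last key frame at or before `index` (key_indices must be sorted)."""
    pos = bisect.bisect_right(key_indices, index) - 1
    if pos < 0:
        raise ValueError("no key frame at or before this index")
    return key_indices[pos]


def frames_to_decode(index: int, key_indices: Sequence[int]) -> int:
    """Rough seek cost: frames decoded from the preceding key frame up to `index`."""
    return index - preceding_keyframe(index, key_indices) + 1


keys = [0, 116, 150, 299]
print(preceding_keyframe(140, keys), frames_to_decode(140, keys))  # → 116 25
```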

Decord

import decord
from decord import VideoLoader
from decord import cpu

video_collection = ['video.mp4', 'flipping_a_pancake.mkv',
                    'Javelin_standing_throw_drill.mkv']

# Each batch: 10 frames of height 320 and width 240, RGB
# interval: gap between frames inside a batch; skip: gap between consecutive batches
vl = VideoLoader(video_collection,
                 ctx=[cpu(0)],
                 shape=(10, 320, 240, 3),
                 interval=1, skip=5, shuffle=1)

print('Total batches:', len(vl))

for batch in vl:
    print(batch[0].shape)

# Available shuffle modes:
# shuffle = -1  # smart shuffle mode, based on video properties (not implemented yet)
# shuffle = 0   # all sequential, no seeking, following initial filename order
# shuffle = 1   # random filename order, no random access within each video, very efficient
# shuffle = 2   # random order
# shuffle = 3   # random frame access in each video only
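The shuffle modes trade randomness for IO cost: reading each video forward (modes 0 and 1) avoids seeks, while random batch positions force the decoder to jump between key frames. The toy function below only illustrates the resulting visiting orders of (video, batch) pairs as we read the comments above. It is not decord's scheduler, and keeping the file order in mode 3 is our assumption.

```python
import random
from typing import List, Optional, Tuple


def batch_order(batches_per_video: List[int], mode: int,
                seed: Optional[int] = None) -> List[Tuple[int, int]]:
    """Toy (video_id, batch_id) visiting order for shuffle modes 0-3 (NOT decord's implementation)."""
    rng = random.Random(seed)
    videos = list(range(len(batches_per_video)))
    if mode in (0, 1):
        if mode == 1:
            rng.shuffle(videos)  # random file order ...
        # ... but batches are read forward inside each video: no seeking
        return [(v, b) for v in videos for b in range(batches_per_video[v])]
    if mode == 2:
        # fully random order over all batches of all videos
        order = [(v, b) for v in videos for b in range(batches_per_video[v])]
        rng.shuffle(order)
        return order
    if mode == 3:
        # assumption: file order kept, random batch positions inside each video
        order = []
        for v in videos:
            batches = list(range(batches_per_video[v]))
            rng.shuffle(batches)
            order.extend((v, b) for b in batches)
        return order
    raise ValueError(f"unsupported mode: {mode}")
```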

Decord

PyTorch bridge:

import decord
from decord import VideoReader
from decord.bridge import set_bridge

# Make decord return torch.Tensor objects instead of decord NDArrays
set_bridge('torch')

vr = VideoReader('flipping_a_pancake.mkv', width=320, height=256)

frame = vr[10]
print(type(frame), frame.shape)  # a torch.Tensor of shape (256, 320, 3)

# Other output formats:
# decord.bridge.set_bridge('mxnet')
# decord.bridge.set_bridge('tensorflow')
# decord.bridge.set_bridge('native')
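With the torch bridge, a `VideoReader` can back a map-style dataset for `torch.utils.data.DataLoader`, which only needs `__len__` and `__getitem__`. The class below is a hypothetical sketch: its name, the uniform-stride frame sampling and the `open_reader` parameter are assumptions. The reader is injected so the logic can be exercised without decord or video files; opening it inside `__getitem__` is one simple way to keep each DataLoader worker independent, at the cost of re-opening the file for every item.

```python
from typing import Callable, Sequence


class VideoClipDataset:
    """Map-style dataset: item i is (clip of num_frames frames from video i, label i).

    open_reader: any callable path -> reader supporting len(reader) and
    reader.get_batch(indices); decord.VideoReader fits this interface.
    """

    def __init__(self, paths: Sequence[str], labels: Sequence[int],
                 num_frames: int, open_reader: Callable):
        if len(paths) != len(labels):
            raise ValueError("paths and labels must have the same length")
        self.paths = list(paths)
        self.labels = list(labels)
        self.num_frames = num_frames
        self.open_reader = open_reader

    def __len__(self) -> int:
        return len(self.paths)

    def __getitem__(self, i: int):
        reader = self.open_reader(self.paths[i])
        n = len(reader)
        # Uniform stride over the video; short videos repeat their last frame
        step = max(n // self.num_frames, 1)
        indices = [min(k * step, n - 1) for k in range(self.num_frames)]
        clip = reader.get_batch(indices)  # with set_bridge('torch'): tensor of shape (T, H, W, C)
        return clip, self.labels[i]
```

With decord installed and `set_bridge('torch')` active, something like `VideoClipDataset(paths, labels, 16, lambda p: VideoReader(p, width=320, height=256))` yields `(T, H, W, C)` tensors. Many video backbones expect `(C, T, H, W)`, so a `clip.permute(3, 0, 1, 2)` is often needed before batching.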

Deep Video resources

GluonCV:

Computer vision toolkit
Pre-trained models
State-of-the-art DL algorithms
Pre-processed datasets
Tutorials
Default framework: MXNet
PyTorch encoding

Deep Video resources

PySlowFast

Codebase specialized in deep video
PyTorch
SOTA video backbones
Other datasets
Audio and visual perception
Flexible
Training schedule: multigrid method
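Multigrid training varies the mini-batch shape during training: smaller clips (fewer frames, lower resolution) are paired with larger batches so that the number of voxels per mini-batch, and hence the computation per iteration, stays roughly constant. The arithmetic can be sketched as follows; the base batch size and the list of (T, H, W) shapes are illustrative assumptions, not PySlowFast's configuration.

```python
from typing import List, Tuple


def multigrid_batch_sizes(base_batch: int, base_shape: Tuple[int, int, int],
                          shapes: List[Tuple[int, int, int]]) -> List[int]:
    """For each (T, H, W) clip shape, the batch size that keeps B * T * H * W equal to the base."""
    bt, bh, bw = base_shape
    budget = base_batch * bt * bh * bw
    return [budget // (t * h * w) for t, h, w in shapes]


# Illustrative cycle from coarse to fine, ending at the base shape (32 frames, 224x224)
shapes = [(8, 112, 112), (16, 112, 112), (16, 224, 224), (32, 224, 224)]
print(multigrid_batch_sizes(8, (32, 224, 224), shapes))  # → [128, 64, 16, 8]
```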

Thanks