Deep Video
Daniel Yukimura
yukimura@impa.br
A guide for handling video data
Class Overview
- Video Processing
- Video Datasets
- Challenges on video processing
- Decord basics
- Decord with PyTorch
- Deep Video resources: GluonCV & PySlowFast
Video Processing
Tasks:
- Action Classification: What?
- Temporal action detection: When?
- Spatio-temporal action detection: Where?
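The three tasks differ mainly in what a prediction looks like. A minimal sketch of the output types follows. All class and field names are hypothetical, chosen only for illustration, and the example values are made up:

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ActionLabel:
    """Action classification (what?): one label for a whole clip."""
    label: str


@dataclass
class TemporalSegment:
    """Temporal action detection (when?): a label plus start/end times in seconds."""
    label: str
    start_s: float
    end_s: float


@dataclass
class ActionTube:
    """Spatio-temporal action detection (where?): a label plus one box per frame.

    Each entry is (frame_index, (x1, y1, x2, y2)) in pixel coordinates.
    """
    label: str
    boxes: List[Tuple[int, Tuple[int, int, int, int]]]


# Made-up example predictions, one per task.
what = ActionLabel("javelin throw")
when = TemporalSegment("javelin throw", start_s=3.8, end_s=4.9)
where = ActionTube("javelin throw", boxes=[(116, (40, 60, 200, 240))])
```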
Video Datasets
Data Structures:
- Video
- Image References
- Sensory Data
- Skeleton, Boxes, Masks, Keypoints, etc...
Video Datasets
Datasets:
In the wild:
- Movies:
- YouTube:
- Surveillance footage
Video Datasets
Even more variety
- Trimmed vs Untrimmed
- Online vs Offline
- Single person vs group interactions
- Supervision variety
- Labels vs Natural Language
Challenges
- Data collection and annotation
- Fine-grained labels
- Efficiency:
  - Loading and pre-processing
  - Data storage is huge
  - IO bottleneck
  - Random access
- Generalization
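To get a feel for why storage and IO dominate, here is a rough back-of-the-envelope calculation of the size of a decoded clip. The clip length, frame rate and resolution are made-up values, and the helper name is hypothetical:

```python
def raw_video_bytes(num_frames: int, height: int, width: int, channels: int = 3) -> int:
    """Size in bytes of a decoded clip stored as uint8 (one byte per value)."""
    return num_frames * height * width * channels


# Hypothetical clip: 10 seconds at 30 fps, decoded at 256x320 RGB.
frames = 10 * 30
size = raw_video_bytes(frames, height=256, width=320)
print(f"{size / 1e6:.1f} MB decoded")  # prints "73.7 MB decoded"
```

Compressed files are far smaller than this, which is why datasets are usually kept compressed and decoded on the fly, at the cost of decoding work at load time.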
Decord
pip install decord
A video reader for deep learning:
- Pythonic interface
- Efficient access
- Dedicated methods for video processing
Decord
import decord
from decord import VideoReader
vr = VideoReader('video.mp4')
# Optionally resize frames while decoding
vr = VideoReader('video.mp4', width=320, height=256)
duration = len(vr)  # number of frames, not seconds
print(duration)
frame = vr[10]
print(frame.shape)
frame = vr[100]
print(frame.shape)
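Indexing is by frame number, so looking up a moment in time needs a conversion first. A minimal sketch, assuming a roughly constant frame rate; the helper name is hypothetical and not part of Decord:

```python
def time_to_frame_index(seconds: float, fps: float, num_frames: int) -> int:
    """Map a timestamp to the closest frame index, clamped to [0, num_frames - 1]."""
    if num_frames <= 0:
        raise ValueError("video has no frames")
    index = int(round(seconds * fps))
    return max(0, min(index, num_frames - 1))


# With Decord, fps and num_frames would come from the reader, e.g.:
#   fps = vr.get_avg_fps(); num_frames = len(vr)
#   frame = vr[time_to_frame_index(2.5, fps, num_frames)]
print(time_to_frame_index(2.5, fps=30.0, num_frames=300))  # prints 75
```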
Decord
import decord
from decord import VideoReader
vr = VideoReader('video.mp4')
frames = vr.get_batch([10, 100])
frame_id_list = range(0, 64, 2)  # 32 frames, taking every other frame
frames = vr.get_batch(frame_id_list)
print(frames.shape)
batch_size = 30
duration = len(vr)
for i in range(0, duration, batch_size):
    # range() excludes its stop value, so clamp to duration (not duration - 1)
    batch = vr.get_batch(range(i, min(i + batch_size, duration)))
    print(batch.shape)
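The snippets above read frames at a fixed stride from the start of the video. A common alternative for clip-level models is to spread a fixed number of frames evenly over the whole video. A minimal sketch of that index computation; the function is hypothetical and not part of Decord:

```python
def uniform_clip_indices(num_frames: int, clip_len: int) -> list:
    """Pick clip_len frame indices spread evenly over the whole video.

    The video is split into clip_len equal segments and the middle frame
    of each segment is taken. Videos shorter than clip_len repeat frames.
    """
    if num_frames <= 0 or clip_len <= 0:
        raise ValueError("num_frames and clip_len must be positive")
    seg = num_frames / clip_len
    return [min(int(seg * (i + 0.5)), num_frames - 1) for i in range(clip_len)]


print(uniform_clip_indices(300, 8))
# prints [18, 56, 93, 131, 168, 206, 243, 281]
# With a reader: frames = vr.get_batch(uniform_clip_indices(len(vr), 8))
```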
Decord
import decord
from decord import VideoReader
vr = VideoReader('Javelin_standing_throw_drill.mkv')
key_indices = vr.get_key_indices() # [0, 116, 150, 299] --> [0s, 3.8s, 4.9s, 9.8s]
key_frames = vr.get_batch(key_indices)
print(key_frames.shape)
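Key frames matter for random access because a compressed frame between two key frames generally cannot be decoded on its own: decoding starts from an earlier key frame and works forward. A rough sketch of that lookup, using the key indices from the comment above; the helper is hypothetical, and Decord handles seeking internally:

```python
from bisect import bisect_right


def preceding_key_frame(key_indices, target: int) -> int:
    """Return the last key frame index that is <= target.

    key_indices must be sorted in ascending order.
    """
    pos = bisect_right(key_indices, target)
    if pos == 0:
        raise ValueError("target lies before the first key frame")
    return key_indices[pos - 1]


key_indices = [0, 116, 150, 299]
target = 140
start = preceding_key_frame(key_indices, target)
# Key frame to seek to, and how many frames lie between it and the target.
print(start, target - start)  # prints "116 24"
```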
Decord
import decord
from decord import VideoLoader
from decord import cpu
video_collection = ['video.mp4', 'flipping_a_pancake.mkv',
                    'Javelin_standing_throw_drill.mkv']
vl = VideoLoader(video_collection,
                 ctx=[cpu(0)],
                 shape=(10, 320, 240, 3),
                 interval=1, skip=5, shuffle=1)
print('Total batches:', len(vl))
shuffle = -1 # smart shuffle mode, based on video properties, (not implemented yet)
shuffle = 0 # all sequential, no seeking, following initial filename order
shuffle = 1 # random filename order, no random access for each video, very efficient
shuffle = 2 # random order
shuffle = 3 # random frame access in each video only
Decord
PyTorch bridge:
import decord
from decord import VideoReader
from decord.bridge import set_bridge
set_bridge('torch')
vr = VideoReader('flipping_a_pancake.mkv', width=320, height=256)
frame = vr[10]
print(type(frame), frame.shape)
# decord.bridge.set_bridge('mxnet')
# decord.bridge.set_bridge('tensorflow')
# decord.bridge.set_bridge('native')
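With the bridge set to 'torch', a reader can sit behind a map-style dataset. The sketch below is one possible arrangement, not an official Decord or PyTorch recipe. The class name and the `reader_factory` hook are hypothetical, and the class does not subclass `torch.utils.data.Dataset` so that it has no hard dependencies:

```python
class VideoClipDataset:
    """Map-style dataset: one fixed-length clip and one label per video.

    It only implements __len__ and __getitem__, the interface a
    map-style PyTorch dataset needs.
    """

    def __init__(self, paths, labels, clip_len=8, reader_factory=None):
        if len(paths) != len(labels):
            raise ValueError("paths and labels must have the same length")
        self.paths = list(paths)
        self.labels = list(labels)
        self.clip_len = clip_len
        # reader_factory is a hypothetical hook so the reader can be swapped out;
        # by default it builds a decord VideoReader.
        self.reader_factory = reader_factory or self._decord_reader

    @staticmethod
    def _decord_reader(path):
        # Imported lazily so the class can be defined without Decord installed.
        from decord import VideoReader
        from decord.bridge import set_bridge
        set_bridge('torch')  # frames then come back as torch tensors
        return VideoReader(path)

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        vr = self.reader_factory(self.paths[idx])
        num_frames = len(vr)
        # Evenly spaced indices over the whole video, clamped to the last frame.
        step = num_frames / self.clip_len
        indices = [min(int(step * i), num_frames - 1) for i in range(self.clip_len)]
        frames = vr.get_batch(indices)  # (clip_len, height, width, 3)
        return frames, self.labels[idx]
```

An instance can then be passed to `torch.utils.data.DataLoader`. For the default collation to stack clips into a batch, every clip needs the same shape, for example by giving the reader a fixed `width` and `height`.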
Deep Video resources: GluonCV
- Computer Vision toolkit
- Pre-Trained models.
- State-of-the-art DL algorithms.
- Pre-processed datasets.
- Tutorials.
- Standard: MXNet
- PyTorch encoding
Deep Video resources: PySlowFast
- Codebase specialized in deep video.
- PyTorch
- SOTA video backbones
- Other datasets
- Audio and Visual perception
- Flexible
- Scheduling: Multigrid method
Thanks