Ben Combee
Hackaday Supercon 2023
A Hacker's Guide to Audio and Video Formats
Questions to Ask Before Picking a Format
- How much CPU processing can you do?
- How much RAM can you afford?
- How much storage space do you have?
- Do you just need to decode or also encode?
- What kind of latency is allowed?
- Are there licensing or patent costs?
- Can this be supported using available tools?
Audio
Audio Questions
- How large is the audio?
- Does it have metadata?
- Is it complex to decode?
- Does it support multiple channels?
- Does your hardware directly support it?
- Sample rate (samples / second)
- Sample size (8/16/24 bits)
- Channels
- Separate vs joint stereo
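As a rough guide to how these parameters drive storage, uncompressed PCM size is samples per second times bytes per sample times channels times duration. A minimal Python sketch; the function name `pcm_bytes` is made up for illustration:

```python
def pcm_bytes(sample_rate_hz, bits_per_sample, channels, seconds):
    """Size in bytes of uncompressed PCM audio with the given parameters."""
    bytes_per_sample = bits_per_sample // 8
    return int(sample_rate_hz * bytes_per_sample * channels * seconds)

# CD-quality stereo: 44.1 kHz, 16-bit, 2 channels, one minute
print(pcm_bytes(44100, 16, 2, 60))   # 10584000
# Telephone-style mono: 8 kHz, 8-bit, 1 channel, one minute
print(pcm_bytes(8000, 8, 1, 60))     # 480000
```

One minute of CD-quality audio is about 10 MB raw, which is why the format questions above matter on a microcontroller.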
Audio Formats
- Raw audio (PCM, PWM)
- aLaw / uLaw
- MP3 (MPEG-1 Layer 3)
- AAC (Advanced Audio Coding)
- Dolby Digital (AC3/EAC3/AC4)
- Vorbis
- Opus
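uLaw (mu-law) from the list above is simple enough to show in full: it squeezes a 16-bit sample into 8 bits using a logarithmic curve. The sketch below follows the widely used G.711-style layout (sign bit, 3-bit exponent, 4-bit mantissa, bias of 132); the function names are illustrative and not from any particular library.

```python
BIAS = 0x84    # 132, added before finding the exponent
CLIP = 32635   # largest magnitude that still fits after adding BIAS

def linear_to_ulaw(sample):
    """Compress one signed 16-bit PCM sample to an 8-bit mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # Find the position of the highest set bit (exponent 0..7)
    exponent = 7
    mask = 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # mu-law bytes are stored bit-inverted
    return ~(sign | (exponent << 4) | mantissa) & 0xFF

def ulaw_to_linear(byte):
    """Expand one 8-bit mu-law byte back to a signed 16-bit PCM sample."""
    byte = ~byte & 0xFF
    sign = byte & 0x80
    exponent = (byte >> 4) & 0x07
    mantissa = byte & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if sign else magnitude

for s in (0, 1000, -1000, 32767):
    code = linear_to_ulaw(s)
    print(s, hex(code), ulaw_to_linear(code))
```

Quiet samples keep fine resolution while loud ones are quantized coarsely, which suits speech and costs almost no CPU to decode.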
Deep Dive on MP3
- Most common audio file format ever!
- Patents expired in 2017, so free to use
- Can go down to low bitrates
- Lots of software to decode
- Hardware decoders may be best for low-power or low-volume designs, but it is hard to justify a $12 VS1053B over a $4 Pico
- C library libhelix-mp3
- RealNetworks Public Source License (BSD-like with patent grant)
- about 20K code, 32K RAM on 32-bit ARM
Deep Dive on Opus
- Great open-source royalty-free format
- Especially designed for speech
- Supports low bitrates
- C library libopus (BSD 3-clause)
- ~200K for 32-bit ARM library with encode/decode
Images
Image Formats
- Raw memory dumps (BMP, TGA)
- Tile-based formats
- RLE compression
- Dictionary-based compression (GIF, PNG)
- DCT-based compression (JPEG)
- Based on video codecs (WebP, AVIF)
- Memory for encoded form
- Complexity to decode
- Compatibility with display hardware
- Color space (RGB vs YUV)
- Indexed vs Direct Color
- Planar vs Interleaved
- Required width/height restrictions
- Alpha support
- Patent Encumbrance
Deep Dive on RLE and Indexed Color
- Image formats are often custom designed for the application
- Run Length Encoding (RLE) is a simple, low-code technique for compressing large areas of one color
- Indexed color is a way of mapping a small number of color values to a custom palette
- Find The Story (Arduboy)
- RLE compressed 2x2 screen (256x128x8) image
- Index colors mapped to B/W at runtime
- 32K reduced to 15K
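The combination of RLE and indexed color can be sketched in a few lines of Python. This is a generic (count, value) byte-pair scheme with a made-up palette, not the actual format used in Find The Story:

```python
def rle_encode(pixels):
    """Encode a bytes-like sequence as (count, value) pairs, count in 1..255."""
    out = bytearray()
    i = 0
    while i < len(pixels):
        value = pixels[i]
        run = 1
        while i + run < len(pixels) and pixels[i + run] == value and run < 255:
            run += 1
        out += bytes((run, value))
        i += run
    return bytes(out)

def rle_decode(data, palette):
    """Expand (count, index) pairs, mapping each index through a palette."""
    out = []
    for k in range(0, len(data), 2):
        count, index = data[k], data[k + 1]
        out.extend([palette[index]] * count)
    return out

# Hypothetical palette: three color indices mapped to 1-bit black/white
palette = [0, 1, 1]
row = bytes([0] * 10 + [2] * 5 + [1] * 1)
packed = rle_encode(row)
print(len(row), "->", len(packed))   # 16 -> 6
print(rle_decode(packed, palette))
```

Because the palette lookup happens at decode time, the same compressed data can be remapped to a different display without re-encoding.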
Deep Dive on PNG
- Lossless bitmap format with lots of tool support
- Has index color and alpha channel support
- Uses zlib as compression for picture chunks
- Can add arbitrary metadata to images
- PNGdec library (Apache 2.0) by Larry Banks
- uses about 48K of RAM, small code size
- Designed for small projects
- Line-by-line decoding
- OptiPNG (zlib license)
- desktop tool to make PNG files smaller
- also does conversion from other formats
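To make the chunk structure concrete, here is a small Python sketch that builds a tiny grayscale PNG in memory and walks its chunks. It uses only the standard library; the helper names are invented for this example, and it is not the PNGdec API. Unlike a line-by-line embedded decoder, it inflates the whole image at once and assumes filter type 0 on every scanline.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def make_chunk(chunk_type, data):
    """A chunk is: 4-byte length, 4-byte type, data, CRC32 of type+data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)

def make_tiny_png(width, height, gray):
    """Build an 8-bit grayscale PNG filled with a single value."""
    # IHDR: width, height, bit depth, color type, compression, filter, interlace
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Every scanline starts with a filter-type byte (0 = no filter)
    raw = b"".join(b"\x00" + bytes([gray]) * width for _ in range(height))
    return (PNG_SIGNATURE
            + make_chunk(b"IHDR", ihdr)
            + make_chunk(b"IDAT", zlib.compress(raw))
            + make_chunk(b"IEND", b""))

def iter_chunks(png):
    """Yield (type, data) for each chunk, checking the signature and CRCs."""
    if png[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    while pos < len(png):
        length, chunk_type = struct.unpack(">I4s", png[pos:pos + 8])
        data = png[pos + 8:pos + 8 + length]
        (crc,) = struct.unpack(">I", png[pos + 8 + length:pos + 12 + length])
        if crc != zlib.crc32(chunk_type + data) & 0xFFFFFFFF:
            raise ValueError("bad CRC in chunk")
        yield chunk_type, data
        pos += 12 + length

png = make_tiny_png(4, 2, 200)
for chunk_type, data in iter_chunks(png):
    print(chunk_type, len(data))

idat = b"".join(d for t, d in iter_chunks(png) if t == b"IDAT")
scanlines = zlib.decompress(idat)
print(len(scanlines))   # 2 rows * (1 filter byte + 4 pixels) = 10
```

Metadata lives in extra chunk types that a decoder can skip by length, which is what makes arbitrary metadata cheap to ignore on small devices.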
Deep Dive on JPEG
- JPEGDEC (Apache 2.0) by Larry Banks
- embedded-optimized decoder
- Floyd-Steinberg dithering
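Floyd-Steinberg dithering spreads each pixel's quantization error onto its unprocessed neighbors, which lets a 1-bit display approximate gray levels. Below is a textbook Python sketch for 8-bit grayscale input; it is not JPEGDEC's implementation, and the function name is illustrative.

```python
def floyd_steinberg_1bit(gray, width, height):
    """Dither 8-bit grayscale rows (list of lists) down to 0/1 pixels."""
    work = [[float(v) for v in row] for row in gray]
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            old = work[y][x]
            new = 255.0 if old >= 128 else 0.0
            out[y][x] = 1 if new else 0
            err = old - new
            # Distribute the error: 7/16 right, 3/16 down-left,
            # 5/16 down, 1/16 down-right
            if x + 1 < width:
                work[y][x + 1] += err * 7 / 16
            if y + 1 < height:
                if x > 0:
                    work[y + 1][x - 1] += err * 3 / 16
                work[y + 1][x] += err * 5 / 16
                if x + 1 < width:
                    work[y + 1][x + 1] += err * 1 / 16
    return out

# A mid-gray square should come out roughly half white
size = 32
dithered = floyd_steinberg_1bit([[128] * size for _ in range(size)], size, size)
white = sum(sum(row) for row in dithered)
print(white / (size * size))
```

Only the current and next row of error values are really needed, so an embedded version can run with two small line buffers instead of a full frame.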
Deep Dive on YUV
- RGB colors map to how displays emit color with separate red/green/blue elements
- YUV, aka YCbCr, encodes brightness and color differences
- Native output format of JPEG and video codecs
- Since eyes are more sensitive to brightness changes, you can reduce resolution of UV planes by chroma subsampling
- libyuv (BSD) from Chromium for fast conversion
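For a sense of what the conversion involves, here is a per-pixel Python sketch using the full-range BT.601 coefficients that JPEG/JFIF files use. That range is an assumption: many video codecs use limited-range values instead, and a library like libyuv does the same job far faster with SIMD. The second function shows the memory saved by 4:2:0 chroma subsampling. Function names are illustrative.

```python
def clamp8(value):
    """Round and clamp to the 0..255 range of an 8-bit channel."""
    return max(0, min(255, int(round(value))))

def ycbcr_to_rgb(y, cb, cr):
    """Full-range BT.601 YCbCr to RGB (the JPEG/JFIF convention)."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return clamp8(r), clamp8(g), clamp8(b)

def yuv420_bytes(width, height):
    """Bytes for one 8-bit frame with U and V at half resolution each way."""
    chroma_w = (width + 1) // 2
    chroma_h = (height + 1) // 2
    return width * height + 2 * chroma_w * chroma_h

print(ycbcr_to_rgb(128, 128, 128))   # (128, 128, 128), neutral gray
print(yuv420_bytes(320, 240))        # 115200
print(320 * 240 * 3)                 # 230400 for RGB888
```

With 4:2:0 subsampling a frame takes half the memory of RGB888, before any compression is applied.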
Video
Video Formats
- Motion JPEG
- MPEG-1 / MPEG-2
- H.264 / AVC
- H.265 / HEVC
- VP8 & VP9 & AV1
- Hardware support
- Memory required to decode
- Reference frames
- I frames / P frames / B frames
- Patent Encumbrance
Deep Dive on MPEG-1
- Invented in the late 1980s for delivering video on CDs
- Supports I, P, and B frames, but usually I and P only
- MPEG-2 is an extension of this format for broadcast use with higher bitrates and interlaced video support
- MPEG-1 has been patent-free since at least 2008
- pl_mpeg - single C header MPEG-1 decoder!
- also supports MPEG-1 layer 1/2 audio
- outputs Y/U/V planes for your code to process
- used in my BadgerMovie project
Synchronization
Keeping Audio & Video in Sync
- Presentation Time Stamp (PTS)
- 90kHz clock
- Clock rollover
- Device's realtime/system clock unreliable for sync
- Device speed can be subtly different from playback speed
- Video may also be decoded out-of-order
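The timestamp arithmetic is worth sketching because rollover is easy to get wrong. The Python below assumes the 33-bit PTS field used in MPEG systems streams; the helper names are invented for this example.

```python
PTS_HZ = 90_000          # PTS ticks per second
PTS_BITS = 33            # assumed PTS width (MPEG systems streams)
PTS_MOD = 1 << PTS_BITS

def pts_to_seconds(ticks):
    """Convert a tick count on the 90 kHz clock to seconds."""
    return ticks / PTS_HZ

def pts_delta(later, earlier):
    """Signed tick difference that stays correct across one rollover."""
    delta = (later - earlier) % PTS_MOD
    if delta >= PTS_MOD // 2:
        delta -= PTS_MOD
    return delta

# One frame at 30 fps lasts 3000 ticks
print(PTS_HZ // 30)
# Hours until the counter wraps
print(round(PTS_MOD / PTS_HZ / 3600, 1))
# A timestamp just after the wrap is still "later" than one just before it
print(pts_delta(100, PTS_MOD - 100))   # 200
```

Comparing timestamps through a wrap-aware difference rather than with `<` keeps playback from stalling when the counter rolls over.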
Not Just Audio/Video Synchronization
- Closed captions
- Animations
- DaftPunkWordClock
- synced highlighting of lyrics with timestamps to MP3
- drifted terribly until I modified CircuitPython to expose the decoded audio frame count, allowing better sync
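The idea of syncing to the decoder's own progress rather than a wall clock can be sketched as follows. This is a hypothetical Python illustration, not the DaftPunkWordClock code or the CircuitPython API: `samples_played` stands in for whatever count the audio decoder exposes, and the lyric text is placeholder data.

```python
import bisect

def current_lyric(lyrics, samples_played, sample_rate_hz):
    """Return the lyric active at the current audio position.

    lyrics is a list of (start_seconds, text) sorted by start time.
    The position comes from decoded samples, not from a system clock.
    """
    position = samples_played / sample_rate_hz
    starts = [start for start, _ in lyrics]
    index = bisect.bisect_right(starts, position) - 1
    return lyrics[index][1] if index >= 0 else None

lyrics = [(0.5, "line one"), (2.0, "line two"), (4.25, "line three")]
rate = 44100
for samples in (0, 44100, 100000, 200000):
    print(samples, current_lyric(lyrics, samples, rate))
```

Since the position is derived from audio actually decoded, a slightly fast or slow playback clock moves the lyrics with it instead of drifting away.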
Thanks!
Creative Commons Acknowledgements
Code Links