OpenOB
utilising Opus for Broadcast Audio Links
The codec, features, potential uses and tools
Developer: James Harrison (@JamesHarrison) - BBC Research & Development
Speaker: Chris Roberts (@naxxfish) - Canterbury Youth and Student Media / BBC Major Projects Infrastructure
I'm a software engineer in the Internet Research and Future Services section of BBC Research and Development.
This is a talk about Opus, an exciting new codec recently standardized by the Internet Engineering Task Force, some use cases it enables, and open source implementations and tools to use with the codec.
A Disclaimer
The BBC does not endorse or recommend Opus, does not use Opus to the best of my knowledge at this time, and does not endorse or recommend any of the software covered in this presentation.
So let's clear something up right away - the BBC does not recommend or endorse Opus. We'll get to some of the why of that a bit later, but I do need to say that the BBC doesn't endorse it and isn't using it, and the same goes for the software mentioned here. We're certainly interested - hence this talk - but Opus is still very new.
Audio over IP - An introduction
Audio over IP, or AoIP, is the term used to describe systems for transporting audio, typically in near-real time with low end-to-end latency, across standard IP networks. The underlying protocols used for audio over IP are quite old and well tested, such as the widely used Real-time Transport Protocol (RTP). In broadcast, IP links are used for both contribution, from other studios in a complex or from outside broadcasts, and emission, getting audio from studios to their transmitter sites.
Codecs and Containers
Opus is a codec which does not aim to provide lossless compression. Just to clear this up - in broadcast we quite often refer to physical boxes as codecs. This is pretty old terminology going back to the days of ISDN and before. In the computer world, a codec is an algorithm that converts raw data - in our case, audio - to and from a different format. We sometimes call them coders. We also have containers - these are things like Ogg or Matroska that encapsulate coded data with some headers.
What is Opus?
So what in fact is this Opus thing anyway and why am I so excited about it? Opus is a new codec - typically paired with the Ogg container - for audio data.
Don't we have a few of those already?
AAC-LC, -LD
HE-AACv2
MPEG-2
MP3
G.722
Vorbis
iLBC
Speex
AMR-NB
AMR-WB
... to name a few
So there are loads of codecs out there. We've got AAC low complexity, its low delay variant, high-efficiency AAC and its successor, MPEG-2's audio codec which is still used widely in television, the annoyingly ubiquitous MP3, G.722 - popular in VoIP applications and the codec underlying most ISDN audio links, Vorbis, the internet low bitrate codec, Speex, and the adaptive multi-rate codec family. And that's not an exhaustive list - there are also proprietary codecs like Worldcast's apt-X series. So why do we need something new?
The Rationale for Opus
Comparison of Codecs
(from opus-codec.org)
Opus is a hybrid codec
SILK + CELT
(LPC + MDCT)
Opus isn't just one codec, but is instead formed of two layers - SILK and CELT. This allows it to cover a wide range of situations, while presenting a consistent API to developers and a consistent bitstream format and stream packaging. This has huge benefits for interoperability, and makes Opus one of the most widely applicable codecs out there.
SILK
Let's look at the parts of Opus. SILK is a narrow-to-superwideband low-bitrate codec typically used for speech coding and very low bitrate operation.
SILK at a glance
Based on linear predictive (LP) coding
4-12kHz input bandwidth (8-24kHz sample rate)
6-40kbit/s bitrates
Supports bitrate adaptation, discontinuous transmission
Low-latency - typically 25ms
SILK was developed as part of work done at Skype to provide a low bitrate codec for conversational use. Widely used in Skype and in other products such as the Steam gaming community platform, it supports low latency in its native form but has been heavily modified as part of its integration with Opus to offer even lower latencies. It is based on linear predictive coding, also the basis of codecs like GSM, and supports bitrate adaptation as well as discontinuous transmission. Discontinuous transmission, or DTX, lets a link use practically no bandwidth when no audio is being sent.
SILK in Opus
Opus uses SILK in most low-bandwidth modes
Optimized for speech rather than music
Opus uses SILK in situations where primarily speech is being coded - low bandwidth and bitrate situations. In most implementations, the choice between SILK and CELT can be manually made or can be left to be determined from the encoding parameters specified in use. SILK can be used to carry music but this is not its strong point - LPC is not the right tool for the job.
CELT
The other half of Opus is CELT, the wide-ranging low to high bandwidth codec used within Opus for high quality speech and music coding.
CELT at a glance
Modified discrete cosine transform (MDCT) codec
16-24kHz input bandwidth (32-48kHz sample rate)
24-128kbit/s bitrates
Supports bitrate adaptation, discontinuous transmission and packet loss concealment
Low-latency - typically 2-10ms
Developed by the Xiph.Org Foundation, the same organization that produced Ogg, Vorbis and Theora, CELT was originally a standalone codec designed as a low-latency alternative to Vorbis and MP3. The algorithmic approach of a transform codec allows for high quality music and speech transmission with minimal delay, but with more computational complexity. It supports a range of advanced features and has a very low latency, but does not support lower input bandwidths.
CELT in Opus
Opus uses CELT in medium to high bandwidth modes
Good all-round performance for speech and music at higher bitrates (over 32kbps)
Opus uses CELT for its higher bandwidth modes and for higher bitrates, and is thus better suited for applications where music and speech are being transmitted - this is the main mode in which most broadcasters will make use of Opus for contribution and especially emission links.
How does this "hybrid" stuff work?
Can be switched to LPC or MDCT explicitly
Hybrid mode takes sub-8kHz through LPC and above-8kHz through MDCT
Internal filter banks to split and recombine
There is better performance in LPC mode than hybrid mode at lower bitrates for speech
So how does the hybrid part of Opus work? Fundamentally, it uses an internal crossover to split the sub-8kHz material from the above-8kHz material, and sends the low-frequency audio through the LPC process and the high-frequency audio through the MDCT process. On the decoder, this is reversed to recombine the audio into a single stream after decoding. Some extra processing is done to ensure all the frame sizes and sample rates line up properly.
Opus Hybrid Block Diagram
So what about audio quality?
We don't know!
Some subjective testing being done by the EBU Audio Subjective Testing working group
So what does this all add up to, when you get down to it and play things through it? Well, we don't know yet. Certainly to most people's ears in casual listening it compares favourably with other codecs. The EBU, in partnership with the BBC, IRT, NRK, RAI and others, is working on performing subjective audio testing and publishing a report in September, probably to be released at IBC.
... so does it sound "alright"?
Yeah, pretty much.
Google and Nokia Research have performed listening tests for speech content and found it better than or equivalent to comparable speech codecs.
Hydrogenaudio conducted public listening test, better than AAC-LC and HE-AAC and Vorbis in VBR mode, lower/more consistent bitrate
In a less professional sense, though, we know it sounds pretty good. Google conducted a series of tests for speech and fullband music, and Nokia did a lot of work on speech quality characterization. Opus outperformed G.722, AMR and G.719 in 8 to 48kbps speech, especially when set to the appropriate LPC or MDCT modes explicitly; the hybrid mode did perform poorly below around 36kbps. In a range of fullband stereo music files, Opus at 64kbps equalled AAC-LC at 64kbps and MP3 at 96kbps. At 128kbps, Opus outperformed both. Google concluded that at 32kbps mono, Opus was practically transparent for speech content.
Fixed standard, living code
While the standard is fixed and specifies the behaviour of the decoder, it does not define the performance of the encoder. This means that the encoder software can continue to be developed and improve - so there can be improvements to the quality of the codec over time.
So why is Opus so important?
It's free and royalty-free
Because it's free and royalty-free it's trivially implemented
It's potentially higher-quality-per-bit than existing codecs
So why do we, in the broadcast community, care so much about Opus? The answer mostly boils down to money. AAC, apt-X and other proprietary codecs require a license to encode. This license fee can be huge and so open source, freely licensed codecs offer a solution. There's also potential for improving quality on existing fixed-bitrate circuits.
Latency
Latency is the other big reason. Let's look at what happens when you inadvertently introduce latency to a broadcast.
Latency is bad
We want less of it in our broadcasts.
Latency is a thing that we want to reduce as much as possible. While it is typically a non-issue for recordings, in live broadcasts and contribution links it can be seriously disruptive to the flow of a programme or in the case of interviews and speech content, seriously affect intelligibility.
Where does latency come from?
Audio device buffers
Coding and muxing
Routing and transit over IP
Jitter buffers
Latency crops up in a number of places. Firstly, the audio device used to turn analog audio into digital audio and get it into the computer will typically add five to ten milliseconds in its internal buffers. Next, we have to encode and mux the audio - this can take whole seconds for MP3 or AAC depending on parameters, and this is where Opus wins hands down with a frame size as low as 2.5 milliseconds. Next we have the delay for actually sending the packets from A to B, and lastly we have a buffer on the receiver which lets us iron out any jitter in that IP link by waiting a little for any packets that arrive out of sequence to show up.
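To make the way these stages add up a bit more concrete, here is a minimal sketch of a one-way latency budget. The audio device and coder figures are the ones quoted above; the network transit and jitter buffer values are assumptions picked purely for illustration, not measurements.

```python
# Hypothetical one-way latency budget, in milliseconds.
# Device and coder figures come from the talk; the transit and
# jitter buffer values are illustrative assumptions only.
LATENCY_BUDGET_MS = {
    "audio device buffers": 10.0,
    "Opus coding (2.5ms frames)": 5.0,
    "network transit (assumed)": 15.0,
    "jitter buffer (assumed)": 20.0,
}


def total_latency_ms(budget):
    """Sum the latency contributed by every stage in the chain."""
    return sum(budget.values())


for stage, ms in LATENCY_BUDGET_MS.items():
    print(f"{stage:30s} {ms:5.1f} ms")
print(f"{'total':30s} {total_latency_ms(LATENCY_BUDGET_MS):5.1f} ms")
```

Swapping in your own measured figures for each stage shows quickly which part of the chain is worth attacking first.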
Opus Latency
Frame Size    Algorithmic Delay    Bitrate overhead
20ms          22.5ms               0%
10ms          12.5ms               10%
5ms           7.5ms                32.5%
2.5ms         5ms                  75%
2.5ms frame size = 112kbps to match 64kbps
So how does Opus perform? Well, with 2.5ms frame sizes you can achieve latency in the coder as low as 5 milliseconds. However, this requires using smaller transform block sizes, so to compensate for this the bitrate must be increased. So we may need around 112kbps, in practice a 128kbps link, to match the quality of 64kbps if we want the lowest possible latency. However, in most cases, we're happy with an extra few milliseconds for only around a 30% increase in bitrate.
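The arithmetic behind the slide is simple enough to sketch. The snippet below takes the frame size, delay and overhead figures from the slide and works out the bitrate needed to match a given 20ms-frame bitrate. The function and table names are made up for this example.

```python
# Frame size (ms) -> (algorithmic delay in ms, bitrate overhead as a fraction).
# Figures are copied from the Opus Latency slide.
OPUS_FRAME_TABLE = {
    20.0: (22.5, 0.0),
    10.0: (12.5, 0.10),
    5.0: (7.5, 0.325),
    2.5: (5.0, 0.75),
}


def required_bitrate_kbps(target_kbps, frame_ms):
    """Bitrate needed at frame_ms to match target_kbps at 20ms frames."""
    _delay_ms, overhead = OPUS_FRAME_TABLE[frame_ms]
    return target_kbps * (1.0 + overhead)


for frame_ms, (delay_ms, _overhead) in OPUS_FRAME_TABLE.items():
    kbps = required_bitrate_kbps(64, frame_ms)
    print(f"{frame_ms}ms frames: {delay_ms}ms delay, about {kbps:.1f}kbps")
```

For a 64kbps target this gives 112kbps at 2.5ms frames, which is where the figure on the slide comes from.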
Jitter Buffer Tradeoffs
Larger jitter buffer = more resilience, more latency
Jitter can be induced in the hardware at each end or (more often) the IP link layer
As ever, engineering is about tradeoffs. Low latency links are more susceptible to transient faults and glitches. Increasing the size of the jitter buffer gives higher resilience against packets arriving out of sequence due to jitter. Jitter can be induced within the transmission device - especially on shared devices like laptops running link software, where the CPU isn't just doing AoIP duties - but jitter is more often experienced at the IP link layer, where routes on the internet change or link buffers on the internet vary from packet to packet.
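The tradeoff can be seen in a toy jitter buffer. This is only a sketch of the general idea, not how any particular tool implements it: it counts its depth in packets rather than milliseconds, and the class and method names are invented for the example.

```python
import heapq


class JitterBuffer:
    """Toy jitter buffer that reorders packets by sequence number.

    depth is how many packets are held back before any are released.
    A larger depth tolerates more reordering but adds more latency.
    """

    def __init__(self, depth):
        self.depth = depth
        self._heap = []        # (sequence number, payload), smallest first
        self._next_seq = None  # next sequence number due for playout

    def push(self, seq, payload):
        """Insert an arriving packet and return any packets ready to play."""
        if self._next_seq is not None and seq < self._next_seq:
            return []          # too late: playout has already moved past it
        heapq.heappush(self._heap, (seq, payload))
        ready = []
        while len(self._heap) > self.depth:
            seq_out, payload_out = heapq.heappop(self._heap)
            self._next_seq = seq_out + 1
            ready.append((seq_out, payload_out))
        return ready


def play(arrival_order, depth):
    """Return the sequence numbers played out for a given arrival order."""
    buf = JitterBuffer(depth)
    played = []
    for seq in arrival_order:
        played.extend(s for s, _ in buf.push(seq, b""))
    return played


arrivals = [0, 2, 1, 3, 5, 4, 6, 7]
print("depth 0:", play(arrivals, 0))  # late packets are dropped
print("depth 2:", play(arrivals, 2))  # reordering is absorbed
```

With no buffering the out-of-order packets are simply lost; holding back two packets recovers them, at the price of two extra frames of delay.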
Reliability
...two links lost in the first 20 minutes of the programme, we're off to a good start...
...I'm sorry, we've lost him. He was in Beirut, though, so there's a bit of an excuse...
I'm going to have to cut you off there, we'd love to hear more but I'm afraid your link appears to be broken...
(All from 30 minutes of the 3-hour-long Today programme, BBC Radio 4's flagship current affairs and world news programme)
So where are we now in terms of AoIP reliability? As you can see, these days all is not well in the world of broadcast contribution links. ISDN links, effectively glorified phones, were loved for their reliability - but when they failed they failed hard, often taking hours, days or even weeks to get back on air if a circuit had been broken, even minutes for a transient fault. IP codecs have a tendency to glitch intermittently and are harder to test/debug - people use line-up tone or simple signals to test before going on air, which doesn't represent an accurate test given variable rate codecs. The internet is also much more of a black box to users once you're past your local modem or router. However, well-engineered IP links can recover faster from failures thanks both to the nature of the protocols involved and the nature of the internet.
Reliability in Opus
Forward Error Correction (FEC) - I'm expecting a maximum of 5% loss on this link so put enough information about the previous packet in each subsequent packet to cope with it
Packet Loss Concealment (PLC) - I've lost a packet entirely and can't correct for it with FEC, do I notice it in the audio?
With this in mind, Opus has several features for reliability, the main two being forward error correction and packet loss concealment. Forward error correction lets Opus send an extra quantity of data, which can be adjusted at runtime, to compensate for data being lost in transit - good for links which have known occasional packet loss. Packet loss concealment lets Opus mask missing packets by guessing, effectively, what the audio should be in the gap left by a missing packet. This is more effective with smaller packet sizes but incurs extra bitrate overhead.
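Here is a toy model of how the two mechanisms fit together on the receiver. It is a sketch under heavy simplification: real Opus FEC carries a lower-bitrate re-encoding of the previous frame and real PLC extrapolates the signal, whereas this example copies frames verbatim and conceals by repeating the last good one. All names are invented for illustration.

```python
def make_packets(frames):
    """Sender: each packet carries its frame plus a copy of the previous one."""
    packets = {}
    previous = None
    for seq, frame in enumerate(frames):
        packets[seq] = {"frame": frame, "fec_previous": previous}
        previous = frame
    return packets


def receive(packets, total):
    """Receiver: return (frame, how) per slot, where how is ok, fec or plc."""
    output = []
    last_good = None
    for seq in range(total):
        if seq in packets:
            frame, how = packets[seq]["frame"], "ok"
        elif seq + 1 in packets:
            # FEC: the next packet arrived and carries a copy of this frame.
            frame, how = packets[seq + 1]["fec_previous"], "fec"
        else:
            # PLC: nothing to correct from, so repeat the last audio we had.
            frame, how = last_good, "plc"
        output.append((frame, how))
        last_good = frame
    return output


frames = ["A", "B", "C", "D", "E"]
sent = make_packets(frames)
lost = {1, 3, 4}  # pretend these packets never arrived
arrived = {seq: pkt for seq, pkt in sent.items() if seq not in lost}
for seq, (frame, how) in enumerate(receive(arrived, len(frames))):
    print(seq, frame, how)
```

Note that recovering a frame through FEC means waiting for the following packet, so in practice it relies on the receiver holding at least one frame in its jitter buffer.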
Dealing with the effects of loss and jitter
All of Opus' reliability features work no matter what bitrate or frame size
Jitter buffers will still help on internet links but PLC and FEC can allow for reduced buffers versus other codecs
Back to Opus, then. Because Opus has packet loss concealment, losing a few packets isn't actually the end of the world. Jitter buffers are still good to have and help massively with internet links even with a relatively small 10 to 20 millisecond buffer out on the wild west of the internet, but these buffers can be much smaller than you'd need for other codecs thanks to packet loss concealment and forward error correction.
Dynamic Control
Huge chunks of Opus can be adjusted at runtime and without introducing discontinuities in most cases
Opus also lets you tweak many parameters at runtime. Did your link just lose half its bandwidth, or start experiencing loss? Drop the bandwidth down, increase the FEC expected loss parameter, don't drop audio while doing it. This is another feature that requires a bit of a rethink with regards to how we establish and monitor links within link software compared to older codecs where you set the link up and maybe you adjust it if it fails. Skype is one of the only AoIP tools out there with decent realtime link adjustment at the moment.
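A link manager making use of this might run a small policy function every time it gets fresh measurements. The sketch below is one hypothetical policy, not anything Opus or OpenOB prescribes: the 80% headroom factor, the 6 to 128kbps clamp (taken from the bitrate ranges quoted earlier) and all names are assumptions. In libopus the resulting values would be applied through encoder controls such as OPUS_SET_BITRATE and OPUS_SET_PACKET_LOSS_PERC.

```python
from dataclasses import dataclass

# Clamp to the range of bitrates quoted earlier in this talk (assumption).
MIN_KBPS, MAX_KBPS = 6, 128


@dataclass(frozen=True)
class LinkSettings:
    bitrate_kbps: int
    expected_loss_percent: int


def adapt(available_kbps, measured_loss_percent, headroom=0.8):
    """Pick encoder settings from the latest link measurements.

    Uses only a fraction (headroom) of the measured bandwidth so that
    packet headers and FEC data still fit on the link.
    """
    target = int(available_kbps * headroom)
    bitrate = max(MIN_KBPS, min(MAX_KBPS, target))
    loss = max(0, min(100, round(measured_loss_percent)))
    return LinkSettings(bitrate, loss)


print(adapt(available_kbps=128, measured_loss_percent=0))
print(adapt(available_kbps=64, measured_loss_percent=4.6))  # link degraded
```

Because the encoder accepts these changes between frames, the link can follow the policy without tearing down and rebuilding the stream.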
Open-Source Tools for Opus
None of the following tools are endorsed by or supported by the BBC or BBC Research and Development - but they are all open source.
opusenc/opusdec, libopus
The reference implementation
Excellent quality, performance acceptable on many platforms
Script-friendly CLI tools for static file encoding
Supports fixed-point as well as default floating point operation
Opusenc and its partner opusdec are part of the libopus reference implementation for Opus. These are the highest quality tools out there for Opus at the moment, and are great for working with static files or if you're a C developer. If you want to use Opus to encode a file before uploading it via 3G - this is the tool to use.
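As a rough example of scripting the tool, the sketch below builds an opusenc command line from Python. The --bitrate (in kbit/s) and --framesize (in milliseconds) options are opusenc options, but check opusenc --help on your version; the helper name and file names are made up for the example.

```python
import shlex
import subprocess


def opusenc_command(wav_in, opus_out, bitrate_kbps=64, framesize_ms=20):
    """Build the argument list for encoding a WAV file with opusenc."""
    return [
        "opusenc",
        "--bitrate", str(bitrate_kbps),    # target bitrate in kbit/s
        "--framesize", str(framesize_ms),  # frame size in milliseconds
        wav_in,
        opus_out,
    ]


def encode(wav_in, opus_out, **options):
    """Run opusenc; raises CalledProcessError if the encode fails."""
    subprocess.run(opusenc_command(wav_in, opus_out, **options), check=True)


# Print the command rather than running it, so this works without opusenc.
print(shlex.join(opusenc_command("interview.wav", "interview.opus")))
```

Calling encode("interview.wav", "interview.opus", bitrate_kbps=32) would then produce a small file suitable for sending over a slow mobile connection.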
GStreamer opus elements
GStreamer plugins providing elements for encoding and decoding of audio
RTP payloading/depayloading also supported
Error correction/PLC supported
Very friendly interface for streaming application development
Moving higher up the stack there are now plugins for the GStreamer media streaming framework available. These allow you to encode and decode audio, but also to make use of RTP payloading and depayloading, and advanced Opus features like forward error correction and packet loss concealment. The interface for GStreamer is much easier to work with in higher level languages like Python, and makes using Opus much easier, providing you with lots of helpful elements around your Opus elements.
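To give a flavour of what this looks like, the sketch below builds a pair of pipeline descriptions that could be passed to gst-launch-1.0. The element and property names are those found in GStreamer 1.x and may differ in older releases; the test tone source, address, port and RTP caps string are assumptions for the example.

```python
def sender_pipeline(host, port, bitrate_kbps=64, expected_loss_percent=5):
    """Pipeline description: test tone -> Opus encoder -> RTP -> UDP."""
    return (
        "audiotestsrc is-live=true ! audioconvert ! audioresample ! "
        f"opusenc bitrate={bitrate_kbps * 1000} inband-fec=true "
        f"packet-loss-percentage={expected_loss_percent} ! "
        f"rtpopuspay ! udpsink host={host} port={port}"
    )


def receiver_pipeline(port, jitter_buffer_ms=20):
    """Pipeline description: UDP -> jitter buffer -> Opus decoder -> speakers."""
    caps = (
        "application/x-rtp,media=audio,clock-rate=48000,"
        "encoding-name=OPUS,payload=96"
    )
    return (
        f'udpsrc port={port} caps="{caps}" ! '
        # do-lost tells the decoder about gaps so it can use FEC and PLC.
        f"rtpjitterbuffer latency={jitter_buffer_ms} do-lost=true ! "
        "rtpopusdepay ! opusdec use-inband-fec=true plc=true ! "
        "audioconvert ! autoaudiosink"
    )


print("gst-launch-1.0", sender_pipeline("192.0.2.10", 5004))
print("gst-launch-1.0", receiver_pipeline(5004))
```

Replacing audiotestsrc with a real capture element such as alsasrc turns the sender into a very basic one-way link.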
What about end-user friendly tools?
My users are not geeks. How can they use this shiny new technology?
How do we let end-users at this technology? What can your radio station do, especially if your station could really benefit from having versatile IP codecs but has been unable to afford them, or has been using Skype etc in lieu of professional kit?
OpenOB
(Not a project endorsed or supported by BBC R&D)
For a bit more control there's OpenOB. First off, a disclaimer: This is not a BBC or R&D project, it's my own personal project, and again fully open source under the BSD 3-clause license.
OpenOB - An Introduction
Open-source RTP link manager (BSD 3-clause license)
Supports Linear PCM, CELT and Opus transports via GStreamer
Broadcaster-oriented - high reliability, automatic aggressive failure recovery
Currently uses Redis for message passing and session setup
OpenOB is a tool that manages a unidirectional Real-time Transport Protocol (RTP) audio stream between two points. It supports linear audio over fast networks, as well as legacy CELT and Opus in a range of bitrates. It is designed for reliable operation, with aggressive recovery of dropped links. And last but not least it currently uses Redis to handle message passing and session negotiation. I built OpenOB initially as a studio transmitter link tool because the community station I was working at forgot to budget for an STL when going on air on FM.
OpenOB - What it can do
Configure to link bandwidth, everything else automatic
Sending party configures receiver
Jitter buffer/latency configurable
Total system latency as low as 25-50ms on Linux
So what can it do? At a minimum you just point it at the receiver and tell it how much bandwidth you have. The transmitter defines all the link parameters and the receiver sorts itself out. The total latency for an OpenOB link can be lower than 25 milliseconds over reliable networks, and reliably as low as 50 milliseconds over short hops on the internet.
OpenOB - Building codec hardware
Lightweight - runs on Raspberry Pi and similar, ARM
Programmatic access to link management tools
The longer-term goal - simple message-based API for control surfaces
Using hobbyist parts for low costs - great for community, student and other low-budget stations
Want pro quality? Use industrial computers, pro audio interfaces
Because OpenOB is open source and based on fairly lightweight libraries it can run with quite low overhead on low-power devices including things like the Raspberry Pi. There's a basic API to let other programs set up OpenOB links and take advantage of the manager, though there's still work to do on improving the internal APIs for more control. All in all, once you've added sound cards and power supplies you can set up a Raspberry Pi based link for around 200 euros. If you consider most commercial codecs cost thousands of euros an end, this is a new level of affordability.
What about reliability?
OpenOB also has some features specifically geared towards highly available links.
OpenOB - Reliability Features
If your network link glitches for 10ms, you only lose 10ms of audio
Packet loss concealment support
Forward error correction configurable on link setup
Jitter buffer configurable for high-jitter networks like the internet/3G links
OpenOB does very simple sessions, so recovering after a glitch doesn't require session rebuilding and buffering like HTTP links or poorly-engineered SIP links. There's also support by default for packet loss concealment so some glitches will be entirely hidden in the audio, as well as forward error correction which you can tweak to suit the link you're running across. There's also an ample jitter buffer.
OpenOB - Multipath
Send multiple copies on different connections (eg 2 3G providers)
Receive all copies that make it, deduplicate packets
Previously only available in high-end commercial codecs
Adds considerable latency but great for STLs
... but moderately complex firewall configuration required on originator
OpenOB doesn't try to configure your network for you.
(not in GitHub yet!)
OpenOB is also gaining support, currently in testing, for more advanced reliable transmission features like multipath sending. In this mode you specify multiple receiver ports and the transmitter will send the same RTP stream to each port - and in your router or PC firewall you can route those source ports over different routes to get to the receiver. On the receiving end, the streams are all received and combined, duplicates are discarded and gaps filled in from any of the sources as part of the jitter buffer. In this mode you can tolerate complete failure of all but one of your network connections without dropping audio, with no specialized hardware on either end! The only comparable product is WorldCast's APT SureStream, which costs lots more than "free".
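The receiving side of this idea can be sketched in a few lines. This is an illustration of deduplicating by sequence number, not OpenOB's actual code (which is not yet published); the function name and the sample data are invented.

```python
def merge_paths(*paths):
    """Merge packet lists from several paths, one copy per sequence number.

    Each path is a list of (sequence number, payload) tuples in arrival
    order. The first copy seen of each sequence number wins.
    """
    seen = {}
    for path in paths:
        for seq, payload in path:
            seen.setdefault(seq, payload)
    return [(seq, seen[seq]) for seq in sorted(seen)]


# The same five frames sent over two connections, each losing some packets.
path_a = [(0, "f0"), (1, "f1"), (4, "f4")]
path_b = [(1, "f1"), (2, "f2"), (3, "f3")]
print(merge_paths(path_a, path_b))
```

Neither path delivered the whole stream on its own, but the merged result has every frame exactly once.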
OpenOB - Applications
Outside broadcast contribution links
Broadcast emission links (studio-transmitter, studio-studio)
OpenOB can be used in most situations where you want to get audio from A to B, and is in production use for outside broadcasts and studio-transmitter links at a few community radio stations around the world already. I built OpenOB because I needed it, and it solved my problem - and many people, it turns out, have similar problems.
OpenOB - Where next?
Multipath and more complex configurations need better configuration management
Easier end-user front-end via web interface
Shift towards message bus architecture using AMQP, decentralized components for flexible integration
NAT autonegotiation via STUN/TURN tunnels/ICE
In-progress link monitoring, loss measurement and reconfiguration
There's a lot still to be done for OpenOB to become a really solid tool. The demand for multipath streaming and complex configurations like asymmetric bidirectional links for talkback is huge, so the existing command-line option interface is getting replaced. The control interface and communications using Redis aren't ideal, so I'm currently rewriting the system around a message bus architecture using AMQP and loosely coupled components - for instance, for multipath we need to configure the firewall. If you want OpenOB to manage iptables to do that, we could have a daemon to manage that in response to messages from the link manager. If you're on BSD, you swap it out for something to manage pf. This will also help people developing front-ends for the tool. I also want to provide NAT punching with STUN/TURN tunnels, and one of the features I'm really looking forward to implementing is proper link monitoring and automatic reconfiguration of in-progress links to respond to unstable internet conditions.