Jitsi Meet

(in Space)

Issues in video-conferencing

As most of us know, video conferencing can be a boring, tiresome affair

 

Latency, network reliability, and visual and audio fidelity can all contribute to a fatiguing experience

 

Can we do better with the reproduction of audio?

Starting with sound

We shouldn't reinvent the wheel; current implementations aren't too bad...

 

but they weren't designed with the experience of daily use in mind!

 

Core idea: Let's start with audio and focus on the goal of telepresence for more realistic conversations

Contributions

  • Open-source, web-based implementation of binaural video conferencing (with technical evaluation)
     
  • A synopsis of the current literature around spatial audio in video conferencing platforms
     
  • A small-scale experiment to record participants' experience with the platform (through the lens of telepresence)

Spatial audio

Nearly all videoconferencing apps transmit mono audio in multiparty contexts

 

Spatialization would be difficult from an acoustic perspective, but most of us use headphones anyway...

1. I can't build a teleconferencing system from scratch

A videoconferencing system is a sprawling, complicated beast

 

I needed a free & open-source app that was purely web-based

 

It would serve as a template for this novel integration

 

 

2. The academic & conceptual angle

There's a lot to say about teleconferencing and how we might best experience each other's voices over the net

 

Similar works that have tested this idea

Before I began - some considerations

A balance between over-programming and over-writing

Jitsi Meet

An open source video conferencing platform

WebAudio

A JavaScript API for working with audio
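
As a rough illustration of what WebAudio offers here, the sketch below routes one participant's stream through a PannerNode set to HRTF panning. `spatializeStream` and its arguments are hypothetical names chosen for this example. It is not how Jitsi Meet or this project wires its audio, only the standard WebAudio calls (`createMediaStreamSource`, `createPanner`) arranged in a minimal graph.

```javascript
// Minimal sketch (assumed helper, not project code):
// source -> HRTF panner -> speakers/headphones.
function spatializeStream(audioCtx, stream, { x, y, z }) {
  // Wrap the incoming MediaStream so it can enter the audio graph.
  const source = audioCtx.createMediaStreamSource(stream);

  // A PannerNode places the sound at a position relative to the listener.
  const panner = audioCtx.createPanner();
  panner.panningModel = 'HRTF'; // binaural rendering; the alternative is 'equalpower'
  panner.positionX.value = x;
  panner.positionY.value = y;
  panner.positionZ.value = z;

  source.connect(panner);
  panner.connect(audioCtx.destination);
  return panner;
}
```

In a browser this might be called as `spatializeStream(new AudioContext(), remoteStream, { x: 1, y: 0, z: -1 })`, where `remoteStream` stands in for whatever MediaStream carries a participant's audio.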

Jitsi is huge

Working with a mature application in React (a library built on top of JavaScript) is hard

 

Just understanding the ecosystem and finding where the audio is handled was challenging

 

The difficulty was compounded when making changes that needed to propagate

How should audio be represented in group discussions?

  • Panning vs. ambisonics (HRTFs)
  • User control vs. automated
  • Headphones vs. speakers
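
To make the panning option concrete, here is a minimal sketch of the equal-power pan law: a pan value between -1 (left) and 1 (right) is mapped to a pair of gains whose squares always sum to 1, so the voice keeps the same perceived loudness wherever it is placed. The function name `equalPowerGains` is an assumption for this example, not something taken from Jitsi Meet.

```javascript
// Equal-power pan law (illustrative helper, hypothetical name).
// pan: -1 = hard left, 0 = center, 1 = hard right.
function equalPowerGains(pan) {
  // Keep the input inside the valid range.
  const clamped = Math.max(-1, Math.min(1, pan));
  // Map [-1, 1] onto a quarter circle, 0 .. PI/2.
  const angle = ((clamped + 1) * Math.PI) / 4;
  // cos^2 + sin^2 = 1, so total power is constant across the pan range.
  return { left: Math.cos(angle), right: Math.sin(angle) };
}
```

At center both gains are about 0.707 rather than 0.5, which is what avoids the dip in loudness that a simple linear crossfade would produce. HRTF rendering goes further by filtering each ear's signal, which is harder to show in a few lines.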

Lots of interesting stuff

  • Corona has pushed everyone online
    • Estimated traffic increases of 10-40% (Nokia, Cloudflare)
    • And away from cities!
  • Audio is way more important than video in conferencing and communication
    • Video for social bond forming
  • Lateralizing/spatializing audio can improve intelligibility in many scenarios
    • Double talk!
    • Reduces cognitive load, improves intelligibility and memory, and is generally preferred by listeners

 

 

Where am I now?

  • I have a server running for development and testing
    • It's in Oslo (~300 ms ping)
    • Public and readily accessible
       
  • I've integrated WebAudio in Jitsi
    • It should be working well for Firefox and Chrome
    • Both equal-power panning and HRTF-based binaural lateralization
    • And automatic spatialization of users
    • Intelligent rearrangement of users' audio tracks
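
One way automatic spatialization could work is to spread participants evenly across an arc in front of the listener and recompute the layout whenever someone joins or leaves. The sketch below is only an assumed approach: `assignAzimuths`, `azimuthToPosition`, and the 120-degree default spread are hypothetical and do not describe the actual integration.

```javascript
// Hypothetical helper: spread participants evenly across a frontal arc.
// spreadDeg is the total width of the arc in degrees (assumed default).
function assignAzimuths(participantIds, spreadDeg = 120) {
  const n = participantIds.length;
  const layout = new Map();
  participantIds.forEach((id, i) => {
    // A single participant sits straight ahead; otherwise step across the arc.
    const azimuth = n === 1 ? 0 : -spreadDeg / 2 + (i * spreadDeg) / (n - 1);
    layout.set(id, azimuth);
  });
  return layout;
}

// Convert an azimuth (0 = straight ahead, positive = right) to a position
// on the unit circle. WebAudio's default listener faces the negative z axis.
function azimuthToPosition(azimuthDeg) {
  const rad = (azimuthDeg * Math.PI) / 180;
  return { x: Math.sin(rad), y: 0, z: -Math.cos(rad) };
}
```

Calling `assignAzimuths` again with the updated participant list rearranges everyone, and each resulting position could then be fed to a panner for that participant's track.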

A general idea

A little diagram

A small experiment soon!

To see how binaural audio in a standard conferencing layout might affect a number of metrics and user opinions - stay tuned!

 

 

What's left to do?

A little more serious examination of performance

  • Test CPU utilization
    • More people means more CPU, since all local tracks are spatialized separately
    • Test differently sized groups
  • De-syncing possibilities
    • Audio and video may fall out of sync

Thanks!

Spatial audio in Jitsi Meet

By jacksongoode
