• Can another modality help without exact pairs?

Pairing Bottleneck

  • Most multimodal methods assume matched pairs
  • Real clinical data are often unmatched
  • Question: can "extra but unpaired" data still help?
  • Modality-specific encoders + one shared backbone (architecture sketched below)
  • No pair mining or alignment step needed
  • At inference, keep only the improved target-modality representation
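
A minimal sketch of what these bullets describe, assuming a PyTorch setup. The dimensions, names, and the linear stand-ins for real encoders and backbone are all illustrative, not the authors' code:

```python
import torch
import torch.nn as nn

class SharedBackboneModel(nn.Module):
    def __init__(self, img_dim=2048, txt_dim=768, hidden=512, out=256):
        super().__init__()
        # Modality-specific encoders map each input into a common width.
        self.img_encoder = nn.Linear(img_dim, hidden)
        self.txt_encoder = nn.Linear(txt_dim, hidden)
        # One shared backbone processes every modality with the same weights.
        self.backbone = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, out),
        )

    def forward(self, x, modality):
        h = self.img_encoder(x) if modality == "image" else self.txt_encoder(x)
        return self.backbone(h)

# At inference, only the target modality is used; the unpaired second
# modality has (we hope) improved the shared weights during training.
model = SharedBackboneModel()
img_feats = model(torch.randn(4, 2048), modality="image")  # -> (4, 256)
```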

Unpaired Multimodal Learner (UML)

  • Different modalities = different views of the same world
  • Shared weights push toward concept-level features
  • Related modalities transfer useful structure
  • When modalities describe similar structure, one can help the others (training sketch below)
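
How training might alternate unpaired batches through the shared weights, continuing the sketch above. The within-modality contrastive loss is one plausible self-supervised objective, not necessarily the paper's, and the toy loaders stand in for real augmented-view dataloaders:

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, tau=0.2):
    # Within-modality contrastive loss between two augmented views.
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau
    labels = torch.arange(z1.size(0))
    return F.cross_entropy(logits, labels)

# Toy stand-ins for real dataloaders; each item is two augmented views.
image_loader = [(torch.randn(8, 2048), torch.randn(8, 2048)) for _ in range(3)]
text_loader = [(torch.randn(8, 768), torch.randn(8, 768)) for _ in range(3)]

# The image batch and text batch are drawn independently: no pairing needed.
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
for img_views, txt_views in zip(image_loader, text_loader):
    loss = 0.0
    for (v1, v2), modality in ((img_views, "image"), (txt_views, "text")):
        z1, z2 = model(v1, modality), model(v2, modality)
        loss = loss + info_nce(z1, z2)
    opt.zero_grad(); loss.backward(); opt.step()
```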

UML - Could this work?

Benchmarks

  • Self-supervised setting: UML beats unimodal across reported tasks (linear-probe sketch below)
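
Such comparisons are usually scored with a linear probe, the standard self-supervised evaluation (the paper's exact protocol may differ): freeze the trained backbone, fit a linear classifier on its features, and compare UML features against unimodal-only features. Continuing the sketch above:

```python
import torch
import torch.nn as nn

probe = nn.Linear(256, 10)  # 256 = backbone output width in the sketch above
opt = torch.optim.SGD(probe.parameters(), lr=0.1)

with torch.no_grad():  # the pretrained model stays frozen
    feats = model(torch.randn(64, 2048), modality="image")
labels = torch.randint(0, 10, (64,))  # dummy downstream labels

loss = nn.functional.cross_entropy(probe(feats), labels)
loss.backward(); opt.step()  # only the probe's weights move
```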

Beyond Image + Text: Three Modalities

  • Extends to audio + vision + text (see the sketch after this list)
  • Image + text help audio
  • Audio + text help image
  • Best results with all three
  • Clinical analogy: waveform + EHR + notes, or retinal imaging + outcomes + text
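
A hedged sketch of the three-modality extension: one more encoder, the same single shared backbone, and a round-robin over three independent unpaired loaders. All dimensions and names are illustrative:

```python
import itertools
import torch
import torch.nn as nn

encoders = nn.ModuleDict({
    "image": nn.Linear(2048, 512),
    "audio": nn.Linear(1024, 512),
    "text":  nn.Linear(768, 512),
})
backbone = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 256))

def embed(x, modality):
    return backbone(encoders[modality](x))  # same backbone weights for all three

# Round-robin: each step consumes one unpaired batch from one modality.
batches = itertools.cycle([
    ("image", torch.randn(8, 2048)),
    ("audio", torch.randn(8, 1024)),
    ("text",  torch.randn(8, 768)),
])
for _ in range(6):
    modality, x = next(batches)
    z = embed(x, modality)  # plug into the self-supervised loss of your choice
```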

Fundus + OCT After RETFound

  • Scale: ~900K fundus photos (CFP) + ~700K OCT scans, two pools of unequal size (see the sketch below)
  • CFP and OCT carry complementary signal
  • RETFound limitation: CFP-OCT fusion not investigated
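
One practical wrinkle behind these numbers: with pools of unequal size and no guaranteed pairing, an epoch cannot simply zip the two loaders. A common workaround (an assumption here, not RETFound's or this deck's procedure) is to cycle the smaller pool:

```python
import itertools

def unpaired_batches(cfp_loader, oct_loader):
    """Yield (CFP batch, OCT batch) tuples without assuming any pairing.

    Iterates the larger CFP pool once per epoch and recycles the smaller
    OCT pool so every step still sees both modalities.
    """
    oct_cycle = itertools.cycle(oct_loader)
    for cfp_batch in cfp_loader:
        yield cfp_batch, next(oct_cycle)
```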

Can we learn from fundus + OCT jointly, even with imperfect pairs?

Summary

  • Perfect pairing is not required
  • Shared weights are a strong, simple baseline
  • Key caveat: modalities must be semantically related
  • Use retinal data to explore unpaired fundus-OCT learning
