Semi-Supervised Learning for Notes Organization

Problem Statement

1. Notes organization: sort notes into different topics
 

2. Approach to be used: semi-supervised learning

Data Description & Assumptions

1. Types of multimodal data: text, image/video, audio
 

2. Imbalanced data: notes on some topics may be much more frequent than notes on others
 

3. Metadata records the type(s) of data (see #1) in each note: one note can contain multiple types
 

4. Setting this as a single-label problem: each note can be categorized into only one topic.

 

Some Pipelines

  1. Text can be preprocessed and analyzed in several ways (next slides)
     
  2. What about audio?
     - Either extract features directly from audio (like MFCCs)
     - Or convert audio to text using an ASR system and use the text pipeline
     
  3. What about image/video?
     - Either caption the content and use the text pipeline
     - Or use the audio track only (audio pipeline)
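
The routing choices above can be sketched as a small function. This is only an illustration: the `Note` record, the `choose_pipelines` function, and the `use_asr` / `use_captioning` flags are hypothetical names, and the sketch assumes the metadata exposes the set of modalities in each note.

```python
from dataclasses import dataclass, field

@dataclass
class Note:
    """Hypothetical note record; `modalities` mirrors the metadata field."""
    note_id: str
    modalities: set = field(default_factory=set)  # subset of {"text", "audio", "image", "video"}

def choose_pipelines(note, use_asr=True, use_captioning=True):
    """Return the pipelines ("text" and/or "audio") a note is routed to."""
    pipelines = set()
    if "text" in note.modalities:
        pipelines.add("text")
    if "audio" in note.modalities:
        # An ASR transcript goes to the text pipeline; otherwise use acoustic features.
        pipelines.add("text" if use_asr else "audio")
    if "image" in note.modalities and use_captioning:
        # A still image has no audio track, so captioning is its only route here.
        pipelines.add("text")
    if "video" in note.modalities:
        # Captions go to the text pipeline; otherwise fall back to the audio track.
        pipelines.add("text" if use_captioning else "audio")
    return pipelines

note = Note("n1", {"audio", "video"})
print(choose_pipelines(note, use_asr=False, use_captioning=True))
```

A note can land in both pipelines at once, which is consistent with the assumption that one note may contain several types of data.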

 

 

Preprocessing

  1. Text can be preprocessed by removing stopwords and punctuation, lemmatizing, POS tagging, etc.
     
  2. Audio can be preprocessed by identifying voiced segments and extracting those windows from audio files.
     
  3. Images can be preprocessed in four ways: pixel brightness transformations, geometric transformations, image filtering and Fourier transform
     
  4. Videos can be decomposed into frames; unwanted frames are removed and the remaining frames form a sequence
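
As a minimal sketch of the text step only, the snippet below lowercases, strips punctuation, and removes stopwords using the Python standard library. The stopword list is a tiny illustrative assumption, and lemmatization and POS tagging are left out because they would normally come from an NLP library.

```python
import string

# Tiny illustrative stopword list (an assumption); a real pipeline would use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in", "on"}

def preprocess_text(text):
    """Lowercase, strip ASCII punctuation, tokenize on whitespace, drop stopwords."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = text.split()
    return [t for t in tokens if t not in STOPWORDS]

print(preprocess_text("The gradient of the loss is computed in the backward pass."))
# ['gradient', 'loss', 'computed', 'backward', 'pass']
```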

 

 

Feature Extraction

  1. Features from text: word2vec embeddings of extracted entities, n-grams, TF-IDF values.
     
  2. Features from audio: MFCCs, deltas, delta-deltas, spectral centroids
     
  3. Features from image/video: embeddings generated by deep learning models such as ResNet
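
To make the TF-IDF feature concrete, here is a small standard-library sketch. It uses the plain textbook weighting (term frequency times log of inverse document frequency); libraries often apply smoothing, so their values may differ. The function name `tfidf` and the input shape are assumptions for illustration.

```python
import math
from collections import Counter

def tfidf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

docs = [["graph", "loss"], ["graph", "audio"]]
vectors = tfidf(docs)
print(vectors[0])
```

A term that appears in every note (here `graph`) gets weight 0, which is why TF-IDF tends to highlight topic-specific vocabulary.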

 

Model Selection/Training

  1. Instead of training separate models for each modality, it makes sense to learn them jointly
     
  2. Features from all three modalities will be concatenated for each note, which leads to sparse inputs if most notes contain only a single modality
     
  3. Possible solution: Google's Expander SSL model, which involves graph-based learning
     
  4. For the sake of convenience, labels will be numeric
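
The two ideas above can be illustrated with a toy sketch: zero-filled concatenation (which shows where the sparsity comes from) and a generic label propagation over a note-similarity graph. This is not Expander's algorithm, only a simple stand-in for graph-based learning; the function names, the fixed per-modality dimensions, and the edge format are all assumptions.

```python
def concat_features(text_vec, audio_vec, image_vec, dims=(3, 2, 2)):
    """Concatenate per-modality vectors, zero-filling any modality that is None."""
    parts = []
    for vec, dim in zip((text_vec, audio_vec, image_vec), dims):
        parts.extend(vec if vec is not None else [0.0] * dim)
    return parts

def propagate_labels(edges, seeds, n_nodes, n_classes, n_iters=20):
    """Generic graph label propagation (a stand-in, not Expander itself).

    edges: dict node -> list of (neighbor, weight)
    seeds: dict node -> numeric class label for the labeled notes
    Returns a predicted numeric label for every node.
    """
    dist = [[1.0 / n_classes] * n_classes for _ in range(n_nodes)]
    for node, label in seeds.items():
        dist[node] = [1.0 if c == label else 0.0 for c in range(n_classes)]
    for _ in range(n_iters):
        new_dist = []
        for node in range(n_nodes):
            neighbors = edges.get(node, [])
            total = sum(w for _, w in neighbors)
            if node in seeds or total == 0:
                # Labeled nodes stay clamped; isolated nodes keep their estimate.
                new_dist.append(dist[node])
                continue
            # Weighted average of the neighbors' class distributions.
            new_dist.append([
                sum(w * dist[nb][c] for nb, w in neighbors) / total
                for c in range(n_classes)
            ])
        dist = new_dist
    return [max(range(n_classes), key=lambda c: d[c]) for d in dist]

# Chain of four notes; only the two end notes are labeled.
edges = {0: [(1, 1.0)], 1: [(0, 1.0), (2, 1.0)], 2: [(1, 1.0), (3, 1.0)], 3: [(2, 1.0)]}
print(concat_features([1, 2, 3], None, [4, 5]))
print(propagate_labels(edges, seeds={0: 0, 3: 1}, n_nodes=4, n_classes=2))
```

In the chain example, each unlabeled note takes the label of the labeled note it sits closer to, which is the basic behavior one expects from graph-based semi-supervised learning.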

 

Imbalanced Classes?

  1. Problem defined as Class-Imbalanced SSL by Hyun et al. (2020)
     
  2. Solution: Suppressed Consistency Loss
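
A rough sketch of the idea behind a suppressed consistency loss: the consistency term for an unlabeled note is scaled down when the note is predicted to belong to a rare class. The weighting form `beta ** (1 - N_c / N_max)`, the squared-error consistency term, and all function names below are assumptions made for illustration; the paper should be consulted for the exact definition.

```python
def suppression_weight(pred_class, class_counts, beta=0.5):
    """Assumed weight g(N_c) = beta ** (1 - N_c / N_max), with 0 < beta <= 1."""
    n_max = max(class_counts)
    return beta ** (1.0 - class_counts[pred_class] / n_max)

def suppressed_consistency_loss(probs_a, probs_b, class_counts, beta=0.5):
    """Squared-error consistency between two predictions for one unlabeled note,
    scaled by the suppression weight of the class predicted from probs_a."""
    pred_class = max(range(len(probs_a)), key=lambda c: probs_a[c])
    consistency = sum((a - b) ** 2 for a, b in zip(probs_a, probs_b))
    return suppression_weight(pred_class, class_counts, beta) * consistency

class_counts = [100, 10]  # hypothetical labeled-note counts per topic
print(suppression_weight(0, class_counts))  # most frequent class: weight 1.0
print(suppression_weight(1, class_counts))  # rare class: weight below 1
```

Under this assumed form, the most frequent class keeps the full consistency loss, and the weight shrinks toward `beta` as the predicted class becomes rarer.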

 

Overall Pipeline
