Semi-Supervised Learning for Notes Organization
Problem Statement
1. Task: organize notes by assigning each one to a topic
2. Approach: semi-supervised learning (SSL), which learns from both labeled and unlabeled notes
Data Description & Assumptions
1. Types of multimodal data: text, image/video, audio
2. Imbalanced data: notes on some topics may be much more frequent than notes on others
3. Metadata records which data types (see #1) a note contains; a single note can contain several types
4. Treated as a single-label problem: each note is assigned exactly one topic
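The assumptions above can be made concrete with a small record type and a check on how skewed the labeled topics are. This is only a sketch: the `Note` class, its field names, and `imbalance_ratio` are hypothetical and chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Note:
    # Hypothetical record layout; field names are assumptions.
    note_id: str
    modalities: List[str]        # from metadata, e.g. ["text", "audio"]
    topic: Optional[int] = None  # single numeric label; None if unlabeled


def imbalance_ratio(notes: List[Note]) -> float:
    """Ratio of the most frequent to the least frequent labeled topic."""
    counts = Counter(n.topic for n in notes if n.topic is not None)
    if not counts:
        raise ValueError("no labeled notes")
    return max(counts.values()) / min(counts.values())
```

A ratio close to 1 suggests balanced topics; a large ratio signals the imbalance discussed later in the deck.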
Some Pipelines
- Text can be preprocessed and analyzed in several ways (next slides)
- What about audio?
  - Either extract features directly from the audio (such as MFCCs)
  - Or convert the audio to text with an ASR system and use the text pipeline
- What about image/video?
  - Captioning (text pipeline)
  - Or, for video, use only the audio track (audio pipeline)
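The routing choices above can be sketched as a small planner that maps each modality in a note to an ordered list of steps. The step names (`asr`, `captioning`, `extract_audio`, and so on) and the two flags are hypothetical placeholders for external systems; rejecting the audio route for still images is an assumption, since an image has no audio track.

```python
from typing import Dict, List


def plan_pipelines(modalities: List[str],
                   audio_as_text: bool = False,
                   visual_as_text: bool = True) -> Dict[str, List[str]]:
    """Return, per modality, the ordered processing steps for one note."""
    plan: Dict[str, List[str]] = {}
    for m in modalities:
        if m == "text":
            plan[m] = ["text_pipeline"]
        elif m == "audio":
            # Either transcribe and reuse the text pipeline, or use audio features.
            plan[m] = ["asr", "text_pipeline"] if audio_as_text else ["audio_pipeline"]
        elif m in ("image", "video"):
            if visual_as_text:
                plan[m] = ["captioning", "text_pipeline"]
            elif m == "video":
                plan[m] = ["extract_audio", "audio_pipeline"]
            else:
                raise ValueError("an image has no audio track; use captioning")
        else:
            raise ValueError(f"unknown modality: {m}")
    return plan
```

For example, `plan_pipelines(["text", "audio"], audio_as_text=True)` sends both modalities through the text pipeline, with ASR in front of the audio.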
Preprocessing
- Text can be preprocessed by removing stopwords and punctuation, lemmatizing, POS tagging, etc.
- Audio can be preprocessed by identifying voiced segments and extracting those windows from audio files.
- Images can be preprocessed in four ways: pixel brightness transformations, geometric transformations, image filtering and Fourier transform
- Videos can be decomposed into frames; unwanted frames are removed and the remaining frames form a sequence
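Two of the steps above can be sketched with the standard library alone: basic text cleaning and selection of high-energy audio windows. Both are simplifications. The stopword list is a tiny illustrative one, lemmatization and POS tagging are left to an NLP library, and a plain energy threshold is only a rough proxy for voiced-segment detection. All function names and default values are assumptions.

```python
import string

# Tiny illustrative stopword list (an assumption); real pipelines use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in", "on", "for"}


def preprocess_text(note: str) -> list:
    """Lowercase, strip ASCII punctuation, and drop stopwords from one note."""
    table = str.maketrans("", "", string.punctuation)
    tokens = note.lower().translate(table).split()
    return [t for t in tokens if t not in STOPWORDS]


def voiced_windows(samples, frame_len=400, threshold=0.01):
    """Return (start, end) sample indices of runs of frames whose mean energy
    exceeds the threshold. Trailing samples that do not fill a frame are ignored."""
    segments = []
    start = None
    n_frames = len(samples) // frame_len
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = sum(x * x for x in frame) / frame_len
        if energy > threshold:
            if start is None:
                start = i * frame_len      # a new run begins
        elif start is not None:
            segments.append((start, i * frame_len))  # the run just ended
            start = None
    if start is not None:
        segments.append((start, n_frames * frame_len))
    return segments
```

The default `frame_len=400` corresponds to 25 ms at a 16 kHz sampling rate, which is itself an assumed setting.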
Feature Extraction
- Features from text: word2vec embeddings of extracted entities, n-grams, TF-IDF values.
- Features from audio: MFCCs, deltas, delta-deltas, spectral centroids
- Features from image/video: generated via deep learning models like ResNet
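As one example of the text features listed above, TF-IDF weights can be computed directly from tokenized notes. This sketch assumes one common variant (term frequency normalized by note length, and idf = log(N / df) without smoothing); libraries often use slightly different formulas. The function name `tfidf` is hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: in how many notes each term appears.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights
```

With this variant, a term that appears in every note gets weight zero, so only terms that distinguish notes from one another contribute.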
Model Selection/Training
- Instead of training a separate model for each modality, it makes sense to learn from all modalities jointly
- Features from all three modalities are concatenated for each note, which leads to sparse vectors if most notes contain a single modality
- Possible solution: Google's Expander, a graph-based SSL platform
- For convenience, topic labels will be encoded numerically
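To illustrate the ideas above, the sketch below concatenates per-modality features (zero-filling absent modalities, which is where the sparsity comes from) and then runs a generic iterative label propagation over a similarity graph. This is not Expander's algorithm, only a minimal stand-in for graph-based SSL. The function names, the fully connected Gaussian graph, and the parameters `sigma` and `n_iter` are all assumptions.

```python
import math


def concat_features(note, dims):
    """Concatenate per-modality vectors; absent modalities become zero blocks.
    dims is an ordered list of (modality_name, dimension) pairs."""
    vec = []
    for name, dim in dims:
        vec.extend(note.get(name, [0.0] * dim))
    return vec


def propagate_labels(features, labels, n_classes, sigma=1.0, n_iter=50):
    """labels[i] is a numeric class or None (unlabeled).
    Returns the predicted class for every note."""
    n = len(features)
    # Gaussian similarity between every pair of notes (fully connected graph).
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
                w[i][j] = math.exp(-d2 / (2 * sigma ** 2))
    # One-hot scores for labeled notes, zeros for unlabeled ones.
    scores = [[0.0] * n_classes for _ in range(n)]
    for i, y in enumerate(labels):
        if y is not None:
            scores[i][y] = 1.0
    for _ in range(n_iter):
        new_scores = []
        for i in range(n):
            total = sum(w[i])
            if labels[i] is not None or total == 0.0:
                new_scores.append(scores[i])  # labeled notes stay clamped
                continue
            # Unlabeled notes take the weighted average of their neighbors.
            new_scores.append([
                sum(w[i][j] * scores[j][c] for j in range(n)) / total
                for c in range(n_classes)
            ])
        scores = new_scores
    return [max(range(n_classes), key=lambda c: s[c]) for s in scores]
```

The pairwise graph costs O(n^2) memory and time, so a real system would use a sparse nearest-neighbor graph instead.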
Imbalanced Classes?
- Problem defined as Class-Imbalanced SSL by Hyun et al. (2020)
- Solution: Suppressed Consistency Loss
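The idea behind a suppressed consistency loss can be sketched as follows: the consistency term between two predictions for the same note is scaled down when the predicted class is rare. The weighting form used here, beta ** (1 - N_c / N_max), the choice of mean squared error as the distance, and all names are assumptions for illustration and should be checked against Hyun et al. (2020).

```python
def mse(p, q):
    """Mean squared error between two probability vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) / len(p)


def suppressed_consistency_loss(p_clean, p_perturbed, class_counts, beta=0.5):
    """Consistency loss scaled by the frequency of the predicted class.

    p_clean, p_perturbed: class probabilities for a note and its perturbed copy.
    class_counts: number of labeled training notes per class.
    beta: suppression strength in (0, 1]; beta = 1 disables suppression.
    """
    # Predicted class is taken from the clean prediction (an assumption).
    pred = max(range(len(p_clean)), key=lambda c: p_clean[c])
    # Weight is 1 for the most frequent class and approaches beta for rare ones.
    weight = beta ** (1.0 - class_counts[pred] / max(class_counts))
    return weight * mse(p_clean, p_perturbed)
```

Under this assumed form, the same disagreement between the two predictions costs less when the predicted topic is a minority one, which limits how strongly consistency training reinforces predictions on rare classes.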
Overall Pipeline
SSL for Notes
By Anjali Bhavan