OpenHands: Making Sign Language Recognition Accessible with Pose-based Pretrained Models across Languages
Abstract:
This paper introduces OpenHands, a library that takes four key ideas from the NLP community for low-resource languages and applies them to sign languages for word-level recognition.
All models and datasets are released in the open-source OpenHands library, with the hope of making research in sign languages more accessible.
Introduction:
Compared to text- and speech-based NLP research, progress in AI research for sign languages lags significantly behind.
We implement these ideas and release several datasets and models in the open-source library OpenHands, with the following key contributions:
Standardizing on pose as the modality (see the pose-extraction sketch after this list).
Standardized comparison of models across languages.
Corpus for self-supervised training.
Effectiveness of self-supervised training.
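To make the pose-standardization contribution concrete, below is a minimal sketch of how a video could be converted into a keypoint sequence, assuming MediaPipe Holistic as the pose estimator; the function name, keypoint layout, and zero-padding of missed detections are illustrative choices, not the library's actual preprocessing code.

```python
import cv2
import mediapipe as mp
import numpy as np

def extract_pose_sequence(video_path: str) -> np.ndarray:
    """Return an array of shape (num_frames, 75, 3) with (x, y, z)
    coordinates from MediaPipe Holistic (body + both hands)."""
    holistic = mp.solutions.holistic.Holistic(
        static_image_mode=False, model_complexity=1)
    frames = []
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        keypoints = []
        for landmarks, count in (
            (results.pose_landmarks, 33),        # body keypoints
            (results.left_hand_landmarks, 21),   # left-hand keypoints
            (results.right_hand_landmarks, 21),  # right-hand keypoints
        ):
            if landmarks is None:
                # Pad missing detections with zeros (illustrative choice).
                keypoints.extend([(0.0, 0.0, 0.0)] * count)
            else:
                keypoints.extend((lm.x, lm.y, lm.z) for lm in landmarks.landmark)
        frames.append(keypoints)
    cap.release()
    holistic.close()
    return np.array(frames, dtype=np.float32)
```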
Background and Related Work:
Models for ISLR (Isolated Sign Language Recognition):
Sequence-based models
Graph-based models
Pretraining strategies (see the masked-prediction sketch after this list):
Masking-based pretraining
Contrastive-learning based pretraining
Predictive coding
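As a concrete illustration of the masking-based strategy, the sketch below randomly hides whole pose frames and trains an encoder to reconstruct them; the encoder interface, masking ratio, and MSE loss are assumptions for illustration rather than the paper's exact setup.

```python
import torch
import torch.nn as nn

def masked_frame_pretraining_loss(encoder: nn.Module,
                                  poses: torch.Tensor,
                                  mask_ratio: float = 0.15) -> torch.Tensor:
    """Masking-based pretraining sketch: randomly zero out whole pose
    frames and train the encoder to reconstruct them.

    poses: (batch, time, keypoints, channels)
    encoder: maps the masked sequence back to the same shape.
    """
    batch, time = poses.shape[:2]
    # Choose frames to mask, independently per sequence.
    mask = torch.rand(batch, time, device=poses.device) < mask_ratio  # (B, T)
    masked = poses.clone()
    masked[mask] = 0.0                       # hide the selected frames
    reconstructed = encoder(masked)          # (B, T, K, C)
    # Reconstruction loss only on the masked positions.
    return nn.functional.mse_loss(reconstructed[mask], poses[mask])
```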
Datasets available in the OpenHands library:
Experiment Setup and Results:
PyTorch Lightning is used for data preprocessing and the training pipelines.
Adam optimizer.
For LSTM, batch size of 32 and initial learning rate (LR) of 0.005.
For BERT, batch size of 64 and LR of 0.0001.
For ST-GCN and SL-GCN, batch size of 32 and LR of 0.001.
Single NVIDIA Tesla V100 GPU (a minimal training-setup sketch follows below).
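A minimal sketch of how this setup could be wired together in PyTorch Lightning is shown below; the `ISLRClassifier` wrapper and the placeholder backbone/dataset names are hypothetical, while the optimizer and hyperparameter values mirror the ones listed above.

```python
import torch
import pytorch_lightning as pl
from torch.utils.data import DataLoader

class ISLRClassifier(pl.LightningModule):
    """Illustrative wrapper: `backbone` is any pose-based model
    (e.g. an SL-GCN-style network) that outputs class logits."""

    def __init__(self, backbone: torch.nn.Module, lr: float = 1e-3):
        super().__init__()
        self.backbone = backbone
        self.lr = lr
        self.loss_fn = torch.nn.CrossEntropyLoss()

    def training_step(self, batch, batch_idx):
        poses, labels = batch
        logits = self.backbone(poses)
        loss = self.loss_fn(logits, labels)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        # Adam, with the LR depending on the backbone (0.005 for LSTM,
        # 1e-4 for BERT, 1e-3 for ST-GCN / SL-GCN, as listed above).
        return torch.optim.Adam(self.parameters(), lr=self.lr)

# Example wiring (dataset and backbone are placeholders):
# train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
# trainer = pl.Trainer(max_epochs=100, accelerator="gpu", devices=1)
# trainer.fit(ISLRClassifier(backbone, lr=1e-3), train_loader)
```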
Comparison of each dataset with its state-of-the-art model and with the models available in OpenHands:
Comparison of inference times across various models and CPUs (a generic timing sketch follows):
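For context on how such per-model CPU inference times can be measured, here is a generic timing loop; the model and example input are placeholders, and this is not the benchmarking code used in the paper.

```python
import time
import torch

@torch.no_grad()
def cpu_latency_ms(model: torch.nn.Module, example: torch.Tensor,
                   warmup: int = 5, runs: int = 50) -> float:
    """Average per-sample CPU inference time in milliseconds."""
    model = model.eval().to("cpu")
    example = example.to("cpu")
    for _ in range(warmup):          # warm-up to stabilise caches / threads
        model(example)
    start = time.perf_counter()
    for _ in range(runs):
        model(example)
    return (time.perf_counter() - start) / runs * 1000.0
```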
Indian SL Corpus for Self-supervised Pretraining:
Explains the process of collecting and storing the SL corpus:
Pretraining Strategies:
Effectiveness of pretraining strategies (see the DPC-style sketch below):
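Since DPC (dense predictive coding) is reported as the most effective strategy (see Conclusion), the sketch below illustrates the underlying idea: encode past pose clips, predict the feature of a future clip, and score the prediction against the true future with an InfoNCE-style contrastive loss. The encoder, predictor, aggregation, and tensor layout here are assumptions for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def predictive_coding_loss(encoder: nn.Module,
                           predictor: nn.Module,
                           clips: torch.Tensor) -> torch.Tensor:
    """DPC-style objective on a batch of pose-clip sequences.

    clips: (batch, num_clips, clip_len, keypoints, channels); the first
    num_clips - 1 clips are the "past", the last clip is the "future".
    encoder: maps one clip to a feature vector of shape (batch, dim).
    predictor: maps aggregated past features to a predicted future feature.
    """
    batch, num_clips = clips.shape[:2]
    # Encode every clip independently: (B, N, D)
    feats = torch.stack([encoder(clips[:, i]) for i in range(num_clips)], dim=1)
    past, future = feats[:, :-1], feats[:, -1]        # (B, N-1, D), (B, D)
    predicted = predictor(past.mean(dim=1))           # aggregate past, predict future
    # InfoNCE: each sample's true future is the positive; other samples'
    # futures in the batch act as negatives.
    logits = predicted @ future.t()                   # (B, B) similarity matrix
    targets = torch.arange(batch, device=clips.device)
    return F.cross_entropy(logits, targets)
```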
Evaluation in low-resource and cross-lingual settings:
Conclusion:
Introduced 7 pose-based datasets and 4 different ISLR models across 6 sign languages as part of the OpenHands library.
Also released the first large corpus of SL data for self-supervised pretraining.
Compared different pretraining strategies and found DPC (dense predictive coding) to be the most effective.
Also showed that pretraining is effective for both in-language and cross-lingual transfer.
Future Work:
Evaluating alternative graph-based models.
Efficiently sampling data from the raw corpus so that the selected samples are sufficiently diverse.