Reproducible Deep Learning

Dr. Srijith Rajamohan

Weakly-Supervised NLP
Infrastructure
- Data Ingestion
- Deep Learning
Workflow overview
Airflow orchestration with conda environments
Ansible for deployment
- Deploy Visual Analytics App
- Dockerized GPU training environments

Problem

Natural Language Understanding for determining political affiliation
- Understanding intent is difficult
Augmented Intelligence
- Human-in-the-loop: Results of DNNs used to create projections that assist humans in classifying documents
Interpretable
- Self-attention gives insight into the decision-making process of a DNN
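As a minimal sketch of why attention weights are readable: the weights are a softmax over per-token scores, so they sum to 1 and can be ranked to show which tokens influenced a prediction most. The tokens and scores below are hypothetical values for illustration only; in the real model the scores would come from the trained attention layer, which is not shown here.

```python
import math


def attention_weights(scores):
    """Softmax over per-token scores; the returned weights sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_tokens(tokens, scores, k=3):
    """Return the k (token, weight) pairs with the largest attention weight."""
    weights = attention_weights(scores)
    ranked = sorted(zip(tokens, weights), key=lambda tw: tw[1], reverse=True)
    return ranked[:k]


# Hypothetical tokens and raw attention scores (not taken from the model).
tokens = ["the", "new", "tax", "bill", "passed"]
scores = [0.0, 0.3, 2.5, 2.0, 0.5]

ranked = top_tokens(tokens, scores, k=2)
for token, weight in ranked:
    print(f"{token}: {weight:.3f}")
```

Overlaying such weights on the input text is one common way to inspect what a self-attention model focused on.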

Solution Overview

Use PySpark to clean the data

Project political affiliation into a 2D space, similar to a form of Aspect-Based Sentiment Analysis (ABSA)

Self-attention based BiLSTM with pretrained static and contextual embeddings (ELMo)

Evaluate visualization/cognitive efficiencies of various dimensionality reduction techniques

Interactive web application to help correctly label this weakly-supervised data

Gather social media posts related to certain political hashtags, along with user metadata

Data Ingestion and Preprocessing Pipeline

VT cloud server
- 18 cores, 192GB RAM, 200 + 500GB volume
- Conda environments for package management
Python RQ
- Redis-based framework for job scheduling
- Uses Tweepy for interaction with Twitter
- Downloads and stores tweets corresponding to certain hashtags in timestamped files
MongoDB setup for interacting with the data
Metabase used as a dashboard for this DB
- Interactive filtering, visualizing and exploratory analysis from local machine
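The storage step of the ingestion job could look roughly like the sketch below, which writes downloaded posts for one hashtag to a timestamped JSON Lines file. The function names, file naming scheme, and sample records are assumptions made for illustration; the Tweepy download and the Redis/RQ wiring are deliberately left out. With RQ, a function like `store_posts` would typically be submitted to a worker through `Queue.enqueue`.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def timestamped_path(out_dir, hashtag, now=None):
    """Build a path such as <out_dir>/<hashtag>_<UTC timestamp>.jsonl."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    return Path(out_dir) / f"{hashtag.lstrip('#')}_{stamp}.jsonl"


def store_posts(posts, out_dir, hashtag, now=None):
    """Write one JSON object per line and return the file path."""
    path = timestamped_path(out_dir, hashtag, now)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as fh:
        for post in posts:
            fh.write(json.dumps(post, ensure_ascii=False) + "\n")
    return path


# Hypothetical records standing in for downloaded posts.
sample = [{"id": 1, "text": "example post"}, {"id": 2, "text": "another post"}]
fixed = datetime(2020, 5, 1, 12, 0, 0, tzinfo=timezone.utc)

with tempfile.TemporaryDirectory() as tmp:
    saved = store_posts(sample, tmp, "#example", now=fixed)
    saved_name = saved.name
    line_count = len(saved.read_text(encoding="utf-8").splitlines())

print(saved_name, line_count)
```

Keeping one file per run makes each download traceable to the time it was collected, which helps when a later preprocessing step has to be repeated.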

Training Pipeline

PyTorch code runs on GPUs
- Node with 4 Volta GPUs, 16GB of memory per GPU
Workflow automated with Airflow
- PyTorch code:
  - Model training
  - Generates metrics: accuracy, F1 and ROC scores
  - Dimension reduction for visualization
- Plot.ly used for generating the graphs
  - Generating the plots from the metric files generated by PyTorch
Hyperparameter optimization done with Comet.ml
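The metrics step above can be illustrated with a small, dependency-free sketch that computes accuracy and F1 for binary labels and serializes them as JSON, the kind of metric file a separate plotting step could read. The labels, the helper names, and the JSON layout are assumptions for illustration, not the project's actual code; ROC computation is omitted.

```python
import json


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels and predictions.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]

metrics = {
    "accuracy": accuracy(y_true, y_pred),
    "f1": f1_score(y_true, y_pred),
}
metrics_json = json.dumps(metrics, indent=2)
print(metrics_json)
```

Writing metrics to a file rather than plotting inside the training script keeps the training and plotting tasks independent, so each can be rerun on its own in the workflow.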

General Workflow

Hyperparameter Optimization

App

Ansible Playbooks for Deployment

# Ansible playbook (YAML)
- hosts: all
  tasks:
  - name: ping all hosts
    ping:

  - name: Supervisor install
    become: yes
    apt:
      name: supervisor
      state: latest
    tags:
      - supervisor_install

Ansible for deploying:

Conda environments through environment.yml files
Docker containers through Dockerfiles

Conda Environment

$ conda env create -f env.yml -n keras_env2

(keras_env2) ubuntu@test2:~$ head -n 40 env.yml
name: keras_env
channels:
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - blas=1.0=mkl
  - brotlipy=0.7.0=py37h7b6447c_1000
  - ca-certificates=2020.1.1=0
  - certifi=2020.4.5.1=py37_0
  - cffi=1.14.0=py37he30daa8_1
  - click=7.1.2=py_0
  - dash=1.4.1=py_0
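Each dependency line in the export above pins a package as `name=version=build`, which is what makes the environment reproducible. The sketch below parses such lines and checks that every entry carries both a version and a build string. The helper names are hypothetical, and the parser assumes this simple three-part form rather than the full conda match specification syntax.

```python
def parse_pin(spec):
    """Split a '- name=version=build' line; missing parts become None."""
    parts = spec.strip().lstrip("- ").split("=")
    name = parts[0]
    version = parts[1] if len(parts) > 1 else None
    build = parts[2] if len(parts) > 2 else None
    return name, version, build


def fully_pinned(specs):
    """True only if every spec has both a version and a build string."""
    return all(None not in parse_pin(s) for s in specs)


# Lines copied from the env.yml excerpt.
pinned = [
    "  - blas=1.0=mkl",
    "  - certifi=2020.4.5.1=py37_0",
    "  - dash=1.4.1=py_0",
]

print(parse_pin(pinned[0]))
print(fully_pinned(pinned))
print(fully_pinned(pinned + ["  - pandas"]))
```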

Dockerfile

FROM nvidia/cuda:10.0-base-ubuntu16.04
# Install some basic utilities
RUN apt-get update && apt-get install -y \
...
RUN curl -so ~/miniconda.sh https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh \
    && chmod +x ~/miniconda.sh && ~/miniconda.sh -b -p ~/miniconda 
ENV PATH=/home/user/miniconda/bin:$PATH
ENV CONDA_AUTO_UPDATE_CONDA=false
...

RUN conda install -y -c pytorch pytorch==1.4 torchvision cudatoolkit=10.1 && conda clean -ya
# Install Deep learning packages
RUN conda install -y pandas scikit-learn scipy
RUN conda install -y plotly -c plotly
RUN conda install -y -c conda-forge spacy
RUN pip install pytorch-nlp torchtext
...

CMD ["python3"]

ubuntu@test2:~$ docker build -t IMAGE_NAME .

Building the Docker Container

ubuntu@test2:~$ docker images
REPOSITORY          TAG                     IMAGE ID            CREATED             SIZE
srijith/pytorch     0                       e5ee0ad6fc76        17 hours ago        8.22GB

Running the Dockerized Application

Use the --gpus all flag to expose all host GPUs inside the container
Map a host folder into the container with -v HOST:CONTAINER
Use -w to set the working folder to the mapped directory inside the container
Use --rm flag to remove container on completion of execution
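When the container is launched from a script or a workflow task instead of a terminal, the same flags can be assembled programmatically. The sketch below builds the argument list for `docker run` from the flags listed above; the function name and its parameters are hypothetical, and the command is only constructed and printed, not executed. A non-interactive launcher would usually pass `interactive=False`, since `-it` expects a terminal.

```python
import shlex


def docker_run_command(image, command, host_dir, container_dir, workdir,
                       gpus="all", interactive=True):
    """Assemble a `docker run` argument list from the flags on this slide."""
    args = ["docker", "run"]
    if interactive:
        args.append("-it")                       # interactive terminal
    args += [
        "--rm",                                  # remove container on exit
        "--gpus", gpus,                          # expose host GPUs
        "-v", f"{host_dir}:{container_dir}",     # map host folder
        "-w", workdir,                           # working folder in container
        image,
    ]
    return args + list(command)


cmd = docker_run_command(
    image="srijith/pytorch:0",
    command=["python", "attention_pytorch_opt.py"],
    host_dir="/home/ubuntu/data",
    container_dir="/mnt/",
    workdir="/mnt/Pytorch_comet_opt",
)
printable = shlex.join(cmd)
print(printable)
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` on a host where Docker and the NVIDIA container runtime are installed.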

ubuntu@test2:~$ docker run -it --rm --gpus all -v /home/ubuntu/data:/mnt/ -w /mnt/Pytorch_comet_opt \
  srijith/pytorch:0 python attention_pytorch_opt.py
    
Building vocabulary
/home/user/miniconda/envs/py36/lib/python3.6/site-packages/torch/nn/modules/rnn.py:50: UserWarning:

dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.2 and num_layers=1

Shape of vocabulary torch.Size([75002, 100])
Time spent  7.697644948959351
| Epoch: 01 | Train Loss: 0.240 | Train Acc: 90.10% | Val. Loss: 0.053 | Val. Acc: 98.71% |
Time spent  8.794492483139038
| Epoch: 02 | Train Loss: 0.037 | Train Acc: 99.11% | Val. Loss: 0.031 | Val. Acc: 99.36% |
Time spent  9.919449806213379
| Epoch: 03 | Train Loss: 0.019 | Train Acc: 99.56% | Val. Loss: 0.027 | Val. Acc: 99.50% |
Time spent  9.897263050079346
| Epoch: 04 | Train Loss: 0.011 | Train Acc: 99.74% | Val. Loss: 0.027 | Val. Acc: 99.47% |
Time spent  10.000884771347046
| Epoch: 05 | Train Loss: 0.006 | Train Acc: 99.84% | Val. Loss: 0.032 | Val. Acc: 99.38% |
 Test Loss: and Acc:  2.184105634689331 0.8088597059249878
Number of unknown tokens  3658

Running the GPU container

GPU Usage on Host

Thank you!

Email: srijithr@vt.edu

Website: srijithr.gitlab.io

Presentation slides for Stance detection
