HPCKK

Reproducible Deep Learning

Dr. Srijith Rajamohan

Weakly-Supervised NLP
Infrastructure
- Data Ingestion
- Deep Learning
Workflow overview
Airflow orchestration with conda environments
Ansible for deployment
- Deploy Visual Analytics App
- Dockerized GPU training environments

Problem

Natural Language Understanding for determining political affiliation
- Understanding intent is difficult
Augmented Intelligence
- Human-in-the-loop: Results of DNNs used to create projections that assist humans in classifying documents
Interpretable
- Self-attention gives insight into the decision-making process of a DNN
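As a rough illustration of why attention weights are interpretable, the sketch below normalizes per-token scores with a softmax and ranks tokens by the resulting weight. The tokens, hidden states, and scores are made-up values for illustration only, not output from the model described in this talk, and the function names are hypothetical.

```python
import math


def attention_weights(scores):
    """Softmax over per-token scores; the weights are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(hidden_states, scores):
    """Return the attention-weighted sum of hidden states and the weights."""
    weights = attention_weights(scores)
    dim = len(hidden_states[0])
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states))
        for d in range(dim)
    ]
    return context, weights


# Hypothetical inputs: one 2-dimensional hidden state and one score per token.
tokens = ["we", "must", "cut", "taxes"]
hidden_states = [[0.1, 0.0], [0.2, 0.1], [0.7, 0.3], [0.9, 0.8]]
scores = [0.1, 0.3, 1.5, 2.0]

context, weights = attend(hidden_states, scores)

# Tokens with the largest weights contributed most to the document vector.
ranked = sorted(zip(tokens, weights), key=lambda pair: pair[1], reverse=True)
for token, weight in ranked:
    print(f"{token:>6}  {weight:.3f}")
```

In a trained network the scores would come from a learned layer on top of the BiLSTM outputs; inspecting the ranked weights is what lets a human see which words drove a prediction.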

Solution Overview

Use PySpark to clean the data

Project affiliation in a 2D space similar to a form of Aspect-Based Sentiment Analysis (ABSA)

Self-attention-based BiLSTM with pretrained static and contextual embeddings (ELMo)

Evaluate visualization/cognitive efficiencies of various dimensionality reduction techniques

Interactive web application to help correctly label this weakly-supervised data

Gather social media posts related to certain political hashtags, along with user metadata

Data Ingestion and Preprocessing Pipeline

VT cloud server
- 18 cores, 192GB RAM, 200 + 500GB volume
- Conda environments for package management
Python RQ
- Redis-based framework for job scheduling
- Uses Tweepy for interaction with Twitter
- Downloads and stores tweets corresponding to certain hashtags in timestamped files
MongoDB setup for interacting with the data
Metabase used as a dashboard for this DB
- Interactive filtering, visualizing and exploratory analysis from local machine
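The "timestamped files" step above can be sketched with the standard library alone. This is a minimal sketch under stated assumptions: the function names, the JSON Lines layout, and the file-name pattern are hypothetical and are not taken from the original pipeline. The trailing comment shows how such a function could be handed to Python RQ as a job.

```python
import json
import tempfile
import time
from pathlib import Path


def timestamped_path(out_dir, hashtag, now=None):
    """Build a file path such as <out_dir>/<hashtag>_<UTC timestamp>.jsonl."""
    now = time.time() if now is None else now
    stamp = time.strftime("%Y%m%dT%H%M%SZ", time.gmtime(now))
    return Path(out_dir) / f"{hashtag.lstrip('#')}_{stamp}.jsonl"


def store_tweets(tweets, out_dir, hashtag, now=None):
    """Write one JSON object per line and return the path that was written."""
    path = timestamped_path(out_dir, hashtag, now)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as handle:
        for tweet in tweets:
            handle.write(json.dumps(tweet) + "\n")
    return path


# Demo with made-up records; a real job would receive tweets fetched via Tweepy.
with tempfile.TemporaryDirectory() as demo_dir:
    written = store_tweets([{"id": 1, "text": "hello"}], demo_dir, "#example")
    print(written.name)

# Scheduling the same function as an RQ job (requires a running Redis server):
#   from redis import Redis
#   from rq import Queue
#   Queue(connection=Redis()).enqueue(store_tweets, tweets, "data", "#example")
```

Writing one file per run keeps each download immutable, which makes it easier to re-run later pipeline stages on exactly the same input.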

Training Pipeline

PyTorch code runs on GPUs
- Node with 4 Volta GPUs, 16GB memory per GPU
Workflow automated with Airflow
- PyTorch code:
  - Model training
  - Generates metrics: accuracy, F1 and ROC scores
  - Dimension reduction for visualization
- Plot.ly used for generating the graphs
  - Generating the plots from the metric files generated by PyTorch
Hyperparameter optimization done with Comet.ml
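The metrics named above (accuracy, F1, and ROC) can be computed for a binary classifier as in the following sketch. It uses only the standard library and made-up labels and scores; it assumes "ROC score" means the area under the ROC curve, and it is not the code used in the original pipeline.

```python
import json


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores, positive=1):
    """Probability that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Made-up labels and model scores for illustration.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = {
    "accuracy": accuracy(y_true, y_pred),
    "f1": f1_score(y_true, y_pred),
    "roc_auc": roc_auc(y_true, scores),
}
# Serialized metrics like this could be written to a file for later plotting.
print(json.dumps(metrics, indent=2))
```

The pairwise AUC above is quadratic in the number of samples, which is fine for a sketch; a library implementation would be preferable for large evaluation sets.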

General Workflow

Hyperparameter Optimization

App

Ansible Playbooks for Deployment

# Ansible playbook (YAML)
- hosts: all
  tasks:
  - name: ping all hosts
    ping:

  - name: Supervisor install
    become: yes
    apt:
      name: supervisor
      state: latest
    tags:
      - supervisor_install

Ansible for deploying:

Conda environments through environment.yml files
Docker containers through Dockerfiles

Conda Environment

$ conda env create -f env.yml -n keras_env2

(keras_env2) ubuntu@test2:~$ head -n 40 env.yml
name: keras_env
channels:
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - blas=1.0=mkl
  - brotlipy=0.7.0=py37h7b6447c_1000
  - ca-certificates=2020.1.1=0
  - certifi=2020.4.5.1=py37_0
  - cffi=1.14.0=py37he30daa8_1
  - click=7.1.2=py_0
  - dash=1.4.1=py_0

Dockerfile

FROM nvidia/cuda:10.0-base-ubuntu16.04
# Install some basic utilities
RUN apt-get update && apt-get install -y \
...
RUN curl -so ~/miniconda.sh https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh \
    && chmod +x ~/miniconda.sh && ~/miniconda.sh -b -p ~/miniconda 
ENV PATH=/home/user/miniconda/bin:$PATH
ENV CONDA_AUTO_UPDATE_CONDA=false
...

# Install deep learning packages
RUN conda install -y -c pytorch pytorch==1.4 torchvision cudatoolkit=10.1 && conda clean -ya
# Install supporting data science and NLP packages (-y avoids interactive prompts during the build)
RUN conda install -y pandas scikit-learn scipy
RUN conda install -y -c plotly plotly
RUN conda install -y -c conda-forge spacy
RUN pip install pytorch-nlp torchtext
...

CMD ["python3"]

ubuntu@test2:~$ docker build -t IMAGE_NAME .

Building the Docker Container

ubuntu@test2:~$ docker images
REPOSITORY          TAG                     IMAGE ID            CREATED             SIZE
srijith/pytorch     0                       e5ee0ad6fc76        17 hours ago        8.22GB

Running the Dockerized Application

Use the --gpus all flag to expose all host GPUs inside the container
Map a host folder into the container with -v HOST:CONTAINER
Use -w to set the working directory to the mapped folder inside the container
Use the --rm flag to remove the container when execution completes
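When the container is launched from a script rather than typed by hand, the flags listed above can be assembled programmatically. The helper below is a hypothetical sketch (the function name and parameters are not part of the original material); it only builds and prints the command, using the image, paths, and script name from this deck.

```python
import shlex


def docker_run_command(image, command, host_dir, container_dir, workdir,
                       gpus="all", remove=True, interactive=True):
    """Assemble a `docker run` argument list from the options described above."""
    argv = ["docker", "run"]
    if interactive:
        argv.append("-it")            # interactive terminal
    if remove:
        argv.append("--rm")           # remove the container when it exits
    if gpus:
        argv += ["--gpus", gpus]      # expose host GPUs to the container
    argv += ["-v", f"{host_dir}:{container_dir}"]  # map host folder
    argv += ["-w", workdir]           # working directory inside the container
    argv.append(image)
    argv += list(command)
    return argv


argv = docker_run_command(
    image="srijith/pytorch:0",
    command=["python", "attention_pytorch_opt.py"],
    host_dir="/home/ubuntu/data",
    container_dir="/mnt/",
    workdir="/mnt/Pytorch_comet_opt",
)
print(shlex.join(argv))
```

Passing the list to `subprocess.run(argv, check=True)` would execute it; in a non-interactive job such as a scheduler task, `interactive=False` is likely the safer choice because `-it` expects a terminal.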

ubuntu@test2:~$ docker run -it --rm --gpus all -v /home/ubuntu/data:/mnt/ -w /mnt/Pytorch_comet_opt \
  srijith/pytorch:0 python attention_pytorch_opt.py
    
Building vocabulary
/home/user/miniconda/envs/py36/lib/python3.6/site-packages/torch/nn/modules/rnn.py:50: UserWarning:

dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.2 and num_layers=1

Shape of vocabulary torch.Size([75002, 100])
Time spent  7.697644948959351
| Epoch: 01 | Train Loss: 0.240 | Train Acc: 90.10% | Val. Loss: 0.053 | Val. Acc: 98.71% |
Time spent  8.794492483139038
| Epoch: 02 | Train Loss: 0.037 | Train Acc: 99.11% | Val. Loss: 0.031 | Val. Acc: 99.36% |
Time spent  9.919449806213379
| Epoch: 03 | Train Loss: 0.019 | Train Acc: 99.56% | Val. Loss: 0.027 | Val. Acc: 99.50% |
Time spent  9.897263050079346
| Epoch: 04 | Train Loss: 0.011 | Train Acc: 99.74% | Val. Loss: 0.027 | Val. Acc: 99.47% |
Time spent  10.000884771347046
| Epoch: 05 | Train Loss: 0.006 | Train Acc: 99.84% | Val. Loss: 0.032 | Val. Acc: 99.38% |
 Test Loss: and Acc:  2.184105634689331 0.8088597059249878
Number of unknown tokens  3658

Running the GPU container

GPU Usage on Host

Thank you!

Email: srijithr@vt.edu

Website: srijithr.gitlab.io