Reproducible Deep Learning
Dr. Srijith Rajamohan
Contents
- Weakly-Supervised NLP
- Infrastructure
- Data Ingestion
- Deep Learning
- Workflow overview
- Airflow orchestration with conda environments
- Ansible for deployment
- Deploy Visual Analytics App
- Dockerized GPU training environments
Problem
- Natural Language Understanding for determining political affiliation
  - Understanding intent is difficult
- Augmented Intelligence
  - Human-in-the-loop: Results of DNNs used to create projections that assist humans in classifying documents
- Interpretable
  - Self-attention gives insight into the decision-making process of a DNN
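As a minimal sketch of why attention weights are readable by a human, the snippet below normalizes per-token scores with a softmax and lists the tokens that receive the most weight. The tokens, scores, and function names are hypothetical and chosen only for illustration; in the actual model the scores would come from the trained self-attention layer.

```python
import math


def attention_weights(scores):
    """Softmax over per-token scores; the returned weights sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_tokens(tokens, scores, k=2):
    """Return the k (token, weight) pairs with the highest attention weight."""
    weights = attention_weights(scores)
    ranked = sorted(zip(tokens, weights), key=lambda tw: tw[1], reverse=True)
    return ranked[:k]


# Hypothetical post and hypothetical unnormalized attention scores.
tokens = ["we", "must", "repeal", "this", "law"]
scores = [0.1, 0.3, 2.0, 0.2, 1.5]

for token, weight in top_tokens(tokens, scores):
    print(f"{token}: {weight:.2f}")
```

Inspecting the highest-weighted tokens of a classified document is one way a reviewer can check which words drove the prediction.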
Solution Overview
Use PySpark to clean the data
Project affiliation into a 2D space, similar to a form of Aspect-Based Sentiment Analysis (ABSA)
Self-attention-based BiLSTM with pretrained static and contextual (ELMo) embeddings
Evaluate visualization/cognitive efficiencies of various dimensionality reduction techniques
Interactive web application to help correctly label this weakly-supervised data
Gather social media posts related to certain political hashtags, along with user metadata
Data Ingestion and Preprocessing Pipeline
- VT cloud server
- 18 cores, 192GB RAM, 200GB + 500GB volumes
- Conda environments for package management
- Python RQ
- Redis-based framework for job scheduling
- Uses Tweepy for interaction with Twitter
- Downloads and stores tweets corresponding to certain hashtags in timestamped files
- MongoDB setup for interacting with the data
- Metabase used as a dashboard for this DB
- Interactive filtering, visualizing and exploratory analysis from local machine
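The "timestamped files" step can be sketched with the standard library alone. This is an illustration under assumptions, not the pipeline's actual code: the function names, the JSON Lines format, and the file-naming scheme are all hypothetical, and the real download step (Tweepy, scheduled through Python RQ) is replaced by an in-memory list of posts.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def timestamped_path(out_dir, hashtag, now=None):
    """Build a path such as <out_dir>/<hashtag>_<UTC timestamp>.jsonl."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    return Path(out_dir) / f"{hashtag.lstrip('#')}_{stamp}.jsonl"


def store_posts(posts, out_dir, hashtag, now=None):
    """Write one JSON object per line and return the path that was written."""
    path = timestamped_path(out_dir, hashtag, now)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as fh:
        for post in posts:
            fh.write(json.dumps(post) + "\n")
    return path


# Hypothetical posts standing in for the downloaded tweets.
posts = [{"id": 1, "text": "example post", "user": "someone"}]
with tempfile.TemporaryDirectory() as tmp:
    fixed_time = datetime(2020, 5, 1, 12, 0, tzinfo=timezone.utc)
    written = store_posts(posts, tmp, "#example", fixed_time)
    print(written.name)
```

Putting the hashtag and a UTC timestamp in the file name keeps each download run separate and makes it possible to re-ingest a specific time window later.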
Training Pipeline
- PyTorch code runs on GPUs
- Node with 4 Volta GPUs, 16GB memory per GPU
- Workflow automated with Airflow
- PyTorch code:
- Model training
- Generates metrics: accuracy, F1 and ROC scores
- Dimension reduction for visualization
- Plot.ly used for generating the graphs
- Generating the plots from the metric files generated by PyTorch
- PyTorch code:
- Hyperparameter optimization done with Comet.ml
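For reference, two of the metrics listed above (accuracy and F1) can be computed from binary predictions as in the sketch below. It is a plain-Python illustration with hypothetical labels and a hypothetical function name, not the training code itself; the ROC score is omitted because it requires predicted probabilities rather than hard labels.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy and F1 for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "f1": f1}


# Hypothetical ground-truth labels and model predictions.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1]
print(binary_metrics(y_true, y_pred))
```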
General Workflow
Hyperparameter Optimization
App
Ansible Playbooks for Deployment
# Ansible playbook (YAML)
- hosts: all
  tasks:
    - name: ping all hosts
      ping:
    - name: Supervisor install
      become: yes
      apt:
        name: supervisor
        state: latest
      tags:
        - supervisor_install
Ansible for deploying:
- Conda environments through environment.yml files
- Docker containers through Docker files
Conda Environment
$ conda env create -f env.yml -n keras_env2
(keras_env2) ubuntu@test2:~$ head -n 40 env.yml
name: keras_env
channels:
- defaults
dependencies:
- _libgcc_mutex=0.1=main
- blas=1.0=mkl
- brotlipy=0.7.0=py37h7b6447c_1000
- ca-certificates=2020.1.1=0
- certifi=2020.4.5.1=py37_0
- cffi=1.14.0=py37he30daa8_1
- click=7.1.2=py_0
- dash=1.4.1=py_0
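Each dependency line in the exported file follows the pattern name=version=build. The sketch below splits a few of the lines shown above into those three parts, which can help when auditing exactly which builds an environment was pinned to. The function name is hypothetical, and the parser only handles this simple pinned form.

```python
def parse_conda_pin(spec):
    """Split a conda pin of the form name=version=build into its parts.

    Missing parts are returned as None, so "numpy" or "numpy=1.18" also work.
    """
    parts = spec.split("=")
    name = parts[0]
    version = parts[1] if len(parts) > 1 else None
    build = parts[2] if len(parts) > 2 else None
    return name, version, build


# Pins copied from the env.yml excerpt above.
pins = ["blas=1.0=mkl", "click=7.1.2=py_0", "dash=1.4.1=py_0"]
for pin in pins:
    name, version, build = parse_conda_pin(pin)
    print(f"{name:10s} version={version} build={build}")
```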
Dockerfile
FROM nvidia/cuda:10.0-base-ubuntu16.04
# Install some basic utilities
RUN apt-get update && apt-get install -y \
...
RUN curl -so ~/miniconda.sh https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh \
&& chmod +x ~/miniconda.sh && ~/miniconda.sh -b -p ~/miniconda
ENV PATH=/home/user/miniconda/bin:$PATH
ENV CONDA_AUTO_UPDATE_CONDA=false
...
RUN conda install -y -c pytorch pytorch==1.4 torchvision cudatoolkit=10.1 && conda clean -ya
# Install deep learning packages (-y avoids an interactive prompt during the build)
RUN conda install -y pandas scikit-learn scipy
RUN conda install -y -c plotly plotly
RUN conda install -y -c conda-forge spacy
RUN pip install pytorch-nlp torchtext
...
CMD ["python3"]
ubuntu@test2:~$ docker build -t IMAGE_NAME .
Building the Docker Container
ubuntu@test2:~$ docker images
REPOSITORY        TAG   IMAGE ID       CREATED        SIZE
srijith/pytorch   0     e5ee0ad6fc76   17 hours ago   8.22GB
Running the Dockerized Application
- Use --gpus all flag to use all the Host GPUs inside the container
- Map host folder to container with -v HOST:CONTAINER
- Use -w to set the working directory to the mapped directory inside the container
- Use --rm flag to remove container on completion of execution
ubuntu@test2:~$ docker run -it --rm --gpus all -v /home/ubuntu/data:/mnt/ -w /mnt/Pytorch_comet_opt \
    srijith/pytorch:0 python attention_pytorch_opt.py
Building vocabulary
/home/user/miniconda/envs/py36/lib/python3.6/site-packages/torch/nn/modules/rnn.py:50: UserWarning:
dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.2 and num_layers=1
Shape of vocabulary torch.Size([75002, 100])
Time spent 7.697644948959351
| Epoch: 01 | Train Loss: 0.240 | Train Acc: 90.10% | Val. Loss: 0.053 | Val. Acc: 98.71% |
Time spent 8.794492483139038
| Epoch: 02 | Train Loss: 0.037 | Train Acc: 99.11% | Val. Loss: 0.031 | Val. Acc: 99.36% |
Time spent 9.919449806213379
| Epoch: 03 | Train Loss: 0.019 | Train Acc: 99.56% | Val. Loss: 0.027 | Val. Acc: 99.50% |
Time spent 9.897263050079346
| Epoch: 04 | Train Loss: 0.011 | Train Acc: 99.74% | Val. Loss: 0.027 | Val. Acc: 99.47% |
Time spent 10.000884771347046
| Epoch: 05 | Train Loss: 0.006 | Train Acc: 99.84% | Val. Loss: 0.032 | Val. Acc: 99.38% |
Test Loss: and Acc: 2.184105634689331 0.8088597059249878
Number of unknown tokens 3658
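The per-epoch lines printed above have a regular shape, so they can be turned into structured records for plotting or for picking a checkpoint. The sketch below parses the first three epoch lines from the log with a regular expression. The regex and function name are assumptions based only on the output format shown here, not part of the training script.

```python
import re

# Matches lines such as:
# | Epoch: 01 | Train Loss: 0.240 | Train Acc: 90.10% | Val. Loss: 0.053 | Val. Acc: 98.71% |
EPOCH_RE = re.compile(
    r"\|\s*Epoch:\s*(\d+)\s*\|\s*Train Loss:\s*([\d.]+)\s*\|\s*Train Acc:\s*([\d.]+)%"
    r"\s*\|\s*Val\. Loss:\s*([\d.]+)\s*\|\s*Val\. Acc:\s*([\d.]+)%\s*\|"
)


def parse_epochs(log_text):
    """Return one dict per epoch line found in the log text."""
    rows = []
    for m in EPOCH_RE.finditer(log_text):
        rows.append({
            "epoch": int(m.group(1)),
            "train_loss": float(m.group(2)),
            "train_acc": float(m.group(3)),
            "val_loss": float(m.group(4)),
            "val_acc": float(m.group(5)),
        })
    return rows


# First three epochs copied from the run output above.
log = """Time spent 7.697644948959351
| Epoch: 01 | Train Loss: 0.240 | Train Acc: 90.10% | Val. Loss: 0.053 | Val. Acc: 98.71% |
Time spent 8.794492483139038
| Epoch: 02 | Train Loss: 0.037 | Train Acc: 99.11% | Val. Loss: 0.031 | Val. Acc: 99.36% |
Time spent 9.919449806213379
| Epoch: 03 | Train Loss: 0.019 | Train Acc: 99.56% | Val. Loss: 0.027 | Val. Acc: 99.50% |
"""

rows = parse_epochs(log)
best = min(rows, key=lambda r: r["val_loss"])  # epoch with the lowest validation loss
print(best)
```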
Running the GPU container
GPU Usage on Host
Thank you!
Email: srijithr@vt.edu
Website: srijithr.gitlab.io
Presentation slides for Stance detection