Stance detection with

Weakly-Supervised NLP

Automatic animations between code

Dr. Srijith Rajamohan

Problem

  • Natural Language Understanding for determining political affiliation
    • Understanding intent is difficult
  • Augmented Intelligence
    • Human-in-the-loop: Results of DNNs used to create projections that assist humans in classifying documents  
  • Interpretable
    • Self-attention gives insight into the decision-making process of a DNN

 

Problem

  • Can we minimize the human effort for labeling?
  • Are there certain types of dimensionality reduction techniques that result in better projections for this purpose?
  • How effective are embeddings at learning intent from noisy text?

Solution Overview

Use PySpark to clean the data

Project affiliation in a 2D space similar to a form of Aspect-Based Sentiment Analysis (ABSA)

Self-attention based BiLSTM with pretrained static and contextual embeddings (Elmo)

Evaluate visualization/cognitive efficiencies of various dimensionality reduction techniques

Interactive web application to help correctly label this weakly-supervised data

Gather social media posts related to certain political hashtags, along with user metadata

Data Ingestion Pipeline

  • VT cloud server
    • 18 cores,192GB RAM, 200 + 500GB volume 
    • Conda environments for package management
  • Python RQ
    • Redis-based framework for job scheduling 
    • Uses Tweepy for interaction with Twitter
    • Downloads and stores tweets corresponding to certain hashtags in timestamped files
  • MongoDB setup for interacting with the data
  • Metabase used as a dashboard for this DB
    • Interactive filtering, visualizing and exploratory analysis from local machine  

Data Cleaning

  • Weakly-supervised data
    • Noisy labels because tweet hashtags do not correspond to affiliation
  • PySpark for data preprocessing
    • John Snow Labs Sparknlp library 
  • Data cleaning
    • Remove hyperlinks, emoticons
    • Remove entries with empty Twitter User Descriptions (TUDs)
  • No stemming or lemmatization was done
    • Not recommended for contextual embeddings  

Aspect-Based Sentiment Analysis

  • Different from sentiment analysis
    • Learn an entity’s opinion towards a number of topics that constitute a person’s political affiliation here
    • Different from simply expressing a positive or negative sentiment
  • ABSA also referred to as stance detection
    • Frame this as inter-point distance in n-dimensional space
  • Stance detection performed on Twitter User Description
    • Training done on these TUDs

Training

  • PyTorch code runs on GPUs
  • 4 Volta GPU node with 16GB per-GPU memory
  • Workflow automated with Airflow
    • PyTorch code:
      • Model training
      • Generates metrics for accuracy, F1-score and ROC scores
      • Dimension reduction for visualization
    • Plot.ly used for generating the graphs
      • Generating the plots from the metric files generated by PyTorch
  • Hyperparameter optimization done with Comet.ml 

Corpus Statistics

  • About 120k documents in the training/validation/test corpus
  • Imbalanced dataset: 90k/30k for classes 
  • Torchtext module
  • Use F1-score as the metric for assessment

 

Network Architecture

Embeddings

  • Static embeddings
    • Glove
    • Glove.twitter
    • Charngrams
    • 100-dimensional embeddings
  • Contextual embeddings
    • Elmo
  • Full training data
    • All the data we have
  • Limited training data
    • Randomly sampled 5k training set
  • Hand-curated set of 1219

 

General Workflow

Hyperparameter Optimization

APP DEMO

Projections - Interaction

  • Entities are color-coded to represent the two ideologies
  • Projections are color and opacity coded for affiliation in corpus, and strength of classification respectively
  • Proximity in the two-dimensional space can be used to represent relative ’political affiliation’ without having to quantify it subjectively.
  • Types of identifiable errors 
    • Type a error due to weakly-supervised label inaccuracy
    • Type b error due to inaccuracy of the predicted label by the model
    • Outliers: do not belong in that particular group

ELMo (Embeddings from Language Models)

  • Contextual embeddings that address polysemy, used to capture semantic meaning
    • Each token embedding is a function of the entire sentence
  • Bidirectional language modeling 
  • Character-based and can use morphological cues to form representations of out-of-vocabulary words
  • Shown to work well on small datasets

Active Learning

  • Identify misclassifications as outliers -
    • "Use supervision whenever possible"
  • Active learning used to correct labels -
    • Done by expert who is provided guidance on where to look
  • In the MetaCost[ref] algorithm, normal class elements that have a reasonable probability are relabeled
    • Cost-based relabeling can worsen the error
  • Do this visually using the entity/cluster proximity
  • How we define reasonable probability is the key...
    • Entities with the greatest uncertainty

A Note on Softmax Scores

A Note on Softmax Scores

Incorrect Classification - Type (a)

The classifier prediction correctly identifies the true label - incorrect document label

Incorrect Classification - Outlier

Classifier output is incorrect, but the projection coordinates identify the incorrect document label

Attention

  • Given a sentence of 'n' words and each word represented by 'd' dimension, each sentence is now a matrix (n,d)
  • Hidden layer outputs of BiLSTM (u) for a sentence has output 'H'(n,2u)
  • Linear combination of the 'n' vectors to form the attention vector 'm' (1,2u) and attention weights vector 'a' (1,n)
    • 'da' is a hyperparameter
    • w_{s2} is a vector of size 'da'
    • W_s1 (da,2u)
    • These weights are learned through training
\textbf{a} = softmax(w_{s2} tanh( W_{s1} H^T)) \\ \textbf{m} = \textbf{a} H

Attention Layer for Interpretability

Weights correspond to the inferred importance of the words for the classifier

Dimensionality Reduction for Visualization

  • PCA
    • linear dimensionality reduction with vanilla PCA
  • MDS
    • Low-dimensional projections that attempts to maintain the inter-point distance in high-dimensional space
    • Similarity calculated using euclidean distance
  • Isomap
    • Similar to MDS, similarity calculated with geodesic distances
    • Can capture non-linear manifold structure
  • t-SNE
    • Similarities as joint probabilities
    • Good separation of points

MDS

  • MDS
    • Low-dimensional projections that attempts to maintain the inter-point distance in high-dimensional space
    • Similarity calculated using euclidean distance (other distance metrics can also be used here)

 

 

  • If [X] is the high dimensional matrix of observations
  • $[Z]$ is a low-dimensional we hope to obtain through MDS.
\textbf{Stress}^2 = \sum_{i=1}^{nrows} \sum_{j=i+1}^{nrows} \biggl( ([ X_i - X_j ] [X_{i} - X_{j}]^{T} - [Z_i - Z_j][Z_{i} - Z_{j}]^T) \biggr)

Isomap

  • Similar to MDS
  • Geodesic distance is used to compute similarities of points
    • Helps to capture the curvature of the manifold 
    • Approximate distances since we don't know the true manifold
    • Build a neighborhood graph using the following formula
    • Using the nearest neighbor graph, use Dijkstra's algorithm to find the shortest path for pairs
  • Apply MDS using the above computed distances

t-SNE

  • Similarities as joint probabilities
  • Minimize the KL divergence between high-d (P) and low-d (Q) joint probabilities
    • For N points, and a high-d space represented by the vector X and low-d space represented by Y
    • The joint probabilities in high-d space to measure similarity
    • Sigma is the bandwidth of the gaussian kernel
e^{-d(x_i, x_j)^2 / 2\sigma_i^2} / \sum_{k \neq i}e^{-d(x_i, x_k)^2 / 2\sigma_i^2} \\ \\ p_{ij} = \dfrac{p_{i|j} + p_{j|i}}{2N}
q_{ij} = \dfrac{(1 + \lVert y_i - y_j \rVert^2)^{-1}}{\sum_{k \neq l}(1 + \lVert y_l - y_k \rVert^2)^{-1}}

The heavy tails of the Student-t kernel allow you to transform the small inter-point distances in high-d to points farther apart in low-d (good separation)

Dimensionality Reduction Techniques

1. PCA
2. MDS
3. Isomap
4. t-SNE

Results

Ansible Notebooks for Deployment

# Python code
# 
- hosts: all
  tasks:
  - name: ping all hosts
    ping:

  - name: Supervisor install
    become: yes
    apt:
      name: supervisor
      state: latest
    tags:
      - supervisor_install

Make your code shine with highlighting + Avbcuto-Animate

Part II - Graph Analytics on Social Networks

Mapping Right-wing Extremism

  • Network mapping of right-wing figures and their supporters
  • Identify influential associates and communities
  • First (RQ, Tweepy) and second degree network information 
  • About 1 million files spread across 24 folders
  • Graph of about 72 million nodes and 1 billion edges

 

      Network Visualization

Make your code shine with highlighting + Avbcuto-Animate

First degree network

Second degree networks: Incremental visualization of large networks

      Network Visualization

Exploratory analysis: Identifying general trends in network relationships

   Graph Visualization - Top 100

Workflow

Read 1 million txt files of friends and followers info across 24 folders

Generate edges and extract metrics

Exploratory analysis and visualizations

Incremental visualization of network in graphtools

Compute Pagerank and centrality measures for all nodes

Interactive filtering of Pagerank results in Pyspark shell

Visualization of the subgraph generated

spark = SparkSession(sc)
sc.setLogLevel("INFO")
sqlcontext = pyspark.sql.SQLContext(sc)

data = read_input()
df = get_edges(data)
df = swap_edges_for_relationship(df)
v_name, userinfo = get_vertices(data)

g = GraphFrame(v_name, df)
results = g.pageRank(resetProbability=0.01, maxIter=1)

go = g.outDegrees
gi = g.inDegrees
gd = g.degrees
g_degree = gi.join(go, on='id', how='full').join(gd, on='id', how='full')

GraphFrames: Performing Pagerank

res_read_cache = res_read_cache.withColumn('friends_of_central_figure', 
                                           find_elem_fr_udf(res_read_cache.id))
res_read_cache_filter = res_read_cache.filter(res_read_cache
                                           ['friends_of_central_figure'].isNotNull())
res_read_cache_filter = res_read_cache_filter.cache()
... operation to trigger caching ...

# Faster interactive querying
res_read_cache_filter.sort(size(col("friends_of_central_figure")),
                           ascending=True).show()
res_read_cache_filter.sort(size(col("friends_of_central_figure")), 
                           ascending=False).show()
# Send output files
rclone sync /home/vt/page_rank_mapped_query_single_reordered.out 
			remote_google:liuqing_processed/query_reordered -v

Fast Interactive Querying

res_read_cache = res_read_cache.withColumn('friends_of_central_figure',
                               find_elem_fr_udf(res_read_cache.id))
res_read_cache_filter = res_read_cache.filter(res_read_cache
                               ['friends_of_central_figure'].isNotNull())
res_read_cache_filter = res_read_cache_filter.cache()
... operation to trigger caching ...

# Faster interactive querying
res_read_cache_filter.sort(size(col("friends_of_central_figure")), 
                           ascending=True).show()
res_read_cache_filter.sort(size(col("friends_of_central_figure")), 
                           ascending=False).show()
# Send output files
rclone sync /home/vt/page_rank_mapped_query_single_reordered.out 
			remote_google:liuqing_processed/query_reordered -v

Send Results

Work in progress...

Tweet evaluation

  • Cardniffnlp Tweeteval
    • Twitter-roberta-base masked language model embeddings
    • Task-specific embeddings
  • Four tasks were evaluated
    • Hate speech
    • Offensive tweets
    • Emotion recognition
  • Pytorch Distributed Data Parallel code using HuggingFace

 

Thank you!

Made with Slides.com