Dr. Srijith Rajamohan
Use PySpark to clean the data
Project political affiliation into a 2D space, similar to a form of Aspect-Based Sentiment Analysis (ABSA)
Self-attention-based BiLSTM with pretrained static and contextual (ELMo) embeddings
Evaluate visualization/cognitive efficiencies of various dimensionality reduction techniques
Interactive web application to help correctly label this weakly-supervised data
Gather social media posts related to certain political hashtags, along with user metadata
The classifier prediction identifies the true label, exposing an incorrectly assigned document label
The classifier output is incorrect, but the projection coordinates still identify the incorrect document label
Weights correspond to the inferred importance of the words for the classifier
The heavy tails of the Student-t kernel map small inter-point distances in high dimensions to points placed farther apart in low dimensions (good cluster separation)
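This is the standard t-SNE formulation (van der Maaten & Hinton): a Gaussian kernel defines similarities in the high-dimensional space, while the heavier-tailed Student-t kernel defines them in the low-dimensional embedding:

```latex
p_{j|i} = \frac{\exp\!\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}
               {\sum_{k \neq i} \exp\!\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)},
\qquad
q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}
```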
1. PCA
2. MDS
3. Isomap
4. t-SNE
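As a sketch of how these four projections can be compared on common data (scikit-learn API; the dataset and parameters here are illustrative, not the ones used in this work):

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import MDS, Isomap, TSNE

X, y = load_digits(return_X_y=True)
X = X[:200]  # small subset keeps MDS and t-SNE fast

# Each technique maps the 64-d digit images to 2-d for plotting
projections = {
    "PCA": PCA(n_components=2),
    "MDS": MDS(n_components=2),
    "Isomap": Isomap(n_components=2),
    "t-SNE": TSNE(n_components=2, perplexity=30),
}
embeddings = {name: est.fit_transform(X) for name, est in projections.items()}
for name, emb in embeddings.items():
    print(name, emb.shape)
```

The 2-d outputs can then be scatter-plotted side by side, colored by label, to judge visual/cognitive separation.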
Ansible Playbooks for Deployment
# Ansible playbook (YAML)
- hosts: all
  tasks:
    - name: ping all hosts
      ping:
    - name: Supervisor install
      become: yes
      apt:
        name: supervisor
        state: latest
      tags:
        - supervisor_install
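A playbook like this would typically be invoked as follows (the playbook and inventory filenames here are hypothetical):

```shell
# -i points at the inventory of target hosts; --tags limits the run to tagged tasks
ansible-playbook -i hosts.ini deploy.yml --tags supervisor_install
```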
Part II - Graph Analytics on Social Networks
Mapping Right-wing Extremism
Network Visualization
First degree network
Second degree networks: Incremental visualization of large networks
Exploratory analysis: Identifying general trends in network relationships
Graph Visualization - Top 100
Workflow
Read 1 million txt files of friends and followers info across 24 folders
Generate edges and extract metrics
Exploratory analysis and visualizations
Incremental visualization of the network in graph-tool
Compute Pagerank and centrality measures for all nodes
Interactive filtering of Pagerank results in Pyspark shell
Visualization of the subgraph generated
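The edge-generation step above can be sketched in plain Python (the file format and field names are assumptions; the actual `get_edges` used here is not shown):

```python
import os

def parse_follower_file(path):
    """Each line is assumed to hold 'user_id follower_id'."""
    edges = []
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) == 2:
                user, follower = parts
                edges.append((follower, user))  # follower -> user edge
    return edges

def gather_edges(root):
    """Walk the folder tree of txt files and collect all edges."""
    edges = []
    for dirpath, _, files in os.walk(root):
        for name in sorted(files):
            if name.endswith(".txt"):
                edges.extend(parse_follower_file(os.path.join(dirpath, name)))
    return edges
```

In the actual pipeline this parsing runs distributed in PySpark rather than on a single machine.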
import pyspark
from pyspark.sql import SparkSession
from graphframes import GraphFrame

# `sc` (the SparkContext) is predefined in the pyspark shell
spark = SparkSession(sc)
sc.setLogLevel("INFO")
sqlcontext = pyspark.sql.SQLContext(sc)

data = read_input()                    # load the raw friend/follower files
df = get_edges(data)                   # build the edge DataFrame (src, dst)
df = swap_edges_for_relationship(df)   # orient edges to match the relationship
v_name, userinfo = get_vertices(data)  # vertex DataFrame plus user metadata
g = GraphFrame(v_name, df)
results = g.pageRank(resetProbability=0.01, maxIter=1)

go = g.outDegrees
gi = g.inDegrees
gd = g.degrees
# full outer joins keep nodes that are missing from any one degree table
g_degree = gi.join(go, on='id', how='full').join(gd, on='id', how='full')
GraphFrames: Performing Pagerank
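What `g.pageRank` computes can be illustrated with a minimal single-machine power iteration on a toy graph (GraphFrames runs the equivalent computation distributedly):

```python
def pagerank(edges, nodes, reset_prob=0.15, iters=20):
    """Power iteration: rank flows along out-edges, with probability
    reset_prob of jumping to a uniformly random node."""
    out = {n: [] for n in nodes}
    for src, dst in edges:
        out[src].append(dst)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        nxt = {n: reset_prob / len(nodes) for n in nodes}
        for n in nodes:
            if out[n]:
                share = (1 - reset_prob) * rank[n] / len(out[n])
                for dst in out[n]:
                    nxt[dst] += share
            else:  # dangling node: spread its rank uniformly
                for m in nodes:
                    nxt[m] += (1 - reset_prob) * rank[n] / len(nodes)
        rank = nxt
    return rank

ranks = pagerank([("a", "b"), ("b", "c"), ("c", "a"), ("a", "c")],
                 ["a", "b", "c"])
```

Note that `resetProbability=0.01` above corresponds to `reset_prob` here, and a single `maxIter` is one step of this loop.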
# UDF looks up each id in the central figure's friend list (null if absent)
res_read_cache = res_read_cache.withColumn('friends_of_central_figure',
                                           find_elem_fr_udf(res_read_cache.id))
# drop rows where the UDF returned null
res_read_cache_filter = res_read_cache.filter(
    res_read_cache['friends_of_central_figure'].isNotNull())
res_read_cache_filter = res_read_cache_filter.cache()
# ... operation to trigger caching ...

# Faster interactive querying once cached
res_read_cache_filter.sort(size(col("friends_of_central_figure")),
                           ascending=True).show()
res_read_cache_filter.sort(size(col("friends_of_central_figure")),
                           ascending=False).show()

# Send output files
rclone sync /home/vt/page_rank_mapped_query_single_reordered.out \
    remote_google:liuqing_processed/query_reordered -v
Fast Interactive Querying
Send Results
Tweet evaluation
Thank you!