Digital Media Analysis Using Information Fusion and Machine Learning

Dima Kagan

Advisor: Dr. Michael Fire

2020

Publications

  • Dima Kagan, Thomas Chesney, and Michael Fire, "Using data science to understand the film industry’s gender gap," Humanit Soc Sci Commun (formerly Palgrave Communications) 6.1 (2020): 1-16.
  • Dima Kagan, Jacob Moran-Gilad, and Michael Fire, "Scientometric trends for coronaviruses and other emerging viral infections," GigaScience, Volume 9, Issue 8, August 2020, giaa085, https://doi.org/10.1093/gigascience/giaa085.
  • Dima Kagan, Galit Fuhrmann Alpert, and Michael Fire, "Zooming Into Video Conferencing Privacy and Security Threats," under review in Communications of the ACM.

Scientometric trends for coronaviruses and other emerging viral infections

Dima Kagan, Jacob Moran-Gilad, Michael Fire

Motivation

  • COVID-19 is the most rapidly expanding coronavirus outbreak in the past two decades.
     
  • To provide a swift response to a novel outbreak, prior knowledge from similar outbreaks is essential.
     
  • Much can be learned from past infectious disease outbreaks to improve preparedness and response to future public health threats.

Research Goals

Three key questions arise in light of the COVID-19 outbreak:

  1. To what extent were the previous human coronavirus (SARS and MERS) outbreaks studied?
  2. Is research on emerging viruses being sustained, aiming to understand and prevent future epidemics?
  3. Are there lessons from academic publications on previous emerging viruses that could be applied to the current COVID-19 epidemic?

Analysis

Datasets

  • PubMed - academic publications on medicine, nursing, dentistry, veterinary medicine, health care systems, and preclinical sciences.
  • Microsoft Academic Graph - a dataset containing “scientific publication records, citation relationships between those publications, as well as authors, institutions, journals, conferences, and fields of study”.
  • SJR (SCImago Journal Rank) - information and rankings of academic journals.
  • Wikidata - a store of metadata about items, where each item has an identifier and can be linked to other items.

Results

  • Our results demonstrate that previous coronavirus outbreaks have been understudied compared to other viruses.
  • We also show that the research volume of emerging infectious diseases is very high after an outbreak and drops drastically upon the containment of the disease.
  • This pattern can lead to inadequate research and limited investment in fully understanding how to manage and prevent novel coronaviruses.
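The rise-and-fall pattern described above can be quantified by counting publications per year for each virus and expressing every year relative to that virus's peak year. A minimal sketch follows, assuming publication records have already been reduced to (virus, year) pairs, for example from PubMed queries; the function name `yearly_share_of_peak` and the toy records are hypothetical and are not the study's data or method.

```python
from collections import Counter, defaultdict


def yearly_share_of_peak(records):
    """Map virus -> {year: publications that year / publications in the peak year}.

    `records` is an iterable of (virus, year) pairs (an assumed input format).
    """
    counts = defaultdict(Counter)
    for virus, year in records:
        counts[virus][year] += 1

    result = {}
    for virus, per_year in counts.items():
        peak = max(per_year.values())  # publication count in the busiest year
        result[virus] = {year: n / peak for year, n in sorted(per_year.items())}
    return result


# Toy records for illustration only (not real publication counts).
toy = [("virus_A", 2003)] * 10 + [("virus_A", 2004)] * 6 + [("virus_A", 2010)]
print(yearly_share_of_peak(toy))
# {'virus_A': {2003: 1.0, 2004: 0.6, 2010: 0.1}}
```

A sharp drop in the share after the peak year is the kind of signal the slide describes; normalizing by the peak makes viruses with very different absolute research volumes comparable.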

Zooming Into Video Conferencing Privacy and Security Threats

Dima Kagan, Galit Fuhrmann Alpert, Michael Fire

Motivation

  • The COVID-19 pandemic, with its related social distancing and shelter-in-place measures, has been associated with unprecedented growth in the use of video conferencing applications, exposing users to novel privacy and security threats in both the virtual and the real world.
  • Millions of people around the world have replaced face-to-face interactions with video conferencing platforms for collaboration, education, and social meetings with co-workers, family, and friends. Yet users are not aware of the multiple levels of privacy risk associated with careless use of video conferencing.

Data Collection

Data Processing

Results

  • Dataset of 15,783 Zoom collage images.

  • 142,000 faces extracted, with gender and age metadata.

  • We identified 1,153 faces that likely appeared in several different Zoom meeting images.

  • Using the cross-referenced images, we constructed a large-scale social network of Zoom users with 16,842 nodes (participants) and 197,765 edges.

  • The network consists of 345 distinct connected components.
    On average, each component consisted of 48.8 participant nodes and 573.2 joint-meeting edges. The largest component consisted of 3,066 nodes and 55,035 edges.

  • By inspecting randomly selected participants, we were able to manually locate their personal social network profiles. We also observed networks in which all participants were co-workers.
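The network construction above can be sketched as follows, assuming each meeting image has already been reduced to a set of participant identifiers (for example, after matching faces across images). Participants who appear in the same image are joined by an edge, and connected components are found with breadth-first search. The input format and the function names `build_meeting_graph`, `connected_components`, and `component_stats` are hypothetical; this is an illustration, not the paper's implementation. It also assumes a simple graph, so repeated co-appearances do not add duplicate edges.

```python
from collections import deque
from itertools import combinations


def build_meeting_graph(meetings):
    """Build an undirected adjacency map from meetings (iterables of participant IDs)."""
    adj = {}
    for participants in meetings:
        people = set(participants)
        for p in people:
            adj.setdefault(p, set())
        # Every pair of co-participants in one meeting image gets an edge.
        for a, b in combinations(people, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def connected_components(adj):
    """Return a list of components, each a set of nodes, found with BFS."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        comp, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    comp.add(nbr)
                    queue.append(nbr)
        components.append(comp)
    return components


def component_stats(adj):
    """Return (nodes, edges, components, avg nodes per component, avg edges per component)."""
    n_nodes = len(adj)
    n_edges = sum(len(nbrs) for nbrs in adj.values()) // 2  # each edge stored twice
    k = len(connected_components(adj))
    return n_nodes, n_edges, k, n_nodes / k, n_edges / k


meetings = [["p1", "p2", "p3"], ["p3", "p4"], ["p5", "p6"]]
print(component_stats(build_meeting_graph(meetings)))
# (6, 5, 2, 3.0, 2.5)
```

The per-component averages on the slide follow the same division: 16,842 / 345 ≈ 48.8 nodes and 197,765 / 345 ≈ 573.2 edges per component.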

Future work

2021

Courses

 

  • Semester A - The Art of Analyzing Big Data - The Data Scientist’s Toolbox (372.2.5401)
  • Semester B - Introduction to Deep Learning (372.2.6101)

Proposed Research

  • Using deep learning to study diversity in the film industry by analyzing:
    • Films
    • Posters
    • Trailers

Research Projects

 

  • Weevil Infestation Detection Utilizing Deep Learning and Sensor Fusion.
  • Using Data Science to Analyze Infectious Disease Trends.
  • Ethnic Diversity in the Film Industry Through Poster Analysis.
  • Gender Bias of Objective Measures for Academic Career Success.
  • Improving Israeli Public Transportation Using Artificial Intelligence.
  • etc.

Thank you! Questions?

Status Report

By Dima Kagan
