PHC7065 CRITICAL SKILLS IN DATA MANIPULATION FOR POPULATION SCIENCE

Course Overview

Hui Hu Ph.D.

Department of Epidemiology

College of Public Health and Health Professions & College of Medicine

January 8, 2018

  • Syllabus
  • Virtual machine setup
  • Python crash course

Syllabus

Key Points on Syllabus

  • Instructor Information and Office Hours
     
  • Course Content
     
  • Attendance
     
  • Homework
     
  • Course Project
    -  Proposal
    -  Final report

Instructor Information and Office Hours

  • Hui Hu, PhD
     
  • Office: CTRB 4224
     
  • Email: huihu@ufl.edu
     
  • Phone: 352-294-5944
     
  • Office hours: By appointment

Tell us

  • your name
     
  • what program you are in
     
  • what programming languages you are familiar with
     
  • what data engineering skills you want to learn from this course

Course Content

  • Course Overview
    - VM setup, Python crash course
  • SQL
    - Basic SQL
    - Data models and relational SQL
    - Many-to-many relationships in SQL
  • NoSQL
    - NoSQL databases
    - Access web data
  • Different Types of Data
    - Spatial data
    - Text data
    - Image and time-series data
    - Big data

Attendance

  • Attendance is mandatory
     
  • UF policy for excused absences applies (must notify instructor in writing before class when possible)
     
  • Each unexcused absence results in a 1.5% deduction from the final grade
     
  • More than 3 unexcused absences result in failure

Homework

  • 6 homework assignments
    -  10% each
    -  the highest 5 grades will count towards the final grade
     
  • Often simple programming exercises
     
  • Requirements:
    -  turn in assignment no later than 11:59 pm on the day it is due
    -  late assignments will NOT be accepted
    -  no handwritten assignments
    -  DO NOT copy others' work
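
The "highest 5 of 6" rule above can be sketched in Python. This is only an illustration, not the official grading procedure: the function name `homework_component`, the 0-100 score scale, and the example scores are all assumptions made for the sketch.

```python
def homework_component(scores, keep=5, weight_each=10.0):
    """Return the homework contribution to the final grade, in percentage points.

    scores: homework scores, each assumed to be on a 0-100 scale (assumption).
    keep: number of highest scores that count (5 per the syllabus).
    weight_each: percentage points per counted assignment (10 per the syllabus).
    """
    # Sort from highest to lowest and keep only the best `keep` scores.
    best = sorted(scores, reverse=True)[:keep]
    return sum(best) * weight_each / 100.0


# Six made-up scores; the lowest one (60) is dropped.
example_scores = [90, 80, 100, 60, 70, 85]
print(homework_component(example_scores))  # 42.5
```

With perfect scores on all six assignments, the function returns 50.0, which matches the 50% homework weight.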

Course Project

  • Requirements:
    -  must include at least 1 non-traditional data source (i.e. spatial data, text data, image data, time-series data)
    -  uses of semi-structured and unstructured data are encouraged
    -  uses of web data accessed by API or scraping are encouraged
     
  • Some examples (from last year):
    -  Side Effects and Adverse Reactions to Painkillers: Analysis with FDA Adverse Event Reporting System
    -  Utilizing nontraditional data sources for near real-time estimation of Zika virus case trends during the 2016 Florida USA Zika outbreak
    -  Twitter Mining for Cocaine Use
    -  Medical Marijuana Laws and Change of Number of Tweets towards Marijuana: A Time Series Analysis Using Data from Twitter

Course Project (continued)

  • You can work individually or as a team
     
  • If you choose to work as a team:
    -  each team can have up to 2 members
    -  clearly delineate roles and responsibilities of each team member
     
  • Project due dates:
    -  Feb 5, 2018: form a project team
    -  Mar 12, 2018: midterm presentation and project proposal
    -  Apr 16, 2018: final presentation
    -  Apr 23, 2018: final project report

Midterm

  • Project proposal:
    -  Abstract: up to 1 page
    -  Project description: up to 5 pages
        ~ Specific Aims/Objectives
        ~ Background and Significance
        ~ Approach/Research Design
        ~ Timeline
    -  Citations: no page limit, use the Vancouver style
    -  Format: single column, single spacing; Arial or Times New Roman font; font size no smaller than 11 point (table and figure labels may be 10 point); margins of at least 0.5 inch
     
  • Proposal presentations:
    -  up to 15 slides
    -  presentation of up to 15 minutes, followed by 5 minutes of Q&A
    -  send the slides to the instructor at least 3 days in advance

Final

  • Final Report: up to ten pages (including references)
    -  Abstract: no more than 250 words summarizing the project
    -  Introduction: a short background and objective(s) of the study
    -  Methods: design, setting, dataset, approaches, and main outcome measurements
    -  Results: key findings
    -  Discussion: key conclusions with direct reference to the implications of the methods and/or results
    -  References: please follow the Vancouver style
     
  • Final presentations:
    -  up to 15 slides
    -  presentation of up to 15 minutes, followed by 5 minutes of Q&A
    -  send the slides to the instructor at least 3 days in advance
     
  • Note: analyses are required, but you should focus more on data access and data engineering.

Grading

  • Attendance: 5%
     
  • Homework: 50%
     
  • Midterm (project proposal and presentation): 15%
     
  • Final (project report and presentation): 30%
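
As a rough illustration of how these weights combine, here is a minimal Python sketch. It assumes each component is scored on a 0-100 scale; the names `WEIGHTS` and `final_grade` and the example scores are hypothetical. It does not model the per-absence deduction or the failure rule from the Attendance slide, since the slides do not say how those interact with the 5% attendance weight.

```python
# Component weights from the Grading slide, in percentage points.
WEIGHTS = {"attendance": 5, "homework": 50, "midterm": 15, "final": 30}


def final_grade(component_scores):
    """Combine component scores (each assumed to be 0-100) into a weighted grade."""
    # Sanity check: the weights must cover the whole grade.
    assert sum(WEIGHTS.values()) == 100
    total = sum(WEIGHTS[name] * component_scores[name] for name in WEIGHTS)
    return total / 100.0


# Made-up component scores for illustration only.
example = {"attendance": 100, "homework": 85, "midterm": 90, "final": 80}
print(final_grade(example))  # 85.0
```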

Virtual Machine Setup

Python Crash Course

git clone https://github.com/benhhu/PHC7065SPR2018.git

PHC7065-Spring2018-Lecture1

By Hui Hu

Slides for Lecture 1, Spring 2018, PHC7065 Critical Skills in Data Manipulation for Population Science
