PHC7065 CRITICAL SKILLS IN DATA MANIPULATION FOR POPULATION SCIENCE

Course Overview

Hui Hu Ph.D.

Department of Epidemiology

College of Public Health and Health Professions & College of Medicine

January 6, 2020

Syllabus

Virtual machine setup

Linux terminal and Git

Syllabus

Key Points on Syllabus

  • Instructor Information and Office Hours
     
  • Course Content
     
  • Attendance and Participation
     
  • Mid-term Exam
     
  • Homework
     
  • Course Project
    -  Proposal
    -  Final report

Instructor Information and Office Hours

  • Hui Hu, PhD
     
  • Office: CTRB 4224
     
  • Email: huihu@ufl.edu
     
  • Phone: 352-294-5944
     
  • Office hours: By appointment

Course Content

  • Course Overview
    - VM setup, R crash course
  • SQL
    - Basic SQL
    - Data models and relational SQL
    - Many-to-many relationships in SQL
  • Access web data
    - Query APIs
  • Spatial data
    - PostGIS
  • NoSQL databases
    - MongoDB
  • Big Data
    - Spark

Attendance

  • Attendance is mandatory
     
  • UF policy for excused absences applies (notify the instructor in writing before class whenever possible)
     
  • Each unexcused absence results in a 1.5% deduction from the final grade
     
  • More than 3 unexcused absences result in failing the course

Homework

  • 6 homework assignments
    -  5% each
    -  only the 5 highest grades count toward the final grade (25% in total)
     
  • Often simple programming exercises
     
  • Requirements:
    -  submit each assignment no later than 11:59 pm on its due date
    -  late assignments will NOT be accepted
    -  no handwritten assignments
    -  DO NOT copy others' work

Mid-term Exam

  • 35%
     
  • Focus on SQL
     
  • More details will be shared with the class in early Feb

Course Project

Option A: Pick one article from a list of publications (to be made available in mid-February) and write code to reproduce its data engineering and descriptive analyses.

Option B: Come up with your own idea for the course project. The project must include at least ONE non-traditional data source (e.g., spatial data or web data) other than traditional survey data.

Course Project (continued)

  • You can work individually or work as a team
     
  • If you choose to work as a team:
    -  each team can have up to 2 members
    -  clearly delineate roles and responsibilities of each team member
     
  • Project deadlines:
    -  Feb 17, 2020: form a project team
    -  Mar 9, 2020: project proposal
    -  Apr 13, 2020: final presentation
    -  Apr 20, 2020: final project report

Course Project: Midterm

  • Project proposal:

    -  Cover page: include the title and a list of team members.

    -  Project description: up to one (1) page, covering:
       o Specific Aims/Objectives: for Option A, cite the article you would like to reproduce and briefly summarize its specific aims/objectives; for Option B, state your own aims/objectives.
       o Data Source: provide details about the data and how they can be accessed.
       o Preliminary Data Pipelines: briefly describe the data engineering steps involved in the project.
       o Timeline

    -  Literature cited (no page limit); please follow the Vancouver style.
    Proposals must use single-column, single-spaced formatting; Arial or Times New Roman font; a font size no smaller than 11 point (table and figure labels may be 10 point); and 0.5-inch margins.

Course Project: Final

  • For those choosing Option A, the project report should be structured as an R Markdown Notebook containing all the code, with explanations of the code.
     
  • For those choosing Option B, please structure the report to include:
    -  Title (14-point typeface) and the names of all team members
    -  Abstract: no more than 250 words summarizing the project.
    -  Introduction: brief background and the objective(s) of the study.
    -  Methods: design, setting, dataset, approaches, and main outcome measurements.
    -  Results: key findings
    -  Discussion: key conclusions with direct reference to the implications of the methods and/or results.
    -  References: please follow the Vancouver style.

Grading

  • Attendance and participation: 5%
     
  • Homework: 25%
     
  • Mid-term exam: 35%
     
  • Project proposal: 5%
     
  • Final project presentation: 10%
     
  • Final project report: 20%
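The weights above add up to 100%, with homework contributing 25% through the five best of six assignments at 5% each. A minimal Python sketch of that arithmetic follows; it is illustrative only, not an official grade calculator. The function name `final_grade`, the component keys, the 0-100 score scale, and reading the per-absence "1.5% deduction" as 1.5 points off the final grade are all assumptions, not stated course policy.

```python
# Hypothetical sketch of the grading arithmetic in this syllabus; not an official calculator.
# Assumptions (not stated in the syllabus): every score is on a 0-100 scale, and the
# 1.5% penalty per unexcused absence means 1.5 points off the final 0-100 grade.

# Component weights in whole percentage points (integers keep the arithmetic exact).
WEIGHTS_PCT = {
    "attendance": 5,     # attendance and participation
    "midterm": 35,       # mid-term exam
    "proposal": 5,       # project proposal
    "presentation": 10,  # final project presentation
    "report": 20,        # final project report
}
HOMEWORK_PCT_EACH = 5   # each counted homework is worth 5%
HOMEWORK_COUNTED = 5    # only the highest 5 of the 6 homework grades count
ABSENCE_PENALTY = 1.5   # assumed: points deducted per unexcused absence
MAX_UNEXCUSED = 3       # more than 3 unexcused absences means failing the course

# Sanity check: homework (5 x 5%) plus the other components add up to 100%.
assert HOMEWORK_PCT_EACH * HOMEWORK_COUNTED + sum(WEIGHTS_PCT.values()) == 100


def final_grade(homework, components, unexcused_absences=0):
    """Return the final grade on a 0-100 scale, or None if absences mean failure."""
    if len(homework) != 6:
        raise ValueError("expected 6 homework scores")
    if unexcused_absences > MAX_UNEXCUSED:
        return None  # failure regardless of the other scores
    # Drop the lowest homework grade by keeping the 5 highest.
    best = sorted(homework, reverse=True)[:HOMEWORK_COUNTED]
    weighted = sum(HOMEWORK_PCT_EACH * score for score in best)
    weighted += sum(WEIGHTS_PCT[name] * components[name] for name in WEIGHTS_PCT)
    # Convert weighted points back to a 0-100 grade, then apply the absence penalty.
    grade = weighted / 100 - ABSENCE_PENALTY * unexcused_absences
    return max(grade, 0.0)
```

For example, homework scores of 90, 80, 100, 70, 95, and 85 (the 70 is dropped), a score of 80 on every other component, and one unexcused absence give 82.5 - 1.5 = 81.0 under these assumptions.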

Virtual Machine Setup

Linux Terminal and Git