PHC7065 CRITICAL SKILLS IN DATA MANIPULATION FOR POPULATION SCIENCE

Course Overview

Hui Hu Ph.D.

Department of Epidemiology

College of Public Health and Health Professions & College of Medicine

January 11, 2021

Syllabus

 

Virtual machine setup
 

Linux terminal and Git

Syllabus

Key Points on Syllabus

  • Instructor Information and Office Hours
     
  • Course Format and Content
     
  • Attendance and Participation
     
  • Mid-term Exam
     
  • Homework
     
  • Course Project
    -  Proposal
    -  Final report

Instructor Information and Office Hours

  • Hui Hu, PhD
     
  • Office: CTRB 4224
     
  • Email: huihu@ufl.edu
     
  • Phone: 352-294-5944
     
  • Office hours: By appointment

Course Format

  • All lectures and labs will be recorded and released on Canvas every Monday.
     
  • Live recitation sessions:
    -  ~ 1 hour each week
    -  The TA will answer your questions and explain solutions to the homework and exam
    -  Participation is mandatory (except for students in the CPE program)

Course Content

  • Course Overview
    - VM setup, R crash course
  • SQL
    - Basic SQL
    - Data models and relational SQL
    - Many-to-many relationships in SQL
  • Access web data
    - Query APIs
  • Spatial data
    - PostGIS
  • NoSQL databases
    - MongoDB
  • Big Data
    - Spark

Homework

  • 6 homework assignments
    -  5% each
    -  the highest 5 grades will count towards the final grade
     
  • Often simple programming exercises
     
  • Requirements:
    -  turn in assignments no later than the due time
    -  late assignments will NOT be accepted
    -  no handwritten assignments
    -  DO NOT copy others' work

Mid-term Exam

  • 35%
     
  • Take-home, 1 week to complete
     
  • Focus on SQL

Course Project

Option A

Pick one article from a list of publications (to be made available in mid-February), and write code to reproduce its data engineering and descriptive analyses.

Option B

Come up with your own idea for the course project. It must include at least ONE non-traditional data source (e.g., spatial data, web data) other than traditional survey data.

Course Project (continued)

  • You can work individually or work as a team
     
  • If you choose to work as a team:
    -  each team can have up to 2 members
    -  clearly delineate roles and responsibilities of each team member
     
  • Project due dates:
    -  Feb 22, 2021: form a project team
    -  March 8, 2021: project proposal
    -  Apr 12, 2021: final presentation
    -  Apr 19, 2021: final project report

Midterm

  • Project proposal:
    -  Cover Page: Include title and list of team members.
    -  Project description: Up to one (1) page.
       o Specific Aims/Objectives: for those choosing option A, please cite the article you’d like to reproduce and briefly summarize the specific aims/objectives of the article. For those choosing option B, please state your aims/objectives.
       o Data Source: please provide details about the data and how it can be accessed
       o Preliminary Data Pipelines: please briefly describe the data engineering steps involved in this project
       o Timeline
    -  Literature cited (no page limit); please follow the Vancouver style.
     
  • Proposals must use single column and single spacing; Arial or Times New Roman font; font size no smaller than 11 point; tables and figure labels can be in 10 point; 0.5 inch margins.

Final

  • For those choosing option A, the project report should be structured as an R Markdown Notebook containing all of the code along with explanations of the code.
     
  • For those choosing option B, please structure the report to include:
    -  Title (14 point typeface) and names of each team member
    -  Abstract: no more than 250 words summarizing the project.
    -  Introduction: a short background and objective(s) of the study.
    -  Methods: design, setting, dataset, approaches, and main outcome measurements.
    -  Results: key findings
    -  Discussion: key conclusions with direct reference to the implications of the methods and/or results.
    -  References: please follow the Vancouver style.

Grading

  • Attendance and participation: 5%
     
  • Homework: 25%
     
  • Mid-term exam: 35%
     
  • Project proposal: 5%
     
  • Final project presentation: 10%
     
  • Final project report: 20%
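
The weights above, combined with the "highest 5 of 6" homework rule, can be sketched as a small calculation. This is only an illustration under stated assumptions: every component is scored on a 0-100 scale, the five counted homework grades are weighted equally, and the names `WEIGHTS` and `final_grade` are hypothetical, not part of any course tooling.

```python
# Weights from the Grading slide, as percentages of the final grade.
WEIGHTS = {
    "participation": 5,
    "homework": 25,
    "midterm": 35,
    "proposal": 5,
    "presentation": 10,
    "report": 20,
}


def final_grade(homework_scores, other_scores):
    """Return the final grade on a 0-100 scale (hypothetical helper).

    homework_scores: six scores, each assumed to be on a 0-100 scale;
        only the highest five are counted.
    other_scores: dict with a 0-100 score for every non-homework
        component named in WEIGHTS.
    """
    if len(homework_scores) != 6:
        raise ValueError("expected six homework scores")
    # Keep the five highest homework grades (5% each, 25% in total).
    best_five = sorted(homework_scores, reverse=True)[:5]
    homework_avg = sum(best_five) / 5
    scores = dict(other_scores, homework=homework_avg)
    # Weighted sum of all components, rescaled from percent weights.
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


if __name__ == "__main__":
    others = {
        "participation": 90,
        "midterm": 90,
        "proposal": 90,
        "presentation": 90,
        "report": 90,
    }
    print(final_grade([80, 90, 100, 70, 60, 0], others))
```

In this example the lowest homework score is the one left out, so a single missed or weak assignment does not lower the homework component.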

Virtual Machine Setup

Linux Terminal and Git

PHC7065-Spring2021-Lecture1

Slides for Lecture 1, Spring 2021, PHC7065 Critical Skills in Data Manipulation for Population Science
