PHC7065 CRITICAL SKILLS IN DATA MANIPULATION FOR POPULATION SCIENCE
Course Overview
Hui Hu Ph.D.
Department of Epidemiology
College of Public Health and Health Professions & College of Medicine
January 6, 2020
Outline
- Syllabus
- Virtual machine setup
- Linux terminal and Git
Syllabus
Key Points on Syllabus
- Instructor Information and Office Hours
- Course Content
- Attendance and Participation
- Mid-term Exam
- Homework
- Course Project
  - Proposal
  - Final report
Instructor Information and Office Hours
Course Content
- Course Overview
- VM setup, R crash course
- SQL
  - Basic SQL
  - Data models and relational SQL
  - Many-to-many relationships in SQL
- Access web data
  - Query APIs
- Spatial data
  - PostGIS
- NoSQL databases
  - MongoDB
- Big Data
  - Spark
Attendance
- Attendance is mandatory
- UF policy for excused absences applies (students must notify the instructor in writing before class when possible)
- Each unexcused absence results in a 1.5% deduction from the final grade
- More than 3 unexcused absences result in failure of the course
Homework
- 6 homework assignments
- 5% each
- Only the highest 5 grades count toward the final grade
- Often simple programming exercises
- Requirements:
  - Turn in assignments no later than 11:59 pm on the due date
  - Late assignments will NOT be accepted
  - No handwritten assignments
  - DO NOT copy others' work
Mid-term Exam
- 35%
- Focus on SQL
- More details will be shared with the class in early February
Course Project
Option A: Pick one article from a list of publications (to be made available in mid-February) and write code to reproduce its data engineering and descriptive analyses.
Option B: Come up with your own idea for the course project. It must include at least ONE non-traditional data source (e.g., spatial data or web data) in addition to traditional survey data.
Course Project (continued)
- You can work individually or as a team
- If you choose to work as a team:
  - each team can have up to 2 members
  - clearly delineate the roles and responsibilities of each team member
- Project deadlines:
  - February 17, 2020: form a project team
  - March 9, 2020: project proposal
  - April 13, 2020: final presentation
  - April 20, 2020: final project report
Midterm
- Project proposal:
  - Cover page: include the title and a list of team members.
  - Project description: up to one (1) page, covering:
    - Specific Aims/Objectives: for Option A, cite the article you'd like to reproduce and briefly summarize its specific aims/objectives. For Option B, state your own aims/objectives.
    - Data Source: provide details about the data and how it can be accessed.
    - Preliminary Data Pipelines: briefly describe the data engineering steps involved in the project.
    - Timeline
  - Literature cited (no page limit); please follow the Vancouver style.
  - Formatting: single column and single spacing; Arial or Times New Roman font; font size no smaller than 11 point (table and figure labels may be 10 point); 0.5-inch margins.
Final
- For Option A, the project report should be structured as an R Markdown Notebook containing all of the code along with explanations of the code.
- For Option B, please structure the report to include:
  - Title (14-point typeface) and the names of all team members
  - Abstract: no more than 250 words summarizing the project
  - Introduction: a short background and the objective(s) of the study
  - Methods: design, setting, dataset, approaches, and main outcome measurements
  - Results: key findings
  - Discussion: key conclusions, with direct reference to the implications of the methods and/or results
  - References: please follow the Vancouver style
Grading
- Attendance and participation: 5%
- Homework: 25%
- Mid-term exam: 35%
- Project proposal: 5%
- Final project presentation: 10%
- Final project report: 20%
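As a rough illustration of how the weights above combine, here is a minimal Python sketch. It assumes every component is scored on a 0-100 scale, that the best five of the six homework scores each contribute 5%, and that each unexcused absence subtracts 1.5 percentage points from the weighted total. The names `WEIGHTS`, `homework_points`, and `final_grade` are hypothetical; this is not an official grade calculator.

```python
# Hypothetical grade sketch: weights are copied from the Grading list above;
# how the homework drop and attendance deduction combine is an assumption.
WEIGHTS = {
    "attendance_participation": 5,
    "homework": 25,
    "midterm": 35,
    "proposal": 5,
    "presentation": 10,
    "report": 20,
}
assert sum(WEIGHTS.values()) == 100  # weights cover the whole grade


def homework_points(scores):
    """Best 5 of up to 6 homework scores (0-100), each worth 5% of the grade."""
    if len(scores) > 6:
        raise ValueError("there are only 6 homework assignments")
    best_five = sorted(scores, reverse=True)[:5]
    # If fewer than 5 scores are given, the missing ones effectively count as 0.
    return sum(score * 5 / 100 for score in best_five)


def final_grade(component_scores, homework_scores, unexcused_absences):
    """Return (grade in percentage points, fails_attendance_rule)."""
    total = homework_points(homework_scores)
    for component, weight in WEIGHTS.items():
        if component != "homework":
            total += component_scores[component] * weight / 100
    # Assumption: each unexcused absence subtracts 1.5 percentage points.
    total -= 1.5 * unexcused_absences
    # More than 3 unexcused absences means failure regardless of the total.
    return max(total, 0.0), unexcused_absences > 3


if __name__ == "__main__":
    others = {name: 90 for name in WEIGHTS if name != "homework"}
    print(final_grade(others, [100, 80, 90, 70, 100, 60], 1))
```

With 90 on every non-homework component, homework scores of 100, 80, 90, 70, 100, and 60 (the 60 is dropped), and one unexcused absence, the sketch returns `(88.0, False)`.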
Virtual Machine Setup
Linux Terminal and Git