PHC7065 CRITICAL SKILLS IN DATA MANIPULATION FOR POPULATION SCIENCE

Course Overview
Hui Hu Ph.D.
Department of Epidemiology
College of Public Health and Health Professions & College of Medicine
January 6, 2020
Syllabus
Virtual machine setup
 
Linux terminal and Git
Syllabus
Key Points on Syllabus
- Instructor Information and Office Hours
 
- Course Content
 
- Attendance and Participation
 
- Mid-term Exam
 
- Homework
 
- Course Project
 - Proposal
 - Final report
Instructor Information and Office Hours
Course Content
- Course Overview
 - VM setup, R crash course
- SQL
 - Basic SQL
 - Data models and relational SQL
 - Many-to-many relationships in SQL
- Access web data
 - Query APIs
- Spatial data
 - PostGIS
- NoSQL databases
 - MongoDB
- Big Data
 - Spark
Attendance
- Attendance is mandatory
 
- UF policy for excused absences applies (must notify instructor in writing before class when possible)
 
- Each unexcused absence results in a 1.5% deduction from the final grade
 
- More than 3 unexcused absences result in failure of the course
Homework
- 6 homework assignments
 - 5% each
 - the highest 5 grades will count towards the final grade
 
- Often simple programming exercises
 
- Requirements:
 - turn in each assignment no later than 11:59 pm on the day it is due
 - late assignments will NOT be accepted
 - no handwritten assignments
 - DO NOT copy others' work
Mid-term Exam
- 35%
 
- Focus on SQL
 
- More details will be shared with the class in early Feb
Course Project
Option A
Pick one article from a list of publications (will be made available mid-February), and write code to reproduce its data engineering and descriptive analyses
Option B
Come up with your own idea for the course project. It must include at least ONE non-traditional data source (e.g. spatial data, web data, etc.) in addition to traditional survey data
Course Project (continued)
- You can work individually or work as a team
 
- If you choose to work as a team:
 - each team can have up to 2 members
 - clearly delineate roles and responsibilities of each team member
 
- Project Due:
 - Feb 17, 2020: form a project team
 - March 9, 2020: project proposal
 - Apr 13, 2020: final presentation
 - Apr 20, 2020: final project report
Midterm
- Project proposal:
 - Cover Page: Include title and list of team members.
 - Project description: Up to one (1) page.
  - Specific Aims/Objectives: for those choosing option A, please cite the article you’d like to reproduce and briefly summarize the specific aims/objectives of the article. For those choosing option B, please state your aims/objectives.
  - Data Source: please provide details about the data and how it can be accessed
  - Preliminary Data Pipelines: please briefly describe the data engineering steps involved in this project
  - Timeline
 - Literature cited (no page limit); please follow the Vancouver style.
 - Proposals must use single column and single spacing; Arial or Times New Roman font; font size no smaller than 11 point; tables and figure labels can be in 10 point; 0.5 inch margins.
Final
- For those choosing option A, the project report should be structured as an R Markdown Notebook containing all of the code along with explanations of the code.
 
- For those choosing option B, please structure the report to include:
 - Title (14 point typeface) and names of each team member
 - Abstract: no more than 250 words summarizing the project.
 - Introduction: a short background and objective(s) of the study.
 - Methods: design, setting, dataset, approaches, and main outcome measurements.
 - Results: key findings
 - Discussion: key conclusions with direct reference to the implications of the methods and/or results.
 - References: please follow the Vancouver style.
Grading
- Attendance and participation: 5%
 
- Homework: 25%
 
- Mid-term exam: 35%
 
- Project proposal: 5%
 
- Final project presentation: 10%
 
- Final project report: 20%
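
The weights above, together with the homework and attendance rules, can be combined into a single number. The following is only an illustrative sketch, not an official grade calculator: the function name `final_grade`, the 0-100 scale for each component, and the choice to subtract the 1.5% per unexcused absence as percentage points from the weighted total are all assumptions made here for illustration.

```python
def final_grade(homework, midterm, proposal, presentation, report,
                participation, unexcused_absences=0):
    """Return a final grade in percentage points, or None for automatic failure.

    All scores are assumed to be on a 0-100 scale (an assumption of this sketch).
    """
    if len(homework) != 6:
        raise ValueError("expected scores for all 6 homework assignments")

    # More than 3 unexcused absences results in failure.
    if unexcused_absences > 3:
        return None

    # Only the highest 5 of the 6 homework grades count, at 5% each.
    best_five = sorted(homework, reverse=True)[:5]

    # Weights are written in whole percent and divided by 100 at the end.
    weighted = (
        5 * participation
        + 5 * sum(best_five)
        + 35 * midterm
        + 5 * proposal
        + 10 * presentation
        + 20 * report
    ) / 100

    # Assumption: each unexcused absence removes 1.5 percentage points.
    weighted -= 1.5 * unexcused_absences
    return max(weighted, 0.0)


if __name__ == "__main__":
    print(final_grade([80, 90, 70, 100, 60, 50], midterm=80, proposal=90,
                      presentation=85, report=75, participation=100,
                      unexcused_absences=1))
```

Because the lowest homework grade is dropped, a single missed assignment does not lower the homework component in this sketch.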

Virtual Machine Setup
Linux Terminal and Git
PHC7065-Spring2020-Lecture1
By Hui Hu
Slides for Lecture 1, Spring 2020, PHC7065 Critical Skills in Data Manipulation for Population Science