PHC7065 CRITICAL SKILLS IN DATA MANIPULATION FOR POPULATION SCIENCE
Course Overview
Hui Hu Ph.D.
Department of Epidemiology
College of Public Health and Health Professions & College of Medicine
January 8, 2018
Syllabus
Virtual machine setup
Python crash course
Syllabus
Key Points on Syllabus
- Instructor Information and Office Hours
- Course Content
- Attendance
- Homework
- Course Project
- Proposal
- Final report
Instructor Information and Office Hours
Tell us
- your name
- what program you are in
- what programming languages you are familiar with
- what data engineering skills you want to learn from this course
Course Content
- Course Overview
- VM setup, Python crash course
- SQL
- Basic SQL
- Data models and relational SQL
- Many-to-many relationships in SQL
- NoSQL
- NoSQL databases
- Access web data
- Different Types of Data
- Spatial data
- Text data
- Image and time-series data
- Big data
Attendance
- Attendance is mandatory
- UF policy for excused absences applies (must notify instructor in writing before class when possible)
- Each unexcused absence results in a 1.5% deduction from the final grade
- More than 3 unexcused absences result in failure
Homework
- 6 homework assignments
- 10% each
- the highest 5 grades will count towards the final grade
- Often simple programming exercises
- Requirements:
- turn in each assignment no later than 11:59 pm on the day it is due
- late assignments will NOT be accepted
- no handwritten assignments
- DO NOT copy others' work
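The "highest 5 of 6" rule above can be sketched in Python as follows. This is only an illustration: the 0-100 score scale, the function name, and the example scores are assumptions, not part of the syllabus.

```python
def homework_contribution(scores, keep=5, weight_each=0.10):
    """Return the percentage points homework adds to the final grade.

    scores: homework scores, assumed to be on a 0-100 scale.
    keep: how many of the highest scores count (5 per the syllabus).
    weight_each: weight of each counted assignment (10% per the syllabus).
    """
    # Sort from highest to lowest and keep only the best `keep` scores.
    best = sorted(scores, reverse=True)[:keep]
    return sum(best) * weight_each


# Hypothetical scores for the 6 assignments; the lowest (60) is dropped.
example_scores = [90, 85, 60, 100, 95, 80]
print(homework_contribution(example_scores))
```

With these made-up scores, the five counted assignments contribute about 45 of the 50 available percentage points.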
Course Project
- Requirements:
- must include at least 1 non-traditional data source (i.e. spatial data, text data, image data, time-series data)
- use of semi-structured and unstructured data is encouraged
- use of web data accessed by API or scraping is encouraged
- Some examples (from last year):
- Side Effects and Adverse Reactions to Painkillers: Analysis with FDA Adverse Event Reporting System
- Utilizing nontraditional data sources for near real-time estimation of Zika virus case trends during the 2016 Florida USA Zika outbreak
- Twitter Mining for Cocaine Use
- Medical Marijuana Laws and Change of Number of Tweets towards Marijuana: A Time Series Analysis Using Data from Twitter
Course Project (continued)
- You can work individually or as a team
- If you choose to work as a team:
- each team can have up to 2 members
- clearly delineate roles and responsibilities of each team member
- Project due dates:
- Feb 5, 2018: form a project team
- March 12, 2018: midterm presentation and project proposal
- Apr 16, 2018: final presentation
- Apr 23, 2018: final project report
Midterm
- Project proposal:
- Abstract: up to 1 page
- Project description: up to 5 pages
~ Specific Aims/Objectives
~ Background and Significance
~ Approach/Research Design
~ Timeline
- Citations: no page limit, use the Vancouver style
- Single column, single spacing; Arial or Times New Roman font; font size no smaller than 11 point; table and figure labels may be in 10 point; margins of at least 0.5 inch
- Proposal presentations:
- up to 15 slides
- presentation of up to 15 minutes, followed by 5 minutes of Q&A
- send the slides to the instructor at least 3 days in advance
Final
- Final Report: up to ten pages (including references)
- Abstract: no more than 250 words summarizing the project
- Introduction: a short background and objective(s) of the study
- Methods: design, setting, dataset, approaches, and main outcome measurements
- Results: key findings
- Discussion: key conclusions with direct reference to the implications of the methods and/or results
- References: please follow the Vancouver style
- Final presentations:
- up to 15 slides
- presentation of up to 15 minutes, followed by 5 minutes of Q&A
- send the slides to the instructor at least 3 days in advance
- Note: analyses are required, but you should focus more on data access and data engineering.
Grading
- Attendance: 5%
- Homework: 50%
- Midterm (project proposal and presentation): 15%
- Final (project report and presentation): 30%
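As a rough illustration of how the four weights above combine, here is a minimal Python sketch. It assumes each component is scored on a 0-100 scale, and the names `WEIGHTS` and `final_grade` are hypothetical. It does not model the per-absence deduction or the failure rule from the Attendance slide.

```python
# Component weights from the Grading slide (fractions of the final grade).
WEIGHTS = {
    "attendance": 0.05,
    "homework": 0.50,
    "midterm": 0.15,
    "final": 0.30,
}


def final_grade(component_scores):
    """Combine component scores (each assumed to be on a 0-100 scale)."""
    missing = set(WEIGHTS) - set(component_scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    # Weighted sum of the four components.
    return sum(WEIGHTS[name] * component_scores[name] for name in WEIGHTS)


# Hypothetical student; the scores are illustrative only.
scores = {"attendance": 100, "homework": 90, "midterm": 80, "final": 85}
print(round(final_grade(scores), 2))
```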
Virtual Machine Setup
Python Crash Course
git clone https://github.com/benhhu/PHC7065SPR2018.git
PHC7065-Spring2018-Lecture1
By Hui Hu
Slides for Lecture 1, Spring 2018, PHC7065 Critical Skills in Data Manipulation for Population Science