PHC7065 CRITICAL SKILLS IN DATA MANIPULATION FOR POPULATION SCIENCE

Spatial Data

Hui Hu Ph.D.

Department of Epidemiology

College of Public Health and Health Professions & College of Medicine

March 19, 2018

Introduction to Spatial Data


Lab: Spatial Data

Introduction to Spatial Data


Data Models

A geographic data model is a structure for organizing geospatial data so that it can be easily stored and retrieved.

Geographic coordinates

Tabular attributes

Spatial Data Models

Vector Model

- points, lines, polygons

Raster Model
- exhaustive regular or irregular partitioning of space

Points

Lines
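As a rough illustration of the two models, the sketch below stores vector features as explicit coordinate sequences and a raster as a regular grid of cells. All coordinates, the grid origin, the cell size, and the function names are made up for this example.

```python
# Vector model: each feature is a geometry described by explicit coordinates.
point = (-73.99, 40.73)                      # a single (x, y) pair
line = [(-74.00, 40.70), (-73.98, 40.75)]    # ordered sequence of vertices
polygon = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (0.0, 0.0)]  # closed ring


def ring_area(ring):
    """Planar area of a closed ring, computed with the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Raster model: space is partitioned into regular cells, one value per cell.
raster = [
    [1, 1, 0],
    [1, 0, 0],
]
origin_x, origin_y = 0.0, 2.0   # upper-left corner of the grid (assumed)
cell_size = 1.0                 # width and height of each cell (assumed)


def cell_value(x, y):
    """Look up the raster value of the cell that contains the point (x, y)."""
    col = int((x - origin_x) // cell_size)
    row = int((origin_y - y) // cell_size)
    return raster[row][col]


print(ring_area(polygon))    # area of the 4 x 3 rectangle
print(cell_value(0.5, 1.5))  # value stored in the upper-left cell
```

In the vector model the geometry carries the location; in the raster model the location is implied by the cell's row and column.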

Shapefiles

.shp - the file that stores the geometry of the feature

.shx - the file that stores the index of the feature geometry

.dbf - the dBASE file that stores the attribute information

.prj  - the file that defines the shapefile's projection

.html, .htm, .xml - the files that usually contain metadata

.sbn and .sbx - store additional indices 
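Because a shapefile is really a set of files sharing one base name, it can help to check which pieces are present before loading it. The sketch below is one way to do that with the standard library; treating .shp, .shx, and .dbf as mandatory is an assumption of this example, and the function name and file name are hypothetical.

```python
from pathlib import Path

# Assumption: these three components are treated as mandatory here.
REQUIRED = (".shp", ".shx", ".dbf")
# Components that are useful but not always shipped.
OPTIONAL = (".prj", ".sbn", ".sbx")


def check_shapefile(shp_path):
    """Report which sidecar files exist next to the given .shp path.

    Returns (present, missing_required): a dict mapping each extension to
    True/False, and a list of mandatory extensions that were not found.
    """
    base = Path(shp_path)
    present = {ext: base.with_suffix(ext).exists() for ext in REQUIRED + OPTIONAL}
    missing_required = [ext for ext in REQUIRED if not present[ext]]
    return present, missing_required


if __name__ == "__main__":
    # Hypothetical file name, used only to show the call.
    present, missing = check_shapefile("nyc_streets.shp")
    print(present)
    print("missing required parts:", missing)
```

A missing .prj does not stop the geometry from being read, but the coordinates then have no declared projection.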

Coordinate Systems and Projections

3D sphere

Geographic Coordinate System

2D flat

Projected Coordinate System

Geographic Coordinate Systems

  • Longitude and latitude
  • Units: Degrees (DMS or DD)
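The two degree notations are related by simple arithmetic: one degree is 60 minutes and one minute is 60 seconds. A minimal sketch of the DMS-to-DD conversion follows; the function name and the convention of passing the hemisphere as a letter are choices made for this example.

```python
def dms_to_dd(degrees, minutes, seconds, hemisphere):
    """Convert degrees-minutes-seconds (DMS) to decimal degrees (DD).

    hemisphere is one of "N", "S", "E", "W"; south and west are negative.
    """
    dd = abs(degrees) + minutes / 60.0 + seconds / 3600.0
    return -dd if hemisphere.upper() in ("S", "W") else dd


print(dms_to_dd(40, 45, 0, "N"))   # 40.75
print(dms_to_dd(73, 59, 24, "W"))  # approximately -73.99
```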

Shape of the Earth

  • Surface: The Earth's real surface
  • Ellipsoid: Ideal, smooth surface
  • Geoid: Bumpy surface on which the gravity potential is the same everywhere (approximately mean sea level)

Datum

  • Defines the position of the spheroid relative to the center of the earth.
     
  • Global datum:
    - uses the earth's center of mass as the origin
     
  • Local datum:
    - aligns its spheroid to closely fit the earth's surface in a particular area
    - a point on the surface of the spheroid is matched to a particular position on the surface of the earth
    - the coordinate system origin of a local datum is not at the center of the earth

Datum

Common Local Datum: North American Datum (NAD)

Common Global Datum: World Geodetic System (WGS)
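Each datum is tied to a reference ellipsoid. As a small worked example, the sketch below compares the ellipsoid used by NAD83 (GRS 80) with the one used by WGS84 (WGS 84). The semi-major axis and inverse flattening are the published defining values; the dictionary layout and function name are choices made for this example.

```python
# (semi-major axis in metres, inverse flattening)
ELLIPSOIDS = {
    "WGS 84": (6378137.0, 298.257223563),   # ellipsoid of the WGS84 datum
    "GRS 80": (6378137.0, 298.257222101),   # ellipsoid of the NAD83 datum
}


def semi_minor_axis(a, inv_f):
    """Semi-minor (polar) axis b = a * (1 - f), where f = 1 / inv_f."""
    return a * (1.0 - 1.0 / inv_f)


b_wgs84 = semi_minor_axis(*ELLIPSOIDS["WGS 84"])
b_grs80 = semi_minor_axis(*ELLIPSOIDS["GRS 80"])

print(f"WGS 84 polar radius: {b_wgs84:.4f} m")
print(f"GRS 80 polar radius: {b_grs80:.4f} m")
print(f"difference: {(b_wgs84 - b_grs80) * 1000:.3f} mm")
```

The two ellipsoids differ by only a fraction of a millimetre, so differences between NAD83 and WGS84 coordinates come mainly from how each datum is positioned relative to the earth, not from the ellipsoid shape.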

Projected Coordinate Systems

  • A projected coordinate system is defined on a flat, two-dimensional surface
     
  • Unlike a geographic coordinate system, a projected coordinate system has constant lengths, angles, and areas across the two dimensions
     
  • A projected coordinate system is always based on a geographic coordinate system

Map projection: the systematic rendering of the graticule (the grid of latitude and longitude lines) on a flat map surface

Distortion

Converting a sphere to a flat surface results in distortion

  • Shape (conformal) - If a map preserves shape, then feature outlines (like county boundaries) look the same on the map as they do on the earth.
    - Lambert Conformal Conic
    - UTM
  • Area (equal-area) - If a map preserves area, then every feature on the map has the same size relative to its size on the earth.
    - Albers Equal Area Conic
  • Distance (equidistant) - An equidistant map is one that preserves true scale for all straight lines passing through a single, specified point. If the distance from a to b on the map, after accounting for the map scale, equals the distance on the earth, then that line has true scale. No map has true scale everywhere.
  • Direction/Azimuth (azimuthal) - An azimuthal projection is one that preserves direction for all straight lines passing through a single, specified point.
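To make the trade-off concrete, the sketch below uses the spherical Mercator projection. It is chosen here only as a familiar conformal example and is not one of the projections listed above. On a sphere, Mercator stretches lengths by 1 / cos(latitude) in every direction, which keeps local shapes intact but inflates areas by the square of that factor. The function names are made up for this example.

```python
import math


def mercator_scale_factor(lat_deg):
    """Local length scale factor of the spherical Mercator projection."""
    return 1.0 / math.cos(math.radians(lat_deg))


def mercator_area_factor(lat_deg):
    """Local area inflation: the square of the length scale factor."""
    return mercator_scale_factor(lat_deg) ** 2


for lat in (0, 30, 45, 60):
    k = mercator_scale_factor(lat)
    print(f"lat {lat:2d}: lengths x {k:.3f}, areas x {k * k:.3f}")
```

At 60 degrees latitude lengths are roughly doubled and areas roughly quadrupled, which is why a conformal map is a poor choice when areas need to be compared.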

Universal Transverse Mercator Coordinate System

  • World divided into 60 six-degree-wide zones
  • From 80S to 84N
  • Zones numbered 1-60 (N&S), W to E, starting at 180W
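The zone numbering above reduces to a short calculation. The sketch below is a simplified version that ignores the special zone boundaries around Norway and Svalbard. The second function relies on the EPSG numbering convention for WGS84-based UTM zones (32600 + zone in the north, 32700 + zone in the south); both function names are made up for this example.

```python
import math


def utm_zone(lon_deg):
    """UTM zone number (1-60) for a longitude in decimal degrees.

    Simplified: ignores the irregular zones near Norway and Svalbard.
    """
    lon = ((lon_deg + 180.0) % 360.0) - 180.0   # normalise to [-180, 180)
    return int(math.floor((lon + 180.0) / 6.0)) + 1


def wgs84_utm_epsg(lon_deg, lat_deg):
    """EPSG code of the WGS84 / UTM zone that contains the given point."""
    if not -80.0 <= lat_deg <= 84.0:
        raise ValueError("UTM is defined only between 80S and 84N")
    zone = utm_zone(lon_deg)
    return (32600 if lat_deg >= 0 else 32700) + zone


print(utm_zone(-73.99))               # 18 (New York City)
print(wgs84_utm_epsg(-73.99, 40.73))  # 32618
```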

 

Differences between Projections

SRID

  • Spatial Reference IDentifier
    -  It defines all the parameters of our data’s geographic coordinate system and projection.
    -  An SRID is convenient because it packs all the information about a map projection (which can be quite complex) into a single number.
     
  • Example: EPSG:26918 (NAD83 / UTM zone 18N): http://spatialreference.org/ref/epsg/26918/
     
  • What if you do not know the SRID?
    -  upload the .prj file and get the SRID here: http://prj2epsg.org/search
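A .prj file is plain text in well-known text (WKT) form, so its main names can also be read directly before looking up the SRID. The sketch below pulls out the coordinate system, datum, and projection names with a regular expression. The sample string is an abbreviated, illustrative .prj in the Esri style, not the content of any particular file, and the function name is made up for this example.

```python
import re

# Abbreviated, illustrative .prj content (Esri-style WKT); an assumption
# for this example, not copied from a real file.
SAMPLE_PRJ = (
    'PROJCS["NAD_1983_UTM_Zone_18N",'
    'GEOGCS["GCS_North_American_1983",'
    'DATUM["D_North_American_1983",'
    'SPHEROID["GRS_1980",6378137.0,298.257222101]],'
    'PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]],'
    'PROJECTION["Transverse_Mercator"],'
    'PARAMETER["Central_Meridian",-75.0],UNIT["Meter",1.0]]'
)


def prj_summary(wkt):
    """Return the names attached to the main WKT keywords found in wkt."""
    pairs = re.findall(r'(PROJCS|GEOGCS|DATUM|PROJECTION)\["([^"]+)"', wkt)
    return {keyword: name for keyword, name in pairs}


for keyword, name in prj_summary(SAMPLE_PRJ).items():
    print(f"{keyword}: {name}")
```

Here the names point to NAD83 and UTM zone 18N, which is the system that EPSG:26918 identifies. A name match like this is only a hint; the lookup service remains the safer way to confirm the SRID.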

Lab: Spatial Data

Data

nyc_census_blocks

blkid    A 15-digit code that uniquely identifies every census block. Eg: 360050001009000
popn_total    Total number of people in the census block
popn_white    Number of people self-identifying as “White” in the block
popn_black    Number of people self-identifying as “Black” in the block
popn_nativ    Number of people self-identifying as “Native American” in the block
popn_asian    Number of people self-identifying as “Asian” in the block
popn_other    Number of people self-identifying with other categories in the block
boroname    Name of the New York borough. Manhattan, The Bronx, Brooklyn, Staten Island, Queens
geom    Polygon boundary of the block

Number of records: 36592
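The 15-digit blkid follows the standard census block identifier layout: 2 digits for the state, 3 for the county, 6 for the tract, and 4 for the block. A minimal sketch for splitting it, using the example value above (the function name is made up for this example):

```python
def split_block_id(blkid):
    """Split a 15-digit census block ID into its component codes."""
    if len(blkid) != 15 or not blkid.isdigit():
        raise ValueError("expected a 15-digit census block ID")
    return {
        "state": blkid[0:2],     # state FIPS code
        "county": blkid[2:5],    # county FIPS code within the state
        "tract": blkid[5:11],    # census tract within the county
        "block": blkid[11:15],   # block within the tract
    }


parts = split_block_id("360050001009000")
print(parts)
```

For the example ID, state 36 is New York and county 005 is Bronx County. Keeping the ID as text rather than a number preserves leading zeros in the component codes.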

Data

nyc_neighborhoods

name    Name of the neighborhood
boroname    Name of the New York borough. Manhattan, The Bronx, Brooklyn, Staten Island, Queens
geom    Polygon boundary of the neighborhood

Number of records: 129
 

New York has a rich history of neighborhood names and extents

Data

nyc_streets

name    Name of the street
oneway    Is the street one-way? “yes” = yes, “” = no
type    Road type (primary, secondary, residential, motorway)
geom    Linear centerline of the street

Number of records: 19091

Data

nyc_subway_stations

name    Name of the station
borough    Name of the New York borough. Manhattan, The Bronx, Brooklyn, Staten Island, Queens
routes    Subway lines that run through this station
transfers    Lines you can transfer to via this station
express    Stations where express trains stop, “express” = yes, “” = no
geom    Point location of the station

Number of records: 491

git pull

Slides for Lecture 8, Spring 2018, PHC7065 Critical Skills in Data Manipulation for Population Science