PHC7065 CRITICAL SKILLS IN DATA MANIPULATION FOR POPULATION SCIENCE

Spatial Data

Hui Hu Ph.D.

Department of Epidemiology

College of Public Health and Health Professions & College of Medicine

March 30, 2020

Introduction to Spatial Data


Lab: Spatial Data

Introduction to Spatial Data


Data Models

A geographic data model is a structure for organizing geospatial data so that it can be easily stored and retrieved.

Geographic coordinates

Tabular attributes

Spatial Data Models

Vector Model

- points, lines, polygons

Raster Model
- exhaustive regular or irregular partitioning of space
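To make the vector/raster distinction concrete, here is a minimal Python sketch (standard library only). It stores a triangle as a vector polygon, then represents the same shape as a raster by partitioning space into a regular grid. The function names, the ray-casting test, and the "cell center inside the polygon" rule are illustrative choices, not something the lecture prescribes.

```python
# Minimal sketch contrasting the vector and raster models.
# The polygon, grid origin, and cell size are made-up illustration values.

def point_in_polygon(x, y, polygon):
    """Ray casting: count edges crossed by a ray going right from (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(polygon, x_min, y_max, cell_size, n_rows, n_cols):
    """Regular grid of cells; a cell is 1 if its center falls inside the polygon."""
    grid = []
    for row in range(n_rows):
        y_center = y_max - (row + 0.5) * cell_size   # rows run from top (north) to bottom
        line = []
        for col in range(n_cols):
            x_center = x_min + (col + 0.5) * cell_size
            line.append(1 if point_in_polygon(x_center, y_center, polygon) else 0)
        grid.append(line)
    return grid


# Vector model: a triangle stored as an ordered list of vertices.
triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]

# Raster model: the same triangle on a 4 x 4 grid of 1-unit cells covering [0, 4] x [0, 4].
raster = rasterize(triangle, x_min=0.0, y_max=4.0, cell_size=1.0, n_rows=4, n_cols=4)
for line in raster:
    print(line)
```

The vector form keeps exact vertex coordinates; the raster form only records which cells are covered, so its fidelity depends on the cell size.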

Points

Lines

Common Formats

  • Well-known binary (WKB) and well-known text (WKT)
    -  the most common formats for spatial objects
     
  • Keyhole Markup Language (KML)
    -  an XML-based format, used by Google
    -  SRS is always SRID 4326
     
  • Geography Markup Language (GML)
    -  an XML-based format used in Web Feature Service
     
  • Geographic JavaScript Object Notation (GeoJSON)
    -  a format based on JSON
     
  • Scalable Vector Graphics (SVG)
    -  popular among high-end rendering or drawing tools
     
  • Extensible 3D Graphics (X3D)
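As a small illustration of how two of these formats relate, the sketch below converts the simplest WKT geometry, a `POINT`, into a GeoJSON dictionary using only the standard library. The helper name `wkt_point_to_geojson` is hypothetical, and it deliberately handles only `POINT (x y)`; real WKT also covers lines, polygons, multi-geometries, and Z/M values, which a dedicated library would handle.

```python
import json
import re

# Hypothetical helper: handles only the simplest WKT case, "POINT (x y)".
_POINT_RE = re.compile(r"^\s*POINT\s*\(\s*(\S+)\s+(\S+)\s*\)\s*$", re.IGNORECASE)


def wkt_point_to_geojson(wkt):
    match = _POINT_RE.match(wkt)
    if match is None:
        raise ValueError(f"not a simple WKT POINT: {wkt!r}")
    x, y = float(match.group(1)), float(match.group(2))
    # GeoJSON (RFC 7946) orders positions as [longitude, latitude];
    # here the WKT x value is assumed to be longitude.
    return {"type": "Point", "coordinates": [x, y]}


wkt = "POINT (-82.3248 29.6516)"   # an example lon/lat pair
print(json.dumps(wkt_point_to_geojson(wkt)))
```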

Shapefiles

.shp - the file that stores the geometry of the feature

.shx - the file that stores the index of the feature geometry

.dbf - the dBASE file that stores the attribute information

.prj  - the file that defines the shapefile's projection

.html, .htm, .xml - the files that usually contain metadata

.sbn and .sbx - store additional spatial indices 
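Because a shapefile is really a set of files that share one base name, a common first check before loading one is whether its sidecar files are all present. Of the components listed above, `.shp`, `.shx`, and `.dbf` are the required ones. The sketch below assumes lowercase extensions, and the function name `shapefile_components` is hypothetical.

```python
import tempfile
from pathlib import Path

# The three components a shapefile cannot work without; the rest are optional.
REQUIRED = (".shp", ".shx", ".dbf")
OPTIONAL = (".prj", ".sbn", ".sbx", ".xml")


def shapefile_components(shp_path):
    """Report which sidecar files exist next to a .shp file (same base name)."""
    shp = Path(shp_path)
    # with_suffix swaps only the final extension, so "roads.v2.shp" -> "roads.v2.shx"
    present = {ext: shp.with_suffix(ext).exists() for ext in REQUIRED + OPTIONAL}
    missing = [ext for ext in REQUIRED if not present[ext]]
    return present, missing


# Usage: build a fake shapefile in a temporary folder, deliberately leaving out .shx.
with tempfile.TemporaryDirectory() as tmp:
    for ext in (".shp", ".dbf", ".prj"):
        (Path(tmp) / f"counties{ext}").touch()
    present, missing = shapefile_components(Path(tmp) / "counties.shp")

print(missing)   # ['.shx']
```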

Coordinate Systems and Projections

3D sphere

Geographic Coordinate System

2D flat

Projected Coordinate System

Geographic Coordinate Systems

  • Longitude and latitude
  • Units: Degrees (DMS or DD)
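Converting between the two degree notations is simple arithmetic: one degree is 60 minutes and one minute is 60 seconds, and southern latitudes and western longitudes are negative in decimal degrees. A minimal sketch follows. The function name is hypothetical, and the example coordinate (roughly Gainesville, Florida) is only for illustration.

```python
def dms_to_dd(degrees, minutes, seconds, hemisphere):
    """Convert degrees-minutes-seconds to decimal degrees.

    hemisphere is 'N', 'S', 'E', or 'W'; south and west become negative.
    """
    if not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError("minutes and seconds must be in [0, 60)")
    dd = abs(degrees) + minutes / 60 + seconds / 3600
    return -dd if hemisphere.upper() in ("S", "W") else dd


lat = dms_to_dd(29, 39, 5.76, "N")
lon = dms_to_dd(82, 19, 29.28, "W")
print(round(lat, 4), round(lon, 4))   # 29.6516 -82.3248
```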

Shape of the Earth

  • Surface: The Earth's real surface
  • Ellipsoid: Ideal, smooth surface
  • Geoid: Bumpy surface along which the gravitational potential is equal at all locations

Shape of the earth (cont'd)

  • Gauss determined in the early 19th century that the surface of the earth can be defined using gravitational measurements
    -  geoid: the surface where the gravitational potential is equal at all locations
     
  • Geoid is far from spherical
    -  the core of the earth is not homogeneous
    -  mass is distributed unevenly
     
  • Geoid is the foundation of both planar and geodetic models

Ellipsoid

  • Simplifications of the geoid that are generally good enough for most geographic modeling needs
     
  • An ellipsoid is a 3D ellipse: an ellipse rotated around its polar (minor) axis
     
  • Instead of one ellipsoid to rule us all, people on different continents wanted their own ellipsoids to better reflect the regional curvature of the earth
     
  • Today, the world is settling on the World Geodetic System (WGS 84) and Geodetic Reference System (GRS 80) ellipsoids
    - WGS 84 is the standard of choice, and it is the reference system GPS is based on
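An ellipsoid is defined by two numbers, commonly the semi-major (equatorial) axis a and the inverse flattening 1/f. Everything else follows, for example the semi-minor (polar) axis b = a(1 - f) and the squared eccentricity e² = f(2 - f). The sketch below applies these formulas to the two ellipsoids named above. It shows why WGS 84 and GRS 80 are interchangeable for most work: they share a, and their polar axes differ by only about a tenth of a millimeter.

```python
# Defining parameters: semi-major axis a (meters) and inverse flattening 1/f.
ELLIPSOIDS = {
    "WGS 84": (6378137.0, 298.257223563),
    "GRS 80": (6378137.0, 298.257222101),
}


def derived_parameters(a, inv_f):
    f = 1.0 / inv_f
    b = a * (1.0 - f)        # semi-minor (polar) axis
    e2 = f * (2.0 - f)       # first eccentricity squared
    return b, e2


results = {name: derived_parameters(a, inv_f) for name, (a, inv_f) in ELLIPSOIDS.items()}
for name, (b, e2) in results.items():
    print(f"{name}: b = {b:.4f} m, e^2 = {e2:.12f}")

diff_mm = (results["WGS 84"][0] - results["GRS 80"][0]) * 1000
print(f"polar axis difference: {diff_mm:.3f} mm")
```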

Common ellipsoids and their ellipsoidal parameters

  • Lon/lat coordinates based on different ellipsoids are not the same
    -  they use different grounding points
    -  it's important not to just call things lon/lat: you can have NAD27 lon/lat, NAD83 lon/lat, etc., and each will be subtly different

Datum

  • Ellipsoid only models the overall shape of the earth
    -  after picking out an ellipsoid, you need to anchor it to use it for real-world navigation
    -  even if two reference systems use the same ellipsoid, they can still have different anchors, or datums, on earth
  • Defines the position of the spheroid relative to the center of the earth.
     
  • Global datum:
    - uses the earth's center of mass as the origin
     
  • Local datum:
    - aligns its spheroid to closely fit the earth's surface in a particular area
    - a point on the surface of the spheroid is matched to a particular position on the surface of the earth
    - the coordinate system origin of a local datum is not at the center of the earth
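One way to see that "lon/lat" means nothing without an ellipsoid is to convert the same latitude and longitude into earth-centered X, Y, Z coordinates on two different ellipsoids. The sketch below uses the standard geodetic-to-ECEF (earth-centered, earth-fixed) formulas. It places both ellipsoids at the same origin and applies no datum shift. In reality NAD27 (Clarke 1866) is a local datum whose origin is offset from the earth's center, so the separation printed here reflects ellipsoid shape only, not the true NAD27 to WGS 84 difference.

```python
import math

# Semi-major axis a and semi-minor axis b, in meters.
WGS84 = (6378137.0, 6356752.314245)
CLARKE_1866 = (6378206.4, 6356583.8)


def geodetic_to_ecef(lat_deg, lon_deg, h, ellipsoid):
    """Convert latitude/longitude/height on an ellipsoid to earth-centered X, Y, Z (meters).

    Assumes the ellipsoid is centered at the origin (no datum shift applied).
    """
    a, b = ellipsoid
    e2 = 1.0 - (b * b) / (a * a)                          # first eccentricity squared
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    n = a / math.sqrt(1.0 - e2 * math.sin(lat) ** 2)      # prime vertical radius of curvature
    x = (n + h) * math.cos(lat) * math.cos(lon)
    y = (n + h) * math.cos(lat) * math.sin(lon)
    z = (n * (1.0 - e2) + h) * math.sin(lat)
    return x, y, z


# The same numbers, interpreted on two different ellipsoids, are different places.
p1 = geodetic_to_ecef(29.6516, -82.3248, 0.0, WGS84)
p2 = geodetic_to_ecef(29.6516, -82.3248, 0.0, CLARKE_1866)
print(f"separation: {math.dist(p1, p2):.1f} m")
```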

Coordinate Reference System

  • A coordinate reference system is only one necessary ingredient that goes into the making of an SRS; it isn't an SRS itself
    -  used to identify a point on your reference ellipsoid
     
  • The most popular coordinate reference system is the geographic coordinate system
    -  also known as the geodetic coordinate system, or simply lon/lat
  • Longitude and latitude
  • Units: Degrees (DMS or DD)

Projection

Taking an ellipsoidal earth and squashing it onto a flat surface

  • Projection has distortion built in
    - because geodetic models and 3D globes are ellipsoidal, they cannot be laid on a flat surface without stretching or compressing part of it
     
  • Why do we need to have 2D projections?
    -  the mathematical and visual simplicity that comes with planar (Euclidean) geometry
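The appeal of planar geometry is that distance is just Pythagoras. The catch is that Pythagoras applied directly to raw lon/lat degrees is not a projection, and it breaks down away from the equator because meridians converge toward the poles. The sketch compares that naive calculation with the great-circle (haversine) distance on a spherical earth. The mean-radius value and both function names are assumptions made for illustration.

```python
import math

EARTH_RADIUS_M = 6371008.8   # mean earth radius; a spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance on a sphere, in meters."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def naive_planar_m(lat1, lon1, lat2, lon2):
    """Pythagoras on raw degrees, converted at a fixed ~111.2 km per degree.

    Ignores meridian convergence, so east-west distances are overstated at high latitudes.
    """
    m_per_deg = math.pi * EARTH_RADIUS_M / 180
    return math.hypot(lat2 - lat1, lon2 - lon1) * m_per_deg


# One degree of longitude along the equator vs. along the 60th parallel.
for lat in (0, 60):
    print(lat, round(haversine_m(lat, 0, lat, 1)), round(naive_planar_m(lat, 0, lat, 1)))
```

At 60°N the naive figure is roughly double the true distance. A suitable projected SRS avoids this problem while keeping the simple planar math.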

Distortion

  • How exactly you squash an ellipsoidal earth on a flat surface depends on what you are trying to optimize for
     
  • In creating a projection, we try to balance four conflicting features:
    -  measurement: how accurately does it preserve distance and area
    -  shape: how accurately does it represent angles
    -  direction: is north really north
    -  range of area supported
     
  • E.g., if you want to span a large area, you have to either give up measurement accuracy or deal with the pain of maintaining multiple SRSs and a mechanism to shift data among them
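The Mercator projection is a concrete case of this trade-off. It preserves shape and direction (it is conformal) by stretching lengths equally in every direction by a scale factor of sec(latitude) on a spherical earth. Areas are therefore inflated by sec²(latitude). The short sketch below tabulates both factors. The function name is hypothetical.

```python
import math


def mercator_scale_factor(lat_deg):
    """Linear scale factor of the spherical Mercator projection at a given latitude."""
    return 1.0 / math.cos(math.radians(lat_deg))


for lat in (0, 30, 45, 60, 80):
    k = mercator_scale_factor(lat)
    print(f"lat {lat:2d}: lengths x{k:.2f}, areas x{k * k:.2f}")
```

At 60° every length on the map is drawn twice as long as at the equator, so areas appear four times too large. This is the measurement accuracy that Mercator gives up.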

Projection Types

Cylindrical projections 

Conic projections

Azimuthal projections

Orientation of the paper roll around the globe

Main classes of planar coordinate systems

  • Lambert Azimuthal Equal Area (LAEA)
    -  good for measurement and can cover large areas, but not great for shape
    -  US National Atlas (EPSG:2163)
     
  • Lambert Conformal Conic (LCC)
    -  preserves shape more than area, is good for measurement in the regions it serves, and distorts the poles
    -  best used for middle latitudes with east-west orientation
     
  • Universal Transverse Mercator (UTM)
    -  good for measurement, shape, and direction, but each zone spans only a six-degree longitudinal strip, and it cannot be used for the polar regions
     
  • Mercator
    -  good for preserving shape and direction, and for spanning the globe, but not good for measurement
    -  a common favorite for web map display, since only one SRID needs to be maintained
     
  • National grid systems
    -  variants of UTM or LAEA, but used to define a restricted region, such as a country
     
  • State plane
    -  US spatial reference systems, usually designed for a specific state
    -  most are based on the Transverse Mercator or Lambert Conformal Conic projection
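For the web-mapping case, the sketch below implements the spherical Mercator formulas commonly used for EPSG:3857 ("WGS 84 / Pseudo-Mercator"). They are x = R·λ and y = R·ln(tan(π/4 + φ/2)), with R set to the WGS 84 semi-major axis. The formulas are spherical even though the input coordinates are WGS 84, which is why the projection is called "pseudo" Mercator. The clamp latitude of about 85.05° is where the map becomes square. The function names are hypothetical.

```python
import math

R = 6378137.0               # WGS 84 semi-major axis, used as a sphere radius
MAX_LAT = 85.0511287798     # latitude where y reaches the x value of lon 180 (square world map)


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Forward spherical Mercator projection, as used for EPSG:3857 web maps."""
    lat_deg = max(-MAX_LAT, min(MAX_LAT, lat_deg))   # the poles project to infinity, so clamp
    x = R * math.radians(lon_deg)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


def web_mercator_to_lonlat(x, y):
    """Inverse projection back to degrees."""
    lon = math.degrees(x / R)
    lat = math.degrees(2 * math.atan(math.exp(y / R)) - math.pi / 2)
    return lon, lat


x, y = lonlat_to_web_mercator(-82.3248, 29.6516)
print(round(x, 1), round(y, 1))
print(web_mercator_to_lonlat(x, y))   # round-trips back to the input (within float error)
```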

Universal Transverse Mercator Coordinate System

  • World divided into 60 six-degree-wide zones
  • Covers 80°S to 84°N
  • Zones numbered 1-60 from west to east, starting at 180°W; each zone has a northern (N) and a southern (S) part
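The zone rules above translate directly into arithmetic: zone = floor((longitude + 180) / 6) + 1, and the zone's central meridian is zone × 6 − 183. The sketch also builds the EPSG code of the matching WGS 84 UTM system (326zz for north, 327zz for south). The function name is hypothetical, and the sketch ignores the special zone exceptions around Norway and Svalbard.

```python
import math


def utm_zone(lon_deg, lat_deg):
    """Return (zone, hemisphere, central meridian, WGS 84 UTM EPSG code) for a lon/lat.

    Ignores the special zone exceptions around Norway and Svalbard.
    """
    if not -80.0 <= lat_deg <= 84.0:
        raise ValueError("UTM is defined only from 80S to 84N; use UPS near the poles")
    zone = int(math.floor((lon_deg + 180.0) / 6.0)) + 1
    zone = min(max(zone, 1), 60)                 # longitude 180E belongs to zone 60
    central_meridian = zone * 6 - 183
    hemisphere = "N" if lat_deg >= 0 else "S"
    epsg = (32600 if hemisphere == "N" else 32700) + zone
    return zone, hemisphere, central_meridian, epsg


print(utm_zone(-82.3248, 29.6516))   # (17, 'N', -81, 32617)
```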

 

Differences between projections

Spatial Reference System

  • SRS is the product of geodesy and cartography
    -  geodesy: the science of measuring and modeling the earth
    -  cartography: the science of representing the earth on flat maps
     
  • Why do we need SRS?
    -  to bring in data from multiple sources and be able to overlay one atop another
     
  • Many standards of SRS:
    -  most common one is the European Petroleum Survey Group (EPSG) numbering system
    -  take any two sources of data with the same EPSG number, and they will line up when overlaid

SRID

  • Spatial Reference IDentifier
    -  It defines all the parameters of our data’s geographic coordinate system and projection.
    -  An SRID is convenient because it packs all the information about a map projection (which can be quite complex) into a single number.
     
  • http://spatialreference.org/ref/epsg/4326/
     
  • EPSG is a relatively recent SRS numbering system
    -  if you are using data from a few decades ago, you may not find an EPSG number
     

  • The constituent pieces that form an SRS:
    -  ellipsoid
    -  datum
    -  projection
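To show how a single SRID stands for a bundle of ellipsoid, datum, and projection, here is a tiny hand-made lookup table covering a few codes. The table and function are hypothetical illustrations. Real systems, such as the `spatial_ref_sys` table in PostGIS, store the full definitions of thousands of codes. The entries list only the components named on this slide.

```python
# Hypothetical, hand-made table: SRID -> (ellipsoid, datum, projection).
SRS_COMPONENTS = {
    4326:  ("WGS 84",      "WGS 84",                    None),
    4269:  ("GRS 1980",    "North American Datum 1983", None),
    4267:  ("Clarke 1866", "North American Datum 1927", None),
    3857:  ("WGS 84",      "WGS 84",                    "Popular Visualisation Pseudo-Mercator"),
    32617: ("WGS 84",      "WGS 84",                    "UTM zone 17N (Transverse Mercator)"),
}


def describe_srid(srid):
    """Unpack one SRID into its constituent pieces (raises KeyError if unknown)."""
    ellipsoid, datum, projection = SRS_COMPONENTS[srid]
    proj = projection or "none (geographic lon/lat in degrees)"
    return f"EPSG:{srid}: ellipsoid={ellipsoid}; datum={datum}; projection={proj}"


for srid in (4326, 4267, 32617):
    print(describe_srid(srid))
```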

What spatial reference system is appropriate?

  • Excellent: covers the globe
  • Good: covers a large country like the US; the measurements for the area served are usually within a meter for length, area, and distance calculations
  • Medium: covers several degrees or a large state; measurements are accurate within meters, but can be as much as 10 meters off
  • Bad: measurements don't have useful units

Lab: Spatial Data

git pull

PHC7065-Spring2020-Lecture7


Slides for Lecture 7, Spring 2020, PHC7065 Critical Skills in Data Manipulation for Population Science
