PHC6194 SPATIAL EPIDEMIOLOGY
Spatial Data Engineering and Linkage
Hui Hu Ph.D.
Department of Epidemiology
College of Public Health and Health Professions & College of Medicine
January 29, 2020
Spatial Data Engineering and Linkage
Lab: PostGIS Part 2
Spatial Data Engineering and Linkage
Spatial Data Engineering
 Geometry and geography functions
 Geometry relationships
 Proximity analysis
 Geometry and geography processing
Geometry and Geography Functions
 Output functions
 output data in various standard formats
 Constructor functions
 create PostGIS objects from well-known formats
 Accessor and setter functions
 work against a single spatial object and return or set attributes of the object
 Measurement functions
 return scalar measurements of a spatial object
 Decomposition functions
 extract other spatial objects from an input object
Common Formats and Output Functions
 Well-known binary (WKB) and well-known text (WKT)
 the most common formats for spatial objects
 WKT: ST_AsText and ST_AsEWKT
 WKB: ST_AsBinary and ST_AsEWKB
 Keyhole Markup Language (KML)
 an XML-based format, used by Google
 SRS is always SRID 4326
 ST_AsKML

Geometry JavaScript Object Notation (GeoJSON)
 a format based on JSON
 ST_AsGeoJSON
 Geography Markup Language (GML)
 an XML-based format used in Web Feature Service
 ST_AsGML
 Scalable Vector Graphics (SVG)
 popular among high-end rendering or drawing tools
 ST_AsSVG
 Extensible 3D Graphics (X3D)
 ST_AsX3D
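To make the difference between output formats concrete, here is a minimal Python sketch, independent of PostGIS, that writes one point as WKT and as GeoJSON. The helper names and the coordinates are illustrative assumptions; inside the database, ST_AsText and ST_AsGeoJSON do this work.

```python
import json


def point_to_wkt(lon: float, lat: float) -> str:
    """Format a 2D point as well-known text (x first, then y)."""
    return f"POINT({lon!r} {lat!r})"


def point_to_geojson(lon: float, lat: float) -> str:
    """Format the same point as a GeoJSON geometry object."""
    return json.dumps({"type": "Point", "coordinates": [lon, lat]})


if __name__ == "__main__":
    # Hypothetical lon/lat pair, used only to show the two layouts.
    print(point_to_wkt(-82.35, 29.64))
    print(point_to_geojson(-82.35, 29.64))
```

Both strings carry the same coordinates; only the envelope around them changes.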
Shapefiles
.shp  the file that stores the geometry of the feature
.shx  the file that stores the index of the feature geometry
.dbf  the dBASE file that stores the attribute information
.prj  the file that defines the shapefile's projection
.html, .htm, .xml  the files that usually contain metadata
.sbn and .sbx  store additional indices
Constructor Functions
 Create geometries from text and binary formats
 ST_GeomFromText, ST_GeomFromWKB, ST_GeomFromEWKB, ST_GeomFromGML, ST_GeomFromGeoJSON, ST_GeomFromKML
 Create geographies from text and binary formats
 ST_GeogFromText, ST_GeogFromWKB; for KML, GML, or GeoJSON input, build a geometry with the matching ST_GeomFrom function and cast it to geography
Accessor and Setter Functions
 Functions that can access or set the characteristics of an object
 A few characteristics of spatial objects:
 spatial reference identifiers: SRID
 subtype: the finer categorization of geometry and geography types, such as points, polygons, etc.
 coordinate dimension: 2, 3, or 4
 geometric dimension: minimal dimension of the vector space necessary to fully contain the geometry: 0 (points), 1 (linestrings), or 2 (polygons)
Accessor and Setter Functions
SRID and Transformation for Geometry

ST_SRID and ST_SetSRID
 retrieves and sets the SRID

ST_Transform
 transform geometry to different spatial references
 e.g. take a geometry in lon/lat and transform it to a planar SRS so that you can take meaningful measurements

Differences between ST_SetSRID and ST_Transform
 ST_SetSRID doesn't change the coordinates of a geometry; it only sets an attribute called SRID. You should ONLY use it when you realize that you made a mistake (or forgot to specify the SRID) during data import
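The difference between relabeling and reprojecting can be sketched in plain Python. This is an illustration under stated assumptions, not how PostGIS implements either function: the Point class and both helpers are hypothetical, and the only projection handled is lon/lat (SRID 4326) to spherical Web Mercator (SRID 3857).

```python
import math
from dataclasses import dataclass, replace

EARTH_RADIUS_M = 6378137.0  # WGS 84 semi-major axis, used by Web Mercator


@dataclass(frozen=True)
class Point:
    x: float
    y: float
    srid: int


def set_srid(p: Point, srid: int) -> Point:
    """Relabel the point; the coordinates are left untouched."""
    return replace(p, srid=srid)


def transform_4326_to_3857(p: Point) -> Point:
    """Reproject lon/lat degrees to Web Mercator meters."""
    if p.srid != 4326:
        raise ValueError("expected a point labeled SRID 4326")
    x = EARTH_RADIUS_M * math.radians(p.x)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(p.y) / 2))
    return Point(x, y, 3857)
```

Relabeling a lon/lat point as 3857 leaves it at a few dozen "meters" from the origin, which is why set_srid is only a repair for a wrong or missing label.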
Accessor and Setter Functions
Using transformation with the geography type
 The geography type does not have ST_Transform, ST_SetSRID, or ST_SRID functions
 because it always uses WGS84 lon/lat
 However, the ST_Transform function is crucial when working with geography type
 e.g. if you want to use geometry functions that are not available for geography, then you can cast objects to geometry, transform them to a planar SRS, use the geometry function, transform the result back to SRID 4326, and then cast back to geography
Accessor and Setter Functions
Geometry Type Function
 When importing data with heterogeneous geometry columns, you may not be aware of the geometry types.
 GeometryType and ST_GeometryType
Accessor and Setter Functions
Geometry and Coordinate Dimensions
 ST_CoordDim
 coordinate dimension
 the dimension of the space that the geometry lives in
 ST_Dimension
 geometry dimension
 the smallest dimensional space that will fully contain the geometry
Measurement Functions
 Planar measurements
 treats the earth as essentially flat
 generally in units of meters or feet
 better supported by GIS tools and are faster to process
 Geodetic measurements
 used when measurements span continents and oceans
 consider the spherical nature of the earth
Measurement Functions
Geometry Planar Measurements
 All the planar measurement functions are in the same units as the SRS that's defined for the geometry
 Common functions:
 ST_Length and ST_3DLength
 ST_Area and ST_3DArea
 ST_Perimeter and ST_3DPerimeter: calculate the length of all the rings for multi-ringed polygons
Measurement Functions
Geodetic Measurements
 Measurement functions applied to geography type objects return geodetic measurements, which take the curvature of the earth into account
 For geometry type objects, you can use the sphere and spheroid family of geometry functions to get the same kind of computation
 e.g. ST_LengthSpheroid
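The gap between planar and geodetic measurement shows up as soon as lon/lat degrees are treated as flat coordinates. A minimal sketch, assuming a perfect sphere with the mean earth radius (PostGIS spheroid functions use an ellipsoid instead, so their results differ slightly); the function names are illustrative.

```python
import math

MEAN_EARTH_RADIUS_M = 6371008.8  # sphere approximation, not the WGS 84 spheroid


def planar_distance(lon1, lat1, lon2, lat2):
    """Euclidean distance in the units of the input (degrees here)."""
    return math.hypot(lon2 - lon1, lat2 - lat1)


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters on a sphere."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * MEAN_EARTH_RADIUS_M * math.asin(math.sqrt(a))
```

One degree of longitude is "1.0" in planar terms at any latitude, but on the sphere it is about 111 km at the equator and roughly half that at 60 degrees north.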
Bounding Box
 Bounding boxes
 the smallest rectangular box with edges parallel to the axes of the coordinate plane that completely encloses the object
 box-based comparisons embedded in PostGIS make relationship queries really fast
 Example:
 check whether the state of Washington is northwest of Florida
Decomposition Functions
Bounding Box of Geometries
 When comparing the relative spatial relationships of two or more geometries, the question can be sufficiently answered much more quickly by comparing the bounding boxes of the geometries
 you only need to work with rectangles and can ignore the details of the geometries within
 The bounding box of a 2D geometry is a box2D object (we also have box3D object for 3D geometry)
 All geometries have boxes, even points
 boxes are not geometries, but you can cast boxes into geometries
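The "is Washington northwest of Florida" check can be sketched with rectangles alone. The outlines below are rough, hand-typed lon/lat values and not real state boundaries, and the helper names are assumptions; in PostGIS the corresponding bounding-box operators are << (strictly left of) and |>> (strictly above).

```python
def bounding_box(coords):
    """Return (xmin, ymin, xmax, ymax) for a sequence of (x, y) pairs."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def box_is_northwest_of(a, b):
    """True if box a lies strictly to the left of and strictly above box b."""
    a_xmin, a_ymin, a_xmax, a_ymax = a
    b_xmin, b_ymin, b_xmax, b_ymax = b
    return a_xmax < b_xmin and a_ymin > b_ymax


# Rough illustrative outlines (lon, lat); not real state boundaries.
washington = [(-124.8, 45.5), (-116.9, 46.0), (-117.0, 49.0), (-123.3, 49.0)]
florida = [(-87.6, 30.9), (-80.0, 25.0), (-81.8, 24.5), (-85.0, 31.0)]

if __name__ == "__main__":
    print(box_is_northwest_of(bounding_box(washington), bounding_box(florida)))
```

Only four numbers per geometry are compared, no matter how many vertices the real outlines have.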
Decomposition Functions
Boundaries and Converting Polygons to Linestrings
 ST_Boundary
 returns the geometry that determines the separation between the points in the geometry and the rest of the coordinate space
 a common use is to break apart polygons and multipolygons into their constituent rings
 ST_ExteriorRing and ST_InteriorRingN
Decomposition Functions
Centroid and Point on Surface

ST_Centroid
 the center of gravity (assume every point in the geometry had equal mass)
 the centroid may not lie within the geometry itself

ST_PointOnSurface
 always returns a point that is guaranteed to lie on the geometry; which point you get is otherwise arbitrary
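A U-shaped polygon shows why a centroid can fall outside its geometry. The sketch below uses the standard shoelace formula for the area-weighted centroid and a ray-casting test for containment; the function names and the polygon are illustrative assumptions, and the code handles only a single ring without holes.

```python
def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon given as a list of (x, y)."""
    area2 = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:] + ring[:1]):
        cross = x0 * y1 - x1 * y0
        area2 += cross                 # twice the signed area
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3 * area2), cy / (3 * area2)


def point_in_polygon(pt, ring):
    """Ray casting: count edge crossings of a ray going right from pt."""
    x, y = pt
    inside = False
    for (x0, y0), (x1, y1) in zip(ring, ring[1:] + ring[:1]):
        if (y0 > y) != (y1 > y):
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x_cross > x:
                inside = not inside
    return inside


# A "U": a 3 x 3 square with a 1 x 2 notch cut from the top middle.
u_shape = [(0, 0), (3, 0), (3, 3), (2, 3), (2, 1), (1, 1), (1, 3), (0, 3)]
```

The centroid of this shape sits inside the notch, so a label placed there would land outside the polygon.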
Decomposition Functions
Returning points defining a geometry
 ST_PointN
 only works with linestrings and circularstrings
 returns the nth point on the linestring, with indexing starting at 1
 ST_DumpPoints
 extract all or many points of a geometry
 returns a set of geometry_dump objects which have two components: a one-dimensional path array (lists the sequence in which the points were dumped) and a geometry (always a point in this case)
Decomposition Functions
Decomposing Multigeometries and Geometry Collections
 ST_Dump
 recursively dumps all contained geometries
 returns a set of geometry_dump objects
 ST_GeometryN
 drills down only a single level
 extracts the nth geometry from a multigeometry or collection geometry
 returns a single extracted geometry, doesn't recurse, and therefore doesn't report depth
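The recursive versus single-level behavior can be imitated with nested Python lists standing in for collections and strings standing in for simple geometries. This is a sketch of the idea only; the two function names are hypothetical stand-ins, not PostGIS calls.

```python
def geometry_n(collection, n):
    """Return the nth direct member (1-based) without descending further."""
    return collection[n - 1]


def dump(geom, path=()):
    """Yield (path, simple_geometry) pairs, descending into nested collections."""
    if isinstance(geom, list):
        for i, member in enumerate(geom, start=1):
            yield from dump(member, path + (i,))
    else:
        yield path, geom


# A collection holding a point and a nested collection.
nested = ["POINT(0 0)", ["POINT(1 1)", "LINESTRING(0 0, 1 1)"]]

if __name__ == "__main__":
    for path, geom in dump(nested):
        print(path, geom)
    print(geometry_n(nested, 2))
```

The path returned by dump records where each simple geometry sat in the nesting, whereas geometry_n hands back the inner collection as is.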
Lab: PostGIS Part 2
git pull
Moving beyond Single Geometries
Geometry Relationships
 Bounding boxes
 Intersections
 Relationships
 The meaning of equality
Geometry Comparators for Bounding Boxes
Intersections
 Interior, exterior, and boundary of a geometry
 Interior: the space inside a geometry and not on the boundary
 Exterior: the space outside a geometry and not on the boundary
 Boundary: the space that's neither interior nor exterior
 Intersections
 two geometries intersect when they have interior or boundary points in common
 the set of all shared points is called the intersection
 ST_Intersects: returns true or false
 ST_Intersection: returns the geometry of the intersected region
Relating Two Geometries
 Contains
 when geometry A contains geometry B, no points of B lie in the exterior of A, and at least one point of B must lie in the interior of A
 if B lies only on the boundary of A, A does NOT contain B
 ST_Contains
 Within
 contains and within are inverse relationships
 if A is within geometry B, then B contains A
 ST_Within
Relating Two Geometries (cont'd)
 Covers
 contain with boundary
 when geometry A covers geometry B, no points of B lie in the exterior of A, and at least one point of B must lie in the interior or boundary of A
 if B lies only on the boundary of A, A DOES cover B
 ST_Covers
 Covered by
 the inverse of covers: if A is covered by B, then B covers A
 ST_CoveredBy
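The boundary case that separates contains from covers is easiest to see with a point and a rectangle. A minimal sketch, assuming an axis-aligned box given as (xmin, ymin, xmax, ymax); the helper names are illustrative, and real geometries need the full interior/boundary/exterior machinery.

```python
def locate(point, box):
    """Classify a point as 'interior', 'boundary', or 'exterior' of a box."""
    x, y = point
    xmin, ymin, xmax, ymax = box
    if x < xmin or x > xmax or y < ymin or y > ymax:
        return "exterior"
    if xmin < x < xmax and ymin < y < ymax:
        return "interior"
    return "boundary"


def contains_point(box, point):
    """Contains: the point must fall in the interior of the box."""
    return locate(point, box) == "interior"


def covers_point(box, point):
    """Covers: the point may fall in the interior or on the boundary."""
    return locate(point, box) != "exterior"
```

A point sitting exactly on an edge is covered by the box but not contained in it.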
Relating Two Geometries (cont'd)
 Overlapping geometries
 two geometries overlap when they have the same geometry dimension, they intersect, and neither is completely contained in the other
 ST_Overlaps
 Touching geometries
 two geometries touch if they have at least one point in common and none of the common points lie in the interior of both geometries
 ST_Touches
 Crossing geometries
 two geometries cross each other if they have some interior points in common but not all
 ST_Crosses
 Disjoint geometries
 the antithesis of the intersects relationship
 two geometries are disjoint if they share no interior or boundary points
 ST_Disjoint
 ST_Disjoint cannot use an index, therefore usually slower than ST_Intersects
Equality

Bounding-box equality
 the bounding boxes of the two geometries share the same space
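 before PostGIS 2.4 this is what the = operator tested; from 2.4 on, = compares coordinates and their order, and ~= tests bounding-box equality
 whichever test = performs also drives deduping operations such as UNION, DISTINCT, and GROUP BY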

Spatial equality
 two geometries occupy the same space
 e.g. a linestring that starts at point A and runs to point B is spatially equal to a linestring that starts at point B and runs to point A
 ST_Equals

Geometric equality
 stronger than spatial equality and means that two geometries occupy the same space and have the same underlying representation
 important for routing
 ST_OrderingEquals
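The three levels of equality can be told apart with three short linestrings. This is a deliberately narrow sketch: the "spatial" check below only recognizes a line and its exact reverse, which is far less than a full spatial equality test does, and all function names are assumptions.

```python
def bbox(coords):
    """Return (xmin, ymin, xmax, ymax) for a list of (x, y) pairs."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def bbox_equals(a, b):
    """Weakest test: the two bounding boxes coincide."""
    return bbox(a) == bbox(b)


def equals_up_to_direction(a, b):
    """Simplified spatial equality: same vertices read in either direction."""
    return list(a) == list(b) or list(a) == list(reversed(b))


def ordering_equals(a, b):
    """Strongest test: same vertices in the same order."""
    return list(a) == list(b)


a_to_b = [(0, 0), (2, 2)]
b_to_a = [(2, 2), (0, 0)]
other_diagonal = [(0, 2), (2, 0)]  # same box, different line
```

The reversed line passes the first two tests but fails the third, while the other diagonal passes only the bounding-box test.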
Proximity Analysis
 How far something is located from something else:
 how far is my house from the nearest expressway?
 how many burger joints are within a mile drive?
 what's the average distance that people have to commute to work?
 Nearest neighbor searches
 KNN distance operators
 Using KNN with geography
 Geotagging
Nearest Neighbor Searches
 Which places are within X distance?
 ST_DWithin
 can be used on both geometry and geography types
 What are the N closest places?
 use ST_DWithin to filter candidates, then ORDER BY ST_Distance and LIMIT N
 Find the closest locations
 use ST_DWithin and DISTINCT ON to find closest locations
 DISTINCT ON performs an implicit GROUP BY, but it's not limited to returning just the fields that you grouped on
 DISTINCT ON (expression) only keeps the first row of each set of rows where the given expression evaluates to equal
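The filter, sort, keep-first pattern behind the "closest location" query can be mimicked in plain Python. The sketch assumes planar coordinates and small in-memory dictionaries; the function name and the sample data are hypothetical, and the comments mark which SQL clause each step stands in for.

```python
import math


def closest_within(houses, hospitals, max_dist):
    """For each house, keep the nearest hospital no farther than max_dist."""
    pairs = []
    for h_name, h_xy in houses.items():
        for s_name, s_xy in hospitals.items():
            d = math.dist(h_xy, s_xy)
            if d <= max_dist:  # plays the role of ST_DWithin
                pairs.append((h_name, d, s_name))
    pairs.sort()  # ORDER BY house, distance
    closest = {}
    for h_name, d, s_name in pairs:
        # DISTINCT ON (house): only the first row per house survives.
        closest.setdefault(h_name, (s_name, d))
    return closest


houses = {"a": (0, 0), "b": (10, 0), "c": (100, 100)}
hospitals = {"north": (0, 3), "east": (6, 0)}
```

Unlike GROUP BY, the surviving row keeps columns that were not part of the grouping key, here the hospital name and the distance. House "c" has no hospital within range and drops out entirely.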
Nearest Neighbor Searches (cont'd)
 Intersects with tolerance
 use ST_DWithin to check for intersections when you have two geometries that fail to intersect because of differences caused by the number of significant digits
 e.g. LINESTRING(1 2, 3 4) and POINT(3.00001 4.00001)
 this is used very often when working with real data where not everything lines up perfectly
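The linestring and point from the example above can be checked numerically. A minimal sketch of "intersects with tolerance" for one segment and one point, assuming planar coordinates; the function names and the tolerance of 0.0001 are illustrative choices.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Project p onto the line, then clamp to the segment's end points.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len2
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def dwithin(p, a, b, tolerance):
    """True if p lies within the given distance of the segment."""
    return point_segment_distance(p, a, b) <= tolerance
```

The point misses the end of the line by roughly 0.000014 units, so a strict intersection test fails while a distance test with a small tolerance succeeds.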
KNN Distance Operators
 Finding N closest places using KNN distance bounding-box operators
 good enough for geometries that tend to fill up their bounding boxes or that are very small
 <#>
 this is the KNN bounding-box distance operator
 A <#> B returns the minimum distance between the bounding boxes of A and B
 <->
 the KNN bounding-box centroid distance operator
 A <-> B returns the distance between the centroids of the bounding boxes of A and B (PostGIS 2.2 and later return the true distance between the geometries instead)
 These can only be used with geometry type
 Much faster than ST_Distance
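The two box-based distances described on this slide are simple to compute by hand, which is part of why they are cheap. A sketch assuming boxes given as (xmin, ymin, xmax, ymax); the function names are illustrative and the code says nothing about how PostGIS uses the index.

```python
import math


def box_distance(a, b):
    """Minimum distance between two boxes; 0 when they touch or overlap."""
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)  # horizontal gap, if any
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)  # vertical gap, if any
    return math.hypot(dx, dy)


def box_centroid_distance(a, b):
    """Distance between the centers of two boxes."""
    ax, ay = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2
    bx, by = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    return math.hypot(bx - ax, by - ay)
```

For long, thin, or diagonal geometries the box can be mostly empty space, so both numbers can differ a lot from the true distance between the shapes.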
Use KNN with Geography Types
 KNN distance operators cannot be used with geography types directly (PostGIS 2.2 and later accept geography with <->, so the steps below apply to older versions)
 Steps:
 create a functional geometry index on the table
 temporarily convert the geography to geometry
 use the KNN operators
 finally convert the results back to geography
Geotagging
 Situate points located within the context of another geometry
 Region tagging:
 tag a geometry, such as a point of interest, with the name of a region it's in, such as a state
 Linear referencing
 refer to a point of interest by its closest point along a linestring (the tag can be the closest point on the linestring, or a measure such as a mile marker or fractional percent measured from the start of the linestring to the point on the linestring closest to your point of interest)
 steps: 1) use ST_DWithin to narrow choices, 2) for every pairing of point and linestring, use ST_ClosestPoint to pinpoint the closest point on the linestring, and 3) use DISTINCT ON and ST_Distance to keep only the paired point and linestring that are closest
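The per-pair part of linear referencing (step 2 above) can be sketched in plain Python: find the closest point on a linestring and express it as a fraction of the line's length. The function name is hypothetical and the coordinates are assumed planar; in PostGIS, ST_ClosestPoint returns the point and ST_LineLocatePoint returns the fraction.

```python
import math


def locate_along(line, p):
    """Return (closest_point, fraction_along_line, distance) for point p."""
    total = sum(math.dist(a, b) for a, b in zip(line, line[1:]))
    best = None
    walked = 0.0  # length of the segments already passed
    for a, b in zip(line, line[1:]):
        seg = math.dist(a, b)
        if seg > 0:
            # Project p onto this segment and clamp to its end points.
            t = ((p[0] - a[0]) * (b[0] - a[0])
                 + (p[1] - a[1]) * (b[1] - a[1])) / seg ** 2
            t = max(0.0, min(1.0, t))
            q = (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
            d = math.dist(p, q)
            if best is None or d < best[2]:
                best = (q, (walked + t * seg) / total, d)
        walked += seg
    return best
```

Multiplying the fraction by the road's length gives a mile-marker style tag for the point of interest.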
Geometry and Geography Processing
 Aggregation
 rolling up several rows of data into one
 COUNT, SUM, MIN, MAX, AVG
 Spatial aggregation
 ST_MakeLine
 ST_Union: the most commonly used one
 ST_Collect
 ST_Polygonize
 There are no spatial aggregates for the geography type, so you need to cast geography to geometry first
Geometry and Geography Processing (cont'd)
 Clipping
 remove unwanted sections of a geometry
 ST_Difference(A,B): returns the portion of A that's not shared with B
 ST_SymDifference(A,B): returns the portions of A and B that are not shared
Geometry and Geography Processing (cont'd)
 Splitting
 use a linestring to slice a polygon
 ST_Split: can only be used with single geometries, not collections, and the blade you use to cut has to be one dimension lower than what you are cutting up
Lab: PostGIS Part 2
(continued)
Slides for Lecture 4, Spring 2020, PHC6194 Spatial Epidemiology