**PHC6194 SPATIAL EPIDEMIOLOGY**

**Spatial Data Engineering and Linkage**

Hui Hu Ph.D.

*Department of Epidemiology*

*College of Public Health and Health Professions & College of Medicine*

January 30, 2019

**Spatial Data Engineering and Linkage**

**Lab: PostGIS Part 2**


**Spatial Data Engineering**

- Geometry and geography functions

- Geometry relationships

- Proximity analysis

- Geometry and geography processing

**Geometry and Geography Functions**

- Output functions

- output data in various standard formats

- Constructor functions

- create PostGIS objects from well-known formats

- Accessor and setter functions

- work against a single spatial object and return or set attributes of the object

- Measurement functions

- return scalar measurements of a spatial object

- Decomposition functions

- extract other spatial objects from an input object

**Common Formats and Output Functions**

- Well-known binary (WKB) and well-known text (WKT)

- the most common formats for spatial objects

- WKT: ST_AsText and ST_AsEWKT

- WKB: ST_AsBinary and ST_AsEWKB

- Keyhole Markup Language (KML)

- an XML-based format, used by Google

- SRS is always SRID 4326

- ST_AsKML

- Geography Markup Language (GML)

- an XML-based format used in Web Feature Service

- ST_AsGML

- Geographic JavaScript Object Notation (GeoJSON)

- a format based on JSON

- ST_AsGeoJSON

- Scalable Vector Graphics (SVG)

- popular among high-end rendering or drawing tools

- ST_AsSVG

- Extensible 3D Graphics (X3D)

- ST_AsX3D
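To make the text formats above concrete, here is a minimal Python sketch that builds the WKT, EWKT, and GeoJSON representations of one 2D point by hand. The helper names and the coordinates are hypothetical and chosen only for illustration; in PostGIS itself the output functions listed above do this work.

```python
import json

def point_to_wkt(lon, lat):
    """WKT for a 2D point: the kind of text ST_AsText produces."""
    return f"POINT({lon} {lat})"

def point_to_ewkt(lon, lat, srid):
    """EWKT prefixes the SRID, which plain WKT cannot carry."""
    return f"SRID={srid};{point_to_wkt(lon, lat)}"

def point_to_geojson(lon, lat):
    """GeoJSON geometry object for a point; coordinates are [x, y]."""
    return json.dumps({"type": "Point", "coordinates": [lon, lat]})

# Hypothetical lon/lat pair, used only as sample input.
print(point_to_wkt(-82.35, 29.64))
print(point_to_ewkt(-82.35, 29.64, 4326))
print(point_to_geojson(-82.35, 29.64))
```

The same point carries the same information in each format; only EWKT keeps the SRID alongside the coordinates.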

**Shapefiles**

**.shp** - the file that stores the geometry of the feature

**.shx** - the file that stores the index of the feature geometry

**.dbf** - the dBASE file that stores the attribute information

**.prj** - the file that defines the shapefile's projection

**.html, .htm, .xml** - the files that usually contain metadata

**.sbn** and **.sbx** - the files that store additional indices

**Constructor Functions**

- Two common ways:

- Build new spatial objects from scratch using raw data in various formats

- Utilize existing spatial objects and decompose, splice, slice, dice, or morph them to form new ones

- Create geometries from text and binary formats

- ST_GeomFromText, ST_GeomFromWKB, ST_GeomFromEWKB, ST_GeomFromGML, ST_GeomFromGeoJSON, ST_GeomFromKML

- Create geographies from text and binary formats

- ST_GeogFromText, ST_GeogFromWKB; for KML, GML, and GeoJSON input, build a geometry with the matching ST_GeomFrom function and cast it to geography

**Accessor and Setter Functions**

- Any function that accesses or sets the intrinsic properties of an object

- A few defining characteristics of spatial objects:

- spatial reference identifiers: SRID

- subtype: the finer categorization of geometry and geography types, such as points, polygons, etc.

- coordinate dimension: the dimension of the vector space in which your geometry lives, which can be 2, 3, or 4

- geometric dimension: the minimal dimension of the vector space necessary to fully contain the geometry, which can be 0 (points), 1 (linestrings), or 2 (polygons)

**Accessor and Setter Functions**

SRID and Transformation for Geometry


- ST_SRID and ST_SetSRID

- retrieves and sets the SRID

- ST_Transform

- transform geometry to different spatial references

- e.g. take a geometry in lon/lat and transform it to a planar SRS so that you can take meaningful measurements

- Differences between ST_SetSRID and ST_Transform

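- ST_SetSRID doesn't change the coordinates of a geometry; it only sets an attribute called SRID, which is useful when you realize that the wrong SRID was assigned during data import

- ST_Transform reprojects the geometry: it converts every coordinate into the target SRS and updates the SRID to match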
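The difference between relabeling and reprojecting can be sketched in a few lines of Python. This is an illustration only, not how PostGIS is implemented: the `Point` class and both helpers are hypothetical, and the reprojection handles just one case, lon/lat (SRID 4326) to spherical Web Mercator (SRID 3857).

```python
import math
from dataclasses import dataclass, replace

EARTH_RADIUS_M = 6378137.0  # sphere radius used by Web Mercator (SRID 3857)

@dataclass(frozen=True)
class Point:
    x: float
    y: float
    srid: int

def set_srid(pt, srid):
    """Analogue of ST_SetSRID: change the label, leave the coordinates alone."""
    return replace(pt, srid=srid)

def transform_4326_to_3857(pt):
    """Analogue of ST_Transform(geom, 3857): recompute the coordinates."""
    if pt.srid != 4326:
        raise ValueError("this sketch only handles SRID 4326 input")
    x = EARTH_RADIUS_M * math.radians(pt.x)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(pt.y) / 2))
    return Point(x, y, 3857)

# A point imported with a missing SRID (0); the coordinates are really lon/lat.
imported = Point(-82.35, 29.64, 0)
fixed = set_srid(imported, 4326)           # same numbers, correct label
projected = transform_4326_to_3857(fixed)  # new numbers, in meters
print(fixed)
print(projected)
```

Calling `set_srid(imported, 3857)` would have produced a point labeled as meters but still holding degrees, which is exactly the mistake the slide warns about.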

**Accessor and Setter Functions**

Using transformation with the geography type


- The geography type does not have ST_Transform, ST_SetSRID, or ST_SRID functions

- because it uses a lon/lat SRS, by default WGS 84 (SRID 4326)

- However, the ST_Transform function is still crucial when working with the geography type

- e.g. if you want to use geometry functions that are not available for geography, then you can cast objects to geometry, use the geometry function, and then cast back to geography

**Accessor and Setter Functions**

Geometry Type Function


- When importing data with heterogeneous geometry columns, you may not be aware of the geometry types.

- GeometryType and ST_GeometryType

**Accessor and Setter Functions**

Geometry and Coordinate Dimensions


- ST_CoordDim

- coordinate dimension

- the dimension of the space that the geometry lives in

- ST_Dimension

- geometry dimension

- the smallest dimensional space that will fully contain the geometry
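The two notions of dimension are independent, which a small Python sketch can show. The lookup table and function names below are hypothetical stand-ins for what ST_CoordDim and ST_Dimension report, and vertices are assumed to be plain tuples.

```python
# Geometric dimension depends only on the subtype (assumed simple lookup).
GEOMETRIC_DIMENSION = {
    "POINT": 0, "MULTIPOINT": 0,
    "LINESTRING": 1, "MULTILINESTRING": 1,
    "POLYGON": 2, "MULTIPOLYGON": 2,
}

def coord_dim(vertices):
    """Coordinate dimension: how many ordinates each vertex carries."""
    return len(vertices[0])

def geometric_dim(geom_type):
    """Geometric dimension: 0 for points, 1 for lines, 2 for polygons."""
    return GEOMETRIC_DIMENSION[geom_type.upper()]

# A linestring with x, y, z vertices lives in 3D space but is still a 1D object.
trail = [(0.0, 0.0, 12.0), (1.0, 1.0, 15.0), (2.0, 1.0, 19.0)]
print(coord_dim(trail), geometric_dim("LINESTRING"))
```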

**Measurement Functions**

- Planar measurements

- treats the earth as essentially flat

- generally in units of meters or feet

- better supported by GIS tools and are faster to process

- Geodetic measurements

- once measures start to cross continents and oceans

- consider the spherical nature of the earth

**Measurement Functions**

Geometry Planar Measurements


- All the planar measurement functions are in the same units as the SRS that's defined for the geometry

- Common functions:

- ST_Length and ST_3DLength

- ST_Area and ST_3DArea

- ST_Perimeter and ST_3DPerimeter: calculate the length of all the rings for multi-ringed polygons

**Measurement Functions**

Geodetic Measurements


- If you use measurement functions on geography type objects, they return geodetic measurements, which account for the curvature of the earth

- If you have geometry type objects in lon/lat, you can use the sphere and spheroid families of geometry functions to get spherical or spheroidal measurements

- e.g. ST_LengthSpheroid
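Why the distinction matters can be seen by measuring the same one-degree step in longitude at two latitudes. The sketch below is a simplification: it uses the haversine formula on a sphere with an assumed mean radius, whereas the PostGIS spheroid functions use an ellipsoid. All names are hypothetical.

```python
import math

MEAN_EARTH_RADIUS_M = 6371008.8  # assumption: spherical earth, mean radius

def planar_distance(p, q):
    """Planar distance in the units of the coordinates (degrees here)."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

def haversine_m(p, q):
    """Great-circle distance in meters between two (lon, lat) pairs."""
    lon1, lat1, lon2, lat2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * MEAN_EARTH_RADIUS_M * math.asin(math.sqrt(a))

equator = ((0.0, 0.0), (1.0, 0.0))
north = ((0.0, 60.0), (1.0, 60.0))

# Planar math on lon/lat says both steps are "1 degree" long.
print(planar_distance(*equator), planar_distance(*north))
# Geodetic math shows the northern step is about half as long on the ground.
print(haversine_m(*equator), haversine_m(*north))
```

Planar measurements on degree coordinates are therefore not meaningful lengths; either transform to a suitable planar SRS first or use a geodetic function.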

**Decomposition Functions**

Bounding Box of Geometries


- Often when comparing the relative spatial relationships of two or more geometries, the question can be sufficiently answered much more quickly by comparing the bounding boxes of the geometries

- you only need to work with rectangles and can ignore the details of the geometries within

- The bounding box of a 2D geometry is a box2D object (we also have box3D object for 3D geometry)

- All geometries have boxes, even points

- boxes are not geometries, but you can cast boxes into geometries
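A bounding box is cheap to compute and cheap to compare, which is the whole point of using it as a first-pass filter. The following Python sketch uses hypothetical helper names and represents a box as a `(minx, miny, maxx, maxy)` tuple.

```python
def bounding_box(vertices):
    """Smallest axis-aligned rectangle enclosing all vertices."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return (min(xs), min(ys), max(xs), max(ys))

def boxes_overlap(a, b):
    """True when two boxes share at least one point (touching counts)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

triangle = [(0, 0), (4, 0), (2, 3)]
square = [(3, 2), (5, 2), (5, 4), (3, 4)]
far_point = [(10, 10)]  # even a point has a (degenerate) box

print(bounding_box(triangle))
print(boxes_overlap(bounding_box(triangle), bounding_box(square)))
print(boxes_overlap(bounding_box(triangle), bounding_box(far_point)))
```

Overlapping boxes do not prove that the geometries intersect (the triangle and square above do not), but non-overlapping boxes prove that they cannot, so the expensive exact test can be skipped for those pairs.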

**Decomposition Functions**

Boundaries and Converting Polygons to Linestrings


- ST_Boundary

- returns the geometry that determines the separation between the points in the geometry and the rest of the coordinate space

- a common use is to break apart polygons and multipolygons into their constituent rings

- ST_ExteriorRing and ST_InteriorRingN

**Decomposition Functions**

Centroid and Point on Surface


- ST_Centroid

- you can think of the centroid of a geometry as the center of gravity, as if every point in the geometry had equal mass

- the centroid may not lie within the geometry itself

- ST_PointOnSurface

- always returns a point that is guaranteed to lie on the geometry itself (not necessarily on its boundary)
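That a centroid can fall outside its own geometry is easy to verify with a U-shaped polygon. The sketch below computes the area-weighted centroid with the shoelace formula and checks it with a ray-casting point-in-polygon test; the function names and the polygon are hypothetical, and only a single ring without holes is handled.

```python
def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon ring (shoelace formula)."""
    twice_area = cx = cy = 0.0
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    # 6 * area == 3 * twice_area
    return (cx / (3 * twice_area), cy / (3 * twice_area))

def point_in_polygon(pt, ring):
    """Ray casting: count edge crossings of a ray going right from pt."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        if (y0 > y) != (y1 > y):
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x_cross > x:
                inside = not inside
    return inside

# U shape: a 3x3 square with a 1x2 notch cut from the top middle.
U_RING = [(0, 0), (3, 0), (3, 3), (2, 3), (2, 1), (1, 1), (1, 3), (0, 3)]

c = polygon_centroid(U_RING)
print(c, point_in_polygon(c, U_RING))
```

The centroid lands in the notch, outside the polygon, which is the situation where a point-on-surface function is the safer choice for labeling or tagging.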

**Decomposition Functions**

Returning points defining a geometry


- ST_PointN

- only works with linestrings and circularstrings

- returns the nth point on the linestring, with indexing starting at 1

- ST_DumpPoints

- if you want to extract all or many points of a geometry

- returns a set of geometry_dump objects which have two components: a one-dimensional path array (lists the sequence in which the points were dumped) and a geometry (always a point in this case)

**Decomposition Functions**

Decomposing Multi-geometries and Geometry Collections


- ST_Dump

- recursively dumps all contained geometries

- returns a set of geometry_dump objects

- ST_GeometryN

- drills down only a single level

- extracts the nth geometry from a multi-geometry or collection geometry

- returns a single extracted geometry, doesn't recurse, and therefore doesn't report depth
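The contrast between recursive dumping and single-level extraction can be sketched on a toy representation. Here a geometry is assumed to be a `(type, parts)` tuple, where `parts` is a list of child geometries for collections and a coordinate tuple otherwise; this structure and the function names are hypothetical, not PostGIS internals.

```python
COLLECTION_TYPES = {
    "MULTIPOINT", "MULTILINESTRING", "MULTIPOLYGON", "GEOMETRYCOLLECTION",
}

def dump(geom, path=()):
    """Recursive, like ST_Dump: yield (path, simple geometry) pairs."""
    gtype, parts = geom
    if gtype in COLLECTION_TYPES:
        for i, child in enumerate(parts, start=1):  # paths are 1-based
            yield from dump(child, path + (i,))
    else:
        yield (list(path), geom)

def geometry_n(geom, n):
    """Single level, like ST_GeometryN: return the nth child as-is."""
    _, parts = geom
    return parts[n - 1]

collection = ("GEOMETRYCOLLECTION", [
    ("POINT", (0, 0)),
    ("MULTIPOINT", [("POINT", (1, 1)), ("POINT", (2, 2))]),
])

for path, geom in dump(collection):
    print(path, geom)
print(geometry_n(collection, 2))
```

The dump reaches the two points nested inside the multipoint and records how deep they were (`[2, 1]` and `[2, 2]`), while `geometry_n` hands back the multipoint itself without opening it.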

**Lab: PostGIS Part 2**

**Moving beyond Single Geometries**

Geometry Relationships


- Bounding boxes

- Intersections

- Relationships

- The meaning of equality

**Bounding Box**

- Bounding boxes

- the smallest rectangular box with edges parallel to the axes of the coordinate plane that completely encloses the object

- box-based comparisons embedded in PostGIS make relationship queries really fast

- Example:

- check whether the state of Washington is northwest of Florida

**Geometry Comparators**

**Intersections**

- Interior, exterior, and boundary of a geometry

- Interior: the space inside a geometry and not on the boundary

- Exterior: the space outside a geometry and not on the boundary

- Boundary: the space that's neither interior nor exterior

- Intersections

- two geometries intersect when they have interior or boundary points in common

- the set of all shared points is called intersection

- ST_Intersects: returns true or false

- ST_Intersection: returns the geometry of the intersected region

**Relating Two Geometries**

- Contains

- when geometry A contains geometry B, no points of B lie in the exterior of A, and at least one point of B must lie in the interior of A

- if B lies only on the boundary of A, A does NOT contain B

- ST_Contains

- Within

- contains and within are inverse relationships

- if A is within geometry B, then B contains A

- ST_Within

**Relating Two Geometries (cont'd)**

- Covers

- like contains, except that the boundary counts too

- when geometry A covers geometry B, no points of B lie in the exterior of A, and at least one point of B must lie in the interior or boundary of A

- if B lies only on the boundary of A, A DOES cover B

- ST_Covers

- Covered by

- ST_CoveredBy
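The boundary case that separates contains from covers can be demonstrated with the simplest possible pair: an axis-aligned rectangle A and a single point B. The sketch below is restricted to that case; the function names are hypothetical, and a rectangle is a `(minx, miny, maxx, maxy)` tuple.

```python
def locate(point, rect):
    """Classify a point as 'interior', 'boundary', or 'exterior' of rect."""
    x, y = point
    minx, miny, maxx, maxy = rect
    if x < minx or x > maxx or y < miny or y > maxy:
        return "exterior"
    if x in (minx, maxx) or y in (miny, maxy):
        return "boundary"
    return "interior"

def contains(rect, point):
    """A contains B: B must reach the interior of A."""
    return locate(point, rect) == "interior"

def covers(rect, point):
    """A covers B: no part of B lies in the exterior of A."""
    return locate(point, rect) != "exterior"

def within(point, rect):
    """Inverse of contains."""
    return contains(rect, point)

county = (0, 0, 10, 10)
for clinic in [(5, 5), (10, 5), (12, 5)]:
    print(clinic, locate(clinic, county), contains(county, clinic), covers(county, clinic))
```

Only the point sitting exactly on the edge gets different answers from the two predicates, which matters for data such as addresses geocoded onto a boundary road.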

**Relating Two Geometries (cont'd)**

- Overlapping geometries

- two geometries overlap when they have the same geometry dimension, they intersect, and one is not completely contained in the other

- ST_Overlaps

- Touching geometries

- two geometries touch if they have at least one point in common and none of the common points lies in the interior of either geometry

- ST_Touches

- Crossing geometries

- two geometries cross each other if they have some interior points in common but not all

- ST_Crosses

- Disjoint geometries

- the antithesis of the intersects relationship

- two geometries are disjoint if they share no interior or boundary points

- ST_Disjoint

- ST_Disjoint cannot use an index, so it is usually slower than ST_Intersects

**Equality**

- Bounding-box equality

- the bounding boxes of the two geometries share the same space

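- this is what the = operator tested before PostGIS 2.4; from 2.4 onward, = requires identical coordinates in the same order, and ~= tests bounding-box equality

- the behavior of = also applies to deduping operations such as UNION, DISTINCT, and GROUP BY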

- Spatial equality

- two geometries occupy the same space

- e.g. a linestring that starts at point A and runs to point B spatially equals a linestring that starts at point B and runs to point A

- ST_Equals

- Geometric equality

- stronger than spatial equality and means that two geometries occupy the same space and have the same underlying representation

- important for routing

- ST_OrderingEquals
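The three levels of equality can be compared side by side on linestrings stored as vertex lists. This Python sketch is deliberately simplified: its spatial-equality check only recognizes a line and its exact reverse, whereas a real spatial-equality test such as ST_Equals also handles cases like redundant collinear vertices. All function names are hypothetical.

```python
def bbox(coords):
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))

def bbox_equal(a, b):
    """Weakest: the bounding boxes coincide."""
    return bbox(a) == bbox(b)

def spatially_equal_lines(a, b):
    """Same point set (simplified: identical or exactly reversed vertices)."""
    return list(a) == list(b) or list(a) == list(reversed(b))

def ordering_equal(a, b):
    """Strongest: same vertices in the same order."""
    return list(a) == list(b)

a_to_b = [(0, 0), (2, 2)]
b_to_a = [(2, 2), (0, 0)]
other_diagonal = [(0, 2), (2, 0)]

for name, line in [("b_to_a", b_to_a), ("other_diagonal", other_diagonal)]:
    print(name, bbox_equal(a_to_b, line),
          spatially_equal_lines(a_to_b, line), ordering_equal(a_to_b, line))
```

The reversed line passes the first two tests but not the third, which is why direction-sensitive work such as routing needs the strictest comparison; the opposite diagonal passes only the bounding-box test.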

**Proximity Analysis**

- How far something is located from something else:

- how far is my house from the nearest expressway?

- how many burger joints are within a mile drive?

- what's the average distance that people have to commute to work?

- Nearest neighbor searches

- KNN distance operators

- Using KNN with geography

- Geotagging

**Nearest Neighbor Searches**

- Which places are within X distance?

- ST_DWithin

- can be used on both geometry and geography types

- What are the N closest places?

- use ST_DWithin to filter, then ORDER BY ST_Distance with LIMIT N

- Find the closest locations

- use ST_DWithin and DISTINCT ON to find closest locations

- DISTINCT ON performs an implicit GROUP BY, but it's not limited to returning just the fields that you grouped on

- DISTINCT ON (expression) only keeps the first row of each set of rows where the given expression evaluates to equal
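The filter, sort, keep-first pattern behind ST_DWithin plus DISTINCT ON can be imitated in plain Python to show what each step contributes. This is a sketch of the logic only, on planar coordinates with brute-force pairing; the function name and the sample houses and clinics are hypothetical.

```python
import math

def closest_within(origins, candidates, radius):
    """For each origin, the nearest candidate no farther than radius."""
    pairs = []
    for o_name, o_xy in origins.items():
        for c_name, c_xy in candidates.items():
            d = math.dist(o_xy, c_xy)
            if d <= radius:                    # ST_DWithin-style filter
                pairs.append((o_name, d, c_name))
    pairs.sort()                               # ORDER BY origin, distance
    closest = {}
    for o_name, d, c_name in pairs:
        # DISTINCT ON (origin): only the first row per origin survives
        closest.setdefault(o_name, (c_name, d))
    return closest

houses = {"h1": (0, 0), "h2": (10, 10)}
clinics = {"c1": (1, 0), "c2": (0, 3), "c3": (50, 50)}
print(closest_within(houses, clinics, radius=5))
```

Because the rows are sorted by distance within each origin, keeping the first row is the same as keeping the nearest candidate, and the extra column (the candidate's name) comes along even though it was not part of the grouping key. Origins with nothing inside the radius, like `h2` here, simply drop out.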

**Nearest Neighbor Searches (cont'd)**

- Intersects with tolerance

- use ST_DWithin to check for intersections when you have two geometries that fail to intersect because of differences caused by the number of significant digits

- e.g. LINESTRING(1 2, 3 4) and POINT(3.00001 4.00001)

- this is used very often when working with real data where not everything lines up perfectly
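The example above can be worked through numerically. The Python sketch below computes the distance from the point to the single segment of the linestring, then compares an exact intersection test with a tolerance test; the function names and the tolerance value of 0.0001 are assumptions made for illustration.

```python
import math

def point_segment_distance(p, a, b):
    """Shortest planar distance from point p to segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:
        t = 0.0
    else:
        t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len2
    t = max(0.0, min(1.0, t))  # clamp to the segment's end points
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))

def intersects(p, a, b):
    """Exact test: the point must lie on the segment."""
    return point_segment_distance(p, a, b) == 0.0

def dwithin(p, a, b, tolerance):
    """Tolerant test, in the spirit of ST_DWithin."""
    return point_segment_distance(p, a, b) <= tolerance

line_start, line_end = (1, 2), (3, 4)
pt = (3.00001, 4.00001)

print(point_segment_distance(pt, line_start, line_end))
print(intersects(pt, line_start, line_end))
print(dwithin(pt, line_start, line_end, 0.0001))
```

The point misses the end of the line by roughly 0.000014 units, so the exact test fails while the tolerant test succeeds.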

**KNN Distance Operators**

- Finding N closest places using KNN distance bounding-box operators

- good enough for geometries that tend to fill up their bounding boxes or that are very small

- <#>

- this is the KNN bounding-box distance operator

- A <#> B returns the minimum distance between the bounding boxes of A and B

- <->

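- the KNN bounding-box centroid distance operator in older releases

- A <-> B returns the distance between the centroids of the bounding boxes of A and B; with PostGIS 2.2+ on PostgreSQL 9.5+, it returns the true 2D distance between the geometries instead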

- <#> can only be used with the geometry type; <-> also accepts the geography type in PostGIS 2.2+

- Much faster than ST_Distance
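The two box-based distances are simple enough to compute by hand, which helps explain both their speed and their imprecision. The Python sketch below uses hypothetical function names, with boxes as `(minx, miny, maxx, maxy)` tuples, and ranks candidates by brute force rather than with an index.

```python
import math

def box_distance(a, b):
    """Minimum distance between two boxes (0 if they overlap)."""
    dx = max(0.0, b[0] - a[2], a[0] - b[2])
    dy = max(0.0, b[1] - a[3], a[1] - b[3])
    return math.hypot(dx, dy)

def box_centroid_distance(a, b):
    """Distance between the centers of two boxes."""
    acx, acy = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2
    bcx, bcy = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    return math.hypot(bcx - acx, bcy - acy)

def k_nearest(target_box, boxes, k, distance=box_distance):
    """Names of the k boxes closest to target_box under the given distance."""
    return sorted(boxes, key=lambda name: distance(target_box, boxes[name]))[:k]

a = (0, 0, 2, 2)
b = (5, 0, 7, 2)
print(box_distance(a, b), box_centroid_distance(a, b))

candidates = {"near": (3, 0, 4, 1), "far": (10, 10, 11, 11)}
print(k_nearest(a, candidates, k=1))
```

Both measures ignore the shape inside the box, so they approximate the true distance best for geometries that are small or that nearly fill their boxes, as the slide notes.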

**Use KNN with Geography Types**

- Before PostGIS 2.2, KNN distance operators cannot be used with geography types directly

- Steps:

- create a functional geometry index on the table

- temporarily convert the geography to geometry

- use the KNN operators

- finally convert the results back to geography

**Geotagging**

- Situate points within the context of another geometry

- Region tagging:

- tag a geometry, such as a point of interest, with the name of a region it's in, such as a state

- Linear referencing

- refer to a point of interest by its closest point along a linestring (the tag can be the closest point on the linestring, or a measure such as a mile marker or fractional percent measured from the start of the linestring to the point on the linestring closest to your point of interest)

- steps: 1) use ST_DWithin to narrow choices, 2) for every pairing of point and linestring, use ST_ClosestPoint to pinpoint the closest point on the linestring, and 3) use DISTINCT ON and ST_Distance to keep only the paired point and linestring that are closest
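The core of step 2, plus the fractional measure mentioned above, can be sketched for a single point and a single linestring. The Python function below walks the line segment by segment, keeps the closest projected point, and reports how far along the line it sits; the function name and the sample road are hypothetical, and the math is planar.

```python
import math

def locate_on_line(point, line):
    """Return (closest_point, fraction_along_line, distance) for a polyline."""
    total = sum(math.dist(a, b) for a, b in zip(line, line[1:]))
    best = None
    walked = 0.0  # length of the line before the current segment
    for a, b in zip(line, line[1:]):
        seg_len = math.dist(a, b)
        dx, dy = b[0] - a[0], b[1] - a[1]
        if seg_len == 0:
            t = 0.0
        else:
            t = ((point[0] - a[0]) * dx + (point[1] - a[1]) * dy) / seg_len ** 2
        t = max(0.0, min(1.0, t))  # stay on the segment
        closest = (a[0] + t * dx, a[1] + t * dy)
        d = math.dist(point, closest)
        if best is None or d < best[2]:
            best = (closest, (walked + t * seg_len) / total, d)
        walked += seg_len
    return best

road = [(0, 0), (10, 0), (10, 10)]  # 20 units long, with one right-angle turn
crash_site = (4, 3)
print(locate_on_line(crash_site, road))
```

The closest point here is 4 units into a 20-unit road, so the tag could be stored either as the point itself or as the fraction 0.2. With several candidate roads, the returned distance is what the final DISTINCT ON step would sort on.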

**Geometry and Geography Processing**

- Aggregation

- rolling up several rows of data into one

- COUNT, SUM, MIN, MAX, AVG

- Spatial aggregation

- ST_MakeLine

- ST_Union: the most commonly used one

- ST_Collect

- ST_Polygonize

- There are no spatial aggregates for the geography type, so you need to cast geography to geometry first
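Spatial aggregation follows the same roll-up idea as COUNT or SUM: many rows go in, one geometry per group comes out. As an illustration of the line-building case, the Python sketch below groups ordered point rows by a track identifier and emits one WKT linestring per track. The row layout `(track_id, seq, x, y)` and the function name are assumptions made for this example.

```python
from collections import defaultdict

def make_lines(rows):
    """Roll up (track_id, seq, x, y) rows into one WKT linestring per track."""
    grouped = defaultdict(list)
    for track_id, seq, x, y in rows:           # GROUP BY track_id
        grouped[track_id].append((seq, x, y))
    lines = {}
    for track_id, pts in grouped.items():
        pts.sort()                              # order points within the group
        coords = ", ".join(f"{x} {y}" for _, x, y in pts)
        lines[track_id] = f"LINESTRING({coords})"
    return lines

gps_fixes = [
    ("a", 2, 1, 1),
    ("a", 1, 0, 0),
    ("b", 1, 5, 5),
    ("b", 2, 6, 5),
    ("a", 3, 2, 1),
]
print(make_lines(gps_fixes))
```

The ordering step matters: a line aggregate connects points in the order it receives them, so unsorted input produces a zigzag rather than the actual path.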

**Geometry and Geography Processing (cont'd)**

- Clipping

- remove unwanted sections of a geometry

- ST_Difference(A,B): returns the portion of A that's not shared with B

- ST_SymDifference(A,B): returns the portions of A and B that are not shared
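Both operations are ordinary set operations applied to the points of the geometries. As a rough analogy only, the Python sketch below rasterizes two overlapping squares into sets of unit grid cells and applies Python's set operators; the `cells` helper is hypothetical, and real vector clipping works on exact boundaries rather than cells.

```python
def cells(minx, miny, maxx, maxy):
    """Unit grid cells covered by an axis-aligned rectangle."""
    return {(x, y) for x in range(minx, maxx) for y in range(miny, maxy)}

a = cells(0, 0, 4, 4)   # 16 cells
b = cells(2, 2, 6, 6)   # 16 cells, sharing a 2x2 corner with a

difference = a - b      # the part of A not shared with B
sym_difference = a ^ b  # the parts of A and B that are not shared
shared = a & b          # the intersection, for comparison

print(len(difference), len(sym_difference), len(shared))
```

Difference is not symmetric (`a - b` and `b - a` are different regions), while symmetric difference gives the same result whichever geometry comes first.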

**Geometry and Geography Processing (cont'd)**

- Splitting

- use a linestring to slice a polygon

- ST_Split: can only be used with single geometries, not collections, and the blade you use to cut has to be one dimension lower than what you are cutting up

**Lab: PostGIS Part 2**

**(continued)**


Slides for Lecture 4, Spring 2019, PHC6194 Spatial Epidemiology
