PHC6194 SPATIAL EPIDEMIOLOGY

Geocoding

Hui Hu Ph.D.

Department of Epidemiology

College of Public Health and Health Professions & College of Medicine

February 5, 2020

Geocoding

 

Lab: PostGIS TIGER Geocoder


Online Geocoding Service
 

Lab: Google Maps Geocoding API

Geocoding

  • The process of transforming a description of a location—such as a pair of coordinates, an address, or a name of a place—to a location on the earth's surface
     
  • An important step in spatial epidemiology

Address

2004 Mowry Road, 4th Floor, Gainesville, FL 32610

Street Number, Street Name, Street Type, Unit Number, City, State, ZIP Code

Source of Reference Map

  • Topologically Integrated Geographic Encoding and Referencing (TIGER)
    -  developed by the US Census Bureau
    -  free, nationwide data
     
  • TIGER includes key features of geographic interest across the entire US:
    -  political boundaries
    -  lakes
    -  reservations
    -  major and minor roads, rivers, etc.

Enabling TIGER Geocoder within PostGIS

sudo -u postgres psql -c "CREATE EXTENSION fuzzystrmatch; CREATE EXTENSION postgis_tiger_geocoder;" phc6194spr18db

  • The geocoder relies on fuzzy string matching
    -  to find streets with similar spellings
    -  provided by the fuzzystrmatch extension
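
To get a feel for what fuzzy string matching provides, the sketch below wraps two fuzzystrmatch functions, soundex and levenshtein, in a small shell helper. The helper name compare_street_names and the DB variable are made up for illustration, and the sketch assumes the current user can connect to the course database with psql.

```shell
# Database name taken from the commands in these slides; override with DB=... if yours differs.
DB=${DB:-phc6194spr18db}

# Hypothetical helper: compare two street-name spellings using fuzzystrmatch.
# soundex() returns a phonetic code; levenshtein() counts single-character edits.
# The SQL is sent on stdin so that psql substitutes the :'a' and :'b' variables.
compare_street_names() {
  psql -d "$DB" -v a="$1" -v b="$2" <<'SQL'
SELECT soundex(:'a')           AS code_a,
       soundex(:'b')           AS code_b,
       levenshtein(:'a', :'b') AS edit_distance;
SQL
}
```

For example, compare_street_names 'Mowry' 'Mowrey' compares two spellings of the same street; matching phonetic codes and a small edit distance suggest the two strings may refer to the same name.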

sudo -u postgres psql -c "GRANT USAGE ON SCHEMA tiger TO PUBLIC; GRANT USAGE ON SCHEMA tiger_data TO PUBLIC;
GRANT SELECT, REFERENCES, TRIGGER ON ALL TABLES IN SCHEMA tiger TO PUBLIC; GRANT SELECT, REFERENCES, TRIGGER ON ALL TABLES IN SCHEMA tiger_data TO PUBLIC;
GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA tiger TO PUBLIC; ALTER DEFAULT PRIVILEGES IN SCHEMA tiger_data
GRANT SELECT, REFERENCES ON TABLES TO PUBLIC;" phc6194spr18db

  • Grant all users access to the tiger and tiger_data schemas

Loading TIGER data

  • PostGIS has several built-in functions that generate scripts to download, decompress, and load TIGER data directly into the database
     
  • Steps:
    -  create a folder to store all the TIGER data
    -  generate scripts to download, decompress, and load data
    -  execute the scripts
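
The three steps can be sketched as a shell helper built on the PostGIS functions Loader_Generate_Nation_Script and Loader_Generate_Script. The helper name gen_tiger_scripts, the GISDATA variable, and the /gisdata default are assumptions made here; the generated scripts use the staging folder configured in the tiger schema, so check that it matches the folder you create. The sketch also assumes the current user can connect with psql and can write to the folder.

```shell
# Assumed settings: course database name and a staging folder for TIGER files.
DB=${DB:-phc6194spr18db}
GISDATA=${GISDATA:-/gisdata}

# Hypothetical helper: create the folder and generate the loader scripts
# for the nation-level tables and for one state (e.g. FL).
gen_tiger_scripts() {
  state=$1
  # Step 1: create a folder to store the TIGER data.
  mkdir -p "$GISDATA" || return 1
  # Step 2: generate the shell scripts ('sh' selects the Unix-style loader).
  psql -d "$DB" -tA -c "SELECT Loader_Generate_Nation_Script('sh');" \
    > "$GISDATA/nation_script_load.sh" || return 1
  psql -d "$DB" -tA -c "SELECT Loader_Generate_Script(ARRAY['$state'], 'sh');" \
    > "$GISDATA/${state}_load.sh"
}
```

After gen_tiger_scripts FL, step 3 is to review the generated files (they contain connection settings near the top) and run them, nation script first: sh /gisdata/nation_script_load.sh, then sh /gisdata/FL_load.sh.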

Normalizing Addresses

  • Address standardization / normalization:
    -  A preparatory step before geocoding that parses the address into components such as street numbers, directional prefixes, street names, suffixes, etc.
     
  • Normalizers:
    -  normalize_address
    -  pagc_normalize_address
    -  postal address geocoder
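
As an illustration of what normalization returns, the sketch below calls normalize_address on one address string and lists the parsed components. The helper name normalize_addr and the column aliases are chosen here for readability; the sketch assumes the TIGER geocoder extension is enabled and that the current user can connect with psql.

```shell
# Database name taken from the commands in these slides; override with DB=... if yours differs.
DB=${DB:-phc6194spr18db}

# Hypothetical helper: show how normalize_address splits one address string.
# The SQL is sent on stdin so that psql substitutes the :'addr' variable.
normalize_addr() {
  psql -d "$DB" -v addr="$1" <<'SQL'
SELECT na.address          AS street_number,
       na.predirabbrev     AS prefix_direction,
       na.streetname       AS street_name,
       na.streettypeabbrev AS street_type,
       na.internal         AS unit,
       na.location         AS city,
       na.stateabbrev      AS state,
       na.zip
FROM normalize_address(:'addr') AS na;
SQL
}
```

For example: normalize_addr '2004 Mowry Road, Gainesville, FL 32610'.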

Geocoding

  • Geocoding using address text:
    -  uses the normalize_address function to normalize the address by default
    -  can also switch to pagc_normalize_address
     
  • Geocoding using normalized addresses
     
  • Batch geocoding
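
The sketch below illustrates geocoding a single address string and batch geocoding a table with the geocode function. The helper names, and the table addresses_to_geocode with columns addid, address, rating, new_address, lon, and lat, are assumptions made for illustration; TIGER data for the relevant state must already be loaded.

```shell
# Database name taken from the commands in these slides; override with DB=... if yours differs.
DB=${DB:-phc6194spr18db}

# Hypothetical helper: geocode one address string and return the best match.
geocode_one() {
  psql -d "$DB" -v addr="$1" <<'SQL'
SELECT g.rating,                       -- lower rating = closer match
       ST_X(g.geomout) AS lon,
       ST_Y(g.geomout) AS lat,
       pprint_addy(g.addy) AS matched_address
FROM geocode(:'addr', 1) AS g;         -- 1 = keep only the top candidate
SQL
}

# Hypothetical helper: geocode up to N rows (default 100) that have no rating yet
# in the assumed table addresses_to_geocode. Unmatched rows get rating -1.
geocode_batch() {
  psql -d "$DB" -v n="${1:-100}" <<'SQL'
UPDATE addresses_to_geocode AS t
SET (rating, new_address, lon, lat) =
    (COALESCE(g.rating, -1), pprint_addy(g.addy),
     ST_X(g.geomout), ST_Y(g.geomout))
FROM (SELECT addid, address
      FROM addresses_to_geocode
      WHERE rating IS NULL
      ORDER BY addid
      LIMIT :n) AS a
LEFT JOIN LATERAL geocode(a.address, 1) AS g ON true
WHERE a.addid = t.addid;
SQL
}
```

For example, geocode_one '2004 Mowry Road, Gainesville, FL 32610' returns one candidate, while repeated calls to geocode_batch 100 work through a table in chunks, which keeps each transaction short.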

Lab: PostGIS TIGER Geocoder

git pull

Online Geocoding Service

Application Programming Interface

  • An application programming interface (API) is a set of subroutine definitions, protocols, and tools for building application software
     
  • In general terms, it's a set of clearly defined methods of communication between various software components
     
  • Common web service technologies:
    -  SOAP - Simple Object Access Protocol
    -  REST - Representational State Transfer

Google Geocoding API

Security and Rate Limiting

  • The data provided by these APIs is usually valuable
     
  • The data providers might
    -  limit the number of requests per day,
    -  or demand an API "key",
    -  or charge for usage
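
A minimal sketch of a keyed, throttled request to the Google Geocoding API using curl. The environment variable GOOGLE_API_KEY, the helper names, and the one-second default pause are assumptions made here; the actual quota and pricing depend on your Google account.

```shell
# Assumes the API key is stored in the environment variable GOOGLE_API_KEY
# (a name chosen here for illustration).

# Send one REST request; curl URL-encodes the address and appends it to the query string.
google_geocode() {
  curl -sS -G 'https://maps.googleapis.com/maps/api/geocode/json' \
       --data-urlencode "address=$1" \
       --data-urlencode "key=$GOOGLE_API_KEY"
}

# Hypothetical helper: geocode every non-empty line of a text file ($1),
# pausing $2 seconds (default 1) between requests to stay under the rate limit.
geocode_file() {
  delay=${2:-1}
  while IFS= read -r addr; do
    [ -n "$addr" ] || continue
    google_geocode "$addr"
    sleep "$delay"
  done < "$1"
}
```

For example, geocode_file addresses.txt 1 > results.json sends one request per second. Each response is a JSON document whose status field indicates success, with coordinates under results, geometry, location.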

Lab: Google Maps Geocoding API

Increase the limit here: e.g. 1000