Ann Gledson, Douglas Lowe, Manuele Reani, Caroline Jay, Dave Topping

The University of Manchester

Methods for dealing with sparse and incomplete environmental datasets

An open source tool-set for obtaining and working with environmental data sets

Filling the gaps 

Part 1: Doug 

Part 2: Ann

Part 2: Visualisation tool 

Part 1: Doug 

Cleaning and Imputation 

  • Remove duplicate/unphysical values
  • Select sites by minimum temporal data coverage
  • Impute missing values in the hourly time series with scikit-learn (Python)
  • Imputation method
    • Bayesian Ridge
    • Quantile Transformer preprocessing
  • Final data: daily mean / maximum values (or simple daily count)
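The cleaning and imputation steps above can be sketched with pandas and scikit-learn. This is a minimal illustration under stated assumptions, not the project's own tool set. It assumes a long-format table with hypothetical `site`, `time` and `value` columns, hypothetical physical bounds (0 to 1000) and a hypothetical 75% coverage threshold. It pairs `QuantileTransformer` preprocessing with an `IterativeImputer` that uses a `BayesianRidge` estimator, so that each site's gaps are predicted from the other sites' hourly values.

```python
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (required before importing IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.linear_model import BayesianRidge
from sklearn.preprocessing import QuantileTransformer


def clean(df: pd.DataFrame, lower: float = 0.0, upper: float = 1000.0) -> pd.DataFrame:
    """Drop duplicate (site, time) rows and values outside hypothetical physical bounds."""
    df = df.drop_duplicates(subset=["site", "time"])
    return df[df["value"].between(lower, upper)]


def to_hourly_wide(df: pd.DataFrame) -> pd.DataFrame:
    """One column per site, one row per hour; missing hours become NaN."""
    wide = df.pivot(index="time", columns="site", values="value")
    return wide.asfreq(pd.offsets.Hour())


def select_sites(wide: pd.DataFrame, min_coverage: float = 0.75) -> pd.DataFrame:
    """Keep only sites whose fraction of non-missing hours is at least min_coverage."""
    coverage = wide.notna().mean()
    return wide.loc[:, coverage >= min_coverage]


def impute(wide: pd.DataFrame) -> pd.DataFrame:
    """Quantile-transform, impute with Bayesian Ridge regression, then transform back."""
    qt = QuantileTransformer(n_quantiles=min(100, len(wide)), output_distribution="normal")
    scaled = qt.fit_transform(wide.to_numpy())  # NaNs are ignored in fit and kept in transform
    imputer = IterativeImputer(estimator=BayesianRidge(), max_iter=10, random_state=0)
    filled = imputer.fit_transform(scaled)
    return pd.DataFrame(qt.inverse_transform(filled), index=wide.index, columns=wide.columns)


def daily_summary(wide: pd.DataFrame) -> pd.DataFrame:
    """Daily mean and maximum per site (columns become (site, statistic) pairs)."""
    return wide.resample("D").agg(["mean", "max"])
```

Transforming to a normal distribution first keeps skewed pollutant concentrations from dominating the linear regressions, and the inverse transform maps imputed values back into the observed range.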

Part 2: Ann 

Regional estimations 

Concentric Regions method illustrated on fictional postcode regions

  • Regions where sensors exist: take the mean of their sensor values
  • Regions with no sensors: take the mean of the surrounding regions
    • Working outwards, ring by ring, until regions with sensors are found
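A minimal sketch of the Concentric Regions idea in plain Python. The adjacency map (`neighbours`), the per-region `readings` and the function name `concentric_estimate` are hypothetical. Averaging the region means of the first ring that contains any sensors is one reading of "take mean of surrounding regions"; the region_estimators tool may define or weight the rings differently.

```python
from statistics import mean
from typing import Dict, List, Optional, Set


def concentric_estimate(
    region: str,
    neighbours: Dict[str, Set[str]],
    readings: Dict[str, List[float]],
) -> Optional[float]:
    """Estimate a value for `region` (hypothetical helper).

    Regions with readings use their own mean; otherwise the search moves
    outwards ring by ring and averages the region means of the first ring
    that contains any readings. Returns None if no readings are reachable.
    """
    if readings.get(region):
        return mean(readings[region])
    visited = {region}
    ring = {region}
    while ring:
        # Next ring: unvisited neighbours of every region in the current ring.
        ring = {n for r in ring for n in neighbours.get(r, set())} - visited
        visited |= ring
        with_data = [mean(readings[r]) for r in sorted(ring) if readings.get(r)]
        if with_data:
            return mean(with_data)
    return None
```

For a chain of regions A-B-C-D-E with sensors only in A (mean 2.0) and E (mean 10.0), region C finds no sensors in its first ring (B, D) and so averages its second ring (A, E), giving 6.0.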

All code is on GitHub (see Links)

Regional estimations 

[Diagram: RegionEstimator, extended by the Simple Distance Estimator, the Concentric Regions Estimator and other estimators]

  • Current implementations are baselines only
  • Open source (MIT license)
  • Use as-is or extend as above
  • Improve estimation techniques
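Extending a base estimator could look roughly like the following sketch. All class and method names here are hypothetical and do not reproduce the region_estimators API. The nearest-centroid rule in `NearestRegionEstimator` is an assumed stand-in for a distance-based estimator, not necessarily what the Simple Distance Estimator does.

```python
from abc import ABC, abstractmethod
from math import dist
from statistics import mean
from typing import Dict, List, Optional, Tuple


class RegionEstimatorSketch(ABC):
    """Hypothetical base class holding the logic shared by all estimators."""

    def __init__(self, readings: Dict[str, List[float]]):
        self.readings = readings

    def estimate(self, region: str) -> Optional[float]:
        # Regions containing sensors: take the mean of their readings.
        if self.readings.get(region):
            return mean(self.readings[region])
        # Regions without sensors: each subclass decides how to fill them.
        return self.estimate_empty(region)

    @abstractmethod
    def estimate_empty(self, region: str) -> Optional[float]:
        ...


class NearestRegionEstimator(RegionEstimatorSketch):
    """Hypothetical subclass: use the nearest region (by centroid) that has readings."""

    def __init__(self, readings: Dict[str, List[float]], centroids: Dict[str, Tuple[float, float]]):
        super().__init__(readings)
        self.centroids = centroids

    def estimate_empty(self, region: str) -> Optional[float]:
        candidates = [r for r in self.readings if self.readings[r] and r in self.centroids]
        if not candidates or region not in self.centroids:
            return None
        nearest = min(candidates, key=lambda r: dist(self.centroids[r], self.centroids[region]))
        return mean(self.readings[nearest])
```

Keeping the "regions with sensors" case in the base class means a new estimator only has to decide how to fill regions that have no sensors.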

Part 2: Visualisation tool 

  • Open source (MIT license)
  • Usage:
    • Use the web app as-is (with our data)
    • Contribute to and extend the web app code
    • Clone the repository and load your own data
  • Quick overview of the data
  • Visual patterns:
    • estimation methods
    • filtering
  • Download data
  • Time series
  • Compare with your own data

Part 2: Minethegaps demo 

Links 

  • 2016-2019 datasets:
    • measurements (original and imputed)
      • https://zenodo.org/record/4416028
      • includes a link to the extraction and imputation tool set
    • regional estimations (from original and imputed measurements)
      • https://zenodo.org/record/4475652
      • includes a link to the region_estimators tool