Ann Gledson, Douglas Lowe, Manuele Reani, Caroline Jay, Dave Topping

The University of Manchester

Methods for dealing with sparse and incomplete environmental datasets

An open source tool-set for obtaining and working with environmental data sets

Filling the gaps 

Part 1: Doug 

Part 2: Ann

Part 2: Visualisation tool 

Part 1: Doug 

Cleaning and Imputation 

  • Remove duplicate/unphysical values
  • Select sites by minimum temporal data coverage
  • scikit-learn (Python) used to impute missing data in the hourly time series
  • Imputation method
    • Bayesian Ridge
    • Quantile Transformer preprocessing
  • Final data: daily mean / maximum values (or simple daily count)
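A minimal sketch of this cleaning and imputation chain, assuming a pandas DataFrame of hourly readings with one column per site. Several details are our assumptions, not necessarily the published tool set's configuration: the 75% coverage threshold, treating negative readings as unphysical, using scikit-learn's `IterativeImputer` as the wrapper that applies `BayesianRidge`, and the function names `clean`, `select_sites`, `impute` and `daily_summary`.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401  (exposes IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.linear_model import BayesianRidge
from sklearn.preprocessing import QuantileTransformer


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop duplicated timestamps and mask unphysical (assumed: negative) readings as missing."""
    df = df[~df.index.duplicated(keep="first")]
    return df.mask(df < 0)


def select_sites(df: pd.DataFrame, min_coverage: float = 0.75) -> pd.DataFrame:
    """Keep site columns whose fraction of non-missing hours meets the (assumed) threshold."""
    coverage = df.notna().mean()
    return df.loc[:, coverage >= min_coverage]


def impute(df: pd.DataFrame, seed: int = 0) -> pd.DataFrame:
    """Quantile-transform each site, impute with Bayesian Ridge, then map back to original units."""
    qt = QuantileTransformer(
        n_quantiles=min(1000, len(df)), output_distribution="normal", random_state=seed
    )
    scaled = qt.fit_transform(df.to_numpy())  # NaNs are ignored when fitting and kept in the output
    imputer = IterativeImputer(estimator=BayesianRidge(), max_iter=10, random_state=seed)
    filled = imputer.fit_transform(scaled)  # each site is regressed on the other sites
    return pd.DataFrame(qt.inverse_transform(filled), index=df.index, columns=df.columns)


def daily_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Aggregate the hourly series to daily mean and maximum values."""
    return df.resample("D").agg(["mean", "max"])


# Synthetic demo data: ten days of hourly readings for four fictional sites.
rng = np.random.default_rng(0)
idx = pd.date_range("2016-01-01", periods=240, freq="h")
diurnal = 20 + 5 * np.sin(2 * np.pi * np.arange(240) / 24)
raw = pd.DataFrame({f"site_{i}": diurnal + rng.normal(0, 1, 240) for i in range(3)}, index=idx)
raw["site_3"] = np.nan
raw.iloc[:40, 3] = 15.0          # site_3 has low coverage and should be dropped
raw.iloc[50:60, 0] = np.nan      # a ten-hour gap to impute
raw.iloc[10, 2] = -999.0         # an unphysical fill value
raw = pd.concat([raw, raw.iloc[[5]]]).sort_index()  # a duplicated timestamp

hourly = impute(select_sites(clean(raw)))
daily = daily_summary(hourly)
```

Fitting the imputer on quantile-transformed values keeps skewed pollutant distributions closer to the roughly Gaussian shape Bayesian Ridge assumes; the inverse transform also keeps imputed values inside each site's observed range.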

Part 2: Ann 

Regional estimations 

Concentric Regions method illustrated on fictional postcode regions

  • Regions where sensors exist: take mean
  • Regions with no sensors: take mean of surrounding regions
    • Working outwards until sensors found
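The concentric regions steps above can be sketched in plain Python. This assumes regions are described by an adjacency map and that, at each ring, the estimate is the mean of the region means of those neighbouring regions that contain sensors. The function name, the input format and the optional `max_rings` cut-off are illustrative assumptions; the published region_estimators tool may differ in detail.

```python
from statistics import mean
from typing import Dict, Iterable, List, Optional


def concentric_regions_estimate(
    neighbours: Dict[str, Iterable[str]],
    readings: Dict[str, List[float]],
    region: str,
    max_rings: Optional[int] = None,
) -> Optional[float]:
    """Estimate one region's value (hypothetical helper illustrating concentric regions).

    neighbours: region -> regions sharing a border (assumed input format)
    readings:   region -> sensor readings inside it (missing or empty means no sensors)
    """
    # Regions where sensors exist: take the mean of their own readings.
    if readings.get(region):
        return mean(readings[region])

    visited = {region}
    ring = {region}
    rings_searched = 0
    # Regions with no sensors: expand outwards one ring of neighbours at a time.
    while ring and (max_rings is None or rings_searched < max_rings):
        ring = {n for r in ring for n in neighbours.get(r, ())} - visited
        visited |= ring
        rings_searched += 1
        # Mean of the region means for every region in this ring that has sensors.
        ring_means = [mean(readings[r]) for r in ring if readings.get(r)]
        if ring_means:
            return mean(ring_means)
    return None  # no sensors reachable from this region


# Fictional postcode regions: a chain A-B-C-D with E attached to B.
NEIGHBOURS = {"A": {"B"}, "B": {"A", "C", "E"}, "C": {"B", "D"}, "D": {"C"}, "E": {"B"}}
READINGS = {"A": [10.0, 14.0], "D": [20.0]}

for name in sorted(NEIGHBOURS):
    print(name, concentric_regions_estimate(NEIGHBOURS, READINGS, name))
# A 12.0
# B 12.0
# C 20.0
# D 20.0
# E 12.0   (ring 1 = {B} has no sensors, so ring 2 = {A, C} is used)
```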

All code on GitHub (see links)

Regional estimations 

Diagram: RegionEstimator base class, extended by the Simple Distance Estimator, the Concentric Regions Estimator and other estimators

  • Current implementations are only baselines
  • Open source (MIT license)
  • Use as-is or extend as above
  • Improve estimation techniques
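The class structure on this slide places RegionEstimator as a base class that concrete estimators extend. The sketch below illustrates that extension pattern only. The `Sensor` record, the `get_estimate`/`get_estimations` method names and the nearest-sensor rule used for `SimpleDistanceEstimator` are hypothetical, not the API or algorithm of the region_estimators package.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from math import dist
from statistics import mean
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Sensor:
    """Hypothetical sensor record: containing region, location and a single reading."""
    region: str
    x: float
    y: float
    value: float


class RegionEstimator(ABC):
    """Hypothetical base class: stores sensors and region centroids; subclasses implement get_estimate."""

    def __init__(self, sensors: Iterable[Sensor], centroids: Dict[str, Tuple[float, float]]):
        self.sensors: List[Sensor] = list(sensors)
        self.centroids = dict(centroids)

    def sensors_in(self, region: str) -> List[Sensor]:
        return [s for s in self.sensors if s.region == region]

    @abstractmethod
    def get_estimate(self, region: str) -> Optional[float]:
        """Return an estimate for one region, or None if no estimate is possible."""

    def get_estimations(self) -> Dict[str, Optional[float]]:
        return {region: self.get_estimate(region) for region in self.centroids}


class SimpleDistanceEstimator(RegionEstimator):
    """Assumed rule: mean of local sensors, else mean of the sensor(s) nearest the region centroid."""

    def get_estimate(self, region: str) -> Optional[float]:
        local = self.sensors_in(region)
        if local:
            return mean(s.value for s in local)
        if not self.sensors:
            return None
        centre = self.centroids[region]
        distances = [dist(centre, (s.x, s.y)) for s in self.sensors]
        nearest = min(distances)
        return mean(s.value for s, d in zip(self.sensors, distances) if d == nearest)


CENTROIDS = {"A": (0.0, 0.0), "B": (10.0, 0.0), "C": (3.0, 0.0)}
SENSORS = [Sensor("A", 0.0, 1.0, 5.0), Sensor("A", 1.0, 0.0, 7.0), Sensor("B", 10.0, 0.0, 20.0)]
print(SimpleDistanceEstimator(SENSORS, CENTROIDS).get_estimations())
# {'A': 6.0, 'B': 20.0, 'C': 7.0}
```

With this layout a new method, such as a concentric regions variant, only needs to subclass RegionEstimator and override `get_estimate`.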

Part 2: Visualisation tool 

  • Open source (MIT license)
  • Usage:
    • Use the web app as-is (our data)
    • Contribute to and extend the web-app code
    • Clone the repository and load in your own data
  • Quick overview of the data
  • Visual patterns:
    • estimation methods
    • filtering
  • Download data
  • Time series
  • Compare with your own data

Part 2: Mine the Gaps demo

Links 

  • 2016-2019 datasets:
    • measurements (original and imputed)
      • https://zenodo.org/record/4416028
      • includes link to extraction and imputation tool set
    • regional estimations (from original and imputed)
      • https://zenodo.org/record/4475652
      • includes link to region_estimators tool

Environmental Intelligence REs and mine-the-gaps

By Ann Gledson


Whilst quantifying the impacts of detrimental air quality remains a global priority for both researchers and policy makers, transparent methodologies that support the collection and manipulation of such data are currently lacking. In support of the Britain Breathing citizen science project, which aims to investigate possible interactions between meteorological or air quality events and seasonal allergy symptoms, we have built a comprehensive dataset and a web application, ‘Mine the Gaps’, which present daily air quality, pollen and weather readings from the Automatic Urban and Rural Network (AURN) and Met Office monitoring stations across the United Kingdom for the years 2016 to 2019 inclusive. Measurement time series are rarely complete, so we have used machine learning techniques to fill gaps in these records and achieve the best possible coverage. To address sparse regional coverage, we propose a simple baseline method called concentric regions. ‘Mine the Gaps’ can be used to explore and compare the imputed dataset and the regional estimations graphically. The application code is designed to be reusable and flexible, so it can be used to interrogate other geographical datasets.
