Environmental data:
extraction and interpolation

An open-source toolset for obtaining environmental data sets and aligning them with your research requirements

Ann Gledson, Douglas Lowe, Manuele Reani, Caroline Jay, Dave Topping

The University of Manchester

Available data

  • Automatic Urban and Rural Network (AURN)
    • NOx, SO2, O3, NO2, PM10, PM2.5
  • Medical and Environmental Data (Mash-up) Infrastructure (MEDMI)
    • Meteorological: temp, pressure, dewpoint temp, relative humidity
    • Pollens: alnus, ambrosia, artemisia, ..., urtica
  • European Monitoring and Evaluation Programme (EMEP)
    • Model forecast data
    • NOx, NO2, SO2, O3, PM10, PM2.5
  • Complex extraction process:
    • Multiple data sources
    • Missing data (e.g. sensor down-time)
    • Variable UK area coverage

Cleaning and Imputation 

  • Remove duplicate/unphysical values
  • Select sites by minimum temporal data coverage
  • scikit-learn (Python) used to impute missing values in the hourly time series
  • Imputation method
    • Bayesian Ridge
    • Quantile Transformer preprocessing
  • Final data: daily mean / maximum values (or simple daily count)
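
The cleaning and imputation steps above could be sketched roughly as follows. This is a minimal illustration, not the tool set's actual code. It assumes the hourly data is a pandas DataFrame with one column per site and a datetime index, that "unphysical" means negative values (reasonable for concentrations, not for temperature), and that the Bayesian Ridge model is applied through scikit-learn's IterativeImputer. The function names and the 50% coverage threshold are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.linear_model import BayesianRidge
from sklearn.preprocessing import QuantileTransformer


def impute_hourly(hourly, min_coverage=0.5, random_state=0):
    """Clean an hourly (time x site) frame and fill its gaps (illustrative only)."""
    # Remove duplicate timestamps and mask unphysical readings
    # (assumption: negative values are unphysical).
    hourly = hourly[~hourly.index.duplicated(keep="first")]
    hourly = hourly.where(hourly >= 0, np.nan)

    # Select sites by minimum temporal coverage (fraction of observed hours).
    coverage = hourly.notna().mean()
    hourly = hourly.loc[:, coverage >= min_coverage]

    # Quantile Transformer preprocessing: map each site towards a normal
    # distribution. Missing values are ignored in fit and kept in transform.
    transformer = QuantileTransformer(
        n_quantiles=min(1000, len(hourly)),
        output_distribution="normal",
        random_state=random_state,
    )
    transformed = transformer.fit_transform(hourly.to_numpy(dtype=float))

    # Impute each site from the other sites with a Bayesian Ridge regressor.
    imputer = IterativeImputer(estimator=BayesianRidge(), random_state=random_state)
    filled = imputer.fit_transform(transformed)

    # Map the completed data back to the original units.
    restored = transformer.inverse_transform(filled)
    return pd.DataFrame(restored, index=hourly.index, columns=hourly.columns)


def daily_summary(hourly):
    """Return (daily mean, daily maximum) frames from hourly data."""
    return hourly.resample("D").mean(), hourly.resample("D").max()
```

Transforming before imputation keeps the regression from being dominated by the skewed tails typical of pollutant concentrations. The inverse transform then returns the filled series to its original scale.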

Regional estimations 

Diffusion method illustrated on fictional postcode regions

  • Regions where sensors exist: take mean
  • Regions with no sensors: take mean of surrounding regions
    • Working outwards until sensors found
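
One possible reading of the diffusion method above is sketched below. It assumes regions are given as an adjacency mapping and sensor readings as a mapping from region to a list of values. Both structures and the function name are hypothetical. "Working outwards" is interpreted as expanding one ring of neighbours at a time and averaging the regional means found in the first ring that contains any sensors.

```python
from statistics import mean


def estimate_regions(neighbours, sensor_values):
    """Estimate one value per region (illustrative interpretation only).

    neighbours: dict mapping region -> iterable of adjacent regions
    sensor_values: dict mapping region -> list of sensor readings
    """
    estimates = {}
    for region in neighbours:
        readings = sensor_values.get(region)
        if readings:
            # Region has sensors: take their mean.
            estimates[region] = mean(readings)
            continue

        # No sensors: expand outwards ring by ring until sensors are found.
        estimates[region] = None  # stays None if no sensor is reachable
        visited = {region}
        ring = {region}
        while ring:
            ring = {n for r in ring for n in neighbours.get(r, ())} - visited
            visited |= ring
            found = [mean(sensor_values[n]) for n in sorted(ring) if sensor_values.get(n)]
            if found:
                estimates[region] = mean(found)
                break
    return estimates
```

For a chain of regions A, B, C, D, E with sensors only in A and E, region C finds nothing in its first ring (B and D) and so takes the mean of A and E from its second ring.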

Filling the gaps 

Links 

  • 2016-2019 dataset: https://doi.org/10.5281/zenodo.4315224
    • Download the dataset
    • Links to extraction and imputation tool set
  • Mine the Gaps: minethegaps.manchester.ac.uk
    • Data visualisation and example use case