Environmental data: extraction and interpolation

An open-source tool set for obtaining environmental data sets and aligning them with your research requirements

Ann Gledson, Douglas Lowe, Manuele Reani, Caroline Jay, Dave Topping

The University of Manchester

Available data

  • Automatic Urban and Rural Network (AURN)
    • NOx, SO2, O3, NO2, PM10, PM2.5
  • Medical and Environmental Data (Mash-up) Infrastructure (MEDMI)
    • Meteorological: temp, pressure, dewpoint temp, relative humidity
    • Pollens: alnus, ambrosia, artemisia, ..., urtica
  • European Monitoring and Evaluation Programme (EMEP)
    • Model forecast data
    • NOx, NO2, SO2, O3, PM10, PM2.5
  • Complex extraction process:
    • Multiple data sources
    • Missing data (e.g. sensor down-time)
    • Variable UK area coverage

Cleaning and Imputation 

  • Remove duplicate/unphysical values
  • Select sites by minimum temporal data coverage
  • scikit-learn (Python) used to impute missing data from the hourly time series
  • Imputation method
    • Bayesian Ridge
    • Quantile Transformer preprocessing
  • Final data: daily mean / maximum values (or simple daily count)
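The cleaning and imputation steps above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the project's actual tool set: the use of scikit-learn's `IterativeImputer` to drive the Bayesian Ridge estimator, the treatment of negative readings as "unphysical", the 50% coverage threshold, and the names `clean`, `impute` and `MIN_COVERAGE` are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables the import below)
from sklearn.impute import IterativeImputer
from sklearn.linear_model import BayesianRidge
from sklearn.preprocessing import QuantileTransformer

MIN_COVERAGE = 0.5  # assumed: a site must report at least half of all hours


def clean(hourly: pd.DataFrame) -> pd.DataFrame:
    """Drop duplicate timestamps, mask negative values, drop sparse sites."""
    hourly = hourly[~hourly.index.duplicated(keep="first")]
    hourly = hourly.where(hourly >= 0)      # assumed rule: negative readings become NaN
    coverage = hourly.notna().mean()        # fraction of valid hours per site
    return hourly.loc[:, coverage >= MIN_COVERAGE]


def impute(hourly: pd.DataFrame) -> pd.DataFrame:
    """Fill gaps with Bayesian Ridge regression in quantile-transformed space."""
    qt = QuantileTransformer(
        n_quantiles=min(100, len(hourly)),
        output_distribution="normal",
        random_state=0,
    )
    transformed = qt.fit_transform(hourly.to_numpy())   # NaNs pass through unchanged
    imputer = IterativeImputer(estimator=BayesianRidge(), random_state=0)
    filled = qt.inverse_transform(imputer.fit_transform(transformed))
    return pd.DataFrame(filled, index=hourly.index, columns=hourly.columns)


# Synthetic hourly data for four hypothetical sites over four days.
rng = np.random.default_rng(0)
index = pd.date_range("2016-01-01", periods=96, freq=pd.Timedelta(hours=1))
base = 30 + 10 * np.sin(np.arange(96) * 2 * np.pi / 24)
hourly = pd.DataFrame(
    {site: base + rng.normal(0, 1, 96) for site in ("site_a", "site_b", "site_c")},
    index=index,
)
hourly.iloc[10:20, 0] = np.nan                   # simulated sensor down-time
hourly.iloc[5, 1] = -999.0                       # unphysical sentinel value
hourly["site_d"] = np.nan
hourly.iloc[:10, 3] = 25.0                       # site_d reports only 10 of 96 hours
hourly = pd.concat([hourly, hourly.iloc[[0]]])   # duplicated first timestamp

cleaned = clean(hourly)
filled = impute(cleaned)
daily_mean = filled.resample("D").mean()
daily_max = filled.resample("D").max()
```

Transforming to a normal distribution before regression, then inverting afterwards, keeps imputed values within the range observed at each site, which is one plausible reason for pairing a quantile transform with a linear estimator.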

Regional estimations 

Diffusion method illustrated on fictional postcode regions

  • Regions where sensors exist: take mean
  • Regions with no sensors: take mean of surrounding regions
    • Working outwards until sensors found
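One possible reading of the diffusion rules above, sketched on a fictional chain of postcode regions: a region with sensors takes the mean of its own readings, and a region without sensors takes the mean of the nearest ring of regions that do have sensors, expanding one ring at a time. The function names, the adjacency structure and the readings are all hypothetical, and the actual tool set may resolve the "working outwards" step differently.

```python
from statistics import mean


def estimate_region(region, sensor_means, neighbours):
    """Estimate one region's value, searching outwards ring by ring."""
    if region in sensor_means:
        return sensor_means[region]
    visited = {region}
    ring = {region}
    while ring:
        # Next ring: unvisited regions adjacent to the current ring.
        ring = {n for r in ring for n in neighbours.get(r, ()) if n not in visited}
        visited |= ring
        found = [sensor_means[r] for r in sorted(ring) if r in sensor_means]
        if found:
            return mean(found)
    return None  # no sensor reachable from this region


def estimate_all(readings, neighbours):
    """Map every region to its own sensor mean or a diffused estimate."""
    sensor_means = {r: mean(v) for r, v in readings.items() if v}
    return {r: estimate_region(r, sensor_means, neighbours) for r in neighbours}


# Fictional regions in a chain A-B-C-D-E, plus an isolated region F.
neighbours = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
    "F": [],
}
readings = {"A": [10.0, 14.0], "E": [20.0]}  # only A and E have sensors

estimates = estimate_all(readings, neighbours)
```

In this layout region C has no sensors in its immediate neighbours, so the search widens to the second ring (A and E) and averages their means. Region F has no neighbours at all and is left without an estimate.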

Filling the gaps 

Links 

  • 2016-2019 dataset: https://doi.org/10.5281/zenodo.4315224
    • Download the dataset
    • Links to extraction and imputation tool set
  • Mine the Gaps: minethegaps.manchester.ac.uk
    • Data visualisation and example use case

Environmental Intelligence POSTER

By Ann Gledson


Whilst quantifying the impacts of poor air quality remains a global priority for both researchers and policy makers, transparent methodologies that support the collection and manipulation of such data are currently lacking. To support research investigating the interplay between common gaseous and particulate pollutants, meteorology and biological particles, we present a comprehensive dataset containing daily air quality, pollen and weather readings from the Automatic Urban and Rural Network (AURN) and Met Office monitoring stations for the United Kingdom, covering the years 2016 to 2019 inclusive. We describe the data sources, how the data were cleaned, and how we dealt with missing values and sparse regional coverage. The resulting dataset, which integrates supplementary regional data with other relevant variables, including urban-ness and altitude, is designed to maximise its utility to those using air quality data in their research. Alongside the data we provide the tools used for collection, cleaning and estimation, anticipating that others may want not just to use our data as is, but also to extend and modify the approach for their own research. In addition, we introduce our 'Mine the Gaps' web application, which provides a graphical demonstration of how this dataset can be used.
