Ann Gledson, Douglas Lowe, Manuele Reani, Caroline Jay, Dave Topping

The University of Manchester

Making UK weather and air quality data available to a diverse research community

Filling the gaps 

Extraction and cleaning

Estimating Regions

Ready to combine with Britain Breathing (BB) data

Cleaning and Imputation 

  • Remove duplicate/unphysical values
  • Select sites by minimum temporal data coverage
  • scikit-learn (Python) used to impute missing data using hourly time series
  • Imputation method
    • Bayesian Ridge
    • Quantile Transformer preprocessing
  • Final data: daily mean / maximum values (or simple daily count)
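
The cleaning and imputation steps above can be sketched as follows. This is a minimal illustration on synthetic data, not the project's own tool set. Several details are assumptions made for the example: treating each site as a feature column, the 50% coverage threshold, the "no negative values" rule, and the use of scikit-learn's `IterativeImputer` as the wrapper around the Bayesian Ridge estimator.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.linear_model import BayesianRidge
from sklearn.preprocessing import QuantileTransformer

rng = np.random.default_rng(0)
N_DAYS = 30
hours = np.arange(N_DAYS * 24)

# Synthetic hourly readings: one column per (hypothetical) monitoring site.
diurnal = 20 + 8 * np.sin(2 * np.pi * hours / 24)
data = np.column_stack(
    [diurnal + offset + rng.normal(0, 1.5, hours.size) for offset in (0, 3, -2, 5)]
)
data[rng.random(data.shape) < 0.10] = np.nan  # scattered gaps
data[100:, 3] = np.nan                        # one site with poor coverage
data[5, 0] = -999.0                           # one unphysical reading

# 1. Remove unphysical values (assumed rule: readings cannot be negative).
data[data < 0] = np.nan

# 2. Keep only sites that meet a minimum temporal coverage (assumed: 50%).
MIN_COVERAGE = 0.5
coverage = 1.0 - np.isnan(data).mean(axis=0)
kept = data[:, coverage >= MIN_COVERAGE]

# 3. Quantile Transformer preprocessing; NaNs are ignored when fitting
#    and passed through unchanged when transforming.
qt = QuantileTransformer(n_quantiles=100, output_distribution="normal", random_state=0)
transformed = qt.fit_transform(kept)

# 4. Impute the gaps with Bayesian Ridge regression, then map the result
#    back to the original measurement scale.
imputer = IterativeImputer(estimator=BayesianRidge(), max_iter=10, random_state=0)
imputed = qt.inverse_transform(imputer.fit_transform(transformed))

# 5. Final data: daily mean and daily maximum per site.
daily = imputed.reshape(N_DAYS, 24, kept.shape[1])
daily_mean = daily.mean(axis=1)
daily_max = daily.max(axis=1)
```

In this sketch the fourth synthetic site is dropped by the coverage filter, so the daily arrays have one row per day and one column for each of the three remaining sites.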

Regional estimations 

Concentric Regions method illustrated on fictional postcode regions

  • Regions where sensors exist: take mean
  • Regions with no sensors: take mean of surrounding regions
    • Working outwards until sensors found
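
The two rules above can be sketched in a few lines of Python. This is an illustrative reading of the method, not the code of the region_estimators tool: the function name, the adjacency dictionary, and the choice to average the per-region means of a ring (rather than pooling all sensor readings) are assumptions.

```python
from statistics import mean


def concentric_estimate(region, neighbours, readings):
    """Estimate a value for `region`, working outwards ring by ring.

    neighbours: {region_id: [adjacent region_ids]}
    readings:   {region_id: [sensor values]} (regions without sensors are absent)
    Returns (estimate, ring_number), or (None, None) if no sensor is reachable.
    """
    visited = {region}
    ring = [region]
    ring_number = 0
    while ring:
        # Mean of each region in the current ring that has at least one sensor.
        region_means = [mean(readings[r]) for r in ring if readings.get(r)]
        if region_means:
            return mean(region_means), ring_number
        # No sensors found yet: step out to the next ring of unvisited neighbours.
        next_ring = []
        for r in ring:
            for n in neighbours.get(r, ()):
                if n not in visited:
                    visited.add(n)
                    next_ring.append(n)
        ring = next_ring
        ring_number += 1
    return None, None


# Fictional postcode regions and sensor readings.
neighbours = {
    "AB": ["CD"],
    "CD": ["AB", "EF", "GH"],
    "EF": ["CD", "IJ"],
    "GH": ["CD"],
    "IJ": ["EF"],
}
readings = {"EF": [10.0, 14.0], "GH": [20.0], "IJ": [30.0]}

print(concentric_estimate("EF", neighbours, readings))  # (12.0, 0): own sensors
print(concentric_estimate("CD", neighbours, readings))  # (16.0, 1): mean of EF and GH
print(concentric_estimate("AB", neighbours, readings))  # (16.0, 2): two rings out
```

Returning the ring number alongside the estimate gives a rough indication of how far the value was borrowed from, which may be useful when judging how much to trust it.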

Regional estimations 

Diagram: RegionEstimator, with Simple Distance Estimator, Concentric Regions Estimator and Other Estimators...

  • Current implementations are only baselines
  • Open source MIT license
  • Use as-is or extend as above
  • Improve estimation techniques
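
One way to picture "use as-is or extend" is a shared base class with one subclass per estimation technique. The sketch below is hypothetical and does not reproduce the API of the region_estimators package: the class names, the constructor arguments and the `estimate` method are all invented for illustration. The nearest-sensor class is only an assumed reading of what a simple distance estimator might do, and inverse-distance weighting is shown purely as an example of an "other" estimator that someone could add.

```python
from abc import ABC, abstractmethod
from math import hypot


class EstimatorSketch(ABC):
    """Hypothetical base class: holds shared data, subclasses supply the method."""

    def __init__(self, sensors, centroids):
        self.sensors = sensors      # {sensor_id: ((x, y), value)}
        self.centroids = centroids  # {region_id: (x, y)}

    @abstractmethod
    def estimate(self, region_id):
        """Return an estimated value for one region."""

    def estimate_all(self):
        return {region_id: self.estimate(region_id) for region_id in self.centroids}

    def _distances(self, region_id):
        # (distance from region centroid, sensor value) for every sensor.
        cx, cy = self.centroids[region_id]
        return [(hypot(x - cx, y - cy), value) for (x, y), value in self.sensors.values()]


class NearestSensorSketch(EstimatorSketch):
    """Assumed 'simple distance' behaviour: copy the nearest sensor's value."""

    def estimate(self, region_id):
        return min(self._distances(region_id))[1]


class InverseDistanceSketch(EstimatorSketch):
    """Example extension: weight every sensor by 1 / distance squared."""

    def estimate(self, region_id):
        pairs = self._distances(region_id)
        for distance, value in pairs:
            if distance == 0:  # sensor sits exactly on the centroid
                return value
        weights = [1 / distance ** 2 for distance, _ in pairs]
        return sum(w * v for w, (_, v) in zip(weights, pairs)) / sum(weights)


sensors = {"s1": ((0.0, 0.0), 10.0), "s2": ((10.0, 0.0), 20.0)}
centroids = {"A": (1.0, 0.0), "B": (9.0, 0.0), "C": (5.0, 0.0)}

nearest = NearestSensorSketch(sensors, centroids).estimate_all()
idw = InverseDistanceSketch(sensors, centroids).estimate_all()
```

With this layout, adding a new technique means writing one subclass with an `estimate` method. Shared steps such as looping over regions stay in the base class.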

Visualisation tool 

  • Open source MIT license
  • Usage:
    • Use web-app as-is (our data)
    • Contribute to and extend web-app code
    • Docker version: re-deploy and load in your data
  • Quick overview of data
  • Visual patterns
    • estimation methods
    • filtering
  • Download data
  • Time-series
  • Compare with own data

Data on Zenodo (open repository) 

Code on Github 

Code on Zenodo 

Link Github version to DOI 

Scientific Data paper  

Links

  • 2016-2019 environment datasets:
    • measurements (original and imputed)
      • https://zenodo.org/record/4416028
      • includes link to extraction and imputation tool set
    • regional estimations (from original and imputed)
      • https://zenodo.org/record/4475652
      • includes link to region_estimators tool
    • Scientific Data paper:
      • https://www.nature.com/articles/s41597-022-01135-6
  • Visualisation Tool:
    • http://minethegaps.manchester.ac.uk/
    • https://github.com/UoMResearchIT/mine-the-gaps
  • Britain Breathing:
    • http://britainbreathing.org/

Open Research Case Study

By Ann Gledson

Whilst quantifying the impacts of detrimental air quality remains a global priority for both researchers and policy makers, transparent methodologies that support the collection and manipulation of such data are currently lacking. In support of the Britain Breathing citizen science project, which aims to investigate possible interactions between meteorological or air quality events and seasonal allergy symptoms, we have built a comprehensive dataset and a web application, ‘Mine the Gaps’, which present daily air quality, pollen and weather readings from the Automatic Urban and Rural Network (AURN) and Met Office monitoring stations for the United Kingdom in the years 2016 to 2019 inclusive. Measurement time series are rarely complete, so we have used machine learning techniques to fill the gaps in these records and give the best coverage possible. To address sparse regional coverage, we propose a simple baseline method called concentric regions. ‘Mine the Gaps’ can be used to explore and compare the imputed dataset and the regional estimations graphically. The application code is designed to be reusable and flexible, so that it can be used to interrogate other geographical datasets.
