Enabling FAIR access to environmental data: The Research Software Engineer (RSE) role 

 

Ann Gledson, Research Software Engineer

The University of Manchester
 

https://slides.com/anngledson/fair-access-environment-data

FAIR Open Data

  • Reduce repetition of data cleaning and wrangling tasks
  • Focus more on key research questions
  • Reproducibility -> Trust!
  • Visibility:
    • E.g. Nature Scientific Data journal
    • Easy ingestion into platforms, e.g. NERC Data Centres
    • Citations
  • Collaboration, communities and networking 
  • Funding opportunities

Data wrangling

FAIR data principles

Nature Scientific Data: principles

Credit: Scientists who share their data in a FAIR manner deserve appropriate credit. 

Re-use: Standardized and detailed descriptions make data easier to find and reuse. 

Quality: Critical evaluation is needed to verify experimental rigour.

Discovery: Scientists should be able to easily find datasets that are relevant. 

Open: Scientists work best when they can easily connect and collaborate.

Service: Committed to providing excellent service to both authors and readers.

(Full version: https://www.nature.com/sdata/about/principles)

Example project

Environment data

  • Automatic Urban and Rural Network (AURN)
    • NOx, SO2, O3, NO2, PM10, PM2.5
  • Medical and Environmental Data (Mash-up) Infrastructure (MEDMI)
    • Meteorological: temp, pressure, dewpoint temp, relative humidity
    • Pollens: alnus, ambrosia, artemisia, ..., urtica
  • European Monitoring and Evaluation Programme (EMEP)
    • Model forecast data
    • NOx, NO2, SO2, O3, PM10, PM2.5
  • Complex extraction process:
    • Multiple data sources
    • Missing data (e.g. sensor down-time)
    • Variable UK area coverage
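
To make the "missing data" point concrete, here is a minimal sketch of filling short gaps (e.g. brief sensor down-time) in an hourly series by linear interpolation. It is an illustrative stand-in, not the imputation method used in the project; the function name `fill_short_gaps` and the `max_gap` threshold are hypothetical.

```python
from typing import List, Optional


def fill_short_gaps(
    values: List[Optional[float]], max_gap: int = 3
) -> List[Optional[float]]:
    """Linearly interpolate runs of None that are at most max_gap long.

    Gaps at the start or end of the series (no neighbour on one side)
    and gaps longer than max_gap are left as None.
    """
    filled = list(values)  # work on a copy; the input is not modified
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i  # first missing index of this gap
        while i < n and filled[i] is None:
            i += 1
        end = i  # first index after the gap
        gap_len = end - start
        if start == 0 or end == n or gap_len > max_gap:
            continue  # unbounded or too long: leave the gap untouched
        left, right = filled[start - 1], filled[end]
        step = (right - left) / (gap_len + 1)
        for k in range(gap_len):
            filled[start + k] = left + step * (k + 1)
    return filled


# Hypothetical hourly readings with two missing hours
print(fill_short_gaps([10.0, None, None, 16.0]))
```

Longer outages are deliberately left unfilled here, since a straight line across many hours would hide real variation; handling those needs a more considered approach than this sketch.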

Shared Data

NERC Digital Solutions Hub

  • 5-7 RSE roles at The University of Manchester (UoM)
    • Technical oversight
    • 4-6 to work with the NERC Data Centre RSEs
    • JASMIN Infrastructure RSE
  • Aims:
    • Convert a broad range of hub requirements into a set of tools that allow FAIR access to existing and future datasets.
    • Understand ways of working and ensure that the resulting tool-sets are robust and accessible for all.

Open Data links

  • FAIR Data: https://www.go-fair.org/fair-principles/

  • Data sharing platforms:

    • Zenodo: https://zenodo.org/

    • Figshare: https://figshare.com/

    • Dryad: https://datadryad.org/stash

  • Open Science webinar - Helen Glaves (British Geological Survey):

    • https://www.youtube.com/channel/UCv8vRIuTxCP-DgNMCq9KxqA/videos

Environment dataset links

  • 2016-2019 environment datasets:
    • measurements (original and imputed)
      • https://zenodo.org/record/4416028
      • includes link to extraction and imputation tool set
    • regional estimations (from original and imputed)
      • https://zenodo.org/record/4475652
      • includes link to region_estimators tool
    • Scientific Data paper:
      • https://www.nature.com/articles/s41597-022-01135-6
    • Visualisation Tool:
      • http://minethegaps.manchester.ac.uk/
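
For programmatic access to the Zenodo records listed above, a small sketch of turning a record page URL into its metadata URL may help. It assumes Zenodo's REST API serves record metadata at `https://zenodo.org/api/records/<id>`; the helper name `zenodo_api_url` is hypothetical.

```python
import re

# Assumed root of the Zenodo REST API records endpoint
ZENODO_API_ROOT = "https://zenodo.org/api/records"


def zenodo_api_url(record_url: str) -> str:
    """Build the metadata API URL for a Zenodo record page URL."""
    # Accept both the ".../record/<id>" and ".../records/<id>" page forms
    match = re.search(r"zenodo\.org/records?/(\d+)", record_url)
    if match is None:
        raise ValueError(f"Not a Zenodo record URL: {record_url}")
    return f"{ZENODO_API_ROOT}/{match.group(1)}"


print(zenodo_api_url("https://zenodo.org/record/4416028"))
```

Under that assumption, the resulting URL can be requested with any HTTP client to retrieve the record's metadata as JSON; no network call is made in the sketch itself.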