Enabling FAIR access to environmental data: The Research Software Engineer (RSE) role 

 

Ann Gledson, Research Software Engineer

The University of Manchester
 

https://slides.com/anngledson/fair-access-environment-data

FAIR Open Data

  • Reduce repetition of data cleaning and wrangling tasks
  • Focus more on key research questions
  • Reproducibility -> Trust!
  • Visibility:
    • E.g. Nature Scientific Data journal
    • Easy ingest into platforms: NERC Data Centres
    • Citations
  • Collaboration, communities and networking 
  • Funding opportunities

Data wrangling

FAIR data principles

Nature Scientific Data: principles

Credit: Scientists who share their data in a FAIR manner deserve appropriate credit. 

Re-use: Standardized and detailed descriptions make data easier to find and reuse. 

Quality: Critical evaluation is needed to verify experimental rigour.

Discovery: Scientists should be able to easily find datasets that are relevant. 

Open: Scientists work best when they can easily connect and collaborate.

Service: Committed to providing excellent service to both authors and readers.

(Full version: https://www.nature.com/sdata/about/principles)

Example project

Environment data

  • Automatic Urban and Rural Network (AURN)
    • NOx, SO2, O3, NO2, PM10, PM2.5
  • Medical and Environmental Data (Mash-up) Infrastructure (MEDMI)
    • Meteorological: temp, pressure, dewpoint temp, relative humidity
    • Pollens: alnus, ambrosia, artemisia, ..., urtica
  • European Monitoring and Evaluation Programme (EMEP)
    • Model forecast data
    • NOx, NO2, SO2, O3, PM10, PM2.5
  • Complex extraction process:
    • Multiple data sources
    • Missing data (e.g. sensor down-time)
    • Variable UK area coverage
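
To make the missing-data point concrete, here is a minimal sketch of gap filling in an hourly sensor series. It is an illustration only and is not the imputation method used in the project's tool set: the function name `fill_gaps`, the sample NO2 readings and the choice of linear interpolation are all assumptions made for this example. Each value is returned with a flag so that original and imputed measurements stay distinguishable.

```python
from typing import List, Optional, Tuple


def fill_gaps(values: List[Optional[float]]) -> List[Tuple[Optional[float], bool]]:
    """Return (value, imputed) pairs for a regularly spaced series.

    Gaps (None) that have a known reading on both sides are filled by
    linear interpolation. Gaps at either end are left as None.
    Hypothetical helper, not part of any published tool.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    result: List[Tuple[Optional[float], bool]] = []
    for i, v in enumerate(values):
        if v is not None:
            result.append((v, False))  # original measurement
            continue
        before = [k for k in known if k < i]
        after = [k for k in known if k > i]
        if before and after:
            lo, hi = before[-1], after[0]
            weight = (i - lo) / (hi - lo)
            estimate = values[lo] + weight * (values[hi] - values[lo])
            result.append((estimate, True))  # imputed value
        else:
            result.append((None, False))  # no neighbour on one side
    return result


# Made-up hourly NO2 readings; None marks sensor down-time.
hourly_no2 = [20.0, None, None, None, 28.0, None]
filled = fill_gaps(hourly_no2)
```

Keeping the imputed flag next to each value lets a downstream user choose between the original series and the gap-filled one.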

Shared Data

NERC Digital Solutions Hub

  • 5-7 RSE roles at UoM
    • Technical oversight
    • 4-6 to work with the NERC Data Centre RSEs
    • JASMIN Infrastructure RSE
       
  • Aims:
    • Convert a broad range of hub requirements into a set of tools that allow FAIR access to existing and future datasets.
    • Understand ways of working and ensure that the resulting tool-sets are robust and accessible for all.

Open Data links

  • FAIR Data: https://www.go-fair.org/fair-principles/

  • Data sharing platforms:

    • Zenodo: https://zenodo.org/

    • Figshare: https://figshare.com/

    • Dryad: https://datadryad.org/stash

  • Open Science webinar - Helen Glaves (British Geological Survey):

    • https://www.youtube.com/channel/UCv8vRIuTxCP-DgNMCq9KxqA/videos

Environment dataset links

  • 2016-2019 environment datasets:
    • measurements (original and imputed)
      • https://zenodo.org/record/4416028
      • includes link to extraction and imputation tool set
    • regional estimations (from original and imputed)
      • https://zenodo.org/record/4475652
      • includes link to region_estimators tool
    • Scientific Data paper:
      • https://www.nature.com/articles/s41597-022-01135-6
    • Visualisation Tool:
      • http://minethegaps.manchester.ac.uk/

FAIR Access to Environment Data (RSE role)

By Ann Gledson

Facilitating access to data using open, sustainable software development methods is an important part of the RSE role. Working on a recent Turing Institute-funded project, RSEs at the University of Manchester created an open-source environmental dataset and tool set, which have been published in the Nature Scientific Data journal. On the Digital Solutions Hub we will continue this work, converting a broad range of hub requirements into a set of tools that allow FAIR (Findable, Accessible, Interoperable, Reusable) access to existing and future datasets.
