benchcab

A testing framework for the CABLE land surface model

Open the slide deck at: slides.com/seanbryan/benchcab

Issues with non-standardised testing

  • Sufficient code coverage is not guaranteed
  • Easy to cherry-pick test cases
  • Not guaranteed to be reproducible
  • Test results often incomparable
  • Takes time and effort to set up

The ideal testing framework

  • Extensive coverage across test cases
  • Reproducible tests
  • Automated, with minimal overhead to run
  • Results are comparable
  • Test cases are extensible to new model features

benchcab

  • An attempt to converge on the ideal testing framework for CABLE
  • A command line tool written in Python
  • Initially developed by Martin De Kauwe
  • Currently developed and maintained by CABLE's user community and ACCESS-NRI
  • See GitHub

Usage

1. Getting access on NCI

module use /g/data/hh5/public/modules
module load conda/analysis3-unstable

2. Clone the example work directory

mkdir -p /scratch/nf33/$USER
cd /scratch/nf33/$USER
git clone git@github.com:CABLE-LSM/bench_example.git
cd bench_example

3. Configure the configuration file

vi config.yaml

4. Run the tests

benchcab run

Setup

1. Connect to NCI

ssh -Y <userID>@gadi.nci.org.au

2. Getting access to benchcab on NCI

module use /g/data/hh5/public/modules
module load conda/analysis3-unstable

3. Clone the example work directory

mkdir -p /scratch/nf33/$USER
cd /scratch/nf33/$USER
git clone git@github.com:CABLE-LSM/bench_example.git
# git clone https://github.com/CABLE-LSM/bench_example.git
cd bench_example

Exercise

Running benchcab for:

  • the Tumbarumba experiment
  • using the nf33 project
  • to compare CABLE trunk and branches/Users/ccc561/demo-branch

Exercise

1. Edit config.yaml to the following:

project: nf33

experiment: AU-Tum

realisations: [
  {
    path: "trunk",
  },
  {
    path: "branches/Users/ccc561/demo-branch",
  }
]

modules: [
  intel-compiler/2021.1.1,
  netcdf/4.7.4,
  openmpi/4.1.0
]

2. Run benchcab with the verbose flag enabled:

benchcab run --verbose

Automated workflow

[Workflow diagram: the automated benchcab steps and the single manual step]

The example work directory

  • config.yaml
    • Tests are fully reproducible from the config file
    • See the documentation for all configuration options.
  • namelists
    • Contains the "base" namelist files used by all tests. Each test applies a test-specific "patch" on top of the base namelist files.
$ tree bench_example/
bench_example/
├── config.yaml
├── LICENSE
├── namelists
│   ├── cable.nml
│   ├── cable_soilparm.nml
│   └── pft_params.nml
└── README.md

Flux site tests

  • Flux site configuration: running CABLE at a single point, forced by observed eddy covariance data
  • Test coverage over different science configurations and sites
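
The sites that are run are selected by the experiment entry in config.yaml, which can name a single site or a multi-site test suite. A minimal sketch (the multi-site group name below is illustrative; see the documentation for the exact values):

experiment: AU-Tum                 # a single flux site, as in the exercise above
# experiment: forty-two-site-test  # a multi-site test suite (illustrative name)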

Directory structure and files

  • benchcab creates the following directory structure on success
.
├── benchmark_cable_qsub.sh
├── benchmark_cable_qsub.sh.o<jobid>
├── rev_number-1.log
├── runs
│   └── site
│       ├── logs
│       │   ├── <task>_log.txt
│       │   └── ...
│       ├── outputs
│       │   ├── <task>_out.nc
│       │   └── ...
│       ├── analysis
│       │   └── bitwise-comparisons
│       └── tasks
│           ├── <task>
│           │   ├── cable (executable)
│           │   ├── cable.nml
│           │   ├── cable_soilparm.nml
│           │   └── pft_params.nml
│           └── ...
└── src
    ├── CABLE-AUX
    ├── <realisation-0>
    └── <realisation-1>

Exercise: model evaluation

  • Upload model outputs from the run directory to modelevaluation.org.
    • See the user guide for detailed instructions. Note: Do not go through step 3 (create a model profile). In step 4.c, use "trunk false_feature demo" for the model profile
  • If your benchcab instance has not finished, you can use the example outputs in:
/scratch/nf33/ccc561/standard_evaluation/runs/site/outputs

Full workflow demo:

Making tests extensible to new features

  • benchcab can support new namelist parameters introduced by a code change through the patch option
  • patch specifies any branch-specific namelist parameters, which are then applied to the namelist files of tasks that run the corresponding branch
patch: {
  cable: {
    cable_user: {
      MY_NEW_FEATURE: True
    }
  }
}
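
As a rough sketch of what this produces, assuming the usual offline CABLE namelist layout (a &cable group with cable_user% entries, as in namelists/cable.nml), a task running that branch would end up with something like the following in its cable.nml (MY_NEW_FEATURE is the hypothetical parameter from the example above):

&cable
  ! ...base settings from namelists/cable.nml are unchanged...
  cable_user%MY_NEW_FEATURE = .true.   ! added by the branch-specific patch
/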

Exercise:

Use a different potential evaporation scheme for one branch only

  • Run benchcab with the following namelist patch in config.yaml:

realisations: [
  {
    path: "trunk",
  },
  {
    path: "branches/Users/sb8430/test-branch",
    patch: {
      cable: {
        cable_user: {
          SSNOW_POTEV: "P-M"
        }
      }
    }
  }
]

Custom science configurations

Users can specify their own science configurations in config.yaml:

science_configurations: [
  { # S0 configuration
    cable: {
      cable_user: {
        GS_SWITCH: "medlyn",
        FWSOIL_SWITCH: "Haverd2013"
      }
    }
  },
  { # S1 configuration
    cable: {
      cable_user: {
        GS_SWITCH: "leuning",
        FWSOIL_SWITCH: "Haverd2013"
      }
    }
  }
]

Running with CABLE version 2.x

See the documentation for potential gotchas.

Future work

🚧  Test suites for:

      ◦ Global/regional simulations (offline CABLE)

      ◦ Global/regional simulations (online CABLE)

      ◦ CABLE-CASA-CNP

🚧  A standard set of science configurations.

🚧  Fortran code coverage analysis.

🚧  Automated model evaluation step.

🚧  Tests for different compilers and compiler flags.

🚧  Updates to analysis plots for flux site tests.

🚧  Model evaluation with ILAMB

Summary

  • benchcab is a testing framework that is:
    • Easy to use
    • Standardised
    • Automated
    • Reproducible
  • benchcab should be used by everyone who contributes to CABLE development!

We need your feedback!

Model evaluation

Regression testing

  • By default, benchcab runs bitwise comparison checks on model outputs across model versions.
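
The comparison output lives under runs/site/analysis/bitwise-comparisons (see the directory structure earlier in the deck). A manual spot check of two task outputs could look like the following, assuming the nccmp utility is available:

nccmp -d runs/site/outputs/<task-a>_out.nc runs/site/outputs/<task-b>_out.nc
# -d compares variable data; no differences reported means the two outputs are bitwise identical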

Demo

realisations: [
  {
    path: "trunk",
    name: "trunk_head"
  },
  {
    path: "trunk",
    name: "trunk_r9468",
    revision: 9468
  }
]

Motivation / rationale

  1. Evaluation: We want CABLE developers to easily evaluate the impact of their code additions to CABLE

  2. Standardisation: We want a standardised evaluation of CABLE to allow for comparison

  3. Automation: CABLE is highly configurable; running tests manually for every possible configuration is time-consuming

What benchcab does (hopefully):

  1. benchcab provides a fast, standardised way for developers of CABLE to evaluate how code changes affect the model output.
  2. benchcab automates running tests against many possible configurations of CABLE
  3. benchcab checks out, builds, and runs all tests in a reproducible environment

Configurability

  • benchcab is intended to have limited configurability by design
  • Current configurable options include:
    • Running specific steps of the workflow in isolation (see the sketch after this list)
    • Choosing the modules used for compiling CABLE
    • Running CABLE with a subset of the configurations
    • Adding a "branch specific" namelist patch
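
A minimal sketch of the first option, using the sub-commands listed in the benchcab -h output shown later in this deck:

benchcab checkout                 # checkout step: fetch the CABLE branches and CABLE-AUX
benchcab build                    # build step: compile CABLE for each realisation
benchcab fluxnet-setup-work-dir   # set up the runs/site directory tree
benchcab fluxnet-run-tasks        # run the FLUXNET tasks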

Modes currently supported

  • Regression mode: run two models with the same science options
  • New feature mode: run two models, one with a science patch added to the science options
  • Ensemble mode: run any number of models with custom science options

Will be required for code submissions

Necessary to support old versions

Better set of analysis plots

  • Summary plots: measure improvement of new branch vs control
    • for all sites, all metrics, all variables
      • same per variable
      • same per PFT
      • same per PFT, per variable
  • "PLUMBER" plots
  • for outputs that contain a single flux site: timeseries, seasonal and diurnal cycle, model/obs scatter plots, ...
  • Use full set of variables available at flux sites

Demo: flux site tests

Running benchcab

  • benchcab is executed via the command line
  • sub-commands allow specific steps of the workflow to be run in isolation
$ benchcab -h
usage: benchcab [-h] [-V] command ...

benchcab is a tool for evaluation of the CABLE land surface model.

positional arguments:
  command
    run                 Run all test suites for CABLE.
    fluxnet             Run the fluxnet test suite for CABLE.
    checkout            Run the checkout step in the benchmarking workflow.
    build               Run the build step in the benchmarking workflow.
    fluxnet-setup-work-dir
                        Run the work directory setup step of the fluxnet command.
    fluxnet-run-tasks   Run the fluxnet tasks of the main fluxnet command.
    spatial             Run the spatial tests only.

optional arguments:
  -h, --help            Show this help message and exit.
  -V, --version         Show program's version number and exit.

Running benchcab

  • To run the full test suite, use the benchcab run command:
$ benchcab run
Creating src directory: /scratch/tm70/sb8430/bench_example/src
Checking out repositories...
Successfully checked out trunk at revision 9550
Successfully checked out test-branch at revision 9550
Successfully checked out CABLE-AUX at revision 9550
Writing revision number info to rev_number-1.log
Compiling CABLE serially for realisation trunk...
Successfully compiled CABLE for realisation trunk
Compiling CABLE serially for realisation test-branch...
Successfully compiled CABLE for realisation test-branch
Setting up run directory tree for FLUXNET tests...
Creating runs/site/logs directory: /scratch/tm70/sb8430/bench_example/runs/site/logs
Creating runs/site/outputs directory: /scratch/tm70/sb8430/bench_example/runs/site/outputs
Creating runs/site/tasks directory: /scratch/tm70/sb8430/bench_example/runs/site/tasks
Creating task directories...
Setting up tasks...
Successfully setup FLUXNET tasks
Creating PBS job script to run FLUXNET tasks on compute nodes: benchmark_cable_qsub.sh
PBS job submitted: 82479088.gadi-pbs
The CABLE log file for each task is written to runs/site/logs/<task_name>_log.txt
The CABLE standard output for each task is written to runs/site/tasks/<task_name>/out.txt
The NetCDF output for each task is written to runs/site/outputs/<task_name>_out.nc
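
The FLUXNET tasks run inside the submitted PBS job, so progress can be followed with the usual PBS commands on Gadi, e.g.:

qstat -u $USER   # check the status of the submitted benchmark_cable_qsub.sh job
# task-level progress appears in runs/site/logs/<task_name>_log.txt as the job runs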

Using modelevaluation.org

  1. Open and log into modelevaluation.org
  2. Navigate to the NRI Land testing workspace
  3. Create a model profile for the two model branches you are using
  4. Create a model output and upload the outputs in runs/site/outputs/ under your work directory
  5. Launch the analysis
