benchcab
A testing framework for the CABLE land surface model
Open the slide deck at: slides.com/seanbryan/benchcab
Issues with non-standardised testing
- Sufficient code coverage is not guaranteed
- Easy to cherry-pick test cases
- Not guaranteed to be reproducible
- Test results often incomparable
- Takes time and effort to set up
The ideal testing framework
- Extensive coverage across test cases
- Reproducible tests
- Automated, with minimal overhead to run
- Results are comparable
- Test cases are extensible to new model features
benchcab
- An attempt to converge on the ideal testing framework for CABLE
- A command line tool written in Python
- Initially developed by Martin De Kauwe
- Currently developed and maintained by CABLE's user community and ACCESS-NRI
- See GitHub: github.com/CABLE-LSM/benchcab
Usage
1. Getting access on NCI
module use /g/data/hh5/public/modules
module load conda/analysis3-unstable
2. Clone the example work directory
mkdir -p /scratch/nf33/$USER
cd /scratch/nf33/$USER
git clone git@github.com:CABLE-LSM/bench_example.git
cd bench_example
3. Edit the configuration file
vi config.yaml
4. Run the tests
benchcab run
Setup
1. Connect to NCI
ssh -Y <userID>@gadi.nci.org.au
2. Getting access to benchcab on NCI
module use /g/data/hh5/public/modules
module load conda/analysis3-unstable
3. Clone the example work directory
mkdir -p /scratch/nf33/$USER
cd /scratch/nf33/$USER
git clone git@github.com:CABLE-LSM/bench_example.git
# git clone https://github.com/CABLE-LSM/bench_example.git
cd bench_example
Exercise
Running benchcab for:
- the Tumbarumba (AU-Tum) experiment
- using the nf33 project
- to compare CABLE trunk and branches/Users/ccc561/demo-branch
Exercise
1. Edit config.yaml to the following:
project: nf33
experiment: AU-Tum
realisations: [
  {
    path: "trunk",
  },
  {
    path: "branches/Users/ccc561/demo-branch",
  }
]
modules: [
  intel-compiler/2021.1.1,
  netcdf/4.7.4,
  openmpi/4.1.0
]
2. Run benchcab with the verbose flag enabled:
benchcab run --verbose
Automated workflow
(workflow diagram: the steps benchcab automates, with the manual step highlighted)
The example work directory
- config.yaml
  - Tests are fully reproducible from the config file.
  - See the documentation for all configuration options.
- namelists
  - Contains the "base" namelist files used by all tests. Each test applies a test-specific "patch" on top of the base namelist files.
$ tree bench_example/
bench_example/
├── config.yaml
├── LICENSE
├── namelists
│ ├── cable.nml
│ ├── cable_soilparm.nml
│ └── pft_params.nml
└── README.md
Flux site tests
- Flux site configuration: running CABLE at a single point, forced by observed eddy covariance data
- Test coverage over different science configurations and sites (see the sketch below)
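A sketch of how sites are selected via the experiment key in config.yaml. AU-Tum is the single-site case used earlier in this deck; the multi-site experiment name in the comment is an assumption to verify against the benchcab documentation.
# Single flux site (Tumbarumba), as in the earlier exercise:
experiment: AU-Tum
# Multi-site groups are also available; the name below is assumed, check the docs:
# experiment: forty-two-site-test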
Directory structure and files
- benchcab creates the following directory structure on success
.
├── benchmark_cable_qsub.sh
├── benchmark_cable_qsub.sh.o<jobid>
├── rev_number-1.log
├── runs
│ └── site
│ ├── logs
│ │ ├── <task>_log.txt
│ │ └── ...
│ ├── outputs
│ │ ├── <task>_out.nc
│ │ └── ...
│ ├── analysis
│ │ └── bitwise-comparisons
│ └── tasks
│ ├── <task>
│ │ ├── cable (executable)
│ │ ├── cable.nml
│ │ ├── cable_soilparm.nml
│ │ └── pft_params.nml
│ └── ...
└── src
├── CABLE-AUX
├── <realisation-0>
└── <realisation-1>
- See the user guide for a description of all files/directories
Exercise: model evaluation
- Upload model outputs from the run directory to modelevaluation.org.
- See the user guide for detailed instructions. Note: Do not go through step 3 (create a model profile). In step 4.c, use "trunk false_feature demo" for the model profile
- If your benchcab instance has not finished, you can use the example outputs in:
/scratch/nf33/ccc561/standard_evaluation/runs/site/outputs
Full workflow demo
Making tests extensible to new features
- benchcab can support new namelist parameters introduced by a code change through the patch option
- patch specifies any branch-specific namelist parameters, which are then applied to the namelist files for tasks that run the corresponding branch
patch: {
  cable: {
    cable_user: {
      MY_NEW_FEATURE: True
    }
  }
}
Exercise:
Use a different potential evaporation scheme for one branch only
- Run benchcab with the following namelist patch in config.yaml:
realisations: [
  {
    path: "trunk",
  },
  {
    path: "branches/Users/sb8430/test-branch",
    patch: {
      cable: {
        cable_user: {
          SSNOW_POTEV: "P-M"
        }
      }
    }
  }
]
Custom science configurations
science_configurations: [
  { # S0 configuration
    cable: {
      cable_user: {
        GS_SWITCH: "medlyn",
        FWSOIL_SWITCH: "Haverd2013"
      }
    }
  },
  { # S1 configuration
    cable: {
      cable_user: {
        GS_SWITCH: "leuning",
        FWSOIL_SWITCH: "Haverd2013"
      }
    }
  }
]
Users can specify their own science configurations in config.yaml
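For example, a user-defined entry could swap in a different soil water stress option. A hedged sketch, assuming "standard" is an accepted FWSOIL_SWITCH value in your CABLE version:
science_configurations: [
  { # user-defined configuration (illustrative only)
    cable: {
      cable_user: {
        GS_SWITCH: "medlyn",
        FWSOIL_SWITCH: "standard"  # assumed option; check the CABLE namelist documentation
      }
    }
  }
]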
Running with CABLE version 2.x
See the documentation for potential gotchas.
Future work
🚧 Test suites for:
◦ Global/regional simulations (offline CABLE)
◦ Global/regional simulations (online CABLE)
◦ CABLE-CASA-CNP
🚧 A standard set of science configurations.
🚧 Fortran code coverage analysis.
🚧 Automated model evaluation step.
🚧 Tests for different compilers and compiler flags.
🚧 Updates to analysis plots for flux site tests.
🚧 Model evaluation with ILAMB.
Summary
- benchcab is a testing framework that is:
- Easy to use
- Standardised
- Automated
- Reproducible
- benchcab should be used by everyone who contributes to CABLE development!
We need your feedback!
GitHub issues: github.com/CABLE-LSM/benchcab/issues
ACCESS-Hive forum: forum.access-hive.org.au
Model evaluation
- Comparison analysis plots are done on modelevaluation.org
- The benchcab documentation has detailed instructions.
Regression testing
- By default, benchcab runs bitwise comparison checks on model outputs across model versions.
Demo
realisations: [
  {
    path: "trunk",
    name: "trunk_head"
  },
  {
    path: "trunk",
    name: "trunk_r9468",
    revision: 9468
  }
]
Motivation / rationale
- Evaluation: We want CABLE developers to easily evaluate the impact of their code additions to CABLE
- Standardisation: We want a standardised evaluation of CABLE to allow for comparison
- Automation: CABLE is highly configurable; running tests manually for every possible configuration is time consuming
What benchcab does (hopefully):
- benchcab provides a fast, standardised way for developers of CABLE to evaluate how code changes affect the model output.
- benchcab automates running tests against many possible configurations of CABLE.
- benchcab checks out, builds and runs all tests in a reproducible environment.
Configurability
- benchcab is intended to have limited configurability by design
- Current configurable options include:
  - Running specific steps of the workflow in isolation
  - Choosing the modules used for compiling CABLE
  - Running CABLE with a subset of the configurations
  - Adding a "branch specific" namelist patch (see the config sketch below)
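A sketch of how these options map onto config.yaml keys, using only keys and values that appear elsewhere in this deck; running individual workflow steps is done with the CLI sub-commands shown later, not through the config file.
# Modules used for compiling CABLE:
modules: [
  intel-compiler/2021.1.1,
  netcdf/4.7.4,
  openmpi/4.1.0
]
# Run CABLE with a subset of science configurations:
science_configurations: [
  { cable: { cable_user: { GS_SWITCH: "medlyn", FWSOIL_SWITCH: "Haverd2013" } } }
]
# A "branch specific" namelist patch attached to one realisation:
realisations: [
  { path: "trunk" },
  {
    path: "branches/Users/ccc561/demo-branch",
    patch: { cable: { cable_user: { MY_NEW_FEATURE: True } } }
  }
]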
Modes currently supported
- Regression mode: run 2 models with the same science options
- New feature mode: run 2 models, one with a science patch added to the science options
- Ensemble mode: run any number of models with custom science options (sketched below)
Will be required for code submissions
Necessary to support old versions
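To make the modes concrete, here are hedged config.yaml fragments for each mode (three separate snippets), assembled entirely from the realisation and science-configuration examples elsewhere in this deck:
# Regression mode: two realisations of the same code, identical science options
# (trunk HEAD vs. a pinned trunk revision, as in the regression demo).
realisations: [
  { path: "trunk", name: "trunk_head" },
  { path: "trunk", name: "trunk_r9468", revision: 9468 }
]
---
# New feature mode: one realisation carries a branch-specific namelist patch.
realisations: [
  { path: "trunk" },
  {
    path: "branches/Users/sb8430/test-branch",
    patch: { cable: { cable_user: { SSNOW_POTEV: "P-M" } } }
  }
]
---
# Ensemble mode: any number of realisations, run over custom science options.
science_configurations: [
  { cable: { cable_user: { GS_SWITCH: "medlyn", FWSOIL_SWITCH: "Haverd2013" } } },
  { cable: { cable_user: { GS_SWITCH: "leuning", FWSOIL_SWITCH: "Haverd2013" } } }
]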
Better set of analysis plots
- Summary plots: measure improvement of new branch vs control
  - for all sites, all metrics, all variables
  - same per variable
  - same per PFT
  - same per PFT, per variable
- "PLUMBER" plots
  - for outputs that contain a single flux site: timeseries, seasonal and diurnal cycle, model/obs scatter plots, ...
- Use the full set of variables available at flux sites
Demo: flux site tests
Running benchcab
- benchcab is executed via the command line
- Sub-commands allow specific steps of the workflow to be run in isolation
$ benchcab -h
usage: benchcab [-h] [-V] command ...
benchcab is a tool for evaluation of the CABLE land surface model.
positional arguments:
command
run Run all test suites for CABLE.
fluxnet Run the fluxnet test suite for CABLE.
checkout Run the checkout step in the benchmarking workflow.
build Run the build step in the benchmarking workflow.
fluxnet-setup-work-dir
Run the work directory setup step of the fluxnet command.
fluxnet-run-tasks Run the fluxnet tasks of the main fluxnet command.
spatial Run the spatial tests only.
optional arguments:
-h, --help Show this help message and exit.
-V, --version Show program's version number and exit.
Running benchcab
- To run the full test suite, use the benchcab run command:
$ benchcab run
Creating src directory: /scratch/tm70/sb8430/bench_example/src
Checking out repositories...
Successfully checked out trunk at revision 9550
Successfully checked out test-branch at revision 9550
Successfully checked out CABLE-AUX at revision 9550
Writing revision number info to rev_number-1.log
Compiling CABLE serially for realisation trunk...
Successfully compiled CABLE for realisation trunk
Compiling CABLE serially for realisation test-branch...
Successfully compiled CABLE for realisation test-branch
Setting up run directory tree for FLUXNET tests...
Creating runs/site/logs directory: /scratch/tm70/sb8430/bench_example/runs/site/logs
Creating runs/site/outputs directory: /scratch/tm70/sb8430/bench_example/runs/site/outputs
Creating runs/site/tasks directory: /scratch/tm70/sb8430/bench_example/runs/site/tasks
Creating task directories...
Setting up tasks...
Successfully setup FLUXNET tasks
Creating PBS job script to run FLUXNET tasks on compute nodes: benchmark_cable_qsub.sh
PBS job submitted: 82479088.gadi-pbs
The CABLE log file for each task is written to runs/site/logs/<task_name>_log.txt
The CABLE standard output for each task is written to runs/site/tasks/<task_name>/out.txt
The NetCDF output for each task is written to runs/site/outputs/<task_name>_out.nc
Using modelevaluation.org
- Open and log into modelevaluation.org
- Navigate to the NRI Land testing workspace
- Create a model profile for the two model branches you are using
- Create a model output and upload the outputs in runs/site/outputs/ under your work directory
- Launch the analysis