PFHub Updates and Ideas

Daniel Wheeler

Phase Field Workshop, 2023-08-16

Long-Term Vision for PFHub

  • Central registry of curated phase field results, with a CLI/Web/API tool to view and query the results in many ways
  • The website comprises the registry, examples of using the tool (with views), and phase field guide materials

 

 

 

Please add ideas and open discussions on usnistgov/pfhub

Updates

  • Update environments
    • Nix environments updated
    • Native Python (Pip, Conda, Mamba) environment implemented / tested
  • Implementing CLI tool for PFHub
  • Zenodo submission process
  • Papermill / Jupyter website build
    • BM1, BM2, BM3, BM4, BM7, BM8
  • Generated new schema using LinkML (Trevor)

FAIR Improvements

  • New schema in human-readable form using LinkML
    • Seamless conversion between schema.org, JSON Schema, JSON-LD, YAML
    • MaRDA working group for more general phase field schema (tomorrow)
  • Require implementation to be in publicly accessible archive
  • Encourage use of FAIR4RS principles (metadata.json in repo)
  • Require curation of result data on Zenodo (or similar)
  • Improve data accessibility using Jupyter Notebooks and Python utility (in place of JS stack and custom apps)

PFHub CLI Tool

  • Makes the submission process more coherent
    • CLI tool can be used by a user on the local filesystem for submissions
      • View / compare results on local FS as they appear on website
    • Use same CLI tool for automated submissions and continuous integration
    • Make Zenodo / PFHub submission a seamless process
  • Not quite finished for this meeting
  • First version on PyPI soon
  • Eventually the CLI will be subsumed by an upload notebook hosted locally or via a cloud service

What Next?

  • Near-term tasks
    • Split repository into python-pfhub and web
    • Finish new upload process with upload notebook built using CLI
    • Use Jupyter Book to build website (or equivalent)
    • Update BM5, BM6 and include BM9
    • Small things
      • DOIs for benchmark notebooks with appropriate authors #1515
  • Aspirational goals
    • Cloud-hosted submission notebook
    • Increase data capabilities, metrics and display
      • Field data
    • Expand beyond Zenodo

[Figure: submission process diagram. Labels: user, local FS, pfhub CLI, notebooks, pfhub.yaml, CSV, VTK, ...; GitHub review, reviewer, pfhub CLI, Surge, Actions, website; hosted submission notebook.]

PFHub CLI

$ pfhub --help
Usage: pfhub [OPTIONS] COMMAND [ARGS]...

  Submit results to PFHub and manipulate PFHub data

Options:
  --help  Show this message and exit.

Commands:
  convert             Convert between formats (old PFHub schema to new...
  convert-to-old      Convert between formats (new PFHub schema to old...
  download            Download a PFHub record
  download-zenodo     Download a Zenodo record
  generate-notebook   Generate the comparison notebook for the...
  generate-yaml       Infer a PFHub YAML file from GitHub ID, ORCID,...
  submit              Submit to Zenodo and open PFHub PR
  submit-from-zenodo  Submit an existing Zenodo record to PFHub
  test                Run the PFHub tests
  upload              Upload PFHub data to Zenodo
  validate            Validate a YAML file with the new PFHub schema
  validate-old        Validate a YAML file with the old PFHub schema

  See the documentation at
  https://github.com/usnistgov/pfhub/blob/master/CLI.md (under construction)

Data collection

  • What data do we currently collect?
    • Provenance
    • Benchmark ID
    • Implementation repository
    • Post-processed outputs
    • Limited metadata
      • run time
      • memory usage
      • simulation time
    • Limited hardware data
    • Limited software data
    • Dataframe style data / time series
      • time vs free energy

Old schema

---
_id: 93113e00-0c5e-11e8-b653-4f1ed6519c85
benchmark:
  id: 3a
  version: '1'
data:
- name: run_time
  values:
  - sim_time: '1500'
    wall_time: '266576'
- name: memory_usage
  values:
  - unit: KB
    value: '2000000'
- name: efficiency
  transform:
  - as: x
    expr: "1. / datum.time_ratio"
    type: formula
  - as: y
    expr: datum.memory
    type: formula
  values:
  - memory: 2000000.0
    time_ratio: 0.005626
- description: Free energy versus time
  format:
    parse:
      free_energy: number
      time: number
    type: csv
  name: free_energy
  transform:
  - as: x
    expr: datum.time
    type: formula
  - as: y
    expr: datum.free_energy
    type: formula
  type: line
  url: https://gist.githubusercontent.com/wd15/7e06a3141a6fbf317b1daf39ef1b0fbb/raw/2b802a25593501b30cb0d8648a3b588dc54b36f7/time.csv
- description: Solid fraction versus time
  format:
    parse:
      solid_fraction: number
      time: number
    type: csv
  name: solid_fraction
  transform:
  - as: x
    expr: datum.time
    type: formula
  - as: y
    expr: datum.solid_fraction
    type: formula
  type: line
  url: https://gist.githubusercontent.com/wd15/7e06a3141a6fbf317b1daf39ef1b0fbb/raw/2b802a25593501b30cb0d8648a3b588dc54b36f7/time.csv
- description: Tip position versus time
  format:
    parse:
      time: number
      tip_position: number
    type: csv
  name: tip_position
  transform:
  - as: x
    expr: datum.time
    type: formula
  - as: y
    expr: datum.tip_position
    type: formula
  type: line
  url: https://gist.githubusercontent.com/wd15/7e06a3141a6fbf317b1daf39ef1b0fbb/raw/2b802a25593501b30cb0d8648a3b588dc54b36f7/time.csv
- description: Zero contour at t=1500s
  format:
    parse:
      x: number
      y: number
    type: csv
  name: phase_field_1500
  type: line
  url: https://gist.githubusercontent.com/wd15/7e06a3141a6fbf317b1daf39ef1b0fbb/raw/d0dcd61541604127a16c017891dcda1577c92997/contour.csv
date: 1518046097
layout: post
message: ' '
metadata:
  author:
    email: daniel.wheeler2@gmail.com
    first: Daniel
    github_id: wd15
    last: Wheeler
  hardware:
    acc_architecture: none
    clock_rate: '3.2'
    cores: '1'
    cpu_architecture: x86_64
    nodes: '1'
    parallel_model: serial
  implementation:
    container_url: ''
    name: fipy
    repo:
      url: https://gist.github.com/wd15/7e06a3141a6fbf317b1daf39ef1b0fbb
      version: fc9134b08a9c
  summary: FiPy implementation of benchmark 3a on a 960x960 grid. The shape of the
    dendrite doesn't look exactly like the version in the notebook.
  timestamp: 2 February, 2018
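
The `efficiency` entry in the record above appears to be derived from the `run_time` and `memory_usage` entries: 1500 / 266576 ≈ 0.005627, which agrees with the stored `time_ratio` of 0.005626 up to rounding. The sketch below works through that arithmetic. It assumes that `time_ratio` means simulated time divided by wall time, which the slides do not state, and the function name `efficiency_point` is made up for illustration.

```python
def efficiency_point(sim_time, wall_time, memory_kb):
    """Return the (x, y) point plotted for the old-schema `efficiency` entry.

    Assumption: time_ratio = sim_time / wall_time. The two transforms in the
    record then give x = 1 / time_ratio (expr "1. / datum.time_ratio") and
    y = memory (expr "datum.memory").
    """
    time_ratio = sim_time / wall_time
    return 1.0 / time_ratio, memory_kb


# The old schema stores these values as strings, so convert before use.
x, y = efficiency_point(float("1500"), float("266576"), float("2000000"))
print(f"wall seconds per simulated second: {x:.1f}, memory: {y:.0f} KB")
```

Under that assumption the x value is the wall-clock cost of one unit of simulated time, so points lower and further left in the efficiency plot would be cheaper runs.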

New schema

id: fipy_1a_tkphd_pysparse
benchmark_problem: 1a.0
contributors:
- id: https://orcid.org/0000-0002-2920-8302
  name: Trevor Keller
  affiliation:
  - NIST
  email: trevor.keller@nist.gov
- id: https://orcid.org/0000-0002-2653-7418
  name: Daniel Wheeler
  affiliation:
  - NIST
  email: daniel.wheeler@nist.gov
date_created: '2017-01-10'
implementation:
  url: https://github.com/usnistgov/FiPy-spinodal-decomposition-benchmark/tree/master/periodic
results:
  fictive_time: 53333.3
  hardware:
    architecture: cpu
    cores: 1
    nodes: 1
  memory_in_kb: 28600
  time_in_s: 157187
  dataset_temporal:
  - name: free_energy.csv
    columns:
    - time
    - free_energy
schema:
  url: https://github.com/usnistgov/pfhub-schema/tree/e0010d9/project
summary: Serial Travis CI benchmark with FiPy, periodic domain
framework:
- url: https://www.ctcms.nist.gov/fipy/
  name: FiPy
  download: https://github.com/usnistgov/fipy
  version: 3.1.2
- url: https://github.com/usnistgov/steppyngstounes
  name: steppyngstounes
  download: https://github.com/usnistgov/steppyngstounes
  version: '0.0'

Data queries

How can we currently query the data?

  • Plot the dendrite tip position for all results for a particular code
  • Show results only from a particular author
  • Show results that use >N nodes
  • Show results that use a GPU
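
As a rough illustration, the author, node-count and GPU queries above can be written as filters over new-schema records held as Python dicts. The first record uses values from the new-schema example. The second is a made-up placeholder, and the `"gpu"` architecture value is an assumption, since the slides only show `cpu`. The helper names are hypothetical and are not part of the pfhub package.

```python
records = [
    {   # values taken from the new-schema example
        "id": "fipy_1a_tkphd_pysparse",
        "contributors": [{"name": "Trevor Keller"}, {"name": "Daniel Wheeler"}],
        "results": {"hardware": {"architecture": "cpu", "cores": 1, "nodes": 1}},
    },
    {   # hypothetical placeholder, not a real PFHub result
        "id": "placeholder_gpu_result",
        "contributors": [{"name": "A. N. Other"}],
        "results": {"hardware": {"architecture": "gpu", "cores": 1, "nodes": 4}},
    },
]


def by_contributor(records, name):
    """Results that list the given contributor name."""
    return [r for r in records
            if any(c["name"] == name for c in r["contributors"])]


def with_more_nodes_than(records, n):
    """Results that ran on more than n nodes."""
    return [r for r in records if r["results"]["hardware"]["nodes"] > n]


def using_gpu(records):
    """Results whose hardware architecture is recorded as 'gpu' (assumed value)."""
    return [r for r in records
            if r["results"]["hardware"]["architecture"] == "gpu"]


print([r["id"] for r in by_contributor(records, "Daniel Wheeler")])
print([r["id"] for r in with_more_nodes_than(records, 1)])
print([r["id"] for r in using_gpu(records)])
```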

Better ways to query the data

  • Show dendrite curves for all finite difference methods
  • Show the transient free energy curve for all results with nominal O(h⁴) accuracy
  • Show the resource usage per nominal DOF
  • Characterize Ostwald ripening simulations by a length scale associated with the microstructure
  • Color data points in an efficiency plot based on numerical method or meshing strategy
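
Two of these queries, resource usage per nominal DOF and grouping results by numerical method, could look something like the sketch below if the records carried the extra metadata. The `discretization` and `nominal_dof` fields are hypothetical (neither schema shown here has them), and every number is a placeholder rather than a PFHub result.

```python
from collections import defaultdict

# Hypothetical records with placeholder numbers.
results = [
    {"id": "a", "discretization": "FD", "nominal_dof": 10_000,
     "memory_in_kb": 50_000, "time_in_s": 200.0},
    {"id": "b", "discretization": "FE", "nominal_dof": 40_000,
     "memory_in_kb": 120_000, "time_in_s": 1_000.0},
    {"id": "c", "discretization": "FD", "nominal_dof": 20_000,
     "memory_in_kb": 80_000, "time_in_s": 300.0},
]


def per_dof(result, key):
    """Normalize a resource measure (e.g. memory_in_kb) by the nominal DOF."""
    return result[key] / result["nominal_dof"]


def group_ids_by(results, key):
    """Group result ids by a metadata field, e.g. to pick one plot color per group."""
    groups = defaultdict(list)
    for result in results:
        groups[result[key]].append(result["id"])
    return dict(groups)


for result in results:
    print(result["id"],
          per_dof(result, "memory_in_kb"),
          per_dof(result, "time_in_s"))
print(group_ids_by(results, "discretization"))
```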

Improve schema

What else should we collect?

  • Descriptions of discretization methods (FD, FV, FE, Spectral, ...)
  • Nominal order of accuracy, nominal DOF, meshing strategy
  • Description of linear solvers, preconditioners, non-linear strategy
  • Time stepping strategy (implicit v explicit)
  • Field variables at various times for statistical post-processing
  • Links to input files (rather than just the implementation repository)
  • Container (Docker build, Singularity build, Nix build)
  • What about the actual problem being solved?
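
One way to start the discussion is to write several of the candidate items down as a typed record. The dataclass below is only a sketch: the class name `NumericalMethod`, every field name, and the validation rule are hypothetical and are not part of the LinkML schema. The example values are placeholders.

```python
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class NumericalMethod:
    """Hypothetical metadata block for the numerical method behind a result."""
    discretization: str                      # e.g. "FD", "FV", "FE", "spectral"
    time_stepping: str                       # e.g. "implicit" or "explicit"
    order_of_accuracy: Optional[int] = None  # nominal spatial order
    nominal_dof: Optional[int] = None        # nominal degrees of freedom
    meshing: Optional[str] = None            # e.g. "uniform", "adaptive"
    linear_solver: Optional[str] = None
    preconditioner: Optional[str] = None
    input_file_urls: List[str] = field(default_factory=list)

    def __post_init__(self):
        # Reject non-positive counts; unset (None) values are allowed.
        for name in ("order_of_accuracy", "nominal_dof"):
            value = getattr(self, name)
            if value is not None and value <= 0:
                raise ValueError(f"{name} must be positive, got {value}")


# Placeholder values for illustration only.
method = NumericalMethod(
    discretization="FV",
    time_stepping="implicit",
    order_of_accuracy=2,
    nominal_dof=10_000,
    meshing="uniform",
)
print(asdict(method))
```

Keeping most fields optional mirrors the idea that PFHub could encourage this metadata before requiring it.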

 

Schema Discussion

Could we spend some time right now collecting ideas?

Think about these questions.

  • How can we improve the PFHub phase field schema?
  • What data and metadata should PFHub require?
  • How would you imagine querying the data? What questions would you ask?
  • What publication could you generate given better data / metadata?

Collect some ideas here: https://github.com/usnistgov/pfhub/discussions/1514

 

Guyer rant: let's ask the question about how to use the data rather than waste time redesigning schemas

 

 

 

 
