PFHub UpDates and Ideas
Daniel Wheeler
Phase Field Workshop, 2023-08-16
Long Term Vision for pfhub
- Central registry of phase field curated results with CLI/Web/API tool to view and query results in many way.
- Website is the registry and examples of using the tool with views + phase field guide materials
Please add ideas and open discussions on usnistgov/pfhub
UPdates
- Update environments
- Nix environments updated
- Native Python (Pip, Conda, Mamba) environment implemented / tested
- Implementing CLI tool for PFHub
- Zenodo submission process
- Papermill / Jupyter website build
- BM1, BM2, BM3, BM4, BM7, BM8
- Generated new schema using linkml (Trevor)
Fair Improvements
- New schema in human readable form using LinkML
- Seamless conversion between schema.org, json-schema, jsonld, yaml
- MaRDA working group for more general phase field schema (tomorrow)
- Require implementation to be in publicly accessible archive
- Encourage use of FAIR4RS principles (metadata.json in repo)
- Require curation of result data on Zenodo (or similar)
- Improve data accessibility using Jupyter Notebooks and Python utility (in place of JS stack and custom apps)
PFHUB CLI TOOL
- Makes the submission process more coherent
- CLI tool can be used by a user on the local filesystem for submissions
- View / compare results on local FS as they appear on website
- Use same CLI tool for automated submissions and continuous integration
- Implement Zenodo / PFHub submission to be a seamless process
- CLI tool can be used by a user on the local filesystem for submissions
- Not quite finished for this meeting
- First version on PyPI soon
- Eventually the CLI will be subsumed by an upload notebook hosted locally or via cloud service
What Next?
- What next?
- Split repository into python-pfhub and web
- Finish new upload process with upload notebook built using CLI
- Use Jupyter Book to build website (or equivalent)
- Update BM5, BM6 and include BM9
- Small things
- DOIs for benchmark notebooks with appropriate authors #1515
- Aspirational goals
- Cloud-hosted submission notebook
- Increase data capabilities, metrics and display
- Field data
- Expand beyond Zenodo
LOCAL FS
pfhub CLI
USER
submission process
notebooks
PFHUB.YAml
csv, VTK, ...
Github REview
pfhub CLI
reviewer
surge
ACTions
website
HOSTED Submission Notebook
PFHUB CLI
$ pfhub --help
Usage: pfhub [OPTIONS] COMMAND [ARGS]...
Submit results to PFHub and manipulate PFHub data
Options:
--help Show this message and exit.
Commands:
convert Convert between formats (old PFHub schema to new...
convert-to-old Convert between formats (new PFHub schema to old...
download Download a PFHub record
download-zenodo Download a Zenodo record
generate-notebook Generate the comparison notebook for the...
generate-yaml Infer a PFHub YAML file from GitHub ID, ORCID,...
submit Submit to Zenodo and open PFHub PR
submit-from-zenodo Submit an existing Zenodo record to PFHub
test Run the PFHub tests
upload Upload PFHub data to Zenodo
validate Validate a YAML file with the new PFHub schema
validate-old Validate a YAML file with the old PFHub schema
See the documentation at
https://github.com/usnistgov/pfhub/blob/master/CLI.md (under construction)
- What data to we currently collect?
- Provenance
- Benchmark ID
- Implementation repository
- Post-processed outputs
- Limited metadata
- run time
- memory usage
- simulation time
- Limited hardware data
- Limited software data
- Dataframe style data / time series
- time vs free energy
Data collection
OLD schema
---
_id: 93113e00-0c5e-11e8-b653-4f1ed6519c85
benchmark:
id: 3a
version: '1'
data:
- name: run_time
values:
- sim_time: '1500'
wall_time: '266576'
- name: memory_usage
values:
- unit: KB
value: '2000000'
- name: efficiency
transform:
- as: x
expr: "1. / datum.time_ratio"
type: formula
- as: y
expr: datum.memory
type: formula
values:
- memory: 2000000.0
time_ratio: 0.005626
- description: Free energy versus time
format:
parse:
free_energy: number
time: number
type: csv
name: free_energy
transform:
- as: x
expr: datum.time
type: formula
- as: y
expr: datum.free_energy
type: formula
type: line
url: https://gist.githubusercontent.com/wd15/7e06a3141a6fbf317b1daf39ef1b0fbb/raw/2b802a25593501b30cb0d8648a3b588dc54b36f7/time.csv
- description: Solid fraction versus time
format:
parse:
solid_fraction: number
time: number
type: csv
name: solid_fraction
transform:
- as: x
expr: datum.time
type: formula
- as: y
expr: datum.solid_fraction
type: formula
type: line
url: https://gist.githubusercontent.com/wd15/7e06a3141a6fbf317b1daf39ef1b0fbb/raw/2b802a25593501b30cb0d8648a3b588dc54b36f7/time.csv
- description: Tip position versus time
format:
parse:
time: number
tip_position: number
type: csv
name: tip_position
transform:
- as: x
expr: datum.time
type: formula
- as: y
expr: datum.tip_position
type: formula
type: line
url: https://gist.githubusercontent.com/wd15/7e06a3141a6fbf317b1daf39ef1b0fbb/raw/2b802a25593501b30cb0d8648a3b588dc54b36f7/time.csv
- description: Zero contour at t=1500s
format:
parse:
x: number
y: number
type: csv
name: phase_field_1500
type: line
url: https://gist.githubusercontent.com/wd15/7e06a3141a6fbf317b1daf39ef1b0fbb/raw/d0dcd61541604127a16c017891dcda1577c92997/contour.csv
date: 1518046097
layout: post
message: ' '
metadata:
author:
email: daniel.wheeler2@gmail.com
first: Daniel
github_id: wd15
last: Wheeler
hardware:
acc_architecture: none
clock_rate: '3.2'
cores: '1'
cpu_architecture: x86_64
nodes: '1'
parallel_model: serial
implementation:
container_url: ''
name: fipy
repo:
url: https://gist.github.com/wd15/7e06a3141a6fbf317b1daf39ef1b0fbb
version: fc9134b08a9c
summary: FiPy implementation of benchmark 3a on a 960x960 grid. The shape of the
dendrite doesn't look exactly like the version in the notebook.
timestamp: 2 February, 2018
New schema
id: fipy_1a_tkphd_pysparse
benchmark_problem: 1a.0
contributors:
- id: https://orcid.org/0000-0002-2920-8302
name: Trevor Keller
affiliation:
- NIST
email: trevor.keller@nist.gov
- id: https://orcid.org/0000-0002-2653-7418
name: Daniel Wheeler
affiliation:
- NIST
email: daniel.wheeler@nist.gov
date_created: '2017-01-10'
implementation:
url: https://github.com/usnistgov/FiPy-spinodal-decomposition-benchmark/tree/master/periodic
results:
fictive_time: 53333.3
hardware:
architecture: cpu
cores: 1
nodes: 1
memory_in_kb: 28600
time_in_s: 157187
dataset_temporal:
- name: free_energy.csv
columns:
- time
- free_energy
schema:
url: https://github.com/usnistgov/pfhub-schema/tree/e0010d9/project
summary: Serial Travis CI benchmark with FiPy, periodic domain
framework:
- url: https://www.ctcms.nist.gov/fipy/
name: FiPy
download: https://github.com/usnistgov/fipy
version: 3.1.2
- url: https://github.com/usnistgov/steppyngstounes
name: steppyngstounes
download: https://github.com/usnistgov/steppyngstounes
version: '0.0'
Data queries
How can we currently query the data
- Plot the dendrite tip position for all results for a particular code
- Show results only from a particular author
- Show results that use >N nodes
- Show results that use a GPU
Better ways to query the data
- Show dendrite curves for all finite difference methods
- Show the transient free energy curve for all results with nominal O(h⁴) accuracy
- Show the resource usage per nominal DOF
- Characterize Ostwald ripening simulations by a length scale associated with the microstructure
- Color data points in an efficiency plot based on numerical method or meshing strategy
Improve schema
What else should we collect?
- Descriptions of discretization methods (FD, FV, FE, Spectral, ...)
- Nominal order of accuracy, nominal DOF, meshing strategy
- Description of linear solvers, preconditioners, non-linear strategy
- Time stepping strategy (implicit v explicit)
- Field variables at various times for statistical post-processing
- Links to input files (rather than just the implementation repository)
- Container (Docker build, Singularity build, Nix build)
- What about the actual problem being solved?
schema Discussion
Could we spend some time right now collecting ideas?
Think about these three questions.
- How can we improve the PFHub phase field schema?
- What data and metadata should PFHub require?
- How would you imagine querying the data? What questions would you ask?
- What publication could you generate given better data / metadata?
Collect some ideas here: https://github.com/usnistgov/pfhub/discussions/1514
Guyer rant: let's ask the question about how to use the data rather than waste time redesigning schemas
pfhub-workshop-aug-2023
By Daniel Wheeler
pfhub-workshop-aug-2023
- 238