Off of the
shaky Grounds
the path towards stabilizing Substrate
benjamin kampmann
gnunicorn.org // ben@parity.io
Jun 24th, 2020
@Parity & Friends Meetup, Etherlands
Overview
- Challenges
- Metering, Measuring & Profiling
- Freezing, cargo unleash
- Testing & continuous releasing
Challenges
HARD PERFORMANCE
- 🕳️ leaking memory
- 🥵 CPU spikes
- 👋 dropping peers
- 😱 stalling consensus
- 💸 bad weights
PROCESS
- massive mono-repo with plenty of path = ../ and circular dev-dependencies, how even?
- polkadot and substrate being out of sync
- merge-order breaking master-branch
- unit tests cover too little – all tests pass, yet the node is failing
- big refactors lead to instability for months
FIXING IT
🔧
Leaks, Spikes & Stalls
measure, Measure, Measure
MOAR TELEMETRY
Better logging
in multi-color and with emojis!
METER EVERYTHING!!!
Adding plenty of more Prometheus gauges
- resource metrics CPU, memory, load, file handles, etc
- networking Up-/downstream, etc
- Peer-2-Peer connected nodes, etc
- Sync + Import Queue blocks incoming, import times, etc
- GRANDPA messages, etc
- Database I/O, cache, state, etc
- Tokio tasks, channels, sizes, etc
- Internals caches, hash maps, etc.
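To make the gauge idea concrete, here is a minimal, std-only sketch of a gauge registry that renders the Prometheus text exposition format. It is not Substrate's metrics code: the `Gauge` and `Registry` types and the metric name used in the usage note are assumptions made up for illustration.

```rust
use std::collections::BTreeMap;
use std::sync::atomic::{AtomicI64, Ordering};
use std::sync::Arc;

/// A gauge is a value that can go up and down (e.g. connected peers).
/// Cloning yields another handle to the same underlying value.
#[derive(Clone, Default)]
struct Gauge(Arc<AtomicI64>);

impl Gauge {
    fn set(&self, v: i64) {
        self.0.store(v, Ordering::Relaxed);
    }
    fn inc(&self) {
        self.0.fetch_add(1, Ordering::Relaxed);
    }
    fn dec(&self) {
        self.0.fetch_sub(1, Ordering::Relaxed);
    }
    fn get(&self) -> i64 {
        self.0.load(Ordering::Relaxed)
    }
}

/// Hypothetical registry: metric name -> (help text, gauge).
/// A BTreeMap keeps the rendered output sorted by name.
#[derive(Default)]
struct Registry {
    gauges: BTreeMap<String, (String, Gauge)>,
}

impl Registry {
    /// Register a gauge (or fetch the existing one) and return a handle.
    fn gauge(&mut self, name: &str, help: &str) -> Gauge {
        self.gauges
            .entry(name.to_string())
            .or_insert_with(|| (help.to_string(), Gauge::default()))
            .1
            .clone()
    }

    /// Render all gauges in the Prometheus text exposition format.
    fn render(&self) -> String {
        let mut out = String::new();
        for (name, (help, gauge)) in &self.gauges {
            out.push_str(&format!("# HELP {} {}\n", name, help));
            out.push_str(&format!("# TYPE {} gauge\n", name));
            out.push_str(&format!("{} {}\n", name, gauge.get()));
        }
        out
    }
}
```

Usage would look like `let peers = registry.gauge("demo_peers_connected", "Number of connected peers");` followed by `peers.inc()` and `peers.dec()` wherever peers connect and drop, with `registry.render()` served on a scrape endpoint.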
> 500 metric Params
INVESTIGATE!
Leaks, Spikes & Stalls
MEMORY PROFILING !!!
How to release, even?
Upgrading & Automating the Processes
Freeze on master
- No more big refactors
- hard 2-reviewers-rule
- Polkadot-Companion-PR enforcement
- focus on stabilizing, bug fixes, docs
- breaks only on release-critical fixes and pre-defined features
- cleaning up
Too many moving Parts
Cleaning up
- started already last year: splitting dependencies, cleaning up tree
- reducing circular-dependencies
- updated licenses: now Apache2 & GPL3+Classpath-Exc
- Changelog generation
- Cleaning up Cargo.toml manifests
- benchmarking pallets for proper Weights
- Figure out release-strategy
- huge mono repo = huge dependency trees; order of crates still matters
- each crate must be checked before release with the to-be-released-dependencies
- Checking these is already an 80min process
- CI should release
- Squash-merges mean your git tag is lost after merge
- crates.io has a rate limit ...
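Why the order of crates matters can be sketched as a topological sort: a crate can only be published once all of its in-workspace dependencies are out. The sketch below is an illustration under assumptions, not how cargo-unleash is implemented; the function name `publish_order` and the crate names in the usage note are hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Given `crate -> dependencies`, return a publish order in which every
/// crate comes after its in-workspace dependencies. Dependencies that are
/// not keys of the map (external crates) are ignored.
/// On a dependency cycle, returns Err with the crates that are stuck.
fn publish_order(deps: &BTreeMap<&str, Vec<&str>>) -> Result<Vec<String>, Vec<String>> {
    // For each crate, the set of workspace dependencies not yet published.
    let mut pending: BTreeMap<&str, BTreeSet<&str>> = deps
        .iter()
        .map(|(name, ds)| {
            let local = ds.iter().copied().filter(|d| deps.contains_key(d)).collect();
            (*name, local)
        })
        .collect();

    let mut order = Vec::new();
    loop {
        // Crates whose workspace dependencies have all been published.
        let ready: Vec<&str> = pending
            .iter()
            .filter(|(_, ds)| ds.is_empty())
            .map(|(name, _)| *name)
            .collect();
        if ready.is_empty() {
            break;
        }
        for name in ready {
            pending.remove(name);
            for ds in pending.values_mut() {
                ds.remove(name);
            }
            order.push(name.to_string());
        }
    }

    if pending.is_empty() {
        Ok(order)
    } else {
        // Whatever is left depends, directly or not, on a cycle.
        Err(pending.keys().map(|n| n.to_string()).collect())
    }
}
```

For a workspace where `node` depends on `client` and `runtime`, and both depend on `primitives`, this yields `primitives`, `client`, `runtime`, `node`. Two crates that depend on each other come back as an error, which is why the circular dependencies had to be reduced first.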
Releasing massive Mono Repos
Cargo unleash Em 🐉
github.com/GNUnicorn/cargo-unleash
- cargo subcommand to help manage and release massive (Rust) mono repos
- checks all crates for crates.io requirements
- packages, builds and publishes
- helps manage versions, keeps the tree up to date
- match packages on regexp
$ cargo unleash em-dragons
Alpha -> RC
- We are now able to release by just tagging any commit
- we've left the rather unstable alpha phase and are heading towards a final release
- still waiting to fix remaining bugs that come up during Polkadot launch
The Future?
Testing and Continuous Releases
MONO-Release Repo
- all crates share the same version: we have to bump even if they didn't change
- keeping versions clean and compatible is complex, especially for the outside
- we'd have to major-bump a lot, and that might have to trickle down
- doing that well is hard – easy to break unintentionally
- but cargo-semver pre-releases do not work as expected: cargo updates on the same pre-names, though semver says it should not!
- Ensuring things are fine is hard already and we constantly break stuff
Instead: Continuous Releasing
1. Bump every PR
as SEMVER requires – automatically
2. RELEASE Every Commit of master
– automatically*
3. Profit
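A rough sketch of what bumping "as semver requires" could mean per PR. How a PR gets classified as breaking, feature or fix is not specified in the talk and is simply assumed as an input here; the `Version`, `Change` and `bump` names are hypothetical. For 0.y.z versions the sketch follows Cargo's convention that the minor position is the breaking one, and it ignores the 0.0.z special case.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Version {
    major: u64,
    minor: u64,
    patch: u64,
}

/// Assumed classification of a PR; how it is derived is out of scope.
#[derive(Debug, Clone, Copy)]
enum Change {
    Breaking,
    Feature,
    Fix,
}

/// Compute the next version for a crate touched by a PR.
fn bump(v: Version, change: Change) -> Version {
    match (v.major, change) {
        // Pre-1.0: a breaking change bumps the minor, anything else the patch.
        (0, Change::Breaking) => Version { major: 0, minor: v.minor + 1, patch: 0 },
        (0, _) => Version { patch: v.patch + 1, ..v },
        // 1.0 and later: classic major / minor / patch.
        (_, Change::Breaking) => Version { major: v.major + 1, minor: 0, patch: 0 },
        (_, Change::Feature) => Version { minor: v.minor + 1, patch: 0, ..v },
        (_, Change::Fix) => Version { patch: v.patch + 1, ..v },
    }
}
```

With this, a breaking change takes 0.8.1 to 0.9.0 and 1.2.3 to 2.0.0, while a fix takes 0.8.1 to 0.8.2.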
But Wait – what about...
'em version numbers
stability !!!
'Em Version Numbers
- cargo don't care: it will use the latest minor and patch anyways
- we break things a lot in particular internal APIs
- and our non-framework-approach makes that public, but for the majority this doesn't matter
- You only care when your specific dependencies break
- But what if you had only one dependency to care about?
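The "cargo don't care" point follows from Cargo's default caret requirements: a bare `"0.2"` accepts any `0.2.z`, and `"1.2"` accepts anything from 1.2.0 up to, but excluding, 2.0.0. The sketch below illustrates that rule only. It is not Cargo's resolver: the function names are made up, pre-release tags are not parsed, and requirements starting with `0.0` are deliberately left unsupported.

```rust
/// Parse "major", "major.minor" or "major.minor.patch"; missing parts are 0.
/// Pre-release or build suffixes are not supported and yield None.
fn parse(v: &str) -> Option<(u64, u64, u64)> {
    let mut parts = v.split('.').map(|p| p.parse::<u64>());
    let major = parts.next()?.ok()?;
    let minor = match parts.next() {
        Some(p) => p.ok()?,
        None => 0,
    };
    let patch = match parts.next() {
        Some(p) => p.ok()?,
        None => 0,
    };
    if parts.next().is_some() {
        return None;
    }
    Some((major, minor, patch))
}

/// Does `candidate` satisfy the caret requirement `req`?
/// Returns None for inputs this sketch does not cover.
fn caret_matches(req: &str, candidate: &str) -> Option<bool> {
    let r = parse(req)?;
    let c = parse(candidate)?;
    let same_series = match r {
        (0, 0, _) => return None, // 0.0.z edge cases are out of scope here
        (0, minor, _) => c.0 == 0 && c.1 == minor,
        (major, _, _) => c.0 == major,
    };
    Some(same_series && c >= r)
}

/// Pick the newest published version that satisfies the requirement.
fn resolve<'a>(req: &str, published: &[&'a str]) -> Option<&'a str> {
    published
        .iter()
        .copied()
        .filter(|v| caret_matches(req, v) == Some(true))
        .max_by_key(|v| parse(v))
}
```

So with `0.1.9`, `0.2.0`, `0.2.7` and `0.3.0` published, a dependency on `"0.2"` ends up on `0.2.7`: downstream users get every patch release without touching their manifest, and only a breaking bump needs their attention.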
in a world of github, what even is a 'release'?
substrate client meta crate
– think tokio 0.2 but substrate –
# Cargo.toml
[package]
name = "suprchain"
version = "2.0.0"
authors = ["Benjamin Kampmann <ben@parity.io>"]
edition = "2018"

[dependencies]
suprchain-runtime = "0.2"
substrate-client = "0.2"

// main.rs
use suprchain_runtime::Runtime;
use substrate_node;

fn main() {
    substrate_node::Runner<Runtime>()
        .main();
}
# Cargo.toml
[package]
name = "suprchain"
version = "2.0.0"
authors = ["Benjamin Kampmann <ben@parity.io>"]
edition = "2018"

[dependencies]
suprchain-runtime = "0.2"
substrate-client = { version = "0.2", features = ["unstable-async-offchain"] }

// main.rs
use suprchain_runtime::Runtime;
use substrate_node;

fn main() {
    substrate_node::Runner<Runtime>()
        .with_async_offchain(|cfg| {
            cfg.max_timeout = 360;
        })
        .main();
}
stability
- we don't have any classic build-QA-release-cycle @parity
- We released whenever we felt like it
- main reason it didn't happen more frequently is that it's quite some work
- But if it is automatic, it can be done more frequently
- doesn't really change much on stability
releases indicate stability!
Moar testing – Upcoming
QA happens before the PR is merged
- Sticking with the 2-reviewers minimum
- + CODE_OWNERS for special areas
- Automatic deploying of some PRs to validator nodes and running the changes for a while
- Benching of relevant PRs
- Downstream testing of "nightly" build
- New testing environment* (see the example that follows)
- Increasing test coverage
- Integration testing for Runtime updates of live chains
#[test]
fn transfer_smoketest() {
    // reuse provided test against local runtime
    pallet_balances::tests::transfer_smoke(Runtime.into());
}

// testing a specific feature ourselves
#[test]
fn transfer_should_trigger_event() {
    // given
    let mut test = test::deterministic(Runtime.into());

    // when
    test.read_state(|| {
        <Runtime as CreateTransaction>::create_transaction(
            balance_call,
            signer,
            account,
            nonce,
        )
    });

    // controlled run
    test.produce_blocks(1_u32);

    // then
    test.with_state(|| {
        let events = frame_system::Module::<Runtime>::events();
        assert_eq!(events.len(), 1);
    });
}
thanks!
questions?
In order of appearance:
- memory profiling – substrate.dev/docs/en/knowledgebase/integrate/memory-profiling
- benchmarking pallets – www.shawntabrizi.com/substrate-graph-benchmarks/docs/#/
- cargo unleash – github.com/gnunicorn/cargo-unleash
- polkadot launch process – polkadot.network/explaining-the-polkadot-launch-process/
- substrate client meta crate – github.com/paritytech/substrate/issues/4453
- github code owners – help.github.com/en/github/creating-cloning-and-archiving-repositories/about-code-owners
- substrate test runner – github.com/paritytech/substrate-test-runner