Off of the
shaky Grounds

the path towards stabilizing Substrate

benjamin kampmann

gnunicorn.org // ben@parity.io

Jun 24th, 2020

@Parity & Friends Meetup, Etherlands

Overview

  • Challenges
  • Metering, Measuring & Profiling
  • Freezing, cargo unleash
  • Testing & continuous releasing

Challenges

HARD PERFORMANCE

  • 🕳️ leaking memory
  • 🥵 CPU spikes
  • 👋 dropping peers
  • 😱 stalling consensus
  • 💸 bad weights

PROCESS

  • massive mono-repo with plenty of
    path = ../ and circular dev-dependencies – how even?
  • polkadot and substrate being out of sync
  • merge order breaking the master branch

  • unit tests cover too little – all tests pass, yet the node fails
  • big refactors lead to instability for months
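The path-dependency pain can be made concrete: for a crate to be publishable to crates.io, every in-tree dependency needs a version requirement next to its path, roughly like this (crate name and version are illustrative):

```toml
[dependencies]
# `path` is only used for local builds; crates.io strips it and
# falls back to `version` – so a bare `path = "../"` entry with no
# version makes the crate unpublishable.
sp-core = { version = "2.0.0", path = "../../primitives/core" }
```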

FIXING IT

🔧

Leaks, Spikes & Stalls

Measure, Measure, Measure

MOAR TELEMETRY

Better logging

in multi-color and with emojis!

METER EVERYTHING!!!

Adding plenty more Prometheus gauges

  • resource metrics: CPU, memory, load, file handles, etc.
  • networking: up-/downstream bandwidth, etc.
  • peer-to-peer: connected nodes, etc.
  • sync + import queue: incoming blocks, import times, etc.
  • GRANDPA: messages, etc.
  • database: I/O, cache, state, etc.
  • Tokio: tasks, channels, sizes, etc.
  • internals: caches, hash maps, etc.

> 500 metric parameters

INVESTIGATE!

Leaks, Spikes & Stalls

MEMORY PROFILING !!!

How to release, even?

Upgrading & Automating the Processes

Freeze on master

  • No more big refactors
  • hard 2-reviewers-rule
  • Polkadot-Companion-PR enforcement
  • focus on stabilizing, bug fixes, docs
  • breaking changes only for release-critical fixes and pre-defined features
  • cleaning up

Too many moving Parts

Cleaning up

  • started already last year: splitting dependencies, cleaning up the tree
  • reducing circular dependencies
  • updated licenses: now Apache-2.0 & GPL-3.0 + Classpath Exception
  • changelog generation
  • cleaning up Cargo.toml manifests
  • benchmarking pallets for proper weights
  • figuring out a release strategy
  • huge mono-repo = huge dependency trees; the release order of crates still matters
  • each crate must be checked before release against its to-be-released dependencies
  • checking these is already an 80-minute process
  • CI should release
  • squash merges mean your git tag is lost after merge
  • crates.io has a rate limit ...

Releasing massive Mono Repos

Cargo unleash Em 🐉

github.com/GNUnicorn/cargo-unleash

  • cargo subcommand to help manage and release massive (Rust) mono repos
  • checks all crates for crates.io requirements
  • packages, builds and publishes
  • helps manage versions, keeps the tree up to date
  • match packages on regexp

$ cargo unleash em-dragons

Alpha -> RC

  • We are now able to release by just tagging any commit
  • we've left the rather unstable alpha phase and are heading towards a final release
  • the final release still waits on fixing remaining bugs that come up during the Polkadot launch

The Future?

Testing and Continuous Releases

MONO-Release Repo

  • all crates share the same version, so we have to bump even those that didn't change
  • keeping versions clean and compatible is complex, especially for the outside
  • we'd have to major-bump a lot, and that might have to trickle down
  • doing that well is hard – it's easy to break things unintentionally
  • and cargo's semver pre-releases do not work as expected: cargo updates across pre-release names, though semver says it should not!
  • ensuring things are fine is hard already, and we constantly break stuff
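The pre-release pitfall fits in a single dependency line (crate name illustrative):

```toml
[dependencies]
# Cargo reads this as "^2.0.0-alpha.3", which also matches
# 2.0.0-alpha.4, 2.0.0-rc.1 and 2.0.0 – so `cargo update` happily
# jumps between pre-releases, even though semver gives no
# compatibility guarantee between them.
sc-service = "2.0.0-alpha.3"
```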

Instead: Continuous Releasing

1. Bump every PR

2. RELEASE Every Commit of master

as SemVer requires – automatically*

3. Profit

But Wait – what about...

'em version numbers

stability !!!

'Em Version Numbers

  • cargo doesn't care: it will use the latest minor and patch anyway
  • we break things a lot, in particular internal APIs
  • our non-framework approach makes those public, but for the majority this doesn't matter
  • you only care when your specific dependencies break
  • but what if you had only one dependency to care about?

in a world of github, what even is a 'release'?

substrate client meta crate

– think tokio 0.2 but substrate –

# Cargo.toml
[package]
name = "suprchain"
version = "2.0.0"
authors = ["Benjamin Kampmann <ben@parity.io>"]
edition = "2018"

[dependencies]
suprchain-runtime = "0.2"
substrate-client = "0.2"

// main.rs
use suprchain_runtime::Runtime;
use substrate_node;

fn main() {
    substrate_node::Runner::<Runtime>::new()
       .main();
}

# Cargo.toml
[package]
name = "suprchain"
version = "2.0.0"
authors = ["Benjamin Kampmann <ben@parity.io>"]
edition = "2018"

[dependencies]
suprchain-runtime = "0.2"
substrate-client = { version = "0.2", features = ["unstable-async-offchain"] }

// main.rs
use suprchain_runtime::Runtime;
use substrate_node;

fn main() {
    substrate_node::Runner::<Runtime>::new()
       .with_async_offchain(|cfg| {
          cfg.max_timeout = 360;
       })
       .main();
}

stability

  • we don't have any classic build-QA-release cycle @parity
  • we released whenever we felt like it –
    the main reason it didn't happen more frequently is that it's quite some work
  • but if it's automatic, it can be done more frequently – that doesn't really change much about stability

releases indicate stability!

Moar testing – Upcoming

QA happens before the PR is merged

  • Sticking with the 2-reviewers minimum
  • + CODE_OWNERS for special areas
  • Automatic deploying of some PRs to validator nodes and running the changes for a while
  • Benching of relevant PRs
  • Downstream testing of "nightly" build
  • New testing environment* – see the code example below
  • Increasing test coverage
  • Integration testing for Runtime updates of live chains
#[test]
fn transfer_smoketest() {
  // reuse provided test against local runtime
  pallet_balances::tests::transfer_smoke(Runtime.into());
}

// testing a specific feature ourselves
#[test]
fn transfer_should_trigger_event() {
  // given
  let mut test = test::deterministic(Runtime.into());
  // when
  test.read_state(|| {
    <Runtime as CreateTransaction>::create_transaction(
      balance_call,
      signer,
      account,
      nonce,
      )
  });
  // controlled run
  test.produce_blocks(1_u32);
  // then
  test.with_state(|| {
    let events = frame_system::Module::<Runtime>::events();
    assert_eq!(events.len(), 1);
  });
}

thanks!

questions?

benjamin kampmann

gnunicorn.org // ben@parity.io

Jun 24th, 2020

@Parity & Friends Meetup, Etherlands