Task Configuration at Scale
Andrew Halberstadt
CI Automation
:ahal
What is "Scale"?
- ~15,000 unique tasks
- ~410 pushes / weekday
- ~560 tasks / push (or 230k / weekday)
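As a quick check of the arithmetic above, using only the approximate figures from the slide:

```python
# Both inputs are the approximate figures quoted on the slide.
pushes_per_weekday = 410
tasks_per_push = 560

tasks_per_weekday = pushes_per_weekday * tasks_per_push
print(tasks_per_weekday)  # 229600, i.e. roughly 230k
```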
What is "Unique"?
- Ignore runtime info (timestamps, repo, user, etc.)
- Otherwise every difference counts, e.g.:
  - linux64 opt mochitest chunks 1-5 => 5 unique tasks
  - pref set vs unset => 2 unique tasks
There are a lot of similarities between many of those 15k tasks.
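One way to make this definition of "unique" concrete is to fingerprint each task after dropping its runtime-only fields. The sketch below is illustrative only: the field names (`created`, `repo`, `user`, and so on) are assumptions, not the real task definition format.

```python
import hashlib
import json

# Hypothetical names for runtime-only fields; real task definitions differ.
RUNTIME_FIELDS = {"created", "deadline", "repo", "user"}


def fingerprint(task):
    """Hash a task definition with its runtime-only fields removed."""
    stable = {k: v for k, v in task.items() if k not in RUNTIME_FIELDS}
    blob = json.dumps(stable, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def count_unique(tasks):
    """Count tasks that differ in anything other than runtime info."""
    return len({fingerprint(task) for task in tasks})


# Five chunks pushed by two different users: ten tasks, five unique.
chunked = [
    {"platform": "linux64/opt", "suite": "mochitest", "chunk": chunk, "user": user}
    for chunk in range(1, 6)
    for user in ("alice", "bob")
]
print(count_unique(chunked))  # 5

# The same task with a pref set vs unset counts as two unique tasks.
prefs = [
    {"suite": "mochitest", "prefs": {"some.pref": True}},
    {"suite": "mochitest", "prefs": {}},
]
print(count_unique(prefs))  # 2
```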
WET vs DRY
- Write Everything Twice vs Don't Repeat Yourself
- Aka duplication vs consolidation
- Can apply to configuration as well as code
- Two ends of a scale
- Let's examine both ends at their extremes
Write Everything Twice
- Pros
  - Easy to understand
  - Can handle new requirements well
- Cons
  - Difficult to maintain
  - A pain to make sweeping changes
Don't Repeat Yourself
- Pros
  - Fewest LOC
  - Can easily make sweeping changes
- Cons
  - Also hard to maintain
  - Modifications are code refactorings
  - Hard to handle unforeseen changes
Both extremes are silly; there needs to be a balance.
Not All Configuration is Equal
- Some configuration changes frequently
  - call this dynamic configuration
  - e.g. number of chunks, platforms, suites
- Some configuration rarely changes
  - call this static configuration
  - e.g. caching, scopes, worker-related configs
Dynamic configuration should be WET.
Static configuration should be DRY.
Easy, problem solved!
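A minimal sketch of that split, with invented names and values: the dynamic settings are spelled out per task so each can be edited on its own, while the static settings are defined once and applied to everything.

```python
# Dynamic configuration: written out per task (WET).
# Suite names and values are illustrative only.
DYNAMIC = {
    "mochitest": {"platforms": ["linux64", "win64"], "chunks": 5},
    "xpcshell": {"platforms": ["linux64"], "chunks": 2},
}


def apply_static(task):
    """Static configuration: defined once (DRY) and applied to every task.

    The cache name and scope string are hypothetical placeholders.
    """
    task = dict(task)
    task["cache"] = "checkout"
    task["scopes"] = ["hypothetical:scope"]
    return task


tasks = {name: apply_static(config) for name, config in DYNAMIC.items()}
```

Changing the chunk count for one suite touches one line in `DYNAMIC`; changing the cache for every task touches one line in `apply_static`.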
Configuration Groups
- Many ways to group tasks, e.g.:
  - all tasks => {release, product}
  - product => {build, test, lint}
  - test => {platform, suite, platform+suite}
  - platform+suite => {chunk}
- Many more axes to group tasks across
Each layer has distinct but not disjoint sets of dynamic and static configuration.
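To illustrate why the groups are "distinct but not disjoint", the sketch below groups the same tasks along different axes. The labels and attribute names are made up for the example.

```python
from collections import defaultdict


def group_by(tasks, *axes):
    """Group task labels by the values of one or more attributes."""
    groups = defaultdict(list)
    for task in tasks:
        key = tuple(task[axis] for axis in axes)
        groups[key].append(task["label"])
    return dict(groups)


# Illustrative tasks; real labels and attributes will differ.
tasks = [
    {"label": "linux64-mochitest-1", "platform": "linux64", "suite": "mochitest"},
    {"label": "linux64-mochitest-2", "platform": "linux64", "suite": "mochitest"},
    {"label": "linux64-xpcshell-1", "platform": "linux64", "suite": "xpcshell"},
    {"label": "win64-mochitest-1", "platform": "win64", "suite": "mochitest"},
]

by_platform = group_by(tasks, "platform")
by_suite = group_by(tasks, "suite")
by_platform_suite = group_by(tasks, "platform", "suite")
```

Here `linux64-mochitest-1` belongs to the `linux64` platform group, the `mochitest` suite group, and the `linux64` + `mochitest` group at once, so a setting could plausibly live at any of those levels.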
Challenge
Design a configuration system that:
- Is easy to understand and maintain
- Is easy to modify
  - individual tasks
  - all tasks in a specific group (low or high)
- Can handle uncertainty and changing requirements
  - easy to extend without regressing existing tasks
- Reduces unnecessary duplication
Our Solution: Taskgraph
- Not to be confused with "taskcluster"
- Confusingly lives under /taskcluster
  - /taskcluster/taskgraph => core module
  - /taskcluster/ci => initial task configuration files
- Docs: https://firefox-source-docs.mozilla.org/taskcluster/taskcluster/index.html
- Originally designed by Dustin Mitchell
- Shared ownership between many teams
  - build, ci automation, releng, taskcluster, +more
Graph Generation
# see all available steps
$ ./mach taskgraph --help
# generate and display the full task graph (labels only)
$ ./mach taskgraph full
# generate and display the target task graph (entire JSON)
$ ./mach taskgraph target -J
# similarly..
$ ./mach taskgraph optimized
$ ./mach taskgraph morphed
Step 1: Load Task Configs
- Get a big list of every task
- Read all the .yml files under /taskcluster/ci
- Concepts
  - kinds / kind dependencies
  - jobs / jobs-from / job-defaults
  - transforms
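The relationship between jobs and job-defaults can be sketched roughly as follows. This is not the Taskgraph loader: the `merge` and `load_jobs` helpers are hypothetical, the kind is written as a Python dict standing in for a parsed .yml file, and all values are invented.

```python
import copy


def merge(defaults, overrides):
    """Recursively merge two dicts; values in `overrides` win."""
    result = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


# Stand-in for a parsed kind file. The key names mirror the concepts on the
# slide; the values are invented for illustration.
kind = {
    "job-defaults": {"tier": 1, "worker": {"max-run-time": 3600}},
    "jobs": {
        "mochitest": {"chunks": 5},
        "xpcshell": {"chunks": 2, "worker": {"max-run-time": 5400}},
    },
}


def load_jobs(kind):
    """Yield one job per entry under `jobs`, layered over `job-defaults`."""
    defaults = kind.get("job-defaults", {})
    for name, job in kind["jobs"].items():
        yield merge(defaults, {"name": name, **job})


jobs = {job["name"]: job for job in load_jobs(kind)}
```

In this sketch `xpcshell` overrides one default while `mochitest` inherits all of them, which is the kind of low-level grouping that job-defaults provides.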
Step 2: Apply Transforms
- Gradually transform each task into its final form
- Many "stages" of transformation
- Validation at every step of the way
- End result is in a format taskcluster expects
- Concepts
  - transform functions
  - stages
  - schemas
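The idea of staged transforms with validation between stages can be sketched with plain generator functions. This is a simplified stand-in, not the actual Taskgraph transform API: the schemas are bare key-to-type mappings, and every name below (`split_chunks`, `add_worker`, `hypothetical-worker`) is invented.

```python
# Each schema maps the keys a task must have at that stage to their types.
INPUT_SCHEMA = {"name": str, "chunks": int}
CHUNKED_SCHEMA = {"name": str, "suite": str, "this-chunk": int}
FINAL_SCHEMA = {**CHUNKED_SCHEMA, "worker-type": str}


def validated(tasks, schema):
    """Yield each task unchanged after checking its keys and value types."""
    for task in tasks:
        if task.keys() != schema.keys():
            raise ValueError(f"unexpected keys in {task!r}")
        for key, expected in schema.items():
            if not isinstance(task[key], expected):
                raise TypeError(f"{key!r} should be {expected.__name__}")
        yield task


def split_chunks(tasks):
    """Transform: expand one task into one task per chunk."""
    for task in tasks:
        for n in range(1, task["chunks"] + 1):
            yield {"name": f"{task['name']}-{n}", "suite": task["name"], "this-chunk": n}


def add_worker(tasks):
    """Transform: attach static worker configuration to every task."""
    for task in tasks:
        yield {**task, "worker-type": "hypothetical-worker"}


# Each stage pairs the schema its input must satisfy with a transform.
STAGES = [(INPUT_SCHEMA, split_chunks), (CHUNKED_SCHEMA, add_worker)]


def run(tasks):
    """Run every stage in order, validating before each and at the end."""
    for schema, transform in STAGES:
        tasks = transform(validated(tasks, schema))
    return list(validated(tasks, FINAL_SCHEMA))


final = run([{"name": "mochitest", "chunks": 2}])
print([task["name"] for task in final])  # ['mochitest-1', 'mochitest-2']
```

Note how the dynamic value (`chunks`) comes from the input, while the static value (`worker-type`) is set once inside a transform.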
Step 3: There is no Step 3
- Now we have the "full task graph"
  - ./mach taskgraph full
  - DAG of all tasks (2+ million JSON-formatted lines)
- Filter target tasks and optimizations
- Apply morphs
- Submit to taskcluster (the service) via REST API
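As a rough illustration of the target-filtering phase only, the sketch below narrows a tiny full graph to the wanted tasks plus whatever they depend on. Optimization and morphs are not modeled, the labels are invented, and the helper names are hypothetical rather than Taskgraph's own.

```python
# A tiny "full task graph": each label maps to the labels it depends on.
full_graph = {
    "build-linux64": [],
    "build-win64": [],
    "test-linux64-mochitest": ["build-linux64"],
    "test-win64-mochitest": ["build-win64"],
    "lint-python": [],
}


def with_dependencies(graph, labels):
    """Return `labels` plus everything they transitively depend on."""
    seen = set()
    stack = list(labels)
    while stack:
        label = stack.pop()
        if label not in seen:
            seen.add(label)
            stack.extend(graph[label])
    return seen


def target_graph(graph, wanted):
    """Keep the tasks selected by `wanted`, along with their dependencies."""
    selected = [label for label in graph if wanted(label)]
    keep = with_dependencies(graph, selected)
    return {label: deps for label, deps in graph.items() if label in keep}


target = target_graph(full_graph, lambda label: label.startswith("test-linux64"))
print(sorted(target))  # ['build-linux64', 'test-linux64-mochitest']
```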
Recap
- Dynamic vs Static
  - Configs in the .yml are generally dynamic
  - Configs in transforms are generally static
- Concrete configuration groups (low to high):
  - individual task (key in .yml)
  - jobs-from (job-defaults)
  - kinds (lowest level transform)
  - transforms (intermediate stages)
  - task.py (highest level transform)
    - modification here affects every single task
Success?
- Taskgraph is not perfect
- Difficult to grok at first
- Difficult to figure out where to set a config
- No hard and fast rules
- Requires a lot of intuition to get right
- A lot of inconsistencies between teams
- Extremely flexible + very little gatekeeping
  - opens the door for all sorts of weird applications
  - complexity keeps increasing over time
Success!
- Overall taskgraph is a big success
- Allows us to move fast
- Handles all requirement changes we throw at it
- Ability to balance WET vs DRY
  - even if implementation is not always perfect
- Can change entire configuration groups with ease
- Self-serve + in-tree
  - many tasks are developer-created
  - extremely powerful
Questions?