Intermittent Failures

Whistler, June 24, 2015

Jonathan Griffin

Manager, Automation and Tools (A-Team)

  • Is this bug important?  Should I spend the time to fix it?
  • Is my patch safe to land?  Did I break anything?

How big is our intermittent failure problem?

Intermittent failures by suite

Let's fix our orange problem!

Limitations

  • infrastructure load
  • engineering commitment
  • requires excellent test isolation

"> 8000 tests"

Limitations

  • need fine-grained historical test data

"100 million tests"

What these approaches have in common: they reduce the noise in the system.

1. Turn off bug comments for intermittent failures

2. Treeherder will start tracking the rate of individual intermittent failures

3. Auto-retriggers on try

4. Auto-starring of known failures

5. Experiment with bots to vet new tests and changes to tests
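Items 2 to 5 can be illustrated with a small sketch. It assumes each test run is reduced to a test name, a pass/fail flag and its first failure line. The `RunResult` type, the function names, the 20-repetition default and the signature table are all hypothetical, chosen for illustration; they are not Treeherder's actual data model or API.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional


@dataclass(frozen=True)
class RunResult:
    """One run of one test (hypothetical shape, not Treeherder's schema)."""
    test: str
    passed: bool
    failure_line: str = ""  # first error line from the log; empty when passed


def failure_rates(results: Iterable[RunResult]) -> Dict[str, float]:
    """Item 2: fraction of recorded runs in which each test failed."""
    runs: Dict[str, int] = defaultdict(int)
    failures: Dict[str, int] = defaultdict(int)
    for r in results:
        runs[r.test] += 1
        if not r.passed:
            failures[r.test] += 1
    return {test: failures[test] / runs[test] for test in runs}


def classify_retriggers(outcomes: List[bool]) -> str:
    """Item 3: label a job from the outcomes of the original run plus its retriggers."""
    if all(outcomes):
        return "pass"
    if not any(outcomes):
        return "consistent failure"  # every run failed: likely a real regression
    return "intermittent"  # mixed results on the same revision


def auto_star(result: RunResult, known: Dict[str, str]) -> Optional[str]:
    """Item 4: return the bug a failure matches, or None if a human must look."""
    if result.passed:
        return None
    for signature, bug in known.items():
        if signature in result.failure_line:
            return bug
    return None


def vet_new_test(run_once: Callable[[], bool], repetitions: int = 20) -> bool:
    """Item 5: accept a new or changed test only if every repetition passes.

    The default of 20 repetitions is an arbitrary illustrative value.
    all() stops at the first failing run.
    """
    return all(run_once() for _ in range(repetitions))


if __name__ == "__main__":
    history = [
        RunResult("test_a", True),
        RunResult("test_a", False, "TEST-UNEXPECTED-FAIL | test_a | timed out"),
        RunResult("test_b", True),
        RunResult("test_b", True),
    ]
    known = {"timed out": "hypothetical-bug-1"}  # made-up signature table
    print(failure_rates(history))  # {'test_a': 0.5, 'test_b': 0.0}
    print(classify_retriggers([False, True, True]))  # intermittent
    print(auto_star(history[1], known))  # hypothetical-bug-1
```

Plain substring matching is the simplest possible choice for auto-starring; a real classifier would likely need to normalize paths, timestamps and other variable parts of log lines before comparing them.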

End Result:

  • Much higher signal-to-noise

  • Notifications should be actionable

  • Much less drag on developer efficiency

Intermittent Orange Discussion

Friday, 10:00 - 10:30, Garibaldi "A"

https://wiki.mozilla.org/Dev-sanity
