How (Not) to Measure Quality in Software Development

@MichaKutz

@mkutz@mstdn.social

Why Measure Quality?

Why do we need to talk about this?

add new feature

improve quality

Goal:

Make better informed decisions about quality

What Could Possibly Go Wrong?

Goodhart's Law

When a measure becomes a target, it ceases to be a good measure

Negative Impact on Motivation/Collaboration

How to Find Metrics?

Goal → Question → Metric

Are we going in the right direction?

Where are we?

Which Quality?

How defective?

How well liked?

How functional?

Number of Found Bugs

in Staging vs Production

How defective?

How effective is the test process?
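
One way to turn the staging vs production counts into a single number is a defect escape rate: the share of all found bugs that were only found in production. The following is a minimal sketch; the class name, the method name and the example counts are hypothetical and not taken from the talk.

```java
/** Hypothetical helper: share of found bugs that escaped to production. */
final class DefectEscapeRate {

  /**
   * @param foundInStaging bugs found before release (assumed to come from a bug tracker)
   * @param foundInProduction bugs found after release
   * @return escaped share between 0.0 and 1.0, or 0.0 if no bugs were found at all
   */
  static double of(int foundInStaging, int foundInProduction) {
    if (foundInStaging < 0 || foundInProduction < 0) {
      throw new IllegalArgumentException("bug counts must not be negative");
    }
    int total = foundInStaging + foundInProduction;
    if (total == 0) {
      return 0.0; // nothing found anywhere, so nothing escaped
    }
    return (double) foundInProduction / total;
  }

  public static void main(String[] args) {
    // The counts are made up for illustration.
    System.out.printf("escape rate: %.2f%n", of(18, 2));
  }
}
```

The number drops whenever fewer bugs are reported in production, whatever the reason, which is why it says more about the test process than about the product itself.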

Broken Service Level Objectives

How defective?

Number of customer service complaints/contacts

How well liked?

How many people are complaining?

User Surveys or Platform Ratings

How well liked?

User Experience Tests

How well does it work?

User Tracking

How well does it work?

How defective?

How functional?

How well liked?

How well protected?

How maintainable?

How confident is the team?

Code Coverage

// Assumes JUnit 5 and AssertJ.
import static org.assertj.core.api.Assertions.assertThat;
import org.junit.jupiter.api.Test;

@Test
void strike() {
  var game = new Game();
  var firstRollPins = 10;
  var secondRollPins = 5;
  var thirdRollPins = 3;
  game.roll(firstRollPins);
  game.roll(secondRollPins);
  game.roll(thirdRollPins);

  var score = game.score();

  assertThat(score)
    .isEqualTo(
      firstRollPins +
      (secondRollPins + thirdRollPins) * 2);
}

// Production code exercised by the test.
public int score() {
  int score = firstRoll + secondRoll;
  if (previous != null) {
    if (previous.strike) {
      score *= 2;
    } else if (previous.spare) {
      score += firstRoll;
    }
  }
  return score;
}

// The same test with its assertion commented out: score() is still executed,
// so the coverage number stays the same although nothing is verified any more.
@Test
void strike() {
  var game = new Game();
  var firstRollPins = 10;
  var secondRollPins = 5;
  var thirdRollPins = 3;
  game.roll(firstRollPins);
  game.roll(secondRollPins);
  game.roll(thirdRollPins);

  var score = game.score();

  // assertThat(score)
  //  .isEqualTo(
  //    firstRollPins +
  //    (secondRollPins + thirdRollPins) * 2);
}

How well protected?

How much code is not executed by tests?

Mutation Testing: Surviving Mutations

@Test
void strike() {
  var game = new Game();
  var firstRollPins = 10;
  var secondRollPins = 5;
  var thirdRollPins = 3;
  game.roll(firstRollPins);
  game.roll(secondRollPins);
  game.roll(thirdRollPins);

  var score = game.score();

  assertThat(score)
    .isEqualTo(
      firstRollPins +
      (secondRollPins + thirdRollPins) * 2);
}

// Production code under test.
public int score() {
  int score = firstRoll + secondRoll;
  if (previous != null) {
    if (previous.strike) {
      score *= 2;
    } else if (previous.spare) {
      score += firstRoll;
    }
  }
  return score;
}

AssertionFailedError:
expected: 26
 but was: 14
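
To make the idea of a surviving mutation concrete, here is a small hand-written sketch. It does not use a mutation testing tool; the mutant, the two tests and all names are hypothetical. The original rule doubles the frame after a strike, the mutant replaces the doubling with `+ 2`, and only one of the two tests notices the difference.

```java
import java.util.function.IntBinaryOperator;

/** Sketch of a surviving vs. a killed mutation; all names are hypothetical. */
final class MutationDemo {

  // Original rule: the frame after a strike counts double.
  static final IntBinaryOperator ORIGINAL = (first, second) -> (first + second) * 2;

  // Hand-written mutant: the doubling is replaced by adding 2.
  static final IntBinaryOperator MUTANT = (first, second) -> (first + second) + 2;

  // Weak test: rolls of 1 and 1 yield 4 under both rules, so the mutant survives.
  static boolean weakTest(IntBinaryOperator rule) {
    return rule.applyAsInt(1, 1) == 4;
  }

  // Stronger test: rolls of 5 and 3 must yield 16, which only the original does.
  static boolean strongTest(IntBinaryOperator rule) {
    return rule.applyAsInt(5, 3) == 16;
  }

  public static void main(String[] args) {
    System.out.println("weak test passes on mutant: " + weakTest(MUTANT));
    System.out.println("strong test passes on mutant: " + strongTest(MUTANT));
  }
}
```

Both tests execute the same code, so a coverage report cannot tell them apart. Only the mutant reveals that the weak test protects less.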

How well protected?

Team Surveys

How confident is the team?

How effectively can you work with the code?

How confident are you about deploying to production?

What would you need to improve the above answers?

Static Code Analysis: Complexity

public int score() {
  return firstRoll + secondRoll;
}

public int score() {
  int score = firstRoll + secondRoll;
  if (previous != null && previous.spare) {
    score += firstRoll;
  }
  return score;
}

public int score() {
  int score = firstRoll + secondRoll;
  if (previous != null) {
    if (previous.strike) {
      score *= 2;
    } else if (previous.spare) {
      score += firstRoll;
    }
  }
  return score;
}
[Figure: control-flow graphs of the score() versions, with one node per statement and branch condition]

How maintainable?
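
As a rough illustration of how such a complexity number can come about, the sketch below counts decision points in a piece of source text and adds one. This is a naive approximation on plain strings, not how real static analysis tools work, and tools differ in whether they count `&&` and `||`. The class and constant names are hypothetical; the three constants hold the three `score()` versions from the slide.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Naive approximation of cyclomatic complexity; real analyzers parse the code. */
final class NaiveComplexity {

  // The three score() versions from the slide, as plain strings.
  static final String SIMPLE = "public int score() { return firstRoll + secondRoll; }";
  static final String WITH_SPARE = "public int score() { int score = firstRoll + secondRoll;"
      + " if (previous != null && previous.spare) { score += firstRoll; } return score; }";
  static final String WITH_STRIKE = "public int score() { int score = firstRoll + secondRoll;"
      + " if (previous != null) { if (previous.strike) { score *= 2; }"
      + " else if (previous.spare) { score += firstRoll; } } return score; }";

  // Decision points counted here: if, for, while, case, catch, &&, || and the ternary "?".
  private static final Pattern DECISION =
      Pattern.compile("\\b(if|for|while|case|catch)\\b|&&|\\|\\||\\?");

  /** Returns 1 plus the number of decision points found in the source text. */
  static int of(String methodSource) {
    Matcher matcher = DECISION.matcher(methodSource);
    int complexity = 1; // a method without branches has exactly one path
    while (matcher.find()) {
      complexity++;
    }
    return complexity;
  }

  public static void main(String[] args) {
    System.out.println(of(SIMPLE) + ", " + of(WITH_SPARE) + ", " + of(WITH_STRIKE));
  }
}
```

Under this counting rule the three versions come out at 1, 3 and 4: each added rule adds at least one more path that a reader and a test suite have to keep in mind.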

Static Code Analysis: Code Smells

# Code Smells

- long methods,
- huge classes,
- many parameters,
- code duplicates,
- methods with complexity > 7,
- …

How maintainable?

How maintainable?

How confident is the team?

How well protected?

How productive?

How safe?

How fast?

Velocity

v = \frac{P_{estimate}}{t_{delivery} - t_{commit1}}

How fast?

How good are our estimations?
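
The velocity formula can be sketched in a few lines. The class name, the method name and the choice of "points per day" as the unit are assumptions made for this illustration; `estimatedPoints` stands for P_estimate, `firstCommit` for t_commit1 and `delivery` for t_delivery.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper for the velocity formula; the unit is estimated points per day. */
final class Velocity {

  static double pointsPerDay(double estimatedPoints, Instant firstCommit, Instant delivery) {
    Duration elapsed = Duration.between(firstCommit, delivery);
    if (elapsed.isZero() || elapsed.isNegative()) {
      throw new IllegalArgumentException("delivery must be after the first commit");
    }
    double days = elapsed.getSeconds() / 86_400.0; // seconds per day
    return estimatedPoints / days;
  }

  public static void main(String[] args) {
    // Made-up example: a story estimated at 5 points, delivered two days after the first commit.
    Instant firstCommit = Instant.parse("2024-01-01T00:00:00Z");
    Instant delivery = Instant.parse("2024-01-03T00:00:00Z");
    System.out.println(pointsPerDay(5, firstCommit, delivery));
  }
}
```

Because the numerator is an estimate, the result changes with the estimate as much as with the actual delivery time. A larger estimate for the same work produces a higher velocity.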

Delivery Lead Time

\Delta t_{delivery} = t_{delivery} - t_{commit1}

How fast?
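
Delivery lead time needs no estimate, only two timestamps per change. The sketch below computes it for a single change and summarizes several changes with the median. Using the median is an assumption of this example, chosen because it is less sensitive to single outliers than the mean. All names are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper for delivery lead time: time from first commit to delivery. */
final class DeliveryLeadTime {

  static Duration of(Instant firstCommit, Instant delivery) {
    return Duration.between(firstCommit, delivery);
  }

  /** Median of several lead times; for an even count, the mean of the two middle values. */
  static Duration median(List<Duration> leadTimes) {
    if (leadTimes.isEmpty()) {
      throw new IllegalArgumentException("no lead times given");
    }
    List<Duration> sorted = new ArrayList<>(leadTimes);
    Collections.sort(sorted); // Duration is Comparable
    int middle = sorted.size() / 2;
    if (sorted.size() % 2 == 1) {
      return sorted.get(middle);
    }
    return sorted.get(middle - 1).plus(sorted.get(middle)).dividedBy(2);
  }

  public static void main(String[] args) {
    // Made-up lead times for three changes.
    List<Duration> leadTimes = new ArrayList<>();
    leadTimes.add(Duration.ofHours(1));
    leadTimes.add(Duration.ofHours(5));
    leadTimes.add(Duration.ofHours(2));
    System.out.println(median(leadTimes));
  }
}
```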

Batch Size

How productive?

Change Fail Rate

08:07:23 Deploy basket-service ✓
09:06:11 Migrate checkout-service DB ✓
09:56:54 Deploy checkout-service ✓
10:19:44 Deploy order-management-service ✗
10:39:27 Rollback order-management-service ✓
11:09:59 Update database-cluster ✓
12:27:32 Migrate order-management-service DB ✓
13:19:22 Deploy order-management-service ✓
14:45:55 Update database-cluster ✗
15:50:49 Update database-cluster ✓
16:39:11 Deploy product-service ✓
17:44:56 Deploy customer-data-service ✓

How safe?
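
For the deployment log above, the change fail rate can be worked out as failed changes divided by all changes. The sketch below assumes that every log line counts as one change, including rollbacks and migrations, and that a line ending in ✗ marks a failure. Both are assumptions of this example, and the class name is hypothetical. With 2 failed entries out of 12, the rate is about 16.7 %.

```java
import java.util.List;

/** Hypothetical helper: failed changes divided by all changes in a deployment log. */
final class ChangeFailRate {

  // \u2713 is the check mark and \u2717 the cross used in the log on the slide.
  static final List<String> SLIDE_LOG = List.of(
      "08:07:23 Deploy basket-service \u2713",
      "09:06:11 Migrate checkout-service DB \u2713",
      "09:56:54 Deploy checkout-service \u2713",
      "10:19:44 Deploy order-management-service \u2717",
      "10:39:27 Rollback order-management-service \u2713",
      "11:09:59 Update database-cluster \u2713",
      "12:27:32 Migrate order-management-service DB \u2713",
      "13:19:22 Deploy order-management-service \u2713",
      "14:45:55 Update database-cluster \u2717",
      "15:50:49 Update database-cluster \u2713",
      "16:39:11 Deploy product-service \u2713",
      "17:44:56 Deploy customer-data-service \u2713");

  /** Assumption: every line is one change, and a trailing cross marks a failed change. */
  static double of(List<String> logLines) {
    if (logLines.isEmpty()) {
      throw new IllegalArgumentException("empty log");
    }
    long failed = logLines.stream().filter(line -> line.endsWith("\u2717")).count();
    return (double) failed / logLines.size();
  }

  public static void main(String[] args) {
    System.out.printf("change fail rate: %.1f %%%n", of(SLIDE_LOG) * 100);
  }
}
```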

Mean Time to Restore Service

08:07:23 Deploy basket-service ✓
09:06:11 Migrate checkout-service DB ✓
09:56:54 Deploy checkout-service ✓
10:19:44 Deploy order-management-service ✗
10:39:27 Rollback order-management-service ✓
11:09:59 Update database-cluster ✓
12:27:32 Migrate order-management-service DB ✓
13:19:22 Deploy order-management-service ✓
14:45:55 Update database-cluster ✗
15:50:49 Update database-cluster ✓
16:39:11 Deploy product-service ✓
17:44:56 Deploy customer-data-service ✓

How safe?
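
The same log can be used for a worked example of mean time to restore service. The sketch below assumes that service is down from a failed entry until the next successful entry, which is a simplification made for this example, and the class name is hypothetical. For the log above that gives two outages, 10:19:44 to 10:39:27 (19 min 43 s) and 14:45:55 to 15:50:49 (1 h 4 min 54 s), so a mean of about 42 minutes.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.util.List;

/** Hypothetical helper: mean time from a failed log entry to the next successful one. */
final class MeanTimeToRestore {

  // Only the entries around the two failures; \u2713 is the check mark, \u2717 the cross.
  static final List<String> SLIDE_LOG = List.of(
      "09:56:54 Deploy checkout-service \u2713",
      "10:19:44 Deploy order-management-service \u2717",
      "10:39:27 Rollback order-management-service \u2713",
      "13:19:22 Deploy order-management-service \u2713",
      "14:45:55 Update database-cluster \u2717",
      "15:50:49 Update database-cluster \u2713",
      "16:39:11 Deploy product-service \u2713");

  /** Assumption: an outage starts at a failed entry and ends at the next successful entry. */
  static Duration of(List<String> logLines) {
    Duration total = Duration.ZERO;
    int incidents = 0;
    LocalTime failedAt = null;
    for (String line : logLines) {
      LocalTime time = LocalTime.parse(line.substring(0, 8)); // leading "HH:mm:ss"
      boolean failed = line.endsWith("\u2717");
      if (failed && failedAt == null) {
        failedAt = time; // start of an outage
      } else if (!failed && failedAt != null) {
        total = total.plus(Duration.between(failedAt, time));
        incidents++;
        failedAt = null; // service restored
      }
    }
    if (incidents == 0) {
      throw new IllegalArgumentException("no restored failures in log");
    }
    return total.dividedBy(incidents);
  }

  public static void main(String[] args) {
    System.out.println(of(SLIDE_LOG));
  }
}
```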

Delivery Lead Time

Batch Size

Change Fail Rate

Mean Time to Restore Service

How productive?

How safe?

How fast?

How defective?

How well liked?

How functional?

How well protected?

How maintainable?

How confident is the team?

How productive?

How safe?

How fast?

Not everything that counts can be counted,
and not everything that can be counted counts.

Copy of How (Not) to Measure Quality

By tunecom

Measuring quality is hard, and defining what quality is in the first place is difficult. Being aware of why you measure is fundamentally important. Learn how to choose and combine metrics to create something valuable.