Performance

Performant system

Responsive (fast)

Stable (doesn't explode)

... under a heavy workload

Why?

More and more partners ask for SLAs on response times
(or more transparency)

It improves the quality of our product

We need to scale

We need to be profitable, and better performance reduces costs

How do we measure it?

Performance tests

Performance testing is a non-functional software testing technique that determines how the stability, speed, scalability, and responsiveness of an application hold up under a given workload.

Apps
(or anything else)

Prometheus

https://k6.io

https://monitoring.prod.oina.ws - Load testing dashboard

How to write a performance test?

Engineering / Performance

k6 tests are simple TypeScript scripts

https://gitlab.com/swan-io/commons/performance-tests

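import { check } from "k6";
import { Options } from "k6/options";
// env, urls, graphql, getRandomProjectId and sendGqlRequest are assumed to be
// helpers provided by the performance-tests repository.

export const options: Options = {
  stages: [
    { duration: "10s", target: 3 }, // ramp up to 3 virtual users
    { duration: "30s", target: 3 }, // hold 3 virtual users
    { duration: "10s", target: 0 }, // ramp down to 0
  ],
  // Tag every metric with the namespace under test
  tags: {
    namespace: env.NAMESPACE,
  },
};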

const graphqlQuery = graphql(`
  query ProjectInfoById($id: ID!) {
    projectInfoById(id: $id) {
      __typename
      id
      ...
    }
  }
`);

export default function () {
  const projectId = getRandomProjectId();
  const response = sendGqlRequest(urls.unauthenticated, graphqlQuery, {
    id: projectId,
  });

  check(response, {
    "status is 200 and has no error": (r) =>
      r.status === 200 && r.body.errors === undefined,
  });
}
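
Since partners ask for SLAs on response times, a test like the one above can also be made to fail automatically when a latency or error budget is exceeded. The following is a minimal sketch using k6 thresholds; the 500 ms and 1% limits are illustrative assumptions rather than agreed SLAs, and the stages are simply copied from the example above.

```typescript
// Sketch only: in a real k6 script this object can be typed with
// `Options` from "k6/options". Threshold values are assumptions.
export const options = {
  stages: [
    { duration: "10s", target: 3 },
    { duration: "30s", target: 3 },
    { duration: "10s", target: 0 },
  ],
  thresholds: {
    // 95% of requests must complete in under 500 ms
    http_req_duration: ["p(95)<500"],
    // fewer than 1% of requests may fail
    http_req_failed: ["rate<0.01"],
  },
};
```

When a threshold is crossed, k6 marks the run as failed and exits with a non-zero code, which makes regressions visible in CI.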

How to run a performance test?

With Backstage

Engineering productivity squad:

https://backstage.prod.oina.ws/performance-testing

How do we prioritize work?

  • high workload/volume
  • high response times
  • scalability issues/challenges

Focus on use cases that are currently the least performant

Engineering / Performance

https://monitoring.prod.oina.ws/d/performance-metrics

Team

Thibaut Villeneuve

Alexandre Pinon

How to solve performance issues? 

It's case-by-case

There is often low-hanging fruit

  • adding missing SQL indexes
  • adding missing GraphQL DataLoaders
  • not re-fetching data in GraphQL resolvers when the parent already provides it
  • not fetching data that is never used
  • running independent operations in parallel
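
To illustrate why a missing DataLoader matters, here is a simplified sketch of the batching idea: lookups requested during the same tick are collected and sent as one batch instead of one query per item. `TinyLoader`, `resolveNames`, and the project data are hypothetical names for illustration; this is not the implementation of the `dataloader` package, and it does no caching or deduplication.

```typescript
type BatchFn<K, V> = (keys: readonly K[]) => Promise<V[]>;

type Pending<K, V> = {
  key: K;
  resolve: (value: V) => void;
  reject: (error: unknown) => void;
};

class TinyLoader<K, V> {
  private batchFn: BatchFn<K, V>;
  private queue: Pending<K, V>[] = [];
  private scheduled = false;

  constructor(batchFn: BatchFn<K, V>) {
    this.batchFn = batchFn;
  }

  // Queue the key and flush once the current synchronous work is done.
  load(key: K): Promise<V> {
    return new Promise<V>((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
      if (!this.scheduled) {
        this.scheduled = true;
        void Promise.resolve().then(() => this.flush());
      }
    });
  }

  // Send all queued keys in a single call and hand each caller its value.
  private async flush(): Promise<void> {
    const batch = this.queue;
    this.queue = [];
    this.scheduled = false;
    try {
      const values = await this.batchFn(batch.map((item) => item.key));
      batch.forEach((item, index) => item.resolve(values[index]));
    } catch (error) {
      batch.forEach((item) => item.reject(error));
    }
  }
}

// Hypothetical data source standing in for a SQL query.
let batchCalls = 0;
const projectNames = new Map<string, string>([
  ["p1", "Alpha"],
  ["p2", "Beta"],
]);

const nameLoader = new TinyLoader<string, string>(async (ids) => {
  batchCalls += 1; // one "query" per batch, not per id
  return ids.map((id) => projectNames.get(id) ?? "unknown");
});

// Each resolver call goes through the loader; calls made together share a batch.
async function resolveNames(ids: string[]): Promise<string[]> {
  return Promise.all(ids.map((id) => nameLoader.load(id)));
}
```

Calling `resolveNames` with several ids triggers a single call to the batch function, which is where the savings come from when a list of N items would otherwise issue N queries.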

Recent wins

users query

Before: 4 virtual users (= 4 requests in flight at all times) & 100k memberships: 100% timeout

After: 7 virtual users & 100k memberships: no error

no graph :( but it's big I swear
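
To put virtual-user counts in perspective, the throughput they generate can be roughly estimated. This sketch assumes each virtual user sends its next request as soon as the previous response arrives (no sleep in the script); the function name and the numbers are illustrative, not measurements.

```typescript
// Rough estimate: N virtual users, each waiting avgResponseSeconds per
// request, produce about N / avgResponseSeconds requests per second.
function estimatedRequestsPerSecond(
  virtualUsers: number,
  avgResponseSeconds: number,
): number {
  if (virtualUsers < 0 || avgResponseSeconds <= 0) {
    throw new RangeError("expected virtualUsers >= 0 and avgResponseSeconds > 0");
  }
  return virtualUsers / avgResponseSeconds;
}

// Illustrative values: 4 virtual users with a 0.5 s average response time.
console.log(estimatedRequestsPerSecond(4, 0.5)); // 8
```

The same number of virtual users therefore produces more load as the endpoint gets faster.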

Various improvements in A-C

release

Various improvements in A-C

release

Dataloader in paginated queries

release

#engineering-performance

Thank you