Team-Driven Quality Assurance for Microservices

Development

Integration

Operations

Bugs found by testers →

← Bugs reported by customers

Bug assignment with a small number of teams worked okay…

…but with more teams and more possible root causes (microservices), the bugs started piling up.

Build

Integrate

Running for >3 h!

Build

Integrate

Operate

While unit, service, system and exploratory tests covered a lot of risk…

…production is still a messy place

…so teams started to monitor services in production.

Traffic

Errors

Latency

Saturation

+ business metrics
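
The four signals above can be computed per service from plain request records. The following is a minimal sketch, not the teams' actual tooling: the `Request` record, the `golden_signals` function, the choice of a 95th-percentile latency, the "status >= 500 counts as an error" rule, and the `in_flight`/`capacity` pair used for saturation are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """One handled request (hypothetical record shape)."""
    duration_ms: float
    status: int


def golden_signals(requests, window_s, in_flight, capacity):
    """Summarise one monitoring window for a single service."""
    total = len(requests)
    # Assumption: only server-side failures (5xx) count as errors.
    errors = sum(1 for r in requests if r.status >= 500)
    durations = sorted(r.duration_ms for r in requests)
    if total:
        # Nearest-rank 95th percentile, using integer arithmetic.
        rank = (95 * total + 99) // 100
        p95 = durations[rank - 1]
    else:
        p95 = 0.0
    return {
        "traffic_rps": total / window_s,
        "error_rate": errors / total if total else 0.0,
        "latency_p95_ms": p95,
        # Assumption: saturation is the share of worker slots in use.
        "saturation": in_flight / capacity,
    }


if __name__ == "__main__":
    window = [Request(100, 200)] * 18 + [Request(900, 500), Request(1000, 503)]
    print(golden_signals(window, window_s=10, in_flight=40, capacity=50))
```

Business metrics (orders placed, sign-ups, and so on) would be added alongside these technical signals; which ones matter depends on the service.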

To improve maintainability as well, we are introducing a "You Build It, You Run It" policy…

So the teams monitor their services not only during office hours…

…but also after 5 pm…

…being truly responsible for that deployment on Friday

Service-specific monitoring is good, but we still want somebody to keep an eye on the system as a whole…

…who is able to detect and manage global incidents.

That means knowing which of the 150 services might be causing the incident…

…which other services might be affected…

…which of the 25 teams might be able to help…

…and managing communication between these teams, the stakeholders, customer service, etc.
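
The questions above (which services are affected, which teams can help) amount to a lookup over a service dependency map and an ownership list. The sketch below assumes such data exists in machine-readable form; the service names, team names, and the `DEPENDS_ON` and `OWNER` tables are invented for illustration and are not taken from the real system.

```python
from collections import deque

# Hypothetical dependency map: service -> services it calls.
DEPENDS_ON = {
    "checkout": ["payment", "cart"],
    "cart": ["catalog"],
    "search": ["catalog"],
    "payment": [],
    "catalog": [],
}

# Hypothetical ownership list: service -> responsible team.
OWNER = {
    "checkout": "team-order",
    "cart": "team-order",
    "search": "team-search",
    "payment": "team-pay",
    "catalog": "team-catalog",
}


def affected_services(failing, depends_on):
    """Return every service that depends, directly or transitively, on `failing`."""
    # Invert the edges so we can walk from a service to its callers.
    dependents = {}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(service)

    seen = set()
    queue = deque([failing])
    while queue:
        current = queue.popleft()
        for caller in dependents.get(current, ()):
            if caller not in seen and caller != failing:
                seen.add(caller)
                queue.append(caller)
    return seen


def teams_to_involve(failing, depends_on, owner):
    """Teams owning the failing service or anything affected by it."""
    services = {failing} | affected_services(failing, depends_on)
    return {owner[s] for s in services}


if __name__ == "__main__":
    print(sorted(affected_services("catalog", DEPENDS_ON)))
    print(sorted(teams_to_involve("catalog", DEPENDS_ON, OWNER)))
```

With 150 services and 25 teams, the value lies less in the traversal than in keeping the dependency and ownership data current, so that whoever manages the incident can trust the answer.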

Build

Integrate

Operate