The monolith could be run on a local system…
…adding some microservices still worked…
…relying on remote instances worked for a while…
…but ultimately running the whole system locally failed
While testing the system as a whole locally became impossible…
…testing a single service in isolation became a very important option…
…delivering much less ambiguous test results thanks to the limited scope…
…allowing us to run service tests prior to deployment to a test system…
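As an illustration, such an isolated service test might look like the sketch below, where a hypothetical Flask order service is tested with its downstream payment call stubbed out, so a failing test points at this service alone:

```python
# Minimal sketch of an isolated service test (service and endpoint are
# hypothetical). The downstream payment call is stubbed, so no other
# service is involved in the test run.
from unittest import mock

from flask import Flask, jsonify, request

app = Flask(__name__)


def charge_payment(order):
    # Stand-in for a call to the remote payment service.
    raise RuntimeError("must not be reached in an isolated test")


@app.post("/orders")
def create_order():
    charge_payment(request.get_json())
    return jsonify(status="created"), 201


def test_create_order_without_real_payment_service():
    # Replace the remote call with a stub for the duration of the test.
    with mock.patch(f"{__name__}.charge_payment", return_value={"status": "ok"}):
        response = app.test_client().post("/orders", json={"item": "book"})
    assert response.status_code == 201
```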
Using feature branches to isolate unfinished code…
…eventually merging back to master and only then deploying it anywhere…
…is not a good option with multiple microservices involved…
…in separate repositories…
…owned by different teams…
…under different workloads…
…since merges would need to be synced!
Feature toggles helped us isolate unfinished features and enabled real continuous integration.
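In practice such a toggle can be as small as a guarded code path; here is a minimal sketch (toggle name and config source are assumptions, real setups often use a dedicated toggle service so flags can be flipped without redeploying):

```python
# Minimal feature-toggle sketch (toggle name and config source are
# hypothetical). Unfinished code merges to master continuously and
# stays dark until the flag is switched on per environment.
import os


def is_enabled(toggle: str) -> bool:
    # Read toggles from the environment for simplicity.
    return os.environ.get(f"TOGGLE_{toggle.upper()}", "off") == "on"


def checkout(cart: list) -> str:
    if is_enabled("new_discount_engine"):
        return "checkout with new discount engine"  # unfinished feature
    return "legacy checkout"


print(checkout(["book"]))  # -> "legacy checkout" unless TOGGLE_NEW_DISCOUNT_ENGINE=on
```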
[Diagram: bugs found by testers vs. bugs reported by customers]
Bug assignment with a small number of teams worked okay…
…but with more teams and more possible root causes (microservices) the bugs started piling up.
As a consequence we created a team of service managers to route bugs…
…improving initial assignment and keeping an eye on bugs & incidents.
Running for >3 h!
The comprehensive system test suite, originally created for the monolith, kept growing…
…as teams kept adding more "special interest" tests to it.
Eventually we ignored several tests, asking the teams to adopt them…
…reducing the suite to the minimum common suite covering the money paths.
We also extracted a test framework…
…to make service-specific tests easier to write.
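As an illustration, such a minimum common suite could be carved out with pytest markers; the marker and test names below are invented:

```python
# Sketch of a minimum common suite using pytest markers (all names
# hypothetical). Only tests tagged "money_path" run in the shared pipeline:
#   pytest -m money_path
# Everything else is owned and run by the individual teams.
import pytest


def place_order(item: str) -> str:
    # Stand-in for a call into the real system.
    return "confirmed"


@pytest.mark.money_path
def test_customer_can_complete_a_purchase():
    assert place_order("book") == "confirmed"


def test_team_specific_recommendation_widget():
    # "Special interest" test: adopted by its team, excluded from the
    # common suite to keep the shared run short.
    assert True
```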
Sign-off testing monolith releases
certainly wasn't the best part of the job…
…but it actually caught a lot of bugs!
Testing each microservice before deployment would be hard…
…and would ruin a lot of the benefits.
While service-level tests caught a lot of bugs in the build phase…
…integration issues became a more prominent problem.
To compensate for the release-triggered sign-off tests, we added continuous exploratory testing…
…guided by our recently finished user stories
and continuous risk assessments.
No releases = no release notes,
no release schedule!
We created a changelog chat room to compensate…
…filled automatically by the deployment system…
…and with the test results of the common system test suite…
…but also open to human comments!
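The automation behind this can be tiny; here is a sketch of the deployment system posting into such a room via a chat webhook (the URL and payload fields are assumptions, most chat tools accept a simple JSON message like this):

```python
# Sketch: deployment pipeline posts to a changelog chat room via webhook
# (URL and payload fields are hypothetical).
import json
import urllib.request

WEBHOOK_URL = "https://chat.example.com/hooks/changelog"  # hypothetical


def announce_deployment(service: str, version: str, test_result: str) -> None:
    message = {
        "text": f"Deployed {service} {version} - common suite: {test_result}"
    }
    request = urllib.request.Request(
        WEBHOOK_URL,
        data=json.dumps(message).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(request)


announce_deployment("order-service", "1.42.0", "passed")
```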
While unit, service, system and exploratory tests covered a lot of risk…
…production is still a messy place…
…so teams started to monitor services in production.
Traffic
Errors
Latency
Saturation
+ business metrics
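Exposing these four "golden signals" plus a business metric might look like the following sketch, using the Python prometheus_client library; all metric and label names are assumptions:

```python
# Sketch: golden signals plus a business metric with prometheus_client
# (metric and label names are hypothetical).
from prometheus_client import Counter, Gauge, Histogram, start_http_server

REQUESTS = Counter("http_requests_total", "Traffic: requests served", ["path"])
ERRORS = Counter("http_errors_total", "Errors: failed requests", ["path"])
LATENCY = Histogram("http_request_seconds", "Latency: request duration")
SATURATION = Gauge("worker_pool_busy", "Saturation: busy workers")
ORDERS = Counter("orders_placed_total", "Business metric: orders placed")


def handle_request(path: str) -> None:
    REQUESTS.labels(path=path).inc()
    with LATENCY.time():  # records how long the block takes
        try:
            ORDERS.inc()  # stand-in for the real request handling
        except Exception:
            ERRORS.labels(path=path).inc()
            raise


if __name__ == "__main__":
    start_http_server(8000)  # /metrics endpoint for the monitoring system
    SATURATION.set(1)
    handle_request("/orders")
```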
In order to improve maintainability as well, we are implementing a You Build It, You Run It policy…
So the teams not only monitor their services during office hours…
…but also after 5 pm…
…being truly responsible for that deployment on Friday
Service-specific monitoring is good, but we still want somebody to keep an eye on the system as a whole…
…who is able to detect and manage global incidents.
Knowing which of the 150 services might cause the incident…
…which other services might be affected…
…which of the 25 teams might be able to help…
…and managing the flow of information between these teams, the stakeholders, customer service etc.
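A simple machine-readable dependency graph already helps with the first two questions; here is a sketch with invented services and call edges:

```python
# Sketch: a service dependency graph to narrow down an incident
# (service names and edges are invented for illustration).
from collections import deque

# "A -> B" means A calls B, so an outage of B can affect A.
CALLS = {
    "checkout": ["payments", "inventory"],
    "payments": ["fraud-check"],
    "storefront": ["checkout", "search"],
}


def affected_by(failing: str) -> set:
    # Walk the call graph backwards: every service that (transitively)
    # calls the failing one might be impacted.
    reverse = {}
    for caller, callees in CALLS.items():
        for callee in callees:
            reverse.setdefault(callee, []).append(caller)
    seen, queue = set(), deque([failing])
    while queue:
        service = queue.popleft()
        for caller in reverse.get(service, []):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen


print(affected_by("payments"))  # -> {'checkout', 'storefront'}
```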