Observability-Driven Development

bianca rosa

a WoW player, a millennial that grew up playing pokemon on her nintendo and ragnarok during countless nights

that rock'n'roll teenager that listens to everything, especially pop & reggaeton nowadays, but is never sick of pink floyd and led zeppelin

passionate about coffee & beer

python

go

all things devops: ci/cd, cloud infra, observability, etc

web & distributed systems & system architecture

Why do we develop software?

Software needs to meet someone's expectations.

And what happens when that expectation isn't met?

Not meeting expectations generates frustration - and in business, that means money lost.

So normally, when there is frustration, we enter the problem-solving mode.

... and we start putting out fires

We often think of observability way too late - when a problem is already happening, and it is already hard to figure out what is going on

an open invitation to leave fight or flight mode and think about this beforehand

Agenda

- Choosing an APM/Logging tool and setting up your application

 

- A simple problem

 

- Good logging practices

 

- A slightly more complex performance issue

 

- Finding the root cause

Choosing an APM/Logging tool and setting up your application

- Elastic (Elasticsearch, Elastic APM)

- New Relic

- Datadog

- Honeycomb

- Lightstep

- Splunk

and you might want to consider...

I chose New Relic for this example

the most boring example app a millennial can come up with: a todo-list

A simple problem

We can add logs!

Good Logging Practices

personal, opinionated, but also well backed by research and open to discussion and changes as I see new things

My definition of good logging

plain text messages without variables

 

 

write your logs for a computer to process, not for a human to read, but make it so a human can read and understand them

use extra fields

put in as much info as you can - just don't abuse it by sending a huge payload. those fields are hidden by default in the view, and they are extremely useful
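A minimal sketch of "plain text message + extra fields", using only Python's standard `logging` module. The logger name `todo`, the `JsonFormatter` class and the field names `todo_id` and `title_length` are made up for this example; a real setup would more likely use whatever JSON formatter your logging tool recommends.

```python
import io
import json
import logging

# Attributes every LogRecord has by default; anything else came from `extra`.
_STANDARD_ATTRS = set(logging.LogRecord("", 0, "", 0, "", (), None).__dict__) | {
    "message",
    "asctime",
}


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object, keeping `extra` fields as keys."""

    def format(self, record):
        payload = {"level": record.levelname, "message": record.getMessage()}
        for key, value in record.__dict__.items():
            if key not in _STANDARD_ATTRS:
                payload[key] = value
        return json.dumps(payload, default=str)


def build_logger(stream):
    logger = logging.getLogger("todo")  # hypothetical logger name
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


stream = io.StringIO()  # stand-in for stdout or a log shipper
logger = build_logger(stream)

# The message is constant text; the variables travel as extra fields.
logger.info("Todo item created", extra={"todo_id": 42, "title_length": 11})
```

Because the message never changes, searching for "Todo item created" finds every occurrence, and the extra fields can be filtered on separately.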

traceability

request identifiers should be present in all log messages, so you can trace what happened in a particular request
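One way to get a request identifier onto every log line without passing it around by hand, sketched with the standard library only. The names `request_id_var`, `RequestIdFilter` and `handle_request` are illustrative assumptions; in a real web app the ID would usually be set by middleware, or taken from an incoming header.

```python
import contextvars
import io
import logging
import uuid

# Holds the ID of the request currently being handled ("-" outside a request).
request_id_var = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Copy the current request ID onto every record passing through."""

    def filter(self, record):
        record.request_id = request_id_var.get()
        return True


stream = io.StringIO()  # stand-in for stdout or a log shipper
handler = logging.StreamHandler(stream)
handler.setFormatter(
    logging.Formatter("%(levelname)s request_id=%(request_id)s %(message)s")
)
handler.addFilter(RequestIdFilter())

logger = logging.getLogger("todo.requests")  # hypothetical logger name
logger.handlers.clear()
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False


def handle_request():
    """Hypothetical request handler: set the ID once, then log as usual."""
    token = request_id_var.set(str(uuid.uuid4()))
    try:
        logger.info("Request received")
        logger.info("Request finished")
        return request_id_var.get()
    finally:
        # Restore the previous value so IDs never leak between requests.
        request_id_var.reset(token)


request_id = handle_request()
```

Every line written during the request carries the same `request_id`, so filtering by that one value reconstructs the whole request.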

business traceability

customer identifiers should exist in all requests in order to measure the experience of a particular customer
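A small sketch of the same idea for customer identifiers, using `logging.LoggerAdapter` from the standard library. The helper `logger_for_customer`, the logger name and the ID `customer-123` are hypothetical; where the customer ID actually comes from (session, token, URL) depends on your app.

```python
import io
import logging

stream = io.StringIO()  # stand-in for stdout or a log shipper
handler = logging.StreamHandler(stream)
handler.setFormatter(
    logging.Formatter("%(levelname)s customer_id=%(customer_id)s %(message)s")
)

base_logger = logging.getLogger("todo.customers")  # hypothetical logger name
base_logger.handlers.clear()
base_logger.addHandler(handler)
base_logger.setLevel(logging.INFO)
base_logger.propagate = False


def logger_for_customer(customer_id):
    """Return a logger that stamps every line with the given customer ID."""
    return logging.LoggerAdapter(base_logger, {"customer_id": customer_id})


log = logger_for_customer("customer-123")  # hypothetical ID
log.info("Todo list loaded")
log.info("Todo item completed")
```

With the customer ID on every line, "what did this customer experience?" becomes a single filter in the logging tool.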

keep track of the ids

every time you save an object and write a log line about it, add the ID as an extra field - this allows easy inspection
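A sketch of logging the ID right where the object is saved. The in-memory `_store` dictionary stands in for a real database, and `save_todo`, `ListHandler` and the field name `todo_id` are assumptions made for the example.

```python
import itertools
import logging


class ListHandler(logging.Handler):
    """Keep records in memory so the extra fields can be inspected."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


logger = logging.getLogger("todo.storage")  # hypothetical logger name
logger.handlers.clear()
capture = ListHandler()
logger.addHandler(capture)
logger.setLevel(logging.INFO)
logger.propagate = False

_ids = itertools.count(1)
_store = {}  # in-memory stand-in for a real database


def save_todo(title):
    """Save a todo item and log its ID as an extra field."""
    todo_id = next(_ids)
    _store[todo_id] = {"title": title}
    # Constant message; the ID rides along as an extra field.
    logger.info("Todo item saved", extra={"todo_id": todo_id})
    return todo_id


saved_id = save_todo("buy coffee")
```

Later, searching the logs for that `todo_id` shows everything that happened to that one object.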

be intentional about log levels

you should know what kind of information you want to see while developing and what kind of information is only useful for production.

 

my rule of thumb is that ERROR is something to pay attention to, INFO is extra info in the production environment and DEBUG is made for my local dev.
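That rule of thumb can be sketched as a level chosen per environment. The environment names `development` and `production` and the helpers `level_for` and `configure` are assumptions for this example; how your app learns its environment (env var, config file) is up to you.

```python
import logging


def level_for(environment):
    """Map an environment name to the lowest level worth emitting there."""
    # Assumption: anything that is not local development is treated like production.
    return logging.DEBUG if environment == "development" else logging.INFO


def configure(environment):
    logger = logging.getLogger("todo.levels")  # hypothetical logger name
    logger.setLevel(level_for(environment))
    return logger


# Production: DEBUG is dropped, INFO and ERROR get through.
prod_logger = configure("production")
prod_debug_enabled = prod_logger.isEnabledFor(logging.DEBUG)
prod_info_enabled = prod_logger.isEnabledFor(logging.INFO)
prod_error_enabled = prod_logger.isEnabledFor(logging.ERROR)

# Local development: DEBUG gets through as well.
dev_logger = configure("development")
dev_debug_enabled = dev_logger.isEnabledFor(logging.DEBUG)
```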

So how can we apply this to our problem?

A slightly more complex performance issue

Finding the root cause

I once had this problem when I had to process 8MM messages a day and write them to a Firestore database

In this particular case it took us over 2 months to solve something that was actually easy to fix, because we just didn't have that amount of data to test with in dev and the issue wasn't a high enough priority to be worth spending time on. If the data had been there from the beginning and the issue had been clear, we would have seen that it was easy to solve and prioritized it

Always develop your MVPs and POCs with good logging practices and APM monitoring (even if turned on by a flag)

They often grow too fast and observability becomes an issue faster than we are able to foresee
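The "turned on by a flag" idea can be sketched like this. Nothing here is a real APM agent: the `timings` list stands in for whatever the agent would collect, and the `APM_ENABLED` environment variable, `monitoring_enabled` and `traced` are names invented for the example. With a real tool, the flag would typically decide whether its agent gets initialized at startup.

```python
import functools
import os
import time

timings = []  # stand-in for whatever an APM agent would collect


def monitoring_enabled():
    """Read the hypothetical APM_ENABLED flag from the environment."""
    return os.environ.get("APM_ENABLED", "false").lower() in {"1", "true", "yes"}


def traced(func):
    """Record how long func takes, but only while the flag is on."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if not monitoring_enabled():
            return func(*args, **kwargs)
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            timings.append((func.__name__, time.perf_counter() - start))

    return wrapper


@traced
def list_todos():
    return ["buy coffee", "buy beer"]


os.environ["APM_ENABLED"] = "false"
list_todos()  # flag off: nothing is recorded

os.environ["APM_ENABLED"] = "true"
list_todos()  # flag on: one timing is recorded
```

The instrumentation ships with the very first version of the app, and switching it on later is a config change instead of a code change.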

Thank you! <3

follow me on twitter for random things in english, portuguese and sometimes spanish: https://twitter.com/__biancarosa

(i answer DMs slowly, but I always try!)