a WoW player, a millennial who grew up playing Pokémon on her Nintendo and Ragnarok during countless nights
that rock'n'roll teenager who listens to everything, especially pop & reggaeton nowadays, but is never sick of Pink Floyd and Led Zeppelin
passionate about coffee & beer
all things DevOps: CI/CD, cloud infra, observability, etc.
web & distributed systems & system architecture
Why do we develop software?
Software needs to meet someone's expectations.
And what happens when that expectation isn't met?
Not meeting expectations generates frustration - and in business, that means money lost.
So normally, when there is frustration, we enter problem-solving mode.
... and we start putting out fires
We often think of observability way too late - when a problem is already happening, and it is already hard to figure out what is going on
an open invitation to leave fight or flight mode and think about this beforehand
- Choosing an APM/Logging tool and setting up your application
- A simple problem
- Good logging practices
- A slightly more complex performance issue
- Finding the root cause
Choosing an APM/Logging tool and setting up your application
- Elastic (Elasticsearch, Elastic APM)
- New Relic
and you might want to consider...
I chose New Relic for this example
the most boring example app a millennial can come up with: a to-do list
A simple problem
We can add logs!
Good Logging Practices
personal and opinionated, but also backed by research, and open to discussion and change as I see new things
My definition of good logging
plain text messages without variables
write your logs for a computer to process, not for a human to read, but make sure a human can still read and understand them
use extra fields
put in as much info as you can. Don't abuse it by sending a huge payload, but remember that these fields are hidden by default in the log view and they are extremely useful
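A minimal sketch of these two ideas using only Python's standard `logging` module. The `JsonFormatter` class, the `todo` logger name, and the `todo_title` / `user_id` fields are all made up for illustration; an APM agent or a logging library would normally do this formatting for you.

```python
import json
import logging

# Attributes every LogRecord carries by default; anything else came in via `extra=`.
_STANDARD_ATTRS = set(
    logging.LogRecord("", 0, "", 0, "", (), None).__dict__
) | {"message", "asctime"}


class JsonFormatter(logging.Formatter):
    """Hypothetical formatter: one JSON object per log line, extra fields included."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Copy over whatever the caller passed through `extra=`.
        for key, value in record.__dict__.items():
            if key not in _STANDARD_ATTRS:
                payload[key] = value
        return json.dumps(payload, default=str)


def build_logger(name: str = "todo") -> logging.Logger:
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


logger = build_logger()

# Avoid: the variables are baked into the message, so every line is unique.
# logger.info(f"Created todo {title} for user {user_id}")

# Prefer: a constant message plus searchable fields.
logger.info("todo created", extra={"todo_title": "buy coffee", "user_id": 42})
```

Because the message never changes, you can search for `todo created` and then filter or group by any of the fields.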
Request Identifiers should be present in all log messages in order to be able to trace what happened in a particular request
customer identifiers should exist in all requests in order to measure the experience of a particular customer
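One possible way to get both identifiers onto every log line without passing them around by hand, sketched with the standard `contextvars` and `logging` modules. `start_request`, `RequestContextFilter`, and the field names are assumptions for this example; in a real app the call to `start_request` would live in a middleware.

```python
import contextvars
import logging
import uuid

request_id_var = contextvars.ContextVar("request_id", default=None)
customer_id_var = contextvars.ContextVar("customer_id", default=None)


class RequestContextFilter(logging.Filter):
    """Stamp every record with the identifiers of the current request."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id_var.get()
        record.customer_id = customer_id_var.get()
        return True  # never drop the record, only enrich it


def start_request(customer_id, request_id=None):
    """Hypothetical hook: call once at the start of each request."""
    request_id = request_id or str(uuid.uuid4())
    request_id_var.set(request_id)
    customer_id_var.set(customer_id)
    return request_id


logger = logging.getLogger("todo.requests")
handler = logging.StreamHandler()
handler.setFormatter(
    logging.Formatter(
        "%(levelname)s %(message)s request_id=%(request_id)s customer_id=%(customer_id)s"
    )
)
handler.addFilter(RequestContextFilter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

start_request(customer_id="customer-123")
logger.info("todo list fetched")
```

With this in place, filtering by `request_id` shows everything that happened in one request, and filtering by `customer_id` shows everything one customer went through.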
keep track of the ids
every time you save an object and write a log line about it, add the ID as an extra field - this allows easy inspection
be intentional about log levels
you should know what kind of information you want to see while developing and what kind of information is only useful for production.
my rule of thumb is that ERROR is something to pay attention to, INFO is extra info in the production environment, and DEBUG is made for my local dev.
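That rule of thumb can be sketched as a level picked per environment. `APP_ENV` and `level_for` are hypothetical names for this example; use whatever your deployment already exposes.

```python
import logging
import os


def level_for(environment: str) -> int:
    """DEBUG for local development, INFO everywhere else."""
    return logging.DEBUG if environment == "development" else logging.INFO


# APP_ENV is an assumed variable name, defaulting to local development.
logging.basicConfig(level=level_for(os.environ.get("APP_ENV", "development")))
logger = logging.getLogger("todo.levels")

logger.debug("todo payload parsed")      # only useful on my machine
logger.info("todo created")              # extra context in production
logger.error("todo could not be saved")  # something to pay attention to
```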
So how can we apply this to our problem?
A slightly more complex performance issue
Finding the root cause
I once ran into this problem when I had to process 8 million messages a day and write them to a Firestore database
In this particular case, it took us over 2 months to fix something that was actually easy to solve, because we just didn't have that amount of data to test with in dev and the issue wasn't high enough priority to be worth spending time on. If the data had been there from the beginning and the issue had been clear, we'd have seen it was easy to solve and prioritized it
Always develop your MVPs and POCs with good logging practices and APM monitoring (even if turned on by a flag)
They often grow too fast, and observability becomes an issue faster than we are able to foresee
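To make the "turned on by a flag" idea concrete, here is a rough sketch that uses a plain timing decorator as a stand-in for a real APM agent. The `ENABLE_MONITORING` variable, the `timed` decorator, and `list_todos` are all assumptions for illustration; a real agent would be switched on through its own configuration.

```python
import functools
import logging
import os
import time

logger = logging.getLogger("todo.timing")


def monitoring_enabled() -> bool:
    # ENABLE_MONITORING is a hypothetical flag name.
    return os.environ.get("ENABLE_MONITORING", "false").lower() == "true"


def timed(func):
    """Log how long `func` took, but only while the flag is on."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if not monitoring_enabled():
            return func(*args, **kwargs)
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            duration_ms = (time.perf_counter() - start) * 1000
            logger.info(
                "function timed",
                extra={"function_name": func.__name__, "duration_ms": round(duration_ms, 2)},
            )

    return wrapper


@timed
def list_todos():
    return ["buy coffee", "buy beer"]
```

The instrumentation ships with the very first version and costs close to nothing while the flag is off, so the data is already there on the day the POC starts to grow.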
Thank you! <3
follow me on Twitter for random things in English, Portuguese and sometimes Spanish: https://twitter.com/__biancarosa
(I answer DMs slowly, but I always try!)
By Bianca Rosa