(Distributed)
observability (in .NET)
🕵

Vedran Mandić, MCSD, MCSA, MCT
@vekzdran / @vmandic

#ATD16, Zagreb 2021-12-08

Thank you ATD

Vedran Mandić freelances, talks (less) and lectures (less) C# / .NET and JavaScript, and this year started golang
last talk was... 2 years ago!
10 years as dev
I do enterprise mostly

...and how to set this up in .NET / C#!

a user activity (story), an operation of biz value
- person created, invoice deleted, objects listed, user signed in, access right assigned to user...
how long did the whole "process" take?
- how long did each "activity" take?
what is the contextual data related to this activity?
- account ID, user ID, country, role, device...
easy if it's 1 app (if you log it)
hard(er) if >1 apps
- the more the merrier right :-)? how to correlate?
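A minimal sketch of the questions above ("how long did it take?", "what is the contextual data?") using the built-in System.Diagnostics.Activity type, assuming .NET 5 or later. The operation name and the tag keys and values are made up for illustration.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// "InvoiceDeleted" is a hypothetical operation of biz value.
var activity = new Activity("InvoiceDeleted");

// Contextual data travels with the activity as tags (hypothetical keys/values).
activity.SetTag("account.id", "acc-42");
activity.SetTag("user.id", "usr-7");
activity.SetTag("country", "HR");

activity.Start();
Thread.Sleep(50);   // stand-in for the real work
activity.Stop();    // Duration is set when the activity stops

Console.WriteLine($"{activity.OperationName} took {activity.Duration.TotalMilliseconds} ms");
foreach (var tag in activity.Tags)
{
    Console.WriteLine($"  {tag.Key} = {tag.Value}");
}
```

With one app this is already enough to answer both questions; with more apps the same activity has to be continued in the next service, which is where correlation IDs come in.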

error or exception? :-) (let's just call it problem)
OK to trace if we have 1 app
clear to analyze ("immediately"), hard if not logged
do we know how to classify (levels, categories!)?
did we log all the (relevant) contextual data!? 😧
harder to analyze if more connected apps exist
- it's just cumbersome and takes time
- the action (activity) is distributed (sync / async)
  - where (what) is the root error? in which log?

(what happens when services evolve...)

this is super easy, you have a single (non-replica) app... lucky you!

now you gotta know which replica caused the issue...

your system is now distributed across three apps and 2 DBs

I did not even put a message broker / queue here 😅

boss said "make it more scalable" etc

is composed of three essential parts...

Centralized: collects and stores all system logs in one central location.
Structured: lets you add searchable metadata to logs.
Searchable: supports searching by multiple criteria (app version, date, category, level, text, metadata, etc.)
Configurable: lets you change verbosity without code changes (based on log level and/or scope).
Integrated: tied into tracing, so traces and logs can be analyzed in the same tool.
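A hedged sketch of the "structured" and "configurable" points with Microsoft.Extensions.Logging, assuming the Microsoft.Extensions.Logging and Microsoft.Extensions.Logging.Console NuGet packages (.NET 5 or later). The category name, property names and values are invented for the example. In a real app the minimum level would normally come from configuration rather than code.

```csharp
using System.Collections.Generic;
using Microsoft.Extensions.Logging;

// JSON console output keeps the named properties as separate, searchable fields.
using var loggerFactory = LoggerFactory.Create(builder =>
    builder.AddJsonConsole(options => options.IncludeScopes = true)
           .SetMinimumLevel(LogLevel.Information)); // hard-coded here only for the sketch

ILogger logger = loggerFactory.CreateLogger("Invoices"); // hypothetical category

// The scope attaches contextual data to every log entry written inside it.
var scopeData = new Dictionary<string, object>
{
    ["AccountId"] = "acc-42",
    ["UserId"] = "usr-7"
};

using (logger.BeginScope(scopeData))
{
    // {InvoiceId} and {ElapsedMs} are message template properties, not string concatenation.
    logger.LogInformation("Invoice {InvoiceId} deleted in {ElapsedMs} ms", 1001, 37);

    // Below the minimum level, so this entry is filtered out.
    logger.LogDebug("Invoice {InvoiceId} payload dumped", 1001);
}
```

A central log store can then index InvoiceId, AccountId and UserId as fields instead of parsing them out of free text.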

standardize
- agree across apps and teams what log levels are used, categories...
use a proven logging library / system
- log4net, Serilog, NLog, ELMAH, Datadog, LogDNA, Raygun...
maintain / reserve time
- do not just "leave it out there"
grow team / company "logging culture"
- set up a real-time monitor in dev room / slack channel...
log in batches, asynchronously (ergo, use libraries...)
- to any target (sink)!
- don't log to the console in prod, it cuts your perf...

Console log slowing you down? How much?

Rick Strahl "WestWind" (.NET Core 2.2):

That's 40x slower with Console [INFO] VS [WARN] on!

.NET 6?

learn the OpenTelemetry (opentelemetry.io) set of tools and "standards"
- use the W3C Trace Context standard (2020)
use an activity SDK (.NET has one) supporting W3C
- define (W3C's) SpanId, TraceId and ParentId
log baggage (additional properties), i.e. from TraceContext
be consistent (apply activity tracking in all your services)
make it centralized, searchable, alterable
use a CFN dedicated service to store and visualize
- AppInsights, Zipkin, Jaeger, Prometheus, Splunk...
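The points above can be sketched with the activity SDK that ships with .NET (System.Diagnostics, .NET 5 or later). This is an illustration under assumptions, not a full OpenTelemetry setup: the source name "Demo.Orders", the operation names and the tag and baggage values are hypothetical, and a real service would export to a backend instead of a list.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// W3C is the default ID format since .NET 5; set explicitly to make the intent visible.
Activity.DefaultIdFormat = ActivityIdFormat.W3C;
Activity.ForceDefaultIdFormat = true;

var source = new ActivitySource("Demo.Orders"); // hypothetical source name
var stopped = new List<Activity>();             // stand-in for an exporter

// Without a listener, StartActivity returns null and nothing is recorded.
using var listener = new ActivityListener
{
    ShouldListenTo = s => s.Name == "Demo.Orders",
    Sample = (ref ActivityCreationOptions<ActivityContext> options) => ActivitySamplingResult.AllData,
    ActivityStopped = a => stopped.Add(a)
};
ActivitySource.AddActivityListener(listener);

using (var parent = source.StartActivity("CreateOrder"))
{
    parent?.SetTag("user.id", "usr-7");          // tag: stays on this span
    parent?.AddBaggage("account.id", "acc-42");  // baggage: flows to child activities

    using (var child = source.StartActivity("SaveOrder"))
    {
        // In W3C format Activity.Id has the shape of a traceparent header value:
        // 00-<TraceId>-<SpanId>-<flags>
        Console.WriteLine($"traceparent: {child?.Id}");
        Console.WriteLine($"baggage account.id: {child?.GetBaggageItem("account.id")}");
    }
}

// The child stops first, so it is recorded first.
foreach (var a in stopped)
{
    Console.WriteLine($"{a.OperationName} trace={a.TraceId} span={a.SpanId} parent={a.ParentSpanId}");
}
```

Both activities share one TraceId, and the child's ParentSpanId equals the parent's SpanId. That relationship is what a tracing backend uses to stitch spans from different services into one trace.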