(Distributed) observability (in .NET) 🕵

Vedran Mandić, MCSD, MCSA, MCT
@vekzdran / @vmandic

Thank you ATD

Hi, I am Vedran. 👋

  • Vedran Mandić freelances, talks (less) and lectures (less) on C# / .NET and JavaScript, and started learning golang this year
  • last talk was... 2 years ago!
  • 10 years as a dev
  • I do enterprise mostly
  • @vekzdran / @vmandic

What's this presentation about?

Talking about app actions and errors, and the importance of tracking them efficiently!

 

...and how to set this up in .NET / C#!

Today's plan?

  1. Tracking actions and errors in code
  2. Observability
  3. Logging how to
  4. Tracing how to
  5. Demo

WHAT? (actions)

  • a user activity (story), an operation of biz value
    • person created, invoice deleted, objects listed, user signed in, access right assigned to user...
  • how long did the whole "process" take?
    • how long did each "activity" take?
  • what is the contextual data related to this activity?
    • account ID, user ID, country, role, device...
  • easy if it's 1 app (if you log it)
  • hard(er) with 2+ apps
    • the more the merrier right :-)? how to correlate?
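A minimal sketch of what "tracking an action" means in practice, written in Python purely as a language-neutral illustration (the talk's own setup is .NET): each action is timed, tagged with its contextual data, and stamped with one correlation ID so its entries could later be joined. `track_action`, `EVENTS` and all field names are hypothetical.

```python
import time
import uuid
from contextlib import contextmanager

# Hypothetical in-memory sink; a real app would ship these entries to its log store.
EVENTS = []

@contextmanager
def track_action(name, correlation_id, **context):
    """Time one action and record it together with its contextual data."""
    start = time.perf_counter()
    outcome = "ok"
    try:
        yield
    except Exception:
        outcome = "error"
        raise
    finally:
        EVENTS.append({
            "action": name,
            "correlation_id": correlation_id,  # shared by every step of one user story
            "duration_ms": (time.perf_counter() - start) * 1000,
            "outcome": outcome,
            **context,  # e.g. account ID, user ID, country, role, device
        })

# One user story ("invoice deleted") made of an outer process and an inner activity.
cid = str(uuid.uuid4())
with track_action("invoice.delete", cid, account_id=42, user_id=7, role="admin"):
    with track_action("invoice.load", cid, account_id=42):
        time.sleep(0.01)  # stands in for real work
```

The outer entry answers "how long did the whole process take", the inner one "how long did each activity take". Across several apps the same ID would have to travel with every request, which is exactly the correlation problem the tracing part of this talk deals with.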

WHAT? (errors)

  • error or exception? :-) (let's just call it problem)
  • OK to trace if we have 1 app
  • clear to analyze ("immediately"), hard if not logged
  • do we know how to classify (levels, categories!)?
  • did we log all the (relevant) contextual data!? 😧
  • harder to analyze if more connected apps exist
    • it's just cumbersome and takes time
    • the action (activity) is distributed (sync / async)
      • where (what) is the root error? in which log?
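A minimal sketch of classifying a problem and digging out its root cause inside one process, in Python as an illustration (in .NET the analogous chain is `Exception.InnerException`). The level/category mapping is a hypothetical team convention, the kind of thing teams have to agree on; `root_cause`, `classify` and `problem_record` are hypothetical helpers.

```python
import logging

# Hypothetical team-wide taxonomy: exception type -> (log level, category).
CLASSIFICATION = {
    TimeoutError: (logging.WARNING, "dependency"),
    PermissionError: (logging.WARNING, "security"),
    ValueError: (logging.ERROR, "validation"),
}

def root_cause(exc):
    """Follow explicit (`raise ... from`) and implicit chaining down to the first error."""
    seen = {id(exc)}
    while True:
        nxt = exc.__cause__ or exc.__context__
        if nxt is None or id(nxt) in seen:  # stop at the end of the chain or on a cycle
            return exc
        seen.add(id(nxt))
        exc = nxt

def classify(exc):
    for exc_type, (level, category) in CLASSIFICATION.items():
        if isinstance(exc, exc_type):
            return level, category
    return logging.ERROR, "unclassified"

def problem_record(exc, **context):
    """Build one log entry carrying the surface error, the root error and the context."""
    root = root_cause(exc)
    level, category = classify(root)  # design choice: classify by the root, not the wrapper
    return {
        "level": logging.getLevelName(level),
        "category": category,
        "error": type(exc).__name__,
        "root_error": type(root).__name__,
        "root_message": str(root),
        **context,
    }

def load_invoice(invoice_id):
    try:
        raise TimeoutError("db did not answer in 30s")
    except TimeoutError as e:
        raise RuntimeError(f"could not load invoice {invoice_id}") from e

try:
    load_invoice(13)
except RuntimeError as exc:
    record = problem_record(exc, invoice_id=13, user_id=7)
```

Here the entry says "RuntimeError" on the surface but is filed as a `dependency` warning, because the root problem was a timeout.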

How does this look?

(what happens when services evolve...)

this is super easy, you have a single (non-replica) app... lucky you!

now you gotta know which replica caused the issue...

your system is now distributed across three apps and two DBs

I did not even put a message broker / queue here 😅

boss said "make it more scalable" etc

Observability 🔍

is composed of three essential parts...

...but first read this brilliant piece on MSFT devblog:
https://devblogs.microsoft.com/dotnet/observability-asp-net-core-apps/

  1. Logging
  2. Tracing
  3. Metrics

Observability tries to answer questions like...

  • Are we observing more errors than before?
  • Were there new error types?
  • Did the request duration unexpectedly increase compared to previous versions?
  • Has the throughput (req/sec) decreased?
  • Has CPU and/or memory usage increased?
  • Were there changes in our KPIs?
  • Is it selling less than before?
  • Did our visitor count decrease?
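Several of these questions boil down to simple arithmetic over request samples. A minimal Python sketch, assuming each request is recorded as `(duration_ms, succeeded)` within a fixed time window; the sample numbers, the 10% tolerance and the `window_stats` / `regressions` helpers are made up for illustration.

```python
from statistics import mean

def window_stats(samples, window_seconds):
    """Summarize one time window of (duration_ms, succeeded) samples."""
    errors = sum(1 for _, ok in samples if not ok)
    return {
        "throughput_rps": len(samples) / window_seconds,
        "error_rate": errors / len(samples) if samples else 0.0,
        "mean_duration_ms": mean(d for d, _ in samples) if samples else 0.0,
    }

def regressions(previous, current, tolerance=0.10):
    """Names of metrics that got worse by more than `tolerance` (relative)."""
    worse = []
    if current["throughput_rps"] < previous["throughput_rps"] * (1 - tolerance):
        worse.append("throughput_rps")
    for key in ("error_rate", "mean_duration_ms"):
        if current[key] > previous[key] * (1 + tolerance):
            worse.append(key)
    return worse

# Same 2-second window, measured for the previous and the new version.
v1 = window_stats([(120, True), (80, True), (100, True), (90, False)], window_seconds=2)
v2 = window_stats([(200, True), (180, False), (160, False)], window_seconds=2)
```

In practice a metrics backend (Prometheus is one of the tools mentioned in this talk) computes these continuously; the part a team still has to decide is what counts as "worse than before".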

Making logging right...

  • Centralized: allowing the collection/storage of all system logs in a central location.
  • Structured: allows you to add searchable metadata to logs.
  • Searchable: allows searching by multiple criteria (app version, date, category, level, text, metadata, etc.)
  • Configurable: allows changing verbosity without code changes (based on log level and/or scope).
  • Integrated: tied into tracing, so traces and logs can be analyzed in the same tool.
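The "structured" and "configurable" points, sketched with Python's standard `logging` module as a stand-in for .NET libraries such as Serilog or NLog: each record becomes one JSON line with searchable fields, and the level comes from an environment variable, so verbosity can change without a code change. The `LOG_LEVEL` variable name, the `JsonFormatter` class and the field names are assumptions.

```python
import io
import json
import logging
import os

class JsonFormatter(logging.Formatter):
    """Render a record as one JSON line: fixed fields plus any `extra=` metadata."""
    # Attributes every LogRecord has; anything else was added through `extra=`.
    STANDARD = set(vars(logging.makeLogRecord({}))) | {"message", "asctime"}

    def format(self, record):
        entry = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "category": record.name,
            "message": record.getMessage(),
        }
        entry.update({k: v for k, v in vars(record).items() if k not in self.STANDARD})
        return json.dumps(entry, default=str)

def configure(stream):
    level = os.environ.get("LOG_LEVEL", "INFO").upper()  # verbosity from config, not code
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("invoices")
    logger.handlers.clear()
    logger.addHandler(handler)
    logger.setLevel(level)
    logger.propagate = False
    return logger

buffer = io.StringIO()
log = configure(buffer)
log.debug("not shown at INFO")
log.info("invoice deleted", extra={"invoice_id": 13, "user_id": 7, "app_version": "1.4.2"})
```

A central store can then index `invoice_id`, `user_id` or `app_version` directly instead of parsing free text.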

Making logging right 1/2

  • standardize
    • agree across apps and teams on which log levels, categories... to use
  • use a proven logging library / system
    • log4net, Serilog, NLog, ELMAH, Datadog, LogDNA, Raygun...
  • maintain / reserve time
    • do not just "leave it out there"
  • grow team / company "logging culture"
    • set up a real-time monitor in dev room / slack channel...
  • log in batches, asynchronously (hence the libraries...)
    • to any sink!
    • don't log to the console in prod, it hurts performance...
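The "batch it, do it asynchronously" advice, sketched with Python's standard `QueueHandler` / `QueueListener` (the app thread only enqueues, a background thread does the I/O) plus a small hypothetical `BatchingHandler` that hands records to a sink in batches. The batch size and the in-memory sink are illustrative assumptions; the libraries listed above ship production-grade versions of this idea.

```python
import logging
import logging.handlers
import queue

class BatchingHandler(logging.Handler):
    """Hypothetical handler that buffers formatted records and ships them in batches."""

    def __init__(self, sink, batch_size=100):
        super().__init__()
        self.sink = sink  # callable receiving a list of log lines
        self.batch_size = batch_size
        self.buffer = []

    def emit(self, record):
        self.buffer.append(self.format(record))
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        with self.lock:  # Handler.lock is re-entrant, so calling this from emit() is safe
            if self.buffer:
                self.sink(self.buffer)
                self.buffer = []

    def close(self):
        self.flush()  # ship whatever is left
        super().close()

batches = []  # in-memory stand-in for a file, DB or HTTP sink
log_queue = queue.Queue(-1)
batching = BatchingHandler(batches.append, batch_size=100)
listener = logging.handlers.QueueListener(log_queue, batching)

logger = logging.getLogger("orders")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(logging.handlers.QueueHandler(log_queue))  # cheap: just enqueue

listener.start()
for i in range(250):
    logger.info("order %d processed", i)
listener.stop()   # drains the queue, then stops the background thread
batching.close()  # flushes the final, partial batch
```

250 log calls end up as three writes to the sink (100, 100 and 50 lines) instead of 250, and none of them block the calling thread.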

Making logging right 2/2

Console log slowing you down? How much?

Rick Strahl "WestWind" (.NET Core 2.2):

That's 40x slower with Console logging at [INFO] vs. [WARN] level!

.NET 6?

Making traceability right...

  • learn the OpenTelemetry.io set of tools and "standards"
    • use the W3C Trace Context standard (2020)
  • use an activity SDK (.NET has one) supporting W3C
    • define (W3C's) SpanId, TraceId and ParentId
  • log baggage (additional properties), i.e. from the trace context
  • be consistent (apply activity tracking in all your services)
  • make it centralized, searchable, configurable
  • use a CFN dedicated service to store and visualize
    • AppInsights, Zipkin, Jaeger, Prometheus, Splunk...
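What "W3C-compatible" looks like on the wire: a `traceparent` HTTP header of the form `version-traceid-parentid-flags`, e.g. `00-<32 hex chars>-<16 hex chars>-01`. A minimal Python sketch, assuming only version `00` of the W3C Trace Context format (in .NET, `System.Diagnostics.Activity` handles these IDs when it uses the W3C ID format); `SpanContext`, `new_root` and `continue_trace` are hypothetical names.

```python
import re
import secrets
from dataclasses import dataclass
from typing import Optional

# version-traceid-parentid-flags, lowercase hex only (version 00 of the format)
TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")

@dataclass(frozen=True)
class SpanContext:
    trace_id: str             # 16 random bytes, shared by every span of one trace
    span_id: str              # 8 random bytes, unique per span
    parent_id: Optional[str]  # the caller's span_id, None for the root span
    sampled: bool = True

    def traceparent(self) -> str:
        """Header value to send on outgoing calls."""
        return f"00-{self.trace_id}-{self.span_id}-{'01' if self.sampled else '00'}"

def new_root() -> SpanContext:
    return SpanContext(secrets.token_hex(16), secrets.token_hex(8), None)

def continue_trace(header: str) -> Optional[SpanContext]:
    """Start a child span from an incoming traceparent; None if the header is invalid."""
    match = TRACEPARENT.match(header.strip())
    if not match:
        return None
    trace_id, caller_span_id, flags = match.groups()
    if set(trace_id) == {"0"} or set(caller_span_id) == {"0"}:  # all-zero IDs are invalid
        return None
    sampled = bool(int(flags, 16) & 0x01)
    return SpanContext(trace_id, secrets.token_hex(8), caller_span_id, sampled)

# App A starts the trace and calls app B with the header; app B continues it.
root = new_root()
header = root.traceparent()
child = continue_trace(header)
```

The child keeps the trace ID, gets a fresh span ID, and records the caller's span ID as its parent: exactly the TraceId / SpanId / ParentId triple from the list above.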

Making traceability right

Terminology recap for traceability

  • Trace - a collection of spans that share the same trace ID
  • The first span is the root span; it represents the operation end to end (E2E)
  • Each span carries metrics and baggage
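The terminology above as data: a minimal Python sketch, assuming spans were already collected (for example by one of the tools listed earlier) as plain records with trace ID, span ID, parent ID, timing and baggage. It picks the spans of one trace, starts from the root (the span without a parent) and renders the tree with durations. The `Span` shape, `render_trace` and the sample values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Span:
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    name: str
    start_ms: float
    end_ms: float
    baggage: dict = field(default_factory=dict)  # key/values propagated along the trace

    @property
    def duration_ms(self) -> float:
        return self.end_ms - self.start_ms

def render_trace(spans, trace_id):
    """Return one trace as indented lines, starting from its root span."""
    trace = [s for s in spans if s.trace_id == trace_id]  # a trace = spans sharing one ID
    children = {}
    for s in trace:
        children.setdefault(s.parent_id, []).append(s)
    lines = []

    def walk(span, depth):
        lines.append(f"{'  ' * depth}{span.name} {span.duration_ms:.0f} ms")
        for child in sorted(children.get(span.span_id, []), key=lambda c: c.start_ms):
            walk(child, depth + 1)

    for root in children.get(None, []):  # the root span has no parent
        walk(root, 0)
    return lines

spans = [
    Span("t1", "a", None, "POST /invoices/13/delete (web app)", 0, 240, {"user_id": "7"}),
    Span("t1", "b", "a", "DELETE /invoices/13 (invoice API)", 15, 220, {"user_id": "7"}),
    Span("t1", "c", "b", "SQL DELETE invoices", 40, 90, {"user_id": "7"}),
    Span("t2", "x", None, "GET /health", 0, 3),
]
for line in render_trace(spans, "t1"):
    print(line)
```

The root span's 240 ms is the end-to-end duration; the 35 ms between it and its only child (205 ms) is time spent in the web app itself rather than in downstream calls.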

Thanks for holding tight...

...also ST folks know their observability :-)

demo time!

more of a code review than coding :-)

The end! 📸 Questions?

Distributed observability in .NET

By Vedran Mandić


A presentation on the WHAT, HOW and WHY of logging and tracing in modern .NET applications, with a focus on the HTTP channel.
