Connecting the Dots

Petra Bierleutgeb

@pbvie

with Distributed Tracing

Agenda

  • Introduction to Distributed Tracing
    • What, Why, How and Where
  • Distributed Tracing in Scala
    • transparent tracing vs. explicit tracing
    • opentracing-java
    • patterns for better ergonomics
    • context propagation
    • writing your own instrumentation
  • Demo: Tracing a small system

we'll focus on this one

Motivation

  • (Distributed) Tracing is a very useful tool
    especially (but not only) in micro-service oriented systems
  • the OpenTracing Java API is not a (very) good fit for Scala
    • relies on global state
    • tedious to use with async code
  • existing tooling often quite heavy-weighted
  • improve usability in Scala by leveraging language features
    • explore feasibility of stateless tracing

Introduction to
Distributed Tracing

What, Why, How and Where

What

Distributed Tracing

  • tells the story of a workflow that might involve multiple services from start to finish
  • describes for a (usually non-trivial) task
    • which steps are involved
    • how are they connected
    • how long did each step take (duration)

Helpful for

  • understanding complex systems
  • breaking down latency
  • root cause analysis
  • anomaly detection
  • dependency analysis

How

A Trace

  • can be thought of as a directed acyclic graph (DAG)
  • nodes are spans
  • edges are references

handle order request

create order event

publish order

get user

A Trace

  • can be thought of as a directed acyclic graph (DAG)
  • but is usually pictured as a Gantt chart

handle order request

create order event

publish order

get user

Terminology

trace

span

  • Trace: The complete workflow
  • Span: a logical step/piece of work

Terminology

baggage items

tags

  • Tag: key/value pairs, valid for one span
    • examples: component, http.path, db.statement
  • Baggage Item: key/value pairs, propagated between spans
    • examples: transactionId, userId

see: https://github.com/opentracing/specification/blob/master/semantic_conventions.md

Terminology

  • Logs: key/value pairs, attached to the span at a certain point in time
    • used to indicate errors or other events of interest

see: https://github.com/opentracing/specification/blob/master/semantic_conventions.md

error log

References

  • causal relationships between a child span and a parent span
  • childOf
    • parent span depends on child span
    • parent span may not finish before child span
  • followsFrom
    • parent span doesn't depend on (and wait for) result
      of child span 
    • example: publish command that triggers further action

followsFrom

childOf

...and where?

Tracing-agnostic applications vs. tracing as part of application code

Network Level

  • + no changes to application code
  • ~ coarse-grained compared to in-application tracing
  • ~ required effort depends a lot on available infrastructure
  • - not (very) useful for monoliths
  • example: Tracing with Istio/Envoy

Application Level

  • + detailed traces
  • + works for all applications and infrastructures
  • + provide more context
  • - increases non-functional complexity*
  • - requires support for different frameworks/tools*
  • examples
    • transparent: Lightbend Telemetry, Kamon
    • manual: OpenTracing implementations

The OpenTracing API

...and a look at opentracing-java 

OpenTracing API

  • an implementation independent API for Distributed Tracing
  • official API definitions available in several languages
  • example: opentracing-java
    • some concrete implementations
      •  Jaeger
      • Zipkin
      • DataDog

OpenTracing

  • benefits
    • program against API instead of concrete implementation
      =>
      no vendor lock-in
    • growing number of OpenTracing compatible tracers

The OpenTracing Java API

  • Tracer
  • GlobalTracer
  • Scope
  • Span
  • SpanContext
  • Carrier (Injector/Extractor)

OpenTracing Java - Example

Span span = tracer.buildSpan("someWork").start();
try (Scope scope = tracer.scopeManager().activate(span, false)) {
    // Do things.
} catch(Exception ex) {
    Tags.ERROR.set(span, true);
    span.log(Map.of(Fields.EVENT, "error", Fields.ERROR_OBJECT, ex));
} finally {
    span.finish();
}

Scope

  • OpenTracing relies on a ScopeManager to store the currently active span
    • + spans don't have to be passed around
    • - introduces global state

Scopes and global state - a critical view

  • OpenTracing APIs focus on state and often favor side-effects
    • global/invisible state in concurrent programs is hard
    • difficult to verify and understand
  • contrary to good programming practice
    • immutability
    • reduce side-effects
    • behavior of functions depends only on input (arguments)

Scope? Who said anything about scope?

Tracing without Scopes

  • the OpenTracing specification doesn't mention scopes
  • trace data does not include information about
    • scopes
    • span activation/deactivation
  • can we have stateless tracing?
    • exTra (explicit Tracing)
    • experimental!

Sidenote: Tracing in FP

  • explicit tracing aims to remove global state but there's still side effects all over the place
  • fine for many applications but not suitable for pure FP
  • projects

Distributed Tracing in Scala

Better ergonimics with
exTra

exTra API (some of it)

  • withSpan/withSpanAsync
    execute a block of code with the current span, log errors and finish span on completion
  • withChildSpan/withChildSpanAsync
    create a new child span for the current span (provided via implicit TracingContext and execute the given block of code with it
  • withFollowsFrom/withFollowsFromAsync
    same as above but with followsFrom relationship instead of childOf

withSpan

def withSpan[T](span: Span)(op: Span => T): T =
  try {
    op(span)
  } finally {
    span.finish()
  }

usage

withSpan(tracer.buildSpan("myOp").start()) { span =>
  // do something
}

Making the span implicit

def withSpan[T](op: Span => T)(implicit span: Span): T =
  try {
    op(span)
  } finally {
    span.finish()
  }

usage

implicit val span = ...

// some other code

withSpan { _ =>
  // do something
}

Works great for error handling

def withSpan[T](span: Span)(op: Span => T): T =
  try {
    op(span)
  } catch {
    case t: Throwable =>
      span.setTag(Tags.ERROR.getKey, true)
      span.log(
        Map(
          Fields.EVENT        -> "error",
          Fields.ERROR_OBJECT -> t
        ).asJava
      )
      throw t
  } finally {
    span.finish()
  }

Works great for error handling

implicit val span = ...
withSpan { _ =>
  7 / 0
}

span is reported and marked as error

TracingContext

  • in addition to the active span we often need access to the tracer
  • TracingContext: wraps up tracer and span so they can be passed around more easily
final case class TracingContext(tracer: Tracer, span: Span)

TracingContext - Example

def calculateSum(a: Int, b: Int)(implicit tc: TracingContext): Int = {
  val childSpan = tc.tracer.buildSpan("calculateSum").asChildOf(tc.span).start()
  withSpan(childSpan) { _ =>
    a + b
  }
}

TracingContext - Example

def calculateSum(a: Int, b: Int)(implicit tc: TracingContext): Int = {
  val childSpan = tc.tracer.buildSpan("calculateSum").asChildOf(tc.span).start()
  withSpan(childSpan) { _ =>
    a + b
  }
}

this is a very common pattern...let's implement a more concise syntax for it

Implementing withChildSpan - 1

def withSpan[T](op: TracingContext => T)(implicit tc: TracingContext): T =
  try {
    op(tc.span)
  } catch {
    case t: Throwable =>
      tc.span.setTag(Tags.ERROR.getKey, true)
      tc.span.log(errorLog(t).asJava)
      throw t
  } finally {
    tc.span.finish()
  }

updating withSpan to work with TracingContext

Implementing withChildSpan - 2

def withChildSpan[T](
    operationName: String
)(op: TracingContext => T)(implicit tc: TracingContext): T = {
  val childSpan = tc.tracer.buildSpan(operationName).asChildOf(tc.span).start()
  val childTracingContext = tc.copy(span = childSpan)
  withSpan(op)(childTracingContext)
}

now we can implement withChildSpan

  1. create childSpan
  2. create new TracingContext with child span
  3. call previously defined withSpan function

withChildSpan - Usage

withSpan(TracingContext.withNewRootSpan("example")(tracer)) { implicit tc =>
  withChildSpan("child-a") { _ =>
    Thread.sleep(20)
  }
  withChildSpan("child-b") { _ =>
    Thread.sleep(10)
  }
  withChildSpan("child-c") { _ =>
    Thread.sleep(15)
    7 / 0
  }
}

withChildSpan - async

withSpan(TracingContext.withNewRootSpan("example")(tracer)) { implicit tc =>
  withChildSpan("async-child") { _ =>
    Future {
      blocking(Thread.sleep(3000))
    }
  }
}

duration of  'async-child' is only 7.14ms

withChildSpan - async

def withSpan[T](op: TracingContext => T)(implicit tc: TracingContext): T =
  try {
    op(tc.span)
  } catch {
    case t: Throwable =>
      tc.span.setTag(Tags.ERROR.getKey, true)
      tc.span.log(errorLog(t).asJava)
      throw t
  } finally {
    tc.span.finish()
  }
  • if op is returning a Future, this code doesn't work for 'childOf' relationships

withChildSpan - async

def withSpanAsync[T](op: TracingContext => Future[T])
  (implicit ec: ExecutionContext, tc: TracingContext): Future[T] =
    op(tc)
      .recoverWith {
        case t: Throwable =>
          tc.span.setTag(Tags.ERROR.getKey, true)
          tc.span.log(errorLog(t).asJava)
          Future.failed(t)
      }
      .andThen {
        case _ => tc.span.finish()
      }
  • use recover or recoverWith in combination with andThen
  • makes sure span is finished after Future is completed

Cross-process Tracing

Transporting span context across service boundaries

Distributed Tracing

Service A

Service B

?

Traces usually involve multiple services

Distributed Tracing

  • in order to continue a trace in a different process/service we need to pass over the span context
  • propagation format depends on tracer implementation
    • =>  the same OpenTracing implementation needs to be used on both sides

Span Context

  • internal properties + user defined baggage items
  • example: Jaeger
    • key: uber-trace-id
      value: 8c8c205d814630e:c94f0c7a2f89f95c:b41032f2168aae28:1

trace id

span id

parent id

flags

Injector/Extractor

  • used to inject/extract span context
  • consists of a format and a carrier
  • example:
    • format: TEXT_MAP
    • carrier: Kafka Headers, HTTP Headers

Implementing a custom extractor/injector

public interface TextMap extends Iterable<Map.Entry<String, String>> {

    Iterator<Map.Entry<String,String>> iterator();
    void put(String key, String value);
    
}

Example: Kafka

class KafkaHeaderInjector(private val headers: Headers) extends TextMap {
  override def iterator(): util.Iterator[JavaMap.Entry[String, String]] =
    throw new UnsupportedOperationException(
      "should only be used with Tracer.inject()"
    )
  override def put(key: String, value: String): Unit =
    headers.add(key, value.getBytes)
}
val record = new ProducerRecord(...)
tc.tracer.inject(tc.span.context(),
                  Format.Builtin.TEXT_MAP,
                  new KafkaHeaderInjector(record.headers()))
producer.send(record)

Demo System

Glühwein Ltd.

mulled wine

user service

order service

preparation service

client

gRPC req/resp

Akka HTTP

Kafka

Akka Actors

thirsty customer

Code Example

1. Receive and handle
order request

user service

order service

preparation service

client

gRPC req/resp

Akka HTTP

Kafka

Akka Actors

thirsty customer

2. Verify user exists

user service

order service

preparation service

client

gRPC req/resp

Akka HTTP

Kafka

Akka Actors

thirsty customer

3. Publish and consume
create-order command

user service

order service

preparation service

client

gRPC req/resp

Akka HTTP

Kafka

Akka Actors

thirsty customer

4. Handle order
with Akka

user service

order service

preparation service

client

gRPC req/resp

Akka HTTP

Kafka

Akka Actors

thirsty customer

Result

The End

Thank you!

Questions?

Credits

Icon made by Freepik from flaticon.com