Connecting the Dots

Petra Bierleutgeb

@pbvie

with Distributed Tracing

Agenda

  • Introduction to Distributed Tracing
    • What, Why, How and Where
  • Distributed Tracing in Scala
    • transparent tracing vs. explicit tracing
    • opentracing-java
    • patterns for better ergonomics
    • context propagation
    • writing your own instrumentation
  • Demo: Tracing a small system

we'll focus on this one

Motivation

  • (Distributed) Tracing is a very useful tool
    especially (but not only) in micro-service oriented systems
  • the OpenTracing Java API is not a (very) good fit for Scala
    • relies on global state
    • tedious to use with async code
  • existing tooling often quite heavy-weighted
  • improve usability in Scala by leveraging language features
    • explore feasibility of stateless tracing

Introduction to
Distributed Tracing

What, Why, How and Where

What

Distributed Tracing

  • tells the story of a workflow that might involve multiple services from start to finish
  • describes for a (usually non-trivial) task
    • which steps are involved
    • how are they connected
    • how long did each step take (duration)

Helpful for

  • understanding complex systems
  • breaking down latency
  • root cause analysis
  • anomaly detection
  • dependency analysis

How

A Trace

  • can be thought of as a directed acyclic graph (DAG)
  • nodes are spans
  • edges are references

handle order request

create order event

publish order

get user

A Trace

  • can be thought of as a directed acyclic graph (DAG)
  • but is usually pictured as a Gantt chart

handle order request

create order event

publish order

get user

Terminology

trace

span

  • Trace: The complete workflow
  • Span: a logical step/piece of work

Terminology

baggage items

tags

  • Tag: key/value pairs, valid for one span
    • examples: component, http.path, db.statement
  • Baggage Item: key/value pairs, propagated between spans
    • examples: transactionId, userId

see: https://github.com/opentracing/specification/blob/master/semantic_conventions.md

Terminology

  • Logs: key/value pairs, attached to the span at a certain point in time
    • used to indicate errors or other events of interest

see: https://github.com/opentracing/specification/blob/master/semantic_conventions.md

error log

References

  • causal relationships between a child span and a parent span
  • childOf
    • parent span depends on child span
    • parent span may not finish before child span
  • followsFrom
    • parent span doesn't depend on (and wait for) result
      of child span 
    • example: publish command that triggers further action

followsFrom

childOf

...and where?

Tracing-agnostic applications vs. tracing as part of application code

Network Level

  • + no changes to application code
  • ~ coarse-grained compared to in-application tracing
  • ~ required effort depends a lot on available infrastructure
  • - not (very) useful for monoliths
  • example: Tracing with Istio/Envoy

Application Level

  • + detailed traces
  • + works for all applications and infrastructures
  • + provide more context
  • - increases non-functional complexity*
  • - requires support for different frameworks/tools*
  • examples
    • transparent: Lightbend Telemetry, Kamon
    • manual: OpenTracing implementations

The OpenTracing API

...and a look at opentracing-java 

OpenTracing API

  • an implementation independent API for Distributed Tracing
  • official API definitions available in several languages
  • example: opentracing-java
    • some concrete implementations
      •  Jaeger
      • Zipkin
      • DataDog

OpenTracing

  • benefits
    • program against API instead of concrete implementation
      =>
      no vendor lock-in
    • growing number of OpenTracing compatible tracers

The OpenTracing Java API

  • Tracer
  • GlobalTracer
  • Scope
  • Span
  • SpanContext
  • Carrier (Injector/Extractor)

OpenTracing Java - Example

Span span = tracer.buildSpan("someWork").start();
try (Scope scope = tracer.scopeManager().activate(span, false)) {
    // Do things.
} catch(Exception ex) {
    Tags.ERROR.set(span, true);
    span.log(Map.of(Fields.EVENT, "error", Fields.ERROR_OBJECT, ex));
} finally {
    span.finish();
}

Scope

  • OpenTracing relies on a ScopeManager to store the currently active span
    • + spans don't have to be passed around
    • - introduces global state

Scopes and global state - a critical view

  • OpenTracing APIs focus on state and often favor side-effects
    • global/invisible state in concurrent programs is hard
    • difficult to verify and understand
  • contrary to good programming practice
    • immutability
    • reduce side-effects
    • behavior of functions depends only on input (arguments)

Scope? Who said anything about scope?

Tracing without Scopes

  • the OpenTracing specification doesn't mention scopes
  • trace data does not include information about
    • scopes
    • span activation/deactivation
  • can we have stateless tracing?
    • exTra (explicit Tracing)
    • experimental!

Sidenote: Tracing in FP

  • explicit tracing aims to remove global state but there's still side effects all over the place
  • fine for many applications but not suitable for pure FP
  • projects

Distributed Tracing in Scala

Better ergonimics with
exTra

exTra API (some of it)

  • withSpan/withSpanAsync
    execute a block of code with the current span, log errors and finish span on completion
  • withChildSpan/withChildSpanAsync
    create a new child span for the current span (provided via implicit TracingContext and execute the given block of code with it
  • withFollowsFrom/withFollowsFromAsync
    same as above but with followsFrom relationship instead of childOf

withSpan

def withSpan[T](span: Span)(op: Span => T): T =
  try {
    op(span)
  } finally {
    span.finish()
  }

usage

withSpan(tracer.buildSpan("myOp").start()) { span =>
  // do something
}

Making the span implicit

def withSpan[T](op: Span => T)(implicit span: Span): T =
  try {
    op(span)
  } finally {
    span.finish()
  }

usage

implicit val span = ...

// some other code

withSpan { _ =>
  // do something
}

Works great for error handling

def withSpan[T](span: Span)(op: Span => T): T =
  try {
    op(span)
  } catch {
    case t: Throwable =>
      span.setTag(Tags.ERROR.getKey, true)
      span.log(
        Map(
          Fields.EVENT        -> "error",
          Fields.ERROR_OBJECT -> t
        ).asJava
      )
      throw t
  } finally {
    span.finish()
  }

Works great for error handling

implicit val span = ...
withSpan { _ =>
  7 / 0
}

span is reported and marked as error

TracingContext

  • in addition to the active span we often need access to the tracer
  • TracingContext: wraps up tracer and span so they can be passed around more easily
final case class TracingContext(tracer: Tracer, span: Span)

TracingContext - Example

def calculateSum(a: Int, b: Int)(implicit tc: TracingContext): Int = {
  val childSpan = tc.tracer.buildSpan("calculateSum").asChildOf(tc.span).start()
  withSpan(childSpan) { _ =>
    a + b
  }
}

TracingContext - Example

def calculateSum(a: Int, b: Int)(implicit tc: TracingContext): Int = {
  val childSpan = tc.tracer.buildSpan("calculateSum").asChildOf(tc.span).start()
  withSpan(childSpan) { _ =>
    a + b
  }
}

this is a very common pattern...let's implement a more concise syntax for it

Implementing withChildSpan - 1

def withSpan[T](op: TracingContext => T)(implicit tc: TracingContext): T =
  try {
    op(tc.span)
  } catch {
    case t: Throwable =>
      tc.span.setTag(Tags.ERROR.getKey, true)
      tc.span.log(errorLog(t).asJava)
      throw t
  } finally {
    tc.span.finish()
  }

updating withSpan to work with TracingContext

Implementing withChildSpan - 2

def withChildSpan[T](
    operationName: String
)(op: TracingContext => T)(implicit tc: TracingContext): T = {
  val childSpan = tc.tracer.buildSpan(operationName).asChildOf(tc.span).start()
  val childTracingContext = tc.copy(span = childSpan)
  withSpan(op)(childTracingContext)
}

now we can implement withChildSpan

  1. create childSpan
  2. create new TracingContext with child span
  3. call previously defined withSpan function

withChildSpan - Usage

withSpan(TracingContext.withNewRootSpan("example")(tracer)) { implicit tc =>
  withChildSpan("child-a") { _ =>
    Thread.sleep(20)
  }
  withChildSpan("child-b") { _ =>
    Thread.sleep(10)
  }
  withChildSpan("child-c") { _ =>
    Thread.sleep(15)
    7 / 0
  }
}

withChildSpan - async

withSpan(TracingContext.withNewRootSpan("example")(tracer)) { implicit tc =>
  withChildSpan("async-child") { _ =>
    Future {
      blocking(Thread.sleep(3000))
    }
  }
}

duration of  'async-child' is only 7.14ms

withChildSpan - async

def withSpan[T](op: TracingContext => T)(implicit tc: TracingContext): T =
  try {
    op(tc.span)
  } catch {
    case t: Throwable =>
      tc.span.setTag(Tags.ERROR.getKey, true)
      tc.span.log(errorLog(t).asJava)
      throw t
  } finally {
    tc.span.finish()
  }
  • if op is returning a Future, this code doesn't work for 'childOf' relationships

withChildSpan - async

def withSpanAsync[T](op: TracingContext => Future[T])
  (implicit ec: ExecutionContext, tc: TracingContext): Future[T] =
    op(tc)
      .recoverWith {
        case t: Throwable =>
          tc.span.setTag(Tags.ERROR.getKey, true)
          tc.span.log(errorLog(t).asJava)
          Future.failed(t)
      }
      .andThen {
        case _ => tc.span.finish()
      }
  • use recover or recoverWith in combination with andThen
  • makes sure span is finished after Future is completed

Cross-process Tracing

Transporting span context across service boundaries

Distributed Tracing

Service A

Service B

?

Traces usually involve multiple services

Distributed Tracing

  • in order to continue a trace in a different process/service we need to pass over the span context
  • propagation format depends on tracer implementation
    • =>  the same OpenTracing implementation needs to be used on both sides

Span Context

  • internal properties + user defined baggage items
  • example: Jaeger
    • key: uber-trace-id
      value: 8c8c205d814630e:c94f0c7a2f89f95c:b41032f2168aae28:1

trace id

span id

parent id

flags

Injector/Extractor

  • used to inject/extract span context
  • consists of a format and a carrier
  • example:
    • format: TEXT_MAP
    • carrier: Kafka Headers, HTTP Headers

Implementing a custom extractor/injector

public interface TextMap extends Iterable<Map.Entry<String, String>> {

    Iterator<Map.Entry<String,String>> iterator();
    void put(String key, String value);
    
}

Example: Kafka

class KafkaHeaderInjector(private val headers: Headers) extends TextMap {
  override def iterator(): util.Iterator[JavaMap.Entry[String, String]] =
    throw new UnsupportedOperationException(
      "should only be used with Tracer.inject()"
    )
  override def put(key: String, value: String): Unit =
    headers.add(key, value.getBytes)
}
val record = new ProducerRecord(...)
tc.tracer.inject(tc.span.context(),
                  Format.Builtin.TEXT_MAP,
                  new KafkaHeaderInjector(record.headers()))
producer.send(record)

Demo System

Glühwein Ltd.

mulled wine

user service

order service

preparation service

client

gRPC req/resp

Akka HTTP

Kafka

Akka Actors

thirsty customer

Code Example

1. Receive and handle
order request

user service

order service

preparation service

client

gRPC req/resp

Akka HTTP

Kafka

Akka Actors

thirsty customer

2. Verify user exists

user service

order service

preparation service

client

gRPC req/resp

Akka HTTP

Kafka

Akka Actors

thirsty customer

3. Publish and consume
create-order command

user service

order service

preparation service

client

gRPC req/resp

Akka HTTP

Kafka

Akka Actors

thirsty customer

4. Handle order
with Akka

user service

order service

preparation service

client

gRPC req/resp

Akka HTTP

Kafka

Akka Actors

thirsty customer

Result

The End

Thank you!

Questions?

Credits

Icon made by Freepik from flaticon.com

Connecting the Dots with Distributed Tracing

By Petra Bierleutgeb

Connecting the Dots with Distributed Tracing

@Scala eXchange 2018

  • 2,496