Connecting the Dots

Petra Bierleutgeb

@pbvie

with Distributed Tracing

Agenda

Introduction to Distributed Tracing
- What, Why, How and Where
Distributed Tracing in Scala
- transparent tracing vs. explicit tracing
- opentracing-java
- patterns for better ergonomics
- context propagation
- writing your own instrumentation
Demo: Tracing a small system

we'll focus on this one

Motivation

(Distributed) Tracing is a very useful tool
especially (but not only) in micro-service oriented systems
the OpenTracing Java API is not a (very) good fit for Scala
- relies on global state
- tedious to use with async code
existing tooling often quite heavy-weighted
improve usability in Scala by leveraging language features
- explore feasibility of stateless tracing

Introduction to
Distributed Tracing

What, Why, How and Where

What

Distributed Tracing

tells the story of a workflow that might involve multiple services from start to finish
describes for a (usually non-trivial) task
- which steps are involved
- how are they connected
- how long did each step take (duration)

Helpful for

understanding complex systems
breaking down latency
root cause analysis
anomaly detection
dependency analysis

How

A Trace

can be thought of as a directed acyclic graph (DAG)

nodes are spans
edges are references

handle order request

create order event

publish order

get user

A Trace

can be thought of as a directed acyclic graph (DAG)

but is usually pictured as a Gantt chart

handle order request

create order event

publish order

get user

Terminology

trace

span

Trace: The complete workflow
Span: a logical step/piece of work

Terminology

baggage items

Terminology

Logs: key/value pairs, attached to the span at a certain point in time
- used to indicate errors or other events of interest

see: https://github.com/opentracing/specification/blob/master/semantic_conventions.md

error log

References

causal relationships between a child span and a parent span
childOf
- parent span depends on child span
- parent span may not finish before child span
followsFrom
- parent span doesn't depend on (and wait for) result
  of child span
- example: publish command that triggers further action

followsFrom

childOf

...and where?

Tracing-agnostic applications vs. tracing as part of application code

Network Level

+ no changes to application code
~ coarse-grained compared to in-application tracing
~ required effort depends a lot on available infrastructure
- not (very) useful for monoliths
example: Tracing with Istio/Envoy

Application Level

+ detailed traces
+ works for all applications and infrastructures
+ provide more context
- increases non-functional complexity*
- requires support for different frameworks/tools*
examples
- transparent: Lightbend Telemetry, Kamon
- manual: OpenTracing implementations

The OpenTracing API

...and a look at opentracing-java

OpenTracing API

an implementation independent API for Distributed Tracing
official API definitions available in several languages
example: opentracing-java
- some concrete implementations
  - Jaeger
  - Zipkin
  - DataDog

OpenTracing

benefits
- program against API instead of concrete implementation
  => no vendor lock-in
- growing number of OpenTracing compatible tracers

The OpenTracing Java API

Tracer
GlobalTracer
Scope
Span
SpanContext
Carrier (Injector/Extractor)

OpenTracing Java - Example

Span span = tracer.buildSpan("someWork").start();
try (Scope scope = tracer.scopeManager().activate(span, false)) {
    // Do things.
} catch(Exception ex) {
    Tags.ERROR.set(span, true);
    span.log(Map.of(Fields.EVENT, "error", Fields.ERROR_OBJECT, ex));
} finally {
    span.finish();
}

Scope

OpenTracing relies on a ScopeManager to store the currently active span
- + spans don't have to be passed around
- - introduces global state

Scopes and global state - a critical view

OpenTracing APIs focus on state and often favor side-effects
- global/invisible state in concurrent programs is hard
- difficult to verify and understand
contrary to good programming practice
- immutability
- reduce side-effects
- behavior of functions depends only on input (arguments)

Scope? Who said anything about scope?

Tracing without Scopes

the OpenTracing specification doesn't mention scopes
trace data does not include information about
- scopes
- span activation/deactivation
can we have stateless tracing?
- exTra (explicit Tracing)
- experimental!

Sidenote: Tracing in FP

explicit tracing aims to remove global state but there's still side effects all over the place
fine for many applications but not suitable for pure FP
projects
- http4s tracer
- puretracing

Distributed Tracing in Scala

Better ergonimics with
exTra

exTra API (some of it)

withSpan/withSpanAsync
execute a block of code with the current span, log errors and finish span on completion
withChildSpan/withChildSpanAsync
create a new child span for the current span (provided via implicit TracingContext and execute the given block of code with it
withFollowsFrom/withFollowsFromAsync
same as above but with followsFrom relationship instead of childOf

withSpan

def withSpan[T](span: Span)(op: Span => T): T =
  try {
    op(span)
  } finally {
    span.finish()
  }

usage

withSpan(tracer.buildSpan("myOp").start()) { span =>
  // do something
}

Making the span implicit

def withSpan[T](op: Span => T)(implicit span: Span): T =
  try {
    op(span)
  } finally {
    span.finish()
  }

usage

implicit val span = ...

// some other code

withSpan { _ =>
  // do something
}

Works great for error handling

def withSpan[T](span: Span)(op: Span => T): T =
  try {
    op(span)
  } catch {
    case t: Throwable =>
      span.setTag(Tags.ERROR.getKey, true)
      span.log(
        Map(
          Fields.EVENT        -> "error",
          Fields.ERROR_OBJECT -> t
        ).asJava
      )
      throw t
  } finally {
    span.finish()
  }

Works great for error handling

implicit val span = ...
withSpan { _ =>
  7 / 0
}

span is reported and marked as error

TracingContext

in addition to the active span we often need access to the tracer
TracingContext: wraps up tracer and span so they can be passed around more easily

final case class TracingContext(tracer: Tracer, span: Span)

TracingContext - Example

def calculateSum(a: Int, b: Int)(implicit tc: TracingContext): Int = {
  val childSpan = tc.tracer.buildSpan("calculateSum").asChildOf(tc.span).start()
  withSpan(childSpan) { _ =>
    a + b
  }
}

TracingContext - Example

def calculateSum(a: Int, b: Int)(implicit tc: TracingContext): Int = {
  val childSpan = tc.tracer.buildSpan("calculateSum").asChildOf(tc.span).start()
  withSpan(childSpan) { _ =>
    a + b
  }
}

this is a very common pattern...let's implement a more concise syntax for it

Implementing withChildSpan - 1

def withSpan[T](op: TracingContext => T)(implicit tc: TracingContext): T =
  try {
    op(tc.span)
  } catch {
    case t: Throwable =>
      tc.span.setTag(Tags.ERROR.getKey, true)
      tc.span.log(errorLog(t).asJava)
      throw t
  } finally {
    tc.span.finish()
  }

updating withSpan to work with TracingContext

Implementing withChildSpan - 2

def withChildSpan[T](
    operationName: String
)(op: TracingContext => T)(implicit tc: TracingContext): T = {
  val childSpan = tc.tracer.buildSpan(operationName).asChildOf(tc.span).start()
  val childTracingContext = tc.copy(span = childSpan)
  withSpan(op)(childTracingContext)
}

now we can implement withChildSpan

create childSpan
create new TracingContext with child span
call previously defined withSpan function

withChildSpan - Usage

withSpan(TracingContext.withNewRootSpan("example")(tracer)) { implicit tc =>
  withChildSpan("child-a") { _ =>
    Thread.sleep(20)
  }
  withChildSpan("child-b") { _ =>
    Thread.sleep(10)
  }
  withChildSpan("child-c") { _ =>
    Thread.sleep(15)
    7 / 0
  }
}

withChildSpan - async

withSpan(TracingContext.withNewRootSpan("example")(tracer)) { implicit tc =>
  withChildSpan("async-child") { _ =>
    Future {
      blocking(Thread.sleep(3000))
    }
  }
}

duration of 'async-child' is only 7.14ms

withChildSpan - async

def withSpan[T](op: TracingContext => T)(implicit tc: TracingContext): T =
  try {
    op(tc.span)
  } catch {
    case t: Throwable =>
      tc.span.setTag(Tags.ERROR.getKey, true)
      tc.span.log(errorLog(t).asJava)
      throw t
  } finally {
    tc.span.finish()
  }

if op is returning a Future, this code doesn't work for 'childOf' relationships

withChildSpan - async

def withSpanAsync[T](op: TracingContext => Future[T])
  (implicit ec: ExecutionContext, tc: TracingContext): Future[T] =
    op(tc)
      .recoverWith {
        case t: Throwable =>
          tc.span.setTag(Tags.ERROR.getKey, true)
          tc.span.log(errorLog(t).asJava)
          Future.failed(t)
      }
      .andThen {
        case _ => tc.span.finish()
      }

use recover or recoverWith in combination with andThen
makes sure span is finished after Future is completed

Cross-process Tracing

Transporting span context across service boundaries

Distributed Tracing

Service A

Service B

Traces usually involve multiple services

Distributed Tracing

in order to continue a trace in a different process/service we need to pass over the span context
propagation format depends on tracer implementation
- => the same OpenTracing implementation needs to be used on both sides

Span Context

internal properties + user defined baggage items
example: Jaeger
- key: uber-trace-id
  value: 8c8c205d814630e:c94f0c7a2f89f95c:b41032f2168aae28:1

trace id

span id

parent id

flags

Injector/Extractor

used to inject/extract span context
consists of a format and a carrier
example:
- format: TEXT_MAP
- carrier: Kafka Headers, HTTP Headers

Implementing a custom extractor/injector

public interface TextMap extends Iterable<Map.Entry<String, String>> {

    Iterator<Map.Entry<String,String>> iterator();
    void put(String key, String value);
    
}

Example: Kafka

class KafkaHeaderInjector(private val headers: Headers) extends TextMap {
  override def iterator(): util.Iterator[JavaMap.Entry[String, String]] =
    throw new UnsupportedOperationException(
      "should only be used with Tracer.inject()"
    )
  override def put(key: String, value: String): Unit =
    headers.add(key, value.getBytes)
}

val record = new ProducerRecord(...)
tc.tracer.inject(tc.span.context(),
                  Format.Builtin.TEXT_MAP,
                  new KafkaHeaderInjector(record.headers()))
producer.send(record)

Demo System

Glühwein Ltd.

mulled wine

user service

order service

preparation service

client

gRPC req/resp

Akka HTTP

Kafka

Akka Actors

thirsty customer

Code Example

1. Receive and handle
order request

user service

order service

preparation service

client

gRPC req/resp

Akka HTTP

Kafka

Akka Actors

thirsty customer

2. Verify user exists

user service

order service

preparation service

client

gRPC req/resp

Akka HTTP

Kafka

Akka Actors

thirsty customer

3. Publish and consume
create-order command

user service

order service

preparation service

client

gRPC req/resp

Akka HTTP

Kafka

Akka Actors

thirsty customer

4. Handle order
with Akka

user service

order service

preparation service

client

gRPC req/resp

Akka HTTP

Kafka

Akka Actors

thirsty customer

Result

The End

Thank you!

Questions?

Credits

Icon made by Freepik from flaticon.com

Connecting the Dots with Distributed Tracing

By Petra Bierleutgeb

Connecting the Dots with Distributed Tracing

@Scala eXchange 2018

2,352

Agenda

Motivation

Introduction to Distributed Tracing

What

Distributed Tracing

Helpful for

How

A Trace

A Trace

Terminology

Terminology

Terminology

References

...and where?

Network Level

Application Level

The OpenTracing API

OpenTracing API

OpenTracing

The OpenTracing Java API

OpenTracing Java - Example

Scope

Scopes and global state - a critical view

Scope? Who said anything about scope?

Tracing without Scopes

Sidenote: Tracing in FP

Distributed Tracing in Scala

exTra API (some of it)

withSpan

Making the span implicit

Works great for error handling

Works great for error handling

TracingContext

TracingContext - Example

TracingContext - Example

Implementing withChildSpan - 1

Implementing withChildSpan - 2

withChildSpan - Usage

withChildSpan - async

withChildSpan - async

withChildSpan - async

Cross-process Tracing

Distributed Tracing

Distributed Tracing

Span Context

Injector/Extractor

Implementing a custom extractor/injector

Example: Kafka

Demo System

Code Example

1. Receive and handle order request

2. Verify user exists

3. Publish and consume create-order command

4. Handle order with Akka

The End

Credits

Connecting the Dots with Distributed Tracing

More from Petra Bierleutgeb

Introduction to
Distributed Tracing

1. Receive and handle
order request

3. Publish and consume
create-order command

4. Handle order
with Akka