Connecting the Dots
Petra Bierleutgeb
@pbvie
with Distributed Tracing
Agenda
- Introduction to Distributed Tracing
- What, Why, How and Where
- Distributed Tracing in Scala
- transparent tracing vs. explicit tracing
- opentracing-java
- patterns for better ergonomics
- context propagation
- writing your own instrumentation
- Demo: Tracing a small system
we'll focus on this one
Motivation
- (Distributed) Tracing is a very useful tool
especially (but not only) in micro-service oriented systems - the OpenTracing Java API is not a (very) good fit for Scala
- relies on global state
- tedious to use with async code
- existing tooling often quite heavy-weighted
-
improve usability in Scala by leveraging language features
- explore feasibility of stateless tracing
Introduction to
Distributed Tracing
What, Why, How and Where
What
Distributed Tracing
- tells the story of a workflow that might involve multiple services from start to finish
- describes for a (usually non-trivial) task
- which steps are involved
- how are they connected
- how long did each step take (duration)
Helpful for
- understanding complex systems
- breaking down latency
- root cause analysis
- anomaly detection
- dependency analysis
How
A Trace
- can be thought of as a directed acyclic graph (DAG)
- nodes are spans
- edges are references
handle order request
create order event
publish order
get user
A Trace
- can be thought of as a directed acyclic graph (DAG)
- but is usually pictured as a Gantt chart
handle order request
create order event
publish order
get user
Terminology
trace
span
- Trace: The complete workflow
- Span: a logical step/piece of work
Terminology
baggage items
tags
-
Tag: key/value pairs, valid for one span
- examples: component, http.path, db.statement
-
Baggage Item: key/value pairs, propagated between spans
- examples: transactionId, userId
see: https://github.com/opentracing/specification/blob/master/semantic_conventions.md
Terminology
-
Logs: key/value pairs, attached to the span at a certain point in time
- used to indicate errors or other events of interest
see: https://github.com/opentracing/specification/blob/master/semantic_conventions.md
error log
References
- causal relationships between a child span and a parent span
-
childOf
- parent span depends on child span
- parent span may not finish before child span
-
followsFrom
-
parent span doesn't depend on (and wait for) result
of child span - example: publish command that triggers further action
-
parent span doesn't depend on (and wait for) result
followsFrom
childOf
...and where?
Tracing-agnostic applications vs. tracing as part of application code
Network Level
- + no changes to application code
- ~ coarse-grained compared to in-application tracing
- ~ required effort depends a lot on available infrastructure
- - not (very) useful for monoliths
- example: Tracing with Istio/Envoy
Application Level
- + detailed traces
- + works for all applications and infrastructures
- + provide more context
- - increases non-functional complexity*
- - requires support for different frameworks/tools*
- examples
- transparent: Lightbend Telemetry, Kamon
- manual: OpenTracing implementations
The OpenTracing API
...and a look at opentracing-java
OpenTracing API
- an implementation independent API for Distributed Tracing
- official API definitions available in several languages
- example: opentracing-java
- some concrete implementations
- Jaeger
- Zipkin
- DataDog
- some concrete implementations
OpenTracing
- benefits
- program against API instead of concrete implementation
=> no vendor lock-in - growing number of OpenTracing compatible tracers
- program against API instead of concrete implementation
The OpenTracing Java API
- Tracer
- GlobalTracer
- Scope
- Span
- SpanContext
- Carrier (Injector/Extractor)
OpenTracing Java - Example
Span span = tracer.buildSpan("someWork").start();
try (Scope scope = tracer.scopeManager().activate(span, false)) {
// Do things.
} catch(Exception ex) {
Tags.ERROR.set(span, true);
span.log(Map.of(Fields.EVENT, "error", Fields.ERROR_OBJECT, ex));
} finally {
span.finish();
}
Scope
- OpenTracing relies on a ScopeManager to store the currently active span
- + spans don't have to be passed around
- - introduces global state
Scopes and global state - a critical view
- OpenTracing APIs focus on state and often favor side-effects
- global/invisible state in concurrent programs is hard
- difficult to verify and understand
-
contrary to good programming practice
- immutability
- reduce side-effects
- behavior of functions depends only on input (arguments)
Scope? Who said anything about scope?
Tracing without Scopes
- the OpenTracing specification doesn't mention scopes
-
trace data does not include information about
- scopes
- span activation/deactivation
- can we have stateless tracing?
- exTra (explicit Tracing)
- experimental!
Sidenote: Tracing in FP
- explicit tracing aims to remove global state but there's still side effects all over the place
- fine for many applications but not suitable for pure FP
- projects
Distributed Tracing in Scala
Better ergonimics with
exTra
exTra API (some of it)
-
withSpan/withSpanAsync
execute a block of code with the current span, log errors and finish span on completion -
withChildSpan/withChildSpanAsync
create a new child span for the current span (provided via implicit TracingContext and execute the given block of code with it -
withFollowsFrom/withFollowsFromAsync
same as above but with followsFrom relationship instead of childOf
withSpan
def withSpan[T](span: Span)(op: Span => T): T =
try {
op(span)
} finally {
span.finish()
}
usage
withSpan(tracer.buildSpan("myOp").start()) { span =>
// do something
}
Making the span implicit
def withSpan[T](op: Span => T)(implicit span: Span): T =
try {
op(span)
} finally {
span.finish()
}
usage
implicit val span = ...
// some other code
withSpan { _ =>
// do something
}
Works great for error handling
def withSpan[T](span: Span)(op: Span => T): T =
try {
op(span)
} catch {
case t: Throwable =>
span.setTag(Tags.ERROR.getKey, true)
span.log(
Map(
Fields.EVENT -> "error",
Fields.ERROR_OBJECT -> t
).asJava
)
throw t
} finally {
span.finish()
}
Works great for error handling
implicit val span = ...
withSpan { _ =>
7 / 0
}
span is reported and marked as error
TracingContext
- in addition to the active span we often need access to the tracer
- TracingContext: wraps up tracer and span so they can be passed around more easily
final case class TracingContext(tracer: Tracer, span: Span)
TracingContext - Example
def calculateSum(a: Int, b: Int)(implicit tc: TracingContext): Int = {
val childSpan = tc.tracer.buildSpan("calculateSum").asChildOf(tc.span).start()
withSpan(childSpan) { _ =>
a + b
}
}
TracingContext - Example
def calculateSum(a: Int, b: Int)(implicit tc: TracingContext): Int = {
val childSpan = tc.tracer.buildSpan("calculateSum").asChildOf(tc.span).start()
withSpan(childSpan) { _ =>
a + b
}
}
this is a very common pattern...let's implement a more concise syntax for it
Implementing withChildSpan - 1
def withSpan[T](op: TracingContext => T)(implicit tc: TracingContext): T =
try {
op(tc.span)
} catch {
case t: Throwable =>
tc.span.setTag(Tags.ERROR.getKey, true)
tc.span.log(errorLog(t).asJava)
throw t
} finally {
tc.span.finish()
}
updating withSpan to work with TracingContext
Implementing withChildSpan - 2
def withChildSpan[T](
operationName: String
)(op: TracingContext => T)(implicit tc: TracingContext): T = {
val childSpan = tc.tracer.buildSpan(operationName).asChildOf(tc.span).start()
val childTracingContext = tc.copy(span = childSpan)
withSpan(op)(childTracingContext)
}
now we can implement withChildSpan
- create childSpan
- create new TracingContext with child span
- call previously defined withSpan function
withChildSpan - Usage
withSpan(TracingContext.withNewRootSpan("example")(tracer)) { implicit tc =>
withChildSpan("child-a") { _ =>
Thread.sleep(20)
}
withChildSpan("child-b") { _ =>
Thread.sleep(10)
}
withChildSpan("child-c") { _ =>
Thread.sleep(15)
7 / 0
}
}
withChildSpan - async
withSpan(TracingContext.withNewRootSpan("example")(tracer)) { implicit tc =>
withChildSpan("async-child") { _ =>
Future {
blocking(Thread.sleep(3000))
}
}
}
duration of 'async-child' is only 7.14ms
withChildSpan - async
def withSpan[T](op: TracingContext => T)(implicit tc: TracingContext): T =
try {
op(tc.span)
} catch {
case t: Throwable =>
tc.span.setTag(Tags.ERROR.getKey, true)
tc.span.log(errorLog(t).asJava)
throw t
} finally {
tc.span.finish()
}
- if op is returning a Future, this code doesn't work for 'childOf' relationships
withChildSpan - async
def withSpanAsync[T](op: TracingContext => Future[T])
(implicit ec: ExecutionContext, tc: TracingContext): Future[T] =
op(tc)
.recoverWith {
case t: Throwable =>
tc.span.setTag(Tags.ERROR.getKey, true)
tc.span.log(errorLog(t).asJava)
Future.failed(t)
}
.andThen {
case _ => tc.span.finish()
}
- use recover or recoverWith in combination with andThen
- makes sure span is finished after Future is completed
Cross-process Tracing
Transporting span context across service boundaries
Distributed Tracing
Service A
Service B
?
Traces usually involve multiple services
Distributed Tracing
- in order to continue a trace in a different process/service we need to pass over the span context
- propagation format depends on tracer implementation
- => the same OpenTracing implementation needs to be used on both sides
Span Context
- internal properties + user defined baggage items
- example: Jaeger
- key: uber-trace-id
value: 8c8c205d814630e:c94f0c7a2f89f95c:b41032f2168aae28:1
- key: uber-trace-id
trace id
span id
parent id
flags
Injector/Extractor
- used to inject/extract span context
- consists of a format and a carrier
- example:
- format: TEXT_MAP
- carrier: Kafka Headers, HTTP Headers
Implementing a custom extractor/injector
public interface TextMap extends Iterable<Map.Entry<String, String>> {
Iterator<Map.Entry<String,String>> iterator();
void put(String key, String value);
}
Example: Kafka
class KafkaHeaderInjector(private val headers: Headers) extends TextMap {
override def iterator(): util.Iterator[JavaMap.Entry[String, String]] =
throw new UnsupportedOperationException(
"should only be used with Tracer.inject()"
)
override def put(key: String, value: String): Unit =
headers.add(key, value.getBytes)
}
val record = new ProducerRecord(...)
tc.tracer.inject(tc.span.context(),
Format.Builtin.TEXT_MAP,
new KafkaHeaderInjector(record.headers()))
producer.send(record)
Demo System
Glühwein Ltd.
mulled wine
user service
order service
preparation service
client
gRPC req/resp
Akka HTTP
Kafka
Akka Actors
thirsty customer
Code Example
1. Receive and handle
order request
user service
order service
preparation service
client
gRPC req/resp
Akka HTTP
Kafka
Akka Actors
thirsty customer
2. Verify user exists
user service
order service
preparation service
client
gRPC req/resp
Akka HTTP
Kafka
Akka Actors
thirsty customer
3. Publish and consume
create-order command
user service
order service
preparation service
client
gRPC req/resp
Akka HTTP
Kafka
Akka Actors
thirsty customer
4. Handle order
with Akka
user service
order service
preparation service
client
gRPC req/resp
Akka HTTP
Kafka
Akka Actors
thirsty customer
Result
The End
Thank you!
Questions?
Credits
Icon made by Freepik from flaticon.com
Connecting the Dots with Distributed Tracing
By Petra Bierleutgeb
Connecting the Dots with Distributed Tracing
@Scala eXchange 2018
- 2,496