Distributed tracing for everyone
What is OpenTelemetry?
- An open standard for distributed tracing
- Collector implementations
- Tools for operating on telemetry
- Language SDKs
- Vendor agnostic
You might be wondering
What's distributed tracing?
Distributed Tracing:
observing the internal state of a distributed system using structured data
Questions DT helps you answer
- What service is currently causing a terrible experience for user X?
- What service has the highest latency?
- What services are the least reliable?
- Why is service Y failing right now?
- What is the uptime of product Z?
- What is the in vivo structure of the system?
Distributed Tracing
Did someone ask?
Can you summarize DT with a meme?
Yes, yes I can
OpenTelemetry + DT
Terminology
- Traces + Spans
- Metrics
- Logs
- Baggage
Traces + Spans
- Traces are trees of spans
- Root span + children
- E.g.: lifecycle of a "request"
- Spans are single operations, e.g. a function call or a db query
- Most of DT is about traces
NOTE: Service entry point != root span!
Specification Link
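The "tree of spans" idea can be sketched with plain data structures. This is a minimal illustration, not the OpenTelemetry SDK's types: `Span`, `root_span`, `children_by_parent`, `depth`, and `example_trace` are hypothetical names, and real span and trace ids are wider than the `u64` used here.

```rust
use std::collections::HashMap;

/// A single operation within a trace (hypothetical, simplified fields).
#[derive(Debug, Clone)]
pub struct Span {
    pub span_id: u64,
    pub parent_id: Option<u64>, // None marks the root span
    pub name: String,
    pub duration_ms: u64,
}

/// Returns the id of the root span, if exactly one span has no parent.
pub fn root_span(spans: &[Span]) -> Option<u64> {
    let mut roots = spans.iter().filter(|s| s.parent_id.is_none());
    match (roots.next(), roots.next()) {
        (Some(root), None) => Some(root.span_id),
        _ => None,
    }
}

/// Groups span ids by parent id, giving the child list for each span.
pub fn children_by_parent(spans: &[Span]) -> HashMap<u64, Vec<u64>> {
    let mut children: HashMap<u64, Vec<u64>> = HashMap::new();
    for span in spans {
        if let Some(parent) = span.parent_id {
            children.entry(parent).or_default().push(span.span_id);
        }
    }
    children
}

/// Depth of the tree rooted at `id` (a lone span has depth 1).
pub fn depth(id: u64, children: &HashMap<u64, Vec<u64>>) -> usize {
    let below = children
        .get(&id)
        .map(|kids| kids.iter().map(|k| depth(*k, children)).max().unwrap_or(0))
        .unwrap_or(0);
    1 + below
}

/// Example trace: one request whose handler runs a db query.
pub fn example_trace() -> Vec<Span> {
    vec![
        Span { span_id: 1, parent_id: None, name: "request".into(), duration_ms: 120 },
        Span { span_id: 2, parent_id: Some(1), name: "handler".into(), duration_ms: 100 },
        Span { span_id: 3, parent_id: Some(2), name: "db query".into(), duration_ms: 40 },
        Span { span_id: 4, parent_id: Some(1), name: "serialize".into(), duration_ms: 5 },
    ]
}
```

In a distributed system a single service usually sees only a subtree of the trace, which is one way to read the note that a service entry point is not necessarily the root span.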
Traces + Spans
Why are they useful?
- Events (~logs)
- Span fields (including duration)
- Many (most?) useful metrics can be computed from the above
Specification Link
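As a rough sketch of how metrics fall out of span fields, the code below aggregates finished spans per service and answers two of the earlier questions (highest latency, least reliable). The `FinishedSpan` and `ServiceStats` shapes and the function names are assumptions made for illustration; a real backend would query stored traces instead.

```rust
use std::collections::HashMap;

/// A finished span, reduced to the fields used here (hypothetical shape).
pub struct FinishedSpan {
    pub service: &'static str,
    pub duration_ms: u64,
    pub is_error: bool,
}

/// Per-service summary computed purely from span fields.
#[derive(Debug, PartialEq)]
pub struct ServiceStats {
    pub count: u64,
    pub errors: u64,
    pub max_duration_ms: u64,
    pub total_duration_ms: u64,
}

/// Folds spans into one `ServiceStats` per service name.
pub fn summarize(spans: &[FinishedSpan]) -> HashMap<&'static str, ServiceStats> {
    let mut stats: HashMap<&'static str, ServiceStats> = HashMap::new();
    for span in spans {
        let entry = stats.entry(span.service).or_insert(ServiceStats {
            count: 0,
            errors: 0,
            max_duration_ms: 0,
            total_duration_ms: 0,
        });
        entry.count += 1;
        entry.errors += u64::from(span.is_error);
        entry.max_duration_ms = entry.max_duration_ms.max(span.duration_ms);
        entry.total_duration_ms += span.duration_ms;
    }
    stats
}

/// The service with the highest mean span duration, if any spans exist.
pub fn slowest_service(stats: &HashMap<&'static str, ServiceStats>) -> Option<&'static str> {
    stats
        .iter()
        .max_by_key(|(_, s)| s.total_duration_ms / s.count)
        .map(|(name, _)| *name)
}

/// The service with the most error spans, if any spans exist.
pub fn least_reliable_service(stats: &HashMap<&'static str, ServiceStats>) -> Option<&'static str> {
    stats
        .iter()
        .max_by_key(|(_, s)| s.errors)
        .map(|(name, _)| *name)
}
```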
Metrics
- Summary data
- Counters, gauges, histograms
- New-ish specification, experimental
Specification Link
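To make the three instrument kinds concrete, here is a toy, single-threaded sketch. These are hypothetical types written for this example, not the OpenTelemetry metrics API; the histogram assumes its bucket bounds are sorted in ascending order.

```rust
/// Monotonically increasing count, e.g. requests served.
#[derive(Default)]
pub struct Counter {
    value: u64,
}

impl Counter {
    pub fn add(&mut self, n: u64) {
        self.value += n;
    }
    pub fn value(&self) -> u64 {
        self.value
    }
}

/// A value that can go up or down, e.g. queue depth.
#[derive(Default)]
pub struct Gauge {
    value: i64,
}

impl Gauge {
    pub fn set(&mut self, v: i64) {
        self.value = v;
    }
    pub fn value(&self) -> i64 {
        self.value
    }
}

/// Counts observations per bucket; bounds are inclusive upper edges.
pub struct Histogram {
    bounds: Vec<u64>,
    counts: Vec<u64>, // one extra slot for values above the last bound
}

impl Histogram {
    /// `bounds` is assumed to be sorted ascending.
    pub fn new(bounds: Vec<u64>) -> Self {
        let counts = vec![0; bounds.len() + 1];
        Histogram { bounds, counts }
    }

    /// Increments the first bucket whose upper edge is >= `value`.
    pub fn record(&mut self, value: u64) {
        let idx = self
            .bounds
            .iter()
            .position(|b| value <= *b)
            .unwrap_or(self.bounds.len());
        self.counts[idx] += 1;
    }

    pub fn counts(&self) -> &[u64] {
        &self.counts
    }
}
```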
Logs
- Newest part of the spec
- An event without a parent span
- Not sure if there's a reason to use logs right now
Specification Link
Deployment
Collectors
- Receivers
  - Receive otel data
  - Ex:
    - otlp
    - jaeger WP
- Processors
  - Operate on data
  - Ex:
    - add hostname to every trace
- Exporters
  - Send data to $LOCATION
  - Ex:
    - JSON file
    - Honeycomb
    - Local jaeger
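The receiver, processor, exporter flow can be pictured as a small in-process pipeline. The sketch below is only a mental model with hypothetical traits and types (`Processor`, `Exporter`, `AddHostname`, `InMemoryExporter`, `run_pipeline`); the real collector is a separate service configured declaratively, and the `host.name` attribute key is an assumption of this example.

```rust
use std::collections::HashMap;

/// Minimal stand-in for a piece of telemetry (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub struct SpanRecord {
    pub name: String,
    pub attributes: HashMap<String, String>,
}

/// Processors operate on data as it flows through the pipeline.
pub trait Processor {
    fn process(&self, span: SpanRecord) -> SpanRecord;
}

/// Exporters send data to some destination.
pub trait Exporter {
    fn export(&mut self, batch: &[SpanRecord]);
}

/// Processor from the slide: add the hostname to every span.
pub struct AddHostname {
    pub hostname: String,
}

impl Processor for AddHostname {
    fn process(&self, mut span: SpanRecord) -> SpanRecord {
        span.attributes
            .insert("host.name".to_string(), self.hostname.clone());
        span
    }
}

/// Exporter that keeps spans in memory, standing in for a file or backend.
#[derive(Default)]
pub struct InMemoryExporter {
    pub received: Vec<SpanRecord>,
}

impl Exporter for InMemoryExporter {
    fn export(&mut self, batch: &[SpanRecord]) {
        self.received.extend_from_slice(batch);
    }
}

/// Runs received spans through every processor in order, then hands the
/// processed batch to every exporter.
pub fn run_pipeline(
    received: Vec<SpanRecord>,
    processors: &[&dyn Processor],
    exporters: &mut [&mut dyn Exporter],
) {
    let processed: Vec<SpanRecord> = received
        .into_iter()
        .map(|span| processors.iter().fold(span, |s, p| p.process(s)))
        .collect();
    for exporter in exporters.iter_mut() {
        exporter.export(&processed);
    }
}
```

Keeping processors and exporters behind traits mirrors the collector's appeal: the same received data can be enriched once and fanned out to several destinations.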
Reference Architecture
Example
gRPC Client/Server
- OTLP protocol
- Logging exporter
- Jaeger exporter
- Trace context is propagated
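Propagating trace context means the client writes identifiers into request metadata and the server reads them back. The sketch below assumes the W3C Trace Context `traceparent` header layout (`version-traceid-spanid-flags`, all lowercase hex), which is one format a text-map propagator may use. `SpanContext`, `inject`, and `extract` are simplified stand-ins written for this example, not the OpenTelemetry API, and validation is lighter than the specification requires (for instance, all-zero ids are not rejected).

```rust
use std::collections::HashMap;

/// Identifiers that must cross the process boundary (simplified).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct SpanContext {
    pub trace_id: u128,
    pub span_id: u64,
    pub sampled: bool,
}

/// Client side: write the context into outgoing request metadata.
pub fn inject(cx: &SpanContext, carrier: &mut HashMap<String, String>) {
    let flags: u8 = if cx.sampled { 1 } else { 0 };
    let header = format!(
        "00-{:032x}-{:016x}-{:02x}",
        cx.trace_id, cx.span_id, flags
    );
    carrier.insert("traceparent".to_string(), header);
}

/// Server side: read the context back, or None if absent or malformed.
pub fn extract(carrier: &HashMap<String, String>) -> Option<SpanContext> {
    let header = carrier.get("traceparent")?;
    let parts: Vec<&str> = header.split('-').collect();
    if parts.len() != 4 || parts[0] != "00" {
        return None;
    }
    if parts[1].len() != 32 || parts[2].len() != 16 || parts[3].len() != 2 {
        return None;
    }
    let trace_id = u128::from_str_radix(parts[1], 16).ok()?;
    let span_id = u64::from_str_radix(parts[2], 16).ok()?;
    let flags = u8::from_str_radix(parts[3], 16).ok()?;
    Some(SpanContext {
        trace_id,
        span_id,
        sampled: flags & 1 == 1,
    })
}
```

The server would then use the extracted context as the parent of its own span, which is what stitches client and server spans into a single trace.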
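Example Code
Client
    // Generated by tonic from the proto definition; module path assumed.
    use hello_world::{greeter_client::GreeterClient, HelloRequest};
    use opentelemetry::global;
    use tracing::{info, info_span, instrument, Instrument};
    use tracing_opentelemetry::OpenTelemetrySpanExt; // .context(), .set_parent()

    // MetadataMap is a local wrapper around tonic's request metadata that
    // implements the propagator's Injector/Extractor traits (not shown).
    #[instrument]
    async fn greet() -> Result<(), Box<dyn std::error::Error>> {
        let mut client = GreeterClient::connect("http://[::1]:50051")
            .instrument(info_span!("client connect"))
            .await?;
        let mut request = tonic::Request::new(HelloRequest {
            name: "Tonic".into(),
        });
        // Inject the current span's context into the outgoing metadata.
        global::get_text_map_propagator(|propagator| {
            propagator.inject_context(
                &tracing::Span::current().context(),
                &mut MetadataMap(request.metadata_mut()),
            )
        });
        let response = client
            .say_hello(request)
            .instrument(info_span!("say_hello"))
            .await?;
        info!("Response received: {:?}", response);
        Ok(())
    }
Server
    #[tonic::async_trait]
    impl Greeter for MyGreeter {
        #[instrument]
        async fn say_hello(
            &self,
            request: Request<HelloRequest>,
        ) -> Result<Response<HelloReply>, Status> {
            // Extract the caller's context and make it this span's parent.
            let parent_cx = global::get_text_map_propagator(|prop| {
                prop.extract(&MetadataMap(request.metadata()))
            });
            tracing::Span::current().set_parent(parent_cx);
            let name = request.into_inner().name;
            expensive_fn(format!("Got name: {:?}", name));
            // Return an instance of type HelloReply
            let reply = hello_world::HelloReply {
                message: format!("Hello {}!", name),
            };
            Ok(Response::new(reply))
        }
    }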
Further Reading
The space is enormous
- https://opentelemetry.io/docs/
- https://github.com/open-telemetry/opentelemetry-specification
- https://github.com/open-telemetry/opentelemetry-collector
- https://github.com/open-telemetry/opentelemetry-collector-contrib
- https://github.com/open-telemetry/opentelemetry-cpp
- https://github.com/open-telemetry/opentelemetry-rust
- https://www.honeycomb.io/
- https://lightstep.com/
OpenTelemetry
By Phillip Cloud