Distributed tracing for everyone

What is OpenTelemetry?

  • An open standard for distributed tracing
  • Collector implementations
  • Tools for operating on telemetry
  • Language SDKs
  • Vendor agnostic

You might be wondering

What's distributed tracing?

Distributed Tracing:

observing the internal state of a distributed system using structured data

Questions distributed tracing (DT) helps you answer

  • What service is currently causing a terrible experience for user X?
  • What service has the highest latency?
  • What services are the least reliable?
  • Why is service Y failing right now?
  • What is the uptime of product Z?
  • What is the in vivo structure of the system?

Distributed Tracing


Did someone ask?

Can you summarize DT with a meme?

Yes, yes I can

OpenTelemetry + DT

Terminology

  1. Traces + Spans
  2. Metrics
  3. Logs
  4. Baggage

Traces + Spans

  • Traces are trees of spans
    • Root span + children
    • E.g.: lifecycle of a "request"
  • Spans are single operations, e.g. a function call or a DB query
  • Most of DT is about traces
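
To make the "tree of spans" idea concrete, here is a minimal sketch using only the Rust standard library. The `Span` struct, its fields, and the example span names are hypothetical and chosen for illustration; this is not the OpenTelemetry SDK's data model.

```rust
use std::time::Duration;

/// A single operation in a trace (hypothetical, simplified model).
#[derive(Debug)]
struct Span {
    name: String,
    duration: Duration,
    children: Vec<Span>,
}

impl Span {
    fn new(name: &str, millis: u64, children: Vec<Span>) -> Self {
        Span {
            name: name.to_string(),
            duration: Duration::from_millis(millis),
            children,
        }
    }

    /// Total number of spans in the subtree rooted at this span.
    fn count(&self) -> usize {
        1 + self.children.iter().map(Span::count).sum::<usize>()
    }

    /// Appends one line per span in depth-first order, indented by depth.
    fn render(&self, depth: usize, out: &mut Vec<String>) {
        out.push(format!(
            "{}{} ({} ms)",
            "  ".repeat(depth),
            self.name,
            self.duration.as_millis()
        ));
        for child in &self.children {
            child.render(depth + 1, out);
        }
    }
}

/// One trace: a root span (the request) plus its child operations.
fn example_trace() -> Span {
    Span::new(
        "GET /greet",
        42,
        vec![
            Span::new("auth check", 5, vec![]),
            Span::new(
                "db query",
                30,
                vec![Span::new("acquire connection", 3, vec![])],
            ),
        ],
    )
}

fn main() {
    let root = example_trace();
    let mut lines = Vec::new();
    root.render(0, &mut lines);
    for line in &lines {
        println!("{line}");
    }
    println!("spans in trace: {}", root.count());
}
```

Real spans carry IDs and a parent reference rather than owning their children, so a backend can rebuild the tree from spans emitted by different services.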

NOTE:

Service entry point != root span!

Traces + Spans

  • Why are they useful?
    • Events (~logs)
    • Span fields (including duration)
  • Many (most?) useful metrics can be computed from the above
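
As a rough illustration of computing metrics from span fields, the sketch below derives an error rate and a maximum latency per service from a list of finished spans. `SpanRecord`, its fields, and the sample data are assumptions made for this example, not an OpenTelemetry type.

```rust
use std::time::Duration;

/// A finished span, reduced to the fields used below (hypothetical model).
struct SpanRecord {
    service: &'static str,
    duration: Duration,
    error: bool,
}

/// Fraction of a service's spans that ended in an error, if it has any spans.
fn error_rate(spans: &[SpanRecord], service: &str) -> Option<f64> {
    let mut total = 0u32;
    let mut failed = 0u32;
    for span in spans.iter().filter(|s| s.service == service) {
        total += 1;
        if span.error {
            failed += 1;
        }
    }
    if total == 0 {
        None
    } else {
        Some(f64::from(failed) / f64::from(total))
    }
}

/// Longest span duration recorded for a service.
fn max_latency(spans: &[SpanRecord], service: &str) -> Option<Duration> {
    spans
        .iter()
        .filter(|s| s.service == service)
        .map(|s| s.duration)
        .max()
}

/// Made-up sample data for the demonstration.
fn sample_spans() -> Vec<SpanRecord> {
    vec![
        SpanRecord { service: "greeter", duration: Duration::from_millis(12), error: false },
        SpanRecord { service: "greeter", duration: Duration::from_millis(250), error: true },
        SpanRecord { service: "db", duration: Duration::from_millis(40), error: false },
        SpanRecord { service: "greeter", duration: Duration::from_millis(30), error: false },
    ]
}

fn main() {
    let spans = sample_spans();
    println!("greeter error rate: {:?}", error_rate(&spans, "greeter"));
    println!("greeter max latency: {:?}", max_latency(&spans, "greeter"));
}
```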

Metrics

  • Summary data
  • Counters, gauges, histograms
  • New-ish specification, experimental
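
The three instrument kinds can be sketched in a few lines of standard-library Rust. The types below are hypothetical stand-ins that only show what each instrument records; they are not the OpenTelemetry metrics API.

```rust
/// Monotonically increasing count, e.g. requests served.
#[derive(Default)]
struct Counter {
    value: u64,
}

impl Counter {
    fn add(&mut self, n: u64) {
        self.value += n;
    }
}

/// Last observed value, e.g. current queue depth.
#[derive(Default)]
struct Gauge {
    value: f64,
}

impl Gauge {
    fn set(&mut self, v: f64) {
        self.value = v;
    }
}

/// Counts of observations per bucket. Bucket i holds values <= bounds[i];
/// one extra overflow bucket at the end holds everything larger.
struct Histogram {
    bounds: Vec<f64>,
    counts: Vec<u64>,
}

impl Histogram {
    fn new(bounds: Vec<f64>) -> Self {
        let counts = vec![0; bounds.len() + 1];
        Histogram { bounds, counts }
    }

    fn record(&mut self, v: f64) {
        let idx = self
            .bounds
            .iter()
            .position(|b| v <= *b)
            .unwrap_or(self.bounds.len());
        self.counts[idx] += 1;
    }
}

fn main() {
    let mut requests = Counter::default();
    requests.add(1);
    requests.add(2);

    let mut queue_depth = Gauge::default();
    queue_depth.set(4.0);
    queue_depth.set(2.5);

    let mut latency_ms = Histogram::new(vec![10.0, 50.0, 100.0]);
    for v in [5.0, 20.0, 20.0, 500.0] {
        latency_ms.record(v);
    }

    println!("requests = {}", requests.value);
    println!("queue depth = {}", queue_depth.value);
    println!("latency buckets = {:?}", latency_ms.counts);
}
```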

Logs

  • Newest part of the spec
  • An event without a parent span
  • Not sure if there's a reason to use logs right now

Deployment

Collectors


  • Receivers
    • Receive otel data
    • Ex:
      • otlp
      • jaeger WP
  • Processors
    • Operate on data
    • Ex:
      • add hostname to every trace
  • Exporters
    • Send data to $LOCATION
    • Ex:
      • JSON file
      • Honeycomb
      • Local jaeger
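
The receive, process, export flow above can be modelled as a small pipeline. This is only a conceptual sketch in standard-library Rust: the real collector is a separate program configured through a YAML file, and every name here (`Record`, `Processor`, `Exporter`, `AddHostname`, `LineExporter`, `run_pipeline`) is hypothetical. For brevity it uses a single exporter.

```rust
use std::collections::BTreeMap;

/// Simplified span record flowing through the pipeline (hypothetical).
#[derive(Debug, Clone, PartialEq)]
struct Record {
    name: String,
    attributes: BTreeMap<String, String>,
}

/// Processors operate on data between receiving and exporting.
trait Processor {
    fn process(&self, record: Record) -> Record;
}

/// Exporters send data to some destination.
trait Exporter {
    fn export(&mut self, batch: &[Record]);
}

/// Example processor: adds a hostname attribute to every record.
struct AddHostname {
    hostname: String,
}

impl Processor for AddHostname {
    fn process(&self, mut record: Record) -> Record {
        record
            .attributes
            .insert("host.name".to_string(), self.hostname.clone());
        record
    }
}

/// Stand-in for a file or network exporter: keeps rendered lines in memory.
#[derive(Default)]
struct LineExporter {
    lines: Vec<String>,
}

impl Exporter for LineExporter {
    fn export(&mut self, batch: &[Record]) {
        for record in batch {
            self.lines
                .push(format!("{} {:?}", record.name, record.attributes));
        }
    }
}

/// Runs each received record through the processors in order, then hands
/// the whole batch to the exporter. Returns the processed batch.
fn run_pipeline<E: Exporter>(
    received: Vec<Record>,
    processors: &[Box<dyn Processor>],
    exporter: &mut E,
) -> Vec<Record> {
    let processed: Vec<Record> = received
        .into_iter()
        .map(|r| processors.iter().fold(r, |acc, p| p.process(acc)))
        .collect();
    exporter.export(&processed);
    processed
}

/// Two bare records, as if they had just arrived at a receiver.
fn received_records() -> Vec<Record> {
    ["client connect", "say_hello"]
        .iter()
        .map(|name| Record {
            name: name.to_string(),
            attributes: BTreeMap::new(),
        })
        .collect()
}

fn main() {
    let processors: Vec<Box<dyn Processor>> = vec![Box::new(AddHostname {
        hostname: "web-1".to_string(),
    })];
    let mut exporter = LineExporter::default();
    run_pipeline(received_records(), &processors, &mut exporter);
    for line in &exporter.lines {
        println!("{line}");
    }
}
```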

Reference Architecture

Example

gRPC Client/Server

  • OTLP protocol
  • Logging exporter
  • Jaeger exporter
  • Trace context is propagated
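
Propagation means the caller's trace and span IDs travel with the request so the server can attach its spans to the same trace. The sketch below assumes the W3C Trace Context `traceparent` header format (`version-traceid-spanid-flags`, all lowercase hex) and uses a plain `HashMap` as the carrier. `SpanContext`, `inject`, and `extract` are hypothetical helpers written for this example, not the opentelemetry crate's propagator API.

```rust
use std::collections::HashMap;

/// The identifiers that must cross the process boundary (simplified).
#[derive(Debug, Clone, PartialEq)]
struct SpanContext {
    trace_id: u128,
    span_id: u64,
    sampled: bool,
}

/// Client side: write the context into the outgoing request headers.
fn inject(cx: &SpanContext, carrier: &mut HashMap<String, String>) {
    let flags = if cx.sampled { 1u8 } else { 0u8 };
    carrier.insert(
        "traceparent".to_string(),
        format!("00-{:032x}-{:016x}-{:02x}", cx.trace_id, cx.span_id, flags),
    );
}

/// Server side: read the context back, rejecting malformed headers.
fn extract(carrier: &HashMap<String, String>) -> Option<SpanContext> {
    let header = carrier.get("traceparent")?;
    let parts: Vec<&str> = header.split('-').collect();
    if parts.len() != 4
        || parts[0] != "00"
        || parts[1].len() != 32
        || parts[2].len() != 16
        || parts[3].len() != 2
    {
        return None;
    }
    let trace_id = u128::from_str_radix(parts[1], 16).ok()?;
    let span_id = u64::from_str_radix(parts[2], 16).ok()?;
    let flags = u8::from_str_radix(parts[3], 16).ok()?;
    // All-zero IDs are treated as invalid.
    if trace_id == 0 || span_id == 0 {
        return None;
    }
    Some(SpanContext {
        trace_id,
        span_id,
        sampled: flags & 1 == 1,
    })
}

fn main() {
    let client_cx = SpanContext {
        trace_id: 0x4bf92f3577b34da6a3ce929d0e0e4736,
        span_id: 0x00f067aa0ba902b7,
        sampled: true,
    };

    let mut headers = HashMap::new();
    inject(&client_cx, &mut headers);
    println!("traceparent: {}", headers["traceparent"]);

    // The server would create its span with this context as the parent.
    println!("extracted parent: {:?}", extract(&headers));
}
```

In the gRPC example the carrier is the request metadata rather than a `HashMap`, but the inject-on-send, extract-on-receive shape is the same.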

Example Code

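// Client: open a span, inject its context into the gRPC metadata, then call
// the server. The imports are assumptions based on the tonic hello_world
// example; MetadataMap is a local wrapper (not shown) that implements the
// propagator's Injector/Extractor traits over tonic metadata.
use hello_world::{greeter_client::GreeterClient, HelloRequest};
use opentelemetry::global;
use tracing::{info, info_span, instrument, Instrument};
use tracing_opentelemetry::OpenTelemetrySpanExt;

#[instrument]
async fn greet() -> Result<(), Box<dyn std::error::Error>> {
    let mut client = GreeterClient::connect("http://[::1]:50051")
        .instrument(info_span!("client connect"))
        .await?;

    let mut request = tonic::Request::new(HelloRequest {
        name: "Tonic".into(),
    });

    // Copy the current span's context into the outgoing request metadata.
    global::get_text_map_propagator(|propagator| {
        propagator.inject_context(
            &tracing::Span::current().context(),
            &mut MetadataMap(request.metadata_mut()),
        )
    });

    let response = client
        .say_hello(request)
        .instrument(info_span!("say_hello"))
        .await?;

    info!("Response received: {:?}", response);
    Ok(())
}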

// Server: extract the caller's trace context from the gRPC metadata and make
// it the parent of this handler's span, so both sides share one trace.
#[tonic::async_trait]
impl Greeter for MyGreeter {
    #[instrument]
    async fn say_hello(
        &self,
        request: Request<HelloRequest>,
    ) -> Result<Response<HelloReply>, Status> {
        let parent_cx = global::get_text_map_propagator(|prop| {
            prop.extract(&MetadataMap(request.metadata()))
        });
        tracing::Span::current().set_parent(parent_cx);

        let name = request.into_inner().name;
        expensive_fn(format!("Got name: {:?}", name));

        // Return an instance of type HelloReply
        let reply = hello_world::HelloReply {
            message: format!("Hello {}!", name),
        };

        Ok(Response::new(reply))
    }
}

Client

Server

Further Reading

The space is enormous

OpenTelemetry

By Phillip Cloud
