Traceability Approaches with..

Tempo / Grafana
NewRelic
Micrometer
OpenTelemetry

What is Distributed Tracing?

➝ A method to monitor and track requests as they flow through a distributed system.

➝ Provides visibility into the lifecycle of a request by capturing trace data at each service or component.

➝ Helps uncover performance issues, service dependencies, and bottlenecks.

Trace/Span Anatomy

A trace is a collection of spans representing the complete journey of a request through a system.

📌 Key Terms:

➝ Trace: The end-to-end view of a request across all components.

➝ Span: A single unit of work; represents an operation or step within a trace.

➝ Parent Span: The span that initiated a sub-operation.

➝ Child Span: A sub-operation spawned by a parent span.

➝ Root Span: The first span in a trace, often the inbound request handler.

🧭 Span Metadata:

➝ name – logical operation name (e.g., GET /users)

➝ start time / duration

➝ trace-id – shared across the whole trace

➝ span-id – unique to this span

➝ parent-id – optional, links to upstream span

➝ attributes – key-value metadata (e.g., http.method, db.statement)

➝ events – timestamped logs within a span

➝ status – success, error, etc.
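To make the parent/child relationship concrete, here is a minimal sketch in plain Python of how spans sharing one trace-id can be assembled into a tree through their parent-id. The Span class, the helper names and the sample IDs are all illustrative assumptions; this is not the data model of any particular tracing backend.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    """Illustrative span record holding a few of the metadata fields listed above."""
    trace_id: str
    span_id: str
    name: str
    parent_id: Optional[str] = None  # None marks the root span
    attributes: dict = field(default_factory=dict)


def root_span(spans: List[Span]) -> Span:
    """Return the single span of the trace that has no parent."""
    roots = [s for s in spans if s.parent_id is None]
    if len(roots) != 1:
        raise ValueError("expected exactly one root span")
    return roots[0]


def render(spans: List[Span], span: Optional[Span] = None, depth: int = 0) -> List[str]:
    """Render the trace as indented lines, each child below its parent."""
    if span is None:
        span = root_span(spans)
    lines = ["  " * depth + span.name]
    for child in (s for s in spans if s.parent_id == span.span_id):
        lines.extend(render(spans, child, depth + 1))
    return lines


# Hypothetical trace: one inbound request with two sub-operations.
TRACE_ID = "4bf92f3577b34da6a3ce929d0e0e4736"
spans = [
    Span(TRACE_ID, "a1", "GET /users", attributes={"http.method": "GET"}),
    Span(TRACE_ID, "b2", "SELECT users", parent_id="a1"),
    Span(TRACE_ID, "c3", "publish user-events", parent_id="a1"),
]
tree = render(spans)
```

Printing "\n".join(tree) shows the root span on the first line with both sub-operations indented beneath it.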

Traceparent Header

📌 Key Concept: traceparent header
The traceparent HTTP header, defined by the W3C Trace Context specification (https://www.w3.org/TR/trace-context/#traceparent-header), tracks request lineage across service boundaries. It contains:

➝ version – format version

➝ trace-id – unique ID for the entire trace

➝ parent-id – ID of the calling span

➝ trace-flags – flags for sampling/debugging
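The four fields are hex-encoded and joined by dashes. As a rough sketch, assuming only the version 00 layout of the W3C format, a header value could be split and validated as below. The function and class names are hypothetical, and the sample value is the example used in the W3C specification.

```python
import re
from typing import NamedTuple, Optional

# version (2 hex) - trace-id (32 hex) - parent-id (16 hex) - trace-flags (2 hex)
_TRACEPARENT = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    trace_flags: int

    @property
    def sampled(self) -> bool:
        # Bit 0 of trace-flags is the "sampled" flag.
        return bool(self.trace_flags & 0x01)


def parse_traceparent(value: str) -> Optional[TraceParent]:
    """Return the parsed header, or None if the value is not a valid version-00 style header."""
    match = _TRACEPARENT.fullmatch(value.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # The spec treats version "ff" and all-zero IDs as invalid.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return TraceParent(version, trace_id, parent_id, int(flags, 16))


parsed = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
```

A service receiving this header would keep the trace-id, and use parent-id as the parent of the first span it creates.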

Scenario Under Test

Setup approaches

1. NewRelic agent → NewRelic

2. Otel agent → Otel Collector → (Tempo + Grafana)

3. Otel agent → NewRelic OTLP Endpoint → NewRelic

4. Micrometer & Otel bridge (no agent) → NewRelic OTLP Endpoint → NewRelic

1. NewRelic agent → NewRelic

📌 Agent:

✅ Trace correlation: Spans are easily correlated across services using New Relic’s telemetry format.

✅ Some auto-tracing out-of-the-box: Common frameworks and libraries are traced automatically.

❌ Only some auto-tracing out-of-the-box: No context propagation for Kafka (requires extra code in the producer and consumer).

⚠️ Respects traceparent header: An injected traceparent request header is propagated, but with the caveat below.

❌ Vendor-specific tracing model: The Kafka message header is renamed to newrelic.

❌ No traceId in logging MDC: Populating the logging context requires some other mechanism (e.g. the XM Logging library).

❌ Requires using a Java agent: Can be prohibitive at times.

❌ Vendor lock-in: Not possible to switch to a different UI.

https://docs.newrelic.com/docs/apm/agents/java-agent/instrumentation/java-agent-instrument-kafka-message-queues/#collect-kafka-distributed-trace

https://gitlab.xm.com/xlDevs/xm-logging

📌 UI:

✅ Advanced visualizations: Mature one-stop-shop UI for end-to-end trace analysis, service maps, and latency breakdowns.

https://one.eu.newrelic.com/distributed-tracing?account=2868847&duration=1800000&state=53d27747-81f9-ae60-9ed1-2f5b8a5aac9a
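The "extra code for producer/consumer" mentioned for Kafka boils down to copying the trace context into the message headers on send and reading it back on receive. The sketch below illustrates that idea in plain Python, using a W3C-style traceparent header and modelling Kafka headers as a list of (key, bytes) pairs. It is an assumption-laden illustration only: it does not use the New Relic agent API (which, as noted above, uses its own header name), and all function names are hypothetical.

```python
import secrets
from typing import List, Optional, Tuple

Headers = List[Tuple[str, bytes]]  # Kafka-style record headers


def child_traceparent(incoming: str) -> str:
    """Keep version, trace-id and flags; replace parent-id with a fresh span id."""
    version, trace_id, _parent_id, flags = incoming.split("-")
    new_span_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex characters
    return "-".join([version, trace_id, new_span_id, flags])


def inject(headers: Headers, traceparent: str) -> Headers:
    """Producer side: return headers with exactly one traceparent entry."""
    kept = [(k, v) for k, v in headers if k.lower() != "traceparent"]
    return kept + [("traceparent", traceparent.encode("ascii"))]


def extract(headers: Headers) -> Optional[str]:
    """Consumer side: read the traceparent header if the producer set one."""
    for key, value in headers:
        if key.lower() == "traceparent":
            return value.decode("ascii")
    return None


# Producer: the inbound HTTP request carried this header (sample value from the W3C spec).
inbound = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
outgoing_headers = inject([("content-type", b"application/json")], child_traceparent(inbound))

# Consumer: continue the same trace from the message headers.
received = extract(outgoing_headers)
```

The consumer ends up with the same trace-id as the original HTTP request, which is what allows spans on both sides of the topic to be stitched into one trace.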

2. Otel agent → Otel Collector → (Tempo + Grafana)

📌 Agent:

✅ Standards-compliant: Fully W3C Trace Context-compliant; spans flow consistently across all instrumented services.

✅ Minimal effort for context propagation: Tracing context is propagated automatically across Kafka and into the logging MDC.

✅ Auto discovery: Supports a wide range of libraries and requires minimal configuration.

❌ Requires using a Java agent: Can be prohibitive at times.

https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/docs/supported-libraries.md

📌 UI:

✅ Highly customisable: Modular and customisable.

❌ Limited traces UI: The traces UI is basic and lacks features such as deep dependency maps.

⚠️ Requires configuration: Configuring and maintaining it can be time-consuming.

https://tinyurl.com/29rta4jm
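Having the trace ID in the logging context means every log line written while a request is handled carries that request's trace-id, so logs and traces can be joined. The Java MDC is not shown here; the following is a loose Python analogue using contextvars and a logging filter, with hypothetical names throughout, meant only to illustrate the mechanism that the Otel agent automates.

```python
import contextvars
import io
import logging

# Per-request storage for the current trace id ("-" when no request is active).
_trace_id = contextvars.ContextVar("trace_id", default="-")


class TraceIdFilter(logging.Filter):
    """Copy the current trace id onto every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.trace_id = _trace_id.get()
        return True


def build_logger(stream) -> logging.Logger:
    logger = logging.getLogger("tracing-demo")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s trace_id=%(trace_id)s %(message)s"))
    handler.addFilter(TraceIdFilter())
    logger.addHandler(handler)
    return logger


def handle_request(logger: logging.Logger, traceparent: str) -> None:
    """Set the trace id for the duration of the request, then restore the previous value."""
    token = _trace_id.set(traceparent.split("-")[1])
    try:
        logger.info("handling request")
    finally:
        _trace_id.reset(token)


stream = io.StringIO()
logger = build_logger(stream)
handle_request(logger, "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
logger.info("outside any request")
output = stream.getvalue()
```

The first log line carries the trace-id of the request, while the second, written outside any request, falls back to the "-" placeholder.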

3. Otel agent → NewRelic OTLP Endpoint → NewRelic

✅ Best of both worlds: Uses open-source instrumentation (Otel) and feeds into a polished tracing UI (New Relic).

https://docs.newrelic.com/docs/opentelemetry/best-practices/opentelemetry-otlp/#endpoint-port-protocol

https://one.eu.newrelic.com/distributed-tracing?account=2868847&duration=1800000&state=eaf89b32-58a9-141f-5444-8c700325591a

4. Micrometer & Otel bridge (no agent) → NewRelic OTLP Endpoint → NewRelic

✅ SpringBoot specific: Tailored to SpringBoot applications and components.

✅ Unified Facade API: Centralised observability facade for custom metrics and traces.

✅ Lightweight: No runtime agent; avoids library version incompatibilities and reduces overhead.

✅ High Precision: Only essential spans are created, reducing trace noise.

⚠️ Configuration-driven auto-instrumentation / context-propagation: Components must be configured appropriately to enable auto-tracing, so it is easy to miss key traces (especially with async/Kafka).

❌ High implementation effort: More complex and time-consuming to roll out across services than the Otel agent.

https://docs.spring.io/spring-boot/reference/actuator/tracing.html

https://one.eu.newrelic.com/distributed-tracing?account=2868847&duration=1800000&state=82a45a8d-80c1-c96c-5b6b-39fc44451a22


Further Discussion Topics

⚠️ Which approach do we follow for tracing?

⚠️ Standardise the UI for viewing traces? Grafana/NewRelic.

⚠️ Standardise the approach for generating traces? NewRelic agent/Otel agent/Micrometer.

⚠️ Adopt the traceparent standard convention?

...to consider if the traceparent standard is adopted

⚠️ Ensure front-end systems generate the traceparent header.

⚠️ AWS API Gateway does not propagate traceparent headers automatically; they must be mapped explicitly in Velocity templates.

⚠️ The Istio proxy does not log the traceparent header by default; support for it will have to be added.
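For the front-end point, starting a trace means generating a fresh traceparent value when no upstream header exists. A minimal sketch of that, in Python for brevity and assuming the version 00 layout from the W3C specification, could look as follows; the function names are hypothetical and a real front end would more likely rely on an OpenTelemetry SDK.

```python
import secrets


def _nonzero_hex(num_bytes: int) -> str:
    """Random lowercase hex of the given byte length; all-zero values are invalid IDs."""
    while True:
        value = secrets.token_hex(num_bytes)
        if value.strip("0"):
            return value


def new_traceparent(sampled: bool = True) -> str:
    """Build a traceparent value that starts a brand-new trace."""
    version = "00"
    trace_id = _nonzero_hex(16)   # 32 hex characters
    parent_id = _nonzero_hex(8)   # 16 hex characters, the id of the originating span
    trace_flags = "01" if sampled else "00"
    return f"{version}-{trace_id}-{parent_id}-{trace_flags}"


# Hypothetical outgoing request headers from a front-end client.
request_headers = {"traceparent": new_traceparent()}
```

Every downstream hop then keeps the trace-id and only replaces the parent-id with its own span id.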


Observability Experiments with OpenTelemetry

Theo Kliaris