Taming Tenancy, Cost and Architecture at Collibra
Through OpenTelemetry and our Telemetry Backbone
Almost 30 years in the sector
Mostly as Software Engineer
Web - 3D - Middleware - Mobile - Big Data
More recently as Architect
Data - SRE - Infrastructure
Community
Apache Beam contributor
OpenTelemetry Collector contributor
Collibra
Principal Systems Architect
Alex Van Boxel
A data intelligence platform powered by active metadata
AI Governance
Data Catalog
Data Governance
Data Lineage
Data Notebook
Data Privacy
Data Quality & Observability
Protect
"How much does X cost?"

"Compare the Usage and Reservation?"


Trend analysis per tenant
Architecture and Collection at the Edge
Collibra Architecture
- On-prem heritage: single deployable monolith → hosted SaaS on VMs (single-tenant isolation for free)
- Kubernetes shift: microservices for team velocity & polyglot (Python now dominant in AI)
- The pivot: "container per service per tenant" was unsustainable → shared multi-tenant on K8s
Shared multi-tenancy saves cost — but makes it harder to figure out the cost per tenant... how can we solve this?


B
Collector (VM)
Collector(s) running on the VM. There can be several (e.g. one per signal).

A
Collector (node)
Collector(s) installed as DaemonSets.

C
Collector (cluster)
Cluster-wide collectors for telemetry that is not tied to the per-node workloads.

D
Collector (ingress)
A collector hooked into the ingress gateway on a specific path, to capture telemetry from the browser and our edge.

21
Pub/Sub
Queuing system is an essential part of the backbone
OpenTelemetry Attributes
Golden Signals
Gold Attr. #1: Tenant
collibra.tenant.environment_id
- VMs Resource Attributes - Configured at the collector
- Pod Resource Attributes - For single tenant pods
- Multi-tenant Pod - Signal Attributes
{
  "event_name": "workflow:started",
  "tenant_environment_id": "...",
  "asset_id": "..."
}
CSTE - Collibra Structured Telemetry Event: Events are our golden signal
// Bind the tenant to the logging context of the current thread
MDC.put("tenant_environment_id",
        ctx.getTenantEnvironmentId());
try {
    // all logs in this thread now carry the tenant environment ID
} finally {
    // clear the thread's MDC so the tenant does not leak into the next request
    MDC.clear();
}
Multi-tenant service? It is the dev's responsibility to add signal attributes in code, e.g. via the Mapped Diagnostic Context (MDC).
Gold Attr. #2: Architecture
https://c4model.com/ - The C4 model is an easy to learn, developer friendly approach to software architecture diagramming (by Simon Brown)
- System - logical product capability
- Container - service, logical database, topic, module
- Deployment Node - where it runs (can nest)

collibra.c4.system
collibra.c4.container
collibra.c4.deployment
Gold Attr. #2: Architecture
- Pod Resource Attributes - Easy with 1:1 mapping
- Modular Monoliths Signal Attributes - It's not only our single-tenant core, but also k8s jobs
Pod labels:
labels:
  c4.collibra.com/system: telemetry
  c4.collibra.com/container: colkyverno
Resource attributes:
collibra.c4.system: telemetry
collibra.c4.container: colkyverno
Modular monoliths: it becomes the devs' responsibility
Telemetry Backbone: Enrichment and Routing


8
Master Data
Can be sourced from different systems and merged into the telemetry data

7
Pipelines and Backends
Parallel pipelines do the processing, enrichment, filtering, calculation and backup to our backends
In-Flight Enrichment from Master Data
- JSON field promotion: body fields → signal OpenTelemetry attributes
- Master data lookup: keyed by collibra.tenant.environment_id
Devs don't need to know contract terms or support levels — they just log the tenant environment ID, and the backbone dynamically infers and injects the rest.
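The two enrichment steps above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the backbone's actual implementation: the class `TenantEnricher`, its in-memory `masterData` map and the `collibra.tenant.support_level` attribute used in the usage note are hypothetical. Only `collibra.tenant.environment_id` and the body field `tenant_environment_id` come from the slides.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: promote a body field to an attribute, then join master data by tenant. */
class TenantEnricher {
    static final String TENANT_ATTR = "collibra.tenant.environment_id";

    // Hypothetical master data keyed by tenant environment ID.
    private final Map<String, Map<String, String>> masterData;

    TenantEnricher(Map<String, Map<String, String>> masterData) {
        this.masterData = masterData;
    }

    /** Returns a new attribute map; neither input map is modified. */
    Map<String, String> enrich(Map<String, String> attributes, Map<String, String> body) {
        Map<String, String> out = new HashMap<>(attributes);

        // Step 1, JSON field promotion: copy the tenant ID from the body
        // unless the signal already carries the attribute.
        String fromBody = body.get("tenant_environment_id");
        if (fromBody != null) {
            out.putIfAbsent(TENANT_ATTR, fromBody);
        }

        // Step 2, master data lookup: add tenant-level fields without
        // overwriting anything the signal already has.
        String tenant = out.get(TENANT_ATTR);
        Map<String, String> extra = (tenant == null) ? null : masterData.get(tenant);
        if (extra != null) {
            extra.forEach(out::putIfAbsent);
        }
        return out;
    }
}
```

With master data `{"env-1": {"collibra.tenant.support_level": "premium"}}`, an event whose body contains `"tenant_environment_id": "env-1"` comes out with both the tenant attribute and the support level, while an event without a tenant ID passes through unchanged.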
OpenAPI Reverse-Mapping
- URL cardinality explosion bloats metric DBs and costs
- Reverse-map Istio URL + method → OpenAPI operationId
- Low-cardinality, semantic endpoint stream → automated SLOs across all microservices
- Also: aggressively drop runtime spam & infra-sweep noise before vendors see it
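As a rough sketch of the reverse-mapping idea, the class below turns OpenAPI path templates into regular expressions and resolves a concrete method and URL back to an `operationId`. It assumes the templates have already been extracted from the OpenAPI documents. The class name `OperationResolver`, the operation names in the usage note and the `unknown` fallback are illustrative assumptions, not the actual pipeline.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

/** Sketch: map HTTP method + concrete URL back to a low-cardinality operationId. */
class OperationResolver {
    private static final class Route {
        final Pattern pattern;
        final String operationId;

        Route(Pattern pattern, String operationId) {
            this.pattern = pattern;
            this.operationId = operationId;
        }
    }

    private final List<Route> routes = new ArrayList<>();

    /** Registers a path template such as "/assets/{assetId}" for one HTTP method. */
    void register(String method, String template, String operationId) {
        StringBuilder regex = new StringBuilder(Pattern.quote(method.toUpperCase(Locale.ROOT)) + " ");
        for (String segment : template.split("/")) {
            if (segment.isEmpty()) {
                continue;
            }
            // "{param}" matches any single path segment; literals match exactly.
            boolean isParam = segment.startsWith("{") && segment.endsWith("}");
            regex.append('/').append(isParam ? "[^/]+" : Pattern.quote(segment));
        }
        regex.append("/?"); // tolerate a trailing slash
        routes.add(new Route(Pattern.compile(regex.toString()), operationId));
    }

    /** First registered match wins; anything unmatched collapses into one bucket. */
    String resolve(String method, String url) {
        int query = url.indexOf('?');
        String path = (query >= 0) ? url.substring(0, query) : url;
        String candidate = method.toUpperCase(Locale.ROOT) + " " + path;
        for (Route route : routes) {
            if (route.pattern.matcher(candidate).matches()) {
                return route.operationId;
            }
        }
        return "unknown";
    }
}
```

After registering `GET /assets/{assetId}` as `getAsset`, every call to `GET /assets/<some id>` resolves to the same label regardless of the ID or query string, which is what keeps metric cardinality bounded. Because the first match wins, literal routes should be registered before overlapping parameterised ones.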
"We don't measure URLs. We measure contracts."

3
OTLP Backup
Backup of the raw data, on cheap storage.

5
Batch Imports
We import into our data lake in batch as it's cost efficient.

9
Data lake
Our data lake is where all the calculations are done for reporting, including cost attribution.
Retention is infinite.
Cost Attribution - Closing the Loop
- Telemetry volume cost — aggregate signal volume per C4 system × tenant
- Compute cost slicing — CPU / mem / disk / network by tenant and C4
- C4-aware provisioning — Collibra Infra CRDs carry C4 metadata; cloud billing maps to logical owner
Open problem: defensible "virtual dollar" formula for cross-team chargebacks.
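To make the first bullet concrete, here is a minimal sketch that aggregates telemetry volume per C4 system and tenant and splits a bill in proportion to bytes. The proportional split is only a placeholder assumption: the slides leave the actual chargeback formula open, and the class `VolumeAttribution` is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: aggregate signal volume per (C4 system, tenant) and split a cost by share. */
class VolumeAttribution {
    private final Map<String, Long> bytesByKey = new HashMap<>();
    private long totalBytes = 0;

    private static String key(String c4System, String tenant) {
        return c4System + "|" + tenant;
    }

    /** Adds the size of one batch of signals to its (system, tenant) bucket. */
    void record(String c4System, String tenant, long bytes) {
        bytesByKey.merge(key(c4System, tenant), bytes, Long::sum);
        totalBytes += bytes;
    }

    long bytes(String c4System, String tenant) {
        return bytesByKey.getOrDefault(key(c4System, tenant), 0L);
    }

    /** Naive proportional split (an assumption, not a defended formula). */
    double costShare(String c4System, String tenant, double totalCost) {
        if (totalBytes == 0) {
            return 0.0;
        }
        return totalCost * bytes(c4System, tenant) / totalBytes;
    }
}
```

If the bucket for system `telemetry` and tenant `env-1` holds 300 of 1,000 recorded bytes, it is charged 30% of whatever total is passed in. In the deck this calculation runs in the data lake, so the in-memory map here stands in for a batch query.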
Semantic Conventions and Wiring
Wiring - SemConv + Weaver

Wiring - More YAML

Future and Takeaway
OpAMP — Pushing Control to the Collection Edge
- Bandwidth problem: backbone filtering saves on vendors, but raw telemetry still costs WAN egress
- OpAMP: dynamic management & configuration of the entire collector fleet
- Adaptive edge sampling: normal tenant → aggressive sampling; incident → dial up fidelity at source
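The adaptive sampling bullet can be illustrated with a small, hypothetical sampler. It does not use OpAMP itself: the `setOverride` call stands in for a configuration change that would be pushed to the collector fleet. The class name, the ratios and the hash-based bucketing are assumptions made for this sketch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Sketch: low baseline sampling per tenant, raised at the source during an incident. */
class AdaptiveSampler {
    private static final int BUCKETS = 10_000;

    private final double baselineRatio;
    // Per-tenant overrides; in practice these would arrive as pushed configuration.
    private final Map<String, Double> overrides = new ConcurrentHashMap<>();

    AdaptiveSampler(double baselineRatio) {
        this.baselineRatio = checkRatio(baselineRatio);
    }

    private static double checkRatio(double ratio) {
        if (ratio < 0.0 || ratio > 1.0) {
            throw new IllegalArgumentException("ratio must be within [0, 1]");
        }
        return ratio;
    }

    /** Incident: dial up fidelity for a single tenant. */
    void setOverride(String tenant, double ratio) {
        overrides.put(tenant, checkRatio(ratio));
    }

    /** Incident over: fall back to the aggressive baseline. */
    void clearOverride(String tenant) {
        overrides.remove(tenant);
    }

    /** Deterministic per trace ID, so all spans of one trace get the same decision. */
    boolean keep(String tenant, String traceId) {
        double ratio = overrides.getOrDefault(tenant, baselineRatio);
        int bucket = Math.floorMod(traceId.hashCode(), BUCKETS);
        return bucket < ratio * BUCKETS;
    }
}
```

A sampler built with a small baseline drops most traces for every tenant. Calling `setOverride("env-1", 1.0)` keeps everything for that tenant until `clearOverride("env-1")` is called, while other tenants stay on the baseline.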
Key Takeaways
① Golden attributes on day one
Define tenancy and architecture dimensions before you split into microservices, not after.
② Decouple with a backbone
Buffer-first ingestion (Pub/Sub) + centralized enrichment unlocks both ops and FinOps / BI.
③ Invest in semantic contracts
They structure your signals today and become the foundation for AI diagnostic agents tomorrow.
Taming Tenancy, Cost and Architecture at Collibra
By Alex Van Boxel