In addition to Tracing and Logging, Metrics are one of the three* pillars of modern observability.
* Health Checks are the Fourth Pillar (ง'̀-'́)ง
Many teams use metrics solutions to measure application performance. But metrics offer far more value than application performance monitoring alone.
When we break it down, there are three main categories of metrics to collect.
* Spoiler Alert: This list is ordered from least to most significant
Application Performance provides insight into how the application is running.
Examples:
- OS Threads created
- Memory in Heap
- Garbage collection events
- CPU Utilization
Service Calls include both internal executions and external calls.
Examples:
Business Events include significant actions or results generated from user interactions and requests.
Examples:
By measuring Application Performance, Service Calls, and Business Events, we increase the likelihood of identifying incidents.
Not all incidents appear as CPU or Memory issues. More often than not, they appear as unexpected results in user workflows.
One of the simplest metric types in Prometheus is the Counter. A Counter can only increment and never decrease in value.
This metric type is suitable for counting the number of times something has occurred, such as:
// Register Counter
counter := promauto.NewCounter(prometheus.CounterOpts{
	Name: "example_counter",
	Help: "Counter Example will increment when called",
})
// Increment by 1
counter.Inc()
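Because a Counter never decreases, a monitoring system can turn raw samples into "how much happened in this window" and still cope with a restart that sets the value back to zero. The sketch below is a simplified, standard-library-only model of that idea. The `increase` function and the sample values are hypothetical, and this is not how Prometheus itself implements its query functions (it skips extrapolation, for instance).

```go
package main

import "fmt"

// increase returns the total growth across a series of counter samples.
// A sample lower than its predecessor is treated as a counter reset
// (for example, a process restart), so the new value counts as growth from zero.
func increase(samples []float64) float64 {
	total := 0.0
	for i := 1; i < len(samples); i++ {
		delta := samples[i] - samples[i-1]
		if delta < 0 {
			// Counter was reset; assume it restarted from zero.
			delta = samples[i]
		}
		total += delta
	}
	return total
}

func main() {
	// Hypothetical scrapes of a counter; the drop from 12 to 3 is a restart.
	samples := []float64{5, 9, 12, 3, 7}
	fmt.Println("increase:", increase(samples))
}
```

A value that could legitimately go down would make the reset check ambiguous, which is one reason to keep Counters strictly increasing.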
Labels are a way to add details to metrics without creating duplicate metrics.
For example, with an API Authentication Request Counter, we may want to know how many authentication requests were successful or how many failed.
// Register Counter with Labels
auths := promauto.NewCounterVec(prometheus.CounterOpts{
	Name: "api_auths",
	Help: "API Authentication Status",
},
	[]string{"source_ip", "authentication_status", "api_endpoint"},
)
// Increment by 1
auths.WithLabelValues("10.0.0.1", "failed", "/v1/orders").Inc()
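To make the idea of labels more concrete, here is a toy model of a labeled counter using only the standard library. It is a sketch for illustration, not the Prometheus client's implementation; `labeledCounter` and its methods are hypothetical names. It shows that one metric name holds a separate running total for each distinct combination of label values.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// labeledCounter is a toy stand-in for a counter vector: one metric name,
// with a separate running total per distinct combination of label values.
type labeledCounter struct {
	mu     sync.Mutex
	series map[string]float64
}

func newLabeledCounter() *labeledCounter {
	return &labeledCounter{series: make(map[string]float64)}
}

// key joins label values with a separator that will not appear in them.
func key(labelValues []string) string {
	return strings.Join(labelValues, "\x00")
}

// inc adds 1 to the series identified by the given label values.
func (c *labeledCounter) inc(labelValues ...string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.series[key(labelValues)]++
}

// value returns the current total for one combination of label values.
func (c *labeledCounter) value(labelValues ...string) float64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.series[key(labelValues)]
}

// seriesCount returns how many distinct label combinations have been seen.
func (c *labeledCounter) seriesCount() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return len(c.series)
}

func main() {
	auths := newLabeledCounter()
	auths.inc("10.0.0.1", "failed", "/v1/orders")
	auths.inc("10.0.0.1", "failed", "/v1/orders")
	auths.inc("10.0.0.1", "success", "/v1/orders")

	fmt.Println("failed:", auths.value("10.0.0.1", "failed", "/v1/orders"))
	fmt.Println("success:", auths.value("10.0.0.1", "success", "/v1/orders"))
	fmt.Println("distinct series:", auths.seriesCount())
}
```

Since every new label value creates another series, labels with many possible values (an IP address, for instance) can multiply the number of series quickly, so it is worth choosing them deliberately.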
Like a Counter, a Summary counts the number of times an event occurs, but it also records an observed value for each event (such as a duration) and reports the count, the sum, and the configured quantiles of those values.
With a Summary, a single metric tracks both how often an event occurs and how long it takes.
// Register Summary with Labels
requests := promauto.NewSummaryVec(prometheus.SummaryOpts{
	Name:       "api_requests",
	Help:       "API Request latency",
	Objectives: map[float64]float64{0.5: 0.05, 0.9: 0.01, 0.99: 0.001},
},
	[]string{"api_endpoint", "http_code", "http_method"},
)
// Record the start time
start := time.Now()
// Observe the elapsed time in seconds
requests.WithLabelValues("/v1/orders", "200", "POST").Observe(time.Since(start).Seconds())
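The start-then-observe pattern above can be wrapped in a small helper so that a single deferred call records the duration on every return path. This is a minimal sketch using only the standard library; `trackDuration` and `handleOrder` are hypothetical names, and the `observe` callback stands in for whatever records the value.

```go
package main

import (
	"fmt"
	"time"
)

// trackDuration records the start time and returns a function that,
// when called, passes the elapsed time in seconds to observe.
func trackDuration(observe func(seconds float64)) func() {
	start := time.Now()
	return func() {
		observe(time.Since(start).Seconds())
	}
}

// handleOrder is a hypothetical request handler used only for illustration.
func handleOrder(observe func(float64)) {
	// trackDuration runs now (capturing the start time); the function it
	// returns is deferred and runs when handleOrder returns.
	defer trackDuration(observe)()

	time.Sleep(10 * time.Millisecond) // Stand-in for real work
}

func main() {
	var observed []float64
	handleOrder(func(s float64) { observed = append(observed, s) })
	fmt.Printf("observations: %d, latest: %.4fs\n", len(observed), observed[0])
}
```

With the Prometheus client, the `Observe` method of the value returned by `requests.WithLabelValues(...)` has the same `func(float64)` shape, so it could be passed in as the callback when the label values are known up front. The client library also provides `prometheus.NewTimer` with an `ObserveDuration` method for the same purpose.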
A Gauge is similar to a Counter, except its value can be both incremented and decremented.
Typically, we will use a Gauge to measure a current value that can change over time (e.g., pool size or outstanding requests).
// Register Gauge with Labels
users := promauto.NewGaugeVec(prometheus.GaugeOpts{
	Name: "users",
	Help: "Current Users utilizing the system",
},
	[]string{"company", "activity"},
)
// Increment by 1
users.WithLabelValues("example", "chat").Inc()
// Decrement by 1
users.WithLabelValues("example", "chat").Dec()
Rather than explicitly calling .Dec() at the end of a function, Go users can use a defer statement to run it when the function returns.
go func() {
	// Increment Gauge by 1
	users.WithLabelValues("example", "chat").Inc()
	defer users.WithLabelValues("example", "chat").Dec()
	// Do work
}()
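The benefit of defer is that the decrement runs on every exit path, including early returns on error. Here is a small standard-library sketch of that behavior; the `gauge` type and `doWork` function are hypothetical stand-ins, not part of the Prometheus client.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"sync/atomic"
)

// gauge is a toy value that can go up and down, safe for concurrent use.
type gauge struct{ v int64 }

func (g *gauge) inc()         { atomic.AddInt64(&g.v, 1) }
func (g *gauge) dec()         { atomic.AddInt64(&g.v, -1) }
func (g *gauge) value() int64 { return atomic.LoadInt64(&g.v) }

// doWork increments the gauge on entry and relies on defer to decrement it,
// whether the function returns normally or exits early with an error.
func doWork(g *gauge, fail bool) error {
	g.inc()
	defer g.dec()

	if fail {
		return errors.New("work failed") // Early return still triggers dec
	}
	return nil
}

func main() {
	g := &gauge{}
	var wg sync.WaitGroup
	for i := 0; i < 50; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			_ = doWork(g, i%2 == 0) // Half of the calls fail on purpose
		}(i)
	}
	wg.Wait()
	fmt.Println("in flight after all work finished:", g.value())
}
```

Without defer, each early return would need its own decrement call, and a missed one would leave the Gauge permanently too high.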
Twitter: @madflojo
LinkedIn: Benjamin Cane
Blog: BenCane.com
Distinguished Engineer - American Express