Basic Metrics

Developer's Guide to Using Metrics for Observability

A Pillar of Observability

In addition to Tracing and Logging, Metrics are one of the three* pillars of modern observability.
 

  • Metrics give the ability to track any value over time at scale.
  • Incrementing a metric is more efficient than tracing or logging.
  • Tooling exists to aggregate metrics from multiple instances into a single visualization.
* Health Checks are the Fourth Pillar (ง'̀-'́)ง

Understanding What to Measure

Doing It Wrong

Many teams use metrics solutions only to measure application performance. But metrics can add far more value than application performance monitoring alone.

3 Types of Metrics

When we break it down, there are three main types of metrics to collect.

  • Application Performance
  • Service Calls
  • Business Events

* Spoiler Alert: This list is ordered from least to most significant

Application Performance

Application Performance provides insight into how the application is running.

 

Examples:

  • OS Threads Created
  • Heap Memory Usage
  • Garbage Collection Events
  • CPU Utilization

 

Service Calls

Service Calls include both internal executions and external calls.

 

Examples:

  • Incoming Requests
  • Request Processing Time
  • Database Query Execution Time
  • Dependency Calls
  • Internal Task Executions

Business Events

Business Events include significant actions or results generated from user interactions and requests. 

 

Examples:

  • Transaction Requests
  • Transaction Results
  • Failed Transactions
  • Dropped Transactions

With These Powers Combined

By measuring Application Performance, Service Calls, and Business Events, we increase the likelihood of identifying incidents.

 

Not all incidents appear as CPU or Memory issues. More often than not, they appear as unexpected results in user workflows.

Collecting Metrics

Heads Up: Examples are based on Prometheus and its Go client library (client_golang)

Counters

One of the simplest metric types in Prometheus is the Counter. A Counter can only increase and never decreases in value (it resets to zero only when the process restarts).

 

This metric type is suitable for counting the number of times something has occurred, such as:

 

  • User logins
  • API Requests
  • Transactions Received

// Register Counter
counter := promauto.NewCounter(prometheus.CounterOpts{
	Name: "example_counter",
	Help: "Counter Example will increment when called",
})

// Increment by 1
counter.Inc()

Labels

Labels add details (dimensions) to a metric without creating a duplicate metric for each variation.

 

For example, with an API Authentication Request Counter, we may want to know how many authentication requests succeeded and how many failed.

 

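// Register Counter with Labels
// Caution: every unique combination of label values creates a new time series,
// so unbounded values such as source IPs can drive up cardinality quickly.
auths := promauto.NewCounterVec(prometheus.CounterOpts{
	Name: "api_auths",
	Help: "API Authentication Status",
},
	[]string{"source_ip", "authentication_status", "api_endpoint"},
)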

// Increment by 1
auths.WithLabelValues("10.0.0.1", "failed", "/v1/orders").Inc()

Summary

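Like Counters, Summary metrics count the number of times an event occurs, but their primary purpose is to track an observed value, such as a duration, over time.

With a Summary, it is possible to track both how long an event took and how often it occurred as a single metric. A Summary exposes a count and a sum of all observations, plus any quantiles configured in Objectives.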

// Register Summary with Labels
requests := promauto.NewSummaryVec(prometheus.SummaryOpts{
	Name:       "api_requests",
	Help:       "API Request latency",
	Objectives: map[float64]float64{0.5: 0.05, 0.9: 0.01, 0.99: 0.001}, // quantile: allowed error
},
	[]string{"api_endpoint", "http_code", "http_method"},
)

// Start a time tracker
now := time.Now()

// Do work (e.g., handle the request)

// Record the elapsed time in seconds
requests.WithLabelValues("/v1/orders", "200", "POST").Observe(time.Since(now).Seconds())

Gauges

A Gauge is similar to a Counter, except its value can both increase and decrease.

Typically, we use a Gauge to measure a current value that changes over time (e.g., pool size, outstanding requests).

// Register Gauge with Labels
users := promauto.NewGaugeVec(prometheus.GaugeOpts{
	Name: "users",
	Help: "Current Users utilizing the system",
},
	[]string{"company", "activity"},
)

// Increment by 1
users.WithLabelValues("example", "chat").Inc()

// Decrement by 1
users.WithLabelValues("example", "chat").Dec()

Using Defer (Golang Specific)

Rather than explicitly calling .Dec() at the end of a function, Go users can defer the call so that it runs when the function returns.

go func() {
	// Increment Gauge by 1
	users.WithLabelValues("example", "chat").Inc()
	// Decrement by 1 when the function returns
	defer users.WithLabelValues("example", "chat").Dec()

	// Do work
}()

Key Takeaways

  • Don't treat metrics as just a replacement for an APM
  • Measure Business Events along with application internals
  • Use Counters to count when something happens
  • Use Gauges to measure values that can increase and decrease over time
  • Use Summaries to measure observed values (latency, return values)
  • Labels make metrics more meaningful and provide more refined insights

EOF

Benjamin Cane

Twitter: @madflojo 
LinkedIn: Benjamin Cane
Blog: BenCane.com

Distinguished Engineer - American Express