Prometheus Instrumentation

13 Sept. 2018

What Is Prometheus?

Metrics-Based Monitoring

Unified System for Metrics and Monitoring
Pull-Based (Scrapes Targets)
Integrates Seamlessly with Grafana for Graphs
Includes a Powerful Expression Language
Supports Multidimensional Metrics
Alerts Based on Metrics
Instrumentation Libraries for White Box Monitoring

Metrics Scraping

Metrics Are Scraped Over HTTP
Uses Service Discovery to Find Targets
Simple Text-Based Format

chaumes@prometheus-901:~$ curl -sS localhost:9100/metrics | grep node_filesystem_avail

# HELP node_filesystem_avail Filesystem space available to non-root users in bytes.
# TYPE node_filesystem_avail gauge
node_filesystem_avail{device="/dev/sda1",fstype="ext4",mountpoint="/"} 2.3620919296e+10
node_filesystem_avail{device="none",fstype="tmpfs",mountpoint="/run/lock"} 5.24288e+06
node_filesystem_avail{device="none",fstype="tmpfs",mountpoint="/run/shm"} 5.20429568e+08
node_filesystem_avail{device="none",fstype="tmpfs",mountpoint="/run/user"} 1.048576e+08
node_filesystem_avail{device="rpc_pipefs",fstype="rpc_pipefs",mountpoint="/run/rpc_pipefs"} 0
node_filesystem_avail{device="srv_salt",fstype="vboxsf",mountpoint="/srv/salt"} 4.1484754944e+11
node_filesystem_avail{device="tmpfs",fstype="tmpfs",mountpoint="/run"} 1.03616512e+08
node_filesystem_avail{device="vagrant",fstype="vboxsf",mountpoint="/vagrant"} 4.1484754944e+11
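
Because the format is plain text, a scrape can be read with a few lines of code. The following is a minimal sketch, not the official parser: the regular expressions and the `parse_sample` helper are assumptions made for illustration, and the sketch ignores optional timestamps and does not unescape label values.

```python
import re

# One sample line looks like: name{label="value",...} 1.5e+03
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(?P<key>[a-zA-Z_][a-zA-Z0-9_]*)="(?P<val>(?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Return (name, labels, value) for a sample line, or None for comments and blanks."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None  # HELP/TYPE comments carry metadata, not samples
    match = SAMPLE_RE.match(line)
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    labels = {
        m.group("key"): m.group("val")
        for m in LABEL_RE.finditer(match.group("labels") or "")
    }
    return match.group("name"), labels, float(match.group("value"))


text = '''# TYPE node_filesystem_avail gauge
node_filesystem_avail{device="/dev/sda1",fstype="ext4",mountpoint="/"} 2.3620919296e+10
node_filesystem_avail{device="rpc_pipefs",fstype="rpc_pipefs",mountpoint="/run/rpc_pipefs"} 0
'''
samples = [s for s in map(parse_sample, text.splitlines()) if s is not None]
```

Each label combination is its own time series, which is what "multidimensional" means in practice.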

Instrumentation Libraries

Python
Ruby
Java/Scala
Go
Bash (unofficial/3rd-party)
Many Others (unofficial/3rd-party)

Metric Types

Counter

Represents a Cumulative Numerical Value
Monotonically Increases
- i.e. the value only goes up; it returns to zero only when the process restarts
Useful for
- number of requests served
- tasks completed
- number of errors
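
To make the counter semantics concrete, here is a toy sketch in plain Python. It is not the client library; the `Counter` class and the metric names are hypothetical and exist only for illustration.

```python
class Counter:
    """Toy counter: a cumulative value that callers can only increase."""

    def __init__(self, name):
        self.name = name
        self.value = 0.0

    def inc(self, amount=1.0):
        # Rejecting negative amounts is what keeps the value monotonic.
        if amount < 0:
            raise ValueError("counters can only increase")
        self.value += amount


requests_served = Counter("requests_served_total")  # hypothetical metric name
errors = Counter("errors_total")                    # hypothetical metric name

# Simulate four requests, one of which fails.
for ok in (True, True, False, True):
    requests_served.inc()
    if not ok:
        errors.inc()
```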

Gauge

Represents a Single Numerical Value
Can Increase or Decrease Arbitrarily
Useful for
- memory or CPU cycles used
- number of threads or processes
- number of tasks (e.g. in a queue)
- number of objects (e.g. in a database)
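
A gauge, by contrast, can be set, raised, or lowered at will. The sketch below is a toy in plain Python, not the client library; the `Gauge` class, its `track_inprogress` helper, and the metric names are assumptions for illustration.

```python
from contextlib import contextmanager


class Gauge:
    """Toy gauge: a single value that may go up or down arbitrarily."""

    def __init__(self, name):
        self.name = name
        self.value = 0.0

    def set(self, value):
        self.value = float(value)

    def inc(self, amount=1.0):
        self.value += amount

    def dec(self, amount=1.0):
        self.value -= amount

    @contextmanager
    def track_inprogress(self):
        # Raise the gauge on entry and lower it on exit, even if the body fails.
        self.inc()
        try:
            yield
        finally:
            self.dec()


queue_depth = Gauge("queue_depth")            # hypothetical metric name
tasks_running = Gauge("tasks_in_progress")    # hypothetical metric name

queue_depth.set(3)
queue_depth.dec()  # one task was taken off the queue

with tasks_running.track_inprogress():
    running_inside = tasks_running.value  # 1.0 while the task is running
```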

Histogram

Samples Observations in Configurable Buckets
Cumulative Across Buckets
Exposes Multiple Time Series
- cumulative counters for the observation buckets
- total sum of all observed values
- count of events observed
Useful for
- Measuring Latencies/Response Times by Quantile
- Approximating Apdex Scores
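
The sketch below shows what "cumulative across buckets" means and how two bucket boundaries can approximate an Apdex score. It is a toy in plain Python, not the client library; the `Histogram` class, the bucket bounds, and the `apdex` helper are assumptions chosen for illustration.

```python
import math


class Histogram:
    """Toy histogram with cumulative buckets, a running sum, and a count."""

    def __init__(self, upper_bounds):
        # A final +Inf bucket catches every observation.
        self.upper_bounds = sorted(upper_bounds) + [math.inf]
        self.bucket_counts = [0] * len(self.upper_bounds)
        self.sum = 0.0
        self.count = 0

    def observe(self, value):
        self.sum += value
        self.count += 1
        # Cumulative: an observation lands in every bucket whose bound it fits under.
        for i, bound in enumerate(self.upper_bounds):
            if value <= bound:
                self.bucket_counts[i] += 1

    def cumulative(self, bound):
        return self.bucket_counts[self.upper_bounds.index(bound)]


def apdex(hist, satisfied_le, tolerated_le):
    """Approximate Apdex from two bucket bounds (tolerated is usually 4x satisfied)."""
    satisfied = hist.cumulative(satisfied_le)
    tolerated = hist.cumulative(tolerated_le)  # cumulative, so it includes satisfied
    return (satisfied + tolerated) / 2 / hist.count


latency = Histogram([0.1, 0.3, 1.2])  # bucket bounds in seconds (assumed)
for seconds in (0.05, 0.2, 0.25, 0.8, 2.0):
    latency.observe(seconds)

score = apdex(latency, satisfied_le=0.3, tolerated_le=1.2)
```

The score is only as precise as the bucket layout: the satisfied and tolerated thresholds must coincide with bucket boundaries.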

Summary

Similar to a Histogram
Calculates Configurable Quantiles Over a Sliding Time Window
Quantiles Cannot Be Aggregated (e.g. among multiple instances)
Exposes Multiple Time Series
- streaming quantiles of observed events
- total sum of all observed values
- count of observed events
Useful for
- similar metrics as histograms
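
A small numeric example shows why precomputed quantiles do not aggregate. This is an illustrative sketch in plain Python with made-up data for two hypothetical instances; the `quantile` helper uses a simple nearest-rank rule, not the streaming algorithm a real summary would use.

```python
import math


def quantile(values, q):
    """Nearest-rank quantile of a list of observations."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


# Made-up latencies in seconds for two instances of the same service.
instance_a = [0.1] * 100  # busy and fast
instance_b = [1.0] * 20   # quiet and slow

p90_a = quantile(instance_a, 0.9)
p90_b = quantile(instance_b, 0.9)

# Averaging the per-instance quantiles ignores how much traffic each one served.
averaged_p90 = (p90_a + p90_b) / 2

# The true value needs the raw observations, which a summary does not expose.
true_p90 = quantile(instance_a + instance_b, 0.9)
```

Here the averaged figure understates the true 90th percentile. The sum and count series, unlike the quantiles, can still be added across instances.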

Histogram or Summary?

It's Complicated!
Read Docs and Seek Guidance
Guidelines Distilled
- If you need to aggregate, use Histogram
- If you have an idea of the range and distribution of values that will be observed, use Histogram
- If you need an accurate quantile, regardless of the range and distribution of values, use Summary
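
The second guideline follows from how a quantile is estimated from histogram buckets: by interpolating inside the bucket that contains the target rank. The sketch below illustrates that idea in plain Python; the `estimate_quantile` helper and the bucket counts are assumptions for illustration, not the server's implementation.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate a quantile from (upper_bound, cumulative_count) pairs.

    Buckets must be sorted by bound and end with a +Inf bucket.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0
    for upper_bound, cumulative in buckets:
        if cumulative >= rank:
            if math.isinf(upper_bound):
                # Nothing to interpolate toward, so fall back to the last finite bound.
                return lower_bound
            in_bucket = cumulative - lower_count
            fraction = (rank - lower_count) / in_bucket
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, cumulative
    return lower_bound


# The same 100 observations, bucketed two different ways.
fine = [(0.1, 10), (0.5, 90), (1.0, 100), (math.inf, 100)]
coarse = [(1.0, 100), (math.inf, 100)]

median_fine = estimate_quantile(0.5, fine)
median_coarse = estimate_quantile(0.5, coarse)
```

The two layouts give different medians for identical data, so buckets chosen without knowledge of the value range produce poor estimates.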

Service Types

Online

Human or System Expects an Immediate Response
White Box Instrumentation Helps Diagnose Where a Problem Lies
Key Metrics
- number of performed queries (counter)
- number of errors/exceptions (counter)
- latency (histogram or summary)
Pro Tip: Count Queries When They *END*
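
One way to collect the three key metrics is a decorator around the request handler. This is a sketch in plain Python, not the client library; the `metrics` dictionary, the `instrumented` decorator, and the `handle` function are hypothetical names used only for illustration.

```python
import functools
import time

# Stand-ins for a counter, a counter, and a histogram or summary.
metrics = {"queries_total": 0, "errors_total": 0, "latency_seconds": []}


def instrumented(handler):
    @functools.wraps(handler)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return handler(*args, **kwargs)
        except Exception:
            metrics["errors_total"] += 1
            raise
        finally:
            # Count at the END so the query count lines up with errors and latency.
            metrics["queries_total"] += 1
            metrics["latency_seconds"].append(time.perf_counter() - start)
    return wrapper


@instrumented
def handle(query):
    if query == "bad":
        raise ValueError("cannot parse query")
    return query.upper()


for query in ("a", "bad", "b"):
    try:
        handle(query)
    except ValueError:
        pass
```

Counting at the end means a query is never counted without a matching latency observation.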

Offline

Continually Running, but Nothing Awaits Response
Key Metrics
- Items In (counter)
- Items in Progress (gauge)
- Items Out (counter)
- Items Sent (gauge)
Pro Tip: Use a Heartbeat to Expose Processing Time
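
The heartbeat idea can be sketched as a dummy item that carries its insertion time through the pipeline. The code below is an illustration in plain Python; the `Stage` class and its attribute names are assumptions, and a fake clock is injected so the result is deterministic.

```python
import time
from collections import deque


class Stage:
    """Toy processing stage that tracks items in, in progress, and out."""

    def __init__(self, clock=time.time):
        self.clock = clock
        self.queue = deque()
        self.items_in = 0                       # counter
        self.items_in_progress = 0              # gauge
        self.items_out = 0                      # counter
        self.heartbeat_latency_seconds = None   # gauge

    def submit(self, item):
        self.items_in += 1
        self.queue.append(item)

    def send_heartbeat(self):
        # A dummy item that records when it entered the pipeline.
        self.submit({"heartbeat": True, "inserted_at": self.clock()})

    def process_one(self):
        item = self.queue.popleft()
        self.items_in_progress += 1
        try:
            if isinstance(item, dict) and item.get("heartbeat"):
                # Time spent in the pipeline, measured when the heartbeat exits.
                self.heartbeat_latency_seconds = self.clock() - item["inserted_at"]
            # Real work on ordinary items would happen here.
        finally:
            self.items_in_progress -= 1
            self.items_out += 1


ticks = iter([100.0, 107.5])  # fake clock readings, in seconds
stage = Stage(clock=lambda: next(ticks))
stage.submit("record-1")
stage.send_heartbeat()
stage.process_one()
stage.process_one()
```

If heartbeats stop arriving or their latency grows, the pipeline has stalled or is falling behind.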

Batch

Like an Offline Service, but Not Continually Running
Runs Too Briefly to Be Scraped Reliably (Push Metrics to the Pushgateway)
Key Metrics
- UNIX Timestamp of Last Successful Run (gauge)
- UNIX Timestamp of Last Failed Run (gauge)
- Duration of Each Processing Stage (gauge)
- Overall Runtime (gauge)
- Number of Records Processed (counter)
- Number of Records Failed (counter)
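
At the end of a run, a batch job can render these metrics in the text format and push them. The sketch below only builds the request and does not send it. The metric names, the job name, and the gateway address `pushgateway.example.com:9091` are assumptions for illustration; in practice a client library's push helper would normally do this.

```python
import urllib.request


def render_batch_metrics(last_success, duration_seconds, processed, failed):
    """Render batch job metrics in the text exposition format."""
    lines = [
        "# TYPE batch_last_success_timestamp_seconds gauge",
        f"batch_last_success_timestamp_seconds {last_success}",
        "# TYPE batch_duration_seconds gauge",
        f"batch_duration_seconds {duration_seconds}",
        "# TYPE batch_records_processed_total counter",
        f"batch_records_processed_total {processed}",
        "# TYPE batch_records_failed_total counter",
        f"batch_records_failed_total {failed}",
    ]
    return "\n".join(lines) + "\n"


def build_push_request(gateway, job, body):
    """Build (but do not send) a PUT that replaces the metrics stored for this job."""
    return urllib.request.Request(
        url=f"{gateway}/metrics/job/{job}",
        data=body.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )


body = render_batch_metrics(
    last_success=1536796800.0,  # UNIX timestamp of the successful run
    duration_seconds=42.5,
    processed=1000,
    failed=3,
)
request = build_push_request("http://pushgateway.example.com:9091", "nightly_import", body)
# urllib.request.urlopen(request) would perform the push against a real gateway.
```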


wryfi
