prometheus-cpp

 

High Performance Metrics in C++

Jupp Mueller

@jupp0r (GitHub/Twitter)

/me

  • Senior Software Engineer @ LogMeIn
  • worked on large-scale applications for more than 10 years
  • I like outdoor things (climbing, surfing, hiking, trail running ...)
  • I'm a keyboard nerd
  • feel free to interrupt me or shout comments during the talk

Metrics

[Diagram: the Prometheus Server scrapes each Application via HTTP GET and subscribes to Service Discovery]

Development

  • learn how your software behaves
  • get hard data to make engineering decisions
  • find resource leaks
  • pinpoint performance bottlenecks
  • ...

Operations

  • monitoring
  • alerting

Prometheus

  • rewrite of Google's Borgmon monitoring system by ex-Googlers
  • consists of
    • client libraries
    • collection service
    • time series database
    • alerting service

Prometheus Server

  • has strong opinions on pets vs cattle
  • pull-based metrics collection
  • scrapes clients at configurable intervals
  • integrates with modern service discovery systems / container orchestrators
    • kubernetes
    • consul
    • etcd
    • ...

Metrics

Data Model

Everything is a time series.

Each time series has a name and a set of label pairs.

Because different label pairs yield different, independent time series, we speak of label dimensions.

api_http_requests_total{method="POST",
                        handler="/messages"}
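A rough way to picture this data model in C++: a series is identified by its metric name plus its full set of label pairs, so changing any label value yields a separate series. This is only an illustrative sketch; `Labels`, `SeriesKey` and `SeriesStore` are hypothetical names and not part of Prometheus or prometheus-cpp.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical types for illustration; not the prometheus-cpp API.
using Labels = std::map<std::string, std::string>;
// A time series is identified by (metric name, label pairs).
using SeriesKey = std::pair<std::string, Labels>;

class SeriesStore {
 public:
  // Add `delta` to the series identified by name + labels,
  // creating the series (starting at 0) on first use.
  void Add(const std::string& name, const Labels& labels, double delta) {
    series_[SeriesKey{name, labels}] += delta;
  }

  // Current value of one exact series, or 0 if it does not exist.
  double Value(const std::string& name, const Labels& labels) const {
    auto it = series_.find(SeriesKey{name, labels});
    return it == series_.end() ? 0.0 : it->second;
  }

  // Number of distinct series currently stored.
  std::size_t Size() const { return series_.size(); }

 private:
  std::map<SeriesKey, double> series_;
};
```

With this sketch, two `Add` calls for `method="POST"` and one for `method="GET"` under the same metric name produce two independent series.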

Metric Types

Counter

  • cumulative
  • single value
  • only ever goes up
  • used to represent things like the total number of connections, etc.

Gauge

  • single value
  • can increase and decrease
  • used to represent things like active connections, etc.
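The contrast between counter and gauge can be sketched in a few lines of C++. This is a minimal illustration under my own assumptions, not the prometheus-cpp implementation; `AtomicAdd`, `SketchCounter` and `SketchGauge` are hypothetical names. Both keep a single atomic value, but the counter rejects negative increments while the gauge may move in either direction.

```cpp
#include <atomic>
#include <stdexcept>

// Hypothetical helper: atomically add `delta` to an atomic double with a
// compare-and-swap loop (std::atomic<double>::fetch_add only exists since C++20).
inline void AtomicAdd(std::atomic<double>& target, double delta) {
  double current = target.load();
  while (!target.compare_exchange_weak(current, current + delta)) {
    // On failure `current` now holds the latest value; retry with it.
  }
}

// Single cumulative value that only ever goes up.
class SketchCounter {
 public:
  void Increment(double amount = 1.0) {
    if (amount < 0.0) throw std::invalid_argument("counters only go up");
    AtomicAdd(value_, amount);
  }
  double Value() const { return value_.load(); }

 private:
  std::atomic<double> value_{0.0};
};

// Single value that can increase, decrease, or be set directly.
class SketchGauge {
 public:
  void Increment(double amount = 1.0) { AtomicAdd(value_, amount); }
  void Decrement(double amount = 1.0) { AtomicAdd(value_, -amount); }
  void Set(double value) { value_.store(value); }
  double Value() const { return value_.load(); }

 private:
  std::atomic<double> value_{0.0};
};
```

A counter would track connections created in total; a gauge would track connections currently open.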

Histogram

  • has single observe method
  • has cumulative counters for configurable observation buckets
  • total sum of all observed values
  • observation count
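The bullet points above can be sketched as a small class. This is a simplified, single-threaded illustration and not the prometheus-cpp implementation; `SketchHistogram` and its method names are hypothetical. Each observation lands in the first bucket whose upper bound is greater than or equal to the value, and the cumulative counts are derived when the histogram is read.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical minimal histogram; not thread-safe, to keep the sketch short.
class SketchHistogram {
 public:
  // `upper_bounds` must be sorted ascending; a final +Inf bucket is implicit.
  explicit SketchHistogram(std::vector<double> upper_bounds)
      : upper_bounds_(std::move(upper_bounds)),
        bucket_counts_(upper_bounds_.size() + 1, 0) {}

  // The single entry point: record one observed value.
  void Observe(double value) {
    // First bucket whose upper bound is >= value ("le" = less than or equal).
    // Values above every bound fall into the implicit +Inf bucket.
    auto it = std::lower_bound(upper_bounds_.begin(), upper_bounds_.end(), value);
    ++bucket_counts_[static_cast<std::size_t>(it - upper_bounds_.begin())];
    sum_ += value;
    ++count_;
  }

  // Cumulative counts, one per bucket, ending with the +Inf bucket.
  std::vector<std::uint64_t> CumulativeCounts() const {
    std::vector<std::uint64_t> result;
    std::uint64_t running = 0;
    for (std::uint64_t c : bucket_counts_) {
      running += c;
      result.push_back(running);
    }
    return result;
  }

  double Sum() const { return sum_; }             // total of all observed values
  std::uint64_t Count() const { return count_; }  // number of observations

 private:
  std::vector<double> upper_bounds_;
  std::vector<std::uint64_t> bucket_counts_;  // per-bucket, non-cumulative
  double sum_ = 0.0;
  std::uint64_t count_ = 0;
};
```

Observing a single value of 2.0 with bounds 1, 5 and 10 gives cumulative counts 0, 1, 1, 1, a sum of 2 and a count of 1.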

Histogram

# HELP exposer_request_latencies Latencies of serving scrape requests, in milliseconds
# TYPE exposer_request_latencies HISTOGRAM
exposer_request_latencies_bucket{le="1.000000",} 0
exposer_request_latencies_bucket{le="5.000000",} 1
exposer_request_latencies_bucket{le="10.000000",} 1
exposer_request_latencies_bucket{le="20.000000",} 1
exposer_request_latencies_bucket{le="40.000000",} 1
exposer_request_latencies_bucket{le="80.000000",} 1
exposer_request_latencies_bucket{le="160.000000",} 1
exposer_request_latencies_bucket{le="320.000000",} 1
exposer_request_latencies_bucket{le="640.000000",} 1
exposer_request_latencies_bucket{le="1280.000000",} 1
exposer_request_latencies_bucket{le="2560.000000",} 1
exposer_request_latencies_bucket{le="inf",} 1
exposer_request_latencies_sum{} 2.000000
exposer_request_latencies_count{} 1
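For readers who want to see how such lines are produced, here is a hedged sketch of a serializer. `RenderHistogram` is a hypothetical function, not part of prometheus-cpp. It follows the Prometheus text exposition format as documented (lowercase type name, `+Inf` for the last bucket), so its output differs slightly from the listing above.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical serializer for one unlabeled histogram.
// `cumulative_counts` holds one entry per upper bound plus a final +Inf entry.
std::string RenderHistogram(const std::string& name,
                            const std::vector<double>& upper_bounds,
                            const std::vector<std::uint64_t>& cumulative_counts,
                            double sum) {
  if (cumulative_counts.size() != upper_bounds.size() + 1) {
    throw std::invalid_argument("need one count per bound plus +Inf");
  }
  std::ostringstream out;
  out << "# TYPE " << name << " histogram\n";
  // One cumulative bucket line per configured upper bound.
  for (std::size_t i = 0; i < upper_bounds.size(); ++i) {
    out << name << "_bucket{le=\"" << upper_bounds[i] << "\"} "
        << cumulative_counts[i] << "\n";
  }
  // The +Inf bucket always equals the total observation count.
  out << name << "_bucket{le=\"+Inf\"} " << cumulative_counts.back() << "\n";
  out << name << "_sum " << sum << "\n";
  out << name << "_count " << cumulative_counts.back() << "\n";
  return out.str();
}
```

The sketch leaves out `# HELP` lines and labels other than `le`; a real serializer would also need to escape label values.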

Instrumenting Applications

Client Libraries

Go, Java, Scala, Python, Ruby, Bash, C++, Lisp, Elixir, Erlang, Haskell, Lua, .NET, Node.js, PHP, Rust

  #include <chrono>
  #include <memory>
  #include <thread>

  // header paths assumed from current prometheus-cpp releases; the version
  // shown in the talk may use different paths
  #include <prometheus/counter.h>
  #include <prometheus/exposer.h>
  #include <prometheus/registry.h>

  int main() {
    using namespace prometheus;

    // create an http server running on port 8080
    Exposer exposer{"127.0.0.1:8080"};

    // create a metrics registry; shared ownership lets the exposer keep
    // reading it while the application updates its metrics
    auto registry = std::make_shared<Registry>();

    // add a new counter family to the registry (families combine values with
    // the same name, but distinct label dimensions); the labels given here
    // apply to every counter in the family
    auto& counter_family = BuildCounter()
                               .Name("time_running_seconds")
                               .Help("How many seconds is this server running?")
                               .Labels({{"label", "value"}})
                               .Register(*registry);

    // add a counter with its own additional labels to the metric family
    auto& second_counter = counter_family.Add(
        {{"another_label", "value"}, {"yet_another_label", "value"}});

    // ask the exposer to collect from the registry on incoming scrapes
    exposer.RegisterCollectable(registry);

    for (;;) {
      std::this_thread::sleep_for(std::chrono::seconds(1));
      // increment the counter by one (second)
      second_counter.Increment();
    }
    return 0;
  }

Performance

INFO: Running command line: bazel-bin/tests/benchmark/benchmarks
Run on (8 X 2300 MHz CPU s)
2016-10-17 15:56:49
Benchmark                              Time           CPU Iterations
--------------------------------------------------------------------
BM_Counter_Increment                  11 ns         11 ns   62947942
BM_Counter_Collect                    84 ns         84 ns    8221752
BM_Gauge_Increment                    11 ns         11 ns   61384663
BM_Gauge_Decrement                    11 ns         11 ns   62148197
BM_Gauge_SetToCurrentTime            199 ns        198 ns    3589670
BM_Gauge_Collect                      86 ns         85 ns    7469136
BM_Histogram_Observe/0               122 ns        122 ns    5839855
BM_Histogram_Observe/1               116 ns        115 ns    5806623
BM_Histogram_Observe/8               126 ns        126 ns    5781588
BM_Histogram_Observe/64              138 ns        138 ns    4895550
BM_Histogram_Observe/512             228 ns        228 ns    2992898
BM_Histogram_Observe/4k              959 ns        958 ns     642231
BM_Histogram_Collect/0               328 ns        327 ns    2002792
BM_Histogram_Collect/1               356 ns        354 ns    1819032
BM_Histogram_Collect/8              1553 ns       1544 ns     454921
BM_Histogram_Collect/64            10389 ns      10287 ns      66759
BM_Histogram_Collect/512           75795 ns      75093 ns       9075
BM_Histogram_Collect/4k           615853 ns     610277 ns       1222
BM_Registry_CreateFamily             195 ns        182 ns    3843894
BM_Registry_CreateCounter/0          319 ns        317 ns    1914132
BM_Registry_CreateCounter/1         2146 ns       2131 ns     408432
BM_Registry_CreateCounter/8         8936 ns       8837 ns      82439
BM_Registry_CreateCounter/64       72589 ns      72010 ns       9248
BM_Registry_CreateCounter/512     694323 ns     686655 ns       1056
BM_Registry_CreateCounter/4k    18246638 ns   18150525 ns         40

Queries

avtp3_connections_created_total - avtp3_connections_closed_total
sum(avtp3_channels_created_total)
  - sum(avtp3_channels_closed_total)
avtp3_connections_created_total
avtp3_connections_created_total
  - avtp3_connections_closed_total
avtp3_transferred_bytes_sum{direction="incoming",protocol="udp"}
avtp3_transferred_bytes_sum{direction="outgoing",protocol="udp"}
avtp3_transferred_bytes_sum{direction="incoming",protocol="tcp"}
avtp3_transferred_bytes_sum{direction="outgoing",protocol="tcp"}
sum(irate(avtp3_transferred_bytes_count{}[1d]))

sum(irate(avtp3_lost_packets_total{}[1d]))
histogram_quantile(
    0.99,
    sum(
        rate(
            task_queueing_delay_ns_bucket{
                instance=~"^($bridge).*$",
                type="immediate"
            }[1m])
    ) by (instance, le)
)/1000000

Thanks for your attention! Questions?
