prometheus @ home
Linux Engineer @ Snow
twitter.com/@erwindekeijzer
github.com/gnur
https://snow.nl
Erwin de Keijzer
How do you get to know new open source technology?
Scratch your own itch
It is a lot easier to stay motivated to learn something when you have a problem to solve
My itches
- the washing machine
- monitoring golpje
- making, and verifying, backups
The washing machine
13:30
14:48
15:04
15:23
So let's get some notifications when the machine is done
The setup
- Raspberry Pi Zero W
- USB to P1 cable
- prometheus
- custom go binary for exposing data
smart meter P1 port -> P1 cable -> USB -> Raspberry Pi Zero -> wifi
prometheus-p1-exporter
- reads /dev/ttyUSB0
- stores relevant info in global vars
- exposes these metrics using promhttp library
- prometheus scrapes endpoint
Prometheus
- pulls metrics from targets
- targets can be predefined or dynamic
- flexible query language
- integrated web-ui
- grafana integration for dashboards
LIVE DEMO
the result
it doesn't work
- power usage of the washing machine is not constant during a cycle
- the P1 port only reports total usage, not usage per appliance
- other machines have similar usage patterns
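To make the problem concrete, here is a sketch of the kind of naive detection one might try: declare the machine done once total usage has stayed under a threshold for a few samples. The function name, the threshold and the readings are all invented for illustration and are not taken from the talk's setup.

```go
package main

import "fmt"

// doneAt returns the index of the sample at which the machine would be
// declared done: the first point where usage has been below idleWatts for
// quiet consecutive samples, after having been at or above it at least once.
// It returns -1 if that never happens.
func doneAt(samples []float64, idleWatts float64, quiet int) int {
	started := false
	below := 0
	for i, w := range samples {
		if w >= idleWatts {
			started = true
			below = 0
			continue
		}
		if !started {
			continue
		}
		below++
		if below == quiet {
			return i
		}
	}
	return -1
}

func main() {
	// Invented total-usage readings in watts, one per minute.
	// The dip at index 3 and 4 is a pause in the middle of the cycle.
	wash := []float64{150, 2100, 2050, 180, 160, 600, 550, 170, 150, 140}

	fmt.Println(doneAt(wash, 300, 2)) // prints 4: a false alarm during the pause
	fmt.Println(doneAt(wash, 300, 3)) // prints 9: only the real end of the cycle
}
```

Tuning the number of quiet samples hides the pause in this example, but any other appliance switching on or off changes the same total and shifts the result, which is why the approach breaks down.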
Doesn't matter,
had fun
golpje
- written in Go
- searches The Pirate Bay for TV shows
- downloads new episodes
need for monitoring
- I had no idea how much resources it was using
- I was pretty sure it had a memory leak
- sometimes it just stopped working
the setup
- systemd slice for resource usage tracking
- cAdvisor for exporting these metrics
cAdvisor
- made by Google
- container advisor
- exposes metrics of containers
- (also keeps historical data, not using that)
systemd setup
# golpje.slice
[Unit]
Description=Limited resources Slice
DefaultDependencies=no
Before=slices.target
[Slice]
CPUShares=1024
MemoryLimit=512M

# golpje.service
[Unit]
Description=golpje agent
Requires=network-online.target
After=network-online.target
[Service]
Environment=GOMAXPROCS=2
User=upload
Group=upload
Restart=always
ExecStart=/usr/local/bin/golpje start
KillSignal=SIGINT
PrivateTmp=true
PrivateDevices=true
ProtectHome=true
ProtectSystem=full
ReadWriteDirectories=/var/lib/golpje /mnt/video
Slice=golpje.slice
CPUAccounting=true
MemoryAccounting=true
[Install]
WantedBy=multi-user.target
live demo
but it's not enough
- "Black box" monitoring can only provide so much information
- "White box" monitoring can provide much more data, but needs some more work
Let's add some metrics to golpje
- # searches
- # failed searches
- # search results
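// Searchmetrics bundles the counters the searcher updates.
// Enabled is false when metrics are switched off, in which case the counters are nil.
type Searchmetrics struct {
	Enabled        bool
	Searches       prometheus.Counter
	FailedSearches prometheus.Counter
	SearchResults  prometheus.Counter
}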
// The zero value has Enabled == false, so metrics stay off unless configured.
var sm searcher.Searchmetrics
if con.config.GetBool("metrics_enabled") {
	sm = searcher.Searchmetrics{
		Enabled: true,
		Searches: prometheus.NewCounter(
			prometheus.CounterOpts{
				Name: "golpje_searches",
				Help: "total number of searches",
			},
		),
		FailedSearches: prometheus.NewCounter(
			prometheus.CounterOpts{
				Name: "golpje_failed_searches",
				Help: "total number of searches that failed",
			},
		),
		SearchResults: prometheus.NewCounter(
			prometheus.CounterOpts{
				Name: "golpje_search_results",
				Help: "total number of results that have been found",
			},
		),
	}
	// Register with the default registry, which promhttp.Handler() serves.
	prometheus.MustRegister(sm.Searches)
	prometheus.MustRegister(sm.FailedSearches)
	prometheus.MustRegister(sm.SearchResults)
}
// Start searches for every known show once per searchInterval and sends
// each result on the results channel, updating the counters in m if enabled.
func Start(db *storm.DB, piratebayURL string, results chan Searchresult, searchInterval time.Duration, m Searchmetrics) {
	pb := piratebay.New(piratebayURL)
	for {
		// Named allShows so the shows package is not shadowed.
		allShows, err := shows.All(db)
		if err != nil {
			// Wait before retrying so a failing database does not cause a busy loop.
			time.Sleep(searchInterval)
			continue
		}
		for _, show := range allShows {
			if m.Enabled {
				m.Searches.Inc()
			}
			torrents, err := pb.Search(show.Name)
			if err != nil {
				if m.Enabled {
					m.FailedSearches.Inc()
				}
				continue
			}
			for _, torrent := range torrents {
				if m.Enabled {
					m.SearchResults.Inc()
				}
				results <- Searchresult{
					Title:      torrent.Title,
					Magnetlink: torrent.MagnetLink,
					Vipuser:    torrent.VIP,
					Seeders:    torrent.Seeders,
					ShowID:     show.ID,
				}
			}
		}
		time.Sleep(searchInterval)
	}
}
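if con.config.GetBool("metrics_enabled") {
	// Serve the metrics in the background so the rest of golpje keeps running.
	go func() {
		http.Handle(con.config.GetString("metrics_path"), promhttp.Handler())
		// ListenAndServe takes a listen address such as ":8080",
		// so metrics_port presumably holds a value of that form.
		log.Fatal(http.ListenAndServe(con.config.GetString("metrics_port"), nil))
	}()
}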
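Counters such as golpje_searches only ever go up until the process restarts, so what usually ends up on a graph is how fast they increase. The idea can be sketched as below. This is a simplification for illustration and not how Prometheus implements its rate() function, which among other things also extrapolates to the edges of the time window. The sample type and the values are invented.

```go
package main

import "fmt"

// sample is one scraped counter value with its scrape time in seconds.
type sample struct {
	t float64
	v float64
}

// perSecond returns the average per-second increase over the samples,
// treating any drop in value as a counter reset (a process restart).
func perSecond(samples []sample) float64 {
	if len(samples) < 2 {
		return 0
	}
	increase := 0.0
	for i := 1; i < len(samples); i++ {
		delta := samples[i].v - samples[i-1].v
		if delta < 0 {
			// The counter restarted from zero: count everything since the reset.
			delta = samples[i].v
		}
		increase += delta
	}
	elapsed := samples[len(samples)-1].t - samples[0].t
	if elapsed <= 0 {
		return 0
	}
	return increase / elapsed
}

func main() {
	// Four scrapes 15 seconds apart; the process restarted before the third.
	scrapes := []sample{{0, 100}, {15, 130}, {30, 15}, {45, 60}}
	fmt.Println(perSecond(scrapes)) // prints 2
}
```

Because a restart shows up as a drop rather than as negative growth, a process that sometimes just stops working and gets restarted by systemd does not distort the graph.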
Another demo
alerting with prometheus
- Prometheus itself is not highly available
- alertmanager is highly available (clustered)
- alertmanager does smart deduplication
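A common way to work around the first point is to run two identical Prometheus servers that both send their alerts to Alertmanager, which then has to recognise that they describe the same problem. The core idea, identifying an alert by its set of labels, can be sketched as below. This is an illustration only and not Alertmanager's implementation. The alert names and labels are made up.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// alert is a simplified alert: nothing but a set of labels.
type alert map[string]string

// key builds a stable identity from the sorted label pairs, so two alerts
// with the same labels get the same key regardless of map order.
func key(a alert) string {
	names := make([]string, 0, len(a))
	for name := range a {
		names = append(names, name)
	}
	sort.Strings(names)
	var b strings.Builder
	for _, name := range names {
		fmt.Fprintf(&b, "%s=%q,", name, a[name])
	}
	return b.String()
}

// dedupe keeps the first alert seen for each distinct label set.
func dedupe(alerts []alert) []alert {
	seen := make(map[string]bool)
	var unique []alert
	for _, a := range alerts {
		k := key(a)
		if seen[k] {
			continue
		}
		seen[k] = true
		unique = append(unique, a)
	}
	return unique
}

func main() {
	// The same alert as sent by two Prometheus servers, plus an unrelated one.
	fromA := alert{"alertname": "GolpjeDown", "instance": "pi:9090"}
	fromB := alert{"instance": "pi:9090", "alertname": "GolpjeDown"}
	other := alert{"alertname": "HighMemory", "instance": "pi:9090"}

	fmt.Println(len(dedupe([]alert{fromA, fromB, other}))) // prints 2
}
```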
live demo
sources
- https://prometheus.io
- https://github.com/prometheus/alertmanager
- https://github.com/gnur/golpje
- https://github.com/gnur/prometheus-p1-exporter
- https://github.com/prometheus/client_golang
- https://github.com/google/cadvisor
- https://github.com/mholt/caddy
- https://mosh.org/