Supercharge your Tableau Server Monitoring with Grafana and InfluxDB

Tamas Foldi, Starschema

@tfoldi

Who am I?

*old picture, I am way older than this.

  • Five-time Tableau Zen Master
  • Co-founder of a consulting company (Starschema)
  • Like to hack and glue things together

Monitoring. But Why?

  • Status and Availability 
  • Real-time usage statistics
  • Early alerts for performance issues

Level 1: Built-in Alerts

  • Free as fresh air

  • Tableau Status updates, disk space alerts 

  • Alerts in emails only

  • No hardware, performance or customer metrics

If your Tableau Server dies, your monitoring dies too.

Level 2: TabMon

  • Open source, pretty comprehensive

  • Stores data in Postgres (not a good fit for time series data)

  • JMX, Windows Perf Counters

  • Windows only :(

  • Tableau dashboard-based, so alerting is limited

Might be an option for small, Windows-based deployments.

Level 3: Resource Monitoring Tool

  • Licensed Add-on ($$)

  • Formerly Power Tools (acquired by Tableau)

  • Windows and Linux

  • Decent solution

The default choice if you already have the admin add-on.

Level 9000: TIG Stack

Telegraf

InfluxDB

Grafana

Telegraf

Telegraf is an agent for collecting, processing, aggregating, and writing metrics.

It is designed for a minimal memory footprint, with a plugin system so that developers in the community can easily add support for collecting new metrics.

InfluxDB

InfluxDB is an open-source time series database (TSDB).

It is optimized for fast, high-availability storage and retrieval of time series data in fields such as operations monitoring and real-time analytics.

Supports retention policies and downsampling.
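
Retention and downsampling can be pictured with a small sketch. This is plain Python that only illustrates the idea; it is not how InfluxDB implements it, and the function names and sample data are made up for the example. Raw points are averaged into fixed time windows, and points older than a cutoff are dropped.

```python
from collections import defaultdict
from statistics import mean


def downsample(points, window_seconds):
    """Average (timestamp, value) points into fixed-size time windows."""
    buckets = defaultdict(list)
    for ts, value in points:
        # Align each timestamp to the start of the window it falls into.
        window_start = ts - (ts % window_seconds)
        buckets[window_start].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


def apply_retention(points, now, max_age_seconds):
    """Keep only points that are not older than the retention period."""
    cutoff = now - max_age_seconds
    return [(ts, value) for ts, value in points if ts >= cutoff]


# Hypothetical raw samples: (unix seconds, CPU percent)
raw = [(0, 10.0), (10, 20.0), (20, 30.0), (60, 50.0), (70, 70.0)]
rollup = downsample(raw, window_seconds=60)
recent = apply_retention(raw, now=100, max_age_seconds=45)
```

In practice you would keep the raw series for a short period and the rolled-up series for much longer, which is what keeps long-term storage small.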

And Grafana is just such a beauty.

play.grafana.com

Ingredients

  • OS Level Metrics
    • CPU, Memory, IO, Disk Usage
  • Network Info
    • Load balancer latency from end-users
  • Tableau Server Status
    • Status as a whole and for each service
  • JMX Counters
    • Number of VizQL sessions, cache hit ratio
  • Log files (optional)
  • Postgres Data (optional)

All of this without impacting server performance.

Installing the TIG stack

https://www.howtoforge.com/tutorial/how-to-install-tig-stack-telegraf-influxdb-and-grafana-on-ubuntu-1804/

 

https://medium.com/starschema-blog/monitor-your-infrastructure-with-influxdb-and-grafana-on-kubernetes-a299a0afe3d2

OS Level Monitoring

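telegraf.conf:

    # CPU usage, per core and in total
    [[inputs.cpu]]
      percpu = true
      totalcpu = true
      collect_cpu_time = false
      report_active = false

    # Disk usage per mount point, skipping pseudo filesystems
    [[inputs.disk]]
      ignore_fs = ["tmpfs", "devtmpfs", "devfs"]

    # Disk I/O, kernel stats, memory, process counts, swap, load and uptime
    [[inputs.diskio]]
    [[inputs.kernel]]
    [[inputs.mem]]
    [[inputs.processes]]
    [[inputs.swap]]
    [[inputs.system]]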

Serverinfo.xml

TSM API

https://help.tableau.com/v0.0/api/tsm_api/en-us/docs/tsm-reference.htm

TSM API - /status

https://help.tableau.com/v0.0/api/tsm_api/en-us/docs/tsm-reference.htm#status
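
Once the per-service status has been fetched, it has to be turned into something InfluxDB can store. The sketch below formats a status snapshot as InfluxDB line protocol. The measurement name `tableau_service_status`, the node and service names, and the status strings are all assumptions for illustration, not the actual shape of the TSM API response.

```python
def escape_tag(value):
    """Escape commas, equals signs and spaces, which are special in tag values."""
    return value.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def to_line_protocol(node, services, timestamp_ns):
    """Build one line protocol record per service.

    `services` maps a service name to a status string. Treating "Active"
    as the healthy state is an assumption made for this example.
    """
    lines = []
    for service, status in sorted(services.items()):
        running = 1 if status == "Active" else 0
        tags = f"node={escape_tag(node)},service={escape_tag(service)}"
        # Integer fields take an "i" suffix; string fields are double-quoted.
        fields = f'running={running}i,status="{status}"'
        lines.append(f"tableau_service_status,{tags} {fields} {timestamp_ns}")
    return lines


# Hypothetical status snapshot
sample = {"vizqlserver": "Active", "backgrounder": "Stopped"}
lines = to_line_protocol("node1", sample, 1600000000000000000)
```

One possible way to wire this up is a small script that prints these lines and is run by Telegraf's exec input with the influx data format.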

In Grafana

https://medium.com/starschema-blog/tableau-services-manager-tsm-api-the-undocumented-passwordless-authentication-9b76ed00119d

And finally, JMX.

Tableau JMX

JMX exposes application-specific performance counters from the VizQL Server processes.

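Disabled by default. To enable it, set the key and apply the pending change (this may restart the server):

    tsm configuration set -k service.jmx_enabled -v true
    tsm pending-changes apply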

Metrics to read

  • Number of bootstraps, active sessions
  • Backgrounder jobs (failed, succeeded, extract jobs)
  • Cache hit ratio
  • Memory consumption
  • Query times
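
As a rough illustration of how a ratio like the cache hit ratio can be derived, the sketch below computes it from two consecutive counter samples. It assumes the counters are cumulative hit and miss totals; the key names `hits` and `misses` are hypothetical and do not refer to actual Tableau MBean attributes.

```python
def hit_ratio(prev, curr):
    """Cache hit ratio over the interval between two counter samples.

    Returns None when the ratio is undefined: either a counter went
    backwards (the process probably restarted) or there was no traffic.
    """
    hits = curr["hits"] - prev["hits"]
    misses = curr["misses"] - prev["misses"]
    total = hits + misses
    if hits < 0 or misses < 0 or total == 0:
        return None
    return hits / total


# Hypothetical samples taken one collection interval apart
earlier = {"hits": 100, "misses": 50}
later = {"hits": 190, "misses": 60}
ratio = hit_ratio(earlier, later)
```

Working on deltas rather than on the raw totals keeps the ratio responsive to recent behaviour instead of averaging over the whole process lifetime.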

Last but not least.
Alerting.
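
A common alerting rule is "fire only if the threshold stays breached for a while", which avoids paging on a single spike. The sketch below shows that logic in plain Python. It is an illustration of the idea rather than Grafana's implementation, and the function name and sample values are made up.

```python
def should_alert(values, threshold, min_consecutive):
    """True if the latest `min_consecutive` samples all exceed `threshold`."""
    if min_consecutive < 1:
        raise ValueError("min_consecutive must be at least 1")
    if len(values) < min_consecutive:
        # Not enough history yet to decide.
        return False
    return all(v > threshold for v in values[-min_consecutive:])


# Hypothetical CPU percentages, oldest first
sustained = should_alert([50, 95, 96, 97], threshold=90, min_consecutive=3)
spike_only = should_alert([95, 96, 50, 97], threshold=90, min_consecutive=3)
```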

All together:

https://medium.com/starschema-blog/

Thank you!

Got questions?
