Secret Techniques from the Toughest Tableau Server Deployments
Tamas Foldi
Starschema, CTO
tfoldi@starschema.com - @tfoldi
Who am I?
OK, so large deployments?
How do you survive with this complexity?
Automate.
Analyse.
Extend.
Automation
Basic AWS Tableau Infrastructure
Tons of manual processes
Problem
Human work does not scale well.
Error-prone
Upgrades and test/dev system deployments are not agile enough
Lack of integration with modern CI/CD pipelines
Solution
Automate all processes with infrastructure provisioning scripts
Do not perform in-place upgrades: always build new from scratch and drop the old. Seriously.
Automate!
Automate using
- CloudFormation scripts
- Chef, Ansible, Terraform
- Docker
- ANYTHING, JUST
DON'T DO IT MANUALLY
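A minimal sketch of the "build new, drop the old" rule in Python, assuming four hypothetical hooks (provision, is_healthy, switch_traffic, destroy) that would wrap whichever tool you use (CloudFormation, Terraform, Ansible, Docker). The point is the ordering: the old environment is destroyed only after the new one passes health checks and receives traffic.

```python
from typing import Callable


def replace_environment(
    old_id: str,
    provision: Callable[[], str],
    is_healthy: Callable[[str], bool],
    switch_traffic: Callable[[str], None],
    destroy: Callable[[str], None],
) -> str:
    """Build a fresh environment, cut traffic over, then drop the old one.

    All four callables are hypothetical hooks; in practice they would wrap
    CloudFormation/Terraform/Ansible runs or Docker commands.
    """
    new_id = provision()            # build from scratch, never patch in place
    if not is_healthy(new_id):
        destroy(new_id)             # failed build: discard it, old one stays live
        raise RuntimeError(f"new environment {new_id} failed health checks")
    switch_traffic(new_id)          # e.g. repoint DNS or the load balancer
    destroy(old_id)                 # only now drop the old environment
    return new_id
```

A failed build is thrown away and the old environment keeps serving users, which is what makes this safer than an in-place upgrade.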
Docker Demo
Communication
Communication vs Tableau
Problem
Tableau does not have a good built-in communication system through which administrators and CoE leaders can reach their users
No out-of-the-box way to reach users during system outages
Solution
Add custom communication capabilities to Desktop and Server products
Custom Firewall and Load Balancer scripts
Outage Notification
(Demo)
Notification Best Practices
- Custom Load Balancer Setup
- "Trust" or "Status" site
- Eliminate Emails
(use IM or in-vizportal)
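One way to implement the "Custom Load Balancer Setup" is a tiny fallback backend that the load balancer switches to when Tableau Server is down. The sketch below is a plain WSGI app (hypothetical names, not a Tableau feature) that answers every request with a 503 and a human-readable status message.

```python
import html
from typing import Callable, Iterable


def make_outage_app(message: str, retry_after_s: int = 300) -> Callable:
    """Return a WSGI app that answers every request with a 503 status page."""
    body = (
        "<html><body><h1>Tableau Server is temporarily unavailable</h1>"
        f"<p>{html.escape(message)}</p></body></html>"
    ).encode("utf-8")

    def app(environ: dict, start_response: Callable) -> Iterable[bytes]:
        start_response(
            "503 Service Unavailable",
            [
                ("Content-Type", "text/html; charset=utf-8"),
                ("Content-Length", str(len(body))),
                ("Retry-After", str(retry_after_s)),  # hint for clients and monitors
            ],
        )
        return [body]

    return app
```

For a quick test it can be served with the standard library: `wsgiref.simple_server.make_server("", 8080, make_outage_app("Upgrade in progress")).serve_forever()`. Returning 503 with Retry-After, rather than 200, keeps monitoring tools and clients aware that the outage is real.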
CoE Communication
Customized Desktop
Understanding the infrastructure
Understand? What? Why?
Problem
Administrators usually have zero control over what gets published or where their servers connect. Getting answers to "what are my badly designed workbooks?", "what happens if a team changes their data model?" or "is my Tableau Server fast enough for my overpaid manager?" can help you survive.
TabMon and many other tools do not scale.
Solution
Watch server logs, infrastructure events, the repository, and workbook/datasource XML sources, and build sophisticated preventive and detective controls on top of them.
HOW?
The new server logs
New "res" block
{ ts: '2018-06-20T14:30:01.122',
pid: 3032,
tid: '20f4',
sev: 'info',
req: 'WypIyAulS@J5itywamnL1wAAASA',
sess: '5D4BA677DDAF4A8BAEDBA44064CEF65F-0:0',
site: 'Default',
user: 'Adminka',
k: 'end-ds-lazy-connect',
l: {},
a:
{ depth: 3,
elapsed: 0.004,
id: 'P////+XZkWNP/////4ekGx',
name: 'ds-lazy-connect',
res:
{ alloc: { e: 72600, i: 515000, ne: 648, ni: 4292 },
free: { e: 61900, i: 447000, ne: 540, ni: 3737 },
kcpu: { e: 0, i: 0 },
ntid: 1,
ucpu: { e: 2, i: 5 } },
rk: 'ok',
rv: {},
sponsor: 'P////+WHU/0JqVb8cD3IGA',
type: 'end',
vw: 'Economy',
wb: 'Regional' },
v:
{ caption: 'Stocks',
elapsed: 0.004,
name: 'dataengine.42038.846130138889' } }
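The log above is printed in a JavaScript-object style. Assuming the log files store one JSON object per line with the field names shown above, a Python sketch that pulls out the "end-*" events carrying a res block could look like this (the output keys are our own naming):

```python
import json
from typing import Iterable, Iterator


def end_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield a flat summary of every 'end-*' event that carries a 'res' block.

    Assumes one JSON object per line with the fields shown above
    (k, user, site, a.name, a.elapsed, a.res, a.wb, a.vw).
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial or non-JSON lines
        if not isinstance(event, dict):
            continue
        if not str(event.get("k", "")).startswith("end-"):
            continue
        a = event.get("a")
        if not isinstance(a, dict) or "res" not in a:
            continue
        yield {
            "ts": event.get("ts"),
            "user": event.get("user"),
            "site": event.get("site"),
            "activity": a.get("name"),
            "workbook": a.get("wb"),
            "view": a.get("vw"),
            "elapsed": a.get("elapsed"),
            "res": a["res"],
        }
```

Because it is a generator, it can be fed straight from an open log file (or a tail of one) without loading everything into memory.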
What is inside?
a:
{ depth: 3,
elapsed: 0.004,
id: 'P////+XZkWNP/////4ekGx',
name: 'ds-lazy-connect',
res:
{ alloc: { e: 72600, i: 515000, ne: 648, ni: 4292 },
free: { e: 61900, i: 447000, ne: 540, ni: 3737 },
kcpu: { e: 0, i: 0 },
ntid: 1,
ucpu: { e: 2, i: 5 } },
alloc: allocated memory
free: freed memory
ucpu: CPU consumption (application side)
kcpu: CPU consumption (kernel side)
ntid: number of threads used
e = exclusive (self), i = inclusive (self + children), ne = number of exclusive calls, ni = number of inclusive calls.
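Using the legend above, a few derived figures fall out of simple arithmetic. This is a sketch under two assumptions: that alloc minus free approximates memory still held by the activity, and that inclusive minus exclusive is the part spent in child activities. The output names are hypothetical. For the example block: net_alloc_incl = 515000 - 447000 = 68000, alloc_in_children = 515000 - 72600 = 442400, cpu_incl = 5 + 0 = 5.

```python
def summarize_res(res: dict) -> dict:
    """Derive a few per-activity figures from a 'res' block.

    Follows the legend: e = exclusive (self), i = inclusive (self + children).
    Units are left as they appear in the log.
    """
    alloc, free = res.get("alloc", {}), res.get("free", {})
    ucpu, kcpu = res.get("ucpu", {}), res.get("kcpu", {})
    return {
        # memory allocated but not freed within this activity (incl. children)
        "net_alloc_incl": alloc.get("i", 0) - free.get("i", 0),
        # portion of the inclusive allocation attributed to child activities
        "alloc_in_children": alloc.get("i", 0) - alloc.get("e", 0),
        # total CPU (application + kernel side), inclusive
        "cpu_incl": ucpu.get("i", 0) + kcpu.get("i", 0),
        "threads": res.get("ntid", 0),
    }
```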
Real-time, proactive alerting
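Proactive alerting can start as a simple sliding-window rule over the parsed events. The sketch below (class name and thresholds are illustrative, not Tableau defaults) remembers the last N elapsed times per workbook/view and raises an alert once enough of them are slow.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Optional, Tuple


class SlowViewAlerter:
    """Flag a workbook/view when too many recent renders exceed a threshold.

    Keeps the last `window` elapsed times per (workbook, view) and alerts
    when at least `min_slow` of them exceed `threshold_s` seconds.
    """

    def __init__(self, threshold_s: float = 5.0, window: int = 10, min_slow: int = 3):
        self.threshold_s = threshold_s
        self.min_slow = min_slow
        self.history: Dict[Tuple[str, str], Deque[float]] = defaultdict(
            lambda: deque(maxlen=window)  # old samples fall off automatically
        )

    def observe(self, workbook: str, view: str, elapsed: float) -> Optional[str]:
        key = (workbook, view)
        self.history[key].append(elapsed)
        slow = sum(1 for t in self.history[key] if t > self.threshold_s)
        if slow >= self.min_slow:
            return (f"ALERT {workbook}/{view}: {slow} of last "
                    f"{len(self.history[key])} renders > {self.threshold_s}s")
        return None
```

A production version would also de-duplicate repeated alerts and push them to IM rather than email, in line with the notification practices above.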
But what about the metadata?
Lineage, database tables and columns are in the XML files
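Workbook (.twb) files are plain XML (packaged .twbx files are zip archives containing the .twb), so lineage can be extracted with the Python standard library. A minimal sketch, assuming the common layout where each top-level <datasource> holds <connection>, <relation> and <column> descendants; attribute names differ between connectors and Tableau versions, so treat this as a starting point.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def workbook_lineage(root: ET.Element) -> List[Dict[str, object]]:
    """List each datasource's connections, tables and columns from a .twb root."""
    result: List[Dict[str, object]] = []
    # Only top-level datasources; worksheets also contain <datasource> references.
    datasources = root.find("datasources")
    if datasources is None:
        return result
    for ds in datasources.findall("datasource"):
        connections = [
            {"class": c.get("class"), "server": c.get("server"), "dbname": c.get("dbname")}
            for c in ds.iter("connection")
            if c.get("class") != "federated"  # wrapper element in newer versions
        ]
        tables = sorted({r.get("table") for r in ds.iter("relation") if r.get("table")})
        columns = [col.get("name") for col in ds.findall("column")]
        result.append({
            "name": ds.get("caption") or ds.get("name"),
            "connections": connections,
            "tables": tables,
            "columns": columns,
        })
    return result
```

Usage (file name hypothetical): `workbook_lineage(ET.parse("Regional.twb").getroot())`. Run across every workbook pulled from the server, this gives a table-to-dashboard map that answers "what breaks if a team changes their data model?".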
Metadata directly available in the Dashboard
DEMO
How to understand your users?
Tableau Tracker Demo
Advanced Topics
Off-site backup
Problem
Traditional Tableau backups can quickly grow to hundreds of GBs and take hours to create on your primary node.
Backup frequency necessarily goes down, and if a problem occurs between backups, multiple days' worth of data might be lost.
Solution
Off-load backups to external machines: stream the PostgreSQL repository outside the cluster and replicate the filestore. Build the .tsbak files on a separate machine. This also supports point-in-time recovery.
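With WAL archiving plus periodic base backups of the repository, point-in-time recovery means restoring the newest base backup taken at or before the target time and replaying WAL up to that time (PostgreSQL's recovery_target_time setting). A sketch of the selection step, assuming you keep a list of base-backup timestamps (the function name is hypothetical):

```python
from bisect import bisect_right
from datetime import datetime
from typing import List, Optional


def pick_base_backup(base_backups: List[datetime], target: datetime) -> Optional[datetime]:
    """Return the newest base backup taken at or before `target`.

    Recovery then restores that base backup and replays archived WAL up to
    `target`; None means no usable base backup exists for that point in time.
    """
    ordered = sorted(base_backups)
    idx = bisect_right(ordered, target)  # first position strictly after target
    return ordered[idx - 1] if idx else None
```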
Disaster Recovery / Multi-region Replication
Traffic Management
Problem
CEO spends $2M on data visualisation, then cries that her reports are slow
Background jobs fail to meet their SLA
Web Authoring slows down the system when users are experimenting
Solution
Isolate unpredictable load (QA site, interactive users, special set of background jobs) to dedicated hosts/processes
Route specific user groups to dedicated resources
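The routing rule can be expressed as a small, ordered decision function that a custom load balancer or proxy would call per session. Everything here (Request fields, pool names, rule order, how web edits are detected) is a hypothetical illustration of the idea, not a Tableau API.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class Request:
    site: str
    user_groups: FrozenSet[str]
    is_web_edit: bool = False


def choose_pool(
    req: Request,
    group_pools: Dict[str, str],
    site_pools: Dict[str, str],
    default: str = "interactive",
) -> str:
    """Pick a backend pool; the first matching rule wins."""
    if req.site in site_pools:              # e.g. isolate the QA site entirely
        return site_pools[req.site]
    for group in sorted(req.user_groups):   # sorted for a deterministic choice
        if group in group_pools:
            return group_pools[group]
    if req.is_web_edit:
        return "authoring"                  # keep experimentation off the main nodes
    return default
```

Keeping the rules ordered and explicit makes it easy to reason about which users land where, e.g. an executives group always reaches its reserved pool while web authoring is confined to its own hosts.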
Traffic Management
Questions?
Tamas Foldi
tfoldi@starschema.com / twitter: @tfoldi
DCTUG
By Tamas Foldi