StatsCraft
Monitoring Conference
- website and agenda: http://statscraft.org.il
- twitter: @statscraft (#statscraft)
- facebook: https://www.facebook.com/statscraft.il
- email: statscraftcon@gmail.com

Agenda
- Understand the problem.
- Understand what monitoring is.
- Example use-case(s)
- A different approach
- Learn methodologies and tools
The Problem
Nir Cohen @ Gigaspaces
@thinkops
http://github.com/nir0s
We monitor because...
We want to satisfy the customer.
(make money?)
Still underrated...
- Automated Resource Provisioning
- Configuration Management
- Automated Code Deployment
- Continuous Whatever
- Monitoring
PROBLEM!
Blame the tools?
Problem origin

DISCLAIMER
We're monitoring the wrong things.
_rootCauseAnalysis:
the alternative is harder.
We're considering logs a second class citizen.
_rootCauseAnalysis:
the alternative is harder.
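A minimal sketch of treating logs as first-class data, assuming Python's standard `logging` module: each record is emitted as one JSON object, so fields like `latency_ms` can be queried and graphed like any metric. The `JsonFormatter` class, the `fields` convention and the example field names are hypothetical, chosen only for illustration.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object (one per line)."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        }
        # Hypothetical convention: structured fields arrive via extra={"fields": {...}}.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


def make_logger(name: str) -> logging.Logger:
    """Return a logger that writes JSON lines to stderr."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


if __name__ == "__main__":
    log = make_logger("checkout")
    # Example fields are made up for illustration.
    log.info("order placed", extra={"fields": {"order_id": "A-1001", "latency_ms": 182}})
```

The same line that a human reads during an incident is now also a data point a query can aggregate.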
Our data is lacking.
_rootCauseAnalysis:
inertia. that's how it was, that's how it is.
We separate monitoring from the application.
_rootCauseAnalysis:
we're not used to this. (Ops problem)
We monitor reactively, not proactively
_rootCauseAnalysis:
reaction requires less initial energy than anticipation.
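One way to picture proactive monitoring, as a sketch under simple assumptions: instead of alerting when a disk is already full, fit a straight line to recent usage samples (ordinary least squares) and estimate how long until it reaches capacity. The function name and the linear-growth assumption are illustrative; real usage is rarely perfectly linear.

```python
from typing import Optional, Sequence


def hours_until_full(
    hours: Sequence[float], used_pct: Sequence[float], capacity_pct: float = 100.0
) -> Optional[float]:
    """Fit used_pct = intercept + slope * hours by least squares, then extrapolate.

    Returns hours from the last sample until capacity is reached,
    or None when usage is flat or shrinking.
    """
    n = len(hours)
    if n < 2 or n != len(used_pct):
        raise ValueError("need at least two paired samples")
    mean_x = sum(hours) / n
    mean_y = sum(used_pct) / n
    sxx = sum((x - mean_x) ** 2 for x in hours)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, used_pct))
    if sxx == 0:
        raise ValueError("timestamps must not all be equal")
    slope = sxy / sxx
    if slope <= 0:
        return None  # not growing, so no predicted fill time
    intercept = mean_y - slope * mean_x
    t_full = (capacity_pct - intercept) / slope
    return max(0.0, t_full - hours[-1])


if __name__ == "__main__":
    # Usage grows 2 points per hour from 50%: full at hour 25, i.e. 22 hours after the last sample.
    print(hours_until_full([0, 1, 2, 3], [50, 52, 54, 56]))
```

An alert on "less than N hours until full" (N being whatever lead time the team needs) fires while there is still time to act, rather than after the service has failed.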
We put uptime above system and product quality
_rootCauseAnalysis:
it's much easier.
We deal with hard limits.
_rootCauseAnalysis:
arbitrary numbers are easier to set.
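To make the contrast concrete, a hedged sketch: a hard limit compares every value against one number picked up front, while a baseline check (here, a simple mean plus k standard deviations over recent history, one of many possible techniques) compares a value against what is normal for that particular signal. Function names, the 80 limit and k = 3 are illustrative assumptions.

```python
import statistics
from typing import Sequence


def static_breach(value: float, limit: float = 80.0) -> bool:
    """Hard limit: an arbitrary number chosen in advance."""
    return value > limit


def baseline_breach(history: Sequence[float], value: float, k: float = 3.0) -> bool:
    """Flag values more than k standard deviations above the recent mean."""
    if len(history) < 2:
        return False  # not enough data to estimate a baseline
    mean = statistics.fmean(history)
    stdev = statistics.stdev(history)
    return value > mean + k * stdev


if __name__ == "__main__":
    busy_host = [90, 91, 89, 90, 92]  # a host that normally runs near 90% CPU
    print(static_breach(91.0), baseline_breach(busy_host, 91.0))  # noisy vs. quiet
    latency_ms = [100, 102, 98, 101, 99]  # a service that normally answers in ~100 ms
    print(static_breach(130.0, limit=200.0), baseline_breach(latency_ms, 130.0))  # missed vs. caught
```

The hard limit fires constantly on a host that is healthy at 90% and stays silent on a 30% latency jump that sits under its arbitrary ceiling; the baseline check does the opposite in both cases.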
Monitoring is non-functional but resource-hungry.
_rootCauseAnalysis:
we just don't accept it.
Good monitoring requires the right people, not just Ops!
_rootCauseAnalysis:
delegation is natural. others have more important things to do.
Alert fatigue is common.
_rootCauseAnalysis:
solving issues is much easier than solving problems, and apparently, we are addicted to non-actionable alerts.
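Debouncing does not make an alert actionable, but it does remove one common source of noise: a check that flaps around its threshold and pages on every blip. A minimal sketch, assuming a boolean "breached" signal per sample: notify once after N consecutive breaches, and re-arm only after M consecutive healthy samples. The class name and the default of 3 are hypothetical choices.

```python
from dataclasses import dataclass


@dataclass
class DebouncedAlert:
    """Notify once after `fire_after` consecutive breaches; re-arm only after
    `clear_after` consecutive healthy samples (hysteresis)."""

    fire_after: int = 3
    clear_after: int = 3
    _bad: int = 0
    _good: int = 0
    firing: bool = False

    def observe(self, breached: bool) -> bool:
        """Return True only on the sample where the alert starts firing."""
        if breached:
            self._bad += 1
            self._good = 0
        else:
            self._good += 1
            self._bad = 0
        if not self.firing and self._bad >= self.fire_after:
            self.firing = True
            return True
        if self.firing and self._good >= self.clear_after:
            self.firing = False  # recovered; the next sustained breach may notify again
        return False


if __name__ == "__main__":
    alert = DebouncedAlert()
    samples = [True, False, True, True, True, True, True, False, False, False, True, True, True]
    print([i for i, s in enumerate(samples) if alert.observe(s)])
```

Thirteen samples with nine breaches produce two notifications, one per sustained incident.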
We're auto-scaling prematurely
_rootCauseAnalysis:
brute force is natural
We're choosing the wrong tools.
_rootCauseAnalysis:
it's easier to choose the tool than to choose what to monitor.
Good monitoring is hard
_rootCauseAnalysis:
systems become complex, so they're harder to monitor.
So, after all, why do we not monitor properly?
_rootCauseAnalysis:
- Simplification
- Delegation
- Rationalization
No fear,
Let's see how we can make this all better

is here!
If a service crashes and no one is around to monitor it, does it raise an alert?
The Problem
By Nir Cohen
StatsCraft 2015 Keynote on the current problems in monitoring