Intro to Ganglia

Why metrics collection is important

What is a metric

  • Anything quantifiable that helps you understand your system
  • It's a measure of a system resource
  • It may or may not have a unit

iostat -dmx 1
df -h
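
The two commands above print whole tables; a metric is one number taken from such output. As a minimal sketch, the hypothetical helper `used_percent` (not part of Ganglia or of the original slides) reduces `df` output to a single value whose unit is percent:

```shell
#!/bin/sh
# Hypothetical helper: read `df -P` output on stdin and print the
# "capacity" column of the first filesystem row as a bare number.
used_percent() {
    # Row 1 is the header; row 2 is the first filesystem. Column 5 is "NN%".
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Used space on the root filesystem, in percent.
df -P / | used_percent
```

The `-P` flag asks `df` for the POSIX one-line-per-filesystem layout, which keeps the column positions predictable.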

    Why Collect Metrics?

    Scenario 1

    You feel that your systems are not performing optimally. You need to convince your manager to get new hardware. What will you do?

    Scenario 2

    You are bumping up density, but your DC expert warns that it can impact power utilization.

    It can even cause the power supply to trip.

    Why Collect Metrics

    • It complements monitoring/alerting
    • It lets you analyse historical trends and issues
    • Identify problems (at times before they happen)
    • Analyse Targets
    • Decide/Define Future Goals
    • Better decisions

    What is Ganglia

    • It's a scalable, distributed system for monitoring grids and clusters
    • Allows for viewing historical statistics via a web interface

    Why Use Ganglia

    • It's very light on resources
    • Has an interesting scalable/distributed architecture
    • It comes with a set of modules/plugins that collect common metrics for Linux servers, making it easy to collect many metrics with little effort

    Ganglia Architecture

    Components used in Ganglia

    • gmond (Ganglia monitoring daemon)
    • gmetad (Ganglia meta daemon)
    • RRDs (a binary time-series database format; each file stores a single metric)
    • Ganglia web frontend (interface to visualize the data)
    • gmetric utility for submitting arbitrary metrics

    Ganglia Architecture

    [Figure: architecture diagram. Labels: gmond nodes, head gmond nodes, "XDR over UDP", gmetad, "Poll XML over TCP", RRDs, web frontend in PHP.]


    Gmond

    • Lightweight threaded daemon
    • Periodically collects metrics on the system
    • Listens on a UDP channel and writes collected metrics to an in-memory hash table
    • Gmond can unicast/multicast data to other gmond nodes in XDR format
    • The data is pushed only when the value changes or a time threshold is crossed
    • Gmond nodes also serve this data as XML when queried by trusted hosts (such as gmetad)
    • Total memory utilization: 16 + 136*n_nodes + 364*n_metrics bytes
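
Two of the points above can be made concrete with a small sketch: the push rule and the memory formula. The function names are hypothetical and this is not gmond's actual code. The push rule is simplified to a plain "value differs" check, and `n_metrics` is assumed to be the total number of metrics held in the table.

```shell
#!/bin/sh
# Push rule (simplified): send when the value changed, or when the time
# since the last send has reached the time threshold.
# Values are compared as strings, so "1.0" and "1.00" count as a change.
should_push() {
    local last_value=$1 new_value=$2 elapsed=$3 time_threshold=$4
    if [ "$new_value" != "$last_value" ] || [ "$elapsed" -ge "$time_threshold" ]; then
        return 0
    fi
    return 1
}

# Memory formula from the slide, in bytes.
gmond_memory_bytes() {
    local n_nodes=$1 n_metrics=$2
    echo $(( 16 + 136 * n_nodes + 364 * n_metrics ))
}

# Unchanged value, 5 s since the last send, 20 s threshold: nothing to send.
if should_push 0.42 0.42 5 20; then echo "push"; else echo "skip"; fi

# Hypothetical cluster: 100 nodes, 3000 metrics in total.
gmond_memory_bytes 100 3000
```

With these made-up numbers the estimate comes to 1105616 bytes, roughly 1 MB, which illustrates why gmond counts as light on resources.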

    Gmond Architecture


    Gmetad

    • Gmetad is the daemon that polls the XML announced by gmond (or by another gmetad)
       telnet <master_gmond_ip> 8649
    • It can do so only if it is on the trusted hosts list
    • Gmetad writes this data to an RRD database
    • The Ganglia frontend queries gmetad to find relevant metrics
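
To give a feel for what the XML on port 8649 contains, here is a minimal sketch that lists host, metric name and value. The helper `list_metrics` is hypothetical, the sample document is hand-written and heavily abridged, and the parsing assumes one XML element per line.

```shell
#!/bin/sh
# Hypothetical helper: print "host metric value" for every METRIC element
# in gmond-style XML read from stdin (assumes one element per line).
list_metrics() {
    awk '
        /<HOST / {
            host = ""
            if (match($0, / NAME="[^"]*"/)) host = substr($0, RSTART + 7, RLENGTH - 8)
        }
        /<METRIC / {
            name = ""; val = ""
            if (match($0, / NAME="[^"]*"/)) name = substr($0, RSTART + 7, RLENGTH - 8)
            if (match($0, / VAL="[^"]*"/))  val  = substr($0, RSTART + 6, RLENGTH - 7)
            print host, name, val
        }
    '
}

# Hand-written, abridged sample; real output carries many more attributes.
list_metrics <<'EOF'
<GANGLIA_XML VERSION="3.1.7" SOURCE="gmond">
<CLUSTER NAME="demo">
<HOST NAME="node1" IP="10.0.0.1">
<METRIC NAME="load_one" VAL="0.10" TYPE="float" UNITS=" ">
</METRIC>
</HOST>
</CLUSTER>
</GANGLIA_XML>
EOF
```

Against a live gmond, the same function could be fed from the network instead of the sample, for example `nc <master_gmond_ip> 8649 | list_metrics` (using `nc` in place of the slide's `telnet` is an assumption).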


    Gmetric

    • A simple utility for submitting a metric to the gmond running on the server
    • You can also write gmond modules in Python or C
    • Using gmetric can be slow, but it's simpler
    • To collect a metric the simple way, run a gmetric script from a cron job at the frequency you need
    # Metric: total number of entries in the app's tbl_msg table
    value=$(mysql --database=demo -BNe "select count(*) from tbl_msg;")
    # uint32 rather than uint16, so the count does not overflow past 65535
    /usr/bin/gmetric --name "Message count" --value "$value" --type "uint32" --units "entries" --group "messageboard app"
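
A cron-driven script like this runs unattended, so it helps to refuse bad input before calling gmetric. This is a minimal sketch: `submit_count`, the `GMETRIC` override and the script path in the cron line are all hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Path to gmetric; can be overridden (for example with "echo") for a dry run.
GMETRIC=${GMETRIC:-/usr/bin/gmetric}

# Hypothetical wrapper: submit a non-negative integer metric, and skip
# empty or non-numeric values (for example when the database query failed).
submit_count() {
    local name=$1 value=$2
    case $value in
        ''|*[!0-9]*)
            echo "skipping $name: non-numeric value '$value'" >&2
            return 1
            ;;
    esac
    "$GMETRIC" --name "$name" --value "$value" --type uint32 --units entries
}

# Example crontab entry (hypothetical script path), every five minutes:
# */5 * * * * /usr/local/bin/message_count.sh
```

Setting `GMETRIC=echo` before calling `submit_count "Message count" 42` prints the arguments that would be passed, which is a cheap way to check the script without a running gmond.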


    RRDtool

    • RRDtool writes submitted metrics to a round-robin database, which keeps the storage footprint of each metric constant
    • It is used by many metric collection utilities, such as Cacti and Ganglia, since it has bindings for most languages
    • It also provides tools to generate graphs
    • An RRD uses multiple RRAs (round-robin archives) to store data for different time periods; a consolidation function (for example AVERAGE) combines several primary data points into each row of the coarser RRAs
    • gmetad can use rrdcached to cache incoming data before flushing it to disk, which reduces I/O
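
The constant footprint follows from each RRA having a fixed number of rows. As a worked sketch, the time span an RRA covers is step × primary points per row × rows. The helper name and the layout are hypothetical and are not Ganglia's defaults.

```shell
#!/bin/sh
# Hypothetical helper: seconds of history covered by one RRA.
#   $1 = step in seconds, $2 = primary data points per row, $3 = rows
rra_span_seconds() {
    local step=$1 steps_per_row=$2 rows=$3
    echo $(( step * steps_per_row * rows ))
}

# Illustrative layout with a 15-second step, as it might be created with:
#   rrdtool create demo.rrd --step 15 DS:sum:GAUGE:120:U:U \
#       RRA:AVERAGE:0.5:1:240 RRA:AVERAGE:0.5:240:168
rra_span_seconds 15 1 240      # full resolution, 240 rows
rra_span_seconds 15 240 168    # one averaged row per hour, 168 rows
```

The first archive covers 3600 seconds (one hour at 15-second resolution) and the second 604800 seconds (one week at hourly resolution). Once an archive is full, the oldest row is overwritten, so the file never grows.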


    Ganglia Frontend

    • Visualize your collected metrics
    • Aggregates data trends from gmetad
    • You can search and combine graphs from multiple RRD sources
    • Compare values between different nodes
     ab -n 4000 -c 24 http://<load_balancer_ip>/


    • Architecture:
    • Gmetric:
    • rrdtool :

    Intro to Ganglia

    By Ayush Goyal
