Monitoring your HBase cluster with a Raspberry Pi

Complex

What we want to monitor

  • HBase Master
  • HBase Region Servers
  • Hadoop NameNodes
  • Hadoop DataNodes
  • Hadoop JobTracker

JMX monitoring

  • The HBase Master reports dead Region Servers
    • If you cannot connect to the Master to get this information, you know the Master itself is dead (account for the standby Master)
  • The Hadoop NameNode reports dead DataNodes
    • If you cannot connect to the NameNode to get DataNode information, you know the NameNode itself is dead
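
The two-level check above (ask the NameNode which DataNodes are dead, and treat a failed connection as a dead NameNode) can be sketched in Python. This is only an illustration, not the Java program used in the talk. It assumes the metrics are read as JSON from Hadoop's HTTP `/jmx` servlet and that the `NameNodeInfo` bean exposes a `DeadNodes` attribute holding a JSON-encoded string; the `fetch` callable and both function names are hypothetical.

```python
import json

# Assumed bean name as served by the NameNode's /jmx servlet.
NAMENODE_INFO_BEAN = "Hadoop:service=NameNode,name=NameNodeInfo"


def dead_datanodes(jmx_payload):
    """Return the sorted host names the NameNode reports as dead."""
    for bean in jmx_payload.get("beans", []):
        if bean.get("name") == NAMENODE_INFO_BEAN:
            # DeadNodes is assumed to be a JSON string keyed by host name.
            return sorted(json.loads(bean.get("DeadNodes", "{}")))
    raise LookupError("NameNodeInfo bean not found in JMX payload")


def check_namenode(fetch):
    """fetch is a hypothetical callable returning the parsed /jmx JSON."""
    try:
        payload = fetch()
    except Exception as err:
        # Cannot reach the NameNode at all: the NameNode itself is down.
        return ("namenode_down", [str(err)])
    dead = dead_datanodes(payload)
    return ("datanodes_dead", dead) if dead else ("ok", [])
```

The same shape would apply to the HBase Master and its Region Servers, with the standby Master tried before declaring the Master dead.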

Jenkins + Java

  • Jenkins job that runs periodically (e.g., every 15 minutes)
  • The job invokes a Java program that evaluates the health of the cluster through JMX
  • The Jenkins job also applies some other health checks (e.g., pings the TaskTracker page to make sure it is up)
  • Sends additional alerts
    • Email alerts to individuals
    • IRC notifications in our #hbase-medics channel
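
A "page is up" check like the TaskTracker one could look roughly like the sketch below. It is written in Python for consistency with the demo script, not taken from the actual Jenkins job. The function name `page_is_up`, the injectable `opener` parameter, and the 5-second timeout are all assumptions made for illustration.

```python
from urllib.request import urlopen


def page_is_up(url, opener=urlopen, timeout=5):
    """Return True if the page answers with HTTP 200 within the timeout."""
    try:
        resp = opener(url, timeout=timeout)
        return resp.getcode() == 200
    except Exception:
        # Connection refused, timeout, HTTP error status, and so on.
        return False
```

Passing the opener as a parameter keeps the check testable without a live cluster; a Jenkins job would call it with the default and fail the build when it returns False.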

Raspberry Pi + Jenkins

  • Python program that polls the Jenkins API for the health of the job
  • If unhealthy, set a high signal on the pin to turn on the light
  • If healthy, set a low signal on the pin to turn off the light (or leave it off)
  • Only evaluate health and turn on light during business hours (when people are typically around)
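
The decision logic in these bullets can be separated from the GPIO calls so it is easy to test away from the Pi. The sketch below is written in Python 3 and mirrors the hour check and color comparison used in the demo script; the function names and the default hours are illustrative assumptions.

```python
import datetime


def is_business_hours(now, start_hour=8, end_hour=20):
    """True from start_hour:00 through end_hour:59, as in the demo script."""
    return start_hour <= now.hour <= end_hour


def light_should_be_on(color, now):
    """Decide the pin state from the Jenkins job color and the current time."""
    if not is_business_hours(now):
        # Nobody is around after hours, so keep the light off.
        return False
    # Only a plain 'red' (failed) job turns the light on.
    return color == "red"
```

The caller would then map the boolean to `GPIO.HIGH` or `GPIO.LOW` on the output pin.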

~$26.00

$1.00

~$10.00

demo

#!/usr/bin/env python
import urllib
import json
import sys
import time
import RPi.GPIO as GPIO
import datetime

def main_loop():

    while True:
        now = datetime.datetime.now()

        # If it is after hours, set the signal to low and skip evaluating
        if now.hour < 8 or now.hour > 20:
            GPIO.output(12, GPIO.LOW)
            time.sleep(3600)
            continue

        try:
            resp = urllib.urlopen('http://MYJENKINS_HOST/job/MYJENKINS_JOB/api/json')

            if resp.getcode() != 200:
                print 'Invalid HTTP response code at ', now.strftime("%Y-%m-%d %H:%M"), str(resp.getcode())
                # Wait before retrying so a bad response does not cause a tight loop
                time.sleep(30)
                continue

            # Sometimes we get invalid responses from Jenkins: ValueError("No JSON object could be decoded")
            jsonResp = json.load(resp)
            color = jsonResp['color']

            # Check that the color is not red, rather than checking that it is blue (successful),
            # because the reported color changes while a build is in progress.
            if color != 'red':
                # Set the signal low
                GPIO.output(12, GPIO.LOW)
            else:
                # The job is failed, spin the light
                print 'Jenkins job is failed at ', now.strftime("%Y-%m-%d %H:%M")
                GPIO.output(12, GPIO.HIGH)
        except Exception, err:
            print 'Failed to get JSON from Jenkins at ', now.strftime("%Y-%m-%d %H:%M"), ' with ', str(err)
            pass

        time.sleep(30)

if __name__ == '__main__':
    try:
        GPIO.setwarnings(False)
        GPIO.setmode(GPIO.BOARD)
        GPIO.setup(12, GPIO.OUT)
        main_loop()
    except KeyboardInterrupt:
        print >> sys.stderr, '\nShutting down...\n'
        GPIO.cleanup()
        sys.exit(0)