Data Center Monitoring System
"Centinela"

SNMP

Simple Network Management Protocol (SNMP) is the standard protocol used to manage network devices and embedded devices.

With SNMP, a device (the agent) exposes data as variables that describe its system configuration. These variables can be read or modified by an SNMP server (the manager).

MIBs

A MIB (Management Information Base) is a formatted text file that lists all the variables (data objects) used by a particular device.

  1. The device manufacturer supplies the MIB file.
  2. The MIB is loaded onto the SNMP server (manager).
  3. The SNMP server uses the MIB to interpret the messages and variables of the new device.
GAMATRONIC-POWER-PLUS-MIB DEFINITIONS ::= BEGIN
     
IMPORTS
    MODULE-IDENTITY, OBJECT-TYPE, NOTIFICATION-TYPE,
    OBJECT-IDENTITY,
    Counter32, Gauge32, Integer32, 
    enterprises,IpAddress 				       
    	FROM SNMPv2-SMI   
    	
    TEXTUAL-CONVENTION, DisplayString
    	FROM SNMPv2-TC 
    	
    gamatronicLTD
    	FROM GAMATRONIC-MIB;  
     
	
	
	powerplusMIB OBJECT IDENTIFIER 	::= { gamatronicLTD 5 }  
	
	
	
	
	--
	-- The Device Identification group.
	--   All objects in this group except for ppIdentSite 
	--	are set at device initialization
	--   and remain static.
	--
	
	ppIdent OBJECT IDENTIFIER ::= { powerplusMIB 1 }
	
	ppIdentModelID OBJECT-TYPE
		SYNTAX INTEGER
		MAX-ACCESS read-only
		STATUS current
		DESCRIPTION
			"UPS model identifier"
		::= { ppIdent 1 }
		
	ppIdentControllerSoftwareVersion OBJECT-TYPE
		SYNTAX DisplayString (SIZE (8))
		MAX-ACCESS read-only
		STATUS current
		DESCRIPTION
			"System controller software version"
		::= { ppIdent 2 }  
		
	ppIdentAgentSoftwareVersion OBJECT-TYPE
		SYNTAX DisplayString (SIZE (5))
		MAX-ACCESS read-only
		STATUS current
		DESCRIPTION
			"System controller software version"
		::= { ppIdent 3 }
		
	ppIdentControllerID OBJECT-TYPE
		SYNTAX DisplayString (SIZE (4))
		MAX-ACCESS read-only
		STATUS current
		DESCRIPTION
			"System controller identifier"
		::= { ppIdent 4 }
		
	ppIdentSite OBJECT-TYPE
		SYNTAX DisplayString (SIZE (6))
		MAX-ACCESS read-only
		STATUS current
		DESCRIPTION
			"Power Plus site"
		::= { ppIdent 5 }

Example MIB for Gamatronic UPS units
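
To make the role of the MIB concrete, the sketch below shows how a manager could turn a name such as ppIdentSite into an OID path by following the `::= { parent n }` assignments. It is a minimal illustration, not a full SMI parser: the helper names `parse_assignments` and `resolve` are hypothetical, the regular expression only handles the two forms seen in the excerpt, and the numeric OID of gamatronicLTD is left symbolic because it is defined in GAMATRONIC-MIB, which is not shown here.

```python
import re

# Shortened copy of the excerpt above, used as sample input.
MIB_EXCERPT = """
powerplusMIB OBJECT IDENTIFIER ::= { gamatronicLTD 5 }

ppIdent OBJECT IDENTIFIER ::= { powerplusMIB 1 }

ppIdentModelID OBJECT-TYPE
    SYNTAX INTEGER
    MAX-ACCESS read-only
    STATUS current
    DESCRIPTION
        "UPS model identifier"
    ::= { ppIdent 1 }

ppIdentSite OBJECT-TYPE
    SYNTAX DisplayString (SIZE (6))
    MAX-ACCESS read-only
    STATUS current
    DESCRIPTION
        "Power Plus site"
    ::= { ppIdent 5 }
"""

# Matches "<name> OBJECT IDENTIFIER ::= { <parent> <n> }" and
# "<name> OBJECT-TYPE ... ::= { <parent> <n> }" (assumed forms only).
ASSIGNMENT = re.compile(
    r"^[ \t]*(\w+)[ \t]+(?:OBJECT IDENTIFIER|OBJECT-TYPE)\b"
    r".*?::=\s*\{\s*([\w-]+)\s+(\d+)\s*\}",
    re.M | re.S,
)


def parse_assignments(mib_text):
    """Return {name: (parent_name, sub_identifier)} for each definition."""
    return {
        name: (parent, int(index))
        for name, parent, index in ASSIGNMENT.findall(mib_text)
    }


def resolve(name, parents):
    """Walk up the parent chain and build a dotted path from the root name."""
    parts = []
    while name in parents:
        name, index = parents[name]
        parts.append(str(index))
    # 'name' is now the topmost symbol that the excerpt does not define.
    return ".".join([name] + parts[::-1])
```

Calling `resolve("ppIdentSite", parse_assignments(MIB_EXCERPT))` walks ppIdentSite, ppIdent and powerplusMIB up to gamatronicLTD, which is where the excerpt stops.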

UPS 1

UPS 2

Backend application

  • Python application that uses the pymongo and pysnmp libraries to connect to the services.
  • "Dockerized", but installed natively on the Centinela server because of problems between Docker's internal file system and the CentOS operating system.
  • JSON files describe each device: SNMP information, storage path in Mongo, equipment description, variables to query, and a simple evaluator for mathematical operations.
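
The "simple evaluator" is not shown in these slides, so the following is only a sketch of how expressions such as "roomtemperature.0 * 0.1" could be evaluated safely. It assumes the polled SNMP values arrive as a dictionary keyed by "name.index"; the names `evaluate` and `VAR_PATTERN` are hypothetical. Variable references are replaced with their numeric values, and the result is walked with Python's `ast` module so that only numbers, parentheses and + - * / are accepted.

```python
import ast
import operator
import re

# Matches SNMP object references such as "roomtemperature.0".
VAR_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\.\d+")

_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _eval_node(node):
    """Evaluate a parsed expression, rejecting anything but arithmetic."""
    if isinstance(node, ast.Expression):
        return _eval_node(node.body)
    if (isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        left, right = _eval_node(node.left), _eval_node(node.right)
        return _BINARY_OPS[type(node.op)](left, right)
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_eval_node(node.operand))
    raise ValueError("unsupported element: %s" % type(node).__name__)


def evaluate(expression, values):
    """Evaluate a queryform expression against polled SNMP values.

    'values' maps references like "Comp1Status.0" to numbers; a missing
    reference raises KeyError.
    """
    def substitute(match):
        return repr(float(values[match.group(0)]))

    numeric = VAR_PATTERN.sub(substitute, expression).strip()
    return _eval_node(ast.parse(numeric, mode="eval"))
```

For example, `evaluate("( Comp1Status.0 + Comp2Status.0 ) * 50", {"Comp1Status.0": 1, "Comp2Status.0": 0})` computes a cooling demand from two compressor states. Avoiding a plain `eval` keeps a malformed or malicious descriptor from running arbitrary code.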

Mongo database

Mongo is an open-source, document-oriented database system classified as NoSQL. Its collections store JSON-style documents.

Among its most important features: query system, indexing, high availability, load balancing, and aggregation (data pipelines, MapReduce operations).

Example device JSON file

{
  "device-descriptor" : 
    {
      "serverpath" : "ST-1A-enf1.cicese.mx",
      "serverport" : "XXXX",
      "mibfile" : "PCOWEB-MIB",
      "community" : "XXXXXXXXX",
      "servername" : "Sistema aire acondicionado de precision 3",
      "description" : "",
      "collection_name" : "precision_ac_3",
      "queryform" : {

        "SystemStatus" : {
          "status" : "SysStatus.0",
          "mode" : "sysmode.0",					
          "roomtemperature" : "roomtemperature.0 * 0.1",
          "roomhumidity" : "roomhumidity.0 * 0.1",
          "fanstatus"	: "BlowerStatus.0",
          "humidifierstatus" : "HumOut.0",
          "compressor1status" : "Comp1Status.0",
          "compressor2status" : "Comp2Status.0",
          "heater1status" : "Heat1Status.0",
          "heater2status" : "Heat2Status.0",
          "heater3status" : "Heat3Status.0",
          "heater4status" : "Heat4Status.0",
          "tempsetpoint" : "tempsetpoint.0 * 0.1",
          "temphighalarmset" : "temphighalarmset.0 * 0.1",
          "templowalarmset" : "templowalarmset.0 * 0.1",
          "humiditysetpoint" : "humiditysetpoint.0 * 0.1",
          "humidityhighalarmset" : "humidityhighalarmset.0 * 0.1",
          "humiditylowalarmset" : "humiditylowalarmset.0 * 0.1"
        },
        "Operation" : {
          "tempcontroltime" : "temptime.0",
          "humcontroltime" : "humtime.0",
          "coolingoutput" : "aouty1.0",
          "heatingoutput" : "aouty2.0",
          "humidifieroutput" : "aouty3.0",
          "economizeroutput" : "aouty4.0",
          "humdemand" : "humdemand.0",
          "coolingdemand" : "( Comp1Status.0 + Comp2Status.0 ) * 50",
          "heatingdemand" : "( Heat1Status.0 + Heat2Status.0 ) * 50"
        },
        "Alarms" : {
          "NoAirFlowAlarm" : "NoAirFlowAlarm.0",
          "HighHeatAlarm" : "HighHeatAlarm.0",
          "SmokeAlarm" : "SmokeAlarm.0",
          "Comp1LPAlarm" : "Comp1LPAlarm.0",
          "Comp1HPAlarm" : "Comp1HPAlarm.0",
          "Comp2LPAlarm" : "Comp2LPAlarm.0",
          "Comp2HPAlarm" : "Comp2HPAlarm.0",
          "Comp1SCAlarm" : "Comp1SCAlarm.0",
          "Comp2SCAlarm" : "Comp2SCAlarm.0",
          "DrainAlarm" : "DrainAlarm.0",
          "HighTempAlarm" : "HighTempAlarm.0",
          "LowTempAlarm" : "LowTempAlarm.0",
          "HighHumAlarm" : "HighHumAlarm.0",
          "LowHumAlarm" : "LowHumAlarm.0",
          "WaterFlowAlarm" : "WaterFlowAlarm.0",
          "FireAlarm" : "FireAlarm.0",
          "HumAlarm" : "HumAlarm.0",
          "SenFailure1" : "SenFailure1.0",
          "SenFailure2" : "SenFailure2.0",
          "FilterAlarm" : "FilterAlarm.0",
          "DSHeatAlarm" : "DSHeatAlarm.0",
          "DSCoolAlarm" : "DSCoolAlarm.0",
          "FanOLAlarm" : "FanOLAlarm.0",
          "Comp3LPAlarm" : "Comp3LPAlarm.0",
          "Comp3HPAlarm" : "Comp3HPAlarm.0",
          "Comp4LPAlarm" : "Comp4LPAlarm.0",
          "Comp4HPAlarm" : "Comp4HPAlarm.0",
          "PLanAlarm" : "PLanAlarm.0",
          "GlobalAlarm" : "GlobalAlarm.0"
        }				
      },
      "units" : {

        "SystemStatus" : {
          "status" : "On/Off 0=off, 1=On",
          "mode" : "System mode 0=Off, 1=On, 2=Active, 3=Standby, 4=Override, 5=Off by digital input, 6=Manual",					
          "roomtemperature" : "C",
          "roomhumidity" : "%",
          "fanstatus"	: "",
          "humidifierstatus" : "",
          "compressor1status" : "",
          "compressor2status" : "",
          "heater1status" : "",
          "heater2status" : "",
          "heater3status" : "",
          "heater4status" : "",
          "tempsetpoint" : "C",
          "temphighalarmset" : "C",
          "templowalarmset" : "C",
          "humiditysetpoint" : "%",
          "humidityhighalarmset" : "%",
          "humiditylowalarmset" : "%"
        },
        "Operation" : {
          "tempcontroltime" : "",
          "humcontroltime" : "",
          "coolingoutput" : "",
          "heatingoutput" : "",
          "humidifieroutput" : "",
          "economizeroutput" : "",
          "humdemand" : "%",
          "coolingdemand" : "%",
          "heatingdemand" : "%"
        },
        "Alarms" : {
          "NoAirFlowAlarm" : "",
          "HighHeatAlarm" : "",
          "SmokeAlarm" : "",
          "Comp1LPAlarm" : "",
          "Comp1HPAlarm" : "",
          "Comp2LPAlarm" : "",
          "Comp2HPAlarm" : "",
          "Comp1SCAlarm" : "",
          "Comp2SCAlarm" : "",
          "DrainAlarm" : "",
          "HighTempAlarm" : "",
          "LowTempAlarm" : "",
          "HighHumAlarm" : "",
          "LowHumAlarm" : "",
          "WaterFlowAlarm" : "",
          "FireAlarm" : "",
          "HumAlarm" : "",
          "SenFailure1" : "",
          "SenFailure2" : "",
          "FilterAlarm" : "",
          "DSHeatAlarm" : "",
          "DSCoolAlarm" : "",
          "FanOLAlarm" : "",
          "Comp3LPAlarm" : "",
          "Comp3HPAlarm" : "",
          "Comp4LPAlarm" : "",
          "Comp4HPAlarm" : "",
          "PLanAlarm" : "",
          "GlobalAlarm" : ""
        }									

      }
    }
}
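
As an illustration of how a descriptor like this could drive polling, the sketch below collects every SNMP object referenced by the "queryform" expressions so that each one is requested only once. The function name `variables_to_poll` is hypothetical, and the sample descriptor is a trimmed copy of the file above.

```python
import json
import re

# Matches SNMP object references such as "Comp1Status.0".
VAR_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\.\d+")

# Trimmed copy of the descriptor above, used as sample input.
DESCRIPTOR_JSON = """
{
  "device-descriptor": {
    "mibfile": "PCOWEB-MIB",
    "collection_name": "precision_ac_3",
    "queryform": {
      "SystemStatus": {
        "status": "SysStatus.0",
        "roomtemperature": "roomtemperature.0 * 0.1"
      },
      "Operation": {
        "coolingdemand": "( Comp1Status.0 + Comp2Status.0 ) * 50"
      }
    }
  }
}
"""


def variables_to_poll(descriptor):
    """Return the sorted, de-duplicated SNMP references in 'queryform'."""
    queryform = descriptor["device-descriptor"]["queryform"]
    names = set()
    for fields in queryform.values():
        for expression in fields.values():
            names.update(VAR_PATTERN.findall(expression))
    return sorted(names)


if __name__ == "__main__":
    for name in variables_to_poll(json.loads(DESCRIPTOR_JSON)):
        print(name)
```

Numeric constants such as 0.1 or 50 are ignored because the pattern requires a reference to start with a letter or underscore.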

Backend application - Alerts

{
  "device-descriptor" : 
    {
      "serverpath" : "telematica-ups.cicese.mx",
      "serverport" : "XXXX",
      "mibfile" : "GAMATRONIC-POWER-PLUS-MIB",
      "community" : "XXXXXX",
      "servername" : "Sistema de alimentacion ininterrumpida 1",
      "description" : "",
      "collection_name" : "ups1",
      "queryform" : {
        "SystemGenerals" : {
          "temperature" : "ppSystemTemperature.0"
        },
        "ActiveAlarms" : {
          "communication-lost" : 				"ppAlarmPresent.1",
          "startup-time-stamp" : 				"ppAlarmPresent.2",
          "ups-or-more-not-responding" : 		"ppAlarmPresent.3",
          "load-current-high" : 				"ppAlarmPresent.4",
          "ups-shut-down" : 					"ppAlarmPresent.5",
          "suspect-a-fault-output-stage" : 	"ppAlarmPresent.6",
          "suspect-fault-current-sharing" : 	"ppAlarmPresent.7",
          "battery-circuit-breaker-is-open" : "ppAlarmPresent.8",
          "last-self-test-fail" : 			"ppAlarmPresent.9",
          "stsw-not-responding" : 			"ppAlarmPresent.10",
          "ac-input-failure" : 				"ppAlarmPresent.11",
          "ac-input-high" : 					"ppAlarmPresent.12",
          "input-brownout" : 					"ppAlarmPresent.13",
          "user-3-input-open" : 				"ppAlarmPresent.14",
          "user-2-input-open" : 				"ppAlarmPresent.15",
          "user-1-input-open" : 				"ppAlarmPresent.16",
          "last-battery-test" : 				"ppAlarmPresent.17",
          "equalizing-mode" : 				"ppAlarmPresent.18",
          "emergency-power-off-activated" : 	"ppAlarmPresent.19",
          "static-switch-warning" : 			"ppAlarmPresent.20",
          "low-battery-voltage" : 			"ppAlarmPresent.21",
          "end-of-backup" : 					"ppAlarmPresent.22",
          "n-a-1" : 							"ppAlarmPresent.23",
          "high-battery-voltage" : 			"ppAlarmPresent.24",
          "no-ac-output-to-load" : 			"ppAlarmPresent.25",
          "over-temperature" : 				"ppAlarmPresent.26",
          "an-alarms-is-vibrating" : 			"ppAlarmPresent.27",
          "load-on-bypass" : 					"ppAlarmPresent.28",
          "n-a-2" : 							"ppAlarmPresent.29",
          "n-a-3" : 							"ppAlarmPresent.30",
          "one-ups-module-warning" : 			"ppAlarmPresent.31",
          "ups-modules-warning" : 			""
        }				
      },
      "alarm_criteria" : {
        "SystemGenerals" : {
          "temperature" : { "alarm_type" : "range", "range_min" : "12", "range_max" : "55" } 
        },
        "ActiveAlarms" : {
          "communication-lost" : 				      { "alarm_type" : "alarm_notification", "active_value" : "1" },
          "startup-time-stamp" : 				      { "alarm_type" : "alarm_notification", "active_value" : "1" },
          "ups-or-more-not-responding" :      { "alarm_type" : "alarm_notification", "active_value" : "1" },
          "load-current-high" : 				      { "alarm_type" : "alarm_notification", "active_value" : "1" },
          "ups-shut-down" : 					        { "alarm_type" : "alarm_notification", "active_value" : "1" },
          "suspect-a-fault-output-stage" : 	  { "alarm_type" : "alarm_notification", "active_value" : "1" },
          "suspect-fault-current-sharing" : 	{ "alarm_type" : "alarm_notification", "active_value" : "1" },
          "battery-circuit-breaker-is-open" : { "alarm_type" : "alarm_notification", "active_value" : "1" },
          "last-self-test-fail" : 			      { "alarm_type" : "alarm_notification", "active_value" : "1" },
          "stsw-not-responding" : 			      { "alarm_type" : "alarm_notification", "active_value" : "1" },
          "ac-input-failure" : 				        { "alarm_type" : "alarm_notification", "active_value" : "1" },
          "ac-input-high" : 					        { "alarm_type" : "alarm_notification", "active_value" : "1" },
          "input-brownout" : 					  { "alarm_type" : "alarm_notification", "active_value" : "1" },
          "user-3-input-open" : 				{ "alarm_type" : "alarm_notification", "active_value" : "1" },
          "user-2-input-open" : 				{ "alarm_type" : "alarm_notification", "active_value" : "1" },
          "user-1-input-open" : 				{ "alarm_type" : "alarm_notification", "active_value" : "1" },
          "last-battery-test" : 				      { "alarm_type" : "alarm_notification", "active_value" : "1" },
          "equalizing-mode" : 				        { "alarm_type" : "alarm_notification", "active_value" : "1" },
          "emergency-power-off-activated" : 	{ "alarm_type" : "alarm_notification", "active_value" : "1" },
          "static-switch-warning" : 			    { "alarm_type" : "alarm_notification", "active_value" : "1" },
          "low-battery-voltage" : 			      { "alarm_type" : "alarm_notification", "active_value" : "1" },
          "end-of-backup" : 					        { "alarm_type" : "alarm_notification", "active_value" : "1" },
          "n-a-1" : 							      { "alarm_type" : "alarm_notification", "active_value" : "1" },
          "high-battery-voltage" : 			{ "alarm_type" : "alarm_notification", "active_value" : "1" },
          "no-ac-output-to-load" : 			{ "alarm_type" : "alarm_notification", "active_value" : "1" },
          "over-temperature" : 				  { "alarm_type" : "alarm_notification", "active_value" : "1" },
          "an-alarms-is-vibrating" : 		{ "alarm_type" : "alarm_notification", "active_value" : "1" },
          "load-on-bypass" : 					  { "alarm_type" : "alarm_notification", "active_value" : "1" },
          "n-a-2" : 							      { "alarm_type" : "alarm_notification", "active_value" : "1" },
          "n-a-3" : 							      { "alarm_type" : "alarm_notification", "active_value" : "1" },
          "one-ups-module-warning" : 		{ "alarm_type" : "alarm_notification", "active_value" : "1" },
          "ups-modules-warning" : 			{ "alarm_type" : "alarm_notification", "active_value" : "1" }
        }				
        
      },
      "units" : {
        "SystemGenerals" : {
          "temperature" : "C"
          },
        "ActiveAlarms" : {
          "communication-lost" : 				"",
          "startup-time-stamp" : 				"",
          "ups-or-more-not-responding" : 		"",
          "load-current-high" : 				"",
          "ups-shut-down" : 					"",
          "suspect-a-fault-output-stage" : 	"",
          "suspect-fault-current-sharing" : 	"",
          "battery-circuit-breaker-is-open" : "",
          "last-self-test-fail" : 			"",
          "stsw-not-responding" : 			"",
          "ac-input-failure" : 				"",
          "ac-input-high" : 					"",
          "input-brownout" : 					"",
          "user-3-input-open" : 				"",
          "user-2-input-open" : 				"",
          "user-1-input-open" : 				"",
          "last-battery-test" : 				"",
          "equalizing-mode" : 				"",
          "emergency-power-off-activated" : 	"",
          "static-switch-warning" : 			"",
          "low-battery-voltage" : 			"",
          "end-of-backup" : 					"",
          "n-a-1" : 							"",
          "high-battery-voltage" : 			"",
          "no-ac-output-to-load" : 			"",
          "over-temperature" : 				"",
          "an-alarms-is-vibrating" : 			"",
          "load-on-bypass" : 					"",
          "n-a-2" : 							"",
          "n-a-3" : 							"",
          "one-ups-module-warning" : 			"",
          "ups-modules-warning" : 			""
        }				
        
      }
  }
}
  • Alert criterion types: "range" and "alarm_on_active" (the example above uses "alarm_notification" as its "alarm_type").
  • An email is sent with a notification of the active alerts.
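
The evaluation logic itself is not shown, so this is a sketch under stated assumptions: a "range" criterion is taken to fire when the reading falls outside [range_min, range_max], and an "alarm_notification" criterion when the reading equals "active_value". The function names `is_alert_active` and `active_alerts` are hypothetical, and readings are assumed to be grouped the same way as "alarm_criteria".

```python
def is_alert_active(criterion, value):
    """Decide whether one reading triggers its criterion (assumed semantics)."""
    alarm_type = criterion["alarm_type"]
    if alarm_type == "range":
        low = float(criterion["range_min"])
        high = float(criterion["range_max"])
        # Assumption: values outside the closed interval raise the alert.
        return not (low <= float(value) <= high)
    if alarm_type == "alarm_notification":
        # Assumption: the alert is raised when the reading equals active_value.
        return float(value) == float(criterion["active_value"])
    raise ValueError("unknown alarm_type: %r" % alarm_type)


def active_alerts(alarm_criteria, readings):
    """Return (group, name, value) for every reading that triggers an alert.

    Readings that are missing or None are skipped rather than reported.
    """
    active = []
    for group, fields in alarm_criteria.items():
        for name, criterion in fields.items():
            value = readings.get(group, {}).get(name)
            if value is not None and is_alert_active(criterion, value):
                active.append((group, name, value))
    return active


# Criteria copied from the descriptor above; the readings are made-up samples.
CRITERIA = {
    "SystemGenerals": {
        "temperature": {"alarm_type": "range",
                        "range_min": "12", "range_max": "55"},
    },
    "ActiveAlarms": {
        "ac-input-failure": {"alarm_type": "alarm_notification",
                             "active_value": "1"},
        "load-on-bypass": {"alarm_type": "alarm_notification",
                           "active_value": "1"},
    },
}
SAMPLE_READINGS = {
    "SystemGenerals": {"temperature": 60},
    "ActiveAlarms": {"ac-input-failure": 1, "load-on-bypass": 0},
}
```

The list returned by `active_alerts(CRITERIA, SAMPLE_READINGS)` is what the email notification would summarize.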

To do:

  • Notify when an alert becomes active, and also when an alert is deactivated.
  • Define new alert criteria.
  • Describe what each alert refers to.
  • Define priorities for the alerts.
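
One possible approach to the first item, offered only as a sketch: keep the set of alert names that were active on the previous poll and compare it with the current set. The function name `alert_transitions` is hypothetical, and how the previous set is stored between polls is left open.

```python
def alert_transitions(previous, current):
    """Compare two polls and return (activated, cleared) alert names.

    Both arguments are iterables of alert names that were active
    at the previous and the current poll, respectively.
    """
    previous, current = set(previous), set(current)
    activated = sorted(current - previous)  # newly active since last poll
    cleared = sorted(previous - current)    # no longer active
    return activated, cleared
```

A notification would then be sent only when either list is non-empty, which avoids repeating an email on every poll while an alert stays active.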

Frontend application

  • PHP application that uses the MongoClient library for data access.
  • Charts built with the google-charts framework.

To do:

  • Build the HTML5 interface with XHR (Ajax) to query data in real time.
  • Dashboard analytics.

monitoreo-centinela

By Favio Medrano


On the design of the "Centinela" data center monitoring system
