Project Prometheus



James Macdonell - Information Security Analyst
Patrick O'Connor - Operating System Analyst
Aaron Smith - ITC/Web Developer

Introduction


Goal:
Introduce CSUSB's Centralized Logging and Metrics Infrastructure


Topics:
Blackboard, Central Logging, Metrics, Graphing Data, Technical Infrastructure, Future Plans



The Blackboard Problem


        1. Blackboard went down
        2. Existing monitoring showed OK status
        3. Blackboard support was contacted, no immediate diagnosis

The Result


      1.  Faculty/Staff could not access Blackboard during finals
      2.  Blackboard support claimed a configuration issue
      3.  Blackboard support couldn't diagnose it and recommended increasing resources as the only fix
      4.  James volunteered analysis of network traffic
      5.  Patrick volunteered assistance with Tomcat


Limping through Finals



/webapps/portal/frameset.jsp
/webapps/portal/frameset.jsp
/webapps/portal/execute/topframe
/webapps/portal/execute/tabs/tabAction
/webapps/blackboard/execute/modulepage/view
/webapps/blackboard/execute/courseMain
/webapps/blackboard/content/listContent.jsp
/webapps/blackboard/execute/announcement
/branding/themes/CSUSB_SP9/images/background_h.png
/webapps/blackboard/course/course_button.jsp
/webapps/blackboard/execute/displayIndividualContent
/images/console/icons/help_0.gif
/branding/colorpalettes/CSUSB/colorpalette.css
/branding/themes/CSUSB_SP9/theme.css
/branding/themes/CSUSB_SP9/images/backgrounds_h.png
/webapps/discussionboard/do/conference
/images/ci/ng/cm_arrow_left.gif
/branding/_1_1/CSUSB_Online_Internal_Logo.png
/webapps/discussionboard/do/message


BLACKBOARD BACK ONLINE!



What went wrong?


                Spending the day in pl2107
                - consulting - idk, try bigger boxen
                - other csu - idk, we run redhat/oracle*

                - csusb - pretty sure it's something
                        inside the big opaque java.exe

                * got hints about logging, staff

Attitude Change


blackboard at csusb
-became-
csusb's blackboard

"Stop treating Blackboard as a vendor 
and start treating them as a partner" 
- John McGuthery

Changing Problem Scope


                What went wrong?
                What do we have?
                  - what does normal look like?
                  - what is the maximum capacity?
                  - what factors affect response time?
                  - what debugging do we have available?

Solution



Two new systems:

Consolidated logging with fast retrieval of information
Historical metrics and benchmarks

Accessible Logging


  1.  Logstash Agents / Syslog Forwarders
  2.  Redis Broker
  3.  Logstash Indexer / Syslog Listener
  4.  Elasticsearch
  5.  Kibana 
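The flow through the five components above can be sketched in memory. This is a toy model under stated assumptions, not the real tools: a `deque` stands in for the Redis broker, a dictionary stands in for Elasticsearch, and the host name and log lines are made up.

```python
import json
from collections import defaultdict, deque

broker = deque()            # stand-in for the Redis list used as a broker
index = defaultdict(list)   # stand-in for Elasticsearch, keyed by log name

def ship(host, log_name, line):
    """Agent side: wrap a raw log line in an event and push it to the broker."""
    event = {"host": host, "log": log_name, "message": line}
    broker.append(json.dumps(event))

def index_pending():
    """Indexer side: drain the broker and store each event for later search."""
    count = 0
    while broker:
        event = json.loads(broker.popleft())
        index[event["log"]].append(event)
        count += 1
    return count

def search(log_name, text):
    """Query side (the role Kibana plays): events whose message contains text."""
    return [e for e in index[log_name] if text in e["message"]]

# Hypothetical host and messages.
ship("bb-app-1", "catalina-log", "SEVERE: example error")
ship("bb-app-1", "bb-access-log", "GET /webapps/portal/frameset.jsp")
indexed = index_pending()
hits = search("catalina-log", "SEVERE")
```

The broker in the middle is what lets agents keep shipping even when the indexer is slow or briefly offline.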

         

Graphing/Metrics



    1.  Logstash - Parsing from log files
    2.  jmxtrans - github.com/jmxtrans - JVM data
    3.  StatsD - Etsy - stats aggregation, shipping
    4.  Graphite - Orbitz - composer/dashboard
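Both StatsD and Graphite accept simple text lines, which is part of why they are easy to feed from other tools. The sketch below only formats those lines. The metric names are hypothetical, and it handles just three StatsD types (counter, timer, gauge).

```python
def statsd_line(bucket, value, metric_type):
    """Format one StatsD metric: <bucket>:<value>|<type> (c=counter, ms=timer, g=gauge)."""
    if metric_type not in ("c", "ms", "g"):
        raise ValueError("unsupported metric type: " + metric_type)
    return f"{bucket}:{value}|{metric_type}"

def graphite_line(path, value, timestamp):
    """Format one Graphite plaintext line: <path> <value> <unix timestamp>."""
    return f"{path} {value} {int(timestamp)}\n"

# Hypothetical metric names and values.
counter = statsd_line("blackboard.logins", 1, "c")
timer = statsd_line("blackboard.response_time", 320, "ms")
heap = graphite_line("blackboard.jvm.heap_used", 512, 1700000000)
```

In a typical setup the StatsD lines would be sent as UDP datagrams and the Graphite lines over a TCP connection to the Carbon listener. Hosts and ports depend on the deployment.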




Applying this to Blackboard

System Architecture


Shipping Logs with Logstash



bb-sqlerror-log
bb-access-log
bb-authentication-log
stdout-stderr
catalina-log
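Once the logs above are shipped, the access log is usually the first one worth parsing. The sketch below counts requests per path. It assumes lines in Common Log Format, which may not match the real bb-access-log layout, and the sample lines (addresses, dates, sizes) are invented.

```python
import re
from collections import Counter

# Assumption: Common Log Format; the actual bb-access-log layout may differ.
LINE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+'
)

def count_paths(lines):
    """Count requests per path, ignoring query strings and unparseable lines."""
    counts = Counter()
    for line in lines:
        match = LINE.match(line)
        if match:
            counts[match.group("path").split("?", 1)[0]] += 1
    return counts

# Invented sample lines for illustration.
sample = [
    '10.0.0.1 - - [10/Dec/2012:10:00:00 -0800] "GET /webapps/portal/frameset.jsp HTTP/1.1" 200 512',
    '10.0.0.2 - - [10/Dec/2012:10:00:01 -0800] "GET /webapps/portal/frameset.jsp?tab=1 HTTP/1.1" 200 512',
    '10.0.0.1 - - [10/Dec/2012:10:00:02 -0800] "GET /webapps/blackboard/execute/courseMain HTTP/1.1" 200 1024',
    'not a log line',
]
top = count_paths(sample).most_common(1)
```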

JVM Information


Heap size
Threads
Garbage Collection

Database Activity


  • Application Events
    • Logins/Logouts
    • Course Access
      • Content
      • Grades
      • Discussion Board
      • Tools
    • Tab Access



Demo(s)



Conclusion




Final Thoughts



Questions?

Project Prometheus - Techs Meeting

By dontrebootme
