Project Prometheus
James Macdonell - Information Security Analyst
Patrick O'Connor - Operating System Analyst
Aaron Smith - ITC/Web Developer
Introduction
Goal:
Introduce CSUSB's Centralized Logging and Metrics Infrastructure
Topics:
Blackboard, Central Logging, Metrics, Graphing Data, Technical Infrastructure, Future Plans
The Blackboard Problem
- Blackboard went down
- Existing monitoring showed OK status
- Blackboard support was contacted, no immediate diagnosis
THE RESULT
- Faculty/Staff could not access Blackboard during finals
- Blackboard support claimed configuration issue
- Couldn't diagnose; Blackboard support recommended increasing resources as the only fix
- James volunteered analysis of network traffic
- Patrick volunteered assistance with Tomcat
Limping through Finals
/webapps/portal/frameset.jsp
/webapps/portal/frameset.jsp
/webapps/portal/execute/topframe
/webapps/portal/execute/tabs/tabAction
/webapps/blackboard/execute/modulepage/view
/webapps/blackboard/execute/courseMain
/webapps/blackboard/content/listContent.jsp
/webapps/blackboard/execute/announcement
/branding/themes/CSUSB_SP9/images/background_h.png
/webapps/blackboard/course/course_button.jsp
/webapps/blackboard/execute/displayIndividualContent
/images/console/icons/help_0.gif
/branding/colorpalettes/CSUSB/colorpalette.css
/branding/themes/CSUSB_SP9/theme.css
/branding/themes/CSUSB_SP9/images/backgrounds_h.png
/webapps/discussionboard/do/conference
/images/ci/ng/cm_arrow_left.gif
/branding/_1_1/CSUSB_Online_Internal_Logo.png
/webapps/discussionboard/do/message
BLACKBOARD
BACK ONLINE!
What went wrong?
Spending the day in pl2107
- consulting - idk, try bigger boxen
- other csu - idk, we run redhat/oracle*
- csusb - pretty sure it's something inside the big opaque java.exe
* got hints about logging, staff
Attitude Change
blackboard at csusb
-became-
csusb's blackboard
"Stop treating Blackboard as a vendor
and start treating them as a partner"
- John McGuthery
Changing Problem Scope
What do we have?
- what does normal look like?
- what is the maximum capacity?
- what factors affect response time?
- what debugging do we have available?
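One way to start answering "what does normal look like?" is to summarize response times into a baseline. The sketch below is only an illustration: the sample values are made up, and the choice of median and 95th percentile is an assumption, not something taken from the project.

```python
import statistics

def baseline(response_times_ms):
    """Summarize response times so later readings can be compared to 'normal'."""
    ordered = sorted(response_times_ms)
    # n=100 yields 99 cut points; index 94 is the 95th percentile.
    # method="inclusive" keeps the result within the observed range.
    cuts = statistics.quantiles(ordered, n=100, method="inclusive")
    return {
        "median_ms": statistics.median(ordered),
        "p95_ms": cuts[94],
        "max_ms": ordered[-1],
    }

# Hypothetical sample: mostly fast requests with a few slow outliers.
sample = [120, 130, 125, 140, 135, 128, 132, 900, 138, 127, 131, 2400]
summary = baseline(sample)
```

A large gap between the median and the 95th percentile suggests a slow tail worth investigating, even while the typical request still looks healthy.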
Solution
Two new systems:
Consolidated logging with fast retrieval of information
Historical metrics and benchmarks
Accessible Logging
- Logstash Agents / Syslog Forwarders
- Redis Broker
- Logstash Indexer / Syslog Listener
- Elasticsearch
- Kibana
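The flow through these components can be sketched in a few lines. This is a toy model, not the real stack: a deque stands in for the Redis broker, a dictionary stands in for Elasticsearch, and the host names and log messages are hypothetical.

```python
import json
from collections import defaultdict, deque

# In-memory stand-in for the Redis broker: agents append, the indexer pops from the front.
broker = deque()

def ship(host, log_type, message):
    """Agent side: wrap a raw log line in a JSON event and queue it."""
    broker.append(json.dumps({"host": host, "type": log_type, "message": message}))

def index_all(index):
    """Indexer side: drain the broker and build a word -> events lookup."""
    while broker:
        event = json.loads(broker.popleft())
        for word in set(event["message"].lower().split()):
            index[word].append(event)
    return index

# Hypothetical hosts and messages.
ship("bb-app1", "catalina-log", "SEVERE: Servlet threw exception")
ship("bb-app2", "bb-sqlerror-log", "Connection pool exhausted, exception raised")

index = index_all(defaultdict(list))
hits = [event["host"] for event in index["exception"]]
```

The broker decouples the two sides: agents keep shipping even if the indexer falls behind, and a single search can then answer "which hosts logged an exception?" without logging in to each box.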
Graphing/Metrics
- Logstash - Parsing from log files
- jmxtrans - github.com/jmxtrans - JVM data
- StatsD - Etsy - stats aggregation, shipping
- Graphite - Orbitz - composer/dashboard
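For reference, both StatsD and Graphite accept simple text formats. The sketch below formats StatsD counter and timer datagrams and a Graphite plaintext line; the metric names are hypothetical, and the default host and port in send_statsd assume a local StatsD listening on its usual UDP port 8125.

```python
import socket
import time

def statsd_counter(name, value=1):
    """Format a StatsD counter datagram: <name>:<value>|c"""
    return f"{name}:{value}|c"

def statsd_timer(name, millis):
    """Format a StatsD timer datagram: <name>:<value>|ms"""
    return f"{name}:{millis}|ms"

def graphite_line(path, value, timestamp=None):
    """Format one Graphite plaintext line: <path> <value> <timestamp> plus newline."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"{path} {value} {ts}\n"

def send_statsd(payload, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send; host and port here are assumptions."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload.encode("ascii"), (host, port))

# Hypothetical metric names.
login_packet = statsd_counter("blackboard.logins")
timing_packet = statsd_timer("blackboard.response", 320)
```

Because StatsD listens on UDP, a sender never blocks on the metrics system, which matters when the application being measured is already struggling.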
Applying this to Blackboard
System Architecture
Shipping Logs with Logstash
bb-sqlerror-log
bb-access-log
bb-authentication-log
stdout-stderr
catalina-log
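Shipping a log is most useful when each line becomes a structured event. The sketch below shows the general idea for an access log line. It assumes the NCSA common log format; the actual bb-access-log pattern may differ, and the sample line (client, user, date) is invented.

```python
import re

# Assumed layout: NCSA common log format. The real bb-access-log pattern may differ.
ACCESS_LINE = re.compile(
    r'(?P<client>\S+) \S+ (?P<user>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def parse_access_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = ACCESS_LINE.match(line)
    if match is None:
        return None
    event = match.groupdict()
    event["status"] = int(event["status"])
    event["bytes"] = 0 if event["bytes"] == "-" else int(event["bytes"])
    return event

# Hypothetical sample line.
line = ('10.0.0.5 - jdoe [12/Dec/2012:09:15:02 -0800] '
        '"GET /webapps/portal/frameset.jsp HTTP/1.1" 200 5120')
event = parse_access_line(line)
```

Once the path, status, and user are separate fields, they can be searched and counted individually instead of grepped out of raw text.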
JVM Information
Heap size
Threads
Garbage Collection
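JVM garbage collection counters are cumulative since JVM start, so graphing them usefully means converting consecutive samples into per-interval values. A minimal sketch, assuming samples shaped like the hypothetical JvmSample record below (the field names and numbers are illustrative, not jmxtrans output):

```python
from dataclasses import dataclass

@dataclass
class JvmSample:
    timestamp: float   # seconds since epoch
    heap_used: int     # bytes
    heap_max: int      # bytes
    thread_count: int
    gc_count: int      # cumulative collections since JVM start
    gc_time_ms: int    # cumulative collection time since JVM start

def interval_metrics(prev, curr):
    """Turn two consecutive samples into per-interval values suitable for graphing."""
    elapsed = curr.timestamp - prev.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    gc_time = curr.gc_time_ms - prev.gc_time_ms
    return {
        "heap_used_pct": 100.0 * curr.heap_used / curr.heap_max,
        "threads": curr.thread_count,
        "gc_runs": curr.gc_count - prev.gc_count,
        # Share of wall-clock time spent in garbage collection during the interval.
        "gc_time_pct": 100.0 * gc_time / (elapsed * 1000.0),
    }

# Hypothetical samples taken 60 seconds apart.
prev = JvmSample(0.0, 512 * 2**20, 2048 * 2**20, 180, 40, 1200)
curr = JvmSample(60.0, 1024 * 2**20, 2048 * 2**20, 210, 46, 1800)
metrics = interval_metrics(prev, curr)
```

A rising share of time spent in garbage collection alongside a heap that stays near its maximum is a common sign of memory pressure inside the JVM.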
Database Activity
Application Events
- Logins/Logouts
- Course Access
- Content
- Grades
- Discussion Board
- Tools
- Tab Access
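Events like these can be derived from request paths in the access log. The sketch below is an assumption about how that could work: the paths appear in the request listing earlier in the deck, but the mapping from path prefix to event name is a guess made for illustration.

```python
from collections import Counter

# Hypothetical prefix-to-event mapping; which event each path represents is a guess.
EVENT_PREFIXES = [
    ("/webapps/discussionboard/", "discussion_board"),
    ("/webapps/portal/execute/tabs/", "tab_access"),
    ("/webapps/blackboard/execute/courseMain", "course_access"),
    ("/webapps/blackboard/content/", "content"),
]

def classify(path):
    """Return the event name for a request path, or None if it is not tracked."""
    for prefix, event in EVENT_PREFIXES:
        if path.startswith(prefix):
            return event
    return None  # static assets and unrecognized requests are not counted

def count_events(paths):
    """Count tracked events across a batch of request paths."""
    return Counter(e for e in map(classify, paths) if e is not None)

paths = [
    "/webapps/portal/execute/tabs/tabAction",
    "/webapps/blackboard/execute/courseMain",
    "/webapps/blackboard/content/listContent.jsp",
    "/webapps/discussionboard/do/conference",
    "/webapps/discussionboard/do/message",
    "/branding/themes/CSUSB_SP9/theme.css",
]
counts = count_events(paths)
```

Counts like these, sent to the metrics pipeline at a regular interval, turn raw request traffic into graphs of what users are actually doing.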
Demo(s)
Conclusion
Final Thoughts
Questions?
Project Prometheus - Techs Meeting
By dontrebootme