Opencast Analytics @ DCE

Jay Luker - Senior Software Engineer, Harvard DCE

Using Opencast API data to...

  • track workflow performance
  • understand workflow performance
  • monitor user activity in (nearly) real time
  • provide user insight to instructors & researchers (future)

Workflow Browser

  • d3-based workflow visualization
  • Each workflow rendered as a timeline showing duration of individual operations
  • Pan, zoom and filter
  • Mouse-over operations to display details
  • Goal: allow visual identification of problem workflows and/or operations

WPI Dashboard

  • Workflow Performance Index
  • WPI = media length divided by workflow run time
  • run time calculation excludes operations not related to actual video production
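
A minimal sketch of the WPI calculation described above, assuming each workflow operation is available as an id plus a duration. The `Operation` type, the millisecond units, and the contents of `NON_PRODUCTION_OPS` are illustrative assumptions; the slides do not list which operations are excluded.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical ids of operations treated as unrelated to video production.
# The real exclusion list is not given in the slides.
NON_PRODUCTION_OPS = frozenset({"hold", "archive"})


@dataclass
class Operation:
    op_id: str
    duration_ms: int


def workflow_performance_index(
    media_length_ms: int,
    operations: Iterable[Operation],
    excluded: frozenset = NON_PRODUCTION_OPS,
) -> Optional[float]:
    """WPI = media length / workflow run time, skipping excluded operations."""
    run_time_ms = sum(
        op.duration_ms for op in operations if op.op_id not in excluded
    )
    if run_time_ms <= 0:
        return None  # no production work recorded, so WPI is undefined
    return media_length_ms / run_time_ms
```

Under this definition a WPI above 1 means the workflow's production operations finished in less time than the media runs.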

User Activity Dashboard

  • Time-series graphs of active user sessions
  • Filter by live vs on demand, identified vs anonymous
  • Top terms: course, IP address, browser, device, OS
  • User location mapping based on geoip lookups

Usertracking Event Pipeline

Paella

  • Standard event types: PLAY, PAUSE, SEEK, etc
  • HEARTBEAT events every 30s
  • Custom param: playing=<true|false>
  • Additional interesting event types:
    • setVolume
    • enterFullscreen
    • setplaybackrate
    • caption:enabled
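
Because HEARTBEAT events arrive every 30s and carry a playing flag, they can be used to approximate watch time. The sketch below assumes events are dicts with the `type` and `is_playing` fields shown in the sample records; the function name and the idea of counting each playing heartbeat as one full interval are assumptions, not something the slides describe.

```python
HEARTBEAT_INTERVAL_S = 30  # from the slide: HEARTBEAT events every 30s


def estimate_watch_seconds(events):
    """Approximate watch time by counting heartbeats sent while playing.

    Each playing heartbeat is counted as one full interval, so the result
    is only accurate to within roughly one interval per playback stretch.
    """
    playing_beats = sum(
        1
        for event in events
        if event.get("type") == "HEARTBEAT" and event.get("is_playing")
    )
    return playing_beats * HEARTBEAT_INTERVAL_S
```

Non-heartbeat events (PLAY, PAUSE, SEEK and so on) are ignored here; a fuller estimate could use them to tighten the boundaries.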

Usertracking Service

  • Usertracking service adds user id based on auth
  • Current DB size: ~73G
  • Mostly mh_user_action: 131m records
  • This is a problem
  • We are exploring solutions

Harvester

  • Runs every 2 minutes (cron: */2)
  • Fetches actions in batches using start/end params
  • Builds event record from action + episode metadata
  • Publishes to SQS
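
The harvester steps above can be sketched roughly as follows. This is not the actual harvester: the input field names on `action` (`id`, `mpid`, `session_id`, `type`, `inpoint`, `is_playing`) are assumptions modelled on the sample records, and fetching from the usertracking API is left out. The batching reflects the SQS limit of 10 messages per batch request.

```python
import json

SQS_MAX_BATCH = 10  # SQS accepts at most 10 entries per batch send


def build_event(action, episode):
    """Combine one user action with its episode metadata into an event record."""
    return {
        "action_id": action["id"],
        "mpid": action["mpid"],
        "session_id": action["session_id"],
        "action": {
            "type": action["type"],
            "inpoint": action["inpoint"],
            "is_playing": action["is_playing"],
        },
        "episode": episode,
    }


def to_sqs_batches(events):
    """Serialize events and group them into lists of at most 10 SQS entries."""
    entries = [
        {"Id": str(i), "MessageBody": json.dumps(event)}
        for i, event in enumerate(events)
    ]
    return [
        entries[i:i + SQS_MAX_BATCH]
        for i in range(0, len(entries), SQS_MAX_BATCH)
    ]
```

Each returned batch has the shape expected by the `Entries` argument of boto3's SQS `send_message_batch` call; the queue URL and client setup are omitted here.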

Elastic Stack (formerly ELK)

  • Logstash - multi-purpose data pipeline component
    • geoip lookups
    • browser useragent parsing
  • Elasticsearch - data store for search & analytics
    • automated snapshots to s3
    • indexes older than x days: archive & delete (future)
  • Kibana - analytics & visualization platform
    • internal dashboards
    • embed graphs in LMS course pages (future)
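
For the planned "archive & delete" step, one way to pick out old indexes is by the date in their names. The sketch below assumes daily indexes named `logstash-YYYY.MM.DD` (the long-standing Logstash default); the slides do not say how the DCE indexes are named, and the function name is hypothetical.

```python
from datetime import date, datetime, timedelta


def indexes_older_than(index_names, max_age_days, today, prefix="logstash-"):
    """Return the names of daily indexes dated before today - max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    expired = []
    for name in index_names:
        if not name.startswith(prefix):
            continue  # not one of the dated event indexes
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix is not a date, so leave the index alone
        if day < cutoff:
            expired.append(name)
    return sorted(expired)
```

The returned names would then be snapshotted and deleted by whatever tooling manages the cluster; that part is outside this sketch.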
{
  "mpid": "f21fcc8f-93...",
  "ip": "12.34.56.78",
  "huid": "12346789",
  "is_live": 0,
  "action": {...},
  "session_id": "15fe8z...",
  "action_id": 72922555,
  "@version": "1",
  "@timestamp": "2016-01-01...",
  "episode": {...},
  "host": "127.0.0.1",
  "geoip": {...},
  "ua": {...}
}
{
  "inpoint": 4790,
  "type": "HEARTBEAT",
  "is_playing": true
}
{
  "series": "20160114328",
  "course": "Data Science",
  "type": "L10",
  "title": "Lecture 10",
  "year": "2016",
  "term": "01",
  "cdn": "14328",
  "duration": 5978537,
  "start": "2017-04-03T21:29:00Z"
}
{
  "ip": "12.34.56.78",
  "country_code2": "SA",
  "country_code3": "SAU",
  "country_name": "Saudi Arabia",
  "continent_code": "AS",
  "timezone": "Asia/Riyadh",
  "location": [45,25]
}
{
  "name": "Chrome",
  "os": "Windows 7",
  "os_name": "Windows 7",
  "device": "Other",
  "major": "47",
  "minor": "0",
  "patch": "2526"
}
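
Given event documents shaped like the samples above, a time series of active sessions can be expressed as a date histogram with a distinct count of `session_id` per bucket. The sketch below only builds the query body as a Python dict; the function name and default interval are assumptions, and the `interval` key applies to older Elasticsearch releases (newer ones use `fixed_interval` or `calendar_interval` instead).

```python
def active_sessions_query(start, end, interval="5m", is_live=None):
    """Build an Elasticsearch search body counting distinct sessions over time."""
    filters = [{"range": {"@timestamp": {"gte": start, "lt": end}}}]
    if is_live is not None:
        # The sample record stores is_live as 0 or 1.
        filters.append({"term": {"is_live": 1 if is_live else 0}})
    return {
        "size": 0,  # only the aggregation buckets are needed, not the hits
        "query": {"bool": {"filter": filters}},
        "aggs": {
            "over_time": {
                "date_histogram": {"field": "@timestamp", "interval": interval},
                "aggs": {
                    "active_sessions": {"cardinality": {"field": "session_id"}}
                },
            }
        },
    }
```

For example, `active_sessions_query("now-1h", "now", is_live=True)` restricts the series to live viewing in the last hour. Cardinality counts in Elasticsearch are approximate, which is usually acceptable for dashboard graphs.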

Next Steps (in no particular order)

  • Convert Workflow Browser & WPI Dashboard to new Elastic Stack deployment
  • Query API for user event data
  • Embeddable visualizations
  • Separate usertracking collection from engage

Thanks!

Here's some links!

By James Luker
