Centralized Data Collection and Use

The Many Uses of Data

(Data Marts)

  • Educational Data Mining
  • Learning Analytics
  • Application Reporting
  • Usage Data/Metrics

Educational Data Mining

Concerned with developing methods for exploring the unique types of data that come from educational settings, and using those methods to better understand students, and the settings in which they learn.

EDM History

  • Annual workshops begin in 2005
  • 1st international conference in Montreal in 2008
  • Journal of Educational Data Mining in 2009
  • First Handbook of EDM in 2010
  • International EDM Society founded in 2011

EDM Questions

  • What sequence of topics is most effective for a specific student?
  • Which student actions are associated with better learning and higher grades?
  • Which actions indicate satisfaction and engagement?
  • What features of an online environment lead to better learning outcomes?

EDM Analytics

  • Combine statistics, machine learning and data mining
  • Student and instructional modeling
  • Highly parallelized batch development
  • Traditional big data toolsets
  • Simulation and algorithmic processing

Learning Analytics and Knowledge

Learning analytics is the measurement, collection, analysis and reporting of data about learners and their contexts, for purposes of understanding and optimising learning and the environments in which it occurs.

LAK History

  • LAK conference series established in 2010
  • Society for Learning Analytics Research (SoLAR) founded in 2011

LAK Questions

  • When are students ready to move on to the next topic?
  • When is a student at risk of not completing a course?
  • What grade is a student likely to receive without intervention?
  • Does a student require intervention?

[Figure: Educational Data Mining and Learning Analytics and Knowledge, with application areas: Grouping, Data Visualization, Recommendations, Predicting Student Performance, Platform Distinction (Mobile/Desktop), Learner Models, Behavior Detection, Informing Instruction, Planning & Scheduling, Courseware Construction]

EDM vs LAK?

  • Discovery
    • EDM: Automated discovery using human judgement
    • LAK: Human judgement using automated discovery
  • Reduction
    • EDM: Component reduction, analysis of relationships
    • LAK: Overall understanding of system as a whole
  • Origins
    • EDM: Educational software and student modeling
    • LAK: Intelligent curriculum, outcome prediction and intervention
  • Adaptation
    • EDM: Automated adaptation
    • LAK: Inform and empower instructors/learners
  • Techniques
    • EDM: Classification, clustering, Bayesian, relationship mining
    • LAK: SNA, sentiment analysis, influence analytics, prediction

Application Reporting

The real-time acquisition, faceting and aggregation of learning data at any stage of the educational hierarchy.

Reporting Questions

  • How many students have completed the Unit 4, Week 3 assessment?
  • How do the scores for the Grade 3 project based learning assignment compare across classes?
  • Which teachers have the most students in Tier 2 for Grade 3, Comprehension?
  • Which schools are using the Wonder Works intervention program most effectively?

Application Reporting

  • Stand-alone and embedded reporting surfaced to user via web using graphical and tabular display
  • Real-time needs require a highly performant (sub-second) SLA
  • Faceting and aggregation for hundreds to hundreds of thousands of items
  • Date-based histograms and custom aggregation scripts
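
As a rough illustration of the faceting, aggregation, and date-based histogram bullets above, the sketch below groups a handful of learning events by class and by ISO week. The event fields (class_id, student, score, day) and the helper names are assumptions made for this example, not part of any existing product schema. A production system would delegate this work to the reporting store rather than compute it in application code.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical learning events; field names are illustrative only.
EVENTS = [
    {"class_id": "3A", "student": "s1", "score": 80.0, "day": date(2015, 3, 2)},
    {"class_id": "3A", "student": "s2", "score": 90.0, "day": date(2015, 3, 2)},
    {"class_id": "3B", "student": "s3", "score": 70.0, "day": date(2015, 3, 3)},
    {"class_id": "3B", "student": "s4", "score": 60.0, "day": date(2015, 3, 9)},
]


def facet_mean(events, facet, metric):
    """Group events by a facet field and average a numeric metric per group."""
    buckets = defaultdict(list)
    for event in events:
        buckets[event[facet]].append(event[metric])
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}


def weekly_histogram(events, date_field="day"):
    """Count events per ISO (year, week) bucket: a simple date-based histogram."""
    counts = Counter()
    for event in events:
        iso = event[date_field].isocalendar()
        counts[(iso[0], iso[1])] += 1
    return dict(counts)


mean_by_class = facet_mean(EVENTS, "class_id", "score")
events_per_week = weekly_histogram(EVENTS)
```

Here mean_by_class answers a question such as "how do scores compare across classes?", while events_per_week gives the counts behind a weekly completion chart.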

Usage Data and Metrics

The collection and analysis of successive measurements made over time.

Usage/Metrics Questions

  • When instructing "factors and multiples", how often do educators assign mini-games?
  • What percent of above-level students use hints?
  • How many times per week does the educator consult the planner?
  • Where should development time be focused (feature prioritization)?
  • Does customer training result in higher usage of new features?

Time Series Analysis

  • Extraction of meaningful statistics
  • Forecasting to predict future values
  • Regression analysis on cross-series data
  • Workflow analysis
  • Feature input generation for EDM analysis
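
The first two bullets above (extracting statistics and forecasting) can be sketched with nothing but the standard library. The weekly visit counts are made-up sample data, and the function names are hypothetical. A straight-line least-squares fit stands in here for whatever forecasting model the platform would actually use.

```python
from statistics import mean

# Hypothetical weekly counts of planner visits for one educator.
WEEKLY_VISITS = [2.0, 4.0, 6.0, 8.0, 10.0]


def moving_average(series, window):
    """Trailing moving average; returns one value per full window."""
    return [mean(series[i - window:i]) for i in range(window, len(series) + 1)]


def linear_forecast(series, steps_ahead=1):
    """Fit y = a + b*t by ordinary least squares and extrapolate."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = mean(series)
    slope_num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    slope_den = sum((t - t_mean) ** 2 for t in range(n))
    slope = slope_num / slope_den
    intercept = y_mean - slope * t_mean
    return intercept + slope * (n - 1 + steps_ahead)


smoothed = moving_average(WEEKLY_VISITS, window=2)
next_week = linear_forecast(WEEKLY_VISITS)
```

The smoothed series and the forecast value are also the kind of derived features that could feed the EDM analysis mentioned in the last bullet.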

Platform Architecture

Different Data Types Require Different Processing Workflows

Platform Requirements

  • Massively scalable architecture
  • Support for parallel processing analytics (EDM/LAK)
  • Support for real-time reporting/analytic queries
  • Support for time series data
  • Easily accommodates new learning events
  • Achieves desired SLAs for each data type

SLA Variance

  • Availability (time for arriving event to be available to client)
    • EDM/LAK: minutes
    • Reporting: < 2 seconds
    • Usage/Metrics: < 10 seconds
  • Query (time for query request/response roundtrip)
    • EDM/LAK: hours or days
    • Reporting: < 2 seconds
    • Usage/Metrics: < 5 seconds
  • New Input (maintenance time for new event type to be available for use)
    • EDM/LAK: minutes
    • Reporting: none
    • Usage/Metrics: none
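
One way to make these targets testable is to encode them as data and check measured latencies against them. The sketch below covers only the Reporting and Usage/Metrics columns, because the EDM/LAK targets ("minutes", "hours or days") are not given as precise bounds. The Sla class, the SLAS mapping, and meets_sla are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sla:
    """Upper bounds, in seconds, taken from the SLA figures above."""
    availability_s: float  # arriving event is available to the client
    query_s: float         # query request/response roundtrip


# Only the data types with numeric bounds are encoded here.
SLAS = {
    "reporting": Sla(availability_s=2.0, query_s=2.0),
    "usage_metrics": Sla(availability_s=10.0, query_s=5.0),
}


def meets_sla(data_type, availability_s, query_s):
    """Return True when both measured latencies are strictly under the bounds."""
    sla = SLAS[data_type]
    return availability_s < sla.availability_s and query_s < sla.query_s
```

For example, meets_sla("reporting", 1.2, 0.4) passes, while a 3-second reporting query does not.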

LEAP Platform Proposal

Separation of Concerns

Message Data Store

  • High availability
  • High performance
  • Schema validation
  • Audit and logging
  • Appropriate store based on event type

Data Store Consumers

  • Educational Data Mining
  • Learning analytics
  • ConnectED products and reporting
  • Next Gen Math
  • Dev Ops dashboards and notifications

Message Data Store

  • Very lightweight message ingestion system
  • Immediate schema validation upon message POST
  • Immediate availability in:
    • document-based store
    • S3 (for EMR/Hadoop)
    • time-series database
    • transformation pipeline (e.g. Kafka/Storm)
  • Time required to add new event type?
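
The ingestion flow described above (validate on POST, then make the event available to every store) might look roughly like the following. This is a minimal in-memory sketch under stated assumptions: the event envelope (type, schema_version, payload), the assessment_completed event, the status codes, and the list-backed sinks are all hypothetical, and the hand-rolled type check stands in for a real schema validator.

```python
# Hypothetical schemas keyed by (event type, schema version).
SCHEMAS = {
    ("assessment_completed", "1.0.0"): {"student_id": str, "score": float},
}

# In-memory stand-ins for the downstream stores named above.
SINKS = {"document_store": [], "s3": [], "time_series": [], "pipeline": []}


def validate(event):
    """Return a list of validation errors; an empty list means the event is valid."""
    key = (event.get("type"), event.get("schema_version"))
    schema = SCHEMAS.get(key)
    if schema is None:
        return [f"unknown event type/version: {key}"]
    payload = event.get("payload", {})
    errors = []
    for field, expected in schema.items():
        if field not in payload:
            errors.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected):
            errors.append(f"wrong type for field: {field}")
    return errors


def ingest(event):
    """Validate on receipt, then fan the event out to every sink."""
    errors = validate(event)
    if errors:
        return {"status": 400, "errors": errors}
    for sink in SINKS.values():
        sink.append(event)
    return {"status": 202, "errors": []}
```

In this sketch, adding a new event type amounts to adding one entry to SCHEMAS, which is one way to keep the "time required to add new event type" close to zero.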

Parallel Development Track

  • The Message Data Store (MDS) can be stood up in AWS by early Q3/15 to support the full Reading Wonders requirements
  • Engrade directs new events to MDS
  • ConnectED learning event factory writes to MDS
  • Elasticsearch cluster provides necessary output APIs for ConnectED reporting
  • Elasticsearch cluster may provide direct output for NGMS?

Layered Approach to Data Collection

[Figure: layered approach to data collection. Components shown: ConnectED, Engrade, Next Gen; Distributed Messaging System (Apache Kafka) with Filter, Transform, Normalize, Logging, Route; Schema Validation 1.0.0 and 1.1.0; Realtime Query (Elasticsearch); Amazon EMR (Hadoop); Educational Data Mining; Usage and Metrics; Learning Analytics Platform (student and instructional modeling, highly parallelized batch development, simulation and algorithmic processing); ConnectED Reporting; DLE Dashboard]

Applied Analytics - Progressive Enhancement

[Figure: applied analytics for the question "Which resources should I recommend to a student in order to boost their comprehension - main idea ability?" Components shown: Distributed Messaging System (Apache Kafka); Learning Analytics Platform; Step 1 - Student Profile (Student 1, Student 2, Student 3; values 84.8, 92.6, 69.0); Step 2 - Similar Items Model; Step 3 - Item-Item Collaborative Filtering; Student Comprehension, Main Idea; Recommendation 1; Recommendation 2]
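
The three steps named in the figure (student profile, similar-items model, item-item collaborative filtering) can be sketched as below. This is one common formulation of item-item collaborative filtering using cosine similarity, not necessarily the model used on the platform. The RATINGS data, the resource names, and the function names are invented for illustration.

```python
from math import sqrt

# Step 1 (student profile): hypothetical student -> {resource: outcome score}.
RATINGS = {
    "student_1": {"res_a": 5.0, "res_b": 4.0},
    "student_2": {"res_a": 4.0, "res_b": 5.0, "res_c": 1.0},
    "student_3": {"res_b": 1.0, "res_c": 5.0},
}


def item_vectors(ratings):
    """Invert student->item scores into item->student vectors."""
    items = {}
    for student, scores in ratings.items():
        for item, score in scores.items():
            items.setdefault(item, {})[student] = score
    return items


def cosine(u, v):
    """Step 2 (similar items): cosine similarity of two sparse dict vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def recommend(ratings, student, top_n=2):
    """Step 3: rank unseen items by similarity-weighted scores of seen items."""
    items = item_vectors(ratings)
    seen = ratings[student]
    scores = {}
    for candidate in items.keys() - seen.keys():
        sims = {s: cosine(items[candidate], items[s]) for s in seen}
        total = sum(sims.values())
        weighted = sum(sims[s] * seen[s] for s in seen)
        scores[candidate] = weighted / total if total else 0.0
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

Calling recommend(RATINGS, "student_1") returns the resources that student has not yet used, ordered by predicted benefit. With real data the similarity model would be computed in batch and only the final lookup done at request time.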

Data Platform

By James Cook
