Centralized Data Collection and Use
The Many Uses of Data
(Data Marts)
- Educational Data Mining
- Learning Analytics
- Application Reporting
- Usage Data/Metrics
Educational Data Mining
Concerned with developing methods for exploring the unique types of data that come from educational settings, and using those methods to better understand students, and the settings in which they learn.
EDM History
- Annual workshops begin in 2005
- 1st international conference in Montreal in 2008
- Journal of Educational Data Mining in 2009
- First Handbook of EDM in 2010
- International EDM Society founded in 2011
EDM Questions
- What sequence of topics is most effective for a specific student?
- Which student actions are associated with better learning and higher grades?
- Which actions indicate satisfaction and engagement?
- What features of an online environment lead to better learning outcomes?
EDM Analytics
- Combine statistics, machine learning and data mining
- Student and instructional modeling
- Highly parallelized batch development
- Traditional big data toolsets
- Simulation and algorithmic processing
Learning Analytics and Knowledge
Learning analytics is the measurement, collection, analysis and reporting of data about learners and their contexts, for purposes of understanding and optimising learning and the environments in which it occurs.
LAK History
- LAK conference series established in 2010
- Society for Learning Analytics Research (SoLAR) in 2011
LAK Questions
- When are students ready to move on to the next topic?
- When is a student at risk of not completing a course?
- What grade is a student likely to receive without intervention?
- Does a student require intervention?
EDM vs LAK?
| EDM | | LAK |
|---|---|---|
| Automated discovery using human judgement | Discovery | Human judgement using automated discovery |
| Component reduction, analysis of relationships | Reduction | Overall understanding of system as a whole |
| Educational software and student modeling | Origins | Intelligent curriculum, outcome prediction and intervention |
| Automated adaptation | Adaptation | Inform and empower instructors/learners |
| Classification, clustering, Bayesian, relationship mining | Techniques | SNA, sentiment analysis, influence analytics, prediction |
Application Reporting
The real-time acquisition, faceting and aggregation of learning data at any stage of the educational hierarchy.
Reporting Questions
- How many students have completed the Unit 4, Week 3 assessment?
- How do the scores for the Grade 3 project based learning assignment compare across classes?
- Which teachers have the most students in Tier 2 for Grade 3, Comprehension?
- Which schools are using the Wonder Works intervention program most effectively?
Application Reporting
- Stand-alone and embedded reporting surfaced to user via web using graphical and tabular display
- Real-time needs require a highly performant (sub-second) SLA
- Faceting and aggregation for hundreds to hundreds of thousands of items
- Date-based histograms and custom aggregation scripts
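The faceting, aggregation and date-based histogram ideas above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the event records and field names (`school`, `grade`, `score`, `completed_at`) are hypothetical, and a production system would push these operations down to the reporting store rather than compute them in application code.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical learning events; field names are illustrative only.
EVENTS = [
    {"school": "North", "grade": 3, "score": 82, "completed_at": "2015-03-02T09:15:00"},
    {"school": "North", "grade": 3, "score": 74, "completed_at": "2015-03-02T13:40:00"},
    {"school": "South", "grade": 3, "score": 91, "completed_at": "2015-03-03T10:05:00"},
    {"school": "South", "grade": 4, "score": 67, "completed_at": "2015-03-03T11:30:00"},
]


def facet_counts(events, field):
    """Count events per distinct value of `field` (a facet)."""
    return dict(Counter(e[field] for e in events))


def mean_by(events, group_field, value_field):
    """Average `value_field` within each `group_field` bucket."""
    buckets = defaultdict(list)
    for e in events:
        buckets[e[group_field]].append(e[value_field])
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}


def date_histogram(events, ts_field):
    """Count events per calendar day, ordered by date."""
    days = Counter(
        datetime.fromisoformat(e[ts_field]).date().isoformat() for e in events
    )
    return dict(sorted(days.items()))


if __name__ == "__main__":
    print(facet_counts(EVENTS, "school"))
    print(mean_by(EVENTS, "school", "score"))
    print(date_histogram(EVENTS, "completed_at"))
```

Each function corresponds to one reporting question shape: "how many per group", "how do scores compare across groups", and "how does activity vary by day".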
Usage Data and Metrics
The collection and analysis of successive measurements made over time.
Usage/Metrics Questions
- When instructing "factors and multiples", how often do educators assign mini-games?
- What percent of above-level students use hints?
- How many times per week does the educator consult the planner?
- Where should development time be focused (feature prioritization)?
- Does customer training result in higher usage of new features?
Time Series Analysis
- Extraction of meaningful statistics
- Forecasting to predict future values
- Regression analysis on cross-series data
- Workflow analysis
- Feature input generation for EDM analysis
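As a rough illustration of the first two items above (extracting statistics and forecasting), the sketch below computes a trailing moving average and a least-squares linear trend over a usage series. The series itself is hypothetical (weekly planner visits for one educator), and a straight-line trend is just one simple forecasting choice, not a method prescribed by the platform.

```python
from statistics import mean

# Hypothetical weekly counts of planner visits for one educator.
WEEKLY_VISITS = [4, 6, 5, 7, 8, 9]


def moving_average(series, window):
    """Trailing moving average; yields one value per full window."""
    return [
        mean(series[i - window + 1 : i + 1])
        for i in range(window - 1, len(series))
    ]


def linear_trend(series):
    """Least-squares slope and intercept against the index 0..n-1."""
    n = len(series)
    x_bar = (n - 1) / 2
    y_bar = mean(series)
    sxx = sum((x - x_bar) ** 2 for x in range(n))
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(series))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def forecast(series, steps_ahead):
    """Extrapolate the fitted line `steps_ahead` points past the last sample."""
    slope, intercept = linear_trend(series)
    return intercept + slope * (len(series) - 1 + steps_ahead)


if __name__ == "__main__":
    print(moving_average(WEEKLY_VISITS, 3))
    print(forecast(WEEKLY_VISITS, 1))
```

The smoothed values or the fitted slope could also serve as input features for EDM analysis, which is the last item in the list.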
Platform Architecture
Different Data Types Require Different Processing Workflows
Platform Requirements
- Massively scalable architecture
- Support for parallel processing analytics (EDM/LAK)
- Support for real-time reporting/analytic queries
- Support for time series data
- Easily accommodates new learning events
- Achieves desired SLAs for each data type
SLA Variance
| SLA | EDM/LAK | Reporting | Usage/Metrics |
|---|---|---|---|
| Availability: time for arriving event to be available to client | minutes | < 2 seconds | < 10 seconds |
| Query: time for query request/response roundtrip | hours or days | < 2 seconds | < 5 seconds |
| New Input: maintenance time for new event type to be available for use | minutes | none | none |
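One way to make these targets checkable is to encode the numeric cells as data and compare observed latencies against them. The sketch below is an assumption about how that might look, not part of the proposal: the keys (`edm_lak`, `reporting`, `usage`) and the function name are hypothetical, and cells the table gives only qualitatively ("minutes", "hours or days") are left as `None` rather than guessed.

```python
# Numeric targets copied from the SLA table, in seconds. None marks cells
# the table states only qualitatively ("minutes", "hours or days").
SLA_SECONDS = {
    "edm_lak": {"availability": None, "query": None},
    "reporting": {"availability": 2.0, "query": 2.0},
    "usage": {"availability": 10.0, "query": 5.0},
}


def meets_sla(data_type, metric, observed_seconds):
    """Return True if the observation is under the target, or no numeric target exists."""
    limit = SLA_SECONDS[data_type][metric]
    return limit is None or observed_seconds < limit


if __name__ == "__main__":
    print(meets_sla("reporting", "query", 0.8))
    print(meets_sla("usage", "availability", 12.0))
```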
LEAP Platform Proposal
Separation of Concerns
Message Data Store
- High availability
- High performance
- Schema validation
- Audit and logging
- Appropriate store based on event type
Data Store Consumers
- Educational Data Mining
- Learning analytics
- ConnectED products and reporting
- Next Gen Math
- DevOps dashboards and notifications
Message Data Store
- Very lightweight message ingestion system
- Immediate schema validation upon message POST
- Immediate availability in:
  - document-based store
  - S3 (for EMR/Hadoop)
  - time-series database
- Transformation pipeline (e.g. Kafka/Storm)
- Time required to add new event type?
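The ingestion flow described above (validate on POST, then make the event available in every downstream store) might be sketched as follows. Everything here is an assumption for illustration: the required fields, the `InMemoryStore` stand-in, and the 202/400 status codes are hypothetical, and real stores would be a document database, S3 and a time-series database behind the same `write` interface.

```python
# Hypothetical minimal schema: field name -> expected Python type.
REQUIRED_FIELDS = {"event_type": str, "actor_id": str, "timestamp": str}


class InMemoryStore:
    """Stand-in for a real backing store; keeps written events in a list."""

    def __init__(self, name):
        self.name = name
        self.records = []

    def write(self, event):
        self.records.append(event)


def validate(event):
    """Return a list of schema errors; an empty list means the event is valid."""
    errors = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in event:
            errors.append(f"missing field: {field}")
        elif not isinstance(event[field], expected):
            errors.append(f"wrong type for {field}")
    return errors


def ingest(event, stores):
    """Validate first; only valid events are fanned out to every store."""
    errors = validate(event)
    if errors:
        return {"status": 400, "errors": errors}
    for store in stores:
        store.write(event)
    return {"status": 202, "errors": []}


if __name__ == "__main__":
    stores = [InMemoryStore("document"), InMemoryStore("s3"), InMemoryStore("timeseries")]
    event = {
        "event_type": "assessment_completed",
        "actor_id": "student-1",
        "timestamp": "2015-03-02T09:15:00Z",
    }
    print(ingest(event, stores))
    print(ingest({"event_type": "assessment_completed"}, stores))
```

Rejecting invalid messages before any write keeps the stores consistent with each other. In this sketch, supporting a new event type would mean extending the schema only, which bears on the open question in the last bullet.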
Parallel Development Track
- Message Data Store (MDS) can be stood up in AWS by early Q3/15 to support the full Reading Wonders requirements
- Engrade directs new events to MDS
- ConnectED learning event factory writes to MDS
- ElasticSearch cluster provides necessary output APIs for ConnectED reporting
- ElasticSearch cluster may provide direct output for NGMS?
Layered Approach to Data Collection
Applied Analytics - Progressive Enhancement
Data Platform
By James Cook