Mining User Trajectories
in Electronic Textbooks
Ahcène Boubekki
Shailee Jain
Ulf Brefeld
Leuphana, Lüneburg
University of Texas at Austin
Leuphana, Lüneburg
mBook
Multimedia history textbook used in German-speaking schools in Belgium.
Since 2013 : 6 chapters
725 pages
478 galleries
3 000 users
37 000 sessions
780 000 clicks
75 tracked events
The Project:
Research questions
Do all the pupils use the textbook the same way?
Is their behavior influenced by their ability in history or IT skills?
Can we predict the pupils' ability in history or IT skills from their behavior?
How does the teacher influence these behaviors?
No
Local approach
Markov Chains
Cluster 1
Cluster 2
Cluster 3
Markov Chains: Tracking
Markov Chains: Modeling
Markov Chains: Clustering
Markov Chains: Summary
Advantages
- Simple,
- It can handle all the events,
- It can be made more complex.
Drawbacks
- Simple,
- Tends to overfit if made more complex,
- It does not handle event durations easily.
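As a minimal sketch of the modeling step, the snippet below estimates a first-order Markov chain from session logs by counting consecutive event pairs. The event names and the helper `transition_matrix` are hypothetical and only serve as an illustration; the slides do not specify the exact model used.

```python
from collections import Counter, defaultdict


def transition_matrix(sessions):
    """Estimate first-order transition probabilities from event sequences."""
    counts = defaultdict(Counter)
    for events in sessions:
        # Count every pair (current event, next event) within a session.
        for current, nxt in zip(events, events[1:]):
            counts[current][nxt] += 1
    # Normalize each row of counts into a probability distribution.
    return {
        state: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for state, followers in counts.items()
    }


# Hypothetical event names; the real log distinguishes 75 tracked events.
sessions = [
    ["page", "gallery", "page", "page"],
    ["page", "page", "gallery"],
]
probs = transition_matrix(sessions)
print(probs["page"])
```

Note that the counts ignore how long each event lasts, which is the duration drawback listed above.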
Global approach
Trajectories
How do we compare paths in a graph?
We transform them into spatio-temporal trajectories.
What do we need:
- a distance between nodes: shortest path,
- a distance between trajectories that satisfies three properties. There are many: Hausdorff, DTW...
Property 1
The distance can compare two trajectories of different durations.
Property 2
The distance is independent of the speed.
Property 3
Repetition of cycles does not modify the distance.
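The shortest-path distance between nodes can be sketched with a breadth-first search on an unweighted graph. The page graph below is a hypothetical toy example, and treating every link as having length 1 is an assumption, not something the slides state.

```python
from collections import deque


def shortest_path_lengths(graph, source):
    """Breadth-first search: number of hops from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


# Hypothetical page graph (adjacency lists), not the real mBook structure.
graph = {
    "ch1": ["ch1.1", "ch2"],
    "ch1.1": ["ch1", "ch1.2"],
    "ch1.2": ["ch1.1"],
    "ch2": ["ch1"],
}
dist_from_ch2 = shortest_path_lengths(graph, "ch2")
print(dist_from_ch2)
```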
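Hausdorff
Definition
Let $X$ and $Y$ be two non-empty subsets of a metric space $(M, d)$.
We define their Hausdorff distance by
$$d_H(X, Y) = \max\left\{ \sup_{x \in X} \inf_{y \in Y} d(x, y),\; \sup_{y \in Y} \inf_{x \in X} d(x, y) \right\}$$
Drawbacks
Outliers govern the distance.
Independent of time/duration.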
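Both drawbacks can be seen on a small example. The sketch below implements the Hausdorff distance for finite sets; using page numbers as points and the absolute difference `abs_diff` as ground distance is an assumption made for illustration.

```python
def hausdorff(points_a, points_b, dist):
    """Hausdorff distance between two finite, non-empty collections of points."""

    def directed(xs, ys):
        # Largest distance from a point of xs to its nearest point in ys.
        return max(min(dist(x, y) for y in ys) for x in xs)

    return max(directed(points_a, points_b), directed(points_b, points_a))


def abs_diff(x, y):
    """Illustrative ground distance between page numbers."""
    return abs(x - y)


regular = [1, 2, 3]
with_outlier = [1, 2, 3, 40]   # one stray visit to page 40
reordered = [3, 2, 1, 2]       # same pages, different order and length

print(hausdorff(regular, with_outlier, abs_diff))
print(hausdorff(regular, reordered, abs_diff))
```

A single outlying page determines the first distance, while the second is zero even though the visiting order differs.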
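Dynamic Time Warping
Definition
Given two trajectories $P = (p_1, \dots, p_n)$ and $Q = (q_1, \dots, q_m)$, dynamic time warping (DTW) computes an alignment $\pi = ((i_1, j_1), \dots, (i_K, j_K))$ with the following properties:
$$(i_1, j_1) = (1, 1), \qquad (i_K, j_K) = (n, m)$$
$$(i_{k+1} - i_k,\; j_{k+1} - j_k) \in \{(1, 0), (0, 1), (1, 1)\}$$
Finally, the distance between $P$ and $Q$ is given by:
$$\mathrm{DTW}(P, Q) = \min_{\pi} \sum_{k=1}^{K} d(p_{i_k}, q_{j_k})$$
Drawbacks
Very slow.
Temporal order broken.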
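A minimal sketch of the standard dynamic-programming formulation of DTW follows. The sequences of page numbers and the ground distance `abs_diff` are illustrative assumptions. The double loop makes the cost quadratic in the trajectory lengths, which is why DTW is slow on long sessions.

```python
import math


def dtw(seq_a, seq_b, dist):
    """Dynamic time warping cost between two sequences (textbook O(n*m) recursion)."""
    n, m = len(seq_a), len(seq_b)
    # cost[i][j]: best alignment cost of the first i items of seq_a
    # with the first j items of seq_b.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # advance in seq_a only
                cost[i][j - 1],      # advance in seq_b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


def abs_diff(x, y):
    """Illustrative ground distance between page numbers."""
    return abs(x - y)


slow_reader = [1, 1, 2, 2, 3, 3]
fast_reader = [1, 2, 3]
print(dtw(slow_reader, fast_reader, abs_diff))
print(dtw([1, 2, 3], [1, 2, 4], abs_diff))
```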
Let's build a distance
Property 1
The distance can compare two trajectories of different durations.
Solution: stop the comparison when the shorter trajectory ends.
Property 2
The distance is independent of the speed.
Solution: normalize the duration.
Property 3
Repetition of cycles does not modify the distance.
Solution: normalize the duration.
Δ distance
Definition:
Δ(P, Q) is defined as the normalized area spanned between two trajectories P and Q until the shortest one ends.
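One possible reading of this area-based definition is sketched below; it is not the authors' implementation. It assumes that a trajectory is a list of `(duration, page)` visits, that the distance between two pages is the absolute difference of their page numbers (a shortest-path distance could be substituted), and that "normalized" means dividing by the compared duration. Any additional rescaling of time for speed independence is left out.

```python
def to_segments(visits):
    """Turn [(duration, page), ...] into [(start, end, page), ...]."""
    segments, start = [], 0.0
    for duration, page in visits:
        segments.append((start, start + duration, page))
        start += duration
    return segments


def page_at(segments, t):
    """Page being viewed at time t (trajectories are step functions)."""
    for start, end, page in segments:
        if start <= t < end:
            return page
    raise ValueError("t is outside the trajectory")


def area_distance(visits_a, visits_b):
    """Area between two step trajectories, up to the end of the shorter one,
    divided by that duration. Assumes non-empty visits with positive durations."""
    seg_a, seg_b = to_segments(visits_a), to_segments(visits_b)
    horizon = min(seg_a[-1][1], seg_b[-1][1])
    # Both trajectories are constant between consecutive cut points.
    cuts = {0.0, horizon}
    for start, end, _ in seg_a + seg_b:
        cuts.update(t for t in (start, end) if t < horizon)
    cuts = sorted(cuts)
    area = 0.0
    for lo, hi in zip(cuts, cuts[1:]):
        mid = (lo + hi) / 2
        area += abs(page_at(seg_a, mid) - page_at(seg_b, mid)) * (hi - lo)
    return area / horizon


pupil_a = [(2, 1), (2, 3)]          # 2 min on page 1, then 2 min on page 3
pupil_b = [(1, 1), (1, 2), (1, 3)]  # 1 min each on pages 1, 2, 3
print(area_distance(pupil_a, pupil_b))
```

In this toy example the two pupils only disagree during the second minute, by one page, so the area is 1 over a compared duration of 3.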



Distance comparison
Comparison
Setting
- 1 day,
- 41 sessions,
- 37 users,
- 4 teachers,
- 6 classes/groups.

WW2
Antiquity
Reformation
Hypothesis
Pupils in the same class should be grouped together.
Algorithm
k-means with up to 20 clusters.

|  | Hausdorff | DTW | Δ |
|---|---|---|---|
| # Clusters | 8 | 9 | 10 |
| Homogeneity | 0.39 | 0.97 | 0.97 |

- Both DTW and Δ successfully detect topics.
- Δ also detects similar/divergent behaviors.
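The slides do not say which homogeneity measure was used. The sketch below assumes the common entropy-based definition, $1 - H(C \mid K) / H(C)$, where $C$ are the true classes and $K$ the clusters. The labels are made up for illustration.

```python
import math
from collections import Counter


def entropy(counts, total):
    """Shannon entropy (natural log) of a distribution given as counts."""
    return -sum(c / total * math.log(c / total) for c in counts if c)


def homogeneity(classes, clusters):
    """1 - H(class | cluster) / H(class); 1.0 means every cluster is pure."""
    n = len(classes)
    h_class = entropy(Counter(classes).values(), n)
    if h_class == 0:
        return 1.0  # a single class is trivially homogeneous
    joint = Counter(zip(clusters, classes))
    cluster_sizes = Counter(clusters)
    h_cond = -sum(
        c / n * math.log(c / cluster_sizes[k]) for (k, _), c in joint.items()
    )
    return 1.0 - h_cond / h_class


# Illustrative labels only: the topic of each session and its assigned cluster.
topics = ["WW2", "WW2", "Antiquity", "Antiquity"]
print(homogeneity(topics, [0, 0, 1, 1]))  # pure clusters
print(homogeneity(topics, [0, 0, 0, 0]))  # one mixed cluster
print(homogeneity(topics, [0, 1, 2, 3]))  # one cluster per session
```

Under this definition, splitting pure clusters further never lowers the score, so homogeneity should be read together with the number of clusters.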
Correlations
Can Δ be used as a psychometrics indicator?
Can Δ be used as a teaching style indicator?
Correlations
Definitions:
$\bar{\Delta}_p(s)$ is the average distance between one pupil $p$ and her classmates during one class session $s$.
Correlations
Five psychometrics scores:

|  | Competency | Knowledge | Motivation | IT Access | IT Skills |
|---|---|---|---|---|---|
| $\bar{\Delta}$ | 0.179 | 0.096 | -0.17 | 0.023 | 0.092 |
| PPM | 0.145 | 0.133 | 0.039 | -0.002 | 0.019 |
| EPM | 0.184 | 0.156 | -0.065 | -0.022 | 0.063 |

|  | Competency | Knowledge | Motivation | IT Access | IT Skills |
|---|---|---|---|---|---|
| $\bar{\Delta}$ | -0.224 | -0.165 | -0.096 | -0.069 | -0.357 |
| PPM | -0.232 | 0.049 | 0.111 | 0.188 | -0.156 |
| EPM | -0.232 | -0.141 | -0.142 | 0.081 | 0.059 |

Activity Indicators
PPM: Pages per minute
EPM: Events per minute
$\bar{\Delta}$: Average distance between one pupil and her classmates
Settings:
400 class sessions between Feb. and July 2017.
Two teachers in two different schools.
Teacher B
Teacher A
p-value < 5% marked in bold face
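The slides do not name the correlation coefficient. Assuming Pearson's coefficient, the correlation between an indicator and a psychometric score could be computed as in the sketch below. The numbers are invented for illustration and are not taken from the study; the significance test is omitted.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented values: one indicator value and one test score per pupil.
indicator = [3.1, 4.0, 4.8, 5.5, 6.2]
competency = [52, 55, 61, 60, 70]
r = pearson(indicator, competency)
print(round(r, 3))
```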
Correlations
Can Δ be used as a psychometrics indicator?
Yes
Can Δ be used as a teaching style indicator?
Correlations
Definitions:
$\bar{\Delta}_p(s)$ is the average distance between one pupil $p$ and her classmates during one class session $s$.
$\bar{\Delta}(s)$ is the average of $\bar{\Delta}_p(s)$ over the pupils during one class session $s$.
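The two averages can be sketched as follows, assuming the pairwise distances of one class session are already available in a nested dictionary. The pupil names, the distances, and the helper names are hypothetical.

```python
def pupil_averages(distances):
    """Average distance between each pupil and her classmates.

    distances[p][q] is the pairwise distance between pupils p and q
    during one class session.
    """
    averages = {}
    for pupil, row in distances.items():
        others = [d for other, d in row.items() if other != pupil]
        averages[pupil] = sum(others) / len(others)
    return averages


def session_average(distances):
    """Average of the per-pupil averages over the whole class session."""
    per_pupil = pupil_averages(distances)
    return sum(per_pupil.values()) / len(per_pupil)


# Hypothetical pairwise distances for a three-pupil session.
distances = {
    "anna": {"ben": 2.0, "cleo": 4.0},
    "ben": {"anna": 2.0, "cleo": 6.0},
    "cleo": {"anna": 4.0, "ben": 6.0},
}
per_pupil = pupil_averages(distances)
print(per_pupil, session_average(distances))
```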
Correlations
The greater the session average $\bar{\Delta}(s)$, the more freedom is given to the pupils.
|  | Competency |
|---|---|
| $\bar{\Delta}$ | 0.179 |
| PPM | 0.145 |
| EPM | 0.184 |

|  | Competency |
|---|---|
| $\bar{\Delta}$ | -0.224 |
| PPM | -0.232 |
| EPM | -0.232 |

Teacher B
Teacher A
The difference is significant.
| 4.48 |
| 5.76 |
Pupils diverging from the teaching style perform worse.
All correlations are significant.
Correlations
Can Δ be used as a psychometrics indicator?
Yes
Can Δ be used as a teaching style indicator?
Yes
Summary
- Track users using spatio-temporal trajectories.
- Built a distance between trajectories: Δ
- Usual clustering algorithms can be used
- Interpretable and meaningful clusterings
- Pupils: a Δ-based indicator correlates with psychometrics scores
- Teachers: a Δ-based indicator captures teaching style.