Opal 2.7

Multilines

Objectives

  • Import time series
  • Handle longitudinal data
  • Export repeatable variables in an analysis-ready format

Multilines Import

Import from data sources in tabular format (CSV, SPSS, SQL) with multiple rows per entity 

ID A B
1 a 1
1 b 2
2 a 3
2 c 1

Internal Representation

  • Entity IDs are unique
  • Variables are "repeatables"
  • Values are "sequences" in occurrence groups
ID  A  B
1   a  1
    b  2
2   a  3
    c  1
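The step from the imported rows to this internal representation can be sketched as below. This is only an illustration in Python, not Opal's actual implementation; the function name `group_rows` and the dict-based record layout are assumptions made for the example.

```python
from collections import defaultdict


def group_rows(rows, id_column="ID"):
    """Collapse multiple rows per entity into one record of value sequences.

    Hypothetical helper: each variable becomes a list ("sequence") whose
    positions correspond to the occurrences found for that entity.
    """
    entities = defaultdict(lambda: defaultdict(list))
    for row in rows:
        entity_id = row[id_column]
        for name, value in row.items():
            if name != id_column:
                entities[entity_id][name].append(value)
    # Convert the nested defaultdicts to plain dicts for readability.
    return {eid: dict(variables) for eid, variables in entities.items()}


# The four rows of the "Multilines Import" table.
rows = [
    {"ID": "1", "A": "a", "B": 1},
    {"ID": "1", "A": "b", "B": 2},
    {"ID": "2", "A": "a", "B": 3},
    {"ID": "2", "A": "c", "B": 1},
]
grouped = group_rows(rows)
```

Each entity ID now appears once, and `grouped["1"]["A"]` holds the sequence `["a", "b"]`.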

Longitudinal Data

  • Same set of variables
  • Different Data Collection Events
T1
ID A B
1 a 1
2 a 3

T2
ID A B
1 b 2
2 c 1

Longitudinal Data

  • View to merge events
ID  A  B  Event
1   a  1  T1
    b  2  T2
2   a  3  T1
    c  1  T2

View of T1 + T2
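A minimal sketch of what such a merging view computes, assuming each event is held as a mapping from entity ID to its values. The function `merge_events` and the choice of `None` for an entity that is absent from an event are assumptions for illustration, not Opal's API.

```python
def merge_events(events, variables):
    """Merge per-event tables into one record of sequences per entity.

    events: mapping of event name -> {entity_id: {variable: value}}
    variables: names of the variables shared by all events
    """
    # Collect entity IDs in first-seen order across all events.
    entity_ids = list(
        dict.fromkeys(eid for table in events.values() for eid in table)
    )
    merged = {}
    for entity_id in entity_ids:
        record = {name: [] for name in variables}
        record["Event"] = []
        for event_name, table in events.items():
            values = table.get(entity_id, {})
            for name in variables:
                # None marks a value missing at this event (assumption).
                record[name].append(values.get(name))
            record["Event"].append(event_name)
        merged[entity_id] = record
    return merged


t1 = {"1": {"A": "a", "B": 1}, "2": {"A": "a", "B": 3}}
t2 = {"1": {"A": "b", "B": 2}, "2": {"A": "c", "B": 1}}
view = merge_events({"T1": t1, "T2": t2}, ["A", "B"])
```

Padding with `None` keeps every sequence the same length as `Event`, so the position of a value always identifies the event it came from.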

Longitudinal Data

  • View to merge events
ID  A  B  Event
1   a  1  T1
    b  2  T2
    b  1  T3
2   a  3  T1
    c  1  T2
          T3
3         T1
    c  3  T2
          T3

View of T1 + T2 + T3

Repeatables Mix

  • Mix of repeatable and not repeatable variables
  • Different occurrence groups
ID  A  B  G  C  D
1   a  1  M  x  1
    b  2     y  2
             z  3
2   a  3  F  x  1
    c  1     x  1
             z  1

Export to CSV and SQL

NOT multilines

  • Value sequences are serialized as a CSV string
  • Value type is altered
  • Not suitable for subset and query
ID A B G C D
1 a,b 1,2 M x,y,z 1,2,3
2 a,c 3,1 F x,x,z 1,1,1
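Why the value type is altered can be seen in a small sketch of this serialization. The helper `serialize_sequences` is hypothetical, and for brevity it does not quote values that themselves contain commas.

```python
def serialize_sequences(record):
    """Flatten each value sequence into a single comma-separated string."""
    flat = {}
    for name, value in record.items():
        if isinstance(value, list):
            # Every element is converted to text, whatever its original type.
            flat[name] = ",".join(str(v) for v in value)
        else:
            flat[name] = value
    return flat


record = {
    "ID": "1",
    "A": ["a", "b"],
    "B": [1, 2],
    "G": "M",
    "C": ["x", "y", "z"],
    "D": [1, 2, 3],
}
flat = serialize_sequences(record)
```

After this step `flat["B"]` is the text `"1,2"` rather than two integers, so a numeric filter or aggregate on B is no longer possible without parsing the string again.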

Export to CSV and SQL

Multilines

  • One row per occurrence
  • Value type is correct
  • Ready for subset and query
ID  A  B  G  C  D
1   a  1  M  x  1
1   b  2  M  y  2
1         M  z  3
2   a  3  F  x  1
2   c  1  F  x  1
2         F  z  1
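The expansion into one row per occurrence can be sketched as follows. The function `to_multilines` is a hypothetical illustration; it assumes that non-repeatable values are repeated on every row and that shorter sequences are padded with `None`, as the table above suggests.

```python
def to_multilines(record):
    """Expand one record of sequences into one row per occurrence."""
    # The longest sequence decides how many rows the entity needs.
    n_rows = max(
        (len(v) for v in record.values() if isinstance(v, list)),
        default=1,
    )
    rows = []
    for i in range(n_rows):
        row = {}
        for name, value in record.items():
            if isinstance(value, list):
                # Shorter occurrence groups are padded with None.
                row[name] = value[i] if i < len(value) else None
            else:
                # Non-repeatable values are repeated on every row.
                row[name] = value
        rows.append(row)
    return rows


record = {
    "ID": "1",
    "A": ["a", "b"],
    "B": [1, 2],
    "G": "M",
    "C": ["x", "y", "z"],
    "D": [1, 2, 3],
}
rows = to_multilines(record)
```

Because each value keeps its original type, `rows` can be filtered directly, for example `[r for r in rows if r["D"] > 1]`.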

Future: Transposition

  • One variable per occurrence
  • Variable derivation wizard
ID A1 A2 B1 B2
1 a b 1 2
2 a c 3 1
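Since transposition is listed as future work, the following is only a sketch of the intended result, not of any existing Opal feature. The function name `transpose` and the 1-based suffix naming (`A1`, `A2`) are assumptions taken from the table above.

```python
def transpose(record):
    """Turn each occurrence of a repeatable variable into its own variable."""
    wide = {}
    for name, value in record.items():
        if isinstance(value, list):
            # Occurrences are numbered from 1: A -> A1, A2, ...
            for position, item in enumerate(value, start=1):
                wide[f"{name}{position}"] = item
        else:
            wide[name] = value
    return wide


wide = transpose({"ID": "1", "A": ["a", "b"], "B": [1, 2]})
```

This keeps one row per entity and preserves value types, at the cost of a number of columns that grows with the longest sequence.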

Future: R

Repeatable variables are currently ignored when exporting to R

  • Support for the tibble format with multilines
  • Support for SPSS, Stata, SAS through haven