How to version my spatial database?
Felix Kunde - @flxku
Research Assistant
MAGDa
https://projekt.beuth-hochschule.de/magda/
10 yrs time series from sensors
Floating Car Data from 500 taxis
Goal1: Crowdsourced Traffic Plattform
Goal2: Mobility Services
Goal3: Spatial Data Streaming / Mining
Goal4: Forecasting / Anomaly Detection
Methods: Statistical & Machine Learning (Python), Linear Referencing (PostGIS), Routing (Graphhopper, Java)
http://excell-mobility.com/
Words
aka What are we talking about?
Provenance*
Keep History of Your data
Lineage*
Store History about Your data
Audit trail / Commit Log
Store History about transformations to Your data (Change Data Capture)
Bi-Temporal*
Lineage + audit Trail
Event SourCing*
Derive State from immutable list of Events
VersioN Control*
Keep Multiple versions of Your dataBASE
MVCC*
Keep Multiple versions of Your data For Concurrent Workflows
Evolution*
Define semantic versions of you data schema when it evolves
Actions
What to do with it?
Query
Browse through Data History
Restore
Recreate a previous version
RePair
Undo certain transformations
RePlay
Debug your log
(Or recover from it)
Long Running Transactions
What you get when Using GIS but with conflict handling
Merge
Union between two versions (Branches)
Migration
Change the version while running In ProductioN mode
v1
v2
v3
Tools
What Can be used
Internal
Versioning inside your database
DIY
Additional Columns, History Tables, Database Trigger
Temporal_Tables extension*
Additional Columns, Relational History Tables, Database Trigger
Pros
Simple setup
Writes are still fast
Easy access of past data
Cons
No support for schema changes
No repair functionality
No branching
Data grows and grows and grows (well, general provenance problem)
pgMemento* "extension"
Generic auditing (JSONB), Transactions instead timestamps, (Event) Trigger
Pros
Can deal with schema changes
JSONBs logs are queryable
Inspect past transformations
Rollback certain transactions covering many tables
Cons
Some performance overhead
Restore tuples is complex
Yet no branching
No audit for all DB objects (Constraints, Indexes, Grants, Extensions etc.)
QGIS Versioning*
QGIS Plugin, concurrent editing, Conflict resolution
See also: pgVersion*
Pros
Easy to setup
QGIS as frontend
PostGIS <> SpatiaLite architecture possible
Partial checkouts
Compare revisions
Cons
Some limitations for table designs
QGIS as frontend
PG-Revisions as database views
Only for GIS-like database schemas
No merging of branches
Doesn'T PostGres have a transaction Log?
Why noT scan It instead of using triggers?
Logical Decoding*
Use decoded replication streams, Output to whatever you want
Pros
No effect to writes
Robust order of commits
Many output formats possible
Auditing in a distributed system (logical replication)
Cons
Lack of available tools
Harder to implement than simply writing triggers
No automatic DDL support (but hey, there's pgAudit*)
Mind replication slots
EXTernal
Versioning Outside of your database
DIY
Put Database files under Version control (Git etc.)
Geogig*
Edit your Geometry in a Git Way
Pros
Hey, it works like Git!
branch, merge, revert etc.
QGIS and GeoServer plugins
Works with complex DBs
Good documentation
Active development
Cons
How to query previous states via SQL?
Cache limits for PG imports?
FlyWay*
Schema Migration
Pros
Simple API, easy config
Defines changes in SQL
Your DB has a version
Idempotence paradigm
Cons
Do not change a migration
No rollback, better redo everything
No diff between revisions
Decide
What you Need
WHEN
Audit Trail (CDC)
How
When you need to know what happened.
Good for short running transactions.
Not optimal if there are many data updates.
Version Control
Concurrent editing with long running transactions which requires conflict resolution
Migration
When your database runs in production (continious integration)
Lineage (Metadata)
Always
Thank YOU! QUESTIONS?
slides.com/fxku/foss4g17_dbversion
There's more
Appendix
A detailed Comparison between tools
Tool | Method | Log type | Revision |
---|---|---|---|
timetravel | Audit Trail | Extra Columns | Timestamps |
temporal_tables | Audit Trail | Shadow Tables | Timestamps |
table_version |
Audit Trail | Shadow Tables | UD revision |
table_log |
Audit Trail | Shadow Tables | Trigger seq |
audit_trigger | Audit Trail | Generic (hstore) | Transactions |
pgMemento | Audit Trail | Generic (jsonb) | Transactions |
CyanAudit | Audit Trail | Generic (pivot) | Transactions |
pgVersion | Version Ctl | Extra Columns | UD revision |
QGIS Versioning | Version Ctl | Extra Columns | UD revision |
GeoGig | Version Ctl | External (binary) | UD revision |
Flyway | Migration | External (SQL) | UD revision |
Liquibase | Migration | External (XML) | UD revision |
Tool | Tracing Var | What is logged? |
---|---|---|
timetravel | Upper bound + PK | rows |
temporal_tables | Revision ID + PK | rows (ignore columns) |
table_version | Revision ID + PK | rows |
table_log | Trigger seq + PK | rows |
audit_trigger | - | rows (ignore columns) |
pgMemento | Synth key | changed columns |
CyanAudit | PK array | rows (toggle columns) |
pgVersion | Revision ID + PK | rows |
QGIS Versioning | Revision ID + PK | rows |
GeoGig | PK or user-defined | changesets |
Flyway | ||
Liquibase |
Tool | Restore | Revert |
---|---|---|
timetravel | SQL | - |
temporal_tables | SQL | - |
table_version | SQL | - |
table_log | SQL + API | API (simple DBs) |
audit_trigger | SQL | - |
pgMemento | SQL + API | API (complex DBs) |
CyanAudit | SQL + API | API (simple DBs) |
pgVersion | - | |
QGIS Versioning | SQL | - |
GeoGig | API | API |
Flyway | API | - |
Liquibase | API | API |
Tool | DDL changes | LRT / Branches / Merging |
---|---|---|
timetravel | - | - |
temporal_tables | - | - |
table_version | - | - |
table_log | - | - |
audit_trigger | - | - |
pgMemento | tables, columns | - |
CyanAudit | columns | - |
pgVersion | - | yes / no / no |
QGIS Versioning | - | yes / yes /no |
GeoGig | tables, columns | yes / yes / yes |
Flyway | DB objects | |
Liquibase | DB objects |
Audit Trail Benchmarking
Kunde F., Sauer P. (2017) pgMemento – A Generic Transaction-Based Audit Trail for Spatial Databases. In: Gertz M. et al. (eds) Advances in Spatial and Temporal Databases. SSTD 2017. Lecture Notes in Computer Science, vol 10411. Springer, Cham
How to version my spatial database?
By fxku
How to version my spatial database?
- 5,977