Background
Motivation
Implementation
Roadmap
Version control of a proprietary DBMS
Linear versioning with timestamps
Working in different "workspaces" (hierarchical versioning)
Merging of workspaces is possible
Great in theory, but not working so well in the field ...
Clear separation between production and archive state
Handling former DB states should be understandable
Performance matters
Developing a version control for PostgreSQL
Linear versioning by using audit triggers
Access to former DB states by copying the logged information into a separate schema (either as VIEW or TABLE)
But: Being able to revert single transactions
Hierarchical versioning could be done in the same way
(Did not know in advance that there were so many)
New API since PG 9.4 to access replication streams
Decoding the WAL logs
Allows for bidirectional replication >> Multi-Master-Cluster
Export changes to replication slots
Auditing and versioning are both possible
Output plugins: WAL format >> desired format
More info: docs, FOSS4G NA, Postgres Open, FOSDEM, PgConfRu
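For illustration only, not part of pgMemento: a minimal sketch of reading decoded WAL changes through a replication slot with the built-in test_decoding plugin (requires wal_level = logical; the slot name is made up):
-- create a logical replication slot using the test_decoding output plugin
SELECT * FROM pg_create_logical_replication_slot('audit_slot', 'test_decoding');
-- look at the decoded changes without consuming them from the slot
SELECT * FROM pg_logical_slot_peek_changes('audit_slot', NULL, NULL);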
Table row >> row_to_json(OLD) >> JSON
JSON >> json_populate_record(null::test_table, JSON) >> Table row
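A minimal round trip, assuming a table test_table(id int, name text) with illustrative values:
-- table row >> JSON
SELECT row_to_json(t) FROM test_table t WHERE id = 1;   -- e.g. {"id":1,"name":"old name"}
-- JSON >> typed table row
SELECT * FROM json_populate_record(null::test_table, '{"id":1,"name":"old name"}'::json);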
Run PL/pgSQL scripts
Create triggers
Create Audit_ID columns
Log existing data as "INSERT"ed
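Roughly what the setup does per audited table; a simplified sketch with illustrative names, not the exact pgMemento DDL:
-- surrogate key referenced by the log tables (sequence name assumed)
ALTER TABLE citydb.building
  ADD COLUMN audit_id BIGINT DEFAULT nextval('pgmemento.audit_id_seq');
-- row-level trigger writing deltas into pgmemento.row_log (trigger function name assumed)
CREATE TRIGGER log_update_trigger
  AFTER UPDATE ON citydb.building
  FOR EACH ROW EXECUTE PROCEDURE pgmemento.log_update();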
transaction_log (
  txid BIGINT,
  stmt_date TIMESTAMP,
  user_name TEXT,
  client_name TEXT);

table_event_log (
  id SERIAL,
  transaction_id BIGINT,
  op_id SMALLINT,
  table_operation VARCHAR(8),
  schema_name TEXT,
  table_name TEXT,
  table_relid OID);

row_log (
  id BIGSERIAL,
  event_id INTEGER,
  audit_id BIGINT,
  changes JSONB);
UPDATE building SET measured_height = 12.8 WHERE id = 1;
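Only the delta ends up in the log, i.e. the old value of the changed column (values illustrative):
SELECT audit_id, changes FROM pgmemento.row_log ORDER BY id DESC LIMIT 1;
-- audit_id | changes
-- ---------+---------------------------
--        4 | {"measured_height": 12.5}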
CREATE INDEX row_log_changes_idx ON pgmemento.row_log USING GIN (changes);
SELECT t.txid
FROM pgmemento.transaction_log t
JOIN pgmemento.table_event_log e ON t.txid = e.transaction_id
JOIN pgmemento.row_log r ON r.event_id = e.id
WHERE r.audit_id = 4
  AND (r.changes ? 'measured_height');
-> Index Scan using row_log_audit_idx on row_log r
Index Cond: (audit_id = 4)
Filter: (changes ? 'measured_height'::text)
SELECT DISTINCT audit_id FROM pgmemento.row_log
WHERE changes @> '{"measured_height": 0.0}'::jsonb;
-> Bitmap Index Scan on row_log_changes_idx
Index Cond: (changes @> '{"measured_height": 0.0}'::jsonb)
SELECT pgmemento.restore_schema_state(
1, -- Start of time slice
1002, -- End of time slice
'citydb', -- Source schema
'state_before_delete', -- Target schema
'TABLE', -- Relation type
ARRAY['spatial_ref_sys'] -- Exclude tables
);
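The restored state can then be queried like any other schema, e.g. (assuming building was among the produced relations):
SELECT id, measured_height FROM state_before_delete.building;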
) -- end of WITH block that has collected valid audit_ids
SELECT v.log_entry FROM valid_ids f
JOIN LATERAL ( -- has to be applied for each audit_id
  SELECT json_build_object(
    q1.key, q1.value,
    q2.key, q2.value,
    ...)::jsonb AS log_entry
  FROM ( -- search for old values (see previous slide)
    SELECT q.key, q.value FROM (
      SELECT 'id' AS key, r.changes -> 'id' AS value
      FROM pgmemento.row_log r ...
    ) q
  ) q1,
  ( SELECT q.key, q.value FROM (
      SELECT 'name' AS key, r.changes -> 'name' AS value
      FROM pgmemento.row_log r ...
    ) q
  ) q2, ...
) v ON (true)
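Standalone illustration of how json_build_object stitches the collected key/value pairs together:
SELECT json_build_object('id', 1, 'name', 'old name')::jsonb;
-- {"id": 1, "name": "old name"}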
) -- previous steps are encapsulated in a WITH block
SELECT p.* FROM restore rq
JOIN LATERAL (
  SELECT * FROM jsonb_populate_record(
    null::citydb.building, -- template
    rq.log_entry -- concatenated JSONB
  )
) p ON (true)
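Note that jsonb_populate_record fills columns missing from the JSONB with NULL, so a partial delta still yields a complete row of the template type (again using the assumed test_table(id int, name text)):
SELECT * FROM jsonb_populate_record(null::test_table, '{"id": 1}'::jsonb);
-- id = 1, name = NULL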
Case 1: If deleted >> INSERT
Case 2: If updated >> UPDATE
Case 3: If inserted >> DELETE
SELECT * FROM (
(SELECT r.audit_id, r.changes, e.schema_name, e.table_name, e.op_id
FROM pgmemento.row_log r
JOIN pgmemento.table_event_log e ON r.event_id = e.id
JOIN pgmemento.transaction_log t ON t.txid = e.transaction_id
WHERE t.txid = $1 AND e.op_id > 2 -- DELETE or TRUNCATE
ORDER BY r.audit_id ASC) -- insert oldest values first
UNION ALL
(SELECT ...
WHERE t.txid = $1 AND e.op_id = 2 -- UPDATE
ORDER BY r.audit_id DESC) -- update youngest values first
UNION ALL
(SELECT ...
WHERE t.txid = $1 AND e.op_id = 1 -- INSERT
ORDER BY r.audit_id DESC) -- delete youngest values first
) txid_content
ORDER BY op_id DESC -- first DELETEs, then UPDATEs and finally INSERTs
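In pgMemento this logic is wrapped in a procedure, so reverting boils down to one call (exact function name and signature should be checked against the repository):
SELECT pgmemento.revert_transaction(1002);  -- txid taken from pgmemento.transaction_log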
Reverts might be a bit too simplistic
Filter what to restore (by time, spatial extent etc.)
No tracking of DDL commands (event triggers might help)
No merging of schemas
Need more benchmarking
More thoughts on restrictions and protection for log tables
No GUI, just SQL coding at the moment
Probably not the right option for write-intensive databases (even though only deltas are stored)
Manipulation of log tables is very easy at the moment
pgMemento is not a substitute for backup and recovery tools
https://github.com/pgMemento/pgMemento
Contact:
Felix Kunde
fkunde@beuth-hochschule.de
+49 (0)30 4504 3824