Better Data Governance with Revision Control Workflows

Do you handle data at work?

Show of hands 🤚🏻

We all know the struggle...

Ex-Data Scientist

Sharing data and managing who has what access to it should not be painful

I believe...

We should solve this problem

Engineers are very good at solving problems...

Let's see what software engineers do

Have you heard about this thing called... git?

With git, you can...

Branches - work on your own version

Rollback - go back if something went wrong

Diff - compare changes

Merge - combine changes

Platform for multiuser access to projects (repos)

Open-source software made it into the mainstream thanks to GitHub and similar platforms that build on the features of git.

Can we use git for datasets?

Not ideal...

Database + git ?

This is how we do it in TerminusDB...

Branches

Branches are built into TerminusDB

You can create a branch at any point (from any commit)

Rollbacks

In TerminusDB you can roll back to any commit

Makes backups more organized and manageable
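To make the branch and rollback ideas concrete, here is a minimal sketch of a toy in-memory store. `ToyStore`, `Commit` and their methods are hypothetical names invented for illustration; this is not the TerminusDB API, only the general idea that a branch is a named pointer to a commit and a rollback moves that pointer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    id: int
    parent: Optional[int]  # id of the previous commit, None for the root
    data: dict             # full snapshot of the data at this commit


class ToyStore:
    """Hypothetical toy store (an assumption for this sketch, not TerminusDB)."""

    def __init__(self):
        self.commits = {0: Commit(0, None, {})}  # empty root commit
        self.branches = {"main": 0}              # branch name -> commit id

    def commit(self, branch, data):
        # Record a new snapshot on top of the branch head and advance the head.
        new_id = len(self.commits)
        self.commits[new_id] = Commit(new_id, self.branches[branch], dict(data))
        self.branches[branch] = new_id
        return new_id

    def branch(self, name, from_commit):
        # A new branch is just a new pointer to an existing commit.
        self.branches[name] = from_commit

    def reset(self, branch, commit_id):
        # Rollback: point the branch back at an earlier commit.
        self.branches[branch] = commit_id

    def read(self, branch):
        return self.commits[self.branches[branch]].data


store = ToyStore()
c1 = store.commit("main", {"alice": "admin"})
c2 = store.commit("main", {"alice": "admin", "bob": "reader"})

store.branch("experiment", c1)                   # branch from an older commit
store.commit("experiment", {"alice": "reader"})  # main is not affected

store.reset("main", c1)                          # roll main back to c1
```

Nothing is deleted by the rollback in this sketch: the commit `c2` is still in `store.commits`, so it could be branched from or restored later.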

Diff (Patch)

Compare differences between documents (data objects)

Preview and approve changes before applying them

Works on any JSON object
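As a rough illustration of diff and patch on JSON-like objects, here is a minimal sketch in plain Python. The functions `diff` and `apply_patch`, the `MISSING` marker and the patch shape (a dict of `(old, new)` pairs) are assumptions made up for this example; TerminusDB uses its own patch format, which is not reproduced here.

```python
MISSING = object()  # marks a key that is absent on one side


def diff(before, after):
    """Return the changes needed to turn `before` into `after`.

    Leaf changes are (old, new) tuples; nested dicts are diffed recursively.
    """
    changes = {}
    for key in before.keys() | after.keys():
        old = before.get(key, MISSING)
        new = after.get(key, MISSING)
        if old == new:
            continue  # unchanged keys are left out of the patch
        if isinstance(old, dict) and isinstance(new, dict):
            changes[key] = diff(old, new)
        else:
            changes[key] = (old, new)
    return changes


def apply_patch(doc, changes):
    """Return a new document with `changes` applied; `doc` is not modified."""
    result = dict(doc)
    for key, change in changes.items():
        if isinstance(change, dict):
            result[key] = apply_patch(result[key], change)
        else:
            _, new = change
            if new is MISSING:
                result.pop(key, None)  # key was removed
            else:
                result[key] = new      # key was added or updated
    return result


before = {"name": "Ada", "role": "reader",
          "address": {"city": "London", "zip": "N1"}}
after = {"name": "Ada", "role": "admin",
         "address": {"city": "Paris", "zip": "N1"}, "team": "data"}

changes = diff(before, after)         # preview what would change
patched = apply_patch(before, changes)  # apply only after approval
```

Because the patch is ordinary data, it can be shown to a reviewer and approved before `apply_patch` is ever called, which is the workflow the slide describes.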

Merge

Merge is not available in TerminusDB yet

But with diff and patch in place, merge could become available soon

For now, one option is to use rebase

When is this going to be useful?

You will never know until you try...

Before we go...

Better Data Governance with Revision Control Workflows

By Cheuk Ting Ho
