Super Lightweight Design for Real-time Ingestion

The magic that comes with PubSub BigQuery Subscription

What is TTL (Time-to-live)

name          address            expires_at
Dummy Dummy   Abc Mountain Lane  2020-10-29T08:00:00Z
Dummy Dummy2  Def Mountain Lane  2020-10-29T09:00:00Z

Use Cases

Manage Sensitive Data

name          address            expires_at
Dummy Dummy   Abc Mountain Lane  2020-10-29T08:00:00Z
Dummy Dummy2  Def Mountain Lane  2020-10-29T09:00:00Z

Only keep most recent data

Customer

id  name          address
1   Dummy Dummy   Abc Mountain Lane
2   Dummy Dummy2  Def Mountain Lane

Transaction

id  datetime              amount  expires_at            customer_id
1   2020-09-29T08:00:00Z  12.00   2020-10-29T08:00:00Z  1
2   2020-09-29T09:00:00Z  15.00   2020-10-29T09:00:00Z  1
3   2020-10-29T08:00:00Z  13.00   2020-11-29T08:00:00Z  2

Sync with another data source

Customer

id  name          address
1   Dummy Dummy   Abc Mountain Lane
2   Dummy Dummy2  Def Mountain Lane

Daily sync

Customer

id  name          address            expires_at
1   Dummy Dummy   Abc Mountain Lane  [datetime + 2 days]
3   Dummy Dummy3  Xyz Mountain Lane  [datetime + 2 days]
2   Dummy Dummy2  Def Mountain Lane  [datetime + 2 days]
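A minimal sketch of the daily-sync pattern, using plain in-memory Python rather than any particular database client. The names `sync` and `purge_expired` are hypothetical, `purge_expired` only stands in for whatever TTL cleanup the database performs, and the scenario assumes record 3 has already been removed from the source table.

```python
from datetime import datetime, timedelta, timezone

TTL = timedelta(days=2)  # the "[datetime + 2 days]" window from the slide


def sync(source_rows, target, now):
    """Upsert each source row into the target, refreshing its expires_at."""
    for row in source_rows:
        target[row["id"]] = {**row, "expires_at": now + TTL}
    return target


def purge_expired(target, now):
    """Hypothetical stand-in for the database's TTL cleanup."""
    return {k: v for k, v in target.items() if v["expires_at"] > now}


day0 = datetime(2020, 10, 27, 8, 0, tzinfo=timezone.utc)

# Source of truth: record 3 is assumed to have been deleted here already.
source = [
    {"id": 1, "name": "Dummy Dummy", "address": "Abc Mountain Lane"},
    {"id": 2, "name": "Dummy Dummy2", "address": "Def Mountain Lane"},
]

# Target copy still holds record 3 from an earlier sync.
target = {
    3: {"id": 3, "name": "Dummy Dummy3", "address": "Xyz Mountain Lane",
        "expires_at": day0 + TTL},
}

# Two daily syncs: records 1 and 2 keep getting a fresh expires_at,
# record 3 is never refreshed.
target = sync(source, target, day0)
target = sync(source, target, day0 + timedelta(days=1))

# Shortly after record 3's expiration, only the synced records remain.
remaining = purge_expired(target, day0 + TTL + timedelta(hours=1))
remaining_ids = sorted(remaining)
```

Records that disappear from the source are never deleted explicitly. They simply stop being refreshed and age out once their `expires_at` passes.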

Let's see how it works

Summary of Caveats

It is difficult to check whether the TTL policy has been set successfully, because there is no validation

Records take a long time to be cleaned up after their expiration timestamp passes (typically within 72 hours)

It is unclear how the policy affects multiple namespaces

Setting up a TTL policy takes quite a long time (15+ minutes)
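Because cleanup can lag well behind the expiration timestamp, one possible mitigation is to filter out already-expired records at read time. The sketch below does this over in-memory rows taken from the example table. The helper names `parse_ts` and `live_records` are hypothetical, and a real query would apply the same comparison on `expires_at` in the database.

```python
from datetime import datetime, timezone


def parse_ts(ts):
    """Parse an ISO 8601 timestamp with a trailing 'Z' into an aware datetime."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def live_records(records, now=None):
    """Return only records whose expires_at is still in the future."""
    now = now or datetime.now(timezone.utc)
    return [r for r in records if parse_ts(r["expires_at"]) > now]


records = [
    {"name": "Dummy Dummy", "address": "Abc Mountain Lane",
     "expires_at": "2020-10-29T08:00:00Z"},
    {"name": "Dummy Dummy2", "address": "Def Mountain Lane",
     "expires_at": "2020-10-29T09:00:00Z"},
]

# At 08:30 the first record has expired but may not have been cleaned up yet.
now = datetime(2020, 10, 29, 8, 30, tzinfo=timezone.utc)
visible = live_records(records, now)
visible_names = [r["name"] for r in visible]
```

With this filter in place, readers never see a record after its `expires_at`, regardless of how long the actual cleanup takes.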
