Benjamin Roth
@apneadiving
🇫🇷
💫 ⭐️ 🌟 ✨ ⚡️
| | Before | After |
|---|---|---|
| RAM used | 180 GB | 10 GB |
| RAM per thread | 3 GB | 500 MB |
| Average time | 10 minutes | 10 seconds |
| Synchronisation tracking | no info available | all timestamps in database |
first refactoring
Before
After
The advertising data was bundled into one huge object known as the state.
Every single problem was assumed to come from it.
Account
Campaigns (~100)
Ad Groups (~25k)
Ads (~100k)
Keywords (~100k)
(around 700 MB of JSON)
🏋️♂️
Account
Campaigns (~100)
Ad Groups (~25k)
Ads (~100k)
Keywords (~100k)
The whole state object was saved in the database, in a byte column:
compressed JSON.
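The old scheme can be sketched with Ruby's stdlib (the state's contents here are made up for illustration):

```ruby
require "json"
require "zlib"

# Hypothetical sketch of the old storage scheme: the whole state
# serialized to JSON, deflated, and stuffed into one byte column.
state = {
  "campaigns" => [{ "id" => 1, "name" => "Brand" }],
  "keywords"  => [{ "id" => 2, "text" => "Eat my short" }]
}

blob = Zlib::Deflate.deflate(state.to_json)   # what went into the byte column
restored = JSON.parse(Zlib::Inflate.inflate(blob))

restored == state # => true, but every sync pays the full inflate + parse cost
```

At 700 MB of JSON per account, that inflate/parse round-trip is exactly where the RAM went.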
🗜
🦛
SQL is pretty well designed for storing data.
Do not store blobs. 🙄
We use a table per kind of entity:
campaigns, ads, keywords...
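A minimal sketch of what per-entity tables could look like (assuming PostgreSQL; column names are assumptions, not the actual schema):

```sql
-- One table per kind of entity instead of one compressed blob
CREATE TABLE campaigns (
  id           BIGSERIAL PRIMARY KEY,
  reference_id TEXT NOT NULL,      -- AdWords-side id
  name         TEXT NOT NULL
);

CREATE TABLE keywords (
  id           BIGSERIAL PRIMARY KEY,
  reference_id TEXT NOT NULL,
  text         TEXT NOT NULL,
  match_type   TEXT NOT NULL       -- e.g. 'EXACT'
);
```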
First, we sync
Later we have to sync:

```json
{
  "reference_id": "124",
  "text": "Eat my short",
  "match_type": "EXACT"
}
```

```json
{
  "reference_id": "124",
  "text": "Eat my short",
  "match_type": "EXACT"
}
```

No diff:
nothing to push to the API.
Even later we have to sync:

```json
{
  "reference_id": "124",
  "text": "Eat my short",
  "match_type": "EXACT"
}
```

```json
{
  "reference_id": "124",
  "text": "Eat my belt",
  "match_type": "EXACT"
}
```

There is a diff:
something to push to the API.

Does this mean we have to store all properties of each object in the database?
🙅♂️
MD5("Some String")
≠
MD5("Some other String")
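The digest comparison can be sketched with Ruby's stdlib (`fingerprint` is a made-up helper name, not the author's code):

```ruby
require "digest"
require "json"

# Store only an MD5 fingerprint of each entity's relevant attributes.
# Sorting the keys keeps the digest stable across hash orderings.
def fingerprint(attributes)
  Digest::MD5.hexdigest(attributes.sort.to_h.to_json)
end

stored = fingerprint("reference_id" => "124", "text" => "Eat my short", "match_type" => "EXACT")
fresh  = fingerprint("reference_id" => "124", "text" => "Eat my belt",  "match_type" => "EXACT")

stored == fresh # => false: there is a diff, something to push
```

One 32-character digest per entity replaces a full copy of its attributes.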
Store the minimum relevant data you need
🐣
Triggering a synchronisation means telling the app to push a product's data to AdWords.
It was a matter of enqueuing some worker:

```ruby
CreateStateWorker.perform_async(product_id)
```

😖 Nothing prevents enqueuing it twice:

```ruby
CreateStateWorker.perform_async(product_id)
CreateStateWorker.perform_async(product_id)
```

Instead, record the intent in the database:

```ruby
Synchronisation.create!(
  status: 'pending',
  product_id: product_id
)
```

⚙️ and have a cron handle pushing jobs to queues.
Whenever you talk a lot about some concept (Synchronisation in our case),
it could be that there is an object crying for you to create it.
👷♂️
The full sync process was a cascade of workers:

```ruby
CreateStateWorker.perform_async(synchronisation_id)
```

At the end of the worker, it triggered the next step:

```ruby
CreateDiffWorker.perform_async(synchronisation_id)
```

Which in turn triggered:

```ruby
PushDiffWorker.perform_async(synchronisation_id)
```

(there were actually a few more steps)
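Stripped of Sidekiq, the cascade pattern looks roughly like this (a synchronous sketch; real workers run asynchronously, and the step bodies are elided):

```ruby
# Sketch of the worker cascade: each step, once finished,
# enqueues the next one for the same synchronisation.
STEPS_RUN = []

class PushDiffWorker
  def self.perform_async(synchronisation_id)
    STEPS_RUN << :push_diff
    # last step: nothing more to trigger
  end
end

class CreateDiffWorker
  def self.perform_async(synchronisation_id)
    STEPS_RUN << :create_diff
    PushDiffWorker.perform_async(synchronisation_id) # trigger next step
  end
end

class CreateStateWorker
  def self.perform_async(synchronisation_id)
    STEPS_RUN << :create_state
    CreateDiffWorker.perform_async(synchronisation_id) # trigger next step
  end
end

CreateStateWorker.perform_async(42)
STEPS_RUN # => [:create_state, :create_diff, :push_diff]
```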
[Timeline diagram, T0 → T2: steps of Sync1 and Sync2 interleave between the queue and processing — e.g. Sync1 - step 2 sits queued while Sync2 - step 1 is being processed — so neither synchronisation gets through the pipeline quickly]
Hardware matters: idle hardware is a waste of money.
We regrouped everything under one queue (then a single worker class).
Still, that doesn't help a synchronisation exit the pipeline as fast as possible.
🤔
The queue for SynchronisationWorker has
2 processes × 2 threads: 4 jobs can run in parallel.

Say you push 10 jobs: only 4 run at once, the rest sit in the queue.

Enqueuing the right amount of jobs:
we had a cron pushing jobs, instructed to push at most

```ruby
MAX_CONCURRENT_PROCESSES - Synchronisation.in_progress.count
```
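The cron's enqueueing rule can be sketched as pure logic (everything except `MAX_CONCURRENT_PROCESSES` is a hypothetical name):

```ruby
MAX_CONCURRENT_PROCESSES = 4

# Only enqueue as many synchronisations as there are free slots,
# so jobs never pile up waiting inside the queue itself.
def ids_to_enqueue(in_progress_count, pending_ids)
  free_slots = MAX_CONCURRENT_PROCESSES - in_progress_count
  pending_ids.first([free_slots, 0].max)
end

ids_to_enqueue(3, [10, 11, 12, 13]) # => [10]
ids_to_enqueue(4, [10, 11])         # => []
```

The queue stays near-empty: whatever is enqueued starts immediately, and priorities are decided in the cron's scope, not by queue position.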
🤸♀️
| Problem 🚱 | Solution ✅ |
|---|---|
| Ensure the same product is not enqueued twice | DB constraint on Synchronisation status |
| Ensure a synchronisation is handled as fast as possible once enqueued | The cron controls what is in the queue |
| Prioritise some products? | A scope in the cron responsible for pushing jobs to the queue |
| Synchronisation stats? | Carried by each Synchronisation object in the database |
| Hardware usage optimisation? | A single queue we can adjust depending on the load |
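The "not enqueued twice" DB constraint could be a partial unique index (assuming PostgreSQL; the status values are illustrative):

```sql
-- At most one in-flight synchronisation per product:
-- a second INSERT with status 'pending' raises a uniqueness error.
CREATE UNIQUE INDEX index_synchronisations_on_product_id_in_flight
  ON synchronisations (product_id)
  WHERE status IN ('pending', 'in_progress');
```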
| Problem 🚱 | Solution ✅ |
|---|---|
| Memory issues | No more blobs; MD5 comparisons only |
| Speed concerns | Reworked worker setup |
☠️ 🧟♂️
It could be useful, but let's face it, it's not a priority.
Chill...
🥶
Hatred of objects,
obsession with functions on hashes instead
🧘♂️
```ruby
Entity.full_name(
  first_name: 'Mo',
  last_name: 'Fo'
)
```

nasty by nature,
waiting to bite you in the back...
🦖
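For contrast, a tiny value object in place of the function-on-a-hash (a sketch, not the author's actual code):

```ruby
# A small value object: keyword arguments fail loudly on typos,
# and behaviour lives next to the data it needs.
class Entity
  attr_reader :first_name, :last_name

  def initialize(first_name:, last_name:)
    @first_name = first_name
    @last_name  = last_name
  end

  def full_name
    "#{first_name} #{last_name}"
  end
end

Entity.new(first_name: "Mo", last_name: "Fo").full_name # => "Mo Fo"
```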
☯️
```ruby
GC.start
```

Obsession with micro-optimisation
=
shitty code
+
no real time to fix it
STEP BACKWARDS to see
the 🐘 in the room