March Treasure Hunt

or dealing with performance accidents

Agenda

  1. Accident
  2. Identify a bottleneck
  3. Apply a fix
  4. Confirm effect

Accident

16 March 2022: a spike in errors, Liveforce plarform become very slow or stop working.

Later we've got another accident, shorter but similar.

  • number of active users was not changed
  • number of publishes or other stuff remain the same
  • MySQL slow query logs was enabled after the first accident

First checked ideas

Identify a bottleneck

16 of March

Slowest queries

SELECT t.job_id as jobId,
t.headline as jobHeadline,
MIN(t.start_date) as jobStartTime,
GROUP_CONCAT(DISTINCT t.roleId ORDER BY t.start_date ASC SEPARATOR ',') as roles
FROM (SELECT j.id as job_id, j.headline as headline, j.date as jobDate, jr.id as roleId, jr.start_date_utc as start_date
FROM job_roles as jr
JOIN jobs as j on j.id = jr.job_id
JOIN boards as b on j.board_id = b.id
LEFT JOIN (SELECT jr.id, count(jb.id) as count
FROM boards b
JOIN jobs as j on j.board_id = b.id
JOIN job_roles as jr on jr.job_id = j.id
LEFT JOIN crew_jobs as jb on jb.job_role_id = jr.id and jb.status = 'approved'
WHERE b.agency_id = 4
GROUP BY jr.id) approved on approved.id = jr.id
LEFT JOIN (SELECT j.multi_day_job_id, MIN(jr.start_date_utc) as start_date_utc
FROM boards b
JOIN jobs as j on j.board_id = b.id
JOIN job_roles jr ON jr.job_id = j.id
WHERE b.agency_id = 4
GROUP BY j.multi_day_job_id) mdj ON mdj.multi_day_job_id = j.multi_day_job_id
WHERE b.agency_id = 4
AND (jr.is_visible_when_full = 1 or jr.required_crew_count > ifnull(approved.count, 0) or EXISTS(SELECT *
FROM crew_jobs
WHERE freelancer_id = 40203
AND status = 'approved'
AND job_role_id = jr.id))
AND jr.is_visible_on_job_board = 1
AND jr.start_date_utc >= '2022-03-16 21:59:26'
AND (mdj.start_date_utc IS NULL OR mdj.start_date_utc >= NOW())) AS t
GROUP BY jobId, t.jobDate, t.headline
ORDER BY t.jobDate, jobStartTime, jobHeadline

LIMIT 1 OFFSET 0

Crew App Job Board

Errors analyse

Reproduce

I've logged in as affected account and just by scrolling on Job Board was able to partially reproduce an issue

MySQL query visualisation

15 of March: it was quick (<1s)

18 of March: became slow (~5s)

Optimized query part

GROUP BY jobId, t.jobDate, t.headline
ORDER BY t.jobDate, jobStartTime, jobHeadline
LIMIT 10 OFFSET 0

Before:

GROUP BY jobId
ORDER BY jobStartTime, jobHeadline
LIMIT 30 OFFSET 0

After:

Apply a fix

New Query visualisation (EXPLAIN)

Time to exec: <1s

Confirm effect

Made with Slides.com