Chennai.py meetup
24th Oct 2015
Abhishek Yadav
Active in Chennai.py since Sep 2013
Talked before in Nov 2013
Co-organizer: Chennai.rb
ரூபீ ப்ரோக்ராமர
EXPLAIN show us the potential query plan along with costs
EXPLAIN helps us verify our speculations on query performance.
It also gives some insight into the algorithms Postgresql uses
EXPLAIN ANALYSE shows the query plan and costs along with executing the query.
BEGIN .. EXPLAIN ANALYSE ... ROLLBACK
Executes the query, explains it and rollsback
EXPLAIN <query>; EXPLAIN ANALYSE <query>; BEGIN; EXPLAIN ANALYSE <query>; ROLLBACK;
EXPLAIN SELECT count(*) FROM comments;
QUERY PLAN
---------------------------------------------------------------------
Aggregate (cost=1382.00..1382.01 rows=1 width=0)
-> Seq Scan on comments (cost=0.00..1257.00 rows=50000 width=0)
(2 rows)
No. of rows output
The cost
The algorithm
Which query costs better - one with where filter or the one without?
(there are no indexes)
SELECT * FROM comments;
SELECT * FROM comments WHERE post_id=1;
EXPLAIN SELECT * FROM comments;
QUERY PLAN
----------------------------------------------------------------
Seq Scan on comments (cost=0.00..1257.00 rows=50000 width=72)
(1 row)
One with the filter is more costly. Read cost is the same, CPU cost is higher
EXPLAIN SELECT * FROM comments WHERE post_id=1;
QUERY PLAN
------------------------------------------------------------
Seq Scan on comments (cost=0.00..1382.00 rows=5 width=72)
Filter: (post_id = 1)
(2 rows)
CREATE INDEX ON comments (post_id);
EXPLAIN SELECT * FROM comments WHERE id=12345;
QUERY PLAN
-------------------------------------------------------------------------------
Index Scan using comments_pkey on comments (cost=0.41..8.43 rows=1 width=72)
Index Cond: (id = 12345)
(2 rows)
Adding an index on filter clause may reduce the cost.
Will adding index on the gender field improve query cost ?
Is second one better than the first ?
SELECT * FROM users WHERE gender='M';
CREATE INDEX ON users (gender);
SELECT * from users WHERE gender='M';
EXPLAIN SELECT * FROM users WHERE gender='m';
QUERY PLAN
-----------------------------------------------------------
Seq Scan on users (cost=0.00..211.28 rows=4990 width=33)
Filter: ((gender)::text = 'm'::text)
(2 rows)
Low cardinality indexes are ignored!
CREATE INDEX ON users (gender);
EXPLAIN SELECT * FROM users WHERE gender='m';
QUERY PLAN
-----------------------------------------------------------
Seq Scan on users (cost=0.00..211.28 rows=4990 width=33)
Filter: ((gender)::text = 'm'::text)
(2 rows)
SELECT posts.*, comments.*
FROM posts LEFT JOIN comments
ON posts.id = comments.post_id;
JOIN : Adding index on columns in ON clause.
Will that improve query performance?
CREATE INDEX ON comments (post_id);
CREATE INDEX ON posts (id);
EXPLAIN
SELECT posts.*, comments.*
FROM posts LEFT JOIN comments
ON posts.id = comments.post_id;
QUERY PLAN
------------------------------------------------------------------------
Hash Right Join (cost=394.50..2589.00 rows=50000 width=150)
Hash Cond: (comments.post_id = posts.id)
-> Seq Scan on comments (cost=0.00..1257.00 rows=50000 width=72)
-> Hash (cost=257.00..257.00 rows=11000 width=78)
-> Seq Scan on posts (cost=0.00..257.00 rows=11000 width=78)
(5 rows)
JOIN: Will adding indexes on columns in ON clause improve cost?
CREATE INDEX ON posts (id);
CREATE INDEX on comments (post_id);
EXPLAIN
SELECT posts.*, comments.*
FROM posts LEFT JOIN comments
ON posts.id = comments.post_id;
QUERY PLAN
------------------------------------------------------------------------
Hash Right Join (cost=394.50..2589.00 rows=50000 width=150)
Hash Cond: (comments.post_id = posts.id)
-> Seq Scan on comments (cost=0.00..1257.00 rows=50000 width=72)
-> Hash (cost=257.00..257.00 rows=11000 width=78)
-> Seq Scan on posts (cost=0.00..257.00 rows=11000 width=78)
(5 rows)
JOIN: Will adding indexes on columns in ON clause improve cost?
Does not use any index in any step.
Performs Seq Scan on all 11000 posts rows and 50000 comments.
The final Join step is also high in cost (initial as well as overall)
QUERY PLAN
------------------------------------------------------------------------
Hash Right Join (cost=394.50..2589.00 rows=50000 width=150)
Hash Cond: (comments.post_id = posts.id)
-> Seq Scan on comments (cost=0.00..1257.00 rows=50000 width=72)
-> Hash (cost=257.00..257.00 rows=11000 width=78)
-> Seq Scan on posts (cost=0.00..257.00 rows=11000 width=78)
(5 rows)
JOIN: Will adding indexes on columns in ON clause improve cost?
EXPLAIN
SELECT posts.*, comments.*
FROM posts LEFT JOIN comments
ON posts.id = comments.post_id
WHERE posts.id < 20;
QUERY PLAN
-----------------------------------------------------------------------------------------
Nested Loop Left Join (cost=4.42..492.32 rows=95 width=150)
-> Index Scan using posts_id_idx on posts (cost=0.29..8.65 rows=21 width=78)
Index Cond: (id < 20)
-> Bitmap Heap Scan on comments (cost=4.14..22.98 rows=5 width=72)
Recheck Cond: (posts.id = post_id)
-> Bitmap Index Scan on comments_post_id_idx (cost=0.00..4.14 rows=5 width=0)
Index Cond: (posts.id = post_id)
SELECT * from comments ORDER BY updated_at LIMIT 10;
Can ORDER BY speed up using an index
CREATE INDEX ON comments (created_at);
SELECT * from comments ORDER BY created_at LIMIT 10;
Will the second be better than first?
Can ORDER BY take advantage of an index
EXPLAIN
SELECT * from comments ORDER BY updated_at LIMIT 10;
QUERY PLAN
----------------------------------------------------------------------------
Limit (cost=2337.48..2337.51 rows=10 width=72)
-> Sort (cost=2337.48..2462.48 rows=50000 width=72)
Sort Key: updated_at
-> Seq Scan on comments (cost=0.00..1257.00 rows=50000 width=72)
(4 rows)
EXPLAIN
SELECT * from comments ORDER BY created_at LIMIT 10;
QUERY PLAN
--------------------------------------------------------------------------
Limit (cost=0.29..1.16 rows=10 width=72)
-> Index Scan using comments_created_at_idx on comments
(cost=0.29..4332.16 rows=50000 width=72)
(2 rows)
Abhishek Yadav
Active in Chennai.py since Sep 2013
Talked before in Nov 2013
Co-organizer: Chennai.rb
ரூபீ ப்ரோக்ராமர