Postgresql explain
Chennai.py meetup
24th Oct 2015
@h6165
Abhishek Yadav
Active in Chennai.py since Sep 2013
Talked before in Nov 2013
Co-organizer: Chennai.rb
ரூபீ ப்ரோக்ராமர
0. Definitions
The definition
EXPLAIN show us the potential query plan along with costs
EXPLAIN helps us verify our speculations on query performance.
It also gives some insight into the algorithms Postgresql uses
EXPLAIN ANALYSE shows the query plan and costs along with executing the query.
BEGIN .. EXPLAIN ANALYSE ... ROLLBACK
Executes the query, explains it and rollsback
The syntax
EXPLAIN <query>; EXPLAIN ANALYSE <query>; BEGIN; EXPLAIN ANALYSE <query>; ROLLBACK;
The anatomy
EXPLAIN SELECT count(*) FROM comments;
QUERY PLAN
---------------------------------------------------------------------
Aggregate (cost=1382.00..1382.01 rows=1 width=0)
-> Seq Scan on comments (cost=0.00..1257.00 rows=50000 width=0)
(2 rows)
No. of rows output
The cost
The algorithm
The anatomy
- The EXPLAIN output is a tree
- It should be read bottom-up (leaf to root)
- Cost of a step is a summation of child steps
- 'rows' are the rows emitted by a step, not the rows scanned
The cost
- Quantification of the effort needed on a arbitrary scale
- The exact number doesn't mean anything. Can be used for comparisons
- Lower the better
- Is proportional to the no. of blocks read, no. of random blocks read. And to CPU operations too by a small measure
1. Explain where
Explain where: Quiz
Which query costs better - one with where filter or the one without?
(there are no indexes)
SELECT * FROM comments;
SELECT * FROM comments WHERE post_id=1;
Explain where: Quiz
EXPLAIN SELECT * FROM comments;
QUERY PLAN
----------------------------------------------------------------
Seq Scan on comments (cost=0.00..1257.00 rows=50000 width=72)
(1 row)
One with the filter is more costly. Read cost is the same, CPU cost is higher
EXPLAIN SELECT * FROM comments WHERE post_id=1;
QUERY PLAN
------------------------------------------------------------
Seq Scan on comments (cost=0.00..1382.00 rows=5 width=72)
Filter: (post_id = 1)
(2 rows)
2. Let there be index
Explain and index
CREATE INDEX ON comments (post_id);
EXPLAIN SELECT * FROM comments WHERE id=12345;
QUERY PLAN
-------------------------------------------------------------------------------
Index Scan using comments_pkey on comments (cost=0.41..8.43 rows=1 width=72)
Index Cond: (id = 12345)
(2 rows)
Adding an index on filter clause may reduce the cost.
Explain and index: Quiz
Will adding index on the gender field improve query cost ?
Is second one better than the first ?
SELECT * FROM users WHERE gender='M';
CREATE INDEX ON users (gender);
SELECT * from users WHERE gender='M';
EXPLAIN SELECT * FROM users WHERE gender='m';
QUERY PLAN
-----------------------------------------------------------
Seq Scan on users (cost=0.00..211.28 rows=4990 width=33)
Filter: ((gender)::text = 'm'::text)
(2 rows)
Low cardinality indexes are ignored!
No
CREATE INDEX ON users (gender);
EXPLAIN SELECT * FROM users WHERE gender='m';
QUERY PLAN
-----------------------------------------------------------
Seq Scan on users (cost=0.00..211.28 rows=4990 width=33)
Filter: ((gender)::text = 'm'::text)
(2 rows)
Explain and index: Quiz
- An index on a column that can yield very few possible values
- Cardinality = no of possible values / total rows
- Database may ignore index and will do a full scan if cardinality is too low
- The limit varies across databases, but is generally 10-20%
- Explain helps us check the usefulness of an index. An index always increases write costs, so this is important
Explain and index: Low cardinality
3. Explain JOIN
Explain JOIN: Quiz
SELECT posts.*, comments.*
FROM posts LEFT JOIN comments
ON posts.id = comments.post_id;
JOIN : Adding index on columns in ON clause.
Will that improve query performance?
CREATE INDEX ON comments (post_id);
CREATE INDEX ON posts (id);
EXPLAIN
SELECT posts.*, comments.*
FROM posts LEFT JOIN comments
ON posts.id = comments.post_id;
QUERY PLAN
------------------------------------------------------------------------
Hash Right Join (cost=394.50..2589.00 rows=50000 width=150)
Hash Cond: (comments.post_id = posts.id)
-> Seq Scan on comments (cost=0.00..1257.00 rows=50000 width=72)
-> Hash (cost=257.00..257.00 rows=11000 width=78)
-> Seq Scan on posts (cost=0.00..257.00 rows=11000 width=78)
(5 rows)
JOIN: Will adding indexes on columns in ON clause improve cost?
No
CREATE INDEX ON posts (id);
CREATE INDEX on comments (post_id);
EXPLAIN
SELECT posts.*, comments.*
FROM posts LEFT JOIN comments
ON posts.id = comments.post_id;
QUERY PLAN
------------------------------------------------------------------------
Hash Right Join (cost=394.50..2589.00 rows=50000 width=150)
Hash Cond: (comments.post_id = posts.id)
-> Seq Scan on comments (cost=0.00..1257.00 rows=50000 width=72)
-> Hash (cost=257.00..257.00 rows=11000 width=78)
-> Seq Scan on posts (cost=0.00..257.00 rows=11000 width=78)
(5 rows)
Explain JOIN: Quiz
JOIN: Will adding indexes on columns in ON clause improve cost?
Explain JOIN: Quiz
Does not use any index in any step.
Performs Seq Scan on all 11000 posts rows and 50000 comments.
The final Join step is also high in cost (initial as well as overall)
QUERY PLAN
------------------------------------------------------------------------
Hash Right Join (cost=394.50..2589.00 rows=50000 width=150)
Hash Cond: (comments.post_id = posts.id)
-> Seq Scan on comments (cost=0.00..1257.00 rows=50000 width=72)
-> Hash (cost=257.00..257.00 rows=11000 width=78)
-> Seq Scan on posts (cost=0.00..257.00 rows=11000 width=78)
(5 rows)
JOIN: Will adding indexes on columns in ON clause improve cost?
Explain JOIN: Quiz
- JOIN algorithms are not capable of using any indexes on the ON columns
- The best way to optimize JOINS is by adding filters on at least one table
EXPLAIN
SELECT posts.*, comments.*
FROM posts LEFT JOIN comments
ON posts.id = comments.post_id
WHERE posts.id < 20;
QUERY PLAN
-----------------------------------------------------------------------------------------
Nested Loop Left Join (cost=4.42..492.32 rows=95 width=150)
-> Index Scan using posts_id_idx on posts (cost=0.29..8.65 rows=21 width=78)
Index Cond: (id < 20)
-> Bitmap Heap Scan on comments (cost=4.14..22.98 rows=5 width=72)
Recheck Cond: (posts.id = post_id)
-> Bitmap Index Scan on comments_post_id_idx (cost=0.00..4.14 rows=5 width=0)
Index Cond: (posts.id = post_id)
4. Explain ORDER BY
SELECT * from comments ORDER BY updated_at LIMIT 10;
Can ORDER BY speed up using an index
CREATE INDEX ON comments (created_at);
SELECT * from comments ORDER BY created_at LIMIT 10;
Will the second be better than first?
Explain Order By: Quiz
Can ORDER BY take advantage of an index
EXPLAIN
SELECT * from comments ORDER BY updated_at LIMIT 10;
QUERY PLAN
----------------------------------------------------------------------------
Limit (cost=2337.48..2337.51 rows=10 width=72)
-> Sort (cost=2337.48..2462.48 rows=50000 width=72)
Sort Key: updated_at
-> Seq Scan on comments (cost=0.00..1257.00 rows=50000 width=72)
(4 rows)
Yes
Explain Order By: Quiz
EXPLAIN
SELECT * from comments ORDER BY created_at LIMIT 10;
QUERY PLAN
--------------------------------------------------------------------------
Limit (cost=0.29..1.16 rows=10 width=72)
-> Index Scan using comments_created_at_idx on comments
(cost=0.29..4332.16 rows=50000 width=72)
(2 rows)
/Explain
@h6165
Abhishek Yadav
Active in Chennai.py since Sep 2013
Talked before in Nov 2013
Co-organizer: Chennai.rb
ரூபீ ப்ரோக்ராமர
Postgresql explain
By Abhishek Yadav
Postgresql explain
- 1,637