Postgresql explain

Chennai.py meetup

24th Oct 2015

@h6165

Abhishek Yadav

Active in Chennai.py since Sep 2013

Talked before in Nov 2013

Co-organizer: Chennai.rb

ரூபீ ப்ரோக்ராமர

0. Definitions

The definition

EXPLAIN show us the potential query plan along with costs

 

EXPLAIN helps us verify our speculations on query performance.

It also gives some insight into the algorithms Postgresql uses

 

EXPLAIN ANALYSE shows the query plan and costs along with executing the query.

 

BEGIN .. EXPLAIN ANALYSE ... ROLLBACK

Executes the query, explains it and rollsback

 

The syntax

EXPLAIN <query>;

EXPLAIN ANALYSE <query>;

BEGIN;
EXPLAIN ANALYSE <query>;
ROLLBACK;

The anatomy

EXPLAIN SELECT count(*) FROM comments;

                             QUERY PLAN
---------------------------------------------------------------------
 Aggregate  (cost=1382.00..1382.01 rows=1 width=0)
   -> Seq Scan on comments  (cost=0.00..1257.00 rows=50000 width=0)
(2 rows)

No. of rows output

The cost

The algorithm

The anatomy

  • The EXPLAIN output is a tree
  • It should be read bottom-up (leaf to root)
  • Cost of a step is a summation of child steps
  • 'rows' are the rows emitted by a step, not the rows scanned

The cost

  • Quantification of the effort needed on a arbitrary scale
  • The exact number doesn't mean anything. Can be used for comparisons
  • Lower the better
  • Is proportional to the no. of blocks read, no. of random blocks read. And to CPU operations too by a small measure

1. Explain where

Explain where: Quiz

Which query costs better - one with where filter or the one without?

(there are no indexes)

SELECT * FROM comments;
SELECT * FROM comments WHERE post_id=1;

Explain where: Quiz

EXPLAIN SELECT * FROM comments;

                           QUERY PLAN
----------------------------------------------------------------
 Seq Scan on comments  (cost=0.00..1257.00 rows=50000 width=72)
(1 row)

One with the filter is more costly. Read cost is the same, CPU cost is higher

EXPLAIN SELECT * FROM comments WHERE post_id=1;

                         QUERY PLAN
------------------------------------------------------------
 Seq Scan on comments  (cost=0.00..1382.00 rows=5 width=72)
   Filter: (post_id = 1)
(2 rows)

2. Let there be index

Explain and index

CREATE INDEX ON comments (post_id);
EXPLAIN SELECT * FROM comments WHERE id=12345;


                                  QUERY PLAN
-------------------------------------------------------------------------------
 Index Scan using comments_pkey on comments  (cost=0.41..8.43 rows=1 width=72)
   Index Cond: (id = 12345)
(2 rows)

Adding an index on filter clause may reduce the cost.

Explain and index: Quiz

Will adding index on the gender field improve query cost ? 

Is second one better than the first ?


SELECT * FROM users WHERE gender='M'; 

CREATE INDEX ON users (gender);
SELECT * from users WHERE gender='M'; 
EXPLAIN SELECT * FROM users WHERE gender='m';

                        QUERY PLAN
-----------------------------------------------------------
 Seq Scan on users  (cost=0.00..211.28 rows=4990 width=33)
   Filter: ((gender)::text = 'm'::text)
(2 rows)

Low cardinality indexes are ignored!

No

CREATE INDEX ON users (gender);
EXPLAIN SELECT * FROM users WHERE gender='m';


                        QUERY PLAN
-----------------------------------------------------------
 Seq Scan on users  (cost=0.00..211.28 rows=4990 width=33)
   Filter: ((gender)::text = 'm'::text)
(2 rows)

Explain and index: Quiz

  • An index on a column that can yield very few possible values
  • Cardinality = no of possible values / total rows
  • Database may ignore index and will do a full scan if cardinality is too low
  • The limit varies across databases, but is generally 10-20%
  • Explain helps us check the usefulness of an index. An index always increases write costs, so this is important

Explain and index: Low cardinality

3. Explain JOIN

Explain JOIN: Quiz

SELECT posts.*, comments.* 
  FROM posts LEFT JOIN comments
  ON posts.id = comments.post_id;

JOIN : Adding index on columns in ON clause.

Will that improve query performance?

CREATE INDEX ON comments (post_id);
CREATE INDEX ON posts (id);
EXPLAIN
  SELECT posts.*, comments.*
   FROM posts LEFT JOIN comments
   ON posts.id = comments.post_id;

                               QUERY PLAN
------------------------------------------------------------------------
 Hash Right Join  (cost=394.50..2589.00 rows=50000 width=150)
   Hash Cond: (comments.post_id = posts.id)
   ->  Seq Scan on comments  (cost=0.00..1257.00 rows=50000 width=72)
   ->  Hash  (cost=257.00..257.00 rows=11000 width=78)
         ->  Seq Scan on posts  (cost=0.00..257.00 rows=11000 width=78)
(5 rows)

JOIN: Will adding indexes on columns in ON clause improve cost?

No

CREATE INDEX ON posts (id);
CREATE INDEX on comments (post_id);

EXPLAIN
  SELECT posts.*, comments.*
   FROM posts LEFT JOIN comments
   ON posts.id = comments.post_id;

                               QUERY PLAN
------------------------------------------------------------------------
 Hash Right Join  (cost=394.50..2589.00 rows=50000 width=150)
   Hash Cond: (comments.post_id = posts.id)
   ->  Seq Scan on comments  (cost=0.00..1257.00 rows=50000 width=72)
   ->  Hash  (cost=257.00..257.00 rows=11000 width=78)
         ->  Seq Scan on posts  (cost=0.00..257.00 rows=11000 width=78)
(5 rows)

Explain JOIN: Quiz

JOIN: Will adding indexes on columns in ON clause improve cost?

Explain JOIN: Quiz

Does not use any index in any step.

Performs Seq Scan on all 11000 posts rows and 50000 comments.

The final Join step is also high in cost (initial as well as overall)

                               QUERY PLAN
------------------------------------------------------------------------
 Hash Right Join  (cost=394.50..2589.00 rows=50000 width=150)
   Hash Cond: (comments.post_id = posts.id)
   ->  Seq Scan on comments  (cost=0.00..1257.00 rows=50000 width=72)
   ->  Hash  (cost=257.00..257.00 rows=11000 width=78)
         ->  Seq Scan on posts  (cost=0.00..257.00 rows=11000 width=78)
(5 rows)

JOIN: Will adding indexes on columns in ON clause improve cost?

Explain JOIN: Quiz

  • JOIN algorithms are not capable of using any indexes on the ON columns
  • The best way to optimize JOINS is by adding filters on at least one table
 EXPLAIN
  SELECT posts.*, comments.*
   FROM posts LEFT JOIN comments
   ON posts.id = comments.post_id
   WHERE posts.id < 20;

                                       QUERY PLAN
-----------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=4.42..492.32 rows=95 width=150)
   ->  Index Scan using posts_id_idx on posts  (cost=0.29..8.65 rows=21 width=78)
         Index Cond: (id < 20)
   ->  Bitmap Heap Scan on comments  (cost=4.14..22.98 rows=5 width=72)
         Recheck Cond: (posts.id = post_id)
         ->  Bitmap Index Scan on comments_post_id_idx  (cost=0.00..4.14 rows=5 width=0)
               Index Cond: (posts.id = post_id)

4. Explain ORDER BY

SELECT * from comments ORDER BY updated_at LIMIT 10;

Can ORDER BY speed up using an index

CREATE INDEX ON comments (created_at);
SELECT * from comments ORDER BY created_at LIMIT 10;

Will the second be better than first?

Explain Order By: Quiz

Can ORDER BY take advantage of an index

EXPLAIN
  SELECT * from comments ORDER BY updated_at LIMIT 10;

                                 QUERY PLAN
----------------------------------------------------------------------------
 Limit  (cost=2337.48..2337.51 rows=10 width=72)
   ->  Sort  (cost=2337.48..2462.48 rows=50000 width=72)
         Sort Key: updated_at
         ->  Seq Scan on comments  (cost=0.00..1257.00 rows=50000 width=72)
(4 rows)

Yes

Explain Order By: Quiz

EXPLAIN
  SELECT * from comments ORDER BY created_at LIMIT 10;

                                  QUERY PLAN
--------------------------------------------------------------------------
 Limit  (cost=0.29..1.16 rows=10 width=72)
   ->  Index Scan using comments_created_at_idx on comments
        (cost=0.29..4332.16 rows=50000 width=72)
(2 rows)

/Explain

@h6165

Abhishek Yadav

Active in Chennai.py since Sep 2013

Talked before in Nov 2013

Co-organizer: Chennai.rb

ரூபீ ப்ரோக்ராமர

Postgresql explain

By Abhishek Yadav

Postgresql explain

  • 1,637