PostGIS Indizes

Welcher ist der richtige?

Felix Kunde

F O S S

I

G

2019

CREATE INDEX pts_spx
ON point_table
USING GIST (geom)
BRIN
SPGIST

< v2.3

< v2.5

GiST

It's a framework

PostGIS > R-Tree

Stores BBox

Leaves

Nodes

Root

Level 1

Level 2

ST_Intersects in ms

tests 100 k 1 Mio 10 Mio 100 Mio 1 Bn
no index 18.00 87.00 670 6473 135529
bulk 0.21 14.00 19 146 1568
online 0.18 15.00 29 163 1672
vacuum 0.16 0.87 18 145 1551
cluster 0.13 0.64 16 32 214

It gets big!

Impact on writes

(x4-16 points, 1.3 lines)

Fastest index

BRIN

... time geom
... 2019-03-13 POINT(13.8 50)
... 2019-03-13 POINT(13.8 50)
... ...
2019-03-14 POINT(13.8 51)
2019-03-16 POINT(13.7 51)
... ...
... 2019-03-16 POINT(13.7 51)
... 2019-03-16 POINT(13.7 51)

Block Range Index examples

(2019-03-13 09:00:00 , 2019-03-14 11:00:00)

(A-Weg , Grunaer Str.)

BBOX

(2019-03-14 11:15:00 , 2019-03-15 18:00:00)

(Grunaer Weg, Nürnberger Str.)

BBOX

(2019-03-16 09:00:00 , 2019-03-16 17:30:00)

(Oberauer Str. , Zwinglistr.)

BBOX

Block ranges. That's all.

Data must be sorted!

PostGIS > BBox per block

ORDER BY geom

ORDER BY ST_GeoHash(geom)

SET enable_seqscan = false;

BRIN vs. GiST

tests 100 k 1 Mio 10 Mio 100 Mio 1 Bn
 create gist 700 ms 8 sec 2 min 23 min 6 hrs
create brin 24 ms 0.2 sec 2 sec 18 sec 90 sec
size gist 5 MB 50 MB 500 MB 5 GB 50 GB
size brin 24 KB 24 KB 48 KB 376 KB 3,6 MB
duration x25 x23 x1,4 x1.6 x1.1

Super small

Build in seconds

Slower, but ok

sp-GiST

Framework like GIST

Unbalanced tree

No overlaps & prefixes

PostGIS > BBox in 4D

https://www.researchgate.net/figure/Adaptive-k-d-tree_fig9_2334587

kd-Tree, Quadtree

Trick: No Overlap via multiple dimensions

Each point you see on the map are in fact 4 bounding boxes which are the prefixes of the sp-GiST tree defining the bounds of child quadrants

sp-GiST vs. GiST

tests 100 k 1 Mio 10 Mio 100 Mio 1 Bn
 create gist 700 ms 8 sec 2 min 23 min 6 hrs
create spgist 344 ms 3,7 sec 50 sec 11 min 8 hrs
size gist 5 MB 50 MB 500 MB 5 GB 50 GB
size spgist 4,5 MB 44 MB 440 MB 4.3 GB 43 GB
duration x0.85 x1 x1 x1 x1.1

12% less big as GiST

2x faster for writes

Less predictable

Conclusion

Static dataset

> available RAM

overlaps

overlaps

overlaps

GiST, BRIN

sp-GiST

GiST

sp-GiST

sp-GiST

BRIN

yes

yes

yes

yes

yes

no

no

no

no

no

General advices

When to index?

Clean up bloat

Update statistics

Table rewrite

What is queried?

Index-only scans

> pg_stat_statements

> VACUUM, pg_repack

> ANALYZE

> CLUSTER

> Partial indexes

> Covering indexes

Das wars. Fragen?

https://slides.com/fxku/postgis-indexing

Used hardware

Tuxedo Infinity Book 13

Intel i7-8550U CPU 1.80GHz

Quadcore, 8 CPUs

32 GB RAM

500GB SSD disk

PostgreSQL config

PostgreSQL 11 & PostGIS 2.5

shared_buffers = 16 GB

work_mem = 128 MB

maintenance_work_mem = 4 GB

min/max_wal_level = 16/4 GB

checpoint_timeout = 30 min

checkpoint_completion_target = 0.9

random_page_cost = 1.1

cpu_tuple_cost = 0.001

cpu_index_tuple_cost = 0.001

effective_cache_size = 24 GB

default_statistics_target = 500

Repo

github.com/FxKu/postgis_indexing

Made with Slides.com