PostGIS Indizes
Welcher ist der richtige?
Felix Kunde
F O S S
I
G
2019
CREATE INDEX pts_spx
ON point_table
USING GIST (geom)
BRIN
SPGIST
< v2.3
< v2.5
GiST
It's a framework
PostGIS > R-Tree
Stores BBox
Leaves
Nodes
Root
Level 1
Level 2
ST_Intersects in ms
tests | 100 k | 1 Mio | 10 Mio | 100 Mio | 1 Bn |
---|---|---|---|---|---|
no index | 18.00 | 87.00 | 670 | 6473 | 135529 |
bulk | 0.21 | 14.00 | 19 | 146 | 1568 |
online | 0.18 | 15.00 | 29 | 163 | 1672 |
vacuum | 0.16 | 0.87 | 18 | 145 | 1551 |
cluster | 0.13 | 0.64 | 16 | 32 | 214 |
It gets big!
Impact on writes
(x4-16 points, 1.3 lines)
Fastest index
BRIN
... | time | geom |
---|---|---|
... | 2019-03-13 | POINT(13.8 50) |
... | 2019-03-13 | POINT(13.8 50) |
... | ... | |
2019-03-14 | POINT(13.8 51) | |
2019-03-16 | POINT(13.7 51) | |
... | ... | |
... | 2019-03-16 | POINT(13.7 51) |
... | 2019-03-16 | POINT(13.7 51) |
Block Range Index examples
(2019-03-13 09:00:00 , 2019-03-14 11:00:00)
(A-Weg , Grunaer Str.)
BBOX
(2019-03-14 11:15:00 , 2019-03-15 18:00:00)
(Grunaer Weg, Nürnberger Str.)
BBOX
(2019-03-16 09:00:00 , 2019-03-16 17:30:00)
(Oberauer Str. , Zwinglistr.)
BBOX
Block ranges. That's all.
Data must be sorted!
PostGIS > BBox per block
ORDER BY geom
ORDER BY ST_GeoHash(geom)
SET enable_seqscan = false;
BRIN vs. GiST
tests | 100 k | 1 Mio | 10 Mio | 100 Mio | 1 Bn |
---|---|---|---|---|---|
create gist | 700 ms | 8 sec | 2 min | 23 min | 6 hrs |
create brin | 24 ms | 0.2 sec | 2 sec | 18 sec | 90 sec |
size gist | 5 MB | 50 MB | 500 MB | 5 GB | 50 GB |
size brin | 24 KB | 24 KB | 48 KB | 376 KB | 3,6 MB |
duration | x25 | x23 | x1,4 | x1.6 | x1.1 |
Super small
Build in seconds
Slower, but ok
sp-GiST
Framework like GIST
Unbalanced tree
No overlaps & prefixes
PostGIS > BBox in 4D
https://www.researchgate.net/figure/Adaptive-k-d-tree_fig9_2334587
kd-Tree, Quadtree
Trick: No Overlap via multiple dimensions
Each point you see on the map are in fact 4 bounding boxes which are the prefixes of the sp-GiST tree defining the bounds of child quadrants
sp-GiST vs. GiST
tests | 100 k | 1 Mio | 10 Mio | 100 Mio | 1 Bn |
---|---|---|---|---|---|
create gist | 700 ms | 8 sec | 2 min | 23 min | 6 hrs |
create spgist | 344 ms | 3,7 sec | 50 sec | 11 min | 8 hrs |
size gist | 5 MB | 50 MB | 500 MB | 5 GB | 50 GB |
size spgist | 4,5 MB | 44 MB | 440 MB | 4.3 GB | 43 GB |
duration | x0.85 | x1 | x1 | x1 | x1.1 |
12% less big as GiST
2x faster for writes
Less predictable
Conclusion
Static dataset
> available RAM
overlaps
overlaps
overlaps
GiST, BRIN
sp-GiST
GiST
sp-GiST
sp-GiST
BRIN
yes
yes
yes
yes
yes
no
no
no
no
no
General advices
When to index?
Clean up bloat
Update statistics
Table rewrite
What is queried?
Index-only scans
> pg_stat_statements
> VACUUM, pg_repack
> ANALYZE
> CLUSTER
> Partial indexes
> Covering indexes
Das wars. Fragen?
https://slides.com/fxku/postgis-indexing
Used hardware
Tuxedo Infinity Book 13
Intel i7-8550U CPU 1.80GHz
Quadcore, 8 CPUs
32 GB RAM
500GB SSD disk
PostgreSQL config
PostgreSQL 11 & PostGIS 2.5
shared_buffers = 16 GB
work_mem = 128 MB
maintenance_work_mem = 4 GB
min/max_wal_level = 16/4 GB
checpoint_timeout = 30 min
checkpoint_completion_target = 0.9
random_page_cost = 1.1
cpu_tuple_cost = 0.001
cpu_index_tuple_cost = 0.001
effective_cache_size = 24 GB
default_statistics_target = 500
Repo
github.com/FxKu/postgis_indexing
PostGIS Indexing
By fxku
PostGIS Indexing
- 2,957