Normalizing this part of the data into tables and formalizing its structure with a schema is not useful, nor worth the effort. Just put it in a document-like column.
Then, for whatever reason, some of that data becomes relevant (e.g. a new feature).
But it's still not a first-class entity.
So it's still simpler to keep it as documents.
Need query capabilities on the documents!
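A minimal sketch of the kind of schema assumed by the examples that follow: a regular `devices` table with a document-like `logs` column (the column types here are an assumption, not from the original slides):

```sql
create table devices (
    id   serial primary key,
    logs json  -- jsonb behaves the same way for the queries shown here
);

-- Sample row matching the data shown below
insert into devices (logs) values
  ('[{"level": "INFO",  "text": "config changed", "timestamp": 1425029952107},
     {"level": "ERROR", "text": "zot",            "timestamp": 1425029952108}]');
```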
Data
-[ RECORD 1 ]----------------------------------------------------------------
id | 1
logs | [{"level": "INFO", "text":"config changed","timestamp":1425029952107},
{"level": "ERROR", "text":"zot","timestamp":1425029952108},
{"level": "WARN", "text":"warnever","timestamp":1425029952110}]
-[ RECORD 2 ]----------------------------------------------------------------
id | 2
logs | [{"level": "INFO", "text":"config changed","timestamp":1425029952107},
{"level": "WARN", "text":"warnever","timestamp":1425029952110}]
-[ RECORD 3 ]----------------------------------------------------------------
id | 3
logs | [{"level": "INFO", "text":"config changed","timestamp":1425029952107},
{"level": "WARN", "text":"warnever","timestamp":1425029952110}]
"All devices with log lines at error level"
sample=# select distinct on (d.id)
sample-# d.*
sample-# from devices d,
sample-# json_array_elements(d.logs) l
sample-# where l->>'level' = 'ERROR';
 id |                                    logs
----+-----------------------------------------------------------------------------
1 | [{"level": "INFO", "text":"config changed","timestamp":1425029952107},{"l...
Query plan
QUERY PLAN
-----------------------------------------------------------------------------------------
Unique (cost=1832.25..1838.05 rows=200 width=40) (actual time=0.090..0.091 rows=1 loops
-> Sort (cost=1832.25..1835.15 rows=1160 width=40) (actual time=0.090..0.090 rows=1
Sort Key: d.id
Sort Method: quicksort Memory: 25kB
-> Nested Loop (cost=0.00..1773.20 rows=1160 width=40) (actual time=0.046..0.0
-> Seq Scan on devices d (cost=0.00..21.60 rows=1160 width=40) (actual t
-> Function Scan on json_array_elements l (cost=0.00..1.50 rows=1 width=
Filter: ((value ->> 'level'::text) = 'ERROR'::text)
Rows Removed by Filter: 2
Planning time: 0.112 ms
Execution time: 0.134 ms
(11 rows)
(The interesting part is the Nested Loop; the Sort and Unique nodes are there for the DISTINCT clause.)
Complex queries that span multiple documents and/or deeply nested structures can be very slow.
"All devices with log lines at error level"
sample=# select distinct on (d.id)
sample-# d.*
sample-# from devices d,
sample-# jsonb_array_elements(d.logs) l
sample-# where l->>'level' = 'ERROR';
 id |                                    logs
----+-----------------------------------------------------------------------------
1 | [{"level": "INFO", "text":"config changed","timestamp":1425029952107},{"l...
Query plan
QUERY PLAN
-----------------------------------------------------------------------------------------
Unique (cost=1832.25..1838.05 rows=200 width=40) (actual time=0.090..0.091 rows=1 loops
-> Sort (cost=1832.25..1835.15 rows=1160 width=40) (actual time=0.090..0.090 rows=1
Sort Key: d.id
Sort Method: quicksort Memory: 25kB
-> Nested Loop (cost=0.00..1773.20 rows=1160 width=40) (actual time=0.046..0.0
-> Seq Scan on devices d (cost=0.00..21.60 rows=1160 width=40) (actual t
-> Function Scan on jsonb_array_elements l (cost=0.00..1.50 rows=1 width=
Filter: ((value ->> 'level'::text) = 'ERROR'::text)
Rows Removed by Filter: 2
Planning time: 0.083 ms
Execution time: 0.069 ms
(11 rows)
(The interesting part is the Nested Loop; the Sort and Unique nodes are there for the DISTINCT clause.)
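With jsonb, the same question can also be asked with the containment operator `@>`, which avoids expanding every array and, unlike the function scan above, can be supported by a GIN index. A sketch (assuming the `logs` column is jsonb; the index name is made up):

```sql
-- Hypothetical GIN index on the jsonb column
create index devices_logs_gin on devices using gin (logs);

-- "All devices with log lines at error level", via containment;
-- no DISTINCT needed, since nothing is multiplied by array expansion
select d.*
from devices d
where d.logs @> '[{"level": "ERROR"}]';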
Filter on a document value at three nesting levels:
resulted in client request timeouts (30-second XHR timeout)
Planning time: 0.389 ms
Execution time: 1682.812 ms
Planning time: 0.860 ms
Execution time: 237.822 ms
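For reference, a three-level filter has this shape (names as in the larger query below): one chained `jsonb_each` call per nesting level, and each level multiplies the number of rows the filter has to scan.

```sql
select
    ahw.h,
    b.key as bi,
    c.key as ci
from
    a_h ahw,
    jsonb_each(ahw.hw->'boards') b,   -- level 1
    jsonb_each(b.value->'cards') c,   -- level 2
    jsonb_each(c.value->'ports') p    -- level 3
where
    ahw.hw->>'m' = 'BLA-X2'
    and jsonb_array_length(p.value->'s') > 0;
```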
select
ad.*,
ai.rings as rn,
m.l,
m.a
from
(select
coalesce(l.h, atg.h) as h,
coalesce(l.lCount, 0) as l,
coalesce(atg.ac, 0) as ac
from
(select
tgb.h,
count(*) as ac
from
(select distinct
d.h,
baci[1] as bi,
baci[2] as ci
from
a_d d,
jsonb_array_elements(d.dia->'cd') crc,
regexp_matches(crc->>'in','something(\d)/(\d)' ) baci
where
d.dia->>'m' = 'BLA-X2'
and crc->>'in' like 'something%'
and crc->>'in' like '%(10)%') as tgb,
(select
ahw.h as h,
b.key as bi,
c.key as ci,
count(*)
from
a_h ahw,
jsonb_each(ahw.hw->'boards') b,
jsonb_each(b.value->'cards') c,
jsonb_each(c.value->'ports') p
where
ahw.hw->>'m' = 'BLA-X2'
and jsonb_array_length(p.value->'s') > 0
group by h, bi, ci) as ac
where
tgb.h = ac.h
and tgb.bi = ac.bi
and tgb.ci = ac.ci
group by
tgb.h) as atg
full outer join
(select
d.h,
sum(cast(i->>'value' as integer)) as lCount
from a_d d, jsonb_array_elements(d.dia->'ld'->'items') i
where
d.dia->>'m' = 'BLA-X2'
and i->>'in' = 'LSOMETHING'
and i->>'value' SIMILAR TO '[0-9]+'
group by
d.h) as l
on
l.h = atg.h
where
l.lCount is null
or atg.ac is null
or l.lCount != atg.ac) as m
inner join a_d ad on ad.h = m.h
inner join a a on m.h = a.h
inner join (
select
string_agg(ring,', ') as rings,
a as h
from a_i_r ai
group by a) as ai on m.h = ai.h;
WITH somethings_count_by_h
as (select ...
from
a_d d,
jsonb_array_elements(d.dia->'cd') crc,
regexp_matches(crc->>'in','something(\d)/(\d)' ) baci
where
...)
SELECT ...
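As a concrete sketch, here is how the first document-expanding subselect of the query above becomes a named CTE, so the expansion is written (and read) once; only this branch is shown, and the final SELECT is left as in the outline:

```sql
with somethings_count_by_h as (
    select
        tgb.h,
        count(*) as ac
    from (
        select distinct
            d.h,
            baci[1] as bi,
            baci[2] as ci
        from
            a_d d,
            jsonb_array_elements(d.dia->'cd') crc,
            regexp_matches(crc->>'in', 'something(\d)/(\d)') baci
        where
            d.dia->>'m' = 'BLA-X2'
            and crc->>'in' like 'something%'
            and crc->>'in' like '%(10)%'
    ) as tgb
    group by tgb.h
)
select * from somethings_count_by_h;
```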