In last blogs we understood basics of Btree and Multi Column Btree. In this blog we will see Postgresql BRIN Indexes.
BRIN Index
Block Range Index – It is a revolutionary idea first proposed in 2015 by Alvaro.
The most fundamental difference in this index is rather than storing the actual values in the index and point to rows. It actually stores the range information about the pages in which the rows are stored.
For eg: Lets say we have a student table with 1000 roll no. and these rows are stored in 100 pages and pages 1-10 are adjacent then 10-20 and then 20-30 and so on. If we use BTree index it would create a tree of 1000 roll no values and no. of nodes say for eg in this tree would be 1000. But what BRIN would do is it will store the max value of roll no and min values of roll no for page ranges (lets say page range is of 10 pages) so BRIN index would store 1 values(min and max) for page range 1-20 then 1 values for 10-20 and so on effectively it would store only 10 values rather than 1000.
You see this huge diff. between the values stored in BRIN vs BTree. This marks for huge performance improvement for storing of indexes but i think you would have realized that this is a lossy index
Lossy What?
What we meant by lossy here, lets go by the example again as index is only storing roll no min and roll no max value in pages between 1-10 (which contains 100 roll no.) , now if you check for lest say roll no. – 10 and mix and max value for page 1-10 is 1 – 100 . Now as it is just min and max we are still not sure whether the values 10 exist in the pages or not. For this system needs to go to every row in pages 1-10 and check whether roll no exist or not. This is what we meant as lossy , means index is not confirmng whether the value exist or not.
When to use BRIN
If your data is such that which mostly insert only like logs or history kind of stuff and your business requirement is to query recent logs or some date range logs then it is great to use BRIN as index as this would drastically reduce size of index, index maintenance , you would generally be searching for a range.
Lets take a real life example, create a table with 100000 logs data:
CREATE TABLE calllog (call_time timestamp not null, call_result text , no_of_participant integer);
INSERT INTO calllog ( call_time, no_of_participant, call_result) SELECT g, CURRENT_TIMESTAMP + ( g || 'minute' ) :: interval, random() * 10, md5(g::text) FROM generate_series(1,8000000) as g;
Now lets search for some cal logs between now() and now() – 5 hour:
explain analyze select * from calllog where call_time between now() and now() - interval '5 hour';
QUERY PLAN
-------------------------------------------------------------------------------------------------------------
Seq Scan on calllog (cost=0.00..254764.68 rows=1 width=45) (actual time=1469.561..1469.561 rows=0 loops=1)
Filter: ((call_time >= now()) AND (call_time <= (now() - '05:00:00'::interval)))
Rows Removed by Filter: 8000000
Planning time: 0.280 ms
Execution time: 1469.596 ms
Lets see what happens is we use BTree here :
create index ct_idx on calllog(call_time);
explain analyze select * from calllog where call_time between now() and now() - interval '5 hour';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Index Scan using ct_idx on calllog (cost=0.44..8.46 rows=1 width=45) (actual time=0.040..0.041 rows=0 loops=1)
Index Cond: ((call_time >= now()) AND (call_time <= (now() - '05:00:00'::interval)))
Planning time: 2.214 ms
Execution time: 0.086 ms
You see a huge diff. from 1469 ms to 0.086 ms.
Now lets see what brin would do:
create index c_t_brin_idx on calllog using brin(call_time) with (pages_per_range = 32,autosummarize = on);
drop index ct_idx ;
explain analyze select * from calllog where call_time between now() and now() - interval '5 hour';
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on calllog (cost=45.02..11385.57 rows=1 width=45) (actual time=2.259..2.260 rows=0 loops=1)
Recheck Cond: ((call_time >= now()) AND (call_time <= (now() - '05:00:00'::interval)))
-> Bitmap Index Scan on c_t_brin_idx (cost=0.00..45.02 rows=3423 width=0) (actual time=2.256..2.256 rows=0 loops=1)
Index Cond: ((call_time >= now()) AND (call_time <= (now() - '05:00:00'::interval)))
Planning time: 0.202 ms
Execution time: 2.315 ms
You see 2.315 ms still very less than 1469 ms but greater than 0.086 ms.
Why BRIN over BTree when Btree is speedy??
- Size of BRIN vs BTree
\di+ c_t_brin_idx
List of relations
Schema | Name | Type | Owner | Table | Size | Description
--------+--------------+-------+----------+---------+--------+-------------
public | c_t_brin_idx | index | postgres | calllog | 120 kB |
\di+ ct_idx
List of relations
Schema | Name | Type | Owner | Table | Size | Description
--------+--------+-------+----------+---------+--------+-------------
public | ct_idx | index | postgres | calllog | 171 MB |
You see there is huge difference in size and when there are lot of tables and indexes in your system such that you cannot have this 172MB in memory, those cases BRIN becomes very powerful
2. Index Maintenance – BTree indexes are costly to maintain as they would changes in every DML operation and has to be done in same transaction. While BRIN is put this offloading to vaccuming.
By Now i guess you understood lots of power of BRIN indexes and also the benefits over BTree.
You can ask your questions in the comment section we will try best to answer in your specific cases:
Next we will see parameters to optimize BRIN and then move to Gin Indexes.
Please subscribe for such indepth blogs.