Understanding Postgresql Indexes for Beginners – Part 2 (BTree MultiColumn )

This post is in continuation of Understanding Postgresql Indexes Series – Link for previous blog.

In last blog we understood about B-tree indexes and its use cases. Now we will look into Multi Column Postgresql Indexes. Lets Start :

Multi Column Index Btree :

In this we can include multiple columns into a single index eg: you have a student table with columns class_id , roll_no . With Multicolumn index we will be able to create a index (class_id,roll_no) rather than (class_id) or (roll_no).

The benefits of creating a multi column index is when we want to filter based on multiple column it works much faster. Understand it like for every value of class_id there is another btree of roll_no, now when you query like for class_id = ‘CLASS 8’ and roll_no > 10 then it will first go to ‘CLASS 8’ in this index and then from there there is a sub Btree index on roll_no which helps us to go to all roll_no > 8 very fast.

One of the important thing here is that it works best when we use ‘equal to’ constraint of the left most columns. eg: in our case our query was:

where class_id='CLASS 8' and roll_no > 10

Now in this case in our index :

create index cl_ro_idx on student(class_id,roll_no) 

In the index the left most column is class_id and we have put a equal to condition which is correct . Now lets say we have a index:

create index cl_ro_mar on student(class_id,roll_no,marks)

And we create a query which is like :

where class_id='CLASS_8' and roll_no > 10 and marks < 100

In this case lets see what happens , first the index is traversed for looking class_id = ‘CLASS 8’ same as last , not now we said roll_no > 10 for this all nodes for which roll_no was greater than 10 was traversed and also for every roll_no node there will be a tree of mark so all those trees are also traversed.

In generally it is preferred we traverse when left most variables are equal to types and last one is > or <

How to use?

Lets Create a Table:

create table student as select s as rollno, MOD(s,2)::text as class_id , s*random() as marks   from generate_Series(1,1000000) s;

Now lets create a index on class_id, roll_no,marks:

create index c_r_m_idx on student(class_id,rollno,marks);

Case 1 – Where all left indexes are equal to

explain analyze select * from student where class_id ='0' and rollno='89' and marks < 1;
                                                       QUERY PLAN                                                        
-------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using c_r_m_idx on student  (cost=0.42..8.45 rows=1 width=14) (actual time=0.081..0.082 rows=0 loops=1)
   Index Cond: ((class_id = '0'::text) AND (rollno = 89) AND (marks < '1'::double precision))
   Heap Fetches: 0
 Planning time: 0.502 ms
 Execution time: 0.123 ms

Case 1 – Where we do not put equal to on left indexes

saarthi=# explain analyze select * from student where class_id ='0' and rollno>89 and marks < 1;
                                                       QUERY PLAN                                                       
------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..13702.57 rows=49 width=14) (actual time=3.115..56.885 rows=5 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on student  (cost=0.00..12697.67 rows=20 width=14) (actual time=23.998..45.041 rows=2 loops=3)
         Filter: ((rollno > 89) AND (marks < '1'::double precision) AND (class_id = '0'::text))
         Rows Removed by Filter: 333332
 Planning time: 0.201 ms
 Execution time: 56.927 ms

Here you would see that in case where we are not putting equal to conditions we are getting Sequential Scans.

So be very careful on what type of queries you do.

Next Blog we will work on BRIN indexes and when to use Brin vs Btree.

Stay Tuned. Please Subscribe via email.

Understanding Postgresql Indexes For Beginners- Part -1 (BTree)

In this series we will understand postgresql indexes and also when to use them with sample cases.

Also we will be comparing certain indexes to understand which one to use when.

Let’s Begin:

B-Tree

The most common , highly used index on Postgresql or in any Sql DB.

What btree does is create a self balancing tree (with multiple child nodes not 2) in which the leaf nodes point to the exact row location.

Also the Btree all leaf nodes pointers point to adjacent node using doubly linked list. (this primarily help in getting the sorted data in very fast as we just reach to first element and then traverse further will see this in action).

Lets see a diagram to understand: here the leaf nodes are marked in white and they points to each other and contain row address and you can see they are in sorted order.

How to use?

First create a table with dummy data to understand, here i am creating 1 million rows:

create table student as select s as rollno, md5(random()::text) as name  from generate_Series(1,1000000) s;

Use Case 1 : Query for equal

Now lets query this to understand Btree benefits:

explain analyze select * from student where rollno=9090;
                                                        QUERY PLAN                                                        
--------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..15375.79 rows=5292 width=36) (actual time=10.074..98.045 rows=1 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on student  (cost=0.00..13846.59 rows=2205 width=36) (actual time=54.934..83.723 rows=0 loops=3)
         Filter: (rollno = 9090)
         Rows Removed by Filter: 333333
 Planning time: 0.197 ms
 Execution time: 98.084 ms
(8 rows)

Now create index:

create index roll_idx on student(rollno);

Lets try again:

explain analyze select * from student where rollno=9090;
                                                    QUERY PLAN                                                     
-------------------------------------------------------------------------------------------------------------------
 Index Scan using roll_idx on student  (cost=0.42..8.44 rows=1 width=37) (actual time=0.106..0.108 rows=1 loops=1)
   Index Cond: (rollno = 9090)
 Planning time: 0.426 ms
 Execution time: 0.154 ms

Now you can see that total time taken after index is ~500 times faster than normal as in normal case it was doing a parallel scan(in older version of postgres it was also sequential scan).

Use Case 2: Query for range

explain analyze select * from student where rollno > 9090 and rollno < 10008;
                                                       QUERY PLAN                                                       
------------------------------------------------------------------------------------------------------------------------
 Index Scan using roll_idx on student  (cost=0.42..41.52 rows=905 width=37) (actual time=0.017..0.605 rows=917 loops=1)
   Index Cond: ((rollno > 9090) AND (rollno < 10008))
 Planning time: 0.215 ms
 Execution time: 0.715 ms
(4 rows)

Now you could see this works well in case of range.

No no it is not checking for all the range value one by one , it is using the leaf property of pointing to each other in sorted order.

Use Case in case of string:

in case of string first create index and check :

create index name_idx on student(name);
explain analyze select * from student  where name = 'b170e15823193100056e09725aec94c4';
                                                    QUERY PLAN                                                     
-------------------------------------------------------------------------------------------------------------------
 Index Scan using name_idx on student  (cost=0.42..8.44 rows=1 width=37) (actual time=0.115..0.116 rows=0 loops=1)
   Index Cond: (name = 'b170e15823193100056e09725aec94c4'::text)
 Planning time: 0.390 ms
 Execution time: 0.151 ms

See this works perfectly well.

When to use?

  • When you are querying a table frequently on this column by filtering
  • If you want data sorted also in that order
  • Date , Integer , String are good candidates for this
  • DO NOT USE if you are searching ilike or like type of queries. There are diff. indexes for that(in next blogs)
  • DO NOT JUST CREATE A INDEX ON EACH COLUMN . This has a huge overhead for postgres (will discuss in series)

Next – Multi Column B-Tree

please subscribe for more such indepth blogs.