Understanding Postgresql Indexes For Beginners- Part -1 (BTree)

In this series we will understand postgresql indexes and also when to use them with sample cases.

Also we will be comparing certain indexes to understand which one to use when.

Let’s Begin:

B-Tree

The most common , highly used index on Postgresql or in any Sql DB.

What btree does is create a self balancing tree (with multiple child nodes not 2) in which the leaf nodes point to the exact row location.

Also the Btree all leaf nodes pointers point to adjacent node using doubly linked list. (this primarily help in getting the sorted data in very fast as we just reach to first element and then traverse further will see this in action).

Lets see a diagram to understand: here the leaf nodes are marked in white and they points to each other and contain row address and you can see they are in sorted order.

How to use?

First create a table with dummy data to understand, here i am creating 1 million rows:

create table student as select s as rollno, md5(random()::text) as name  from generate_Series(1,1000000) s;

Use Case 1 : Query for equal

Now lets query this to understand Btree benefits:

explain analyze select * from student where rollno=9090;
                                                        QUERY PLAN                                                        
--------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..15375.79 rows=5292 width=36) (actual time=10.074..98.045 rows=1 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on student  (cost=0.00..13846.59 rows=2205 width=36) (actual time=54.934..83.723 rows=0 loops=3)
         Filter: (rollno = 9090)
         Rows Removed by Filter: 333333
 Planning time: 0.197 ms
 Execution time: 98.084 ms
(8 rows)

Now create index:

create index roll_idx on student(rollno);

Lets try again:

explain analyze select * from student where rollno=9090;
                                                    QUERY PLAN                                                     
-------------------------------------------------------------------------------------------------------------------
 Index Scan using roll_idx on student  (cost=0.42..8.44 rows=1 width=37) (actual time=0.106..0.108 rows=1 loops=1)
   Index Cond: (rollno = 9090)
 Planning time: 0.426 ms
 Execution time: 0.154 ms

Now you can see that total time taken after index is ~500 times faster than normal as in normal case it was doing a parallel scan(in older version of postgres it was also sequential scan).

Use Case 2: Query for range

explain analyze select * from student where rollno > 9090 and rollno < 10008;
                                                       QUERY PLAN                                                       
------------------------------------------------------------------------------------------------------------------------
 Index Scan using roll_idx on student  (cost=0.42..41.52 rows=905 width=37) (actual time=0.017..0.605 rows=917 loops=1)
   Index Cond: ((rollno > 9090) AND (rollno < 10008))
 Planning time: 0.215 ms
 Execution time: 0.715 ms
(4 rows)

Now you could see this works well in case of range.

No no it is not checking for all the range value one by one , it is using the leaf property of pointing to each other in sorted order.

Use Case in case of string:

in case of string first create index and check :

create index name_idx on student(name);
explain analyze select * from student  where name = 'b170e15823193100056e09725aec94c4';
                                                    QUERY PLAN                                                     
-------------------------------------------------------------------------------------------------------------------
 Index Scan using name_idx on student  (cost=0.42..8.44 rows=1 width=37) (actual time=0.115..0.116 rows=0 loops=1)
   Index Cond: (name = 'b170e15823193100056e09725aec94c4'::text)
 Planning time: 0.390 ms
 Execution time: 0.151 ms

See this works perfectly well.

When to use?

  • When you are querying a table frequently on this column by filtering
  • If you want data sorted also in that order
  • Date , Integer , String are good candidates for this
  • DO NOT USE if you are searching ilike or like type of queries. There are diff. indexes for that(in next blogs)
  • DO NOT JUST CREATE A INDEX ON EACH COLUMN . This has a huge overhead for postgres (will discuss in series)

Next – Multi Column B-Tree

please subscribe for more such indepth blogs.

Advertisement