Understanding Postgresql Indexes for Beginners – Part 2 (BTree MultiColumn )

This post is in continuation of Understanding Postgresql Indexes Series – Link for previous blog.

In last blog we understood about B-tree indexes and its use cases. Now we will look into Multi Column Postgresql Indexes. Lets Start :

Multi Column Index Btree :

In this we can include multiple columns into a single index eg: you have a student table with columns class_id , roll_no . With Multicolumn index we will be able to create a index (class_id,roll_no) rather than (class_id) or (roll_no).

The benefits of creating a multi column index is when we want to filter based on multiple column it works much faster. Understand it like for every value of class_id there is another btree of roll_no, now when you query like for class_id = ‘CLASS 8’ and roll_no > 10 then it will first go to ‘CLASS 8’ in this index and then from there there is a sub Btree index on roll_no which helps us to go to all roll_no > 8 very fast.

One of the important thing here is that it works best when we use ‘equal to’ constraint of the left most columns. eg: in our case our query was:

where class_id='CLASS 8' and roll_no > 10

Now in this case in our index :

create index cl_ro_idx on student(class_id,roll_no) 

In the index the left most column is class_id and we have put a equal to condition which is correct . Now lets say we have a index:

create index cl_ro_mar on student(class_id,roll_no,marks)

And we create a query which is like :

where class_id='CLASS_8' and roll_no > 10 and marks < 100

In this case lets see what happens , first the index is traversed for looking class_id = ‘CLASS 8’ same as last , not now we said roll_no > 10 for this all nodes for which roll_no was greater than 10 was traversed and also for every roll_no node there will be a tree of mark so all those trees are also traversed.

In generally it is preferred we traverse when left most variables are equal to types and last one is > or <

How to use?

Lets Create a Table:

create table student as select s as rollno, MOD(s,2)::text as class_id , s*random() as marks   from generate_Series(1,1000000) s;

Now lets create a index on class_id, roll_no,marks:

create index c_r_m_idx on student(class_id,rollno,marks);

Case 1 – Where all left indexes are equal to

explain analyze select * from student where class_id ='0' and rollno='89' and marks < 1;
                                                       QUERY PLAN                                                        
-------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using c_r_m_idx on student  (cost=0.42..8.45 rows=1 width=14) (actual time=0.081..0.082 rows=0 loops=1)
   Index Cond: ((class_id = '0'::text) AND (rollno = 89) AND (marks < '1'::double precision))
   Heap Fetches: 0
 Planning time: 0.502 ms
 Execution time: 0.123 ms

Case 1 – Where we do not put equal to on left indexes

saarthi=# explain analyze select * from student where class_id ='0' and rollno>89 and marks < 1;
                                                       QUERY PLAN                                                       
------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..13702.57 rows=49 width=14) (actual time=3.115..56.885 rows=5 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on student  (cost=0.00..12697.67 rows=20 width=14) (actual time=23.998..45.041 rows=2 loops=3)
         Filter: ((rollno > 89) AND (marks < '1'::double precision) AND (class_id = '0'::text))
         Rows Removed by Filter: 333332
 Planning time: 0.201 ms
 Execution time: 56.927 ms

Here you would see that in case where we are not putting equal to conditions we are getting Sequential Scans.

So be very careful on what type of queries you do.

Next Blog we will work on BRIN indexes and when to use Brin vs Btree.

Stay Tuned. Please Subscribe via email.

Understand the intuition for databases

This article is for beginners who are starting their journey in software development and database looks like a jargon to them .

In this we will cover the intuition for databases , the lingo and types all will be covered in future posts.

Let’s begin by understanding through an example. Lets define our problem:

  • We are a online site which allows users to register on our website and registered users can see other registered users and send messages to them.
  • Now lets assume we are building this in an era where there are no database software available. So what will we do.

First understand our business requirement:

  • When user come on our site and register with user id and password , we should store this somewhere on a persistent storage (hard disk , available after restart).
  • We should be able to retrieve this user info from our persistent storage to validate when somebody comes for login with user id and password to check whether its correct or not.

Naive Solution 1 :

We will write user and their password in a file called "users" in the system in a sequential manner , lets look at the file:

Simple right, now if somebody want to read the userid and password , we just read the entire file check for each row , break it by semicolon ';' and check whether user id given matches the first part of the line (after breaking through semicolon) and second part matches the password provided and one.

But hey there are certain problems in the above structure (i know you have already guessed):

  • It will be very slow if user count is large (as one need to check all rows till the point user matches)
  • if somebody put semicolon in their user name( ahhh easy we wont allow it , but still there is an extra check)

So lets discuss again :

What we will do we will create multiple files for each alphabet ie user_a for user name starting with a , user_b for user name starting b and so on. this way we will reduce the size by a significant amount.

Hey but still if for a there are thousands of users still it will be a problem. Hmmm still stuck.

Lets think again:

Hey we will create a Binary Search tree type structure in File where each node points to the row on the user file (a pointer to exact location of that user in that file so that we can read that file directly), i am just showing a very very basic version for understanding down below:

Now when we do this , this solves problem to some level.

But there are so many other use cases which you will come through:

  1. delete some user – now the index is also needs to be managed
  2. backup of these files if hard disk corrupt
  3. if now i wants to store the user last name as well in this
  4. ……

There are very very few which i have written , the main point here is that all these things will be require by all companies , so database software try to solve the data persistence and query problems , so companies can focus on their business.

I think by this you got the basic intuition about the databases. Done – now go read about various type of databases which are available and what all purpose they do.

Akka (Actor) – Hello World

Theory

In this our main aim is to a very basic hello world in Akka Actor and also realize the basic diff. between on how the actor arch. is diff.

Let’s write the code first

First add the following in the maven file:

<dependency>
	        <groupId>com.typesafe.akka</groupId>
	        <artifactId>akka-actor_2.11</artifactId>
	        <version>2.4.4</version>
</dependency>

Now write the main HelloWorldActor.java file:

import akka.actor.AbstractActor;
import akka.actor.Props;
import akka.japi.pf.ReceiveBuilder;

public class HelloWorldActor extends AbstractActor {
public HelloWorldActor() {
    receive(ReceiveBuilder
        .match(PrintMessage.class, this::sayHello)
        .build());
}

private void sayHello(final PrintMessage message) {
    System.out.println(message.getMessage());
}


public static Props props() {
    return Props.create(HelloWorldActor.class);
}

public static class PrintMessage{
    private String message;
    public PrintMessage(String message) {
        this.message = message;
    }

    public String getMessage() {
        return this.message;
    }
}
}

and in HelloWorldActorMain.java file write :


import akka.actor.ActorRef;
import akka.actor.ActorSystem;

public class HelloWorldActorMain {
	
	public static void main(String[] args) {
		ActorSystem as = ActorSystem.create("hello-world");
		ActorRef helloWorldActor = as.actorOf(HelloWorldActor.props());
		helloWorldActor.tell(new HelloWorldActor.PrintMessage("Hello World"), null);
	}
}

Let’s Understand Actor Perspective

  • We created two files here one is
    • HelloWorldActor – this represents the actor class
      • on constructur we create the behaviour for the actor ie for eg: when actor recieves an Object of PrintMessage class as a Message it will call the sayHello function
      • every actor must describe its behavior on what all messages it can process.
    • HelloWorldActorMain – this is for the main function
  • In main we first created a actor system which we generally create single only per JVM
  • Then we created a ActorRef (it does not represent the HelloWorldActor class object). This is a generic object of ActorRef which takes your messages , queues it and process the actual message on HelloWorldActor object.
  • Now sent a message to Actor – saying “Hello World” in PrintMessage object form.
  • and internally actor call a function sayHello and passing the PrintMessage object as an argument
  • we must realize here that normally we directly call function sayHello of a class and here we are calling function of ActorRef of HelloWorldActor to pass a Message
  • Also one very important thing is diff. here which is normally when we call sayHello of a class same thread go and does the work, but here diff./same thread actually does the work of processing the message
  • When we call the sayHello function our message go into a Queue type object and some other thread read and works on this.

Summary

In this hello world – we write a basic Hello World in Akka (Actor based System)

Also we realized how Actor based development is diff. from the normal function call between classes.