Understanding Postgresql Indexes for Beginners – Part 4 (BRIN Indexes Optimization)

In last blog we learned about BRIN indexes , in this we will go in little detail on optimizing BRIN indexes and parameters for its optimization.

As we learned that BRIN index basically store the summary information (min and max ) for a particular page range(preferrably adjacent) such that it becomes easy to find which pages to cross check for a particular filter.

In this we will understand the BRIN index update mechanism and optimize the page range parameter.

When BRIN INDEX is updated

  • Index Creation
    • When the index is created it will create summary for all the pages of the table.
  • Table Update
    • For all page which are already summarized if any entry added / updated / removed , the index is updated in the transaction only
    • For new pages which are not summarized
      • Manual Update
        • By default BRIN does not auto update for this one needs to run the function –
          • select brin_summarize_new_values('<index name>'::regclass);
      • Auto Update
        • On autovacuum system will add the summary info for those blocks as well, we need to create index with auto vacuum on like this
          • create index c_t_brin_idx on calllog using brin(call_time) with (pages_per_range = 32,autosummarize = on);

How to optimize BRIN Indexes

Primary parameter to optimize BRIN indexes is the pages_per_range , this balance between no. of pages which needs to be scanned (because of lossy nature) vs no. of index entry created and index maintenance needs.

This parameter needs to be optimized based on individual needs , we will try to provide some guiding rules here, but before that we should do some math here and calculate the number of pages and rows per page:

Calculate no. of pages :

select relpages from pg_class where relname = 'student';;
 relpages 
----------
     5406

Now the number of pages is 5406

Now lets calculate total records :

select count(*) from student;
  count  
---------
 1000000

Now we have total rows as 1 million , lets count no. of record per page:

select 1000000/5406 as row_per_page;
 row_per_page 
--------------
          184

Now we see that row per page is 184 in our case, this will be diff. in you case so before proceeding check in your table.

and default value is 128 in page per range — so minimum values our index need to filter for every query is : 184*128 = 23552

Now lets try to see the rules, also rules needs to be balanced for best performance as in databases everything depends on your needs:

  • Table should be Large – something > 5 millions
    • if less data then benefit of small size of index would be less and Btree would also be small.
  • Data should primarily be INSERT Only
    • If we are constantly updating the data and also the data is updated in such a way that single value occur in multiple page range (eg: page 1-10 raneg is 100 -550 and page 11-20 range is 250-800), in this case the system needs to query both range pages which will slow down the filter as pages increase
  • Data should be inserted in IN ORDER
    • If we are inserting data in such a way that the column in which we are creating index that single value occur in multiple page range (eg: page 1-10 raneg is 100 -550 and page 11-20 range is 250-800), in this case the system needs to query both range pages which will slow down the filter as pages increase
  • Search record are generally in large numbers > 1000
    • if we are selecting single value Btree will generally out perform the BRIN , but if we are selecting the no. of records in some 1000’s , BRIN will start to perform better at large count.

Al in all one needs to be very clear in the above points before deciding to use BRIN.

In next blog we will move to Gin Indexes.

Subscribe our blog for more indepth tutorials.

How and when and what to log ?

We as a developers understand the importance of coding but most of us do not know the right way to log.

While seeing others code you must have seen various patterns of logging:

  • Log everything
    • on start of any function call
    • on every function call
    • before every return statement
  • Don’t log anything
    • just don’t log
  • Log only Exceptions
    • Log only when exceptions occur

I think everybody must have seen all type of developers, lets understand when to log and also in diff. scenarios , but before that first see the purpose of logging:

Purpose of Logging

  • Provides insights about the flow of your code, what happen , what are the parameters which were passed
  • You could virtually run the entire flow of your code and understand the problems while debugging any unexpected behavior in your code.

Lets go scenario by scenario

In the following i want you see the reasoning to when to log that may help you to decide in your scenarios.

  • Http Service Call / Any third party
    • to debug any service hit , you will require the following:
      • parameters which are passed to it log using info, always check whether in your environment it is already done by framework library code (like in rest filters , tomcat logs) then do not log, generally its a good practice to handle this at framework level rather than everybody taking care of this
      • Now if you are writing pure functions (not using global variables, not modifying any variable provided) then mostly you would not require to log any thing in those functions(will be writing a blog on the pure functions), as you could drive most of the things from the input.
      • If any specific decision that will completely change the flow of the code , we should generally log the decision and reason in info
      • Log any exception which happens in to the system with type error
      • any parameter you are reading from system , generally log this with fine as you can cross check this , do not just log , not a hardline if some function is called limited number of times
      • calling another http service –
        • generally you can log the parameters here with info , it becomes easy to debug the parameters passed rather than looking into other service log if it is on other machine only and no central logging.
  • Timer or Scheduler Jobs
    • In case of timer first thing is , if timer is every some seconds log the time in which it started
    • rest more or less remain the same as in above
  • Service to Service(within same process, Spring Services) Calls
    • When service call to another service the calling service should not log generally assuming the responsibility is with the called service. rest remain same as above in http one
  • Any Http Connections
    • generally we should log the parameters passed to http with info , and also should check if some parameters not to print as they are internal and should not leak
  • Microservice to Microservice Communications
    • Assuming you would be using some trace id , if that Service logs are not available generally the request and response , keeping in mind do not print business critical parameters(like password of some sort).
  • DB Queries
    • before calling a query we should generally log with info the parameters we are passing
  • Now when to log Fatal
    • In code is something very unexpected which happens like db configuration not provided log with fatal or other api url is not configured but that should be without it use case will not work the log with fatal
  • Now when to log fine
    • Generally anytime when there are decisions taken in the code one can log on fine. The configurations read by the system we can log with fine, Some internal data structures created or modified or any lifecycle event like DB connection closed we can log such things with fine. But do not add lots and lots of log , it can kill your system.

Some of the cross cutting concerns we should know

  • Flow Id or Trace Id :
    • for any logs which are part of a single flow (ie a single thread doing it in sync ) there should be a flow id or trace id in all logs.
    • Even if the flow which are started from logically the same flow but then one async api is called (@Async in Spring) in that also one should have the same flow id passed with some additional info.
    • For distributed we you can check the link .
  • What to Log:
    • Use some trace id like user id , or object id for which a set of API or function call is done so that it becomes easy to figure out for which function was called

As we can’t cover all of them i think you the above examples help to choose the right mix.

Also please be sure when you log as lots of logging could kill your system. Do read the link.

Want to be a better programmer – Read , Read , Read – But How?

If one wants to become a better programmer one thing is for sure that one needs to understand others code.

For beginners when we start writing the code, when using libraries very basic like List , Set , we get stuck in basic things like:

  • What this class do?
  • How to use this class?

Now this comes because we are missing one important trick:

  • First associate a purpose for the class be reading its definition and by name.
  • Now if you have the purpose you will be able to automatically make out what the functions in this class should be
  • Same goes for the functions first associate a purpose.

Now if you start reading the code by understanding the purpose of the class, you will be able to understand and use that class.

Now the next thing you should do is when you understood the purpose is :

  • look at their internal implementations
  • understand the data structure or variables they have declared
  • try to reason about the purpose why they have declared like that
  • Any pors / cons – alternative implementation you can think of.

I bet if one starts doing this in initial part of their programming career, this would help them:

  • to reason lots about other libraries
  • understand new libraries faster
  • also while writing new libraries one will be able to choose right data structures.

Lets see a demo of the above theory:

We all know there is Collections in java which looks somethings like this:

Java Collection Structure

Now lets start first by assigning purpose and see how things follow automatically :

  • Collection Interface – it says one can have some objects inside me.
    • so the functions should be in this class
      • one function to add a object
      • one function to remove a object
  • List Interface – it says its a collection but has a ordering for objects means you get in the order you put
    • so the extra functions in this class should be
      • get on a particular position get(int i)
      • add on a particular position
  • Set Interface – it also says it is a collection but it contains only unique objects and does not care about ordering
    • so the functions in this class
      • one is contains to check whether this already exist as it will not add a object twice

You see by just understanding the class purpose we could easily make out the functions , their purpose.

I am leaving the next part to you guys on looking at the internal implementation. Will make a next blog for this.

Naming is the most important thing

There are only two hard things in Computer Science: cache invalidation and naming things.

This quote is one of my favorite programming quotes and we will discuss the reason why here

We will discuss the naming things part here.

In your entire career in coding what you will do most is reading others code or reading others libraries or new technologies and understanding them, solving bugs in the existing code. If you agree lets move forward.

Now a small story, there was a bug in our existing system and one of my fellow programmer was working on that. The bug was in the core libraries written years back and to add to his woes there was no documentation neither external and nor on code.

He was working on it for days with no result and we were discussing the problem and he said i was unable to understand the code, now i start debugging with him and the bug was solved in minutes (not to brag about my skills), and he said how you are able to go to the problem class and function so easily. I said its just intuition from the naming of the classes and function.

Here i understood its still not natural or important for people the naming convention.

Story part over.

Lets discuss that – what is this intuition thing all about . What i learnt in my coding career that before naming a class first associate a purpose with that class and name the class such that one can understand most part of the purpose with that name and then start adding function in that class and any function if not in alignment to that class purpose then that function should not be there and also through the function name one should be able to make out its implementation without looking at the code.

Now people who read lots of code will realize that’s how most of the good code are written and most of the libraries you will come around will follow this.

Now with the intuition you got from the name without reading the code one can make out the functions and so on. This thing happen in real life as well, if i say something a screen you know mostly what it does (if you think in right context).

Similarly if we start naming our classes or functions correctly this not just help us but help other programmers who will be working on it later.

So be very careful about when you name anything (class , function ) it should convey the purpose.

Also as a developer you should and must build the intution from naming and read code.

Start naming things correctly and read code with the right intuition – dont just jump into the code.