In last blog we learned about Gin indexes. By now we have covered the basic indexes we use in Postgresql.
Now lets look at one of the ways by which we can optimize our indexes for business specific use case.
Partial Indexing means that we can create index on some subset of the actual rows of the table.
This type of indexing is very very useful in cases where queries are required on the subset very often.
Lets take a example:
We have a ticketing table(subject text, status varchar) which contains subject of tickets and status whether the ticket is OPEN or CLOSED. Now lets say in our case we primarily search on open tickets only and in general CLOSED Tickets would be increasing constantly.
Lets assume on our ticket table size – 100 million rows , out of which 5 million are OPEN and 95 million are closed.
Now if we create a index on entire table (100 million rows) our index would also be large and every time if we want to search on OPEN ticket only , still we are loading that huge index.
Now with Partial index we can create Gin (text search) only on OPEN ticket rows our index would only work on OPEN Tickets (only 5 million rows), in such case our index would be very small and efficient and fast to load in memory.
How to use
lets create table for tickets only:
create table tickets(subject tsvector , status varchar); INSERT INTO tickets ( subject , status) SELECT md5(random()::text)::tsvector,CASE WHEN g < 500000 THEN 'OPEN' ELSE 'CLOSED' END FROM generate_series(1,8000000) as g;
Now lets create a complete index and partial index:
create index t_s_idx on tickets using gin(subject);
Partial Index :
create index t_s_open_idx on tickets using gin(subject) where status='OPEN'; create index t_s_open_idx on tickets using gin(subject) where <CONDITION EXPRESSION>
Size diff. between complete and partial
\di+ t_s_open_idx List of relations Schema | Name | Type | Owner | Table | Size | Description --------+--------------+-------+----------+---------+-------+------------- public | t_s_open_idx | index | postgres | tickets | 40 MB | (1 row) saarthi=# \di+ t_s_idx List of relations Schema | Name | Type | Owner | Table | Size | Description --------+---------+-------+----------+---------+--------+------------- public | t_s_idx | index | postgres | tickets | 662 MB | (1 row)
It is clearly visible that index size is 662 MB vs 40 MB and in general as OPEN tickets will not be constantly increasing this index would be of same size and mostly can remain in memory.
When Partial index would work
Partial Index would work only when the query also contain the search condition which is present while creating the index.
Like in our ticketing case index has condition status =’OPEN’ and if the query also has the condition status=’OPEN’ only then it works.
Most index like Btree, Gin , BRIN can be used as a partial index. Using partial index is one of the most powerful tool to speed up your system.