Out of the box, the default PostgreSQL configuration is not tuned for any particular workload. Default values are set to ensure that PostgreSQL runs everywhere, with the least resources it can consume and so that it doesn’t cause any vulnerabilities. It has default settings for all of the database parameters. It is primarily the responsibility of the database administrator or developer to tune PostgreSQL according to their system’s workload. In this blog, we will establish basic guidelines for setting PostgreSQL database parameters to improve database performance according to workload.
Bear in mind that while optimizing PostgreSQL server configuration improves performance, a database developer must also be diligent when writing queries for the application. If queries perform full table scans where an index could be used or perform heavy joins or expensive aggregate operations, then the system can still perform poorly even if the database parameters are tuned. It is important to pay attention to performance when writing database queries.
Nevertheless, database parameters are very important too, so let’s take a look at the eight that have the greatest potential to improve performance , But before that it is important to understand the memory architecture basic of postgres :
Memory in PostgreSQL can be classified into two categories:
- Local Memory area: It is allocated by each backend process for its own use.
- Shared memory area: It is used by all processes of a PostgreSQL server.
So Now i will explain tunable parameters and what should be the value of these tunable parameters : .
PostgreSQL uses its own buffer and also uses kernel buffered IO. That means data is stored in memory twice, first in PostgreSQL buffer and then kernel buffer. Unlike other databases, PostgreSQL does not provide direct IO. This is called double buffering. The PostgreSQL buffer is called shared_buffer which is the most effective tunable parameter for most operating systems. This parameter sets how much dedicated memory will be used by PostgreSQL for cache.
The default value of shared_buffer is set very low and you will not get much benefit from that. It’s low because certain machines and operating systems do not support higher values. But in most modern machines, you need to increase this value for optimal performance.
The recommended value is 25% of your total machine RAM. You should try some lower and higher values because in some cases we achieve good performance with a setting over 25%. The configuration really depends on your machine and the working data set. If your working set of data can easily fit into your RAM, then you might want to increase the shared_buffer value to contain your entire database, so that the whole working set of data can reside in cache. That said, you obviously do not want to reserve all RAM for PostgreSQL.
In production environments, it is observed that a large value for shared_buffer gives really good performance, though you should always benchmark to find the right balance.
Alternatively, while a larger shared_buffers value can increase performance in ‘read heavy’ use cases, having a large shared_buffer value can be detrimental for ‘write heavy’ use cases, as the entire contents of shared_buffers must be processed during writes.
Please note that the database server needs to be restarted after this change.
testdb=# SHOW shared_buffers;
Write-Ahead Logging (WAL) is a standard method for ensuring integrity of data. Much like in the shared_buffers setting, PostgreSQL writes WAL records into buffers and then these buffers are flushed to disk.
The default size of the buffer is set by the wal_buffers setting- initially at 16MB. If the system being tuned has a large number of concurrent connections, then a higher value for wal_buffers can provide better performance.
effective_cache_size has the reputation of being a confusing PostgreSQL settings, and as such, many times the setting is left to the default value.
The effective_cache_size value provides a ‘rough estimate’ of the number of how much memory is available for disk caching by the operating system and within the database itself, after taking into account what’s used by the OS itself and other applications.
This value is used only by the PostgreSQL query planner to figure out whether plans it’s considering would be expected to fit in RAM or not. As such, it’s a bit of a fuzzy number to define for general use cases.
A conservative value for effective_cache_size would be ½(50%) of the total memory available on the system. Most commonly, the value is set to 75% of the total system memory on a dedicated DB server, but can vary depending on the specific discrete needs on a particular server workload.
If the value for effective_cache_size is too low, then the query planner may decide not to use some indexes, even if they would help greatly increase query speed.
So Conclusively , General recommendation for effective_cache_size is as follows.
- Set the value to the amount of file system cache available. On UNIX/Linux like systems, add the free+cached numbers from free or top commands to get an estimate
- If you don’t know, set the value to the 50% or 75% of total system memory specially if dedicated DB server
This configuration is used for complex sorting. If you have to do complex sorting then increase the value of work_mem for good results. In-memory sorts are much faster than sorts spilling to disk. Setting a very high value can cause a memory bottleneck for your deployment environment because this parameter is per user sort operation. Therefore, if you have many users trying to execute sort operations, then the system will allocate work_mem * total sort operations for all users. Setting this parameter globally can cause very high memory usage. So it is highly recommended to modify this at the session level.
|12345678910||testdb=# SET work_mem TO “2MB”;testdb=# EXPLAIN SELECT * FROM bar ORDER BY bar.b; QUERY PLAN ———————————————————————————–Gather Merge (cost=509181.84..1706542.14 rows=10000116 width=24) Workers Planned: 4 -> Sort (cost=508181.79..514431.86 rows=2500029 width=24) Sort Key: b -> Parallel Seq Scan on bar (cost=0.00..88695.29 rows=2500029 width=24)(5 rows)|
The initial query’s sort node has an estimated cost of 514431.86. Cost is an arbitrary unit of computation. For the above query, we have a work_mem of only 2MB. For testing purposes, let’s increase this to 256MB and see if there is any impact on cost.
|123456789||testdb=# SET work_mem TO “256MB”;testdb=# EXPLAIN SELECT * FROM bar ORDER BY bar.b; QUERY PLAN ———————————————————————————–Gather Merge (cost=355367.34..1552727.64 rows=10000116 width=24) Workers Planned: 4 -> Sort (cost=354367.29..360617.36 rows=2500029 width=24) Sort Key: b -> Parallel Seq Scan on bar (cost=0.00..88695.29 rows=2500029 width=24)|
The query cost is reduced to 360617.36 from 514431.86 — a 30% reduction.
So Conclusively , Setting the value Higher alway results beer sorting and hashing , But setting it in local queries is always recommended . Set value high in queries where you expect high sorting otherwise low global value is good .
maintenance_work_mem is a memory setting used for maintenance tasks. The default value is 64MB. Setting a large value helps in tasks like VACUUM, RESTORE, CREATE INDEX, ADD FOREIGN KEY and ALTER TABLE.
It is necessary to remember that when autovacuum runs, up to autovacuum_max_workers times this memory may be allocated, so be careful not to set the default value too high.
The default value of maintenance_work_mem = 64MB.
General recommendation to set maintenance_work_mem is as follows.
- Set the value 10% of system memory, up to 1GB
- Maybe you can set it even higher if you are having VACUUM problems
We can also temporarily increase this memory while creating indexes or at the time of dump restores or while performing full vacuums .
This is used to enforce that commit will wait for WAL to be written on disk before returning a success status to the client. This is a trade-off between performance and reliability. If your application is designed such that performance is more important than the reliability, then turn off synchronous_commit. This means that there will be a time gap between the success status and a guaranteed write to disk. In the case of a server crash, data might be lost even though the client received a success message on commit. In this case, a transaction commits very quickly because it will not wait for a WAL file to be flushed, but reliability is compromised.
This parameter sets the maximum number of temporary buffers used by each database session. The session local buffers are used only for access to temporary tables. The setting of this parameter can be changed within individual sessions but only before the first use of temporary tables within the session.
PostgreSQL database utilizes this memory area for holding the temporary tables of each session, these will be cleared when the connection is closed.
The default value of temp_buffer = 8MB.
There are more parameters that can be tuned to gain better performance but those have less impact than the ones highlighted here. In the end, we must always keep in mind that not all parameters are relevant for all applications types. Some applications perform better by tuning a parameter and some don’t. Tuning PostgreSQL Database Parameters must be done for the specific needs of an application and the OS it runs on.
Also , Performance tuning does not only depends on postgres configuration parameters , there are many system parameters also on which postgresql performance depends So, in my next blog i will explain some system parameters which can affect postgresql performance .