PostgreSQL Performance Tuning: Indexes, VACUUM, Pooling | HostKonek

PostgreSQL Performance Starts With Understanding the Defaults

PostgreSQL ships with conservative default settings that are designed to run on modest hardware without crashing. These defaults are safe, but they are not optimized for production workloads. A fresh PostgreSQL installation on a VPS with 8 GB of RAM will use only a fraction of the available resources unless you tune the configuration. The good news is that a handful of well-chosen settings produce the majority of the performance improvement.

This guide covers the three areas that matter most for hosting customers running PostgreSQL: indexing strategy, autovacuum tuning, and connection pooling. Get these right, and your database handles real-world traffic without breaking a sweat.

Indexing: The Biggest Performance Lever

An index is a data structure that allows PostgreSQL to find rows quickly without scanning the entire table. Without an index, every query on a large table triggers a sequential scan — reading every row to find matches. With the right index, the same query reads a tiny fraction of the data. The performance difference can be orders of magnitude.

When to Add an Index

The simplest rule: add an index on any column that appears frequently in WHERE clauses, JOIN conditions, or ORDER BY clauses. If your application runs a query like SELECT * FROM orders WHERE customer_id = 123 regularly, an index on customer_id ensures that query uses an index scan instead of a sequential scan.

Use EXPLAIN ANALYZE before and after adding an index to verify the impact. This command shows the query plan and actual execution time. If you see "Seq Scan" on a large table and the query is slow, an index is likely the fix.

Types of Indexes

B-tree (default): The most common index type. Works well for equality and range queries. This is what you get when you run CREATE INDEX without specifying a type.
GIN (Generalized Inverted Index): Excellent for full-text search, JSONB columns, and array columns. If you query JSONB fields with containment operators (@>) or use full-text search, GIN indexes are essential.
GiST (Generalized Search Tree): Used for geometric data, range types, and full-text search. If you use PostGIS or range queries, GiST indexes are relevant.
Partial indexes: Indexes that cover only a subset of rows, defined by a WHERE clause. For example, CREATE INDEX ON orders (status) WHERE status = 'pending' creates a small, focused index that speeds up queries for pending orders without bloating the index with completed orders.
Composite indexes: Indexes on multiple columns. Useful when queries filter on combinations of columns. The column order matters — put the most selective column first.

Index Anti-Patterns

More indexes are not always better. Each index consumes disk space and adds overhead to INSERT, UPDATE, and DELETE operations because the index must be updated alongside the table. Do not index every column speculatively. Index based on actual query patterns, and remove unused indexes periodically. Query pg_stat_user_indexes to find indexes that are never used.

Autovacuum: Keeping the Database Healthy

PostgreSQL uses MVCC (Multi-Version Concurrency Control) to handle concurrent transactions. When a row is updated or deleted, the old version is not immediately removed — it is marked as dead. This allows other transactions that started earlier to still see the old version. The accumulation of dead rows is called "bloat," and it degrades performance over time as PostgreSQL must scan through dead rows to find live ones.

What VACUUM Does

VACUUM reclaims space occupied by dead rows and makes it available for reuse. It also updates the visibility map and statistics that the query planner uses to make decisions. Without regular vacuuming, tables grow bloated, queries slow down, and eventually you hit transaction ID wraparound — a critical situation that forces PostgreSQL to stop accepting writes until a full vacuum completes.

Tuning Autovacuum

PostgreSQL's autovacuum daemon runs automatically, but the default settings are too conservative for busy tables. Key parameters to adjust:

autovacuum_vacuum_scale_factor: Default is 0.2 (vacuum when 20% of rows are dead). For large tables, this means millions of dead rows accumulate before a vacuum runs. Lower this to 0.05 or even 0.01 for high-write tables.
autovacuum_vacuum_threshold: The minimum number of dead rows before a vacuum is triggered. The default (50) is fine for small tables but may need raising for tables with frequent tiny updates to avoid unnecessary vacuums.
autovacuum_vacuum_cost_delay: Controls how aggressively autovacuum works. Lower values (e.g., 2ms instead of the default 20ms) make autovacuum more aggressive, which helps it keep up with high-write workloads.
autovacuum_max_workers: Default is 3. If you have many tables, increase this so more tables can be vacuumed concurrently.

You can also set autovacuum parameters per table using ALTER TABLE ... SET, which lets you tune aggressively for high-write tables without affecting low-write tables.

Connection Pooling with pgBouncer

Every PostgreSQL connection consumes memory (roughly 5-10 MB per connection). Applications that open connections carelessly — or frameworks that create a new connection per request — can exhaust PostgreSQL's connection limit and cause errors or severe performance degradation.

The Problem

Web applications often use more connections than they actually need at any given moment. A pool of 100 connections might only have 10 actively running queries, with the other 90 sitting idle. PostgreSQL does not distinguish between active and idle connections — each one consumes memory and counts against the limit.

The Solution: pgBouncer

pgBouncer sits between your application and PostgreSQL, managing a pool of database connections. When your application opens a "connection," it actually connects to pgBouncer, which assigns a real PostgreSQL connection only when a query is executed. When the query finishes, the connection is returned to the pool for reuse.

In transaction pooling mode (the most common for web applications), pgBouncer multiplexes many application connections onto a smaller number of PostgreSQL connections. An application with 200 connections can often be served by 20 to 30 real PostgreSQL connections, dramatically reducing memory usage and improving performance.

Deployment Tips

Run pgBouncer on the same server as PostgreSQL to minimize latency.
Set the PostgreSQL max_connections to the total number of connections pgBouncer will use, plus a few reserved for admin tasks.
Monitor pool utilization. If your application frequently waits for available connections, increase the pool size or optimize slow queries that hold connections too long.

Essential Configuration Parameters

Beyond indexing, vacuuming, and connection pooling, these PostgreSQL settings have the most impact on performance:

shared_buffers: Set to roughly 25% of total RAM. This is PostgreSQL's internal cache for frequently accessed data.
effective_cache_size: Set to roughly 50-75% of total RAM. This tells the query planner how much memory is available for caching (including OS-level cache), influencing its cost estimates.
work_mem: Memory allocated per operation for sorting and hashing. Start at 16-64 MB for VPS environments. Be careful — this is per operation, not per connection, so high values can consume memory quickly under concurrent load.
maintenance_work_mem: Memory for maintenance operations like VACUUM and CREATE INDEX. Set to 256 MB to 1 GB depending on available RAM. Higher values speed up these operations.

Monitoring Performance

Tuning without monitoring is guesswork. Enable pg_stat_statements to track query execution statistics — total time, call count, and average time per query. This extension is the most valuable tool for identifying slow queries. Review the top queries by total time weekly and investigate any that stand out.

Monitor table bloat with queries against pg_stat_user_tables to see how many dead tuples each table has. Monitor connection counts to ensure you are not approaching limits. Monitor autovacuum activity to confirm it is keeping up with your write rate.

Putting It Together

PostgreSQL performance tuning is not about chasing benchmarks — it is about matching the database configuration to your actual workload. Start with proper indexes based on real query patterns. Tune autovacuum to keep tables clean. Add pgBouncer to manage connections efficiently. Adjust memory settings to use the resources your VPS provides. Then monitor, measure, and iterate. A well-tuned PostgreSQL instance on a modest VPS can handle far more traffic than a poorly tuned one on expensive hardware.

PostgreSQL Performance Tuning for Hosting: Indexing, VACUUM, and Connection Pooling