PostgreSQL Performance Tuning: A Comprehensive Guide

PostgreSQL is a powerful relational database, but out-of-the-box performance isn’t always optimal for your specific workload. This guide covers essential performance tuning techniques that can dramatically improve query response times and throughput.

Understanding PostgreSQL Performance

Before diving into optimizations, it’s crucial to understand where bottlenecks occur. PostgreSQL performance issues typically fall into three categories:

  1. Query inefficiency - Poorly written queries or missing indexes
  2. Configuration problems - Default settings don’t match your hardware
  3. Resource constraints - CPU, memory, or disk I/O limitations

Indexing Strategies

Indexes are the foundation of query performance. A well-designed index can reduce query time from seconds to milliseconds.

B-Tree Indexes: The Default Choice

B-tree indexes work well for most scenarios, especially equality and range queries:

-- Create a simple B-tree index
CREATE INDEX idx_users_email ON users(email);

-- B-tree indexes support these query patterns:
SELECT * FROM users WHERE email = 'user@example.com';
SELECT * FROM users WHERE created_at > '2024-01-01';
SELECT * FROM users WHERE age BETWEEN 18 AND 65;

Composite Indexes: Order Matters

When indexing multiple columns, column order is critical:

-- This index is optimal for queries filtering by status first, then created_at
CREATE INDEX idx_orders_status_created ON orders(status, created_at);

-- Efficient: Uses the index
SELECT * FROM orders WHERE status = 'pending' AND created_at > NOW() - INTERVAL '7 days';

-- Less efficient: the filter skips the leading column (status),
-- so the planner generally can't use this index for the range on created_at
SELECT * FROM orders WHERE created_at > NOW() - INTERVAL '7 days';

Rule of thumb: Put columns used with equality predicates first, then range columns. Among equality columns, lead with the most selective one, or the one your queries filter on most often.

Partial Indexes: Index Only What Matters

Partial indexes are smaller and faster because they only index a subset of rows:

-- Only index active users
CREATE INDEX idx_active_users_email
ON users(email)
WHERE status = 'active';

-- This query uses the partial index efficiently
SELECT * FROM users WHERE email = 'user@example.com' AND status = 'active';

This is particularly useful for queries that always filter on a specific condition.

Covering Indexes: Include Additional Columns

PostgreSQL 11+ supports the INCLUDE clause to add non-key columns to an index:

CREATE INDEX idx_products_category_include
ON products(category)
INCLUDE (name, price);

-- This query can be answered entirely from the index (index-only scan)
SELECT name, price FROM products WHERE category = 'electronics';

This enables index-only scans, avoiding the need to access the table heap.

Query Optimization Techniques

Analyze Your Queries with EXPLAIN

The EXPLAIN command shows PostgreSQL’s query execution plan:

EXPLAIN ANALYZE
SELECT u.name, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at > '2024-01-01'
GROUP BY u.id, u.name;

Key metrics to watch:

  - Seq Scan nodes on large tables where you expected an Index Scan
  - Estimated vs. actual row counts (large gaps suggest stale statistics; run ANALYZE)
  - Per-node actual time, to see where the cost concentrates
  - Buffer hits vs. disk reads when you add the BUFFERS option to EXPLAIN

Avoid SELECT *

Only select columns you need:

-- Bad: Fetches unnecessary data
SELECT * FROM users WHERE id = 123;

-- Good: Only fetches required columns
SELECT id, name, email FROM users WHERE id = 123;

This reduces I/O and makes covering indexes possible.

Use JOINs Instead of Subqueries

Modern PostgreSQL optimizes JOINs better than correlated subqueries:

-- Slower: Correlated subquery
SELECT u.name,
       (SELECT COUNT(*) FROM orders WHERE user_id = u.id) as order_count
FROM users u;

-- Faster: JOIN with GROUP BY
SELECT u.name, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.name;

Batch Operations

Use batch inserts and updates instead of row-by-row operations:

-- Slow: Individual inserts
INSERT INTO logs (message) VALUES ('Log 1');
INSERT INTO logs (message) VALUES ('Log 2');
-- ... thousands more

-- Fast: Batch insert
INSERT INTO logs (message) VALUES
('Log 1'),
('Log 2'),
('Log 3')
-- ... thousands in one statement
-- Or use COPY for even better performance
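For bulk loads, COPY streams rows in a single operation and avoids per-statement overhead entirely. A minimal sketch, assuming a plain-text file with one message per line (the file names here are illustrative):

```sql
-- Client-side load via psql's \copy, which streams the file
-- over the existing connection:
\copy logs (message) FROM 'messages.txt'

-- Server-side COPY; the file must be readable by the server process:
COPY logs (message) FROM '/var/lib/postgresql/import/messages.txt';
```

Server-side COPY is fastest but requires filesystem access on the database host; \copy works from any client.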

Configuration Tuning

PostgreSQL’s default configuration is conservative. Tune these settings based on your hardware:

Memory Settings

# postgresql.conf

# Shared buffers: ~25% of system RAM (diminishing returns beyond 8-16GB)
shared_buffers = 4GB

# Effective cache size: 50-75% of system RAM (a planner hint, not an allocation)
effective_cache_size = 12GB

# Work mem: memory per sort/hash operation; a single query can use
# several multiples of this, so size it with max_connections in mind
work_mem = 64MB

# Maintenance work mem: for VACUUM, CREATE INDEX
maintenance_work_mem = 1GB

Checkpoint and WAL Settings

# Spread checkpoints out for smoother write performance
checkpoint_timeout = 15min
checkpoint_completion_target = 0.9

# Allow more WAL to accumulate between checkpoints
# (max_wal_size is a checkpoint trigger, not the WAL segment size)
max_wal_size = 4GB
min_wal_size = 1GB

Connection Pooling

Instead of creating connections per request, use a connection pool:

# Cap server connections; each connection is a full backend process
max_connections = 200

Application-side pooling (PgBouncer, connection pools in your app) is crucial for high-concurrency workloads.
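A minimal PgBouncer configuration might look like the sketch below. The database name, paths, and pool size are placeholders to adapt to your environment:

```ini
; pgbouncer.ini (illustrative values)
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction   ; transaction pooling suits most web workloads
default_pool_size = 20
```

With transaction pooling, a server connection is only held for the duration of a transaction, so a small pool can serve many application connections; note that session-level features (prepared statements, advisory locks) need care in this mode.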

Maintenance Operations

Regular maintenance prevents performance degradation:

VACUUM and ANALYZE

-- Manual vacuum (usually automatic via autovacuum)
VACUUM ANALYZE users;

-- For heavily updated tables, more aggressive settings:
ALTER TABLE users SET (autovacuum_vacuum_scale_factor = 0.05);
ALTER TABLE users SET (autovacuum_analyze_scale_factor = 0.05);
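To see which tables accumulate dead tuples fastest, and therefore benefit most from tighter autovacuum thresholds, you can query the statistics views:

```sql
-- Tables with the most dead tuples and their last autovacuum time
SELECT relname, n_dead_tup, n_live_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
```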

REINDEX for Index Bloat

Over time, indexes can become bloated:

-- Rebuild an index
REINDEX INDEX idx_users_email;

-- Rebuild all indexes on a table (locks table)
REINDEX TABLE users;

-- Concurrent reindex (PostgreSQL 12+, doesn't block writes)
REINDEX INDEX CONCURRENTLY idx_users_email;

Monitoring and Diagnostics

Identify Slow Queries

Enable slow query logging:

# Log queries taking longer than 1 second (value is in milliseconds)
log_min_duration_statement = 1000

# Log query plans for slow queries; requires loading the auto_explain
# module, e.g. session_preload_libraries = 'auto_explain'
auto_explain.log_min_duration = 1000

Query pg_stat_statements

The pg_stat_statements extension tracks query performance:

-- Requires shared_preload_libraries = 'pg_stat_statements' in postgresql.conf
CREATE EXTENSION pg_stat_statements;

-- Find slowest queries by total time
SELECT query, total_exec_time, calls, mean_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

Monitor Index Usage

Find unused indexes:

SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0;

These indexes are candidates for removal, provided the statistics cover a representative period of your workload.
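Before dropping, confirm the index doesn't enforce a constraint, then remove it without blocking writes. The index name below is illustrative:

```sql
-- Verify the index isn't backing a unique or primary-key constraint
SELECT indisunique, indisprimary
FROM pg_index
WHERE indexrelid = 'idx_users_email'::regclass;

-- Drop without taking a blocking lock on the table
DROP INDEX CONCURRENTLY idx_users_email;
```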

Real-World Performance Tips

1. Partition Large Tables

For tables with billions of rows, partitioning improves query performance:

-- Create partitioned table
CREATE TABLE events (
    id BIGSERIAL,
    event_time TIMESTAMP NOT NULL,
    data JSONB
) PARTITION BY RANGE (event_time);

-- Create monthly partitions
CREATE TABLE events_2024_01 PARTITION OF events
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

CREATE TABLE events_2024_02 PARTITION OF events
FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

Queries filtering by event_time only scan relevant partitions.
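You can confirm that partition pruning is happening with EXPLAIN; only the matching partitions should appear in the plan:

```sql
EXPLAIN SELECT * FROM events
WHERE event_time >= '2024-01-05' AND event_time < '2024-01-10';
-- The plan should scan only events_2024_01, not events_2024_02
```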

2. Use Materialized Views for Complex Aggregations

CREATE MATERIALIZED VIEW user_stats AS
SELECT
    user_id,
    COUNT(*) as total_orders,
    SUM(amount) as total_spent
FROM orders
GROUP BY user_id;

-- Refresh periodically
REFRESH MATERIALIZED VIEW user_stats;

-- Or concurrent refresh (doesn't block reads)
REFRESH MATERIALIZED VIEW CONCURRENTLY user_stats;
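Note that REFRESH MATERIALIZED VIEW CONCURRENTLY requires a unique index on the view:

```sql
-- Required before the first CONCURRENTLY refresh
CREATE UNIQUE INDEX idx_user_stats_user_id ON user_stats (user_id);
```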

3. Consider Read Replicas

For read-heavy workloads, set up streaming replication and route read-only queries to the replicas, keeping all writes on the primary. Be aware that replicas can lag slightly behind, so reads that must see the latest write should stay on the primary.
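A minimal streaming-replication setup, sketched with illustrative hostnames and credentials (PostgreSQL 12+, where standby settings live in postgresql.conf):

```ini
# Primary: postgresql.conf
wal_level = replica
max_wal_senders = 10

# Replica: postgresql.conf, plus an empty standby.signal file
# in the replica's data directory
primary_conninfo = 'host=primary.example.com port=5432 user=replicator'
```

In practice the replica's data directory is first seeded with a base backup (e.g. via pg_basebackup) before it begins streaming.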

Conclusion

PostgreSQL performance tuning is an iterative process:

  1. Measure: Use EXPLAIN ANALYZE and pg_stat_statements
  2. Optimize: Add indexes, rewrite queries, tune configuration
  3. Monitor: Track query performance over time
  4. Iterate: Continuously refine based on real workload patterns

Start with indexing and query optimization - these provide the biggest wins. Configuration tuning and advanced techniques like partitioning come later, once you’ve addressed the low-hanging fruit.

Remember: Premature optimization is the root of all evil. Profile first, optimize second.

