PostgreSQL is a powerful relational database, but out-of-the-box performance isn’t always optimal for your specific workload. This guide covers essential performance tuning techniques that can dramatically improve query response times and throughput.
Understanding PostgreSQL Performance
Before diving into optimizations, it’s crucial to understand where bottlenecks occur. PostgreSQL performance issues typically fall into three categories:
- Query inefficiency - Poorly written queries or missing indexes
- Configuration problems - Default settings don’t match your hardware
- Resource constraints - CPU, memory, or disk I/O limitations
Indexing Strategies
Indexes are the foundation of query performance. A well-designed index can reduce query time from seconds to milliseconds.
B-Tree Indexes: The Default Choice
B-tree indexes work well for most scenarios, especially equality and range queries:
-- Create a simple B-tree index
CREATE INDEX idx_users_email ON users(email);
-- B-tree indexes support these query patterns:
SELECT * FROM users WHERE email = 'user@example.com';
SELECT * FROM users WHERE created_at > '2024-01-01';
SELECT * FROM users WHERE age BETWEEN 18 AND 65;
Composite Indexes: Order Matters
When indexing multiple columns, column order is critical:
-- This index is optimal for queries filtering by status first, then created_at
CREATE INDEX idx_orders_status_created ON orders(status, created_at);
-- Efficient: Uses the index
SELECT * FROM orders WHERE status = 'pending' AND created_at > NOW() - INTERVAL '7 days';
-- Less efficient: created_at is not the leading column, so this filter can't use the index
SELECT * FROM orders WHERE created_at > NOW() - INTERVAL '7 days';
Rule of thumb: Put columns filtered with equality first, ordered by selectivity; columns used in range predicates go last.
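If you're unsure which column is more selective, a quick ratio of distinct values to total rows can guide the ordering; a minimal sketch against the orders table above (higher ratios mean more selective):
SELECT
  COUNT(DISTINCT status)::numeric / COUNT(*) AS status_selectivity,
  COUNT(DISTINCT created_at)::numeric / COUNT(*) AS created_at_selectivity
FROM orders;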
Partial Indexes: Index Only What Matters
Partial indexes are smaller and faster because they only index a subset of rows:
-- Only index active users
CREATE INDEX idx_active_users_email
ON users(email)
WHERE status = 'active';
-- This query uses the partial index efficiently
SELECT * FROM users WHERE email = 'user@example.com' AND status = 'active';
This is particularly useful for queries that always filter on a specific condition.
Covering Indexes: Include Additional Columns
PostgreSQL 11+ supports the INCLUDE clause to add non-key columns to an index:
CREATE INDEX idx_products_category_include
ON products(category)
INCLUDE (name, price);
-- This query can be answered entirely from the index (index-only scan)
SELECT name, price FROM products WHERE category = 'electronics';
This enables index-only scans, avoiding the need to access the table heap.
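Index-only scans also depend on the table's visibility map: if many pages aren't marked all-visible, PostgreSQL still has to check the heap. Vacuuming keeps the map current, and EXPLAIN confirms the plan:
-- Keep the visibility map fresh so index-only scans can skip heap checks
VACUUM products;
-- Look for "Index Only Scan" in the plan output
EXPLAIN SELECT name, price FROM products WHERE category = 'electronics';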
Query Optimization Techniques
Analyze Your Queries with EXPLAIN
The EXPLAIN command shows PostgreSQL's planned execution strategy; adding ANALYZE actually runs the query and reports real timings alongside the estimates:
EXPLAIN ANALYZE
SELECT u.name, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at > '2024-01-01'
GROUP BY u.id, u.name;
Key metrics to watch:
- Seq Scan: Full table scan (often slow for large tables)
- Index Scan: Using an index (usually fast)
- Cost: PostgreSQL’s estimate of query expense
- Actual time: Real execution time
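For I/O detail, add the BUFFERS option, which reports how many pages came from shared buffers versus disk; for example:
EXPLAIN (ANALYZE, BUFFERS)
SELECT u.name FROM users u WHERE u.created_at > '2024-01-01';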
Avoid SELECT *
Only select columns you need:
-- Bad: Fetches unnecessary data
SELECT * FROM users WHERE id = 123;
-- Good: Only fetches required columns
SELECT id, name, email FROM users WHERE id = 123;
This reduces I/O and network transfer, and makes index-only scans possible.
Use JOINs Instead of Subqueries
Modern PostgreSQL optimizes JOINs better than correlated subqueries:
-- Slower: Correlated subquery
SELECT u.name,
(SELECT COUNT(*) FROM orders WHERE user_id = u.id) as order_count
FROM users u;
-- Faster: JOIN with GROUP BY
SELECT u.name, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.name;
Batch Operations
Use batch inserts and updates instead of row-by-row operations:
-- Slow: Individual inserts
INSERT INTO logs (message) VALUES ('Log 1');
INSERT INTO logs (message) VALUES ('Log 2');
-- ... thousands more
-- Fast: Batch insert
INSERT INTO logs (message) VALUES
('Log 1'),
('Log 2'),
('Log 3');
-- ... thousands of rows in one statement
-- Or use COPY for even better performance
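A minimal COPY sketch (the file path here is hypothetical; server-side COPY FROM needs superuser or the pg_read_server_files role):
-- Bulk-load from a server-side file
COPY logs (message) FROM '/tmp/messages.csv' WITH (FORMAT csv);
-- From a client, psql's \copy variant reads the file on the client side instead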
Configuration Tuning
PostgreSQL’s default configuration is conservative. Tune these settings based on your hardware:
Memory Settings
# postgresql.conf
# Shared buffers: ~25% of system RAM (commonly capped around 8-16GB)
shared_buffers = 4GB
# Effective cache size: 50-75% of system RAM (a planner hint, not an allocation)
effective_cache_size = 12GB
# Work mem: RAM per sort/hash operation (per plan node, per query, so totals multiply)
work_mem = 64MB
# Maintenance work mem: for VACUUM, CREATE INDEX
maintenance_work_mem = 1GB
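To confirm what the running server is actually using, query pg_settings:
-- Current values as seen by the server
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('shared_buffers', 'work_mem', 'effective_cache_size', 'maintenance_work_mem');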
Checkpoint and WAL Settings
# Spread checkpoints out for smoother write performance
checkpoint_timeout = 15min
checkpoint_completion_target = 0.9
# Allow more WAL to accumulate between checkpoints (not the segment size)
max_wal_size = 4GB
min_wal_size = 1GB
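These can also be applied without hand-editing the file via ALTER SYSTEM, followed by a reload (the checkpoint and WAL settings above don't require a restart):
ALTER SYSTEM SET checkpoint_completion_target = 0.9;
ALTER SYSTEM SET max_wal_size = '4GB';
SELECT pg_reload_conf();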
Connection Pooling
Instead of creating connections per request, use a connection pool:
# Cap concurrent server connections
max_connections = 200
# Use PgBouncer or a similar pooler for better connection management
Pooling, whether through PgBouncer in front of the database or a connection pool inside your application, is crucial for high-concurrency workloads.
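A minimal pgbouncer.ini sketch (database name and pool size are illustrative, not a tuned config):
; pgbouncer.ini
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_port = 6432
pool_mode = transaction
default_pool_size = 20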
Maintenance Operations
Regular maintenance prevents performance degradation:
VACUUM and ANALYZE
-- Manual vacuum (usually automatic via autovacuum)
VACUUM ANALYZE users;
-- For heavily updated tables, more aggressive settings:
ALTER TABLE users SET (autovacuum_vacuum_scale_factor = 0.05);
ALTER TABLE users SET (autovacuum_analyze_scale_factor = 0.05);
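With the default scale factor of 0.2, a 10-million-row table accumulates roughly 2 million dead rows before autovacuum kicks in; 0.05 cuts that to about 500,000. To see which tables need attention:
-- Dead-tuple counts and last autovacuum times
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;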
REINDEX for Index Bloat
Over time, especially under heavy update and delete activity, indexes can become bloated:
-- Rebuild an index
REINDEX INDEX idx_users_email;
-- Rebuild all indexes on a table (locks table)
REINDEX TABLE users;
-- Concurrent reindex (PostgreSQL 12+, doesn't block writes)
REINDEX INDEX CONCURRENTLY idx_users_email;
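To check whether an index is actually bloated before rebuilding it, the pgstattuple extension provides pgstatindex (assuming the extension is available on your install):
CREATE EXTENSION IF NOT EXISTS pgstattuple;
-- avg_leaf_density well below ~90 (the default B-tree fillfactor) suggests bloat
SELECT avg_leaf_density, leaf_fragmentation
FROM pgstatindex('idx_users_email');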
Monitoring and Diagnostics
Identify Slow Queries
Enable slow query logging:
# Log queries taking longer than 1 second (value in milliseconds)
log_min_duration_statement = 1000
# Log plans for slow queries (requires loading the auto_explain module,
# e.g. session_preload_libraries = 'auto_explain')
auto_explain.log_min_duration = 1000
Query pg_stat_statements
The pg_stat_statements extension tracks query performance:
-- Requires shared_preload_libraries = 'pg_stat_statements' and a server restart
CREATE EXTENSION pg_stat_statements;
-- Find the slowest queries by total execution time (column names as of PostgreSQL 13)
SELECT query, total_exec_time, calls, mean_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
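Statistics accumulate indefinitely; reset them after making changes so you measure the new behavior:
-- Clear accumulated statistics
SELECT pg_stat_statements_reset();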
Monitor Index Usage
Find unused indexes:
SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
AND indexrelname NOT LIKE 'pg_toast%';
These indexes are candidates for removal, but keep indexes that back primary keys or unique constraints even if they are never scanned.
Real-World Performance Tips
1. Partition Large Tables
For tables with billions of rows, partitioning improves query performance:
-- Create partitioned table
CREATE TABLE events (
id BIGSERIAL,
event_time TIMESTAMP NOT NULL,
data JSONB
) PARTITION BY RANGE (event_time);
-- Create monthly partitions
CREATE TABLE events_2024_01 PARTITION OF events
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE events_2024_02 PARTITION OF events
FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
Queries that filter on event_time only scan the relevant partitions (partition pruning).
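A DEFAULT partition (PostgreSQL 11+) catches rows that fall outside the defined ranges, and EXPLAIN confirms that pruning happens:
-- Catch-all for out-of-range rows
CREATE TABLE events_default PARTITION OF events DEFAULT;
-- The plan should touch only events_2024_02
EXPLAIN SELECT * FROM events
WHERE event_time >= '2024-02-15' AND event_time < '2024-02-20';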
2. Use Materialized Views for Complex Aggregations
CREATE MATERIALIZED VIEW user_stats AS
SELECT
user_id,
COUNT(*) as total_orders,
SUM(amount) as total_spent
FROM orders
GROUP BY user_id;
-- Refresh periodically
REFRESH MATERIALIZED VIEW user_stats;
-- Or concurrent refresh (doesn't block reads)
REFRESH MATERIALIZED VIEW CONCURRENTLY user_stats;
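Note that the CONCURRENTLY form only works when the materialized view has a unique index (without a WHERE clause) covering all rows:
-- Required before REFRESH ... CONCURRENTLY will run
CREATE UNIQUE INDEX idx_user_stats_user_id ON user_stats (user_id);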
3. Consider Read Replicas
For read-heavy workloads, set up streaming replication:
- Route reads to replicas
- Keep writes on primary
- Use connection pooling to distribute load
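A minimal standby setup sketch (PostgreSQL 12+; the hostname and user are placeholders): set primary_conninfo on the replica and create an empty standby.signal file in its data directory:
# On the replica's postgresql.conf
primary_conninfo = 'host=primary.example.com port=5432 user=replicator'
hot_standby = on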
Conclusion
PostgreSQL performance tuning is an iterative process:
- Measure: Use EXPLAIN ANALYZE and pg_stat_statements
- Optimize: Add indexes, rewrite queries, tune configuration
- Monitor: Track query performance over time
- Iterate: Continuously refine based on real workload patterns
Start with indexing and query optimization; these provide the biggest wins. Configuration tuning and advanced techniques like partitioning come later, once you've addressed the low-hanging fruit.
Remember: Premature optimization is the root of all evil. Profile first, optimize second.