Database schema changes are risky. Traditional migrations require downtime, but modern applications demand 24/7 availability. This guide shows how to evolve your database schema without taking your application offline.
The Problem with Traditional Migrations
Standard migration approaches have a critical flaw:
-- Traditional approach: Breaking change
ALTER TABLE users DROP COLUMN middle_name;
What happens:
- Deploy migration → column disappears
- Old application code tries to read middle_name → ERROR
- Application breaks until new code is deployed
Even with perfect coordination, there’s a window where the schema and code are incompatible.
Zero-Downtime Migration Principles
Follow these principles to migrate safely:
1. Backward Compatibility
Every schema change must work with both old and new application code:
-- Bad: Breaks old code immediately
ALTER TABLE users RENAME COLUMN name TO full_name;
-- Good: Add new column, keep old one
ALTER TABLE users ADD COLUMN full_name VARCHAR(255);
UPDATE users SET full_name = name WHERE full_name IS NULL;
2. Expand-Contract Pattern
The safest migration strategy, with a short sketch after the two phases:
Expand Phase:
- Add new schema elements (columns, tables, indexes)
- Old code ignores new schema
- New code uses new schema
Contract Phase (after old code is gone):
- Remove old schema elements
- Clean up redundant data
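Continuing the name → full_name example from the backward-compatibility principle above, a minimal sketch of the two phases might look like this (the contract step runs only once no deployed code still reads the old column):
-- EXPAND: add and backfill the new column; old code keeps using name
ALTER TABLE users ADD COLUMN full_name VARCHAR(255);
UPDATE users SET full_name = name WHERE full_name IS NULL;
-- (deploy code that reads and writes full_name, wait until no release uses name)
-- CONTRACT: remove the old column once nothing references it
ALTER TABLE users DROP COLUMN name;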
3. Multi-Step Deployments
Never combine schema changes with code changes in a single deployment. Instead:
- Deploy schema expansion
- Deploy code that works with both old and new schema
- Migrate data (if needed)
- Deploy code that uses only new schema
- Deploy schema contraction
Common Migration Patterns
Pattern 1: Adding a Column
Safe approach:
-- Step 1: Add column with default value
ALTER TABLE products ADD COLUMN description TEXT DEFAULT '';
-- Step 2: Deploy code that writes to description
-- Old code: Ignores description
-- New code: Reads and writes description
-- Step 3: Backfill existing rows (if needed)
UPDATE products SET description = name WHERE description = '';
-- Step 4: Remove default constraint (if needed)
ALTER TABLE products ALTER COLUMN description DROP DEFAULT;
-- Step 5: Add NOT NULL constraint (if needed)
-- Only after all rows have values!
ALTER TABLE products ALTER COLUMN description SET NOT NULL;
Timeline:
- Deploy 1: Schema change (instant)
- Deploy 2: Code using new column (minutes later)
- Deploy 3: Backfill data (hours/days later, off-peak)
- Deploy 4: Final constraints (days later)
Pattern 2: Removing a Column
Use expand-contract pattern:
-- Phase 1: Stop writing to the column (code-only change)
-- Deploy code that doesn't write to legacy_field
-- Wait for all deployments to complete
-- Phase 2: Verify no writes (this example assumes an audit_log table exists)
SELECT * FROM audit_log
WHERE table_name = 'users'
AND column_name = 'legacy_field'
AND operation = 'UPDATE'
AND created_at > NOW() - INTERVAL '1 day';
-- Phase 3: CONTRACT - Drop column
ALTER TABLE users DROP COLUMN legacy_field;
Important: Wait days or weeks between phases to ensure no rollbacks are needed.
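If you don't have an audit log, a temporary trigger can catch any writes that still touch the column. This is a minimal sketch under that assumption; the legacy_field_writes scratch table and function name are hypothetical:
-- Hypothetical scratch table and trigger to catch lingering writes to legacy_field
CREATE TABLE legacy_field_writes (
    user_id INT,
    changed_at TIMESTAMPTZ DEFAULT now()
);
CREATE OR REPLACE FUNCTION log_legacy_field_write()
RETURNS TRIGGER AS $$
BEGIN
    IF NEW.legacy_field IS DISTINCT FROM OLD.legacy_field THEN
        INSERT INTO legacy_field_writes (user_id) VALUES (NEW.id);
    END IF;
    RETURN NULL;  -- return value is ignored for AFTER row triggers
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER watch_legacy_field
AFTER UPDATE ON users
FOR EACH ROW EXECUTE FUNCTION log_legacy_field_write();
-- Drop the trigger and scratch table before dropping legacy_field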
Pattern 3: Renaming a Column
Renaming requires a multi-step approach:
-- Step 1: Add new column
ALTER TABLE users ADD COLUMN email_address VARCHAR(255);
-- Step 2: Copy data
UPDATE users SET email_address = email WHERE email_address IS NULL;
-- Step 3: Create trigger to keep columns in sync
CREATE OR REPLACE FUNCTION sync_email_columns()
RETURNS TRIGGER AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        -- Whichever column the writer populated wins
        NEW.email_address := COALESCE(NEW.email_address, NEW.email);
        NEW.email := COALESCE(NEW.email, NEW.email_address);
    ELSIF TG_OP = 'UPDATE' THEN
        -- Propagate the change from whichever column was actually modified
        IF NEW.email IS DISTINCT FROM OLD.email THEN
            NEW.email_address := NEW.email;
        ELSIF NEW.email_address IS DISTINCT FROM OLD.email_address THEN
            NEW.email := NEW.email_address;
        END IF;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER sync_email
BEFORE INSERT OR UPDATE ON users
FOR EACH ROW EXECUTE FUNCTION sync_email_columns();
-- Step 4: Deploy code that uses email_address
-- Both columns stay in sync via trigger
-- Step 5 (weeks later): Drop old column and trigger
DROP TRIGGER sync_email ON users;
DROP FUNCTION sync_email_columns();
ALTER TABLE users DROP COLUMN email;
Pattern 4: Changing Column Type
Type changes are particularly dangerous. Use expand-contract:
-- Current: price DECIMAL(10,2)
-- Target: price DECIMAL(15,4)
-- Step 1: Add new column
ALTER TABLE products ADD COLUMN price_new DECIMAL(15,4);
-- Step 2: Backfill data
UPDATE products SET price_new = price WHERE price_new IS NULL;
-- Step 3: Create sync trigger (the trigger function is sketched below)
CREATE TRIGGER sync_price
BEFORE INSERT OR UPDATE ON products
FOR EACH ROW EXECUTE FUNCTION sync_price_columns();
-- Step 4: Deploy code reading from price_new
-- Step 5: Deploy code writing to price_new only
-- Step 6: Drop trigger and old column
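The sync_price_columns() function referenced in step 3 isn't defined above; a minimal sketch, following the same shape as the email sync trigger, might look like this (the column precedence rules are assumptions for the example):
CREATE OR REPLACE FUNCTION sync_price_columns()
RETURNS TRIGGER AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        NEW.price_new := COALESCE(NEW.price_new, NEW.price);
        NEW.price := COALESCE(NEW.price, NEW.price_new);
    ELSIF TG_OP = 'UPDATE' THEN
        IF NEW.price IS DISTINCT FROM OLD.price THEN
            NEW.price_new := NEW.price;
        ELSIF NEW.price_new IS DISTINCT FROM OLD.price_new THEN
            -- Caveat: values wider than DECIMAL(10,2) cannot be copied back into price
            NEW.price := NEW.price_new;
        END IF;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;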
For complex type changes (e.g., STRING → JSON):
-- Parse and migrate data in batches
DO $$
DECLARE
batch_size INT := 1000;
processed INT := 0;
BEGIN
LOOP
UPDATE products
SET metadata_json = metadata_string::jsonb
WHERE id IN (
SELECT id FROM products
WHERE metadata_json IS NULL
LIMIT batch_size
);
GET DIAGNOSTICS processed = ROW_COUNT;
EXIT WHEN processed = 0;
-- Sleep to avoid overwhelming database
PERFORM pg_sleep(1);
END LOOP;
END $$;
Pattern 5: Adding an Index
Large indexes can lock tables. Use concurrent index creation:
-- Bad: Locks table for duration of index build
CREATE INDEX idx_users_email ON users(email);
-- Good: Builds index without blocking writes (PostgreSQL)
CREATE INDEX CONCURRENTLY idx_users_email ON users(email);
-- For MySQL (5.6+):
ALTER TABLE users ADD INDEX idx_users_email (email), ALGORITHM=INPLACE, LOCK=NONE;
Important: Concurrent index creation is slower and uses more resources, but doesn’t block production traffic.
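One more caveat worth planning for: if CREATE INDEX CONCURRENTLY fails or is cancelled, PostgreSQL leaves behind an INVALID index that must be dropped and rebuilt. A quick check:
-- Find indexes left invalid by a failed concurrent build (PostgreSQL)
SELECT indexrelid::regclass AS index_name
FROM pg_index
WHERE NOT indisvalid;
-- Drop and retry
DROP INDEX CONCURRENTLY idx_users_email;
CREATE INDEX CONCURRENTLY idx_users_email ON users(email);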
Pattern 6: Splitting a Table
When a table becomes too large or serves multiple purposes:
-- Current: users table with orders data
-- Goal: Separate users and user_orders tables
-- Step 1: Create new table
CREATE TABLE user_orders (
id SERIAL PRIMARY KEY,
user_id INT NOT NULL REFERENCES users(id),
order_date DATE,
total DECIMAL(10,2),
-- other order fields
);
CREATE INDEX idx_user_orders_user_id ON user_orders(user_id);
-- Step 2: Migrate data
INSERT INTO user_orders (user_id, order_date, total)
SELECT id, last_order_date, last_order_total
FROM users
WHERE last_order_date IS NOT NULL;
-- Step 3: Deploy code that writes to both places (sketched after step 6)
-- Step 4: Verify data consistency
-- Step 5: Deploy code that reads from new table
-- Step 6: Remove old columns from users table
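A minimal sketch of the dual-write code from step 3, in the same style as the application snippets later in this guide; the db helper and the exact column names on users are assumptions for the example:
def record_last_order(user_id, order_date, total):
    # Old location: columns still live on the users table
    db.execute(
        "UPDATE users SET last_order_date = %s, last_order_total = %s WHERE id = %s",
        order_date, total, user_id)
    # New location: dedicated user_orders table
    db.execute(
        "INSERT INTO user_orders (user_id, order_date, total) VALUES (%s, %s, %s)",
        user_id, order_date, total)
Ideally both writes run in a single transaction so the two locations cannot drift apart if one statement fails.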
Advanced Techniques
Feature Flags for Schema Changes
Use feature flags to control schema usage:
# Application code
def get_user_email(user_id):
    if feature_flags.is_enabled('use_new_email_column'):
        return db.query("SELECT email_address FROM users WHERE id = %s", user_id)
    else:
        return db.query("SELECT email FROM users WHERE id = %s", user_id)
Benefits:
- Test new schema with subset of traffic
- Instant rollback without deployment
- Gradual migration
Shadow Traffic
Validate new schema without risk:
def write_order(order_data):
    # Write to production schema
    db.execute("INSERT INTO orders_v1 (data) VALUES (%s)", order_data)
    # Shadow write to new schema (don't use result)
    try:
        db.execute("INSERT INTO orders_v2 (data) VALUES (%s)", transform_data(order_data))
    except Exception as e:
        log_error("Shadow write failed", e)  # Log but don't fail request
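Shadow writes are only useful if you compare the results. A simple periodic consistency check might look like this sketch, which assumes both tables carry a created_at column:
-- Compare recent write volume between the old and new schema
SELECT
    (SELECT COUNT(*) FROM orders_v1 WHERE created_at > NOW() - INTERVAL '1 hour') AS v1_count,
    (SELECT COUNT(*) FROM orders_v2 WHERE created_at > NOW() - INTERVAL '1 hour') AS v2_count;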
Online Schema Change Tools
For massive tables, use specialized tools:
PostgreSQL:
- pg_repack: Rebuild tables without locking
MySQL:
- gh-ost: GitHub’s online schema change tool
- pt-online-schema-change: Percona’s tool
Example with gh-ost:
gh-ost \
--host=prod-db \
--database=myapp \
--table=users \
--alter="ADD COLUMN bio TEXT" \
--execute
Benefits:
- Changes applied with minimal locking
- Progress monitoring
- Pause/resume capability
- Automatic rollback on error
Migration Testing
Test in Production-Like Environment
# Create staging replica (shell)
pg_basebackup -D /var/lib/postgresql/staging -Fp -Xs -P -R
# Run the migration against staging (shell)
psql staging < migration.sql
-- Verify performance
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'test@example.com';
-- Check table size
SELECT pg_size_pretty(pg_total_relation_size('users'));
Rollback Strategy
Every migration needs a rollback plan:
-- migration_up.sql
BEGIN;
ALTER TABLE users ADD COLUMN phone VARCHAR(20);
CREATE INDEX idx_users_phone ON users(phone);
COMMIT;
-- migration_down.sql
BEGIN;
DROP INDEX idx_users_phone;
ALTER TABLE users DROP COLUMN phone;
COMMIT;
Test rollback before deploying:
# Apply migration
psql < migration_up.sql
# Test application
# Rollback
psql < migration_down.sql
Real-World Example: Migrating User Authentication
Goal: Move from username-based to email-based authentication
Current schema:
CREATE TABLE users (
id SERIAL PRIMARY KEY,
username VARCHAR(50) UNIQUE NOT NULL,
password_hash VARCHAR(255) NOT NULL
);
Target schema:
CREATE TABLE users (
id SERIAL PRIMARY KEY,
email VARCHAR(255) UNIQUE NOT NULL,
password_hash VARCHAR(255) NOT NULL
);
Migration steps:
-- Week 1: Add email column
ALTER TABLE users ADD COLUMN email VARCHAR(255);
CREATE UNIQUE INDEX CONCURRENTLY idx_users_email ON users(email);
-- Deploy: Application collects email during registration
-- Users table: username (unique), email (nullable, becoming populated)
-- Week 2: Backfill emails for existing users
-- Prompt users to add email, or generate from username
UPDATE users SET email = username || '@legacy.example.com'
WHERE email IS NULL;
-- Week 3: Make email required
ALTER TABLE users ALTER COLUMN email SET NOT NULL;
-- Deploy: Application authenticates with email OR username (see sketch after the migration steps)
-- Users table: username (unique), email (unique, not null)
-- Week 4: Deprecate username-based authentication
-- Deploy: Application only authenticates with email
-- Add warning banner: "Username login will be removed"
-- Week 5+: Monitor for username usage
SELECT COUNT(*) FROM auth_logs
WHERE auth_method = 'username'
AND created_at > NOW() - INTERVAL '7 days';
-- Week 8: Remove username column (if usage == 0)
ALTER TABLE users DROP COLUMN username;
-- Dropping the column also removes its unique constraint and index
-- Deploy: Remove username support from application
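For the week-3 deploy ("authenticates with email OR username"), the application-side lookup might look like the sketch below. The db helper (returning a row or None) and verify_password are assumptions, consistent with the earlier snippets:
def authenticate(identifier, password):
    # Transitional login: try the new email column first, fall back to the legacy username
    user = db.query("SELECT id, password_hash FROM users WHERE email = %s", identifier)
    if user is None:
        user = db.query("SELECT id, password_hash FROM users WHERE username = %s", identifier)
    if user is None or not verify_password(password, user.password_hash):
        return None
    return user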
Monitoring During Migrations
Track key metrics:
-- Lock monitoring (PostgreSQL)
SELECT
pid,
usename,
pg_blocking_pids(pid) as blocked_by,
query as blocked_query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;
-- Long-running queries
SELECT
pid,
now() - pg_stat_activity.query_start AS duration,
query
FROM pg_stat_activity
WHERE state = 'active'
AND now() - pg_stat_activity.query_start > interval '5 minutes';
-- Table bloat after migration
SELECT
schemaname,
tablename,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;
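On PostgreSQL 12 and later you can also watch the progress of a concurrent index build directly, a useful companion to the queries above:
-- Progress of CREATE INDEX / CREATE INDEX CONCURRENTLY (PostgreSQL 12+)
SELECT pid, phase, blocks_done, blocks_total, tuples_done, tuples_total
FROM pg_stat_progress_create_index;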
Best Practices Checklist
Before any migration:
- Test migration on production-like data
- Measure migration time on full dataset
- Write rollback script and test it
- Plan deployment timeline (multiple phases)
- Set up monitoring and alerts
- Communicate with team about deployment
- Have database backup ready
- Schedule during low-traffic period
- Prepare feature flag for quick rollback
- Document what changed and why
Conclusion
Zero-downtime migrations require patience and planning:
- Never break backward compatibility
- Use expand-contract pattern for significant changes
- Deploy in multiple phases separated by time
- Test everything in staging first
- Monitor closely during and after migration
The key insight: Database changes can’t be atomic when you have running application code. Accept this reality and design migrations that work with it.
A week-long migration might feel slow, but it’s infinitely faster than recovering from a production outage. Take your time, test thoroughly, and migrate safely.