It's 2:47am. Your on-call phone just went off. The application is slow, tickets are piling up, and someone in Slack has already typed "the database is down" (it's not). You SSH in, run your first EXPLAIN ANALYZE, and see it: a sequential scan on a 40-million-row table. Again.
PostgreSQL indexing is one of those topics where the gap between knowing the syntax and understanding the strategy is enormous. Everyone knows you can run CREATE INDEX. Far fewer DBAs know which index to create, when not to create one, and how to verify an index is actually being used. This post is the latter.
Start With the Query, Not the Index
The most common mistake I see from junior DBAs and developers alike is adding indexes based on intuition. "This column is in our WHERE clause, so let's index it." That thinking leads to tables with 30 indexes, bloated writes, and a planner that sometimes ignores your index entirely because the statistics said it wasn't worth it.
Start with the query. Run EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) and read the output. You're looking for:
- Seq Scan on large tables — the smoking gun. If the table has millions of rows and there's no index, every query is a full scan.
- High rows removed by filter — you have an index, but the planner is scanning too many rows before filtering. A partial index may help.
- Nested Loop with large outer tables — a join is iterating the outer table and hitting the inner table with index lookups. The inner column needs an index, not the outer.
- Shared hit vs. read ratios — high reads from disk mean your working set doesn't fit in
shared_buffers. That's a memory problem, not an index problem.
The planner tells you everything. Learn to read it before you reach for CREATE INDEX.
The Index Types PostgreSQL Actually Has
Most PostgreSQL shops use B-tree indexes for everything and call it a day. That works, but you're leaving performance on the table. Here's when to use what:
- B-tree — Default, general purpose. Equality, range (<, >, BETWEEN), and ordering. Use this unless you have a specific reason not to.
- Hash — Equality only. Slightly faster than B-tree for pure equality lookups, but can't do range queries or ORDER BY. Rarely worth the trade-off.
- GIN (Generalized Inverted Index) — Array containment, full-text search, JSONB operators. If you're querying inside arrays or JSON columns, B-tree won't help you — GIN will.
- GiST — Geometric types, full-text, range types. Use when GIN doesn't cover your operator class.
- BRIN (Block Range Index) — Extremely small index for naturally ordered data like timestamps or sequential IDs. If you have a billion-row event log ordered by
created_at, a BRIN index is orders of magnitude smaller than a B-tree and nearly as effective for range scans.
The BRIN case is the one I see missed most often. A 10TB events table with a sequential timestamp column doesn't need a 50GB B-tree index. A BRIN index on created_at costs almost nothing and cuts range scans dramatically — and it updates in near-real-time as new rows are written.
Partial Indexes: The Underused Power Tool
A partial index only indexes rows that match a condition. The index is smaller, faster to update, and often more selective — which means the planner is more likely to use it.
Classic use cases:
-- Index only unprocessed jobs (not the 98% already completed)
CREATE INDEX idx_jobs_pending ON jobs (created_at)
WHERE status = 'pending';
-- Index only non-deleted rows for soft-delete tables
CREATE INDEX idx_users_active ON users (email)
WHERE deleted_at IS NULL;
-- Index only high-value orders for reporting
CREATE INDEX idx_orders_large ON orders (customer_id, created_at)
WHERE total_amount > 10000;
The job queue example is particularly impactful. If your jobs table has 50 million rows and only 5,000 are pending at any time, a full index has 50 million entries. A partial index on pending rows has 5,000. Writes are faster, the index fits in memory, and every query on pending jobs benefits.
Your query must include the same WHERE condition for the planner to use a partial index. That's the only catch — and it's worth designing your queries around when the selectivity gain is this large.
Covering Indexes: Eliminating Heap Fetches
When PostgreSQL satisfies a query entirely from the index without touching the main table (the heap), it's called an index-only scan. These are significantly faster because they skip the heap fetch entirely — no random I/O to the table pages.
To enable index-only scans, use INCLUDE to add non-key columns to the index:
-- Query: SELECT email, created_at FROM users WHERE user_id = $1
-- Standard index: still requires heap fetch for email + created_at
CREATE INDEX idx_users_id ON users (user_id);
-- Covering index: enables index-only scan, no heap fetch
CREATE INDEX idx_users_covering ON users (user_id) INCLUDE (email, created_at);
The INCLUDE columns are stored in the leaf nodes but not the B-tree structure itself, so they don't add overhead to comparisons or sorting — just storage. Use this on your most-queried lookup patterns where the SELECT list is predictable. The performance difference on high-traffic tables can be dramatic: I've seen covering indexes cut query time in half on tables processing millions of lookups per hour.
Finding Unused and Bloated Indexes
PostgreSQL tracks index usage in pg_stat_user_indexes. After your app has been running for at least a few days, run this:
SELECT
schemaname,
tablename,
indexname,
idx_scan,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
AND schemaname NOT IN ('pg_catalog', 'pg_toast')
ORDER BY pg_relation_size(indexrelid) DESC;
Zero scans on a 2GB index? That index is costing you write overhead and storage with zero read benefit. Drop it. I've run this on new client databases and found 15 to 20 unused indexes on a single table — each one added by someone who thought it might help, never verified, and left behind. Dead weight your writes are paying for on every transaction.
For index bloat caused by high rates of updates and deletes, check sizes against expected data volume. If an index is far larger than the data it covers, a REINDEX CONCURRENTLY will reclaim space without taking an exclusive lock. Schedule this during off-peak hours and monitor progress in pg_stat_progress_create_index.
Index Maintenance Rules That Matter in Production
- Always use
CREATE INDEX CONCURRENTLYon production tables. Regular index creation takes an exclusive lock that blocks all reads and writes. On a busy table, that's a brief but real outage. - Monitor autovacuum on high-write tables. Bloated tables make index scans slower because dead tuple ratios skew cost estimates. Tune
autovacuum_vacuum_scale_factordown for tables with heavy insert/update/delete patterns. - Run
ANALYZEafter bulk loads. A bulk INSERT of 5 million rows without a subsequentANALYZEmeans the planner uses stale statistics and may skip perfectly good indexes entirely. - Watch
seq_scanvsidx_scaninpg_stat_user_tables. High sequential scans on a large table that has indexes usually means statistics are stale, the index isn't being chosen, or the planner's cost estimates are misconfigured for your hardware.
The 3am Checklist for Slow PostgreSQL Queries
When you get paged for a slow PostgreSQL query, work through this in order:
- Run
EXPLAIN (ANALYZE, BUFFERS)— identify seq scans, high filter rows, heap fetches - Check
pg_stat_user_indexes— is the expected index being scanned at all? - Check table statistics freshness — when did autovacuum last run on this table?
- Consider a partial index if only a small fraction of rows are queried
- Consider a covering index if the SELECT columns are predictable and the query is high-traffic
- Consider BRIN for large, naturally-ordered time-series or sequential data
- After adding any index, verify with
EXPLAIN— never assume the planner will use it
PostgreSQL index tuning isn't magic. It's methodical: read the query plan, understand what the planner sees, and give it the right structure to work with. The DBAs who are fast at this aren't guessing — they've built the habit of reading EXPLAIN output until it's second nature.
If your PostgreSQL instances are showing signs of performance degradation — slow queries, growing response times, or unexplained CPU spikes — our remote DBA performance tuning service can diagnose and resolve these issues without the cost of a full-time hire. Take the free database assessment to see exactly where your databases stand.