Rohan Yeole - Homepage Rohan Yeole

Django + PostgreSQL Performance: Indexes, Connection Pooling, and Query Plans

May 8, 20261 min read

Most Django applications slow down at the database layer, not the view layer. A Gunicorn worker that takes 800ms to respond is almost never slow because Python is slow — it is slow because the database query it runs is doing a sequential scan on a million-row table. Here is how to find and fix that.

Step 1: Identify Slow Queries

Before adding any indexes, measure which queries are actually slow.

Django Debug Toolbar (development)

Install django-debug-toolbar and watch the SQL panel. Queries taking more than 50ms in development will be proportionally worse in production.

PostgreSQL pg_stat_statements (production)

This extension tracks all executed queries and their aggregate statistics:

-- Enable once per database
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Find the slowest queries
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 20;

This shows you the actual slow queries in production — not the ones you guess are slow.

Step 2: Read EXPLAIN ANALYZE Output

Once you have a slow query, run EXPLAIN ANALYZE to see the execution plan:

EXPLAIN ANALYZE
SELECT * FROM orders WHERE user_id = 42 AND status = 'pending';

Output to understand:

Seq Scan on orders  (cost=0.00..8432.00 rows=1 width=80) (actual time=45.123..45.123 rows=1)
  Filter: ((user_id = 42) AND (status = 'pending'))
  Rows Removed by Filter: 250000
Planning Time: 0.5 ms
Execution Time: 45.6 ms

The key signals: - Seq Scan — reading every row. Bad on large tables. Add an index. - Index Scan — using an index. Good. - Rows Removed by Filter — how many rows were discarded. High number with Seq Scan = needs index. - cost (estimated) vs actual time — large discrepancy means stale table statistics. Run ANALYZE orders;.

Step 3: Add the Right Indexes

Single-column index

class Order(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE, db_index=True)
    status = models.CharField(max_length=20)
    created_at = models.DateTimeField(auto_now_add=True, db_index=True)

db_index=True creates a B-tree index on that column. Django does this automatically for ForeignKey fields.

A B-tree index stores keys in a sorted balanced tree. Each internal node contains key ranges and pointers to child nodes; each leaf node contains a (key, heap pointer) pair pointing to the actual row's location in the table's physical heap. PostgreSQL stores nodes in 8KB pages. On a table with one million rows, the tree is roughly 4 levels deep, so finding a single row requires 4 page reads (O(log n)). Without the index, PostgreSQL's sequential scan reads every page in the table — for a million rows at ~100 bytes per row, that's roughly 12,500 8KB pages to read (O(n)). At 10 million rows, that's 125,000 pages, and the difference between the indexed and unindexed case is the difference between 4ms and 4 seconds.

Composite index (for multi-column filters)

When you always query by both user_id AND status, a composite index is more efficient than two separate indexes:

class Order(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    status = models.CharField(max_length=20)

    class Meta:
        indexes = [
            models.Index(fields=["user", "status"], name="order_user_status_idx"),
        ]

Column order matters in composite indexes: put the most selective column first (the one that filters out the most rows). If user_id filters better than status, put user first.

Partial index (index only a subset of rows)

If you frequently query for a specific status (like pending orders), a partial index is smaller and faster than a full index:

from django.db.models import Q

class Meta:
    indexes = [
        models.Index(
            fields=["user", "created_at"],
            name="pending_orders_idx",
            condition=Q(status="pending"),
        )
    ]

This index only includes rows where status = 'pending'. If 90% of orders are completed, you have a tiny index that covers the hot path.

For text search on large string fields:

from django.contrib.postgres.indexes import GinIndex
from django.contrib.postgres.search import SearchVectorField

class Article(models.Model):
    title = models.CharField(max_length=255)
    search_vector = SearchVectorField(null=True)

    class Meta:
        indexes = [
            GinIndex(fields=["search_vector"]),
        ]

Step 4: Connection Pooling with pgBouncer

Every Django request opens and closes a database connection. At low traffic this is fine. At high traffic, the connection overhead becomes significant — PostgreSQL has a hard limit on concurrent connections (typically 100–200 on a VPS), and opening a new connection for each request can exhaust the limit.

What pgBouncer does

pgBouncer is a lightweight proxy that sits between Django and PostgreSQL. It maintains a pool of persistent connections and reuses them across requests. From Django's perspective, nothing changes.

Install pgBouncer (Ubuntu/Debian)

sudo apt install pgbouncer

Configure /etc/pgbouncer/pgbouncer.ini:

[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 25

Point Django to pgBouncer instead of PostgreSQL:

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "mydb",
        "USER": "myuser",
        "PASSWORD": "mypassword",
        "HOST": "127.0.0.1",
        "PORT": "6432",  # pgBouncer port, not 5432
        "CONN_MAX_AGE": 0,  # Let pgBouncer manage pooling
    }
}

With pool_mode = transaction, each transaction gets a connection from the pool and returns it when the transaction ends. This is the most efficient mode for Django.

Django's built-in persistent connections

If pgBouncer is overkill for your scale, Django's CONN_MAX_AGE keeps connections alive across requests:

DATABASES = {
    "default": {
        # ...
        "CONN_MAX_AGE": 60,  # Reuse connections for 60 seconds
    }
}

Set this to None for permanent connections (be careful — one per worker, times number of workers, must not exceed PostgreSQL's max_connections).

Step 5: Keep Statistics Fresh

PostgreSQL's query planner uses table statistics to choose execution plans. Outdated statistics lead to bad plans. Run ANALYZE regularly:

-- Manual
ANALYZE orders;

-- Autovacuum handles this automatically in most setups
-- Check if autovacuum is running
SELECT relname, last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY last_autoanalyze ASC;

If last_autoanalyze is null or very old for large tables, autovacuum may need tuning.

Quick Reference

ProblemFix
Seq Scan on large tableAdd index on filtered columns
Multi-column filter slowAdd composite index (most selective column first)
Index too large / rarely needed rowsPartial index with condition=Q(...)
Too many connections / connection refusedAdd pgBouncer
Query plan seems wrongRun ANALYZE tablename; to refresh statistics
Slow text searchUse SearchVectorField + GIN index

If you need a Django + PostgreSQL application built for real-world scale — with proper indexing, connection pooling, and query optimization from the start — hire me as a Django developer or as a backend developer.

Chat with me on WhatsApp