[Figure: a query execution plan showing plan estimation errors]

Why Slow Queries Lie to You

Your query plan says it will scan 50 rows. The actual execution scans 2.3 million. Your database isn't broken — it's doing exactly what it was designed to do. The problem is that execution plans are built on statistics that go stale, and most teams never notice until a query that worked in staging falls apart in production under real load.

The statistics problem

Every major relational database maintains a set of table statistics: row counts, column cardinality estimates, histogram distributions. The query planner uses these statistics to estimate how many rows each node in a plan will return, and therefore which join strategy and scan type to use.

The problem is that statistics are only as fresh as your last ANALYZE. On a table that receives tens of thousands of inserts per day, day-old statistics can be dramatically wrong. Postgres, MySQL, and others re-analyze automatically once enough of a table has changed, but high-write workloads constantly outrun those triggers.
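The back-of-envelope math on drift is worth doing for your own tables. A hedged illustration with made-up numbers — a hot 100k-row table taking 50k inserts a day:

```python
# How far the planner's row count drifts from reality as stats age.
# All figures are hypothetical, chosen to illustrate the shape of the problem.

def estimate_error(rows_at_analyze: int, inserts_per_day: float, days_stale: float) -> float:
    """Ratio of actual rows to the row count frozen at the last ANALYZE."""
    actual_rows = rows_at_analyze + inserts_per_day * days_stale
    return actual_rows / rows_at_analyze

# 100k rows at last ANALYZE, 50k inserts/day:
print(estimate_error(100_000, 50_000, 1))  # 1.5 — 50% off after one day
print(estimate_error(100_000, 50_000, 7))  # 4.5 — 4.5x off after a week
```

Deletes and updates make this worse, since they also skew the distinct counts and histograms, not just the row total.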

Why nested loop joins betray you

When the planner underestimates row counts, it often chooses a nested loop join because it looks cheap at small scale. Nested loop joins are excellent for joining a few dozen rows. They're catastrophic for joining tens of thousands. By the time you see query timeout errors, the planner has already committed to its strategy and it's too late.
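A toy cost model shows why the planner's choice is rational at the estimated size and disastrous at the actual one. The coefficients here are invented for illustration — real cost models account for I/O, memory, and caching — but the asymptotics are the point:

```python
import math

# Toy cost comparison: index nested loop join vs. hash join, with a 1M-row
# inner table. Coefficients are hypothetical; only the growth rates matter.

INNER_ROWS = 1_000_000

def nested_loop_cost(outer_rows: int) -> float:
    # One B-tree index probe (~log2 of inner size) per outer row.
    return outer_rows * math.log2(INNER_ROWS)

def hash_join_cost(outer_rows: int) -> float:
    # Build a hash table over the inner table once, then one probe per outer row.
    return INNER_ROWS + outer_rows

for outer in (50, 2_300_000):
    print(outer, round(nested_loop_cost(outer)), hash_join_cost(outer))
```

At 50 outer rows the nested loop costs roughly a thousand probes against the hash join's million-row build — an easy win. At 2.3 million outer rows the same nested loop costs tens of millions of probes, an order of magnitude worse than the hash join. The planner made the right call for the estimate it was given; the estimate was the lie.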

What to actually do about it

The short-term fix is obvious: run ANALYZE on your hot tables more frequently. The medium-term fix is extended statistics for correlated columns — by default the planner assumes columns are independent and multiplies their selectivities together, and that assumption fails in a lot of real schemas.
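Here's what the independence assumption costs, with hypothetical numbers for a pair of strongly correlated columns (think city and zip code, where one nearly determines the other):

```python
# Without extended statistics, a planner multiplies per-column selectivities
# as if the columns were independent. Numbers below are invented.

table_rows = 1_000_000
sel_city = 0.01  # 1% of rows match city = 'Springfield'
sel_zip = 0.01   # 1% of rows match zip_code = '62704'

# Planner's estimate under independence:
independent_estimate = table_rows * sel_city * sel_zip
print(independent_estimate)  # 100.0 rows

# If zip code determines city, the joint selectivity is just sel_zip:
actual = table_rows * sel_zip
print(actual)  # 10000.0 rows — 100x the estimate
```

In Postgres, `CREATE STATISTICS ... (dependencies) ON city, zip_code FROM addresses` tells the planner about this kind of functional dependency; other databases have their own equivalents.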

The harder and more important fix is instrumentation. You need to know when actual row counts diverge significantly from estimated row counts across your query workload. That divergence is the signal that your statistics have drifted and your plans are lying to you. Without automated detection, you find out about it from users, not from monitoring.
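One sketch of what that detection can look like, assuming Postgres's `EXPLAIN (ANALYZE, FORMAT JSON)` output as the input. The plan fragment below is hand-written but uses the same keys Postgres emits; the threshold and the traversal are choices you'd tune:

```python
# Walk an EXPLAIN (ANALYZE, FORMAT JSON) plan tree and flag nodes where
# actual rows diverge from the planner's estimate by more than a threshold.

def find_divergent_nodes(node: dict, threshold: float = 10.0, path: str = ""):
    flagged = []
    est = node.get("Plan Rows", 0)
    # "Actual Rows" is per loop iteration, so multiply by the loop count.
    actual = node.get("Actual Rows", 0) * node.get("Actual Loops", 1)
    ratio = actual / est if est else float("inf")
    label = f"{path}/{node.get('Node Type', '?')}"
    if ratio > threshold or 0 < ratio < 1 / threshold:
        flagged.append((label, ratio))
    for child in node.get("Plans", []):
        flagged.extend(find_divergent_nodes(child, threshold, label))
    return flagged

# Hand-written plan fragment in the shape Postgres emits:
plan = {
    "Node Type": "Nested Loop",
    "Plan Rows": 50, "Actual Rows": 2_300_000, "Actual Loops": 1,
    "Plans": [
        {"Node Type": "Index Scan", "Plan Rows": 50,
         "Actual Rows": 2_300_000, "Actual Loops": 1, "Plans": []},
    ],
}
print(find_divergent_nodes(plan))  # both nodes flagged at ratio 46000.0
```

Run something like this against a sample of your workload on a schedule and the "statistics have drifted" signal shows up in monitoring instead of in an incident channel.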

Dreambase surfaces plan vs. actual divergence as a first-class signal, flags queries where the planner's estimates are off by more than a configurable threshold, and triggers a statistics refresh automatically. It's the kind of thing that should be infrastructure-level behavior — not something individual engineers catch one investigation at a time.