Product

The database architecture for LLM-first applications

Dreambase is not pgvector — the HNSW index was co-designed with the storage engine, not added as an extension. It is not a vector store with a SQL adapter bolted on. The query planner treats vector similarity and scalar predicates as equal-class citizens and optimizes across both in one plan.

Core concepts

Every row carries both representations

In Dreambase, a table column can be typed as vector(dims) alongside standard SQL column types. There is no separate index object — the vector column is part of the row schema.

This means a single INSERT writes both structured metadata and the vector representation atomically. A delete removes both. An update refreshes both in the same transaction.

  • ACID guarantees cover vector columns as first-class citizens
  • No dual-write, no tombstone sync, no eventual-consistency lag
  • Schema migrations (ADD COLUMN, ALTER TYPE) work on vector cols
schema.sql
-- Define hybrid table
CREATE TABLE documents (
    id         UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id    TEXT NOT NULL,
    content    TEXT,
    category   TEXT,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    embedding  VECTOR(1536)
);

-- HNSW index created automatically
-- No separate vector store needed

Cost-based hybrid query planner

The query optimizer looks at both SQL predicate selectivity and vector index statistics to decide scan order. When a WHERE clause would eliminate 95% of rows, scalar filter runs first — you're doing ANN on a small candidate set, not the whole table.

When selectivity is low, the planner flips: ANN scan produces top-K candidates, then SQL predicates apply. The crossover point is learned from actual index statistics, not a hardcoded threshold.

  • No query hints required — the planner makes the right call automatically
  • EXPLAIN HYBRID shows estimated cost for both paths
  • Supports JOINs, GROUP BY, window functions on hybrid tables
explain_hybrid.sql
EXPLAIN HYBRID
SELECT id, content
FROM documents
WHERE user_id = 'u_441'
ORDER BY embedding NEAR $1
LIMIT 5;

-- Output:
-- Plan: scalar-first (selectivity 0.003)
--   Scalar filter: user_id = 'u_441' → 2,847 rows
--   ANN on 2,847-row candidate set
-- Est. cost: 4.2ms at current index size

Embeddings that are always current

The fundamental problem with dual-store architecture is temporal: your SQL database reflects the latest state of data, but your vector store reflects the last time you ran an embedding job. In production, that lag is typically 15 minutes to 48 hours.

In Dreambase, the embedding column is part of the row. When you UPDATE a row — whether from an API write, a migration, or a background job — the embedding is part of the same transaction. The HNSW index reflects the committed state of the table, always.

  • No separate embedding pipeline to maintain or monitor
  • No "stale embedding" class of production incidents
  • Batch embedding updates work as normal SQL batch UPDATE
update.py
# Update content + embedding atomically
new_embedding = embed(new_content)
db.execute(
    "UPDATE documents SET content=%s,"
    " embedding=%s WHERE id=%s",
    [new_content, new_embedding, doc_id]
)
# Vector index reflects this commit.
# No lag. No sync job needed.

Python, Node, and PostgreSQL wire protocol

Dreambase speaks the PostgreSQL wire protocol. That means any Postgres-compatible client works out of the box — psycopg2, asyncpg, node-postgres, SQLAlchemy, Prisma. No special driver required for basic connectivity.

The native SDKs add convenience wrappers for vector column handling (automatic float32 array serialization, type-safe query builders, async support) and advanced features like EXPLAIN HYBRID parsing.

  • pip install dreambase
  • npm install dreambase
  • REST API for non-SDK environments
  • Full async support (asyncio, async/await)
async_client.py
import asyncio
import dreambase

async def main():
    async with dreambase.AsyncClient(
        url="postgresql://..."
    ) as db:
        rows = await db.query(
            "SELECT * FROM docs"
            " ORDER BY emb NEAR $1 LIMIT 5",
            [query_vec]
        )
        return rows

asyncio.run(main())
Specifications

Technical specifications

Hard numbers. The limits listed are enforced, not estimated. The latency figures are measured on Production-tier managed cloud at the stated row counts.

Dreambase hybrid query execution flow showing scalar filter and ANN scan paths converging
Vector index typeHNSW (hierarchical navigable small world)
SQL dialectPostgreSQL-compatible (wire protocol)
Max vector dimensions4096
Supported embedding modelsAny model producing float32 arrays
Query latency (p99)< 50ms at 1M rows (hybrid)
Storage engineCustom column store with vector co-location
DeploymentCloud-hosted (managed) or self-hosted Docker
Comparison

Why not Pinecone + Postgres?

Each option has real trade-offs. If you need sub-millisecond ANN at 100M+ vectors with write throughput above 50K/s, Dreambase is not the right tool — a purpose-built vector store is. If you need fresh retrieval, SQL joins on the same data, and fewer moving parts to operate, read on.

Capability Pinecone + Postgres pgvector Dreambase
Sync complexity High — custom sync job required None — same DB, but triggers needed Zero — atomic writes
Query freshness Lag = embedding job interval Fresh, but ANN quality limited Always current
Hybrid SQL + vector query Two separate queries + app-layer join Possible, but planner unaware of vectors Native, cost-optimized
Full SQL (joins, windows) Yes (Postgres side) Yes Yes
Ops overhead Two services, two monitoring stacks Low — single Postgres instance Low — single managed service
ANN index quality at scale High (purpose-built) Medium — HNSW added post-launch High — HNSW co-designed with storage

The docs have the full architecture detail.