What is a hybrid vector-SQL database?

When most engineering teams reach for a vector store, they do so in addition to their existing relational database — not instead of it. They already have Postgres running user records, documents, product catalogs, or conversation history. They add Pinecone, or Weaviate, or Qdrant on top, wire up an embedding job to keep the two in sync, and call it done. This architecture works. It also quietly accumulates operational debt at a rate that doesn't show up until month six of production.

A hybrid vector-SQL database is a different premise entirely. Rather than splitting vector storage and relational storage across two systems and bridging them with a synchronization layer, it stores both in the same row. The vector is a first-class column alongside your integers, timestamps, and text. A single query can filter on user_id = 42 and retrieve the top-5 most semantically similar documents at the same time, without any JOIN, without any cross-system call, without any freshness gap between what the SQL side knows and what the vector index contains.

The question we get most often is: "why does this matter enough to build a new database?" The answer lives in the details of what the dual-store architecture costs you, which most teams discover after they've already paid the price.

What "hybrid" actually means at the storage level

Hybrid, in this context, has a precise technical meaning. It is not a SQL database with an extension bolted on (pgvector is an extension, not a co-designed storage layer). It is not a vector store with a metadata filter on top (Pinecone namespaces and Weaviate properties are not a relational query planner). It means the row storage, the vector index, and the query planner are designed together, from the start, to serve both query types with awareness of each other's cost model.

In Dreambase, every table has a vector column alongside structured columns. The HNSW index is co-located with row storage, meaning an ANN scan and a scalar predicate scan operate on the same physical pages. The query planner knows the estimated selectivity of both predicates and picks an execution order that minimizes the total scan footprint. This is architecturally different from calling an embedding API, pushing to a vector store, and running a SQL query separately, then intersecting results in application code.

The dual-store problem in practice

Consider a document management application running roughly 2 million documents, updated at an average rate of ~8,000 writes per day. The team runs an embedding job every 6 hours. On the surface, this sounds manageable. In practice:

At any given time, up to 2,000 documents in Postgres have been updated but haven't yet been re-embedded. A semantic search query issued against the vector store returns results that don't reflect those updates.
When a document is deleted from Postgres, it persists in the vector index until the next job run. Retrieval can return deleted content.
The sync job requires its own error handling, retry logic, and idempotency guarantees. When it falls behind — and it does fall behind, because embedding API rate limits are real — the gap compounds.
Infrastructure cost is double: two databases means two connection pools, two sets of backups, two pricing models, two on-call rotations.

We are not saying the dual-store pattern is wrong for every team. Teams with very high write throughput and a tolerance for eventual consistency in retrieval sometimes find it appropriate. What we are saying is that the pattern's costs are systematically underestimated at the time a team adopts it, because those costs are operational and they accrue over time rather than appearing on day one.

How co-location changes the operational model

When vectors and structured data live in the same row, several properties follow automatically. They are written atomically — there is no window where the SQL side knows about a row that the vector side doesn't. They are deleted together. There is no orphaned vector pointing at a deleted document. There is no sync job to maintain, monitor, or debug. Freshness is a property of the write path, not a separate system concern.

This matters most in applications where data changes frequently and retrieval accuracy depends on current state. A customer support knowledge base where articles are edited daily. A codebase search tool where functions are refactored constantly. An agent memory store where conversation turns are added at high volume. In all of these, even a 6-hour lag between source-of-truth and vector index can meaningfully degrade the answers your LLM produces.

The hybrid model also opens query patterns that are cumbersome or impossible in a dual-store architecture. A query like "find the 3 most semantically similar documents to this embedding, written by author_id 17, in the past 30 days, not marked deleted" is a single call in Dreambase:

results = db.query(
    table="documents",
    vector_col="embedding",
    query_vector=my_embedding,
    top_k=3,
    where="author_id = 17 AND created_at > NOW() - INTERVAL '30 days' AND deleted = false"
)

The equivalent in a dual-store setup requires fetching candidates from the vector store, then filtering against Postgres, then re-ranking if the post-filter candidate set is too small. The round-trip cost adds up at scale. More importantly, the SQL predicates run on data that might not match the state the vector store indexed.

When a hybrid database makes sense — and when it doesn't

A hybrid vector-SQL database is the right choice when the application's primary access pattern involves both structured filtering and vector retrieval on the same records, and when data freshness in retrieval is a product-level requirement rather than a nice-to-have.

It is less obviously the right choice when your vector workload is append-only and immutable — for example, indexing a static corpus of documents that never changes. If your documents are fixed, the sync job runs once and the freshness problem mostly disappears. A dedicated vector store might be simpler to operate in that scenario. Similarly, if your team already has a deep Postgres expertise and the retrieval workload is small, pgvector might be sufficient: it is an extension on a database you already understand, and the operational model is familiar even if the storage architecture is not co-designed.

The hybrid architecture earns its keep when writes are ongoing, the retrieval set is filtered by structured predicates, and the cost of stale retrieval is meaningful to your application's users. That describes most production LLM applications we've seen — RAG over live knowledge bases, semantic search over user-generated content, agent memory with per-user access controls, recommendation systems over products that change price and availability. For those workloads, the single-store hybrid model removes a class of synchronization bugs that, once eliminated, you wonder how you tolerated for so long.

The architectural decision is ultimately about which failure modes you want to live with. Dual-store gives you operational separation at the cost of synchronization complexity. A hybrid store removes the synchronization problem entirely, at the cost of having a newer, less battle-tested storage engine under you. Both trade-offs are real. Pick the one that matches your team's actual risk tolerance — not the one that looks simpler to set up on a Tuesday afternoon.