Blog Benchmarks

Dreambase vs. Pinecone + Postgres: a real engineering comparison

·17 min read·Andrew Keil
Abstract benchmark comparison visualization with two contrasting architecture patterns

We built Dreambase partly out of frustration with the operational cost of the dual-store pattern, but we wanted to know whether that frustration translated into measurable differences — not just anecdotes. So we ran the same RAG workload against two architectures: Dreambase, and a Pinecone + PostgreSQL setup configured as a reasonable production implementation of the dual-store approach.

Before diving into the numbers, a note on methodology: this is internal benchmarking, not a neutral third-party study. We built and operated both setups. We tried to be fair about the Pinecone + Postgres configuration — using recommended HNSW index settings, proper connection pooling, and a realistic CDC-based sync pipeline rather than a naive batch job. But we have an obvious interest in Dreambase performing well, and you should weight these results accordingly. The goal of this post is to give you enough setup detail that you could reproduce or challenge any specific number we cite.

Test setup and workload

Workload: a document retrieval application backed by 8 million documents, each in the 200-500 word range, with 1536-dimension embeddings (text-embedding-3-small compatible dimensions). The document corpus updates at ~12,000 rows per day — representative of a growing knowledge management tool or enterprise wiki.

Query mix: 70% hybrid queries (vector similarity + at least one scalar filter), 20% pure vector queries (no scalar filter), 10% pure SQL queries (no vector component). This reflects a real multi-tenant knowledge base where most queries are scoped to a specific user or organization.

Hardware: both setups ran on equivalent compute — Dreambase on our managed cloud (US-East, 16 vCPU, 64GB RAM), Pinecone on its standard pod-based tier with Postgres on a comparable RDS instance (db.r6g.2xlarge). The sync pipeline used Debezium on a separate t3.medium writing to a 5-partition Kafka topic, with a Python consumer handling batch embedding calls to the same embedding API endpoint.

Query load: 50 concurrent clients, 200 queries per second at peak, sustained for 30-minute windows. We ran three sessions to get stable p99 values.

Write latency

For writes, Dreambase is slower than Pinecone + Postgres by design. Writing a row in Dreambase triggers synchronous embedding computation at the database layer — the write doesn't complete until the embedding is generated and the HNSW index is updated. This takes additional time relative to a pure SQL write that offloads embedding to an async pipeline.

Setupp50 write (ms)p99 write (ms)
Dreambase1438
Pinecone + Postgres512

The Dreambase write latency is higher because it includes embedding computation (the dominant cost). The Pinecone + Postgres write is faster because it only writes to Postgres; the vector is generated asynchronously by the CDC consumer, which is not reflected in the write latency figure.

If your application is write-latency sensitive and can tolerate a freshness lag in retrieval, the dual-store write latency profile is better. If write latency is acceptable at these levels and retrieval freshness matters, Dreambase's write profile is the cost you pay for synchronous freshness.

Query latency

For hybrid queries — the 70% majority of our query mix — the results are where co-location makes a difference:

Query typeDreambase p50Dreambase p99Pinecone + Postgres p50Pinecone + Postgres p99
Hybrid (vector + filter)12ms44ms28ms118ms
Pure vector9ms31ms11ms38ms
Pure SQL3ms14ms3ms11ms

For pure vector queries, both approaches are similar — Pinecone's dedicated vector index is well-optimized, and the latency difference is small. For pure SQL, they are functionally identical. The difference is in hybrid queries, where the Pinecone + Postgres setup requires a round-trip: query Pinecone for top-K candidates, then filter in Postgres, or alternatively query Postgres first and then do a batch lookup in Pinecone. Both approaches have more network overhead than a single co-located query, which explains the 2.5-3x p99 gap.

We are not claiming a "10x improvement" or similar. The actual gap is meaningful — 44ms vs 118ms p99 on hybrid queries — and it compounds at 200 QPS, but it's not dramatic for all workloads. If your application is latency-tolerant and hybrid queries are not the majority, this gap may not matter.

Freshness gap

This is where the dual-store pattern's costs are most visible. With the CDC pipeline configured for maximum throughput and a Kafka consumer processing embedding calls in batches of 50:

MetricDreambasePinecone + Postgres
Avg lag from write to query-visible embedding~0ms (synchronous)~45 seconds (CDC → embed → upsert)
p99 lag under normal load~0ms~3 minutes
p99 lag under high write load (2x burst)~0ms~18 minutes
Stale rows at any given time (steady state)0~90-150 rows in transit

The 18-minute lag under high-write burst is real. When 24,000 rows arrived in a burst window (simulating a bulk document import), the CDC consumer's throughput was limited by the embedding API's rate limit. The queue backed up. Queries during that window were retrieving against a partially updated corpus.

Operational cost

This is harder to quantify but arguably more important for small teams. Setting up the dual-store required:

  • Pinecone account + API key management
  • Postgres RDS instance + parameter tuning
  • Debezium connector deployment and configuration
  • Kafka cluster (3 brokers) + consumer service
  • Embedding retry logic and idempotency handling in the consumer
  • Separate monitoring for: Pinecone index status, Postgres performance, Kafka consumer lag, embedding API error rates

The setup took roughly 3 days of engineering time. It introduced 6 separate observable failure modes that each require a distinct runbook. In steady state, the Kafka consumer lag metric is the key freshness indicator — if it climbs, retrieval is degrading, and the fix involves either increasing consumer parallelism or reducing the embedding batch call frequency.

Dreambase's operational surface for the same workload: one connection string, one monitoring dimension (query latency), one database to back up. Setup time for our test was about 4 hours, including data ingestion.

When to use each

The dual-store pattern remains appropriate when: your write volume is very high and synchronous embedding on writes is not latency-acceptable; your vector workload is mostly pure-vector with few scalar filters; you have specific vendor features in your current vector store that you depend on; or you have an existing production Postgres with significant tooling built around it that you're not ready to migrate off.

Dreambase is the better choice when: retrieval freshness is a product requirement (not just a nice-to-have); the majority of your queries are hybrid (vector + filters); your team is small and operational complexity is a real cost; or you're starting from scratch and don't have an existing dual-store investment to protect.

The benchmark numbers matter, but the operational simplicity argument is probably more durable. Latency improvements are specific to hardware and query patterns. The observation that a dual-store system has two failure surfaces and a synchronization layer that needs ongoing maintenance is architecture-level and doesn't depend on the benchmark. If you're evaluating whether to build on Dreambase or Pinecone + Postgres, we'd suggest running this comparison on your own query distribution rather than taking our numbers as definitive. The test setup details are enough to reproduce.