Why we don't use vector search for retrieval (most of the time)

For the first six months of building Lossless, every retrieval went through a vector database. That was wrong. Here's what we replaced it with, and how we kept just enough fuzzy to feel like magic.

tl;dr

  • Personal data is not "documents." It's records — typed, dated, linked.
  • Records reward precise queries. Embeddings are imprecise by design.
  • Our retrieval is a structured query first, embeddings second, and the user feels both as one experience.
  • Median latency dropped 3.4× (1,840 ms to 540 ms) and answer quality went up. The bug rate went down most of all.

When fuzzy is right

If a user asks "what did I tell my brother about that thing in Lisbon," there is no SQL query that handles that. There's a person, a vague topic, a place, and a long tail of phrasing. Embeddings are the right tool — you want similarity, not equality.

For maybe 20% of our queries, this is the right shape. We use embeddings on iMessage threads, voice transcripts, and freeform notes — anything that's prose-flavored and where the user's question is also prose-flavored.
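
To make "similarity, not equality" concrete, here is a minimal sketch of ranking snippets by cosine similarity. The three-dimensional vectors are hand-made toys standing in for real embeddings, and the snippet texts and the `cosine` and `rank` helpers are hypothetical names chosen for illustration, not anything from the Lossless codebase.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embeddings. Assumption: a real system
# would get these from an embedding model, with far more dimensions.
snippets = {
    "told Sam about the Lisbon apartment mix-up": [0.9, 0.8, 0.1],
    "grocery list for Saturday": [0.1, 0.0, 0.9],
    "Lisbon flight confirmation": [0.3, 0.9, 0.2],
}
query_vec = [0.8, 0.7, 0.0]  # toy vector for the user's question

def rank(query, candidates):
    """Return candidate texts sorted from most to least similar."""
    return sorted(
        candidates,
        key=lambda text: cosine(query, candidates[text]),
        reverse=True,
    )

ranked = rank(query_vec, snippets)
```

Nothing here matches on exact words. The ordering comes entirely from how close the vectors point, which is why this shape suits vague, prose-flavored questions.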

When precise is right

The other 80% looks more like this: "how much did I spend at superchargers in March?" That's a SQL query. Filter by category, date range, vehicle, then sum.

Vector search is the wrong tool for this. You don't want similar charges — you want exactly that set, summed. You don't even want a model in the loop for the math; you want it for the explanation around the math.
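
As a rough sketch of what "filter, then sum" looks like, the example below runs the supercharger question against an in-memory SQLite table. The `charges` table, its columns, the vehicle name, and the sample rows are all invented for illustration, and SQLite is used only so the snippet is self-contained. Amounts are stored as integer cents so the sum is exact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE charges (
           id INTEGER PRIMARY KEY,
           category TEXT NOT NULL,
           vehicle TEXT NOT NULL,
           charged_on TEXT NOT NULL,      -- ISO 8601 date
           amount_cents INTEGER NOT NULL
       )"""
)
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO charges (category, vehicle, charged_on, amount_cents) "
    "VALUES (?, ?, ?, ?)",
    [
        ("supercharger", "model-3", "2024-03-02", 1850),
        ("supercharger", "model-3", "2024-03-19", 2240),
        ("supercharger", "model-3", "2024-04-01", 1990),  # outside March
        ("parking", "model-3", "2024-03-10", 600),        # wrong category
    ],
)

def supercharger_spend_cents(conn, vehicle, start, end):
    """Sum supercharger charges in [start, end) for one vehicle."""
    row = conn.execute(
        """SELECT COALESCE(SUM(amount_cents), 0)
           FROM charges
           WHERE category = 'supercharger'
             AND vehicle = ?
             AND charged_on >= ? AND charged_on < ?""",
        (vehicle, start, end),
    ).fetchone()
    return row[0]

total = supercharger_spend_cents(conn, "model-3", "2024-03-01", "2024-04-01")
```

The database does the arithmetic, so the answer is exactly that set of rows, summed. The model's job starts after `total` exists.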

Our hybrid

The system has three retrievers:

  1. Structured. A typed query against the records table. Sub-millisecond. The model writes the query; we audit it before running.
  2. Lexical. Postgres full-text on subject lines, file names, and shortish strings. Fast, predictable, and the user can verify the results visually.
  3. Semantic. Embeddings, but only over chunks of text-heavy records. Used as a tiebreaker, not the entrée.
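
One way to picture the "audit it before running" step for the structured retriever is an allowlist check on a typed query object. This is a sketch under assumptions: the `TypedQuery` and `Filter` types, the column names, and the allowed operators are hypothetical, and the checks an actual audit performs may differ.

```python
from dataclasses import dataclass

# Hypothetical allowlists for model-written queries.
ALLOWED_COLUMNS = {"category", "vehicle", "charged_on", "amount_cents"}
ALLOWED_OPS = {"=", "<", "<=", ">", ">="}
ALLOWED_AGGREGATES = {"sum", "count"}

@dataclass(frozen=True)
class Filter:
    column: str
    op: str
    value: object

@dataclass(frozen=True)
class TypedQuery:
    aggregate: str
    target: str
    filters: tuple

def audit(query):
    """Return a list of problems; an empty list means the query may run."""
    problems = []
    if query.aggregate not in ALLOWED_AGGREGATES:
        problems.append(f"aggregate not allowed: {query.aggregate}")
    if query.target not in ALLOWED_COLUMNS:
        problems.append(f"unknown column: {query.target}")
    for f in query.filters:
        if f.column not in ALLOWED_COLUMNS:
            problems.append(f"unknown column: {f.column}")
        if f.op not in ALLOWED_OPS:
            problems.append(f"operator not allowed: {f.op}")
    return problems

good = TypedQuery("sum", "amount_cents", (Filter("category", "=", "supercharger"),))
bad = TypedQuery("sum", "amount_cents", (Filter("password", "LIKE", "%"),))
```

Because the query is a typed value rather than a raw SQL string, the audit can inspect every field before anything touches the database.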

The router is a small classifier we trained on 2,300 example questions. It picks one or two of the three retrievers and skips the rest.
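
The router's interface can be sketched as a function from a question to one or two retriever names. The version below swaps the trained classifier for hand-written keyword patterns, purely as a stand-in: the patterns, the `route` function, and the fallback to semantic search are assumptions made for illustration.

```python
import re

RETRIEVERS = ("structured", "lexical", "semantic")

# Stand-in scoring rules. Assumption: the real router is a trained
# classifier; these patterns only imitate its input and output.
PATTERNS = {
    "structured": re.compile(r"\b(how much|how many|total|sum|spend|spent|average)\b"),
    "lexical": re.compile(r"\b(file|subject|named|titled|called)\b"),
    "semantic": re.compile(r"\b(tell|told|said|talk|talked|about|thing)\b"),
}

def route(question, max_retrievers=2):
    """Pick at most `max_retrievers` retrievers and skip the rest."""
    text = question.lower()
    scores = {name: len(PATTERNS[name].findall(text)) for name in RETRIEVERS}
    # sorted() is stable, so ties keep the order given in RETRIEVERS.
    ranked = sorted(RETRIEVERS, key=lambda name: scores[name], reverse=True)
    chosen = [name for name in ranked if scores[name] > 0][:max_retrievers]
    # Assumed fallback: use semantic search when no pattern matches.
    return chosen or ["semantic"]

spend_route = route("how much did I spend at superchargers in March?")
lisbon_route = route("what did I tell my brother about that thing in Lisbon")
```

The important property is the return type, not the scoring: every question leaves the router with a short list of retrievers to run, and everything else is skipped.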

The lesson we kept relearning: structured records are easier for the model to reason about than chunks of prose. Building the structure is the work.

The numbers

  • Median latency: 1,840 ms → 540 ms
  • Hallucinated citations: 4.1% → 0.6%
  • "That's exactly right" rating from beta users: 64% → 89%
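
For readers who want to check the ratios, the arithmetic behind these figures is short. The variable names are illustrative; the inputs are the numbers listed above.

```python
before_ms, after_ms = 1840, 540
speedup = before_ms / after_ms  # ratio of old to new median latency

halluc_before, halluc_after = 4.1, 0.6  # percent of answers
halluc_reduction = halluc_before / halluc_after  # ratio of old to new rate

rating_gain_points = 89 - 64  # percentage points

print(f"latency: {speedup:.1f}x faster")
print(f"hallucinated citations: {halluc_reduction:.1f}x fewer")
print(f"'exactly right' rating: +{rating_gain_points} points")
```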

Takeaway

If you're building on top of personal data, resist the temptation to vectorize everything. The hard, valuable work is converting messy inputs into typed records — once you have those, retrieval is mostly the boring kind of database query, and the model gets to do what it's actually good at: explaining.


We built our retrieval evals as a public test set, available on GitHub.
