"We don't sell your data" is not a privacy policy
A walkthrough of the architectural decisions we made to ensure we couldn't sell user data even if we wanted to.
For the first six months of building Lossless, every retrieval went through a vector database. That was wrong. Here's what we replaced it with, and how we kept just enough fuzzy matching to feel like magic.
If a user asks "what did I tell my brother about that thing in Lisbon," there is no SQL query that handles that. There's a person, a vague topic, a place, and a long tail of phrasing. Embeddings are the right tool — you want similarity, not equality.
For maybe 20% of our queries, this is the right shape. We use embeddings on iMessage threads, voice transcripts, and freeform notes — anything that's prose-flavored and where the user's question is also prose-flavored.
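To make "similarity, not equality" concrete, here is a minimal sketch of embedding-style retrieval. It is not our production code: the three-dimensional vectors are hand-made stand-ins for real embeddings, and `cosine_similarity`, `top_k`, and the sample texts are all hypothetical names invented for this illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query_vec, docs, k=2):
    """Rank (text, vector) pairs by similarity to the query, best first."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy vectors, made up by hand. A real system would get these from an
# embedding model; nothing here reflects an actual model's output.
DOCS = [
    ("Message to my brother about the Lisbon trip", [0.9, 0.8, 0.1]),
    ("Supercharger receipt from early March", [0.0, 0.1, 0.9]),
    ("Freeform note about a dentist appointment", [0.1, 0.0, 0.2]),
]

# Pretend this is the embedding of the user's vague question.
QUERY = [0.8, 0.9, 0.0]

results = top_k(QUERY, DOCS, k=1)
print(results[0][1])
```

Nothing in the query has to match a stored string exactly. The nearest vector wins, which is the behavior you want for prose-flavored questions and the wrong behavior for anything that needs an exact set.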
The other 80% looks more like this: "how much did I spend at superchargers in March?" That's a SQL query. Filter by category, date range, vehicle, then sum.
Vector search is the wrong tool for this. You don't want similar charges — you want exactly that set, summed. You don't even want a model in the loop for the math; you want it for the explanation around the math.
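The supercharger question can be sketched as a plain filter-and-sum. This is an illustration under assumptions, not our schema: the `charges` table, its columns, the `total_spend_cents` helper, and the sample rows are all invented, and it uses an in-memory SQLite database so it runs anywhere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table of typed charge records. Dates are ISO 8601 strings,
# which compare correctly as text; amounts are integer cents.
conn.execute(
    """
    CREATE TABLE charges (
        id INTEGER PRIMARY KEY,
        category TEXT NOT NULL,
        vehicle TEXT NOT NULL,
        charged_on TEXT NOT NULL,
        amount_cents INTEGER NOT NULL
    )
    """
)

# Made-up sample data for the illustration.
rows = [
    ("supercharger", "car-1", "2024-02-27", 1810),
    ("supercharger", "car-1", "2024-03-04", 2250),
    ("supercharger", "car-1", "2024-03-19", 1475),
    ("parking", "car-1", "2024-03-20", 900),
    ("supercharger", "car-1", "2024-04-01", 3000),
]
conn.executemany(
    "INSERT INTO charges (category, vehicle, charged_on, amount_cents) "
    "VALUES (?, ?, ?, ?)",
    rows,
)


def total_spend_cents(conn, category, vehicle, start, end):
    """Sum charges in [start, end) for one category and vehicle."""
    row = conn.execute(
        "SELECT COALESCE(SUM(amount_cents), 0) FROM charges "
        "WHERE category = ? AND vehicle = ? "
        "AND charged_on >= ? AND charged_on < ?",
        (category, vehicle, start, end),
    ).fetchone()
    return row[0]


march_total = total_spend_cents(
    conn, "supercharger", "car-1", "2024-03-01", "2024-04-01"
)
print(march_total)
```

The database does the arithmetic, so the total is exact. The model only needs to be handed the number and asked to explain it.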
The system has three retrievers.
The router is a small classifier we trained on 2,300 example questions. It picks one or two of the three retrievers and skips the rest.
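The selection step, "one or two of the three," might look something like the sketch below. It assumes the classifier emits one score per retriever; the score format, the `route` function, the `second_threshold` value, and the retriever labels (`semantic`, `structured`, `retriever_3`) are placeholders for this example, not our actual names or tuning.

```python
from typing import Dict, List


def route(scores: Dict[str, float], second_threshold: float = 0.3) -> List[str]:
    """Pick the best-scoring retriever, plus the runner-up if it scores
    at least `second_threshold`. Everything else is skipped."""
    if not scores:
        raise ValueError("router needs at least one retriever score")
    ranked = sorted(scores, key=lambda name: scores[name], reverse=True)
    chosen = [ranked[0]]
    if len(ranked) > 1 and scores[ranked[1]] >= second_threshold:
        chosen.append(ranked[1])
    return chosen


# Made-up scores standing in for classifier output.
clear_cut = route({"semantic": 0.10, "structured": 0.85, "retriever_3": 0.05})
ambiguous = route({"semantic": 0.45, "structured": 0.50, "retriever_3": 0.05})
print(clear_cut, ambiguous)
```

Capping the choice at two keeps the context the model sees small, and an unambiguous question pays for only one retrieval.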
The lesson we kept relearning: structured records are easier for the model to reason about than chunks of prose. Building the structure is the work.
If you're building on top of personal data, resist the temptation to vectorize everything. The hard, valuable work is converting messy inputs into typed records — once you have those, retrieval is mostly the boring kind of database query, and the model gets to do what it's actually good at: explaining.
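As a rough illustration of what "typed records" means here, the sketch below turns one messy input into a record that a database can filter and sum. The `ChargeRecord` fields, the `to_charge_record` helper, and the input formats (a `$`-prefixed amount, a US-style `MM/DD/YYYY` date) are assumptions made for the example; real inputs are messier and this is not how our extraction works.

```python
from dataclasses import dataclass
from datetime import date, datetime
from decimal import Decimal


@dataclass(frozen=True)
class ChargeRecord:
    """Hypothetical typed record: every field has one unambiguous type."""
    category: str
    charged_on: date
    amount_cents: int


def to_charge_record(raw: dict) -> ChargeRecord:
    """Normalize a dict of loosely formatted strings into a ChargeRecord.

    Assumes amounts look like "$1,234.50" and dates like "03/04/2024".
    """
    amount = Decimal(raw["amount"].replace("$", "").replace(",", "").strip())
    return ChargeRecord(
        category=raw["category"].strip().lower(),
        charged_on=datetime.strptime(raw["date"].strip(), "%m/%d/%Y").date(),
        amount_cents=int(amount * 100),
    )


record = to_charge_record(
    {"amount": "$22.50", "date": "03/04/2024", "category": " Supercharger "}
)
print(record)
```

Once the input is in this shape, "how much did I spend" stops being a language problem and becomes a query.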
We built our retrieval evals as a public test set, and it is available on GitHub.