Platform · Raw input → sealed record

How a raw file becomes a record that survives discovery.

The connector is the easy part. The hard part is everything that happens next — to every email, PDF, photo, statement, telemetry stream, and voice note you let in. Seven stages, each one inspectable, each one reproducible. The pipeline is what turns a Tesla session, a 14-line HELOC statement, or a parking-lot voice note into a typed record the Provenance Ledger will sign and keep forever.

MULTIMODAL INGESTION PIPELINE · 7 stages

OCR, diarization, statement parsing, entity extraction, linking, and embedding — every record made legible before it lands in the model.

OCR · Diarization · Parsing · Extraction · Linking · Embedding · Classification · Validation · Indexing

Why this matters

A file you can't find isn't a record. It's just clutter.

Most of your life already lives in files — but a PDF in a folder doesn't know what it is, who it's about, or when it matters. The pipeline is the part of Lossless that turns a pile of documents into something you can actually ask questions of.

  • ✓ It runs on its own — connect an account and walk away
  • ✓ It reads what people can read: scans, handwriting, audio, video, photos
  • ✓ It files every record into 19 domains and 130+ record types
  • ✓ It writes a plain-language summary for every single record
  • ✓ It never edits or deletes the original — only ever reads
The seven stages

Open any stage. See what's actually happening.

Under the hood there are 51 individual steps across 7 stages. You don't need to think about any of them — but you should be able to see them. Tap a stage to look inside.

Stage 1 · Ingestion

Nothing sits in a folder waiting to be noticed. The instant a file comes in (from your inbox, your drive, an upload, or a voice note), Lossless gives it a stable address and opens a record for it. From here on, it can't get lost.

  • Given a permanent, addressable home
  • A record of its own is opened
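
One way to picture a "stable address" is an identifier derived from the file's own bytes. The page does not say how Lossless assigns addresses, so the sketch below assumes, purely for illustration, a SHA-256 digest of the content. The names `stable_address`, `Record`, and `open_record` are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field


def stable_address(content: bytes) -> str:
    """Derive the address from the bytes alone (assumption: content-addressed),
    so the same file always maps to the same address."""
    return "sha256:" + hashlib.sha256(content).hexdigest()


@dataclass
class Record:
    address: str
    source: str  # where the file came in from, e.g. "inbox" or "upload"
    fields: dict = field(default_factory=dict)  # filled in by later stages


def open_record(content: bytes, source: str) -> Record:
    """Ingestion in miniature: assign the address, then open an empty record."""
    return Record(address=stable_address(content), source=source)


memo = open_record(b"voice note bytes", source="upload")
again = open_record(b"voice note bytes", source="inbox")
```

Under this assumption the same bytes arriving by two routes resolve to the same address, which is one simple reading of "it can't get lost".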

Stage 2 · Extraction

A scanned receipt isn't text until someone reads it. Lossless reads everything: the words on a page, the words inside a photo, what's spoken in an audio note, what happens in a video. If it's in another language, it's translated. It even reads the quiet details (when a photo was taken, where, and on what device) and picks out the people, businesses, amounts, orders, and dates along the way.

  • Printed text & handwriting (OCR)
  • Images inside documents
  • Audio transcribed
  • Video transcribed & narrated
  • Photos described
  • Anything translated to English
  • When & where it was created
  • The file's own metadata
  • People mentioned
  • Businesses & organizations
  • Amounts & money
  • Orders & receipts
  • Events & dates
  • Anything you've taught it to look for

Stage 3 · AI Analysis

Now the file is understood, not just transcribed: what it's about, who it concerns, why it matters, and what (if anything) you need to do about it. Lossless quietly connects the new file to the ones it belongs with: the confirmation email, the earlier statement, the right person, the right property. The pile starts becoming a story.

  • What it's about
  • Key quotes pulled out
  • Action items spotted
  • Routed to the right handler
  • Linked to related files
  • Tied to the right people
  • Attributed to the right property
  • Patterns & behavior noticed
  • Importance scored
  • Sorted into clean, structured fields

Stage 4 · Enrichment & Summarization

Every record gets a short, plain-language summary and a one-line brief, the kind of thing you'd jot on a sticky note. It's also categorized: filed into the taxonomy of domains, document families, and spend categories. And if something's missing or thin, Lossless goes back and fills the gap rather than leaving a hole.

  • Filed into a category
  • Classified by document type
  • Gaps healed
  • A plain-language summary
  • A one-line brief
  • How relevant it is to you

Stage 5 · AI Search Readiness

A record is only useful if it shows up when you need it. Lossless tags each one with the topics and the bigger themes it belongs to, drawn from a three-tier map of your life that the system builds for you, so a question three years from now still finds the answer.

  • Topics extracted
  • Themes assigned

Stage 6 · Persistence

This is the part the name is about. The finished record is kept in several places at once, made searchable by meaning rather than just keywords, woven into the graph of people and places and things you own, and sealed with its provenance: payload, origin, seal. The original file is never touched. Nothing gets lost.

  • Searchable by meaning
  • Stored in the database
  • Original kept in cloud storage
  • Added to the vector store
  • Woven into your entity graph
  • Metadata written back
  • Sealed with provenance

Stage 7 · Quality Assurance

Before Lossless calls a record done, one more pass looks it over, checking for anything thin, missing, or off, and fixes it. The bar is simple: would this hold up if you actually had to rely on it?

  • A final quality review & gap repair

7 STAGES · 51 STEPS · RUNS IN THE BACKGROUND · YOU NEVER PRESS A BUTTON
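
For readers who want the shape of it, here is a minimal sketch of the stage-and-step layout. The stage names and per-stage step counts are the ones shown in the in-app AI Pipeline tab on this page. The `Stage` class and the `locate` helper are hypothetical, written only to show that the counts add up to 51.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    steps: int


# Names and counts as displayed in the AI Pipeline tab.
STAGES = (
    Stage("Ingestion", 2),
    Stage("Extraction", 16),
    Stage("AI Analysis", 15),
    Stage("Enrichment & Summarization", 6),
    Stage("AI Search Readiness", 2),
    Stage("Persistence", 9),
    Stage("Quality Assurance", 1),
)


def total_steps(stages=STAGES) -> int:
    """Sum the per-stage step counts."""
    return sum(stage.steps for stage in stages)


def locate(step_number: int, stages=STAGES) -> tuple[int, str]:
    """Map a 1-based overall step number to (stage number, stage name)."""
    if not 1 <= step_number <= total_steps(stages):
        raise ValueError(f"step {step_number} is outside 1..{total_steps(stages)}")
    seen = 0
    for index, stage in enumerate(stages, start=1):
        seen += stage.steps
        if step_number <= seen:
            return index, stage.name
    raise AssertionError("unreachable: the range check above guarantees a match")
```

Calling `locate(19)` lands on the first step of AI Analysis, because Ingestion and Extraction account for the first 18.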

Inside the app

You can watch it work.

Open any record — or any upload batch — and there's an AI Pipeline tab. Every step, timed. Every file in a batch, mapped. Nothing hidden behind a spinner.

lossless · records · evidence_photo_001.jpg · Live

The Overview tab shows the AI summary, the one-line brief, and every extracted entity. These are among the six things every record carries, covered further down this page.

68% · Elapsed 0:41 · Remaining 0:19 · Steps 35/51 · Stage 3 / 7
Currently running: Topic Extraction (AI Analysis)

  • 📥 Ingestion 2/2
  • 🔍 Extraction 16/16
  • 🧠 AI Analysis 7/15
      Scores Extraction · 1.4s
      Quotes Extraction · 2.1s
      Action Items Extraction · 1.8s
      Document Routing · 0.9s
      Related Files Discovery · 3.2s
      Person-Document Linking · 2.0s
      Behavior Pattern Analysis · 2.7s
      Topic Extraction · running…
      AI Structured Extraction
  • Enrichment & Summarization 0/6
  • 🔗 AI Search Readiness 0/2
  • 💾 Persistence 0/9
  • Quality Assurance 0/1

186 Complete · 49 Processing · 0 Pending · 5 Needs review

File completeness heatmap — 240 files in this batch

Legend: Complete · Processing · Pending · Needs review

Every cell is one file. Hover to inspect. A batch of years of paperwork runs the whole 51-step pipeline, file by file — and tells you exactly which five need a human glance.
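
The header figures in the panel (35 of 51 steps, 0:41 elapsed, 0:19 remaining) and the batch counts (186, 49, 0, and 5 out of 240) are consistent with plain arithmetic. The sketch below shows one way such numbers could be produced. Rounding down and the linear time estimate are assumptions, and the function names are hypothetical.

```python
from collections import Counter


def percent_done(steps_done: int, steps_total: int) -> int:
    """Whole-number progress, rounded down (assumption) so the bar never overstates."""
    return steps_done * 100 // steps_total


def seconds_remaining(elapsed_s: int, steps_done: int, steps_total: int) -> int:
    """Linear estimate (assumption): remaining steps take as long, on average,
    as the steps already finished."""
    if steps_done <= 0:
        raise ValueError("no finished steps to extrapolate from")
    return round(elapsed_s * (steps_total - steps_done) / steps_done)


def tally(statuses: list[str]) -> dict[str, int]:
    """Count files per status, always reporting all four legend entries."""
    legend = ("Complete", "Processing", "Pending", "Needs review")
    counts = Counter(statuses)
    return {name: counts[name] for name in legend}


# A stand-in for the 240-file batch shown in the heatmap.
batch = ["Complete"] * 186 + ["Processing"] * 49 + ["Needs review"] * 5
summary = tally(batch)
```

With these inputs, `percent_done(35, 51)` gives 68 and `seconds_remaining(41, 35, 51)` gives 19, matching the panel.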

The Entities tab lists every person, business, place, amount, date, topic, and action item the pipeline pulled — each one tappable, each one linked back to the source.

Follow one file

Pick something ordinary. Watch it go through.

The pipeline treats a voice memo and a nine-page bank statement with the same care. Choose one and see.

Watch the pipeline handle
  • an insurance renewal PDF
  • a voice memo from the car
  • a bank statement PDF
  • a photo of a receipt
The taxonomy

Every record gets a place.

"Categorized" isn't a vague promise here. There's a real, structured taxonomy underneath — domains, record types, document families, and a deep spend map — and every file is routed into it.

Step 1 · A file arrives
Any type: email, PDF, photo, statement, voice note.
"scan_0007.pdf"

Step 2 · Routed to a domain
One of 19 life domains, the broad area it belongs to.
→ Vehicles

Step 3 · Typed as a record
One of 130+ record types: what kind of thing it is.
→ Vehicle Insurance

Step 4 · Filed by category
Document family + spend category, down to the sub-category.
→ Vehicle Expenses → Vehicle insurance

Step 5 · Tagged & linked
Entities, topics and themes, so it surfaces on demand.
→ people · amounts · "Auto & Insurance"
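
The five steps above end in a small, structured result. As a rough sketch, that result could be held in a shape like the one below. The field names and the `Filing` class are hypothetical, the values are the ones from the scan_0007.pdf walk-through, and the classification itself (deciding the domain and type) is left to the pipeline and not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Filing:
    filename: str
    domain: str                        # one of the 19 life domains
    record_type: str                   # one of the 130+ record types
    spend_path: tuple[str, ...] = ()   # spend category, then sub-category
    tags: tuple[str, ...] = ()         # entities, topics, themes

    def __post_init__(self) -> None:
        # Every record belongs to a domain and has a type; the rest is optional.
        if not self.domain or not self.record_type:
            raise ValueError("a filing needs both a domain and a record type")

    def shelf(self) -> str:
        """Render the route through the taxonomy as a single readable path."""
        return " → ".join((self.domain, self.record_type, *self.spend_path))


filing = Filing(
    filename="scan_0007.pdf",
    domain="Vehicles",
    record_type="Vehicle Insurance",
    spend_path=("Vehicle Expenses", "Vehicle insurance"),
    tags=("people", "amounts", "Auto & Insurance"),
)
```

Here `filing.shelf()` reads "Vehicles → Vehicle Insurance → Vehicle Expenses → Vehicle insurance", the same route the steps trace.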

19 domains — the broad areas of a life

Every record belongs to one. Together they hold 130+ record types and 25+ sub-types.

  • Finance & Banking (20 record types): Accounts, credit cards, statements, transactions, loans, tax documents.
  • Properties & Rentals (25 record types): Units, leases, bookings, guests, utility bills, P&L reports.
  • Vehicles (10 record types): Trips, tolls, tickets, service history, mileage, insurance, claims.
  • Legal & Divorce (12 record types): Cases, disclosures, custody arrangements, support orders, chronologies.
  • People & Relationships (6 record types): People, contacts, relationships, life events, promises, grievances.
  • Email & Messaging (5 record types): Emails, text messages, notes, unified messages, threads.
  • Voice (3 record types): Voice sessions, audio recordings, voice profiles.
  • Calendar & Events (3 record types): Calendar events, universal events, trips.
  • Documents & Records (4 record types): Records, upload batches, quotes, extracted entities.
  • Trips & Travel (8 record types): Flights, hotels, car rentals, itineraries, travel documents.
  • Health & Medical (6 record types): Medical records, prescriptions, evaluations, insurance policies.
  • E-Commerce & Orders (4 record types): Vendors, orders, order items, receipts.

+ Memory & Knowledge · Projects & Action Items · Topics & Pulse · Photos & Media · AI Chat · Device Sync · System — 19 domains in all

39 document families — what kind of document it is

A second axis, running across the domains: the kind of document, with the document families sorted into eight groups.

Financial 10
    Bank statements · Credit-card statements · Tax documents · Investment statements · Bills · Receipts · Invoices · Collection notices · Order & return receipts
Personal 8
    Text messages · Emails · Voice recordings · Voicemails · Social media · Photographs · Video recordings · Screenshots
Medical 6
    Medical records · Psychiatric evaluations · Therapy notes · Prescription records · Medical insurance policies · Hospital records
Legal 5
    Police reports · Court documents · Restraining orders · Motions & filings · Legal correspondence
Rental management 5
    Property documents · Rental income records · Maintenance & repairs · HOA documents · Insurance policies & claims
Childcare 4
    School records · Childcare documents · Custody schedules · Child support records
Other 3
    Witness statements · Timeline documents · Unknown / unclassified
Plus file formats 18
    PDF · Word · Sheets · Slides · Photos · Screenshots · iPhone messages · Audio · Video · Code · Archives

The spend map — 21 categories, 100+ sub-categories

When a record involves money, it's filed down to the leaf. Tap a category to see how deep it goes.

Gas · Charging · Parking · Tolls · Registration · Service · Oil change · Tires · Brake replacement · Repairs · Vehicle cleaning · Towing · Vehicle insurance · Parking tickets · Speeding tickets · Vehicle purchase · Vehicle lease · Vehicle financing · Vehicle accessories

Furnishings · Smart home devices · Furniture · Fixtures · Large appliance · Kitchen cabinetry · Bathroom tilework · Plumbing · Electrical · HVAC systems · Painting · Drywall · Framing & lumber · Roof · Flooring · Renovation · Demolition · Permits · Contractor labor · Interior design · Architecture · Staging

Cleaning · Repairs · Listings fees · Legal services · CPA services · Marketing & ads · Guest toiletries · Bedding & towels · Cleaning supplies · Guest gifts · Decor · Kitchenware · Smart home devices · Wi-Fi devices · Utilities · Internet · Mortgage components · HELOC · HOA fees · Property insurance · Subscriptions

Payroll deposit · Payout deposit · Account transfer · Cash withdrawal · ATM withdrawal · Credit card payment · Mortgage payment · HELOC payment · Digital wallet funding · Interest fee · Overdraft fee · Late fee · Wire fee · Service fee · Processing fee · Management fee · Dividend credit · Interest credit · Reversal · Clawback

Movies · Shows · Theater · Concerts · Festivals · Comedy · Sport events · Theme parks · Zoos & aquariums · Museums & galleries · Arcades & recreation · Casinos & gambling · Nightclubs · Raves · Parties · Bars · Event tickets & fees

Dental · Pharmacy · Supplements · Vision · Medical insurance · Medical appointment · Medical procedure · Medical supplies · Labwork · Skin treatments · Addiction treatment · Therapy · Spa · Massage · Yoga class · Gym membership · Personal training

+ Groceries & Food · Housing · Digital Services · Shopping & Retail · Travel · Transit · Education · Childcare · Charity & Gifts · Professional Services · Beauty & Personal Care — 21 in all

What you get back

Six things every record carries — that the raw file never did.

The plain-language summary

A short, human paragraph that explains what the record is — readable in about five seconds, no jargon, no skimming a PDF.

The one-line brief

An entity-first headline with the significant stuff flagged — the deadline, the amount, the thing you'd actually want to know first.

The right shelf

Routed into the taxonomy — a domain, a record type, a document family, and a spend category down to the sub-category.

The entity graph

Every person, business, place, amount, and date it touched — linked, so you can walk from a person to a property to an account.

Themes & topics

The throughlines of your life, tagged onto the record across a three-tier topic map — so a question years later still surfaces it.

The provenance seal

Payload, origin, seal. Every finished record carries proof of where it came from — so any agent you authorize can trust it.
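
To make "payload, origin, seal" concrete, here is a minimal sketch of sealing and re-verifying a record. The page does not describe how the Provenance Ledger signs records, so everything here is an assumption: an HMAC-SHA256 over a canonical JSON form, using only the Python standard library. A real ledger might well use asymmetric signatures instead. The function names, the demo key, and the sample amount are hypothetical.

```python
import hashlib
import hmac
import json


def canonical(payload: dict, origin: str) -> bytes:
    """Serialize deterministically (sorted keys, fixed separators, UTF-8),
    so re-deriving the same record yields the same bytes."""
    document = {"origin": origin, "payload": payload}
    text = json.dumps(document, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def seal(payload: dict, origin: str, key: bytes) -> str:
    """Assumed scheme: HMAC-SHA256 over the canonical bytes, hex-encoded."""
    return hmac.new(key, canonical(payload, origin), hashlib.sha256).hexdigest()


def verify(payload: dict, origin: str, key: bytes, expected: str) -> bool:
    """Recompute the seal and compare in constant time."""
    return hmac.compare_digest(seal(payload, origin, key), expected)


KEY = b"demo-key-not-for-production"  # illustrative only

# First run of the pipeline, then a later re-run that emits fields in another order.
first_run = {"record_type": "Vehicle Insurance", "amount": "1240.00"}  # sample values
rerun = {"amount": "1240.00", "record_type": "Vehicle Insurance"}

original_seal = seal(first_run, "scan_0007.pdf", KEY)
still_valid = verify(rerun, "scan_0007.pdf", KEY, original_seal)
tampered = verify({**rerun, "amount": "9999.00"}, "scan_0007.pdf", KEY, original_seal)
```

The canonical form is the design choice that matters: because field order and spacing are fixed before signing, a record re-derived from the same source still verifies, while any change to the payload or origin does not.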

By the numbers

One pipeline. Every kind of file you own.

7 stages · 51 steps
19 domains · 130+ record types
39 document families
100+ spend sub-categories
"I connected eighteen years of Gmail and a folder of scanned paperwork I'd been avoiding. A day later it was all just… records — summarized, dated, filed under the right category. I didn't lift a finger. The pipeline did the part I'd been dreading for a decade."
— Beta user · two-Tesla household · San Francisco

"Reproducibility at the parser level is the part the security auditor actually cares about — re-run the source, re-derive the record, the signature still verifies."

— What the technical evaluator writes in their report

See the pipeline doing real work.

Connect one account. Watch a few years of statements, receipts, and telemetry turn into typed records the ledger will sign. Reads, never writes.

Continue the architecture tour

You've seen the kiln. Now see what feeds it.

Next pillar: the schema-aware connectors that hand the pipeline its raw material — Gmail receipts, Plaid statements, Tesla telemetry, iMessage threads — each one parsed into a typed record, not a blob.
