How Raven Searches Your Documents During Live Calls
Raven includes a fully local RAG pipeline - upload documents, embed them on your machine, and the AI automatically references the most relevant sections during conversations. Here's how it works.

Chaitanya Laxman
Product
Mar 2, 2026

Imagine you're on a customer call and they ask a detailed question about a feature that launched last month. The answer is in your product changelog, but you'd have to minimize the call, find the document, search through it, and read the relevant section - all while the customer waits.
Raven's RAG (Retrieval-Augmented Generation) system solves this. Upload your documents to a mode, and the AI automatically searches them during conversations, including the most relevant sections in its responses. The entire pipeline runs locally on your machine.
How it works end-to-end
The RAG pipeline has four stages: parsing, chunking, embedding, and retrieval.
Stage 1: Parsing. When you upload a document to a mode, Raven first extracts the text content. It supports four formats:
PDF - parsed using the pdf-parse library, which extracts text from all pages
DOCX - parsed using mammoth, which extracts raw text from Word documents
TXT and Markdown - read directly as plain text
The parsed text is the raw content of the document, stripped of formatting.
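The dispatch between formats can be sketched as a switch on the file extension. This is an illustrative sketch, not the actual code from ragService.ts: the function name parseDocument is hypothetical, only the plain-text path is shown concretely, and the pdf-parse and mammoth calls are indicated as comments.

```typescript
import * as fs from "fs";
import * as path from "path";

// Hypothetical sketch of the format dispatch described above.
// Only TXT/Markdown is implemented here; the PDF and DOCX branches
// show where pdf-parse and mammoth would be called in the real pipeline.
function parseDocument(filePath: string): string {
  const ext = path.extname(filePath).toLowerCase();
  switch (ext) {
    case ".txt":
    case ".md":
      // TXT and Markdown are read directly as plain text.
      return fs.readFileSync(filePath, "utf-8");
    case ".pdf":
      // Real pipeline: pdf-parse extracts text from all pages.
      throw new Error("PDF parsing needs pdf-parse (omitted in this sketch)");
    case ".docx":
      // Real pipeline: mammoth extracts raw text from the document.
      throw new Error("DOCX parsing needs mammoth (omitted in this sketch)");
    default:
      throw new Error(`Unsupported format: ${ext}`);
  }
}
```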
Stage 2: Chunking. The extracted text is split into overlapping chunks. Rather than feeding the entire document to the AI (which might exceed context limits), we break it into manageable pieces of a few hundred words each, with overlap between adjacent chunks to preserve context at the boundaries.
The overlap is important. If a relevant piece of information spans the boundary between two chunks, the overlap ensures that at least one chunk contains the complete information.
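A minimal word-based chunker with overlap looks something like this. The sizes here (300 words per chunk, 50 words of overlap) are illustrative defaults, not the values Raven actually uses:

```typescript
// Split text into overlapping word-based chunks. A step smaller than
// the chunk size means each chunk repeats the tail of the previous one,
// so information spanning a boundary survives in at least one chunk.
function chunkText(text: string, chunkSize = 300, overlap = 50): string[] {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  const step = chunkSize - overlap; // advance by less than a full chunk
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + chunkSize).join(" "));
    if (start + chunkSize >= words.length) break; // last chunk reached the end
  }
  return chunks;
}
```

A 700-word document with these defaults yields three chunks, and each chunk shares its first 50 words with the tail of the previous one.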
Stage 3: Embedding. Each chunk is converted into a vector embedding - a numerical representation that captures the semantic meaning of the text. Raven uses the all-MiniLM-L6-v2 model via the @xenova/transformers library, which runs the model locally in your Electron app.
The first time you upload a document, the embedding model downloads (about 30MB). After that, it's cached locally and runs without any network request.
Each chunk's embedding is stored in the local SQLite database alongside the chunk text, the source file name, and the mode it belongs to. No external vector database is needed.
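One way to keep a 384-dimensional embedding in SQLite is to serialize the Float32Array into a Buffer and store it in a BLOB column alongside the chunk text, file name, and mode. This is a sketch of that round-trip, not Raven's actual storage code:

```typescript
// Serialize a Float32Array embedding into a Buffer suitable for a
// SQLite BLOB column (e.g. via better-sqlite3), and read it back.
function embeddingToBlob(embedding: Float32Array): Buffer {
  return Buffer.from(embedding.buffer, embedding.byteOffset, embedding.byteLength);
}

function blobToEmbedding(blob: Buffer): Float32Array {
  // Copy into a fresh buffer so the Float32Array view starts at a
  // 4-byte-aligned offset regardless of where the Buffer was sliced from.
  const copy = new Uint8Array(blob);
  return new Float32Array(copy.buffer);
}
```

Because the bytes pass through unchanged, the round-trip is exact: every float comes back bit-for-bit identical.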
Stage 4: Retrieval. When you ask the AI for help during a meeting, Raven takes your request (or, if you used a quick action, the recent transcript) and embeds it using the same model. It then computes cosine similarity between your query embedding and all stored chunk embeddings for the active mode.
The chunks with the highest similarity scores - the ones most semantically related to your question - are selected and included in the AI prompt as additional context. There's a token budget to prevent the context from getting too large.
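The retrieval step above can be sketched as brute-force cosine similarity plus a greedy cutoff. Everything here is illustrative: the StoredChunk shape, the default budget, and the 4-characters-per-token estimate are assumptions, not Raven's actual values:

```typescript
interface StoredChunk {
  text: string;
  embedding: number[];
}

// Standard cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every chunk against the query (brute force), sort descending,
// and take top matches until a rough token budget is exhausted.
function retrieve(query: number[], chunks: StoredChunk[], tokenBudget = 2000): string[] {
  const ranked = chunks
    .map(c => ({ c, score: cosineSimilarity(query, c.embedding) }))
    .sort((x, y) => y.score - x.score);
  const selected: string[] = [];
  let used = 0;
  for (const { c } of ranked) {
    const approxTokens = Math.ceil(c.text.length / 4); // crude heuristic
    if (used + approxTokens > tokenBudget) break;
    selected.push(c.text);
    used += approxTokens;
  }
  return selected;
}
```

The greedy cutoff means the highest-scoring chunks always get in first; once the budget is full, lower-ranked chunks are dropped rather than truncated mid-chunk.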
What this looks like in practice
Say you're a sales rep with a "Sales" mode. You've uploaded:
Your product's feature comparison sheet (PDF)
Your pricing guide (DOCX)
Your most common objection handling scripts (TXT)
During a call, the prospect says: "How does your pricing compare to Competitor X for teams under 50 people?"
You press Cmd+Enter or use the "What should I say?" quick action. Raven:
Takes the recent conversation as the query
Embeds it locally
Searches your uploaded documents
Finds the relevant sections from your pricing guide and comparison sheet
Passes them to Claude or OpenAI along with the full conversation transcript
Returns a response that references the specific pricing tiers and competitive positioning from your own documents
You get an accurate, grounded answer in seconds - no searching, no switching windows, no "let me get back to you on that."
Why local matters
The entire RAG pipeline runs on your machine. Your documents are never uploaded to any server. The embeddings are generated locally. The similarity search happens locally. The only external call is to the AI provider (Anthropic or OpenAI) at the final step, where the retrieved chunks are included in the prompt.
This means you can upload sensitive documents - pricing sheets, internal playbooks, customer data, legal agreements - without worrying about where they're stored or who has access. They're in your local SQLite database and nowhere else.
Technical details
The full RAG implementation lives in src/main/services/ragService.ts - about 240 lines covering parsing, chunking, embedding, storage, and retrieval. It's integrated into the AI service in src/main/claudeService.ts, where retrieved chunks are injected into the system prompt.
The embedding model (all-MiniLM-L6-v2) produces 384-dimensional vectors. Cosine similarity search is brute-force - we compute similarity against every chunk for the active mode. This is fast enough for typical document collections (hundreds of chunks search in milliseconds), but wouldn't scale to millions of chunks. For Raven's use case - personal documents uploaded to specific modes - it's more than sufficient.
