Imagine you're on a customer call and they ask a detailed question about a feature that launched last month. The answer is in your product changelog, but you'd have to minimize the call, find the document, search through it, and read the relevant section - all while the customer waits.

Raven's RAG (Retrieval-Augmented Generation) system solves this. Upload your documents to a mode, and the AI automatically searches them during conversations, including the most relevant sections in its responses. The entire pipeline runs locally on your machine.

How it works end-to-end

The RAG pipeline has four stages: parsing, chunking, embedding, and retrieval.

Stage 1: Parsing. When you upload a document to a mode, Raven first extracts the text content. It supports four formats:

  • PDF - parsed using the pdf-parse library, which extracts text from all pages

  • DOCX - parsed using mammoth, which extracts raw text from Word documents

  • TXT and Markdown - read directly as plain text

The parsed text is the raw content of the document, stripped of formatting.
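The format routing described above can be sketched as a small dispatcher. This is an illustrative sketch, not Raven's actual code: the `parserFor` function and the `Parser` type are hypothetical names, though the library choices match the list above.

```typescript
// Route an uploaded file to the right parser based on its extension.
// "pdf-parse" and "mammoth" are the libraries named above; TXT and
// Markdown need no parsing library and are read as plain text.
type Parser = "pdf-parse" | "mammoth" | "plain-text";

function parserFor(filename: string): Parser {
  const ext = filename.toLowerCase().split(".").pop() ?? "";
  switch (ext) {
    case "pdf":
      return "pdf-parse";
    case "docx":
      return "mammoth";
    case "txt":
    case "md":
      return "plain-text";
    default:
      throw new Error(`Unsupported format: .${ext}`);
  }
}
```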

Stage 2: Chunking. The extracted text is split into overlapping chunks. Rather than feeding the entire document to the AI (which might exceed context limits), we break it into manageable pieces of a few hundred words each, with overlap between adjacent chunks to preserve context at the boundaries.

The overlap is important. If a relevant piece of information spans the boundary between two chunks, the overlap ensures that at least one chunk contains the complete information.
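A sliding window over words captures the idea. The chunk size and overlap values below are illustrative defaults, not Raven's actual settings:

```typescript
// Split text into overlapping word-based chunks. Each window advances by
// (chunkSize - overlap) words, so adjacent chunks share `overlap` words
// and information spanning a boundary appears whole in at least one chunk.
function chunkText(text: string, chunkSize = 300, overlap = 50): string[] {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + chunkSize).join(" "));
    if (start + chunkSize >= words.length) break; // last window reached the end
  }
  return chunks;
}
```

A 700-word document with these settings yields three chunks: words 0-299, 250-549, and 500-699, so the 50-word overlap regions appear in two chunks each.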

Stage 3: Embedding. Each chunk is converted into a vector embedding - a numerical representation that captures the semantic meaning of the text. Raven uses the all-MiniLM-L6-v2 model via the @xenova/transformers library, which runs the model locally in your Electron app.

The first time you upload a document, the embedding model downloads (about 30MB). After that, it's cached locally and runs without any network request.

Each chunk's embedding is stored in the local SQLite database alongside the chunk text, the source file name, and the mode it belongs to. No external vector database is needed.

Stage 4: Retrieval. When you ask the AI for help during a meeting, Raven takes your request (or, if you used a quick action, the recent transcript) and embeds it using the same model. It then computes cosine similarity between your query embedding and all stored chunk embeddings for the active mode.

The chunks with the highest similarity scores - the ones most semantically related to your question - are selected and included in the AI prompt as additional context. There's a token budget to prevent the context from getting too large.

What this looks like in practice

Say you're a sales rep with a "Sales" mode. You've uploaded:

  • Your product's feature comparison sheet (PDF)

  • Your pricing guide (DOCX)

  • Your most common objection handling scripts (TXT)

During a call, the prospect says: "How does your pricing compare to Competitor X for teams under 50 people?"

You press Cmd+Enter or use the "What should I say?" quick action. Raven:

  1. Takes the recent conversation as the query

  2. Embeds it locally

  3. Searches your uploaded documents

  4. Finds the relevant sections from your pricing guide and comparison sheet

  5. Passes them to Claude or OpenAI along with the full conversation transcript

  6. Returns a response that references the specific pricing tiers and competitive positioning from your own documents

You get an accurate, grounded answer in seconds - no searching, no switching windows, no "let me get back to you on that."

Why local matters

The entire RAG pipeline runs on your machine. Your documents are never uploaded to any server. The embeddings are generated locally. The similarity search happens locally. The only external call is to the AI provider (Anthropic or OpenAI) at the final step, where the retrieved chunks are included in the prompt.

This means you can upload sensitive documents - pricing sheets, internal playbooks, customer data, legal agreements - without worrying about where they're stored or who has access. They're in your local SQLite database and nowhere else.

Technical details

The full RAG implementation lives in src/main/services/ragService.ts - about 240 lines covering parsing, chunking, embedding, storage, and retrieval. It's integrated into the AI service in src/main/claudeService.ts, where retrieved chunks are injected into the system prompt.

The embedding model (all-MiniLM-L6-v2) produces 384-dimensional vectors. Cosine similarity search is brute-force - we compute similarity against every chunk for the active mode. This is fast enough for typical document collections (a search over hundreds of chunks completes in milliseconds), but wouldn't scale to millions of chunks. For Raven's use case - personal documents uploaded to specific modes - it's more than sufficient.

Chaitanya Laxman

Product
