LocalRAG is a fully local Retrieval-Augmented Generation application I built to answer one question: how much of a useful enterprise RAG system can you run without sending a single byte to a cloud LLM?
The problem
Most “build a chatbot over your documents” tutorials assume an OpenAI key, a managed vector database and a cloud orchestrator. That’s fine for prototypes, but it becomes a dead end the moment you talk to a customer in regulated banking, healthcare or government. They want answers on their data, on their hardware, with no egress.
The shape of the solution
LocalRAG uses local Ollama models for both embeddings and generation, FAISS for the vector index, and a content-type-aware ingestion pipeline that handles PDF, DOCX, CSV, Excel, XML and images. Everything runs on a laptop. The full demo is on YouTube.
- Ingestion: multi-format extractors that preserve enough structure to chunk intelligently — tables stay together, lists stay together, headings become metadata.
- Indexing: FAISS index with content-type tags so retrieval can prefer the right shape of content for the question.
- Retrieval: semantic top-k with rate-limited retries and a simple fallback when a model is overloaded.
- Generation: a local Ollama model with grounded prompts and source citations.
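
To make the ingestion and indexing stages more concrete, here is a minimal sketch of structure-aware chunking. It is not the code from the repo: `Block`, `Chunk`, `chunk_blocks` and the `max_chars` budget are hypothetical names chosen for illustration, and the sketch assumes the extractors have already labeled each block as a heading, paragraph, table or list.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    kind: str  # assumed labels: "heading", "paragraph", "table", "list"
    text: str


@dataclass
class Chunk:
    text: str
    content_type: str  # tag stored alongside the vector at indexing time
    heading: str = ""  # nearest preceding heading, kept as metadata


def chunk_blocks(blocks: List[Block], max_chars: int = 500) -> List[Chunk]:
    """Group extracted blocks into chunks without splitting tables or lists."""
    chunks: List[Chunk] = []
    heading = ""
    buffer: List[str] = []  # paragraphs waiting to be merged into one chunk

    def flush() -> None:
        # Emit the buffered paragraphs as a single chunk under the current heading.
        if buffer:
            chunks.append(Chunk("\n".join(buffer), "paragraph", heading))
            buffer.clear()

    for block in blocks:
        if block.kind == "heading":
            flush()
            heading = block.text  # headings become metadata, not chunk text
        elif block.kind in ("table", "list"):
            flush()
            # Tables and lists are always kept whole, even if they exceed max_chars.
            chunks.append(Chunk(block.text, block.kind, heading))
        else:
            pending = sum(len(text) + 1 for text in buffer)
            if buffer and pending + len(block.text) > max_chars:
                flush()
            buffer.append(block.text)
    flush()
    return chunks
```

The `content_type` and `heading` fields are the kind of metadata that could be stored next to each vector, so that retrieval can favor table chunks for a numeric question, for example.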
What I’d do differently next time
Two things. First, evaluation should be a first-class subsystem from day one, not bolted on later; even a small golden-question set saves you from regression panic during refactors. Second, content-type awareness matters more than fancy reranking: a boring extractor that respects document structure beats a clever reranker that is fed bad chunks.
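
A golden-question set does not need much machinery. The sketch below is one possible shape, not something that exists in the repo: `GoldenQuestion`, `evaluate` and the `answer_fn` callback (assumed to return an answer string plus a list of cited source names) are all hypothetical, and the substring check is a deliberately crude stand-in for a real grading step.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class GoldenQuestion:
    question: str
    expected_source: str     # document the answer should cite
    must_contain: List[str]  # phrases the answer should include


# Assumed pipeline interface: question -> (answer text, cited source names).
AnswerFn = Callable[[str], Tuple[str, List[str]]]


def evaluate(answer_fn: AnswerFn, golden: List[GoldenQuestion]) -> Tuple[float, List[str]]:
    """Return the pass rate and the list of questions that failed."""
    failures: List[str] = []
    for item in golden:
        answer, sources = answer_fn(item.question)
        cited = item.expected_source in sources
        grounded = all(p.lower() in answer.lower() for p in item.must_contain)
        if not (cited and grounded):
            failures.append(item.question)
    pass_rate = 1 - len(failures) / len(golden) if golden else 0.0
    return pass_rate, failures
```

Running this before and after a refactor and comparing the failure lists is enough to catch most regressions without any extra tooling.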
Repo: github.com/PowerAI-Labs/LocalRAG. Feedback and PRs welcome.
