Local RAG

Local RAG UI — file upload sidebar, local model selector (phi4:14b), and search interface.

Local RAG is a fully local, AI-powered document processing and search system. It is a Retrieval-Augmented Generation (RAG) application that uses local models served by Ollama and a FAISS index for document processing and semantic search, and it is designed for private enterprise data that must never leave the network.

Stack

Python, FastAPI, React.js, Ollama, FAISS, Sentence Transformers, LangChain-style retrieval, OCR.

Status

Active — work in progress. Functional end-to-end, with ongoing refinement of features and error handling.

Key features

Multi-format document processing — PDF, DOCX, CSV, Excel, XML, and images.
Semantic search with content-type awareness — intelligent document understanding.
Batch processing for large files and many documents at once.
Rate limiting and error handling for reliable API performance.
Automatic backup and recovery for data integrity.
Query expansion and results reranking for better search relevance.
Faceted search — filter and navigate results.
OCR support for extracting text from images and PDFs.
Progress tracking and webhook notifications for long-running tasks.
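
As an illustration of the rate-limiting feature, here is a minimal token-bucket sketch. The source does not say which algorithm the backend uses, so the token bucket, the `TokenBucket` class, and its `rate`, `capacity`, and `clock` parameters are all assumptions chosen for the example, not the project's actual code.

```python
import time


class TokenBucket:
    """Hypothetical limiter: bursts of up to `capacity` requests, refilled at `rate` tokens/second."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self) -> bool:
        """Return True and spend one token if the request may proceed."""
        now = self.clock()
        elapsed = now - self.updated
        self.updated = now
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

In a FastAPI backend, a check like `allow()` would typically sit in a dependency or middleware, with one bucket per client, and rejected requests would receive an HTTP 429 response.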

Architecture

A two-tier system: a Python FastAPI backend handling document ingestion, chunking, embeddings, FAISS vector storage, and LLM orchestration via Ollama; and a React.js frontend providing the chat interface, file upload, model selection, and search UX. All inference runs locally — no data is sent to external APIs.

The processing pipeline: document loader → chunker → local embedding model → FAISS vector store → retriever → local LLM via Ollama (for example, deepseek-r1:8b or phi4:14b).
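
The pipeline stages can be sketched end to end with the standard library alone. This is a rough illustration under stated assumptions, not the project's implementation: the bag-of-words `embed` function stands in for a Sentence Transformers model, the brute-force `VectorStore` stands in for a FAISS index, and every function name here is hypothetical. The only external detail relied on is Ollama's local `/api/generate` endpoint on its default port.

```python
import json
import math
import re
import urllib.request
from collections import Counter

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def chunk(text, size=40, overlap=10):
    """Split text into overlapping windows of `size` words."""
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break
    return chunks


def embed(text):
    """Toy stand-in for an embedding model: an L2-normalised bag-of-words vector."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {tok: c / norm for tok, c in counts.items()}


def cosine(a, b):
    """Dot product of two normalised sparse vectors, i.e. their cosine similarity."""
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


class VectorStore:
    """Brute-force stand-in for a FAISS index (assumption: exact search is enough for a demo)."""

    def __init__(self):
        self.items = []

    def add(self, chunks):
        self.items.extend((embed(c), c) for c in chunks)

    def search(self, query, k=3):
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [text for _, text in ranked[:k]]


def build_prompt(question, context_chunks):
    """Assemble retrieved chunks and the question into a single prompt."""
    context = "\n\n".join(context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


def ask_ollama(prompt, model="phi4:14b"):
    """Send the prompt to a locally running Ollama server and return its answer."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

A typical flow would be `store.add(chunk(document_text))`, then `ask_ollama(build_prompt(question, store.search(question)))`. The last call needs an Ollama server running locally with the named model already pulled.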

System requirements

OS: Windows 10/11, macOS, or Linux
CPU: 4+ cores recommended
RAM: Minimum 8GB, 16GB+ recommended for large documents
Storage: 10GB+ free space for application and models
GPU: Required, with at least 4GB of dedicated VRAM for model inference

Key learnings

Local-first RAG is viable end-to-end with Ollama + FAISS — no cloud dependency required for private enterprise data.
Content-type aware chunking and reranking matter more than embedding model choice for mixed document corpora.
Batch processing and progress tracking are essential for usable enterprise workflows on large document sets.
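
To make the point about content-type aware chunking concrete, here is one possible shape for it: pick a chunking strategy from the file extension, so tabular rows keep their column names while prose is packed by paragraph. The function names, the `CHUNKERS` registry, and the size limits are illustrative assumptions rather than the repository's actual code.

```python
import csv
import io
import os


def chunk_csv(text, rows_per_chunk=2):
    """Pair every value with its column name so each chunk is self-describing."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    chunks = []
    for i in range(0, len(body), rows_per_chunk):
        lines = [
            ", ".join(f"{h}={v}" for h, v in zip(header, row))
            for row in body[i:i + rows_per_chunk]
        ]
        chunks.append("\n".join(lines))
    return chunks


def chunk_prose(text, max_chars=500):
    """Pack whole paragraphs into chunks, starting a new chunk when max_chars would be exceeded."""
    chunks, current = [], ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks


# Hypothetical registry: extensions without an entry fall back to prose chunking.
CHUNKERS = {".csv": chunk_csv}


def chunk_document(filename, text):
    """Dispatch to a chunker based on the file extension."""
    ext = os.path.splitext(filename)[1].lower()
    return CHUNKERS.get(ext, chunk_prose)(text)
```

The design choice is that a row such as `Ada,engineer` carries little meaning to an embedding model on its own, while `name=Ada, role=engineer` can be retrieved without its header. Similar per-type chunkers could be registered for Excel or XML input.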

Repository

GitHub repo: github.com/PowerAI-Labs/LocalRAG · Licensed under MIT.
