Local RAG

Local RAG UI — file upload sidebar, local model selector (phi4:14b), and search interface.

Local RAG is a fully local, AI-powered document processing and search system. It is a Retrieval-Augmented Generation (RAG) application that uses local models served by Ollama and a FAISS vector index for document processing and semantic search. It is designed for private enterprise data that must never leave the network.

Stack

Python, FastAPI, React.js, Ollama, FAISS, Sentence Transformers, LangChain-style retrieval, OCR.

Status

Active — work in progress. Functional end-to-end, with ongoing refinement of features and error handling.

Key features

  • Multi-format document processing — PDF, DOCX, CSV, Excel, XML, and images.
  • Semantic search with content-type awareness — intelligent document understanding.
  • Batch processing for large files and many documents at once.
  • Rate limiting and error handling for reliable API performance.
  • Automatic backup and recovery for data integrity.
  • Query expansion and results reranking for better search relevance.
  • Faceted search — filter and navigate results.
  • OCR support for extracting text from images and PDFs.
  • Progress tracking and webhook notifications for long-running tasks.
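
To make the query expansion and reranking feature concrete, here is a minimal sketch in plain Python. It is not the project's implementation: the SYNONYMS table, expand_query, and rerank are hypothetical names, and term-overlap scoring is a simple stand-in for whatever reranker the application actually uses.

```python
from collections import Counter

# Hypothetical synonym table; a real system might derive expansions
# from a thesaurus or from the local LLM instead.
SYNONYMS = {
    "invoice": ["bill", "receipt"],
    "contract": ["agreement"],
}


def expand_query(query: str) -> list[str]:
    """Return the query terms plus any known synonyms, without duplicates."""
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        for synonym in SYNONYMS.get(term, []):
            if synonym not in expanded:
                expanded.append(synonym)
    return expanded


def rerank(candidates: list[str], terms: list[str]) -> list[str]:
    """Order candidates by how often the expanded query terms occur in each."""
    def score(text: str) -> int:
        words = Counter(text.lower().split())
        return sum(words[t] for t in terms)

    # sorted() is stable, so ties keep their original retrieval order.
    return sorted(candidates, key=score, reverse=True)
```

Usage: call expand_query on the user's question, retrieve candidates with the expanded terms, then pass the candidates and the same terms to rerank before showing results.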

Architecture

A two-tier system: a Python FastAPI backend handling document ingestion, chunking, embeddings, FAISS vector storage, and LLM orchestration via Ollama; and a React.js frontend providing the chat interface, file upload, model selection, and search UX. All inference runs locally — no data is sent to external APIs.
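
As an illustration of the "no external APIs" point, the sketch below calls a local Ollama server over its HTTP API using only the Python standard library. It assumes Ollama is listening on its default address (localhost:11434). Whether the backend uses this endpoint or a client library is not stated here, so build_generate_request and generate are hypothetical helper names.

```python
import json
import urllib.request

# Ollama's default local endpoint; nothing here leaves the machine.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for a locally served model."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt to the local Ollama server and return the generated text."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["response"]
```

With a model already pulled, generate("phi4:14b", "Summarize the attached policy.") would return the model's answer as a string. The call fails with a connection error if no Ollama server is running.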

The processing pipeline: document loader → chunker → local embedding model → FAISS vector store → retriever → local LLM via Ollama (for example, deepseek-r1:8b or phi4:14b).
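
The pipeline stages can be sketched end to end with toy components. This is an illustration under loud assumptions, not the application's code: embed is a hashed bag-of-words stand-in for a Sentence Transformers model, VectorStore is a brute-force stand-in for a FAISS index, and every name below is hypothetical.

```python
import math
import zlib

DIM = 64  # toy embedding size; real embedding models use hundreds of dimensions


def chunk(text: str, size: int = 8) -> list[str]:
    """Split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> list[float]:
    """Toy embedding: hashed bag of words, L2-normalized."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        vec[zlib.crc32(word.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class VectorStore:
    """Brute-force inner-product search; stands in for a FAISS index."""

    def __init__(self) -> None:
        self.vectors: list[list[float]] = []
        self.chunks: list[str] = []

    def add(self, chunks: list[str]) -> None:
        for c in chunks:
            self.vectors.append(embed(c))
            self.chunks.append(c)

    def search(self, query: str, k: int = 2) -> list[str]:
        """Return the k stored chunks most similar to the query."""
        q = embed(query)
        scores = [sum(a * b for a, b in zip(q, v)) for v in self.vectors]
        top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
        return [self.chunks[i] for i in top]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble the prompt that would be sent to the local LLM."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"
```

Usage: load a document's text, pass it through chunk, add the chunks to a VectorStore, then call search with the user's question and hand build_prompt's output to the local model. Swapping in a real embedding model and FAISS changes the internals of embed and VectorStore but not the shape of the flow.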

System requirements

  • OS: Windows 10/11, macOS, or Linux
  • CPU: 4+ cores recommended
  • RAM: 8 GB minimum, 16 GB or more recommended for large documents
  • Storage: 10 GB or more of free space for the application and models
  • GPU: required, with at least 4 GB of dedicated GPU memory for model inference

Key learnings

  • Local-first RAG is viable end-to-end with Ollama + FAISS — no cloud dependency required for private enterprise data.
  • Content-type aware chunking and reranking matter more than embedding model choice for mixed document corpora.
  • Batch processing and progress tracking are essential for usable enterprise workflows on large document sets.
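
One way to picture content-type aware chunking is a dispatcher that picks a chunking strategy per file type. The sketch below is a hypothetical example, not the repository's code: tabular rows become self-describing chunks, while prose is packed paragraph by paragraph. The function names and the 500-character default are assumptions.

```python
import csv
import io
from pathlib import PurePath


def chunk_csv(text: str) -> list[str]:
    """One chunk per data row, with column names attached so each row stands alone."""
    reader = csv.DictReader(io.StringIO(text))
    return ["; ".join(f"{k}: {v}" for k, v in row.items()) for row in reader]


def chunk_prose(text: str, max_chars: int = 500) -> list[str]:
    """Pack whole paragraphs into chunks of roughly max_chars characters.

    A single paragraph longer than max_chars is kept whole rather than split.
    """
    chunks: list[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks


# Map file extensions to chunkers; anything unlisted falls back to prose chunking.
CHUNKERS = {".csv": chunk_csv}


def chunk_document(filename: str, text: str) -> list[str]:
    """Choose a chunking strategy from the file extension."""
    suffix = PurePath(filename).suffix.lower()
    return CHUNKERS.get(suffix, chunk_prose)(text)
```

Keeping the column names inside each row chunk means a retrieved row still makes sense to the LLM without the rest of the table, which is the kind of effect the chunking learning above refers to.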

Repository

GitHub repo: github.com/PowerAI-Labs/LocalRAG · Licensed under MIT.
