Project Type: AI Engineering

  • Local RAG

    Local RAG

    Local RAG UI — file upload sidebar, local model selector (phi4:14b), and search interface.

    Local RAG is a fully local, AI-powered document processing and search system. It is a Retrieval-Augmented Generation (RAG) application that uses local models served by Ollama and a FAISS vector index for efficient semantic search. It is designed for private enterprise data that never leaves the network.

    Stack

    Python, FastAPI, React.js, Ollama, FAISS, Sentence Transformers, LangChain-style retrieval, OCR.

    Status

    Active — work in progress. Functional end-to-end, with ongoing refinement of features and error handling.

    Key features

    • Multi-format document processing — PDF, DOCX, CSV, Excel, XML, and images.
    • Semantic search with content-type awareness: documents are chunked according to their format rather than with one generic splitter.
    • Batch processing for large files and many documents at once.
    • Rate limiting and error handling for reliable API performance.
    • Automatic backup and recovery for data integrity.
    • Query expansion and results reranking for better search relevance.
    • Faceted search — filter and navigate results.
    • OCR support for extracting text from images and PDFs.
    • Progress tracking and webhook notifications for long-running tasks.
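    The write-up does not say how rate limiting is implemented. A token bucket is one common approach, and the sketch below is a hypothetical, framework-agnostic version of it (`TokenBucket` and its parameters are illustrative names, not the project's API). The clock is injectable so the behaviour can be checked without sleeping. In a FastAPI backend it would typically be called from a dependency or middleware, with one bucket per client.

```python
import threading
import time
from typing import Callable


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, capacity: int, rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = float(capacity)
        self.rate = rate
        self.clock = clock
        self.tokens = float(capacity)  # start full so an initial burst is allowed
        self.last = clock()
        self._lock = threading.Lock()  # plain `def` FastAPI endpoints run in a thread pool

    def allow(self) -> bool:
        """Consume one token if available; False means reject the request (e.g. HTTP 429)."""
        with self._lock:
            now = self.clock()
            # Refill in proportion to elapsed time, capped at capacity.
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1.0:
                self.tokens -= 1.0
                return True
            return False


# Usage with a fake clock: two requests pass, the third is throttled until time advances.
fake_now = [0.0]
bucket = TokenBucket(capacity=2, rate=1.0, clock=lambda: fake_now[0])
print([bucket.allow() for _ in range(3)])  # [True, True, False]
fake_now[0] = 1.0
print(bucket.allow())  # True
```

    A token bucket tolerates short bursts (such as a batch upload firing several requests at once) while still bounding the sustained request rate, which a fixed per-second counter does less gracefully.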

    Architecture

    A two-tier system: a Python FastAPI backend handling document ingestion, chunking, embeddings, FAISS vector storage, and LLM orchestration via Ollama; and a React.js frontend providing the chat interface, file upload, model selection, and search UX. All inference runs locally — no data is sent to external APIs.
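    The local LLM call can be sketched with nothing but the standard library. This assumes the backend talks to Ollama over its REST API on the default port 11434 (the project may instead use the `ollama` Python client); `build_generate_request` and `generate` are hypothetical helper names. The only network destination is localhost, which is what keeps inference on the machine.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_generate_request(model: str, prompt: str,
                           url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming /api/generate request for a locally running Ollama server."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt to the local model and return its completion text."""
    req = build_generate_request(model, prompt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # With "stream": false, Ollama replies with one JSON object; "response" holds the text.
        return json.loads(resp.read().decode("utf-8"))["response"]
```

    With Ollama running and the model pulled (for example `ollama pull phi4:14b`), `generate("phi4:14b", prompt)` returns the answer text. Streaming is turned off here for simplicity; a chat UI would more likely stream tokens to the React frontend as they arrive.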

    The processing pipeline: document loader → chunker → local embedding model → FAISS vector store → retriever → local LLM via Ollama (for example, deepseek-r1:8b or phi4:14b).
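    The pipeline above can be sketched end to end in plain Python. This is a minimal, dependency-free illustration, not the project's code: `embed` is a toy hashed bag-of-words stand-in for a Sentence Transformers model, and `FlatIndex` performs the same exact inner-product search as `faiss.IndexFlatIP`, only much more slowly. The names `chunk`, `embed`, `FlatIndex`, and `build_prompt`, the chunk sizes, and the prompt template are all hypothetical.

```python
import hashlib
import math
import re

DIM = 1024  # hypothetical embedding width for the toy embedder


def chunk(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` words, each overlapping the previous by `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):  # this window reached the end of the text
            break
    return chunks


def embed(text: str) -> list[float]:
    """Toy stand-in for a sentence-embedding model: hashed bag of words, L2-normalised."""
    vec = [0.0] * DIM
    for token in re.findall(r"\w+", text.lower()):
        # md5 gives a stable bucket across runs (Python's built-in hash() is salted per process).
        vec[int(hashlib.md5(token.encode()).hexdigest(), 16) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class FlatIndex:
    """Exact inner-product search over normalised vectors, as faiss.IndexFlatIP does."""

    def __init__(self) -> None:
        self.vectors: list[list[float]] = []
        self.texts: list[str] = []

    def add(self, texts: list[str]) -> None:
        for text in texts:
            self.vectors.append(embed(text))
            self.texts.append(text)

    def search(self, query: str, k: int = 3) -> list[tuple[float, str]]:
        q = embed(query)
        scored = [(sum(a * b for a, b in zip(q, v)), t)
                  for v, t in zip(self.vectors, self.texts)]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


def build_prompt(question: str, hits: list[tuple[float, str]]) -> str:
    """Put the retrieved chunks into a grounded prompt for the local LLM."""
    context = "\n\n".join(text for _, text in hits)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"


# Usage: index two small documents and build the prompt that would be sent to Ollama.
index = FlatIndex()
index.add(chunk("Invoices are due within 30 days of receipt. Late invoices incur a 2% fee.",
                size=8, overlap=2))
index.add(chunk("The office is closed on public holidays."))
hits = index.search("When are invoices due?", k=1)
print(build_prompt("When are invoices due?", hits))
```

    In the real stack, `embed` would be replaced by `SentenceTransformer(...).encode(texts, normalize_embeddings=True)` and `FlatIndex` by a `faiss.IndexFlatIP(dim)` with its `add` and `search` methods. Normalising the vectors first is what makes the inner product equal to cosine similarity.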

    System requirements

    • OS: Windows 10/11, macOS, or Linux
    • CPU: 4+ cores recommended
    • RAM: 8 GB minimum; 16 GB+ recommended for large documents
    • Storage: 10 GB+ free space for the application and models
    • GPU: Required, with at least 4 GB of dedicated GPU memory (VRAM) for model inference
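    The requirements above can be turned into a quick preflight check. This is a hypothetical helper, not part of the repository: the thresholds mirror the list, the logical CPU count and free disk space come from the standard library, and RAM and GPU memory are passed in because the standard library has no portable way to read them (they could come from `psutil.virtual_memory().total` or `nvidia-smi`, for example).

```python
import os
import shutil

# Thresholds copied from the system requirements list.
MIN_CPU_CORES = 4      # recommended
MIN_RAM_GB = 8         # minimum; 16+ recommended for large documents
MIN_FREE_DISK_GB = 10  # application plus models
MIN_GPU_VRAM_GB = 4    # dedicated GPU memory for inference


def check_requirements(cpu_cores: int, ram_gb: float,
                       free_disk_gb: float, gpu_vram_gb: float) -> list[str]:
    """Return a list of unmet requirements; an empty list means all minimums are met."""
    problems = []
    if cpu_cores < MIN_CPU_CORES:
        problems.append(f"CPU: {cpu_cores} cores, {MIN_CPU_CORES}+ recommended")
    if ram_gb < MIN_RAM_GB:
        problems.append(f"RAM: {ram_gb:.1f} GB, at least {MIN_RAM_GB} GB required")
    if free_disk_gb < MIN_FREE_DISK_GB:
        problems.append(f"Storage: {free_disk_gb:.1f} GB free, {MIN_FREE_DISK_GB}+ GB required")
    if gpu_vram_gb < MIN_GPU_VRAM_GB:
        problems.append(f"GPU: {gpu_vram_gb:.1f} GB VRAM, at least {MIN_GPU_VRAM_GB} GB required")
    return problems


def local_cpu_and_disk(path: str = ".") -> tuple[int, float]:
    """Read the logical CPU count and free disk space (GB) at `path` via the standard library."""
    cores = os.cpu_count() or 1  # os.cpu_count() may return None
    free_gb = shutil.disk_usage(path).free / 1024**3
    return cores, free_gb


# Usage: the RAM and VRAM figures below are placeholders supplied by the caller.
cores, free_gb = local_cpu_and_disk()
for problem in check_requirements(cores, ram_gb=16, free_disk_gb=free_gb, gpu_vram_gb=6):
    print(problem)
```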

    Key learnings

    • Local-first RAG is viable end-to-end with Ollama + FAISS — no cloud dependency required for private enterprise data.
    • Content-type aware chunking and reranking matter more than embedding model choice for mixed document corpora.
    • Batch processing and progress tracking are essential for usable enterprise workflows on large document sets.
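    The point about content-type aware chunking can be illustrated with a small dispatcher. This is a hypothetical sketch, not the repository's chunker: CSV files are split into row groups that repeat the header, so every chunk still says what its columns mean, while prose is packed paragraph by paragraph up to a character budget. The function names, the 50-row and 800-character defaults, and the extension-based routing are all assumptions.

```python
import csv
import io
from pathlib import Path


def chunk_csv(text: str, rows_per_chunk: int = 50) -> list[str]:
    """Group CSV rows, repeating the header in every chunk so each chunk is self-describing."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    chunks = []
    for start in range(0, len(body), rows_per_chunk):
        out = io.StringIO()
        csv.writer(out, lineterminator="\n").writerows([header] + body[start:start + rows_per_chunk])
        chunks.append(out.getvalue().strip())
    return chunks


def chunk_prose(text: str, max_chars: int = 800) -> list[str]:
    """Pack whole paragraphs into chunks of about max_chars; an oversized paragraph stands alone."""
    chunks, current = [], ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)  # budget exceeded: close the current chunk
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks


def chunk_by_type(filename: str, text: str) -> list[str]:
    """Route to a chunker based on the file extension (hypothetical dispatch rule)."""
    if Path(filename).suffix.lower() == ".csv":
        return chunk_csv(text)
    return chunk_prose(text)
```

    Repeating the header costs a few tokens per chunk, but it keeps a retrieved row group interpretable on its own, which a fixed-size word window over the same CSV text would not guarantee.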

    Repository

    GitHub repo: github.com/PowerAI-Labs/LocalRAG · Licensed under MIT.