A fully private, offline Retrieval-Augmented Generation (RAG) application built with Streamlit, LangChain, PGVector, and Ollama. Upload PDF documents, ask questions in natural language, and receive cited, grounded answers — all without sending a single byte to the cloud.
## Features

| Feature | Details |
|---|---|
| Streaming responses | LLM output streams token-by-token with a live cursor (sketched after this table); no waiting for the full response |
| Conversation memory | Sliding window of past messages sent to the LLM for coherent follow-up questions |
| Document ingestion | Upload PDFs, background indexing with auto-refreshing real-time job status panel |
| Disk-based file storage | PDF files stored on disk (FILE_STORAGE_FOLDER), not as DB blobs |
| Cited answers | Every response includes file name, page number, and excerpt from the source |
| No-evidence guard | Configurable similarity threshold prevents hallucination when content is irrelevant |
| Multi-query retrieval | Generates multiple query variants for better recall (toggleable) |
| Chat management | Create, rename, search, and sort chats; cascade-delete cleans all data |
| Document management | Re-index or delete individual files with full vector cleanup |
| Retrieval settings | Per-chat sliders: Top-K chunks, citation limit, multi-query toggle |
| Export | Download any conversation as a Markdown file |
| DB connection pooling | Configurable pool_size, max_overflow, pool_timeout, pool_recycle |
| Startup health checks | Validates DB connectivity, Ollama availability, and storage folder on launch |
| Structured JSON logging | Latency metrics around retrieval, embedding, and generation |
| Accessibility | WCAG AA contrast ratios, role="log" live region, keyboard focus rings, aria-hidden on decorative elements |
| Mobile-friendly | Responsive CSS with @media breakpoints for narrow viewports |
| Docker Compose | One-command stack: pgvector, Ollama (auto-pulls models), and the app |
| Test suite | pytest + pytest-asyncio covering AIService and FileService |
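As a taste of the streaming behaviour described above, here is a minimal sketch of token-by-token output with `ChatOllama.astream()` driving a Streamlit placeholder. It is illustrative only: the model name, prompt, and cursor glyph are assumptions, not the app's actual AIService code.

```python
# Minimal sketch of token-by-token streaming; illustrative, not the
# app's actual AIService code.
import asyncio

import streamlit as st
from langchain_ollama import ChatOllama

llm = ChatOllama(model="llama3.2")  # the real app reads the model name from .env

async def stream_answer(prompt: str) -> str:
    placeholder = st.empty()              # live region updated as tokens arrive
    answer = ""
    async for chunk in llm.astream(prompt):
        answer += chunk.content           # each AIMessageChunk carries token text
        placeholder.markdown(answer + "▌")  # live cursor while streaming
    placeholder.markdown(answer)          # final render without the cursor
    return answer

answer = asyncio.run(stream_answer("What does chapter 2 say about pricing?"))
```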
## Models

| Purpose | Default model |
|---|---|
| LLM | llama3.2 |
| Embeddings | nomic-embed-text |
Change both in `.env` — no code changes needed.
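Under the hood these are ordinary `langchain-ollama` model classes. A minimal sketch of how the two defaults might be instantiated (the actual wiring lives in the DI container and reads the names from `.env`):

```python
# Sketch only: instantiating the two default models with langchain-ollama.
from langchain_ollama import ChatOllama, OllamaEmbeddings

llm = ChatOllama(model="llama3.2", base_url="http://localhost:11434")
embeddings = OllamaEmbeddings(model="nomic-embed-text", base_url="http://localhost:11434")
```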
## Prerequisites

- Python 3.12+
- Ollama running locally (http://localhost:11434)
- PostgreSQL with the pgvector extension enabled — or use Docker Compose (see below)
- pip/venv
Pull the required models:

```bash
ollama pull llama3.2
ollama pull nomic-embed-text
```

Enable the pgvector extension in your database:

```sql
CREATE EXTENSION IF NOT EXISTS vector;
```

## Installation

```bash
# 1. Clone
git clone https://github.com/dbunt1tled/local-rag.git
cd local-rag

# 2. Create virtual environment
python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate

# 3. Install dependencies
pip install -r requirements.txt
```

## Configuration

Copy the example env file and edit it:
```bash
cp .env.example .env
```

| Variable | Description | Default |
|---|---|---|
| `PG_DSN` | PostgreSQL connection string | `postgresql+psycopg://user:password@localhost:5432/db` |
| `COLLECTION_NAME` | PGVector collection name | `local-rag` |
| `LLM_MODEL` | Ollama chat model | `llama3.2` |
| `TEXT_EMBEDDING_MODEL` | Ollama embedding model | `nomic-embed-text` |
| `OLLAMA_HOST` | Ollama base URL | `http://localhost:11434` |
| `FILE_STORAGE_FOLDER` | Directory where uploaded PDFs are saved | `./data/files` |
| `SIMILARITY_THRESHOLD` | Minimum relevance score (0–1) to include a chunk | `0.30` |
| `HISTORY_WINDOW` | Number of past message pairs sent to the LLM | `6` |
| `DB_POOL_SIZE` | SQLAlchemy connection pool size | `5` |
| `DB_MAX_OVERFLOW` | Max connections above pool size | `10` |
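These variables are loaded by the Pydantic settings in `internal/config/setting.py`. A rough sketch of what that class might look like, with field names and defaults assumed from the table above (the real file may differ):

```python
# Hypothetical sketch of internal/config/setting.py using pydantic-settings;
# field names and defaults mirror the table above, the actual file may differ.
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    # Env var matching is case-insensitive, so PG_DSN populates pg_dsn.
    model_config = SettingsConfigDict(env_file=".env")

    pg_dsn: str = "postgresql+psycopg://user:password@localhost:5432/db"
    collection_name: str = "local-rag"
    llm_model: str = "llama3.2"
    text_embedding_model: str = "nomic-embed-text"
    ollama_host: str = "http://localhost:11434"
    file_storage_folder: str = "./data/files"
    similarity_threshold: float = 0.30
    history_window: int = 6
    db_pool_size: int = 5
    db_max_overflow: int = 10

settings = Settings()  # loads and validates .env on import
```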
Run Alembic migrations before first launch:

```bash
alembic upgrade head
```

## Docker Compose

The easiest way to run the full stack with no manual setup:
```bash
cp .env.example .env   # review and adjust if needed
docker compose up -d
```

This starts:
- postgres — pgvector-enabled PostgreSQL 16
- ollama — pulls `llama3.2` and `nomic-embed-text` automatically on first boot
- app — runs `alembic upgrade head`, then `streamlit run Home.py`
Open http://localhost:8501 in your browser.
## Run locally

```bash
streamlit run Home.py
```

Open http://localhost:8501 in your browser.
## Project structure

```
local-rag/
├── Home.py # Streamlit entrypoint + health checks
├── Dockerfile
├── docker-compose.yml
├── pytest.ini
├── tests/
│ ├── test_ai_service.py # AIService unit tests
│ └── test_file_service.py # FileService unit tests
├── app/
│ ├── ai/ # LLM streaming query, retrieval, citations
│ ├── chat/ # Chat CRUD, rename, cascade delete
│ ├── file/ # File upload, background indexing, vector management
│ │ └── model/index_job.py # Persistent indexing job records
│ └── message/ # Message storage and streaming rendering
├── internal/
│ ├── config/
│ │ ├── setting.py # Pydantic settings from .env
│ │ └── logging_config.py # Structured JSON logging + timed() helper
│ ├── di/container.py # dependency-injector wiring
│ └── domain/ # Base repository and entity classes
├── migration/ # Alembic migrations
│ └── versions/
├── data/files/ # Uploaded PDFs (created at runtime, git-ignored)
├── alembic.ini
├── pyproject.toml
└── requirements.txt
```
## Architecture

```
User → Streamlit UI
  │
  ├── FileService ──► PyPDFLoader → TextSplitter → PGVector (embeddings)
  │     │                 └── PDF written to FILE_STORAGE_FOLDER on disk
  │     └── IndexJobRepository (tracks background jobs in DB)
  │
  └── AIService ──► similarity_search_with_relevance_scores
        │             (threshold filter → no-hallucination guard)
        ├── MultiQueryRetriever (optional)
        ├── MessagesPlaceholder (conversation memory, sliding window)
        └── ChatOllama.astream() → token chunks → live Streamlit UI
              └── Citations appended at end
```
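To make the ingestion branch concrete, here is a hedged sketch assuming `langchain-postgres` for the vector store and the hypothetical `settings` object from the configuration sketch above; chunk sizes are illustrative, and the real FileService additionally saves the PDF to disk and records an IndexJob row:

```python
# Illustrative sketch of the ingestion path; not the actual FileService code.
from langchain_community.document_loaders import PyPDFLoader
from langchain_ollama import OllamaEmbeddings
from langchain_postgres import PGVector
from langchain_text_splitters import RecursiveCharacterTextSplitter

store = PGVector(
    embeddings=OllamaEmbeddings(model=settings.text_embedding_model),
    collection_name=settings.collection_name,
    connection=settings.pg_dsn,
)

def index_pdf(path: str) -> None:
    pages = PyPDFLoader(path).load()          # one Document per PDF page
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    chunks = splitter.split_documents(pages)  # page metadata survives splitting
    store.add_documents(chunks)               # embeds and writes to pgvector
```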
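And a matching sketch of the query branch, reusing `store` from the previous snippet: the optional multi-query expansion and the similarity-threshold guard. Again illustrative, not the actual AIService code:

```python
# Sketch of retrieval with the optional multi-query branch and the
# similarity-threshold ("no-evidence") guard.
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain_ollama import ChatOllama

def retrieve(question: str, top_k: int, use_multi_query: bool) -> list:
    if use_multi_query:
        # Have the LLM generate several rewordings of the question for recall.
        retriever = MultiQueryRetriever.from_llm(
            retriever=store.as_retriever(search_kwargs={"k": top_k}),
            llm=ChatOllama(model=settings.llm_model),
        )
        return retriever.invoke(question)

    # Plain similarity search; keep only chunks at or above SIMILARITY_THRESHOLD.
    scored = store.similarity_search_with_relevance_scores(question, k=top_k)
    docs = [doc for doc, score in scored if score >= settings.similarity_threshold]
    # An empty list means no evidence: the app declines to answer
    # rather than hallucinate.
    return docs
```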
## Testing

```bash
pytest
```

## Contributing

Pull requests are welcome. Please open an issue first to discuss significant changes.
## License

MIT