🛰️ Beacon

Chat with your own files. 100% local, nothing leaves your machine.

Beacon indexes the folders you choose, finds the passages that actually answer your question, and answers using a local LLM through Ollama. Context is optimized by TokenGate so you get sharp answers instead of token dumps.

What it does

Your laptop is full of answers: resumes, contracts, notes, PDFs, code docs. Finding "which file said that?" normally means opening a dozen windows. Cloud tools solve this by uploading your files to someone else's servers.

Beacon keeps everything on your device.

🔒 Private by design. Files, embeddings, the vector DB, and chat history never leave your machine. The only network calls are to your own local Ollama and a one-time model download from Hugging Face.
🎯 Grounded answers. Every reply cites the exact files it used, with inline previews you can click to open.
🧠 Smart context, not token dumps. TokenGate reranks, deduplicates, compresses, and budgets the retrieved chunks so the LLM sees only the most relevant content.
🔍 Full transparency. The built-in TokenGate Insights view shows, per question, what was retrieved, what was kept vs. dropped, tokens saved, and the exact prompt sent to the LLM.
⚡ Agentic mode. For models that support tool calling (e.g. gemma4:e4b), Beacon lets the LLM search files on demand instead of doing a single bulk retrieval.
📊 A/B comparison built-in. Toggle TokenGate on/off per chat to compare against a best-practice baseline RAG (rerank, top-N selection, LangChain stuffing).
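
To make the deduplicate and budget steps more concrete, here is a minimal Python sketch. It is not TokenGate's implementation or API: `dedupe_and_budget` and its whitespace token counter are hypothetical stand-ins, and the input is assumed to be already sorted by relevance.

```python
from typing import Callable, List, Sequence


def dedupe_and_budget(
    ranked_chunks: Sequence[str],
    max_tokens: int,
    count_tokens: Callable[[str], int] = lambda s: len(s.split()),
) -> List[str]:
    """Drop near-identical chunks, then keep chunks until the budget is spent.

    Hypothetical helper for illustration only. `ranked_chunks` must already be
    ordered from most to least relevant.
    """
    seen = set()
    kept: List[str] = []
    used = 0
    for chunk in ranked_chunks:
        # Normalize case and whitespace so trivially different copies collide.
        key = " ".join(chunk.lower().split())
        if key in seen:
            continue
        seen.add(key)
        cost = count_tokens(chunk)
        if used + cost > max_tokens:
            # Too large for the remaining budget; a smaller later chunk may fit.
            continue
        kept.append(chunk)
        used += cost
    return kept
```

A real pipeline would count tokens with the model's tokenizer and would also compress chunks; this sketch only shows why fewer, non-duplicated chunks reach the LLM.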

How it works

  Your folders
      |  scan (incremental, ignores node_modules/site-packages/...)
      v
  Extract text    TXT / MD / PDF / DOCX / images (OCR optional)
      |  chunk (token-aware, with overlap)
      v
  BGE-M3 embed --> LanceDB  (local vector store)
                        ^
   your question -------+  retrieve top-50 within an optional folder scope
        |
        v
   Relevance gate   cross-encoder check; chitchat is answered directly without file context
        |
        v
   TokenGate.optimize()  OR  Best-practice baseline (rerank, top-N, LangChain)
        |
        v
   Ollama (local LLM)  -->  streamed answer, cited files, full audit
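
The chunking step in the diagram can be pictured with a small Python sketch. It assumes whitespace-separated words as a stand-in for real tokenizer tokens, and the `chunk_size` and `overlap` values are illustrative, not Beacon's actual settings.

```python
from typing import List


def chunk_with_overlap(text: str, chunk_size: int = 200, overlap: int = 40) -> List[str]:
    """Split text into chunks of at most `chunk_size` tokens.

    Consecutive chunks share `overlap` tokens so a passage that straddles a
    boundary still appears whole in at least one chunk. Tokens here are
    whitespace-separated words (an assumption for illustration).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    tokens = text.split()
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # the last chunk already reaches the end of the text
    return chunks
```

Each chunk would then be embedded and written to the vector store along with its source file path.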

Layer	Technology
Backend API	Python 3.12, FastAPI, uv
Retrieval	LanceDB (vectors) + BGE-M3 embeddings
Reranking	BGE-Reranker-v2-m3 (cross-encoder)
Context optimization	`tokengate` library
Baseline RAG	LangChain (LCEL, ChatOllama)
Local LLM	Ollama
Metadata	SQLite
Frontend	Vite, React, TypeScript, Framer Motion
Desktop (planned)	Tauri

Prerequisites

Requirement	Notes
Python 3.12	Backend pinned to `>=3.12,<3.13` for ML wheel compatibility. Install from python.org or via `pyenv`.
uv	Python env and dependency manager. `pip install uv` or see uv docs.
Node.js >= 20	For the Vite frontend. nodejs.org
Ollama	Local LLM runtime. Install from ollama.com, then pull a model.
NVIDIA GPU + CUDA 12.8	Optional. Speeds up embedding and reranking; CPU fallback is automatic.

First run downloads models. BGE-M3 (embedder, ~600 MB) and BGE-Reranker-v2-m3 (reranker, ~1.1 GB) are fetched from Hugging Face the first time you index or chat. They are cached locally afterward.

Setup and Run

git clone https://github.com/Mario-Vishal/beacon.git
cd beacon

1. Start Ollama and pull a model

ollama serve                # or just have the Ollama desktop app running; keep it running in its own terminal
ollama pull gemma4:e4b      # recommended default (supports tool-calling + thinking, 128K ctx)
# or: ollama pull llama3.2  # lighter alternative

2. Start the backend

cd backend
uv sync                                          # create .venv and install all Python deps
uv run uvicorn beacon.main:app --port 8000       # API on http://localhost:8000

Verify it's up by opening http://localhost:8000/health. It returns the current GPU/CPU mode and Ollama status.
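
If you would rather check from a script, a sketch along these lines should work. It assumes the endpoint returns JSON (the FastAPI default); the field names in the response are not documented here, so the helper just returns whatever it receives. `fetch_health` is a hypothetical helper, not part of Beacon.

```python
import json
import urllib.error
import urllib.request
from typing import Any, Optional


def fetch_health(base_url: str = "http://localhost:8000", timeout: float = 3.0) -> Optional[Any]:
    """Return the parsed /health response, or None if the backend is unreachable.

    Assumes a JSON body; invalid URLs, connection errors, timeouts, and
    non-JSON responses all map to None.
    """
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        return None
```

Calling `fetch_health()` while the backend is running should return the status payload; `None` means the API is not reachable on that port.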

3. Start the frontend

Open a new terminal:

cd frontend
npm install
npm run dev        # UI on http://localhost:5173

Windows / PowerShell: if npm is blocked by execution policy, use npm.cmd run dev or run Set-ExecutionPolicy -Scope CurrentUser RemoteSigned once.

4. Add a folder and start chatting

Open http://localhost:5173
Click Add folder and pick a folder to index (e.g. ~/Documents)
Wait for indexing to finish. Progress appears in the file explorer.
Ask a question: "Which of my resumes mentions AWS experience?"

After pulling new code: restart the backend (Ctrl+C then uv run uvicorn ...) so the running process picks up the changes.

Settings

Open the Settings overlay (gear icon) to adjust:

Setting	Options	Notes
Ollama model	any model name	default `gemma4:e4b`
GPU mode	auto / cpu_only / force_gpu	`auto` runs the embedder on CPU and the LLM on GPU, a split tuned for GPUs with about 12 GB of VRAM
Retrieval mode	auto / gate / always / agentic	`gate` = adaptive (chitchat answered directly); `agentic` = LLM searches files with tools
Strategy	speed / balanced / quality / max_compression	TokenGate optimization preset
Max prompt tokens	integer	overrides the dynamic budget if set
File types	list of extensions	which files get indexed
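
As a rough illustration of the `gate` retrieval mode, the sketch below decides whether to attach file context based on cross-encoder relevance scores. The function name, the 0.5 threshold, and the score scale are assumptions for illustration; Beacon's actual gate logic may differ.

```python
from typing import Sequence


def should_use_file_context(scores: Sequence[float], threshold: float = 0.5) -> bool:
    """Return True when at least one retrieved chunk looks relevant enough.

    `scores` are relevance scores for the retrieved chunks (higher is better).
    With no chunks, or only low scores, the question is treated as chitchat
    and answered without file context. The threshold is a made-up value.
    """
    return bool(scores) and max(scores) >= threshold
```

For example, scores of `[0.1, 0.8]` would pass the gate, while `[0.1, 0.2]` would not.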

Settings persist in a local SQLite database under your OS app-data directory. Override the location with BEACON_DATA_DIR.
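
If you script against Beacon's data, the override can be mirrored with a few lines of Python. This is a sketch, not Beacon's code: `resolve_data_dir` is a hypothetical helper, and the `default` argument is a placeholder you supply, because the exact per-OS location is not specified here.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_data_dir(default: Path, env: Optional[Mapping[str, str]] = None) -> Path:
    """Return BEACON_DATA_DIR if it is set and non-empty, else `default`.

    `env` defaults to the process environment; passing a dict makes the
    function easy to test.
    """
    env = os.environ if env is None else env
    override = env.get("BEACON_DATA_DIR", "").strip()
    if override:
        return Path(override).expanduser()
    return default
```

Set the variable in the shell that launches the backend so the running process sees it.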

Features

TokenGate on/off toggle (A/B comparison)

Every chat session has a TokenGate toggle. Turn it off to run the best-practice baseline RAG instead: cross-encoder rerank, top-20 selection, then LangChain stuffing. The Insights view shows both paths so you can directly compare token usage and answer quality.
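
The baseline path can be sketched in plain Python as rerank, keep the top N, then concatenate ("stuff") the survivors into one context string. The names below are hypothetical, and `overlap_score` is a toy word-overlap scorer standing in for the cross-encoder; the real baseline uses BGE-Reranker-v2-m3 and LangChain.

```python
from typing import Callable, Sequence


def overlap_score(question: str, chunk: str) -> float:
    """Toy relevance score: number of distinct words shared with the question."""
    q_words = set(question.lower().split())
    return float(len(q_words & set(chunk.lower().split())))


def baseline_context(
    question: str,
    chunks: Sequence[str],
    score: Callable[[str, str], float] = overlap_score,
    top_n: int = 20,
) -> str:
    """Rerank chunks by score, keep the best `top_n`, and join them."""
    ranked = sorted(chunks, key=lambda c: score(question, c), reverse=True)
    return "\n\n".join(ranked[:top_n])
```

Unlike the TokenGate path, nothing here deduplicates, compresses, or budgets the text, which is what makes it a useful point of comparison.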

Agentic mode

When the model supports tool calling (gemma4:e4b does), retrieval_mode=agentic lets the LLM invoke search_files, list_directory, and read_file tools within the indexed boundary. TokenGate optimizes each tool result before it enters the context window.
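
Keeping tools "within the indexed boundary" typically comes down to a path containment check before any file is listed or read. The sketch below shows one way to do that; `is_within_boundary` is a hypothetical helper, not Beacon's actual code.

```python
from pathlib import Path
from typing import Iterable, Union

PathLike = Union[str, Path]


def is_within_boundary(candidate: PathLike, indexed_roots: Iterable[PathLike]) -> bool:
    """Return True if `candidate` is an indexed root or lies inside one.

    Paths are resolved first so that `..` segments and symlinks cannot be
    used to step outside the indexed folders. Comparison is by path
    components, so `/srv/docs_private` is not treated as inside `/srv/docs`.
    """
    resolved = Path(candidate).resolve()
    for root in indexed_roots:
        root_resolved = Path(root).resolve()
        if resolved == root_resolved or root_resolved in resolved.parents:
            return True
    return False
```

A tool handler would call this on every path the model requests and refuse anything that returns `False`.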

TokenGate Insights

The dock's TokenGate Insights tab shows, per message:

Tokens in, tokens out, and % saved
Per-stage funnel with blocks in/out at each pipeline stage
Mode badge (TokenGate vs. LangChain Baseline)
Per-question LLM spend in prompt and output tokens
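
The "% saved" figure is presumably the relative reduction between the two token counts. A small Python sketch, assuming "tokens in" means the retrieved context before optimization and "tokens out" means what remains afterward (`percent_saved` is a hypothetical name):

```python
def percent_saved(tokens_in: int, tokens_out: int) -> float:
    """Relative token reduction as a percentage of the input size."""
    if tokens_in <= 0:
        return 0.0  # nothing was retrieved, so nothing could be saved
    return 100.0 * (tokens_in - tokens_out) / tokens_in
```

For instance, trimming 8,000 retrieved tokens down to 2,000 would be reported as 75% saved.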

Resumable sessions

Chat history is saved. Click the History (clock) icon to resume a previous session. Citations, toggles, and the full audit are all restored.

Transparent citations

Every answer shows the files it used as clickable chips. Click a chip to preview the file (images render inline; PDFs and text files show their extracted text) and jump to it in the file explorer.

Privacy

Beacon is local-first:

Indexed file contents, embeddings, LanceDB/SQLite databases, and content-bearing logs are in .gitignore and never committed or transmitted.
The only network calls are to Hugging Face for the first-run model download and to your own local Ollama.
No telemetry, no analytics, no cloud sync.

Development

# backend tests
cd backend && uv run pytest              # 140 tests
uv run ruff check src tests             # lint
uv run mypy src                         # type-check (strict)

# frontend build (run from the repo root)
cd frontend && npm run build            # type-check + production build

License

MIT
