Mario-Vishal/beacon


πŸ›°οΈ Beacon

Chat with your own files. 100% local, nothing leaves your machine.

Beacon indexes the folders you choose, finds the passages that actually answer your question, and answers using a local LLM through Ollama. Context is optimized by TokenGate so you get sharp answers instead of token dumps.


What it does

Your laptop is full of answers: resumes, contracts, notes, PDFs, code docs. Finding "which file said that?" normally means opening a dozen windows. Cloud tools solve this by uploading your files to someone else's servers.

Beacon keeps everything on your device.

  • 🔒 Private by design. Files, embeddings, the vector DB, and chat history never leave your machine. The only network calls are to your own local Ollama and a one-time model download from Hugging Face.
  • 🎯 Grounded answers. Every reply cites the exact files it used, with inline previews you can click to open.
  • 🧠 Smart context, not token dumps. TokenGate reranks, deduplicates, compresses, and budgets the retrieved chunks so the LLM sees only the most relevant content.
  • 🔍 Full transparency. The built-in TokenGate Insights view shows, per question, what was retrieved, what was kept vs. dropped, tokens saved, and the exact prompt sent to the LLM.
  • ⚑ Agentic mode. For models that support tool calling (e.g. gemma4:e4b), Beacon lets the LLM search files on demand instead of doing a single bulk retrieval.
  • 📊 A/B comparison built-in. Toggle TokenGate on/off per chat to compare against a best-practice baseline RAG (rerank, top-N selection, LangChain stuffing).
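To make the "deduplicate and budget" idea concrete, here is a minimal sketch. It is not TokenGate's API: the `Chunk` type, `budget_chunks` function, and the whitespace-based token counter are hypothetical stand-ins. It drops chunks whose normalized text repeats, ranks the rest by relevance score, and greedily keeps chunks until a token budget is used up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    # Hypothetical retrieved chunk: source file, text, and a relevance score.
    path: str
    text: str
    score: float


def count_tokens(text: str) -> int:
    # Assumption: a whitespace split approximates a real tokenizer.
    return len(text.split())


def budget_chunks(chunks: list[Chunk], max_tokens: int) -> list[Chunk]:
    """Deduplicate, rank by score, and greedily fill a token budget."""
    seen: set[str] = set()
    unique: list[Chunk] = []
    for chunk in chunks:
        key = " ".join(chunk.text.lower().split())  # normalize case and whitespace
        if key not in seen:
            seen.add(key)
            unique.append(chunk)

    kept: list[Chunk] = []
    used = 0
    for chunk in sorted(unique, key=lambda c: c.score, reverse=True):
        cost = count_tokens(chunk.text)
        if used + cost <= max_tokens:  # skip chunks that would overflow the budget
            kept.append(chunk)
            used += cost
    return kept
```

With a budget of 6 tokens, two near-identical "AWS experience" chunks collapse into one, and a 6-token low-score chunk is dropped because it no longer fits. A real optimizer would also compress chunks rather than only keeping or dropping them whole.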

How it works

  Your folders
      |  scan (incremental, ignores node_modules/site-packages/...)
      v
  Extract text    TXT / MD / PDF / DOCX / images (OCR optional)
      |  chunk (token-aware, with overlap)
      v
  BGE-M3 embed --> LanceDB  (local vector store)
                        ^
   your question -------+  retrieve top-50 within an optional folder scope
        |
        v
   Relevance gate   cross-encoder check; chitchat is answered directly without file context
        |
        v
   TokenGate.optimize()  OR  Best-practice baseline (rerank, top-N, LangChain)
        |
        v
   Ollama (local LLM)  -->  streamed answer, cited files, full audit
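The "chunk (token-aware, with overlap)" step can be sketched as a sliding window over tokens. This is an illustration, not Beacon's chunker: `chunk_tokens`, `chunk_text`, the whitespace tokenization, and the default sizes (256 tokens, 32 overlap) are all assumptions.

```python
def chunk_tokens(tokens: list[str], size: int, overlap: int) -> list[list[str]]:
    """Split a token list into windows of `size` that share `overlap` tokens."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    step = size - overlap  # how far each window advances
    chunks: list[list[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):  # this window already reaches the end
            break
    return chunks


def chunk_text(text: str, size: int = 256, overlap: int = 32) -> list[str]:
    # Assumption: whitespace tokens stand in for the embedder's real tokenizer.
    return [" ".join(window) for window in chunk_tokens(text.split(), size, overlap)]
```

The overlap means a sentence cut at a window edge still appears whole in the neighbouring chunk, which helps retrieval at the cost of some duplicated tokens.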

| Layer | Technology |
| --- | --- |
| Backend API | Python 3.12, FastAPI, uv |
| Retrieval | LanceDB (vectors) + BGE-M3 embeddings |
| Reranking | BGE-Reranker-v2-m3 (cross-encoder) |
| Context optimization | tokengate library |
| Baseline RAG | LangChain (LCEL, ChatOllama) |
| Local LLM | Ollama |
| Metadata | SQLite |
| Frontend | Vite, React, TypeScript, Framer Motion |
| Desktop (planned) | Tauri |

Prerequisites

| Requirement | Notes |
| --- | --- |
| Python 3.12 | Backend pinned to >=3.12,<3.13 for ML wheel compatibility. Install from python.org or via pyenv. |
| uv | Python env and dependency manager. Install with pip install uv or see the uv docs. |
| Node.js >= 20 | For the Vite frontend. Install from nodejs.org. |
| Ollama | Local LLM runtime. Install from ollama.com, then pull a model. |
| NVIDIA GPU + CUDA 12.8 | Optional. Speeds up embedding and reranking; CPU fallback is automatic. |

First run downloads models. BGE-M3 (embedder, ~600 MB) and BGE-Reranker-v2-m3 (reranker, ~1.1 GB) are fetched from Hugging Face the first time you index or chat. They are cached locally afterward.


Setup and Run

git clone https://github.com/Mario-Vishal/beacon.git
cd beacon

1. Start Ollama and pull a model

ollama pull gemma4:e4b      # recommended default (supports tool-calling + thinking, 128K ctx)
# or: ollama pull llama3.2  # lighter alternative
ollama serve                # or just have the Ollama desktop app running

2. Start the backend

cd backend
uv sync                                          # create .venv and install all Python deps
uv run uvicorn beacon.main:app --port 8000       # API on http://localhost:8000

Verify it's up by opening http://localhost:8000/health. It returns the current GPU/CPU mode and Ollama status.
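If you prefer to script this check, a standard-library sketch follows. `check_health` is a hypothetical helper; it only assumes the endpoint at `/health` responds, and it does not rely on any particular field names in the response.

```python
import json
import urllib.request
from typing import Any


def check_health(base_url: str = "http://localhost:8000", timeout: float = 3.0) -> tuple[bool, Any]:
    """Return (True, parsed body) if /health answers, else (False, error message)."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8")
    except OSError as exc:  # URLError, connection refused, and timeouts are all OSError
        return False, str(exc)
    try:
        return True, json.loads(body)
    except json.JSONDecodeError:
        return True, body  # the server is up but did not return JSON
```

Calling `check_health()` while the backend is running should return `(True, ...)` with whatever status payload the server reports; with the backend stopped it returns `(False, ...)` and the connection error text.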

3. Start the frontend

Open a new terminal:

cd frontend
npm install
npm run dev        # UI on http://localhost:5173

Windows / PowerShell: if npm is blocked by execution policy, use npm.cmd run dev or run Set-ExecutionPolicy -Scope CurrentUser RemoteSigned once.

4. Add a folder and start chatting

  1. Open http://localhost:5173
  2. Click Add folder and pick a folder to index (e.g. ~/Documents)
  3. Wait for indexing to finish. Progress appears in the file explorer.
  4. Ask a question: "Which of my resumes mentions AWS experience?"
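Indexing is described above as incremental and as skipping folders like node_modules and site-packages. A minimal sketch of that kind of scan is below. The ignore list, extension set, and `scan` function are illustrative assumptions, not Beacon's actual values; change detection here uses file modification times only.

```python
import os
from pathlib import Path

# Assumption: illustrative ignore list and extensions, not Beacon's configuration.
IGNORED_DIRS = {"node_modules", "site-packages", ".git", ".venv", "__pycache__"}
INDEXED_EXTS = {".txt", ".md", ".pdf", ".docx"}


def scan(root: Path, last_seen: dict[str, float]) -> list[Path]:
    """Return files under root that are new or modified since the last scan."""
    changed: list[Path] = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune ignored directories in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if d not in IGNORED_DIRS]
        for name in sorted(filenames):
            path = Path(dirpath) / name
            if path.suffix.lower() not in INDEXED_EXTS:
                continue
            mtime = path.stat().st_mtime
            if last_seen.get(str(path)) != mtime:  # new file or changed mtime
                last_seen[str(path)] = mtime
                changed.append(path)
    return changed
```

A second call with the same `last_seen` dictionary returns only files touched since the first call. Handling deleted files (removing their chunks from the vector store) would need an extra pass comparing `last_seen` against what still exists.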

After pulling new code: restart the backend (Ctrl+C then uv run uvicorn ...) so the running process picks up the changes.


Settings

Open the Settings overlay (gear icon) to adjust:

| Setting | Options | Notes |
| --- | --- | --- |
| Ollama model | any model name | Default: gemma4:e4b |
| GPU mode | auto / cpu_only / force_gpu | auto runs the embedder on CPU and the LLM on GPU, which is optimal for 12 GB VRAM |
| Retrieval mode | auto / gate / always / agentic | gate = adaptive (chitchat is answered directly); agentic = the LLM searches files with tools |
| Strategy | speed / balanced / quality / max_compression | TokenGate optimization preset |
| Max prompt tokens | integer | Overrides the dynamic budget if set |
| File types | list of extensions | Which files get indexed |
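The table maps naturally onto a typed settings object. The sketch below is hypothetical: the field names, and every default except the gemma4:e4b model, are guesses, and Beacon's real settings schema may differ. Because Python does not enforce `Literal` types at runtime, it validates the enumerated options explicitly.

```python
from dataclasses import dataclass
from typing import Literal, Optional, get_args

GpuMode = Literal["auto", "cpu_only", "force_gpu"]
RetrievalMode = Literal["auto", "gate", "always", "agentic"]
Strategy = Literal["speed", "balanced", "quality", "max_compression"]


@dataclass
class Settings:
    # Hypothetical field names mirroring the table; defaults besides the model are guesses.
    ollama_model: str = "gemma4:e4b"
    gpu_mode: GpuMode = "auto"
    retrieval_mode: RetrievalMode = "auto"
    strategy: Strategy = "balanced"
    max_prompt_tokens: Optional[int] = None  # None means "use the dynamic budget"
    file_types: tuple[str, ...] = (".txt", ".md", ".pdf", ".docx")

    def __post_init__(self) -> None:
        # Literal annotations are not checked at runtime, so validate each option here.
        for name, allowed in (
            ("gpu_mode", get_args(GpuMode)),
            ("retrieval_mode", get_args(RetrievalMode)),
            ("strategy", get_args(Strategy)),
        ):
            value = getattr(self, name)
            if value not in allowed:
                raise ValueError(f"{name}={value!r} is not one of {allowed}")
        if self.max_prompt_tokens is not None and self.max_prompt_tokens <= 0:
            raise ValueError("max_prompt_tokens must be a positive integer")
```

For example, `Settings(retrieval_mode="agentic")` is accepted, while `Settings(gpu_mode="turbo")` raises `ValueError`.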

Settings persist in a local SQLite database under your OS app-data directory. Override the location with BEACON_DATA_DIR.
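A sketch of how such a location is commonly resolved: check `BEACON_DATA_DIR` first, then fall back to a per-OS app-data directory. The fallback paths below are conventional defaults and an assumption about Beacon's behavior. The `platformdirs` package (`user_data_dir`) is a more thorough alternative.

```python
import os
import sys
from pathlib import Path


def data_dir(app_name: str = "Beacon") -> Path:
    """Resolve the settings/data directory, honoring BEACON_DATA_DIR first."""
    override = os.environ.get("BEACON_DATA_DIR")
    if override:
        return Path(override).expanduser()
    # Assumption: conventional per-OS app-data locations; Beacon's exact paths may differ.
    if sys.platform == "win32":
        base = Path(os.environ.get("APPDATA", str(Path.home() / "AppData" / "Roaming")))
    elif sys.platform == "darwin":
        base = Path.home() / "Library" / "Application Support"
    else:
        base = Path(os.environ.get("XDG_DATA_HOME", str(Path.home() / ".local" / "share")))
    return base / app_name
```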


Features

TokenGate on/off toggle (A/B comparison)

Every chat session has a TokenGate toggle. Turn it off to run the best-practice baseline RAG instead: cross-encoder rerank, top-20 selection, then LangChain stuffing. The Insights view shows both paths so you can directly compare token usage and answer quality.
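The baseline path (rerank, keep the top N, stuff everything kept into one prompt) can be sketched without any dependencies. In Beacon the scorer is the BGE-Reranker-v2-m3 cross-encoder and the stuffing goes through LangChain. Here `overlap_score` is a toy stand-in for the cross-encoder, and `baseline_prompt` and its prompt template are hypothetical. A real cross-encoder would score the same (question, chunk) pairs, for example via sentence-transformers' `CrossEncoder.predict`.

```python
from typing import Callable

Scorer = Callable[[str, str], float]


def overlap_score(question: str, chunk: str) -> float:
    # Toy stand-in for a cross-encoder: fraction of question words present in the chunk.
    q = set(question.lower().split())
    return len(q & set(chunk.lower().split())) / max(len(q), 1)


def baseline_prompt(question: str, chunks: list[str], score: Scorer, top_n: int = 20) -> str:
    """Rerank chunks against the question, keep the top N, and stuff them into one prompt."""
    ranked = sorted(chunks, key=lambda c: score(question, c), reverse=True)
    context = "\n\n".join(ranked[:top_n])  # "stuffing": concatenate every kept chunk
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
```

The baseline has no deduplication or compression step, so its prompt grows roughly linearly with `top_n`. That growth is the difference the A/B toggle is meant to surface.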

Agentic mode

When the model supports tool calling (gemma4:e4b does), retrieval_mode=agentic lets the LLM invoke search_files, list_directory, and read_file tools within the indexed boundary. TokenGate optimizes each tool result before it enters the context window.
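Two pieces make this work: a tool schema the model can call, and a guard that keeps tool paths inside the indexed folders. The sketch below is hypothetical. `READ_FILE_TOOL` uses the JSON-schema "function" format that Ollama's chat API accepts for tools, but its parameters are invented, and `resolve_in_boundary` is an illustrative check, not Beacon's implementation.

```python
from pathlib import Path

# Hypothetical schema for the read_file tool; Beacon's real parameters may differ.
READ_FILE_TOOL = {
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Read a text file inside the indexed folders.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string", "description": "File path to read"}},
            "required": ["path"],
        },
    },
}


def resolve_in_boundary(requested: str, roots: list[Path]) -> Path:
    """Resolve a tool-supplied path and reject it unless it lies under an indexed root."""
    # resolve() collapses ".." and follows symlinks, so tricks like "root/../etc" are caught.
    candidate = Path(requested).expanduser().resolve()
    for root in roots:
        if candidate.is_relative_to(root.resolve()):  # Path.is_relative_to needs Python 3.9+
            return candidate
    raise PermissionError(f"{requested!r} is outside the indexed folders")
```

Resolving before comparing matters. A plain string prefix check would accept `/docs/../etc/passwd` because it starts with `/docs`.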

TokenGate Insights

The dock's TokenGate Insights tab shows, per message:

  • Tokens in, tokens out, and % saved
  • Per-stage funnel with blocks in/out at each pipeline stage
  • Mode badge (TokenGate vs. LangChain Baseline)
  • Per-question LLM spend in prompt and output tokens
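The "% saved" figure is presumably the share of retrieved tokens removed before the prompt is built. The sketch below assumes that definition; the README does not give Beacon's exact formula, and `percent_saved` is a hypothetical helper.

```python
def percent_saved(tokens_in: int, tokens_out: int) -> float:
    """Percentage of input context tokens removed by optimization, rounded to 0.1."""
    if tokens_in <= 0:
        return 0.0  # nothing was retrieved, so nothing could be saved
    return round(100 * (tokens_in - tokens_out) / tokens_in, 1)
```

For illustrative numbers (not measured results), trimming 12,000 retrieved tokens down to 3,000 gives `percent_saved(12000, 3000) == 75.0`.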

Resumable sessions

Chat history is saved. Click the History (clock) icon to resume a previous session. Citations, toggles, and the full audit are all restored.
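Since metadata lives in SQLite, session resume can be pictured as saving each message together with its citations and reading them back in order. The schema and helpers below are hypothetical, chosen only to illustrate the idea.

```python
import json
import sqlite3
from typing import Any

# Hypothetical schema; Beacon's actual tables are not documented here.
SCHEMA = """
CREATE TABLE IF NOT EXISTS messages (
    session_id TEXT NOT NULL,
    seq INTEGER NOT NULL,
    role TEXT NOT NULL,
    content TEXT NOT NULL,
    citations TEXT NOT NULL DEFAULT '[]',  -- JSON list of cited file paths
    PRIMARY KEY (session_id, seq)
)
"""


def save_message(db: sqlite3.Connection, session_id: str, role: str, content: str, citations: list[str]) -> None:
    # Next sequence number within this session (0 for the first message).
    seq = db.execute(
        "SELECT COALESCE(MAX(seq), -1) + 1 FROM messages WHERE session_id = ?", (session_id,)
    ).fetchone()[0]
    db.execute(
        "INSERT INTO messages VALUES (?, ?, ?, ?, ?)",
        (session_id, seq, role, content, json.dumps(citations)),
    )
    db.commit()


def load_session(db: sqlite3.Connection, session_id: str) -> list[dict[str, Any]]:
    rows = db.execute(
        "SELECT role, content, citations FROM messages WHERE session_id = ? ORDER BY seq", (session_id,)
    )
    return [{"role": r, "content": c, "citations": json.loads(cit)} for r, c, cit in rows]
```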

Transparent citations

Every answer shows the files it used as clickable chips. Click a chip to preview the file (image inline, PDF/text extracted) and jump to it in the file explorer.


Privacy

Beacon is local-first:

  • Indexed file contents, embeddings, LanceDB/SQLite databases, and content-bearing logs are in .gitignore and never committed or transmitted.
  • The only network calls are to Hugging Face for the first-run model download and to your own local Ollama.
  • No telemetry, no analytics, no cloud sync.

Development

# backend tests
cd backend && uv run pytest              # 140 tests
uv run ruff check src tests             # lint
uv run mypy src                         # type-check (strict)

# frontend build (run from the repo root)
cd frontend && npm run build            # type-check + production build

License

MIT

About

Local-first desktop RAG app for your own files: private, on-device document search and chat with local LLMs, using TokenGate for auditable context optimization.
