Spectra is a document search console for teams that need to turn messy files into searchable knowledge.
It gives users one place to ingest documents, track indexing progress, inspect stored metadata, and search across indexed content with fast vector lookup. The goal is simple: make private document collections easier to explore without hiding what the system is doing.
- Upload documents and make them searchable.
- See ingestion progress in real time.
- Keep document metadata and vector embeddings together in PostgreSQL with pgvector.
- Search semantically, not only by exact keyword match.
- Inspect documents, chunks, latency, vector counts, and socket events from one dashboard.
- Work locally with a small stack that is easy to understand and extend.
- React dashboard with Overview, Ingest, Documents, Explorer, Search, and Console tabs
- Authenticated document ingestion
- Durable ingestion jobs with status history
- Real background ingestion worker backed by PostgreSQL job claiming
- Server-side upload parsing for text, Markdown, JSON, CSV, and PDF files
- Batch ingestion for multiple files
- Real-time ingestion status with Socket.IO
- Duplicate document detection by content hash
- Chunking, embedding, and pgvector indexing pipeline
- PostgreSQL document, chunk, and embedding storage
- Semantic search with optional metadata filters
- Highlighted matching words in search results
- Document list with delete actions
- Cluster stats for documents, vectors, compression, and latency
- Console view for socket and ingestion events
- pgvector health check and rebuild command
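The chunking step of the pipeline listed above can be pictured with a short sketch. This is an illustration, not Spectra's actual implementation: the function name `chunkText`, the `Chunk` shape, and the default sizes are all assumptions made for the example.

```typescript
// Hypothetical helper: split text into fixed-size character chunks that overlap,
// so a sentence cut at a chunk boundary still appears whole in a neighbour.
// The default chunkSize and overlap are illustrative, not Spectra's settings.
interface Chunk {
  index: number; // position of the chunk within the document
  start: number; // character offset in the source text
  text: string;
}

function chunkText(text: string, chunkSize = 200, overlap = 40): Chunk[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new Error("chunkSize must be positive and overlap must be smaller than chunkSize");
  }
  const chunks: Chunk[] = [];
  const step = chunkSize - overlap; // how far the window advances each time
  for (let start = 0; start < text.length; start += step) {
    chunks.push({ index: chunks.length, start, text: text.slice(start, start + chunkSize) });
    if (start + chunkSize >= text.length) break; // this chunk already reached the end
  }
  return chunks;
}
```

Each chunk would then be embedded and stored next to its document row. Overlap trades some extra storage for better recall on passages that straddle a boundary.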
- `frontend`: Vite, React, Material UI dashboard
- `backend`: Express, Socket.IO, PostgreSQL pool, pgvector search, ingestion worker
- Copy `.env.example` to `.env`
- Install dependencies with `npm install`
- Create database and tables with `npm run db:setup`
- Run app with `npm run dev`
- `DATABASE_URL`: main app database for users, documents, chunks, vectors, jobs, collections, and search audit.
- `LOG_DATABASE_URL`: optional observability database for request, job, worker, and error logs. Defaults to `DATABASE_URL` when empty.
- `UPLOAD_CLEANUP_MAX_AGE_HOURS`: uploaded temp folder age before cleanup.
- `UPLOAD_CLEANUP_INTERVAL_MS`: background cleanup interval.
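The `LOG_DATABASE_URL` fallback described above can be sketched as a small pure function. The names `resolveDatabaseConfig` and `DatabaseConfig` are hypothetical, and treating a missing `DATABASE_URL` as an error is an assumption of this example rather than documented behavior.

```typescript
// Hypothetical config resolver: LOG_DATABASE_URL falls back to DATABASE_URL
// when it is unset or empty. Takes the environment as a plain record so it
// can be exercised without touching process.env.
type Env = Record<string, string | undefined>;

interface DatabaseConfig {
  databaseUrl: string;
  logDatabaseUrl: string;
}

function resolveDatabaseConfig(env: Env): DatabaseConfig {
  const databaseUrl = (env["DATABASE_URL"] ?? "").trim();
  if (databaseUrl === "") {
    // Assumption: the app cannot start without its main database.
    throw new Error("DATABASE_URL is required");
  }
  const logUrl = (env["LOG_DATABASE_URL"] ?? "").trim();
  return {
    databaseUrl,
    logDatabaseUrl: logUrl === "" ? databaseUrl : logUrl,
  };
}
```

In a Node process this would typically be called once at startup with `process.env`, so that logging and application code each receive an explicit connection string.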
- Rebuild vectors from PostgreSQL chunks with `npm run vectors:rebuild -w backend`
- Run only the ingestion worker with `npm run worker -w backend`
Spectra can use Turbovec as an optional compressed vector index while PostgreSQL stays the source of truth.
- Install sidecar deps with `pip install -r backend/services/requirements-turbovec.txt`
- Run the sidecar with `npm run turbovec:sidecar -w backend`
- Set `VECTOR_SEARCH_BACKEND=turbovec`
- Keep `TURBOVEC_DIM` equal to the embedding size. The current local embedding is `128`.
When Turbovec is unavailable, search falls back to pgvector.
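A minimal sketch of that fallback, assuming both backends can be wrapped behind the same async function signature. The names `searchWithFallback`, `checkDimension`, `VectorSearch`, and `SearchHit` are hypothetical and do not come from the Spectra codebase.

```typescript
// Hypothetical types: each backend is modelled as an async function that takes
// a query embedding and a result limit.
interface SearchHit {
  chunkId: string;
  score: number;
}
type VectorSearch = (query: number[], limit: number) => Promise<SearchHit[]>;

interface SearchOutcome {
  backend: "turbovec" | "pgvector";
  hits: SearchHit[];
}

// Guard against an embedding whose size differs from the configured dimension
// (for example TURBOVEC_DIM=128 with the current local embedding).
function checkDimension(query: number[], expectedDim: number): void {
  if (query.length !== expectedDim) {
    throw new Error(`expected ${expectedDim} dimensions, got ${query.length}`);
  }
}

async function searchWithFallback(
  query: number[],
  limit: number,
  expectedDim: number,
  turbovec: VectorSearch | null, // null when VECTOR_SEARCH_BACKEND is not turbovec
  pgvector: VectorSearch,
): Promise<SearchOutcome> {
  checkDimension(query, expectedDim);
  if (turbovec !== null) {
    try {
      return { backend: "turbovec", hits: await turbovec(query, limit) };
    } catch {
      // Sidecar unreachable or failed: fall through to pgvector.
    }
  }
  return { backend: "pgvector", hits: await pgvector(query, limit) };
}
```

Returning the backend name alongside the hits makes it easy to surface which path served a query, which fits the goal of not hiding what the system is doing.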