Spectra

Spectra is a document search console for teams that need to turn messy files into searchable knowledge.

It gives users one place to ingest documents, track indexing progress, inspect stored metadata, and search across indexed content with fast vector lookup. The goal is simple: make private document collections easier to explore without hiding what the system is doing.

Value

Upload documents and make them searchable.
See ingestion progress in real time.
Keep document metadata and vector embeddings together in PostgreSQL with pgvector.
Search semantically, not only by exact keyword match.
Inspect documents, chunks, latency, vector counts, and socket events from one dashboard.
Work locally with a small stack that is easy to understand and extend.

Features

React dashboard with Overview, Ingest, Documents, Explorer, Search, and Console tabs
Authenticated document ingestion
Durable ingestion jobs with status history
Background ingestion worker that claims jobs from PostgreSQL
Server-side upload parsing for text, Markdown, JSON, CSV, and PDF files
Batch ingestion for multiple files
Real-time ingestion status with Socket.IO
Duplicate document detection by content hash
Chunking, embedding, and pgvector indexing pipeline
PostgreSQL document, chunk, and embedding storage
Semantic search with optional metadata filters
Highlighted matching words in search results
Document list with delete actions
Cluster stats for documents, vectors, compression, and latency
Console view for socket and ingestion events
pgvector health check and rebuild command
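
As an illustration of duplicate document detection by content hash, here is a minimal sketch. It is not taken from the Spectra codebase: the SHA-256 algorithm, the contentHash helper, and the in-memory DuplicateDetector class are assumptions made for the example. A real deployment would more likely rely on a unique index over a hash column in PostgreSQL.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: hex-encoded SHA-256 digest of the raw file content.
// The hash algorithm is an assumption; the README does not name one.
export function contentHash(data: Buffer | string): string {
  return createHash("sha256").update(data).digest("hex");
}

// Hypothetical in-memory stand-in for a unique index on a hash column.
export class DuplicateDetector {
  private readonly seen = new Map<string, string>();

  // Returns the id of the document already stored with identical content,
  // or null after registering this document as new.
  check(documentId: string, data: Buffer | string): string | null {
    const hash = contentHash(data);
    const existing = this.seen.get(hash);
    if (existing !== undefined) {
      return existing;
    }
    this.seen.set(hash, documentId);
    return null;
  }
}
```

Hashing the bytes rather than the file name means a renamed copy of the same file is still recognised as a duplicate, while any change to the content yields a different hash.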

Layout

frontend: Vite, React, Material UI dashboard
backend: Express, Socket.IO, PostgreSQL pool, pgvector search, ingestion worker

Quick Setup

Copy .env.example to .env
Install dependencies with npm install
Create the database and tables with npm run db:setup
Run the app with npm run dev

Environment

DATABASE_URL: main app database for users, documents, chunks, vectors, jobs, collections, and search audit.
LOG_DATABASE_URL: optional observability database for request, job, worker, and error logs. Defaults to DATABASE_URL when empty.
UPLOAD_CLEANUP_MAX_AGE_HOURS: maximum age, in hours, of temporary upload folders before they are cleaned up.
UPLOAD_CLEANUP_INTERVAL_MS: interval, in milliseconds, between background cleanup runs.
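
The fallback rule for LOG_DATABASE_URL can be sketched as follows. This is an illustrative example, not the project's actual configuration code: the resolveDatabaseUrls function and the DatabaseUrls type are hypothetical, and treating a missing DATABASE_URL as an error is an assumption.

```typescript
// Hypothetical result shape for the two connection strings.
export interface DatabaseUrls {
  databaseUrl: string;
  logDatabaseUrl: string;
}

// Resolves both URLs from an environment-like object (for example process.env).
export function resolveDatabaseUrls(
  env: Record<string, string | undefined>
): DatabaseUrls {
  const databaseUrl = env.DATABASE_URL?.trim();
  if (!databaseUrl) {
    // Assumption: the main database URL is mandatory.
    throw new Error("DATABASE_URL is required");
  }
  // An unset or empty LOG_DATABASE_URL falls back to the main database.
  const logUrl = env.LOG_DATABASE_URL?.trim();
  return { databaseUrl, logDatabaseUrl: logUrl ? logUrl : databaseUrl };
}
```

Calling resolveDatabaseUrls(process.env) at startup would give one place where the default is applied, so the rest of the backend never has to check for an empty log URL.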

Maintenance

Rebuild vectors from PostgreSQL chunks with npm run vectors:rebuild -w backend
Run only the ingestion worker with npm run worker -w backend

Turbovec

Spectra can use Turbovec as an optional compressed vector index while PostgreSQL stays the source of truth.

Install the sidecar dependencies with pip install -r backend/services/requirements-turbovec.txt
Run the sidecar with npm run turbovec:sidecar -w backend
Set VECTOR_SEARCH_BACKEND=turbovec
Keep TURBOVEC_DIM equal to the embedding size. The current local embedding has 128 dimensions.
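
A mismatch between TURBOVEC_DIM and the embedding size is easier to catch with an explicit check before vectors reach the index. The sketch below shows one way to do that. The function names parseTurbovecDim and assertEmbeddingDim are hypothetical and do not come from the Spectra codebase.

```typescript
// Parses the TURBOVEC_DIM setting into a positive integer.
export function parseTurbovecDim(raw: string | undefined): number {
  const dim = Number(raw);
  if (!Number.isInteger(dim) || dim <= 0) {
    throw new Error(`TURBOVEC_DIM must be a positive integer, got "${raw}"`);
  }
  return dim;
}

// Throws when an embedding does not have the configured number of dimensions.
export function assertEmbeddingDim(
  embedding: readonly number[],
  expectedDim: number
): void {
  if (embedding.length !== expectedDim) {
    throw new Error(
      `Embedding has ${embedding.length} dimensions, expected ${expectedDim}`
    );
  }
}
```

With the current local embedding, parseTurbovecDim("128") returns 128 and a 128-element vector passes the check, while a vector of any other length is rejected.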

When Turbovec is unavailable, search falls back to pgvector.
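
The fallback behaviour can be pictured with the sketch below. It is a simplified illustration under stated assumptions, not the backend's actual search code: the SearchHit and SearchFn types and the searchWithFallback function are hypothetical, and "unavailable" is modelled here as the primary search function rejecting.

```typescript
// Hypothetical result and search-function shapes.
export type SearchHit = { chunkId: string; score: number };
export type SearchFn = (embedding: number[], limit: number) => Promise<SearchHit[]>;

export type SearchResult = { backend: "primary" | "fallback"; hits: SearchHit[] };

// Tries the primary index first and uses the fallback when the primary fails.
export async function searchWithFallback(
  primary: SearchFn,
  fallback: SearchFn,
  embedding: number[],
  limit: number
): Promise<SearchResult> {
  try {
    return { backend: "primary", hits: await primary(embedding, limit) };
  } catch {
    // Any primary failure (for example the sidecar being down) triggers the fallback.
    return { backend: "fallback", hits: await fallback(embedding, limit) };
  }
}
```

In this picture the Turbovec sidecar client would be passed as primary and the pgvector query as fallback. Returning the backend name alongside the hits makes it visible which index served a given query.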
