Add configurable embedding provider support #134

Open
100yenadmin wants to merge 1 commit into garrytan:master from 100yenadmin:sub/voyage-embedding-provider

@100yenadmin

Summary

  • make embedding provider, model, and dimensions configurable via env vars
  • preserve the current OpenAI defaults while adding Voyage AI support
  • add minimal docs and focused tests for the new config path

Validation

  • bun install
  • bun test test/embed.test.ts

Closes #133

Copilot AI review requested due to automatic review settings April 15, 2026 10:20

Copilot AI left a comment

Pull request overview

Adds an env-driven configuration layer to the embedding subsystem so deployments can switch embedding provider/model/dimensions (defaulting to the current OpenAI setup) and records the embedding model used when writing chunk updates.

Changes:

  • Introduce getEmbeddingConfig() and wire embedBatch to support openai (default) and voyage providers.
  • Persist the configured embedding model onto chunk updates in the gbrain embed command.
  • Add docs and tests covering the new configuration path.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

Show a summary per file
| File | Description |
| --- | --- |
| test/embed.test.ts | Adds unit tests for embedding config defaults/overrides and extends env cleanup. |
| src/core/embedding.ts | Implements env-driven embedding config and adds Voyage embeddings implementation. |
| src/commands/embed.ts | Records embedding model on chunk upserts during embedding runs. |
| README.md | Notes embedding provider/model/dimensions env vars in CLI help. |
| INSTALL_FOR_AGENTS.md | Documents optional embedding provider overrides and Voyage API key. |

Comment thread src/core/embedding.ts
Comment on lines +37 to +55
export function getEmbeddingConfig(): EmbeddingConfig {
  const providerRaw = (process.env.EMBEDDING_PROVIDER || DEFAULT_PROVIDER).toLowerCase();
  const model = process.env.EMBEDDING_MODEL || DEFAULT_MODEL;
  const dimensionsRaw = process.env.EMBEDDING_DIMENSIONS;
  const dimensions = dimensionsRaw ? parseInt(dimensionsRaw, 10) : DEFAULT_DIMENSIONS;

  if (providerRaw !== 'openai' && providerRaw !== 'voyage') {
    throw new Error(`Unsupported embedding provider: ${providerRaw}. Expected openai or voyage.`);
  }

  if (dimensionsRaw && Number.isNaN(dimensions)) {
    throw new Error(`Invalid EMBEDDING_DIMENSIONS: ${dimensionsRaw}`);
  }

  return {
    provider: providerRaw,
    model,
    dimensions,
  };
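
One edge case in the excerpt above: `parseInt` stops at the first non-digit, so `EMBEDDING_DIMENSIONS=1536px` parses to 1536, and values such as `0` or `-5` pass the `NaN` check. A stricter parser might look like the sketch below. `parseDimensions` is a hypothetical helper, not part of the PR, and its `fallback` argument stands in for `DEFAULT_DIMENSIONS`, whose value is not shown here.

```typescript
// Hypothetical helper: strict parsing for EMBEDDING_DIMENSIONS.
// `fallback` stands in for the PR's DEFAULT_DIMENSIONS (value not shown in the excerpt).
function parseDimensions(raw: string | undefined, fallback: number): number {
  // Unset or empty falls back to the default, matching the excerpt's truthiness check.
  if (!raw) return fallback;

  const trimmed = raw.trim();
  // Accept plain decimal digits only, so "1536px", "1e3", and "-5" are rejected.
  if (!/^\d+$/.test(trimmed)) {
    throw new Error(`Invalid EMBEDDING_DIMENSIONS: ${raw}`);
  }

  const value = Number(trimmed);
  // Reject zero and values too large to be represented exactly.
  if (!Number.isSafeInteger(value) || value <= 0) {
    throw new Error(`Invalid EMBEDDING_DIMENSIONS: ${raw}`);
  }
  return value;
}
```

The error message reuses the wording from the excerpt so existing log searches would still match.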
Comment thread src/commands/embed.ts
Comment on lines 72 to 79
const updated: ChunkInput[] = chunks.map(c => ({
  chunk_index: c.chunk_index,
  chunk_text: c.chunk_text,
  chunk_source: c.chunk_source,
  embedding: embeddingMap.get(c.chunk_index),
  model: embeddingConfig.model,
  token_count: c.token_count || Math.ceil(c.chunk_text.length / 4),
}));
Comment thread src/commands/embed.ts
Comment on lines 122 to 129
const updated: ChunkInput[] = chunks.map(c => ({
  chunk_index: c.chunk_index,
  chunk_text: c.chunk_text,
  chunk_source: c.chunk_source,
  embedding: embeddingMap.get(c.chunk_index) ?? undefined,
  model: embeddingConfig.model,
  token_count: c.token_count || Math.ceil(c.chunk_text.length / 4),
}));
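
The two `chunks.map` excerpts in src/commands/embed.ts are nearly identical, so one option is a shared helper, sketched below. `withEmbeddings` and `estimateTokens` are hypothetical names, and the `ChunkInput` field types are guesses based on the field names in the excerpts, not the project's real definition.

```typescript
// Assumed shape: field names come from the excerpts, the types are guesses.
interface ChunkInput {
  chunk_index: number;
  chunk_text: string;
  chunk_source: string;
  embedding?: number[];
  model?: string;
  token_count?: number;
}

// The fallback used in the excerpts: roughly four characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Hypothetical helper that both call sites could share.
function withEmbeddings(
  chunks: ChunkInput[],
  embeddingMap: Map<number, number[]>,
  model: string,
): ChunkInput[] {
  return chunks.map(c => ({
    chunk_index: c.chunk_index,
    chunk_text: c.chunk_text,
    chunk_source: c.chunk_source,
    // Map.get already yields undefined for a missing key.
    embedding: embeddingMap.get(c.chunk_index),
    model,
    // `||` means a stored count of 0 is also replaced by the estimate.
    token_count: c.token_count || estimateTokens(c.chunk_text),
  }));
}
```

As in the excerpts, a chunk with no entry in the map still has `model` set. Whether that is intended is a question for the PR author.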
Comment thread test/embed.test.ts
Comment on lines +2 to 3
import { getEmbeddingConfig } from '../src/core/embedding.ts';
import type { BrainEngine } from '../src/core/engine.ts';
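
Because `getEmbeddingConfig` reads environment variables at call time, tests for it need to restore whatever they change. A minimal sketch of one way to do that follows. `withEnv` is a hypothetical helper, not taken from the PR's test file, and it takes the env object as a parameter so the sketch runs without Node typings.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical test helper: apply overrides, run fn, then restore the previous values.
// Intended for synchronous callbacks; an async fn would see the env restored
// before its promise settles.
function withEnv<T>(env: Env, overrides: Env, fn: () => T): T {
  const saved: Env = {};
  for (const key of Object.keys(overrides)) {
    saved[key] = env[key];
    const next = overrides[key];
    if (next === undefined) delete env[key];
    else env[key] = next;
  }
  try {
    return fn();
  } finally {
    // Restore even if fn throws, so one failing test cannot leak settings into the next.
    for (const key of Object.keys(saved)) {
      const previous = saved[key];
      if (previous === undefined) delete env[key];
      else env[key] = previous;
    }
  }
}
```

In a test this could be called as `withEnv(process.env, { EMBEDDING_PROVIDER: 'voyage' }, () => getEmbeddingConfig())`, assuming the test runner exposes `process.env` in the usual way.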
Comment thread src/core/embedding.ts
Comment on lines +38 to +42
const providerRaw = (process.env.EMBEDDING_PROVIDER || DEFAULT_PROVIDER).toLowerCase();
const model = process.env.EMBEDDING_MODEL || DEFAULT_MODEL;
const dimensionsRaw = process.env.EMBEDDING_DIMENSIONS;
const dimensions = dimensionsRaw ? parseInt(dimensionsRaw, 10) : DEFAULT_DIMENSIONS;

Development

Successfully merging this pull request may close these issues.

Make embedding provider, model, and dimensions configurable

2 participants