fix: make embedding batch size configurable #696

Open
pntech20 wants to merge 1 commit into plastic-labs:main from pntech20:codex/configurable-embedding-batch-size

Conversation

@pntech20 pntech20 commented May 17, 2026

Summary

  • add optional max_batch_size to embedding model config and runtime embedding config
  • use the configured batch cap when splitting OpenAI/Gemini embedding requests
  • document the env/TOML setting for OpenAI-compatible providers such as DashScope

Fixes #687.
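
The splitting behavior described in the summary can be sketched as follows. This is a minimal illustration, not the PR's actual code: `split_into_batches` and `DEFAULT_MAX_BATCH_SIZE` are hypothetical names, and the 2048 fallback is simply the OpenAI default cited in this PR.

```python
from typing import Optional

# Assumed fallback, matching the OpenAI default cited in this PR.
DEFAULT_MAX_BATCH_SIZE = 2048


def split_into_batches(
    texts: list[str], max_batch_size: Optional[int] = None
) -> list[list[str]]:
    """Split texts into consecutive batches of at most max_batch_size items."""
    cap = max_batch_size if max_batch_size is not None else DEFAULT_MAX_BATCH_SIZE
    if cap <= 0:
        raise ValueError("max_batch_size must be a positive integer")
    # Each slice holds at most `cap` items; the last one may be shorter.
    return [texts[i : i + cap] for i in range(0, len(texts), cap)]


# Three inputs with a cap of 2 produce two batches, i.e. two requests.
batches = split_into_batches(["a", "b", "c"], max_batch_size=2)
print(len(batches), batches)
```

With a cap of 10, as needed for DashScope, 25 inputs would go out as three requests of 10, 10, and 5 items.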

Verification

  • uv run ruff check src\config.py src\embedding_client.py tests\llm\test_embedding_client.py
  • uv run ruff format --check src\config.py src\embedding_client.py tests\llm\test_embedding_client.py
  • uv run basedpyright src\config.py src\embedding_client.py tests\llm\test_embedding_client.py
  • direct runtime check that max_batch_size=2 splits simple_batch_embed(["a", "b", "c"]) into two OpenAI embedding calls and that embedding settings parse max_batch_size=10

uv run pytest tests\llm\test_embedding_client.py -q is blocked in this Windows environment before the test file runs because the repo-level tests/conftest.py imports app startup, which imports src.telemetry.reasoning_traces, which imports Unix-only fcntl.

Summary by CodeRabbit

  • New Features

    • Added optional max_batch_size configuration for embedding providers, enabling customization of per-request input limits. Defaults to provider-specific values (e.g., 2048 for OpenAI, 100 for Gemini).
  • Documentation

    • Updated embedding configuration guide with batch size setting examples and guidance for providers with smaller request limits.

Contributor

coderabbitai Bot commented May 17, 2026

Walkthrough

This PR adds a configurable max_batch_size field to embedding model configuration, allowing per-provider batch size limits to be set. Gemini and OpenAI clients now respect this setting, with fallbacks to their respective defaults (100 and 2048). Configuration examples, documentation, and test coverage accompany the change.
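
The fallback described in the walkthrough might look roughly like the sketch below. The names `PROVIDER_DEFAULT_BATCH_SIZE` and `effective_batch_size` are hypothetical and chosen for illustration only; the default values are the ones stated in this PR.

```python
from typing import Optional

# Provider defaults as stated in this PR's walkthrough.
PROVIDER_DEFAULT_BATCH_SIZE = {"openai": 2048, "gemini": 100}


def effective_batch_size(provider: str, configured: Optional[int]) -> int:
    """Return the configured cap if one is set, else the provider default."""
    if configured is not None:
        return configured
    return PROVIDER_DEFAULT_BATCH_SIZE[provider]


print(effective_batch_size("openai", None))  # falls back to the OpenAI default
print(effective_batch_size("openai", 10))    # override for a stricter provider
```

Checking `is not None` rather than truthiness keeps the "unset" case distinct from an invalid value such as 0, which is better rejected at configuration time.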

Changes

Embedding batch size configuration

| Layer / File(s) | Summary |
| --- | --- |
| Configuration field definitions: src/config.py | max_batch_size field (validated as positive int or None) added to ConfiguredEmbeddingModelSettings and EmbeddingModelConfig. |
| Configuration examples and documentation: .env.template, config.toml.example, docs/v3/contributing/configuration.mdx | Environment template, TOML config example, and embedding configuration documentation updated to show max_batch_size as an optional setting, with guidance on provider defaults and when to override (e.g., DashScope limit of 10). |
| Configuration resolution: src/config.py | resolve_embedding_model_config propagates configured.max_batch_size into the resolved runtime EmbeddingModelConfig. |
| Client batching implementation: src/embedding_client.py | Gemini (default 100) and OpenAI (default 2048) batch size initialization now honors config.max_batch_size when set. Client recreation signature updated to include max_batch_size so configuration changes trigger reinitialization. |
| Test coverage: tests/llm/test_embedding_client.py | Test helper accepts configurable max_batch_size. New tests verify simple_batch_embed splits inputs per batch limit and EmbeddingSettings parses the environment variable. Environment cleanup extended. |
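
The "positive int or None" validation and the client recreation signature summarized above could be sketched like this. `EmbeddingConfigSketch` and `client_signature` are hypothetical stand-ins written with a standard-library dataclass; the real `EmbeddingModelConfig` in src/config.py may be built differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EmbeddingConfigSketch:
    """Hypothetical stand-in for the PR's EmbeddingModelConfig."""

    provider: str
    model: str
    max_batch_size: Optional[int] = None

    def __post_init__(self) -> None:
        # Accept None (use the provider default) or a positive int; reject the rest.
        value = self.max_batch_size
        if value is None:
            return
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            raise ValueError("max_batch_size must be a positive integer or None")

    def client_signature(self) -> tuple[str, str, Optional[int]]:
        # Including max_batch_size means a changed cap yields a different
        # signature, so a cache keyed on it would rebuild the client.
        return (self.provider, self.model, self.max_batch_size)


default_cfg = EmbeddingConfigSketch("openai", "example-model")
capped_cfg = EmbeddingConfigSketch("openai", "example-model", max_batch_size=10)
print(default_cfg.client_signature() != capped_cfg.client_signature())
```

If the signature omitted `max_batch_size`, a cached client created before the setting changed would keep batching with the old cap.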

Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes

🐰 A config field hops into place,
With batch sizes no more of one pace,
DashScope now grins,
Each embed request wins,
Ten inputs fit perfectly in space! 🌟

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 20.00%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'fix: make embedding batch size configurable' directly and clearly describes the main change: adding configuration support for embedding batch size. |
| Linked Issues check | ✅ Passed | The PR implements the recommended solution from issue #687 by exposing max_batch_size as a configurable field in embedding model config, allowing users to set provider-appropriate batch limits. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to adding max_batch_size configurability: config declarations, environment/TOML examples, documentation, runtime batching logic, and corresponding tests. |

Contributor

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
src/embedding_client.py (1)

185-185: ⚡ Quick win

Update comment to reflect configurable batch size.

The comment references "max 2048 embeddings per request", but batch size is now configurable per this PR. Consider updating to something more generic like "Create batches that fit configured API limits" to avoid confusion.

📝 Suggested comment update
-        # 2. Create batches that fit API limits (max 2048 embeddings per request, max 300,000 tokens per request)
+        # 2. Create batches that fit API limits (batch size and token limits)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/embedding_client.py` at line 185, Update the inline comment that
currently says "max 2048 embeddings per request" to a generic note reflecting
that batch size is configurable, e.g., "Create batches that fit configured API
limits (e.g., max embeddings per request and max tokens per request)"; make this
change next to the batching logic that uses the batch_size configuration
variable (and related max token limit variable) so the comment matches the
runtime-configurable behavior.
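
The comment under review describes batching under two limits at once: a maximum number of embeddings per request and a maximum number of tokens per request. A minimal greedy sketch of that idea follows. `batch_by_count_and_tokens` is a hypothetical name, and counting whitespace-separated words is only a stand-in for a real tokenizer; the actual logic in src/embedding_client.py may differ.

```python
def batch_by_count_and_tokens(
    texts: list[str],
    max_batch_size: int,
    max_tokens_per_request: int,
) -> list[list[str]]:
    """Greedily batch texts under both an item cap and a token budget."""
    if max_batch_size <= 0:
        raise ValueError("max_batch_size must be a positive integer")
    batches: list[list[str]] = []
    current: list[str] = []
    current_tokens = 0
    for text in texts:
        # Assumption: word count approximates the token count.
        tokens = len(text.split())
        if tokens > max_tokens_per_request:
            raise ValueError("a single input exceeds the per-request token limit")
        full = len(current) >= max_batch_size
        over_budget = current_tokens + tokens > max_tokens_per_request
        # Close the current batch when adding this text would break either limit.
        if current and (full or over_budget):
            batches.append(current)
            current, current_tokens = [], 0
        current.append(text)
        current_tokens += tokens
    if current:
        batches.append(current)
    return batches


print(batch_by_count_and_tokens(["a b", "c", "d e f", "g"], 2, 4))
```

Whichever limit is reached first closes the batch, so lowering `max_batch_size` to 10 for a stricter provider does not interfere with the token budget.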

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 612c6ca2-cb97-4ac2-8bdf-1ad34326c4e0

📥 Commits

Reviewing files that changed from the base of the PR and between 8fcbb54 and b3a485c.

📒 Files selected for processing (6)
  • .env.template
  • config.toml.example
  • docs/v3/contributing/configuration.mdx
  • src/config.py
  • src/embedding_client.py
  • tests/llm/test_embedding_client.py

Omee11 commented May 17, 2026

Amazing, thank you!

@pntech20
Author

Thanks! Glad this was useful.

Development

Successfully merging this pull request may close these issues.

[Bug] OpenAIEmbeddingClient.max_batch_size hardcoded to 2048 breaks DashScope embeddings (cap is 10)

2 participants