feat: shards and replicas #1735
Conversation
Walkthrough
This PR adds configurable OpenSearch index sharding and replica settings across all deployment layers. Environment variables
Changes
OpenSearch Shard/Replica Configuration
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Warning
There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.
🔧 Trivy (0.69.3)
Trivy execution failed: 2026-06-02T18:02:04Z FATAL Fatal error run error: fs scan error: scan error: scan failed: failed analysis: post analysis error: post analysis error: kubernetes scan error: fs filter error: fs filter error: walk error range error: stat .coderabbit-opengrep-fallback.yml: no such file or directory: range error: stat .coderabbit-opengrep-fallback.yml: no such file or directory
Build successful! ✅
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
kubernetes/operator/internal/controller/env.go (1)
34-34: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Expose the new OpenSearch settings through Langflow's env allowlist.
Adding defaults here is not enough: Line 34 still omits both new names from LANGFLOW_VARIABLES_TO_GET_FROM_ENVIRONMENT, so Langflow components/flows will not see the shard/replica values even though they exist in the pod env.
Suggested fix
- "LANGFLOW_VARIABLES_TO_GET_FROM_ENVIRONMENT": "JWT,OPENRAG_QUERY_FILTER,OPENSEARCH_PASSWORD,OPENSEARCH_URL,OPENSEARCH_INDEX_NAME,DOCLING_SERVE_URL,DOCLING_TASK_ID,OWNER,OWNER_NAME,OWNER_EMAIL,CONNECTOR_TYPE,DOCUMENT_ID,SOURCE_URL,ALLOWED_USERS,ALLOWED_GROUPS,ALLOWED_PRINCIPALS,FILENAME,MIMETYPE,FILESIZE,SELECTED_EMBEDDING_MODEL,OPENAI_API_KEY,ANTHROPIC_API_KEY,WATSONX_API_KEY,WATSONX_ENDPOINT,WATSONX_PROJECT_ID,OLLAMA_BASE_URL,OPENRAG_INGEST_URL,OPENRAG_INGEST_TOKEN,OPENRAG_INGEST_RUN_ID,OPENRAG_INGEST_BATCH_SIZE",
+ "LANGFLOW_VARIABLES_TO_GET_FROM_ENVIRONMENT": "JWT,OPENRAG_QUERY_FILTER,OPENSEARCH_PASSWORD,OPENSEARCH_URL,OPENSEARCH_INDEX_NAME,OPENRAG_OPENSEARCH_NUMBER_OF_SHARDS,OPENRAG_OPENSEARCH_NUMBER_OF_REPLICAS,DOCLING_SERVE_URL,DOCLING_TASK_ID,OWNER,OWNER_NAME,OWNER_EMAIL,CONNECTOR_TYPE,DOCUMENT_ID,SOURCE_URL,ALLOWED_USERS,ALLOWED_GROUPS,ALLOWED_PRINCIPALS,FILENAME,MIMETYPE,FILESIZE,SELECTED_EMBEDDING_MODEL,OPENAI_API_KEY,ANTHROPIC_API_KEY,WATSONX_API_KEY,WATSONX_ENDPOINT,WATSONX_PROJECT_ID,OLLAMA_BASE_URL,OPENRAG_INGEST_URL,OPENRAG_INGEST_TOKEN,OPENRAG_INGEST_RUN_ID,OPENRAG_INGEST_BATCH_SIZE",
Also applies to: 77-82
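One way to sanity-check the allowlist change is to parse the comma-separated value and look for the two new names. The sketch below is illustrative only: `missing_from_allowlist` is a hypothetical helper that does not exist in the operator, and the allowlist strings are shortened stand-ins for the real value.

```python
# Names the suggested fix adds to the allowlist.
REQUIRED_NAMES = (
    "OPENRAG_OPENSEARCH_NUMBER_OF_SHARDS",
    "OPENRAG_OPENSEARCH_NUMBER_OF_REPLICAS",
)


def missing_from_allowlist(allowlist: str, required=REQUIRED_NAMES) -> list[str]:
    """Return the required names absent from a comma-separated allowlist."""
    present = {name.strip() for name in allowlist.split(",") if name.strip()}
    return [name for name in required if name not in present]


# Shortened, illustrative allowlist values (not the full operator string).
before = "JWT,OPENSEARCH_URL,OPENSEARCH_INDEX_NAME,DOCLING_SERVE_URL"
after = (
    "JWT,OPENSEARCH_URL,OPENSEARCH_INDEX_NAME,"
    "OPENRAG_OPENSEARCH_NUMBER_OF_SHARDS,"
    "OPENRAG_OPENSEARCH_NUMBER_OF_REPLICAS,DOCLING_SERVE_URL"
)

print(missing_from_allowlist(before))  # lists both new names
print(missing_from_allowlist(after))   # empty list
```

A check like this could run against every place the allowlist string is defined, since the comment notes the value appears more than once in the file.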
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@kubernetes/operator/internal/controller/env.go` at line 34, Update the LANGFLOW_VARIABLES_TO_GET_FROM_ENVIRONMENT allowlist to include the new OpenSearch shard/replica env names so Langflow can read them at runtime; specifically add OPENRAG_OPENSEARCH_NUMBER_OF_SHARDS and OPENRAG_OPENSEARCH_NUMBER_OF_REPLICAS to the comma-separated string value of LANGFLOW_VARIABLES_TO_GET_FROM_ENVIRONMENT (the same constant appears again further down and must be updated there as well) so components/flows can access these variables.
src/tui/managers/env_manager.py (1)
715-780: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Full setup still doesn't surface the new shard/replica knobs.
EnvConfig, _env_attr_map(), and save_env_file() all support these values now, but get_full_setup_fields() never exposes them. The guided setup path will keep users pinned to the hard-coded 1/0 defaults, so this feature is only partially configurable.
Suggested fix
 optional_fields = [
     (
         "disable_ingest_with_langflow",
         "Disable Langflow Ingestion (optional)",
         "False",
         False,
     ),
     (
         "ingest_sample_data",
         "Ingest Sample Data (optional)",
         "True",
         False,
     ),
     (
         "opensearch_index_name",
         "OpenSearch Index Name",
         "documents",
         False,
     ),
+    (
+        "opensearch_number_of_shards",
+        "OpenSearch Primary Shards",
+        "1",
+        False,
+    ),
+    (
+        "opensearch_number_of_replicas",
+        "OpenSearch Replica Shards",
+        "0",
+        False,
+    ),
     (
         "webhook_base_url",
         "Webhook Base URL (optional)",
         "https://your-domain.com",
         False,
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/tui/managers/env_manager.py` around lines 715 - 780, get_full_setup_fields() never includes the new OpenSearch shard/replica environment keys, so guided setup keeps hard-coded 1/0 defaults; add the two env keys (the exact names implemented in EnvConfig/_env_attr_map/save_env_file) into the optional_fields list inside get_full_setup_fields with user-friendly labels like "OpenSearch Index Shards" and "OpenSearch Index Replicas" and sensible defaults ("1" and "0") and booleans matching other optional items so the guided setup surfaces and saves those values.
🧹 Nitpick comments (1)
src/utils/embeddings.py (1)
63-70: 💤 Low value
Settings shape is inconsistent with the rest of the codebase.
Here number_of_shards/number_of_replicas are placed flat under settings, as siblings of index: {knn: True}, whereas the knowledge_filters body in opensearch_init.py (Lines 281-284) nests both under settings.index. OpenSearch normalizes the flat form to index.*, so this works, but the divergence is a maintenance smell; consider nesting them under index for consistency.
♻️ Optional: nest under index
 return {
     "settings": {
-        "index": {"knn": True},
-        "number_of_shards": OPENSEARCH_NUMBER_OF_SHARDS,
-        "number_of_replicas": OPENSEARCH_NUMBER_OF_REPLICAS,
+        "index": {
+            "knn": True,
+            "number_of_shards": OPENSEARCH_NUMBER_OF_SHARDS,
+            "number_of_replicas": OPENSEARCH_NUMBER_OF_REPLICAS,
+        },
     },
     "mappings": {"properties": properties},
 }
Note: if you adopt this, the unit test assertions in tests/unit/test_embedding_fields.py (Lines 139-140) that read body["settings"]["number_of_shards"] must be updated to body["settings"]["index"][...].
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/utils/embeddings.py` around lines 63 - 70, The settings dict returned by the function (the block returning {"settings": {...}, "mappings": {"properties": properties}}) places number_of_shards and number_of_replicas at the top of settings, which is inconsistent with other code that nests them under settings.index; update the returned structure so that "settings.index" contains {"knn": True, "number_of_shards": OPENSEARCH_NUMBER_OF_SHARDS, "number_of_replicas": OPENSEARCH_NUMBER_OF_REPLICAS} (keep "mappings": {"properties": properties} unchanged) and adjust any unit tests that assert body["settings"]["number_of_shards"] to check body["settings"]["index"]["number_of_shards"] instead.
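To make the nested shape concrete, here is a minimal sketch of an index-creation body with every index-level setting under `settings.index`. The function name `build_index_body` and its parameters are hypothetical; the real function in src/utils/embeddings.py is not shown in this review and reads the values from module-level constants instead.

```python
def build_index_body(properties: dict, shards: int = 1, replicas: int = 0) -> dict:
    """Build an index-creation body with all index settings nested under settings.index."""
    return {
        "settings": {
            "index": {
                "knn": True,
                "number_of_shards": shards,
                "number_of_replicas": replicas,
            },
        },
        "mappings": {"properties": properties},
    }


body = build_index_body({"text": {"type": "text"}}, shards=3, replicas=1)

# Assertions then read the values from the nested location.
print(body["settings"]["index"]["number_of_shards"])
print(body["settings"]["index"]["number_of_replicas"])
```

With this layout, any test that reads the shard or replica count needs the extra `["index"]` key, as the note above points out.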
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@flows/components/opensearch_multimodal.py`:
- Line 5: Remove the unused "import os" from
flows/components/opensearch_multimodal.py and instead import any needed
configuration values from config/settings.py (e.g., from config.settings import
SOME_SETTING) so no os.environ reads remain in this module; update any local
references that would have used os.environ to use the corresponding settings
symbols and ensure imports and names (e.g., any function or class in this file
that relied on env vars) are updated accordingly.
- Around line 34-43: Remove the local _get_min_env_int helper and the
OPENSEARCH_NUMBER_OF_SHARDS / OPENSEARCH_NUMBER_OF_REPLICAS constant definitions
and instead import the existing constants from config.settings; update the
imports near the other config imports (around line where other config values are
imported) to pull OPENSEARCH_NUMBER_OF_SHARDS and OPENSEARCH_NUMBER_OF_REPLICAS
from config.settings, then delete the _get_min_env_int function and the two
local constant assignments in flows/components/opensearch_multimodal.py so all
config values come from config/settings.py as required.
In `@src/tui/config_fields.py`:
- Line 10: Remove the unused import Optional from the import statement (current
line imports "Optional") in config_fields.py; update the top-level typing import
so it no longer imports Optional (leaving only used names) to satisfy Ruff and
eliminate the unused-import error.
- Around line 125-142: The two ConfigField instances
"opensearch_number_of_shards" and "opensearch_number_of_replicas" currently
accept arbitrary strings; add input validation to ensure the shards value parses
to an integer >= 1 and the replicas value parses to an integer >= 0 before
saving. Implement this by wiring a validator/cleaner on these ConfigField
definitions (or the form save handler used by them) that attempts int(...)
conversion, rejects non-numeric input with a clear helper/error message, and
rejects values outside the allowed ranges so invalid inputs like "foo" or "-1"
cannot be persisted.
In `@src/tui/managers/env_manager.py`:
- Line 10: Remove the stale typing imports in src/tui/managers/env_manager.py:
drop Dict and List from the from typing import line (or remove the entire import
if Optional is also unused), and update any type annotations to use built-in
generics (e.g., dict, list) or keep only Optional if still referenced; ensure
the import line no longer includes Dict or List to resolve the UP035 Ruff error.
---
Outside diff comments:
In `@kubernetes/operator/internal/controller/env.go`:
- Line 34: Update the LANGFLOW_VARIABLES_TO_GET_FROM_ENVIRONMENT allowlist to
include the new OpenSearch shard/replica env names so Langflow can read them at
runtime; specifically add OPENRAG_OPENSEARCH_NUMBER_OF_SHARDS and
OPENRAG_OPENSEARCH_NUMBER_OF_REPLICAS to the comma-separated string value of
LANGFLOW_VARIABLES_TO_GET_FROM_ENVIRONMENT (the same constant appears again
further down and must be updated there as well) so components/flows can access
these variables.
In `@src/tui/managers/env_manager.py`:
- Around line 715-780: get_full_setup_fields() never includes the new OpenSearch
shard/replica environment keys, so guided setup keeps hard-coded 1/0 defaults;
add the two env keys (the exact names implemented in
EnvConfig/_env_attr_map/save_env_file) into the optional_fields list inside
get_full_setup_fields with user-friendly labels like "OpenSearch Index Shards"
and "OpenSearch Index Replicas" and sensible defaults ("1" and "0") and booleans
matching other optional items so the guided setup surfaces and saves those
values.
---
Nitpick comments:
In `@src/utils/embeddings.py`:
- Around line 63-70: The settings dict returned by the function (the block
returning {"settings": {...}, "mappings": {"properties": properties}}) places
number_of_shards and number_of_replicas at the top of settings, which is
inconsistent with other code that nests them under settings.index; update the
returned structure so that "settings.index" contains {"knn": True,
"number_of_shards": OPENSEARCH_NUMBER_OF_SHARDS, "number_of_replicas":
OPENSEARCH_NUMBER_OF_REPLICAS} (keep "mappings": {"properties": properties}
unchanged) and adjust any unit tests that assert
body["settings"]["number_of_shards"] to check
body["settings"]["index"]["number_of_shards"] instead.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 1d7b508d-0aa6-4c76-980a-60f358ffe4a7
📒 Files selected for processing (25)
- .env.example
- docker-compose.yml
- docs/docs/reference/configuration.mdx
- flows/components/opensearch_multimodal.py
- flows/ingestion_flow.json
- flows/openrag_agent.json
- flows/openrag_nudges.json
- flows/openrag_url_mcp.json
- kubernetes/helm/openrag/templates/backend/backend-dotenv.yaml
- kubernetes/helm/openrag/templates/langflow/langflow-dotenv.yaml
- kubernetes/helm/openrag/values.yaml
- kubernetes/operator/README.md
- kubernetes/operator/api/v1alpha1/openrag_types.go
- kubernetes/operator/config/crd/bases/openr.ag_openrags.yaml
- kubernetes/operator/config/samples/kind-cluster-openrag-cr.yaml
- kubernetes/operator/config/samples/openrag_v1alpha1_openrag.yaml
- kubernetes/operator/internal/controller/env.go
- kubernetes/operator/internal/controller/openrag_controller.go
- kubernetes/operator/internal/controller/openrag_controller_test.go
- src/config/settings.py
- src/tui/config_fields.py
- src/tui/managers/env_manager.py
- src/utils/embeddings.py
- src/utils/opensearch_init.py
- tests/unit/test_embedding_fields.py
import copy
import json
import os
Remove unused import.
The os module is imported but should not be used in this file per coding guidelines. Configuration values should be imported from config/settings.py instead.
As per coding guidelines, config values must come from config/settings.py (the only place os.environ is read); never access os.environ elsewhere in the codebase.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@flows/components/opensearch_multimodal.py` at line 5, Remove the unused
"import os" from flows/components/opensearch_multimodal.py and instead import
any needed configuration values from config/settings.py (e.g., from
config.settings import SOME_SETTING) so no os.environ reads remain in this
module; update any local references that would have used os.environ to use the
corresponding settings symbols and ensure imports and names (e.g., any function
or class in this file that relied on env vars) are updated accordingly.
def _get_min_env_int(key: str, default: int, minimum: int) -> int:
    try:
        value = int(os.getenv(key, default))
    except (TypeError, ValueError):
        value = default
    return max(value, minimum)


OPENSEARCH_NUMBER_OF_SHARDS = _get_min_env_int("OPENRAG_OPENSEARCH_NUMBER_OF_SHARDS", 1, 1)
OPENSEARCH_NUMBER_OF_REPLICAS = _get_min_env_int("OPENRAG_OPENSEARCH_NUMBER_OF_REPLICAS", 0, 0)
Import shard/replica constants from config/settings.py instead of reading environment variables directly.
This code violates the coding guideline which states: "Config values must come from config/settings.py (the only place os.environ is read); never access os.environ elsewhere in the codebase."
The OPENSEARCH_NUMBER_OF_SHARDS and OPENSEARCH_NUMBER_OF_REPLICAS constants already exist in config/settings.py (lines 460-461) and should be imported from there.
♻️ Proposed fix to import from config/settings.py
Remove the duplicated helper and constant definitions:
-import os
-
-
-def _get_min_env_int(key: str, default: int, minimum: int) -> int:
-    try:
-        value = int(os.getenv(key, default))
-    except (TypeError, ValueError):
-        value = default
-    return max(value, minimum)
-
-
-OPENSEARCH_NUMBER_OF_SHARDS = _get_min_env_int("OPENRAG_OPENSEARCH_NUMBER_OF_SHARDS", 1, 1)
-OPENSEARCH_NUMBER_OF_REPLICAS = _get_min_env_int("OPENRAG_OPENSEARCH_NUMBER_OF_REPLICAS", 0, 0)
Add to the imports section near line 15 (where other config imports exist):
 from config.embedding_constants import OPENAI_DEFAULT_EMBEDDING_MODEL
+from config.settings import OPENSEARCH_NUMBER_OF_SHARDS, OPENSEARCH_NUMBER_OF_REPLICAS
As per coding guidelines, config values must come from config/settings.py (the only place os.environ is read); never access os.environ elsewhere in the codebase.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@flows/components/opensearch_multimodal.py` around lines 34 - 43, Remove the
local _get_min_env_int helper and the OPENSEARCH_NUMBER_OF_SHARDS /
OPENSEARCH_NUMBER_OF_REPLICAS constant definitions and instead import the
existing constants from config.settings; update the imports near the other
config imports (around line where other config values are imported) to pull
OPENSEARCH_NUMBER_OF_SHARDS and OPENSEARCH_NUMBER_OF_REPLICAS from
config.settings, then delete the _get_min_env_int function and the two local
constant assignments in flows/components/opensearch_multimodal.py so all config
values come from config/settings.py as required.
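The guideline above amounts to parsing environment values in one module and importing the results everywhere else. The following is a rough sketch of what such a centralized helper could look like, modeled on the `_get_min_env_int` helper quoted in this comment. The `environ` parameter is an addition made here for testability, and the actual contents of config/settings.py are not shown in this review, so treat the layout as an assumption.

```python
import os
from collections.abc import Mapping


def get_min_env_int(
    key: str,
    default: int,
    minimum: int,
    environ: Mapping[str, str] = os.environ,
) -> int:
    """Read an integer setting, fall back to `default` on bad input, clamp to `minimum`."""
    try:
        value = int(environ.get(key, default))
    except (TypeError, ValueError):
        value = default
    return max(value, minimum)


# In a settings module these would be the only definitions of the two constants.
OPENSEARCH_NUMBER_OF_SHARDS = get_min_env_int("OPENRAG_OPENSEARCH_NUMBER_OF_SHARDS", 1, 1)
OPENSEARCH_NUMBER_OF_REPLICAS = get_min_env_int("OPENRAG_OPENSEARCH_NUMBER_OF_REPLICAS", 0, 0)

# Passing an explicit mapping exercises the parsing without touching the process environment.
sample_env = {"SHARDS": "3", "REPLICAS": "-5", "BROKEN": "foo"}
print(get_min_env_int("SHARDS", 1, 1, sample_env))
print(get_min_env_int("REPLICAS", 0, 0, sample_env))
print(get_min_env_int("BROKEN", 1, 1, sample_env))
```

Other modules would then import the constants, as in the proposed fix, rather than calling `os.getenv` themselves.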
from collections.abc import Callable
from dataclasses import dataclass
from dataclasses import field as dataclass_field
from typing import Optional
Remove the unused Optional import.
Ruff is already failing on Line 10 for this.
🧰 Tools
🪛 GitHub Actions: Lint Backend / 0_Ruff and mypy on changed files.txt
[error] 10-10: ruff check --no-fix failed (F401). typing.Optional imported but unused
🪛 GitHub Actions: Lint Backend / Ruff and mypy on changed files
[error] 10-10: Ruff (ruff check --no-fix) lint error: F401 typing.Optional imported but unused
🪛 GitHub Check: Ruff and mypy on changed files
[failure] 10-10: ruff (F401)
src/tui/config_fields.py:10:20: F401 typing.Optional imported but unused
help: Remove unused import: typing.Optional
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/tui/config_fields.py` at line 10, Remove the unused import Optional from
the import statement (current line imports "Optional") in config_fields.py;
update the top-level typing import so it no longer imports Optional (leaving
only used names) to satisfy Ruff and eliminate the unused-import error.
ConfigField(
    "opensearch_number_of_shards",
    "OPENRAG_OPENSEARCH_NUMBER_OF_SHARDS",
    "Primary Shards",
    placeholder="1",
    default="1",
    advanced=True,
    helper_text="Primary shard count for newly-created OpenRAG indices",
),
ConfigField(
    "opensearch_number_of_replicas",
    "OPENRAG_OPENSEARCH_NUMBER_OF_REPLICAS",
    "Replica Shards",
    placeholder="0",
    default="0",
    advanced=True,
    helper_text="Replica shard count for OpenRAG indices",
),
Validate shard/replica inputs before saving them.
These fields map to CRD-backed settings with minimums of shards >= 1 and replicas >= 0, but the TUI currently accepts arbitrary strings here. That lets users save values like foo or -1 and only discover the problem later during startup/reconciliation.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/tui/config_fields.py` around lines 125 - 142, The two ConfigField
instances "opensearch_number_of_shards" and "opensearch_number_of_replicas"
currently accept arbitrary strings; add input validation to ensure the shards
value parses to an integer >= 1 and the replicas value parses to an integer >= 0
before saving. Implement this by wiring a validator/cleaner on these ConfigField
definitions (or the form save handler used by them) that attempts int(...)
conversion, rejects non-numeric input with a clear helper/error message, and
rejects values outside the allowed ranges so invalid inputs like "foo" or "-1"
cannot be persisted.
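A minimal sketch of the validation described above follows. It assumes a validator is a plain function that takes the raw string and returns an `(ok, message)` pair; whether `ConfigField` accepts such a callable, and under what argument name, is not shown in this review, so the function names and the wiring are hypothetical.

```python
def validate_min_int(raw: str, minimum: int, label: str) -> tuple[bool, str]:
    """Return (ok, message); message is empty when the value is acceptable."""
    try:
        value = int(raw.strip())
    except ValueError:
        # Covers empty input, words such as "foo", and decimals such as "1.5".
        return False, f"{label} must be a whole number"
    if value < minimum:
        return False, f"{label} must be >= {minimum}"
    return True, ""


def validate_shards(raw: str) -> tuple[bool, str]:
    # Primary shards: at least 1.
    return validate_min_int(raw, 1, "Primary Shards")


def validate_replicas(raw: str) -> tuple[bool, str]:
    # Replica shards: at least 0.
    return validate_min_int(raw, 0, "Replica Shards")


for candidate in ("foo", "-1", "0", "1"):
    print(candidate, validate_shards(candidate), validate_replicas(candidate))
```

Returning a message alongside the boolean lets the form show why a value was rejected instead of silently discarding it.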
@@ -10,6 +10,7 @@
from typing import Dict, List, Optional
Drop the stale typing imports.
After switching this file to built-in generics, Dict/List are just leftover imports, and Ruff is already failing with UP035 here.
🧰 Tools
🪛 GitHub Actions: autofix.ci / 0_ruff autofix.txt
[error] 10-10: Ruff (UP035): typing.Dict is deprecated; use dict instead.
[error] 10-10: Ruff (UP035): typing.List is deprecated; use list instead.
🪛 GitHub Actions: autofix.ci / ruff autofix
[error] 10-10: ruff check failed (exit code 1). UP035: typing.Dict is deprecated, use dict instead.
[error] 10-10: ruff check failed (exit code 1). UP035: typing.List is deprecated, use list instead.
🪛 GitHub Check: Ruff and mypy on changed files
[failure] 10-10: ruff (F401)
src/tui/managers/env_manager.py:10:32: F401 typing.Optional imported but unused
help: Remove unused import
[failure] 10-10: ruff (F401)
src/tui/managers/env_manager.py:10:26: F401 typing.List imported but unused
help: Remove unused import
[failure] 10-10: ruff (F401)
src/tui/managers/env_manager.py:10:20: F401 typing.Dict imported but unused
help: Remove unused import
[failure] 10-10: ruff (UP035)
src/tui/managers/env_manager.py:10:1: UP035 typing.List is deprecated, use list instead
[failure] 10-10: ruff (UP035)
src/tui/managers/env_manager.py:10:1: UP035 typing.Dict is deprecated, use dict instead
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/tui/managers/env_manager.py` at line 10, Remove the stale typing imports
in src/tui/managers/env_manager.py: drop Dict and List from the from typing
import line (or remove the entire import if Optional is also unused), and update
any type annotations to use built-in generics (e.g., dict, list) or keep only
Optional if still referenced; ensure the import line no longer includes Dict or
List to resolve the UP035 Ruff error.
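For reference, this is roughly what an annotation looks like once `Dict` and `List` are replaced with built-in generics (available at runtime on Python 3.9+). The function `first_value` is a made-up example, not code from env_manager.py, and `Optional` is kept here only because this sketch still uses it.

```python
from typing import Optional


def first_value(env: dict[str, list[str]], key: str) -> Optional[str]:
    """Return the first value stored under `key`, or None when there is none."""
    values = env.get(key, [])
    return values[0] if values else None


print(first_value({"HOSTS": ["a.example", "b.example"]}, "HOSTS"))
print(first_value({}, "HOSTS"))
```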
Summary by CodeRabbit
New Features
Documentation
Tests