fix: update ssl verification for docling #1813

Merged
lucaseduoli merged 5 commits into main from fix/docling_tls
Jun 10, 2026

Conversation

@lucaseduoli lucaseduoli commented Jun 9, 2026

Collaborator

This pull request adds support for configuring SSL certificate verification when connecting to Docling Serve, making it possible to disable SSL verification if needed (for example, in local or development environments). The change introduces a new environment variable, input field, and logic for handling the SSL verification flag throughout the codebase.

Configuration and environment variable updates:

  • Added the DOCLING_SERVE_VERIFY_SSL environment variable to docker-compose.yml and included it in the LANGFLOW_VARIABLES_TO_GET_FROM_ENVIRONMENT list, allowing users to control SSL verification via environment configuration.

Component and input enhancements:

  • Introduced a new verify_ssl input field in the DoclingRemoteComponent, making SSL verification configurable from the UI or environment.
  • Implemented the _get_verify_ssl method to interpret the verify_ssl input and determine whether SSL verification should be enabled.

Networking and client usage:

  • Updated all usages of httpx.Client in DoclingRemoteComponent to use the SSL verification flag by passing the result of _get_verify_ssl(), ensuring consistent behavior during API calls.
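
To make concrete what the flag controls, here is a minimal standard-library sketch; it is not code from this PR. The PR passes a plain boolean to httpx.Client(verify=...). httpx also accepts an ssl.SSLContext for that parameter, and the hypothetical helper make_ssl_context below shows the two TLS configurations that the boolean roughly corresponds to.

```python
import ssl


def make_ssl_context(verify: bool) -> ssl.SSLContext:
    """Hypothetical helper: build a client-side TLS context for the given flag."""
    # Secure defaults: certificate validation and hostname checking are both on.
    context = ssl.create_default_context()
    if not verify:
        # Hostname checking has to be switched off before verify_mode can be
        # set to CERT_NONE; the ssl module raises ValueError otherwise.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context


if __name__ == "__main__":
    for flag in (True, False):
        ctx = make_ssl_context(flag)
        print(flag, ctx.verify_mode.name, ctx.check_hostname)
```

With verification off, the client accepts any certificate, including self-signed ones, which is why the description limits the use case to local or development environments.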

Summary by CodeRabbit

  • New Features

    • Configurable SSL certificate verification for external document-processing connections via component UI input and environment variable.
  • Chores

    • Surface and propagate the SSL verification setting across ingestion flows, job runs, deployment templates, Helm values, operator defaults, and CRD schema (default: false).
    • Ensure runtimes receive the setting when invoking external services.
  • Tests

    • Updated integration test to assert the new environment variable and its default.

@lucaseduoli lucaseduoli requested a review from zzzming June 9, 2026 16:38
@lucaseduoli lucaseduoli self-assigned this Jun 9, 2026
@coderabbitai

coderabbitai Bot commented Jun 9, 2026

Contributor

Walkthrough

This PR adds configurable SSL certificate verification for Docling Serve HTTP connections across the system. DOCLING_SERVE_VERIFY_SSL is exposed in compose/Helm/operator manifests, threaded into Langflow run headers, added as a verify_ssl component input, and applied to all httpx client calls in the DoclingRemote flows.

Changes

Docling Serve SSL Verification

Layer: Environment, Helm, and operator integration
File(s): docker-compose.yml, kubernetes/helm/openrag/templates/backend/backend-dotenv.yaml, kubernetes/helm/openrag/templates/langflow/langflow-dotenv.yaml, kubernetes/helm/openrag/values.yaml, kubernetes/operator/api/v1alpha1/openrag_types.go, kubernetes/operator/internal/controller/env.go, kubernetes/operator/internal/controller/openrag_controller.go, kubernetes/operator/internal/controller/env_test.go, kubernetes/operator/api/v1alpha1/zz_generated.deepcopy.go, kubernetes/operator/config/crd/bases/openr.ag_openrags.yaml, src/services/langflow_file_service.py
Summary: Adds DOCLING_SERVE_VERIFY_SSL to Docker Compose and Helm templates/values, exposes it in generated .env for backend/langflow, adds verifySsl to the operator CRD/spec and deepcopy handling, updates controller defaults and tests, and passes the variable into Langflow run headers.

Layer: Component input and SSL verification
File(s): flows/components/docling_remote.py
Summary: Adds verify_ssl advanced/optional input (env-default DOCLING_SERVE_VERIFY_SSL), implements _get_verify_ssl() to normalize values to boolean, and passes verify=self._get_verify_ssl() to httpx.Client for task polling and file conversion.

Layer: Flow manifest configuration
File(s): flows/ingestion_flow.json
Summary: Adds verify_ssl to the DoclingRemote node field_order and UI/template; embeds component code changes mirroring the standalone component (helper and verify= usage).
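
The step of passing the variable into Langflow run headers could look roughly like the sketch below. The header naming actually used in src/services/langflow_file_service.py is not shown on this page, so EXAMPLE_HEADER_PREFIX, build_run_headers, and the fallback passed as default are all assumptions made for illustration.

```python
from __future__ import annotations

from collections.abc import Mapping

# Assumption: this prefix is purely illustrative; the real header name used by
# langflow_file_service.py does not appear in this conversation.
EXAMPLE_HEADER_PREFIX = "X-EXAMPLE-GLOBAL-VAR-"
ENV_NAME = "DOCLING_SERVE_VERIFY_SSL"


def build_run_headers(environ: Mapping[str, str], default: str) -> dict[str, str]:
    """Copy the SSL setting from the environment into per-run request headers."""
    # Fall back to the caller-supplied default when the variable is unset.
    value = environ.get(ENV_NAME, default)
    return {f"{EXAMPLE_HEADER_PREFIX}{ENV_NAME}": value}


if __name__ == "__main__":
    import os

    # The "true" fallback here is an illustrative choice, not the PR's default.
    print(build_run_headers(os.environ, default="true"))
```

Taking the environment as a mapping argument keeps the helper easy to test without mutating os.environ.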

Sequence Diagram

sequenceDiagram
  participant DoclingRemoteComponent
  participant _get_verify_ssl
  participant httpxClient
  participant DoclingServe
  DoclingRemoteComponent->>_get_verify_ssl: read `verify_ssl` / env
  _get_verify_ssl->>httpxClient: return boolean `verify`
  httpxClient->>DoclingServe: HTTPS request (verify=boolean)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • langflow-ai/openrag#1817: Implements the same Docling Serve SSL verification toggle end-to-end, including DOCLING_SERVE_VERIFY_SSL wiring and verify_ssl component input passed to httpx.Client(verify=...).

Suggested labels

enhancement

Suggested reviewers

  • zzzming
  • edwinjosechittilappilly
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 55.56% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title 'fix: update ssl verification for docling' clearly and concisely summarizes the main change: adding configurable SSL certificate verification for Docling connections across the codebase.
Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request.


@github-actions github-actions Bot added the backend 🔷, docker, and bug 🔴 labels and removed the bug 🔴 label on Jun 9, 2026

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@flows/components/docling_remote.py`:
- Around line 136-144: The parser for the verify_ssl StrInput currently treats
any string not in a tiny truthy list as False, which disables TLS on typos/blank
values; update the parsing/consumption of the StrInput named "verify_ssl" so you
first trim and lowercase the value, treat empty/missing/unresolved placeholders
as secure (True), and return False only for an explicit falsy allowlist (e.g.
"false","0","no") while treating every other value as True. Then pass that
boolean to both Docling client creation sites (the two places where verify_ssl
is read/used) so TLS verification defaults to enabled unless the user
intentionally sets a valid false token.

In `@flows/ingestion_flow.json`:
- Line 739: The verify_ssl input defaults to the literal placeholder
"DOCLING_SERVE_VERIFY_SSL" and _get_verify_ssl() treats any non-true string as
False, so unresolved env placeholders disable TLS; fix by (1) changing the
StrInput default or handling the placeholder in _get_verify_ssl(): in function
_get_verify_ssl(), detect the unresolved placeholder literal (e.g.,
"DOCLING_SERVE_VERIFY_SSL") and treat it as True (preserve secure default), also
treat None/empty string as True, keep existing boolean and "true"/"1"/"yes"
handling; update the StrInput declaration (name="verify_ssl") only if you prefer
to set a safer default value instead of relying on placeholder.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9e57dd97-7da8-4d99-a39a-400cede33c20

📥 Commits

Reviewing files that changed from the base of the PR and between 03f1c91 and 82af609.

📒 Files selected for processing (4)
  • docker-compose.yml
  • flows/components/docling_remote.py
  • flows/ingestion_flow.json
  • src/services/langflow_file_service.py

Comment on lines +136 to +144
        StrInput(
            name="verify_ssl",
            display_name="Verify SSL",
            info="Whether to verify SSL certificates for Docling Serve.",
            value="DOCLING_SERVE_VERIFY_SSL",
            load_from_db=True,
            required=False,
            advanced=True,
        ),

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Default unknown verify_ssl values to secure True.

This parser disables certificate verification for any string outside a tiny truthy allowlist. Because verify_ssl is a free-form StrInput, a typo, extra whitespace, blank value, or unresolved placeholder will silently turn TLS verification off for both Docling client paths.
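
The failure mode described here can be reproduced with a short sketch. original_parse copies the branching of the helper under review into a free function; the function name and the sample values are illustrative and not part of the PR.

```python
from __future__ import annotations

from typing import Any


def original_parse(verify: Any = "true") -> bool:
    """Same branching as the helper under review, lifted out of the class."""
    if isinstance(verify, bool):
        return verify
    if isinstance(verify, str):
        # Only these three exact tokens (in any letter case) keep verification on.
        return verify.lower() in ("true", "1", "yes")
    return True


if __name__ == "__main__":
    samples = ["true", " true ", "ture", "", "DOCLING_SERVE_VERIFY_SSL", None]
    for sample in samples:
        print(f"{sample!r:28} -> verify={original_parse(sample)}")
```

Of these samples, only "true" and None leave verification enabled; the padded, misspelled, blank, and placeholder strings all come back False.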

Suggested fix
     def _get_verify_ssl(self) -> bool:
         """Determine whether to verify SSL certificates for Docling Serve.
 
         Returns:
             bool: True if SSL verification should be enforced, False otherwise.
         """
-        verify = getattr(self, "verify_ssl", "true")
+        verify = getattr(self, "verify_ssl", True)
         if isinstance(verify, bool):
             return verify
         if isinstance(verify, str):
-            return verify.lower() in ("true", "1", "yes")
+            normalized = verify.strip().lower()
+            if normalized in {"false", "0", "no"}:
+                return False
+            return True
         return True

Also applies to: 293-304
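
As a runnable restatement of the suggestion, the sketch below expresses the same fail-closed rule as a free function. fail_closed_parse is a hypothetical name; the false tokens are the ones listed in the suggested diff.

```python
from __future__ import annotations

from typing import Any

# Tokens that explicitly turn verification off, taken from the suggested diff.
FALSE_TOKENS = frozenset({"false", "0", "no"})


def fail_closed_parse(verify: Any = True) -> bool:
    """Disable verification only when an explicit false token is given."""
    if isinstance(verify, bool):
        return verify
    if isinstance(verify, str):
        # Trim and lowercase first so " FALSE " is still recognized.
        return verify.strip().lower() not in FALSE_TOKENS
    # Missing or unexpected types keep the secure default.
    return True


if __name__ == "__main__":
    for sample in ["false", " FALSE ", "0", "ture", "", "DOCLING_SERVE_VERIFY_SSL", None]:
        print(f"{sample!r:28} -> verify={fail_closed_parse(sample)}")
```

The design choice is that ambiguity resolves toward the secure setting: a typo can no longer turn certificate checks off, although it also cannot turn them off when the user meant to.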

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@flows/components/docling_remote.py` around lines 136 - 144, The parser for
the verify_ssl StrInput currently treats any string not in a tiny truthy list as
False, which disables TLS on typos/blank values; update the parsing/consumption
of the StrInput named "verify_ssl" so you first trim and lowercase the value,
treat empty/missing/unresolved placeholders as secure (True), and return False
only for an explicit falsy allowlist (e.g. "false","0","no") while treating
every other value as True. Then pass that boolean to both Docling client
creation sites (the two places where verify_ssl is read/used) so TLS
verification defaults to enabled unless the user intentionally sets a valid
false token.

Comment thread flows/ingestion_flow.json
"title_case": false,
"type": "code",
"value": "from __future__ import annotations\n\nimport base64\nimport json\nimport time\nfrom concurrent.futures import Future, ThreadPoolExecutor\nfrom pathlib import Path # noqa: TC003\nfrom typing import Any\n\nimport httpx\nfrom docling_core.types.doc import DoclingDocument\nfrom pydantic import ValidationError\n\nfrom lfx.base.data import BaseFileComponent\nfrom lfx.inputs import IntInput, NestedDictInput, StrInput, TableInput\nfrom lfx.inputs.inputs import FloatInput\nfrom lfx.schema import Data, dotdict\nfrom lfx.utils.util import transform_localhost_url\n\n\nclass DoclingRemoteComponent(BaseFileComponent):\n display_name = \"Docling Serve\"\n description = (\n \"Uses Docling to process input documents connecting to your instance of Docling Serve.\"\n )\n documentation = \"https://docling-project.github.io/docling/\"\n trace_type = \"tool\"\n icon = \"Docling\"\n name = \"DoclingRemote\"\n\n MAX_500_RETRIES = 5\n\n # https://docling-project.github.io/docling/usage/supported_formats/\n VALID_EXTENSIONS = [\n \"adoc\",\n \"asciidoc\",\n \"asc\",\n \"bmp\",\n \"csv\",\n \"dotx\",\n \"dotm\",\n \"docm\",\n \"docx\",\n \"htm\",\n \"html\",\n \"jpeg\",\n \"jpg\",\n \"json\",\n \"md\",\n \"pdf\",\n \"png\",\n \"potx\",\n \"ppsx\",\n \"pptm\",\n \"potm\",\n \"ppsm\",\n \"pptx\",\n \"tiff\",\n \"txt\",\n \"xls\",\n \"xlsx\",\n \"xhtml\",\n \"xml\",\n \"webp\",\n ]\n\n inputs = [\n *BaseFileComponent.get_base_inputs(),\n StrInput(\n name=\"api_url\",\n display_name=\"Server address\",\n info=\"URL of the Docling Serve instance.\",\n required=True,\n ),\n StrInput(\n name=\"task_id\",\n display_name=\"Task ID\",\n info=(\n \"Optional task ID from a previous Docling Serve upload. 
\"\n \"If provided, file input is ignored and the component polls for this task's results.\"\n ),\n required=False,\n ),\n IntInput(\n name=\"max_concurrency\",\n display_name=\"Concurrency\",\n info=\"Maximum number of concurrent requests for the server.\",\n advanced=True,\n value=2,\n input_types=[\"Message\"],\n ),\n FloatInput(\n name=\"max_poll_timeout\",\n display_name=\"Maximum poll time\",\n info=\"Maximum waiting time for the document conversion to complete.\",\n advanced=True,\n value=3600,\n input_types=[\"Message\"],\n ),\n TableInput(\n name=\"api_headers\",\n display_name=\"HTTP headers\",\n advanced=True,\n required=False,\n info=(\"Optional headers required for connecting to Docling Serve.\"),\n table_schema=[\n {\n \"name\": \"key\",\n \"display_name\": \"Key\",\n \"type\": \"string\",\n \"description\": \"Key name\",\n },\n {\n \"name\": \"value\",\n \"display_name\": \"Value\",\n \"load_from_db\": True,\n \"type\": \"string\",\n \"description\": \"Value of the header\",\n },\n ],\n value=[],\n real_time_refresh=True,\n input_types=[\"Data\", \"JSON\"],\n ),\n NestedDictInput(\n name=\"docling_serve_opts\",\n display_name=\"Docling options\",\n advanced=True,\n required=False,\n info=(\n \"Optional dictionary of additional options. 
\"\n \"See https://github.com/docling-project/docling-serve/blob/main/docs/usage.md for more information.\"\n ),\n input_types=[\"Message\"],\n ),\n ]\n\n outputs = [\n *BaseFileComponent.get_base_outputs(),\n ]\n\n def _process_headers(self) -> dict[str, str]:\n \"\"\"Process the headers input into a valid dictionary.\"\"\"\n if not self.api_headers:\n return {}\n\n component_headers_dict = {}\n # TableInput normalizes to list\n items = self.api_headers if isinstance(self.api_headers, list) else [self.api_headers]\n\n for item in items:\n if not item:\n continue\n\n # Case 1: Data object\n if hasattr(item, \"data\") and isinstance(item.data, dict):\n data = item.data\n if \"key\" in data and \"value\" in data:\n component_headers_dict[str(data[\"key\"])] = str(data[\"value\"])\n else:\n # Fallback: merge all keys from Data object\n for k, v in data.items():\n if k not in (\"text_key\", \"default_value\"):\n component_headers_dict[str(k)] = str(v)\n\n # Case 2: Dictionary (Table row)\n elif isinstance(item, dict):\n if \"key\" in item and \"value\" in item:\n component_headers_dict[str(item[\"key\"])] = str(item[\"value\"])\n else:\n # Fallback: merge all keys\n for k, v in item.items():\n component_headers_dict[str(k)] = str(v)\n\n # Case 3: Message object\n elif hasattr(item, \"text\") and isinstance(item.text, str):\n try:\n parsed = json.loads(item.text)\n if isinstance(parsed, dict):\n for k, v in parsed.items():\n component_headers_dict[str(k)] = str(v)\n except json.JSONDecodeError:\n pass\n\n return component_headers_dict\n\n def update_build_config(\n self, build_config: dotdict, field_value: Any, field_name: str | None = None\n ) -> dotdict:\n if field_name == \"api_headers\":\n if isinstance(field_value, dict):\n # If it's a dict, convert to list of {key, value} pairs for TableInput\n # This handles migration from NestedDictInput to TableInput\n new_value = [{\"key\": k, \"value\": v} for k, v in field_value.items()]\n 
build_config[\"api_headers\"][\"value\"] = new_value\n return build_config\n if field_value is None:\n build_config[\"api_headers\"][\"value\"] = []\n return build_config\n\n # Default behavior\n return super().update_build_config(build_config, field_value, field_name)\n\n def _poll_and_fetch_result(\n self, client: httpx.Client, base_url: str, task_id: str, file_path: str | None = None\n ) -> Data | None:\n \"\"\"Poll for task completion and fetch the result.\n\n Args:\n client: The HTTP client to use for requests.\n base_url: The base URL of the Docling Serve API.\n task_id: The task ID to poll for.\n file_path: Optional file path to include in the result data.\n\n Returns:\n Data object with the DoclingDocument, or None if processing failed.\n \"\"\"\n http_failures = 0\n retry_status_start = 500\n retry_status_end = 600\n start_wait_time = time.monotonic()\n\n response = client.get(f\"{base_url}/status/poll/{task_id}\")\n response.raise_for_status()\n task = response.json()\n\n while task[\"task_status\"] not in (\"success\", \"failure\"):\n processing_time = time.monotonic() - start_wait_time\n if processing_time >= self.max_poll_timeout:\n msg = (\n f\"Processing time {processing_time=} exceeds the maximum poll timeout {self.max_poll_timeout=}.\"\n \"Please increase the max_poll_timeout parameter or review why the processing \"\n \"takes long on the server.\"\n )\n self.log(msg)\n raise RuntimeError(msg)\n\n time.sleep(2)\n response = client.get(f\"{base_url}/status/poll/{task_id}\")\n\n if retry_status_start <= response.status_code < retry_status_end:\n http_failures += 1\n if http_failures > self.MAX_500_RETRIES:\n self.log(\n f\"The status requests got a http response {response.status_code} too many times.\"\n )\n return None\n continue\n\n task = response.json()\n\n result_resp = client.get(f\"{base_url}/result/{task_id}\")\n result_resp.raise_for_status()\n result = result_resp.json()\n\n if result.get(\"status\") == \"failure\" or 
result.get(\"errors\"):\n errors = result.get(\"errors\", [])\n err_msg_list = []\n for err in errors:\n if isinstance(err, dict) and \"error_message\" in err:\n err_msg_list.append(err[\"error_message\"])\n elif isinstance(err, str):\n err_msg_list.append(err)\n\n err_details = \"; \".join(err_msg_list) if err_msg_list else \"Unknown Docling processing error\"\n\n msg = f\"Docling processing failed: {err_details}\"\n raise ValueError(msg)\n\n if \"json_content\" not in result[\"document\"] or result[\"document\"][\"json_content\"] is None:\n self.log(\"No JSON DoclingDocument found in the result.\")\n return None\n\n try:\n doc = DoclingDocument.model_validate(result[\"document\"][\"json_content\"])\n data_dict: dict[str, Any] = {\"doc\": doc}\n if file_path:\n data_dict[\"file_path\"] = file_path\n return Data(data=data_dict)\n except ValidationError as e:\n self.log(f\"Error validating the document. {e}\")\n return None\n\n def _process_task_id(self) -> list[Data]:\n \"\"\"Process an existing task by polling for status and retrieving results.\n\n Returns:\n List containing the result Data object, or empty list if processing failed.\n \"\"\"\n transformed_url = transform_localhost_url(self.api_url)\n base_url = f\"{transformed_url}/v1\"\n\n with httpx.Client(headers=self._process_headers()) as client:\n result = self._poll_and_fetch_result(client, base_url, self.task_id)\n return [result] if result else []\n\n def load_files_base(self) -> list[Data]:\n \"\"\"Load and process files, or poll an existing task if task_id is provided.\n\n Returns:\n list[Data]: Parsed data from the processed files or task.\n \"\"\"\n if self.task_id:\n return self._process_task_id()\n return super().load_files_base()\n\n def process_files(\n self, file_list: list[BaseFileComponent.BaseFile]\n ) -> list[BaseFileComponent.BaseFile]:\n transformed_url = transform_localhost_url(self.api_url)\n base_url = f\"{transformed_url}/v1\"\n\n def _convert_document(\n client: httpx.Client, 
file_path: Path, options: dict[str, Any]\n ) -> Data | None:\n encoded_doc = base64.b64encode(file_path.read_bytes()).decode()\n payload = {\n \"options\": options,\n \"sources\": [\n {\"kind\": \"file\", \"base64_string\": encoded_doc, \"filename\": file_path.name}\n ],\n }\n\n response = client.post(f\"{base_url}/convert/source/async\", json=payload)\n response.raise_for_status()\n task = response.json()\n\n return self._poll_and_fetch_result(client, base_url, task[\"task_id\"], str(file_path))\n\n docling_options = {\n \"to_formats\": [\"json\"],\n \"image_export_mode\": \"placeholder\",\n **(self.docling_serve_opts or {}),\n }\n\n processed_data: list[Data | None] = []\n with (\n httpx.Client(headers=self._process_headers()) as client,\n ThreadPoolExecutor(max_workers=self.max_concurrency) as executor,\n ):\n futures: list[tuple[int, Future]] = []\n for i, file in enumerate(file_list):\n if file.path is None:\n processed_data.append(None)\n continue\n\n futures.append(\n (i, executor.submit(_convert_document, client, file.path, docling_options))\n )\n\n for _index, future in futures:\n try:\n result_data = future.result()\n processed_data.append(result_data)\n except (httpx.HTTPStatusError, httpx.RequestError, KeyError, ValueError) as exc:\n self.log(f\"Docling remote processing failed: {exc}\")\n raise\n\n return self.rollup_data(file_list, processed_data)\n"
"value": "from __future__ import annotations\n\nimport base64\nimport json\nimport time\nfrom concurrent.futures import Future, ThreadPoolExecutor\nfrom pathlib import Path # noqa: TC003\nfrom typing import Any\n\nimport httpx\nfrom docling_core.types.doc import DoclingDocument\nfrom pydantic import ValidationError\n\nfrom lfx.base.data import BaseFileComponent\nfrom lfx.inputs import IntInput, NestedDictInput, StrInput, TableInput\nfrom lfx.inputs.inputs import FloatInput\nfrom lfx.schema import Data, dotdict\nfrom lfx.utils.util import transform_localhost_url\n\n\nclass DoclingRemoteComponent(BaseFileComponent):\n display_name = \"Docling Serve\"\n description = (\n \"Uses Docling to process input documents connecting to your instance of Docling Serve.\"\n )\n documentation = \"https://docling-project.github.io/docling/\"\n trace_type = \"tool\"\n icon = \"Docling\"\n name = \"DoclingRemote\"\n\n MAX_500_RETRIES = 5\n\n # https://docling-project.github.io/docling/usage/supported_formats/\n VALID_EXTENSIONS = [\n \"adoc\",\n \"asciidoc\",\n \"asc\",\n \"bmp\",\n \"csv\",\n \"dotx\",\n \"dotm\",\n \"docm\",\n \"docx\",\n \"htm\",\n \"html\",\n \"jpeg\",\n \"jpg\",\n \"json\",\n \"md\",\n \"pdf\",\n \"png\",\n \"potx\",\n \"ppsx\",\n \"pptm\",\n \"potm\",\n \"ppsm\",\n \"pptx\",\n \"tiff\",\n \"txt\",\n \"xls\",\n \"xlsx\",\n \"xhtml\",\n \"xml\",\n \"webp\",\n ]\n\n inputs = [\n *BaseFileComponent.get_base_inputs(),\n StrInput(\n name=\"api_url\",\n display_name=\"Server address\",\n info=\"URL of the Docling Serve instance.\",\n required=True,\n ),\n StrInput(\n name=\"task_id\",\n display_name=\"Task ID\",\n info=(\n \"Optional task ID from a previous Docling Serve upload. 
\"\n \"If provided, file input is ignored and the component polls for this task's results.\"\n ),\n required=False,\n ),\n IntInput(\n name=\"max_concurrency\",\n display_name=\"Concurrency\",\n info=\"Maximum number of concurrent requests for the server.\",\n advanced=True,\n value=2,\n input_types=[\"Message\"],\n ),\n FloatInput(\n name=\"max_poll_timeout\",\n display_name=\"Maximum poll time\",\n info=\"Maximum waiting time for the document conversion to complete.\",\n advanced=True,\n value=3600,\n input_types=[\"Message\"],\n ),\n TableInput(\n name=\"api_headers\",\n display_name=\"HTTP headers\",\n advanced=True,\n required=False,\n info=(\"Optional headers required for connecting to Docling Serve.\"),\n table_schema=[\n {\n \"name\": \"key\",\n \"display_name\": \"Key\",\n \"type\": \"string\",\n \"description\": \"Key name\",\n },\n {\n \"name\": \"value\",\n \"display_name\": \"Value\",\n \"load_from_db\": True,\n \"type\": \"string\",\n \"description\": \"Value of the header\",\n },\n ],\n value=[],\n real_time_refresh=True,\n input_types=[\"Data\", \"JSON\"],\n ),\n NestedDictInput(\n name=\"docling_serve_opts\",\n display_name=\"Docling options\",\n advanced=True,\n required=False,\n info=(\n \"Optional dictionary of additional options. 
\"\n \"See https://github.com/docling-project/docling-serve/blob/main/docs/usage.md for more information.\"\n ),\n input_types=[\"Message\"],\n ),\n StrInput(\n name=\"verify_ssl\",\n display_name=\"Verify SSL\",\n info=\"Whether to verify SSL certificates for Docling Serve.\",\n value=\"DOCLING_SERVE_VERIFY_SSL\",\n load_from_db=True,\n required=False,\n advanced=True,\n ),\n ]\n\n outputs = [\n *BaseFileComponent.get_base_outputs(),\n ]\n\n def _process_headers(self) -> dict[str, str]:\n \"\"\"Process the headers input into a valid dictionary.\"\"\"\n if not self.api_headers:\n return {}\n\n component_headers_dict = {}\n # TableInput normalizes to list\n items = self.api_headers if isinstance(self.api_headers, list) else [self.api_headers]\n\n for item in items:\n if not item:\n continue\n\n # Case 1: Data object\n if hasattr(item, \"data\") and isinstance(item.data, dict):\n data = item.data\n if \"key\" in data and \"value\" in data:\n component_headers_dict[str(data[\"key\"])] = str(data[\"value\"])\n else:\n # Fallback: merge all keys from Data object\n for k, v in data.items():\n if k not in (\"text_key\", \"default_value\"):\n component_headers_dict[str(k)] = str(v)\n\n # Case 2: Dictionary (Table row)\n elif isinstance(item, dict):\n if \"key\" in item and \"value\" in item:\n component_headers_dict[str(item[\"key\"])] = str(item[\"value\"])\n else:\n # Fallback: merge all keys\n for k, v in item.items():\n component_headers_dict[str(k)] = str(v)\n\n # Case 3: Message object\n elif hasattr(item, \"text\") and isinstance(item.text, str):\n try:\n parsed = json.loads(item.text)\n if isinstance(parsed, dict):\n for k, v in parsed.items():\n component_headers_dict[str(k)] = str(v)\n except json.JSONDecodeError:\n pass\n\n return component_headers_dict\n\n def update_build_config(\n self, build_config: dotdict, field_value: Any, field_name: str | None = None\n ) -> dotdict:\n if field_name == \"api_headers\":\n if isinstance(field_value, dict):\n # If it's a 
dict, convert to list of {key, value} pairs for TableInput\n # This handles migration from NestedDictInput to TableInput\n new_value = [{\"key\": k, \"value\": v} for k, v in field_value.items()]\n build_config[\"api_headers\"][\"value\"] = new_value\n return build_config\n if field_value is None:\n build_config[\"api_headers\"][\"value\"] = []\n return build_config\n\n # Default behavior\n return super().update_build_config(build_config, field_value, field_name)\n\n def _poll_and_fetch_result(\n self, client: httpx.Client, base_url: str, task_id: str, file_path: str | None = None\n ) -> Data | None:\n \"\"\"Poll for task completion and fetch the result.\n\n Args:\n client: The HTTP client to use for requests.\n base_url: The base URL of the Docling Serve API.\n task_id: The task ID to poll for.\n file_path: Optional file path to include in the result data.\n\n Returns:\n Data object with the DoclingDocument, or None if processing failed.\n \"\"\"\n http_failures = 0\n retry_status_start = 500\n retry_status_end = 600\n start_wait_time = time.monotonic()\n\n response = client.get(f\"{base_url}/status/poll/{task_id}\")\n response.raise_for_status()\n task = response.json()\n\n while task[\"task_status\"] not in (\"success\", \"failure\"):\n processing_time = time.monotonic() - start_wait_time\n if processing_time >= self.max_poll_timeout:\n msg = (\n f\"Processing time {processing_time=} exceeds the maximum poll timeout {self.max_poll_timeout=}.\"\n \"Please increase the max_poll_timeout parameter or review why the processing \"\n \"takes long on the server.\"\n )\n self.log(msg)\n raise RuntimeError(msg)\n\n time.sleep(2)\n response = client.get(f\"{base_url}/status/poll/{task_id}\")\n\n if retry_status_start <= response.status_code < retry_status_end:\n http_failures += 1\n if http_failures > self.MAX_500_RETRIES:\n self.log(\n f\"The status requests got a http response {response.status_code} too many times.\"\n )\n return None\n continue\n\n task = 
response.json()\n\n result_resp = client.get(f\"{base_url}/result/{task_id}\")\n result_resp.raise_for_status()\n result = result_resp.json()\n\n if result.get(\"status\") == \"failure\" or result.get(\"errors\"):\n errors = result.get(\"errors\", [])\n err_msg_list = []\n for err in errors:\n if isinstance(err, dict) and \"error_message\" in err:\n err_msg_list.append(err[\"error_message\"])\n elif isinstance(err, str):\n err_msg_list.append(err)\n\n err_details = \"; \".join(err_msg_list) if err_msg_list else \"Unknown Docling processing error\"\n\n msg = f\"Docling processing failed: {err_details}\"\n raise ValueError(msg)\n\n if \"json_content\" not in result[\"document\"] or result[\"document\"][\"json_content\"] is None:\n self.log(\"No JSON DoclingDocument found in the result.\")\n return None\n\n try:\n doc = DoclingDocument.model_validate(result[\"document\"][\"json_content\"])\n data_dict: dict[str, Any] = {\"doc\": doc}\n if file_path:\n data_dict[\"file_path\"] = file_path\n return Data(data=data_dict)\n except ValidationError as e:\n self.log(f\"Error validating the document. 
{e}\")\n return None\n\n def _get_verify_ssl(self) -> bool:\n \"\"\"Determine whether to verify SSL certificates for Docling Serve.\n\n Returns:\n bool: True if SSL verification should be enforced, False otherwise.\n \"\"\"\n verify = getattr(self, \"verify_ssl\", \"true\")\n if isinstance(verify, bool):\n return verify\n if isinstance(verify, str):\n return verify.lower() in (\"true\", \"1\", \"yes\")\n return True\n\n def _process_task_id(self) -> list[Data]:\n \"\"\"Process an existing task by polling for status and retrieving results.\n\n Returns:\n List containing the result Data object, or empty list if processing failed.\n \"\"\"\n transformed_url = transform_localhost_url(self.api_url)\n base_url = f\"{transformed_url}/v1\"\n\n with httpx.Client(headers=self._process_headers(), verify=self._get_verify_ssl()) as client:\n result = self._poll_and_fetch_result(client, base_url, self.task_id)\n return [result] if result else []\n\n def load_files_base(self) -> list[Data]:\n \"\"\"Load and process files, or poll an existing task if task_id is provided.\n\n Returns:\n list[Data]: Parsed data from the processed files or task.\n \"\"\"\n if self.task_id:\n return self._process_task_id()\n return super().load_files_base()\n\n def process_files(\n self, file_list: list[BaseFileComponent.BaseFile]\n ) -> list[BaseFileComponent.BaseFile]:\n transformed_url = transform_localhost_url(self.api_url)\n base_url = f\"{transformed_url}/v1\"\n\n def _convert_document(\n client: httpx.Client, file_path: Path, options: dict[str, Any]\n ) -> Data | None:\n encoded_doc = base64.b64encode(file_path.read_bytes()).decode()\n payload = {\n \"options\": options,\n \"sources\": [\n {\"kind\": \"file\", \"base64_string\": encoded_doc, \"filename\": file_path.name}\n ],\n }\n\n response = client.post(f\"{base_url}/convert/source/async\", json=payload)\n response.raise_for_status()\n task = response.json()\n\n return self._poll_and_fetch_result(client, base_url, task[\"task_id\"], 
str(file_path))\n\n docling_options = {\n \"to_formats\": [\"json\"],\n \"image_export_mode\": \"placeholder\",\n **(self.docling_serve_opts or {}),\n }\n\n processed_data: list[Data | None] = []\n with (\n httpx.Client(headers=self._process_headers(), verify=self._get_verify_ssl()) as client,\n ThreadPoolExecutor(max_workers=self.max_concurrency) as executor,\n ):\n futures: list[tuple[int, Future]] = []\n for i, file in enumerate(file_list):\n if file.path is None:\n processed_data.append(None)\n continue\n\n futures.append(\n (i, executor.submit(_convert_document, client, file.path, docling_options))\n )\n\n for _index, future in futures:\n try:\n result_data = future.result()\n processed_data.append(result_data)\n except (httpx.HTTPStatusError, httpx.RequestError, KeyError, ValueError) as exc:\n self.log(f\"Docling remote processing failed: {exc}\")\n raise\n\n return self.rollup_data(file_list, processed_data)\n"

Contributor


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve secure TLS defaults when the env placeholder is unresolved.

verify_ssl defaults to the literal string DOCLING_SERVE_VERIFY_SSL, but _get_verify_ssl() treats every string except "true", "1", and "yes" as False. That means an unset/unresolved env value silently disables certificate verification instead of keeping it enabled.

🔧 Suggested fix
     def _get_verify_ssl(self) -> bool:
-        verify = getattr(self, "verify_ssl", "true")
+        verify = getattr(self, "verify_ssl", True)
         if isinstance(verify, bool):
             return verify
+        if verify is None:
+            return True
         if isinstance(verify, str):
-            return verify.lower() in ("true", "1", "yes")
+            normalized = verify.strip()
+            if not normalized or normalized == "DOCLING_SERVE_VERIFY_SSL":
+                return True
+            normalized = normalized.lower()
+            if normalized in ("true", "1", "yes"):
+                return True
+            if normalized in ("false", "0", "no"):
+                return False
         return True

Also applies to: 1028-1047
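
The suggested fix can also be exercised outside the component as a standalone function. The following is a minimal sketch, not the merged implementation: the helper name `parse_verify_ssl` and the `PLACEHOLDER` constant are hypothetical and do not appear in the PR. It mirrors the logic of the diff above, so only an explicit opt-out disables verification.

```python
from typing import Any

# Assumption: an unresolved placeholder is the literal env variable name,
# as described in the review comment above.
PLACEHOLDER = "DOCLING_SERVE_VERIFY_SSL"

# Strings that explicitly opt out of certificate verification.
_FALSY = {"false", "0", "no"}


def parse_verify_ssl(value: Any) -> bool:
    """Return False only for an explicit opt-out; default to True otherwise."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        normalized = value.strip()
        # Empty strings and the unresolved placeholder keep verification on.
        if not normalized or normalized == PLACEHOLDER:
            return True
        if normalized.lower() in _FALSY:
            return False
    # None, truthy strings, unknown strings, and other types stay secure.
    return True
```

With this shape, a typo such as `"flase"` leaves verification enabled rather than silently turning it off, which is the behavior the review comment asks for.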

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@flows/ingestion_flow.json` at line 739, The verify_ssl input defaults to the
literal placeholder "DOCLING_SERVE_VERIFY_SSL" and _get_verify_ssl() treats any
non-true string as False, so unresolved env placeholders disable TLS; fix by (1)
changing the StrInput default or handling the placeholder in _get_verify_ssl():
in function _get_verify_ssl(), detect the unresolved placeholder literal (e.g.,
"DOCLING_SERVE_VERIFY_SSL") and treat it as True (preserve secure default), also
treat None/empty string as True, keep existing boolean and "true"/"1"/"yes"
handling; update the StrInput declaration (name="verify_ssl") only if you prefer
to set a safer default value instead of relying on placeholder.

@edwinjosechittilappilly

Collaborator

@lucaseduoli @rodageve can we also add it to the operator env?

@github-actions github-actions Bot added bug 🔴 Something isn't working. and removed bug 🔴 Something isn't working. labels Jun 9, 2026

@coderabbitai coderabbitai Bot left a comment

Contributor


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@kubernetes/helm/openrag/templates/backend/backend-dotenv.yaml`:
- Around line 153-157: Helm and the operator disagree on the
DOCLING_SERVE_VERIFY_SSL default—Helm sets it to "true" while the operator sets
"false"—so update the operator to match the secure Helm default: in
kubernetes/operator/internal/controller/env.go change both places where
DOCLING_SERVE_VERIFY_SSL is set to default "false" to instead default to "true"
(these are the two env var definitions that populate the operator's pod env);
ensure no other code branches override it (note src/config/settings.py reads
this env var and src/services/docling_service.py uses it as httpx verify).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6113d4cf-4108-4618-a7ac-dddcaf870bb4

📥 Commits

Reviewing files that changed from the base of the PR and between 82af609 and d899064.

📒 Files selected for processing (7)
  • kubernetes/helm/openrag/templates/backend/backend-dotenv.yaml
  • kubernetes/helm/openrag/templates/langflow/langflow-dotenv.yaml
  • kubernetes/helm/openrag/values.yaml
  • kubernetes/operator/api/v1alpha1/openrag_types.go
  • kubernetes/operator/internal/controller/env.go
  • kubernetes/operator/internal/controller/env_test.go
  • kubernetes/operator/internal/controller/openrag_controller.go

Comment on lines +153 to +157
{{- if hasKey .Values.global.docling "verifySsl" }}
DOCLING_SERVE_VERIFY_SSL={{ ternary "true" "false" .Values.global.docling.verifySsl | quote }}
{{- else }}
DOCLING_SERVE_VERIFY_SSL="true"
{{- end }}

Contributor


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Compare DOCLING_SERVE_VERIFY_SSL defaults in Helm vs operator

echo "=== Helm backend-dotenv default when verifySsl key absent ==="
rg -n 'DOCLING_SERVE_VERIFY_SSL.*"true"' kubernetes/helm/openrag/templates/backend/

echo ""
echo "=== Operator env.go default for DOCLING_SERVE_VERIFY_SSL ==="
rg -n 'DOCLING_SERVE_VERIFY_SSL.*"false"' kubernetes/operator/internal/controller/env.go

Repository: langflow-ai/openrag

Length of output: 541


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Helm backend-dotenv.yaml (around 148-160) ==="
nl -ba kubernetes/helm/openrag/templates/backend/backend-dotenv.yaml | sed -n '140,170p'

echo
echo "=== Operator env.go (around 70-110) ==="
nl -ba kubernetes/operator/internal/controller/env.go | sed -n '60,115p'

echo
echo "=== Operator env.go (around 135-170) ==="
nl -ba kubernetes/operator/internal/controller/env.go | sed -n '130,180p'

Repository: langflow-ai/openrag

Length of output: 174


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Helm backend-dotenv.yaml (around 148-160) ==="
cat -n kubernetes/helm/openrag/templates/backend/backend-dotenv.yaml | sed -n '145,165p'

echo
echo "=== Operator env.go (around 70-115) ==="
cat -n kubernetes/operator/internal/controller/env.go | sed -n '70,115p'

echo
echo "=== Operator env.go (around 135-180) ==="
cat -n kubernetes/operator/internal/controller/env.go | sed -n '135,180p'

Repository: langflow-ai/openrag

Length of output: 5931


🏁 Script executed:

rg -n "DOCLING_SERVE_VERIFY_SSL" .

Repository: langflow-ai/openrag

Length of output: 18841


Align DOCLING_SERVE_VERIFY_SSL default between Helm and the operator
Helm defaults DOCLING_SERVE_VERIFY_SSL to "true" when global.docling.verifySsl is absent (kubernetes/helm/openrag/templates/backend/backend-dotenv.yaml lines 153-157; also kubernetes/helm/openrag/templates/langflow/langflow-dotenv.yaml lines 111-113). The operator defaults it to "false" in both env var sets (kubernetes/operator/internal/controller/env.go lines 87 and 149). Since src/config/settings.py reads this env var and src/services/docling_service.py uses it as httpx’s verify flag, SSL verification will differ by deployment mode unless users set the value explicitly—update one side so the defaults match (or document the intended behavioral difference).
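
The mismatch described above can be modeled in a few lines. This is a rough sketch under stated assumptions: `helm_default` mirrors the `hasKey`/`ternary` template quoted in this comment, while `operator_default` is a hypothetical stand-in for the operator's Go code, based only on the review's statement that the operator falls back to "false".

```python
from typing import Any, Mapping, Optional


def helm_default(docling_values: Mapping[str, Any]) -> str:
    """Mirror the Helm template: an explicit value wins, an absent key means "true"."""
    if "verifySsl" in docling_values:
        return "true" if docling_values["verifySsl"] else "false"
    return "true"


def operator_default(verify_ssl: Optional[bool]) -> str:
    """Hypothetical model of the operator: an unset field renders "false"."""
    if verify_ssl is None:
        return "false"
    return "true" if verify_ssl else "false"


# With nothing configured, the two deployment modes disagree.
unset_helm = helm_default({})
unset_operator = operator_default(None)
print(unset_helm, unset_operator)
```

When the value is set explicitly, both functions return the same string, so the divergence only affects users who rely on the defaults.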


@github-actions github-actions Bot added bug 🔴 Something isn't working. and removed bug 🔴 Something isn't working. labels Jun 9, 2026

@coderabbitai coderabbitai Bot left a comment

Contributor


🧹 Nitpick comments (1)
kubernetes/operator/config/crd/bases/openr.ag_openrags.yaml (1)

2187-2189: Consider the security implications of defaulting SSL verification to false.

The CRD correctly reflects the upstream Go type definition and kubebuilder annotations. However, defaulting verifySsl to false means SSL certificate verification is disabled by default when connecting to external Docling services over HTTPS.

Users should be aware that they must explicitly set verifySsl: true in their OpenRAG CR when connecting to production HTTPS Docling endpoints to ensure certificate validation and prevent MITM attacks.
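
To make concrete what a disabled check gives up, here is a small standard-library sketch. It is an illustration only and is not taken from the PR or from httpx internals; the helper name `build_ssl_context` is hypothetical. It shows the two settings that certificate verification depends on.

```python
import ssl


def build_ssl_context(verify: bool) -> ssl.SSLContext:
    """Build a client-side TLS context with verification on or off."""
    # The default context requires a valid certificate chain and a matching hostname.
    context = ssl.create_default_context()
    if not verify:
        # Order matters: check_hostname must be disabled before CERT_NONE is set,
        # otherwise the ssl module raises ValueError.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context


secure = build_ssl_context(True)
insecure = build_ssl_context(False)
print(secure.verify_mode, insecure.verify_mode)
```

With verification off, the client accepts any certificate for any hostname, which is why the comment above recommends enabling it for production HTTPS endpoints.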

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@kubernetes/operator/config/crd/bases/openr.ag_openrags.yaml` around lines
2187 - 2189, The CRD sets the verifySsl field default to false which disables
HTTPS certificate verification; update the OpenRAG CRD so verifySsl defaults to
true (or remove the unsafe false default) and document that consumers should set
verifySsl: true for production Docling endpoints; locate the verifySsl schema
entry in the CRD (the verifySsl property under the relevant spec for OpenRAG)
and change its default boolean value to true and ensure any generated
Go/kubebuilder tags (e.g., the VerifySsl field on the OpenRAG API type) and
controller code respect and validate this default.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: eff0fe81-6832-4794-b6fe-a8f2d9cb637e

📥 Commits

Reviewing files that changed from the base of the PR and between 4863abb and c0db998.

📒 Files selected for processing (2)
  • kubernetes/operator/api/v1alpha1/zz_generated.deepcopy.go
  • kubernetes/operator/config/crd/bases/openr.ag_openrags.yaml

@lucaseduoli lucaseduoli enabled auto-merge (squash) June 10, 2026 11:58
@lucaseduoli lucaseduoli merged commit 8530ab0 into main Jun 10, 2026
25 of 26 checks passed
@github-actions github-actions Bot added the lgtm label Jun 10, 2026
@github-actions github-actions Bot deleted the fix/docling_tls branch June 10, 2026 13:03

Labels

backend 🔷 Issues related to backend services (OpenSearch, Langflow, APIs) bug 🔴 Something isn't working. docker lgtm


3 participants