fix: update ssl verification for docling #1813
Walkthrough
This PR adds configurable SSL certificate verification for Docling Serve HTTP connections across the system.
Changes
Docling Serve SSL Verification
Sequence Diagram
sequenceDiagram
participant DoclingRemoteComponent
participant _get_verify_ssl
participant httpxClient
participant DoclingServe
DoclingRemoteComponent->>_get_verify_ssl: read `verify_ssl` / env
_get_verify_ssl->>httpxClient: return boolean `verify`
httpxClient->>DoclingServe: HTTPS request (verify=boolean)
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@flows/components/docling_remote.py`:
- Around line 136-144: The verify_ssl StrInput currently treats any string not
in a tiny truthy list as False, which disables TLS on typos/blank values; update
the parsing/consumption of the StrInput named "verify_ssl" so you first trim and
lowercase the value, treat empty/missing/unresolved placeholders as secure
(True), and only accept an explicit falsy allowlist (e.g. "false","0","no") as
False while treating everything else as True by default; then convert that
boolean and pass it to both Docling client creation sites (the two places where
verify_ssl is read/used) so TLS verification defaults to enabled unless the user
intentionally sets a valid false token.
In `@flows/ingestion_flow.json`:
- Line 739: The verify_ssl input defaults to the literal placeholder
"DOCLING_SERVE_VERIFY_SSL" and _get_verify_ssl() treats any non-true string as
False, so unresolved env placeholders disable TLS; fix by either changing the
StrInput default or handling the placeholder in _get_verify_ssl(): in function
_get_verify_ssl(), detect the unresolved placeholder literal (e.g.,
"DOCLING_SERVE_VERIFY_SSL") and treat it as True (preserve secure default), also
treat None/empty string as True, keep existing boolean and "true"/"1"/"yes"
handling; update the StrInput declaration (name="verify_ssl") only if you prefer
to set a safer default value instead of relying on placeholder.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 9e57dd97-7da8-4d99-a39a-400cede33c20
📒 Files selected for processing (4)
docker-compose.yml
flows/components/docling_remote.py
flows/ingestion_flow.json
src/services/langflow_file_service.py
StrInput(
    name="verify_ssl",
    display_name="Verify SSL",
    info="Whether to verify SSL certificates for Docling Serve.",
    value="DOCLING_SERVE_VERIFY_SSL",
    load_from_db=True,
    required=False,
    advanced=True,
),
Default unknown verify_ssl values to secure True.
This parser disables certificate verification for any string outside a tiny truthy allowlist. Because verify_ssl is a free-form StrInput, a typo, extra whitespace, blank value, or unresolved placeholder will silently turn TLS verification off for both Docling client paths.
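The failure mode described above can be reproduced in isolation. This is a minimal sketch, assuming the parsing logic quoted in this review (a truthy allowlist of "true", "1", "yes"); `legacy_parse` is a hypothetical standalone name used for illustration, not a function in the repository.

```python
def legacy_parse(value: object) -> bool:
    """Standalone copy of the parsing logic under review (hypothetical name)."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        # Anything outside the truthy allowlist becomes False.
        return value.lower() in ("true", "1", "yes")
    return True


# Inputs a user would plausibly not intend as "disable TLS verification".
candidates = ("DOCLING_SERVE_VERIFY_SSL", "", " true", "ture")
silently_disabled = [c for c in candidates if not legacy_parse(c)]

for c in silently_disabled:
    print(f"{c!r} -> verification disabled")
```

Every candidate ends up in `silently_disabled`, including the unresolved placeholder and a value with leading whitespace, because the string is compared without trimming.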
Suggested fix
 def _get_verify_ssl(self) -> bool:
     """Determine whether to verify SSL certificates for Docling Serve.

     Returns:
         bool: True if SSL verification should be enforced, False otherwise.
     """
-    verify = getattr(self, "verify_ssl", "true")
+    verify = getattr(self, "verify_ssl", True)
     if isinstance(verify, bool):
         return verify
     if isinstance(verify, str):
-        return verify.lower() in ("true", "1", "yes")
+        normalized = verify.strip().lower()
+        if normalized in {"false", "0", "no"}:
+            return False
+        return True
     return True
Also applies to: 293-304
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@flows/components/docling_remote.py` around lines 136 - 144, The verify_ssl
StrInput currently treats any string not in a tiny truthy list as False, which
disables TLS on typos/blank values; update the parsing/consumption of the
StrInput named "verify_ssl" so you first trim and lowercase the value, treat
empty/missing/unresolved placeholders as secure (True), and only accept an
explicit falsy allowlist (e.g. "false","0","no") as False while treating
everything else as True by default; then convert that boolean and pass it to
both Docling client creation sites (the two places where verify_ssl is
read/used) so TLS verification defaults to enabled unless the user
intentionally sets a valid false token.
| "title_case": false, | ||
| "type": "code", | ||
| "value": "from __future__ import annotations\n\nimport base64\nimport json\nimport time\nfrom concurrent.futures import Future, ThreadPoolExecutor\nfrom pathlib import Path # noqa: TC003\nfrom typing import Any\n\nimport httpx\nfrom docling_core.types.doc import DoclingDocument\nfrom pydantic import ValidationError\n\nfrom lfx.base.data import BaseFileComponent\nfrom lfx.inputs import IntInput, NestedDictInput, StrInput, TableInput\nfrom lfx.inputs.inputs import FloatInput\nfrom lfx.schema import Data, dotdict\nfrom lfx.utils.util import transform_localhost_url\n\n\nclass DoclingRemoteComponent(BaseFileComponent):\n display_name = \"Docling Serve\"\n description = (\n \"Uses Docling to process input documents connecting to your instance of Docling Serve.\"\n )\n documentation = \"https://docling-project.github.io/docling/\"\n trace_type = \"tool\"\n icon = \"Docling\"\n name = \"DoclingRemote\"\n\n MAX_500_RETRIES = 5\n\n # https://docling-project.github.io/docling/usage/supported_formats/\n VALID_EXTENSIONS = [\n \"adoc\",\n \"asciidoc\",\n \"asc\",\n \"bmp\",\n \"csv\",\n \"dotx\",\n \"dotm\",\n \"docm\",\n \"docx\",\n \"htm\",\n \"html\",\n \"jpeg\",\n \"jpg\",\n \"json\",\n \"md\",\n \"pdf\",\n \"png\",\n \"potx\",\n \"ppsx\",\n \"pptm\",\n \"potm\",\n \"ppsm\",\n \"pptx\",\n \"tiff\",\n \"txt\",\n \"xls\",\n \"xlsx\",\n \"xhtml\",\n \"xml\",\n \"webp\",\n ]\n\n inputs = [\n *BaseFileComponent.get_base_inputs(),\n StrInput(\n name=\"api_url\",\n display_name=\"Server address\",\n info=\"URL of the Docling Serve instance.\",\n required=True,\n ),\n StrInput(\n name=\"task_id\",\n display_name=\"Task ID\",\n info=(\n \"Optional task ID from a previous Docling Serve upload. 
\"\n \"If provided, file input is ignored and the component polls for this task's results.\"\n ),\n required=False,\n ),\n IntInput(\n name=\"max_concurrency\",\n display_name=\"Concurrency\",\n info=\"Maximum number of concurrent requests for the server.\",\n advanced=True,\n value=2,\n input_types=[\"Message\"],\n ),\n FloatInput(\n name=\"max_poll_timeout\",\n display_name=\"Maximum poll time\",\n info=\"Maximum waiting time for the document conversion to complete.\",\n advanced=True,\n value=3600,\n input_types=[\"Message\"],\n ),\n TableInput(\n name=\"api_headers\",\n display_name=\"HTTP headers\",\n advanced=True,\n required=False,\n info=(\"Optional headers required for connecting to Docling Serve.\"),\n table_schema=[\n {\n \"name\": \"key\",\n \"display_name\": \"Key\",\n \"type\": \"string\",\n \"description\": \"Key name\",\n },\n {\n \"name\": \"value\",\n \"display_name\": \"Value\",\n \"load_from_db\": True,\n \"type\": \"string\",\n \"description\": \"Value of the header\",\n },\n ],\n value=[],\n real_time_refresh=True,\n input_types=[\"Data\", \"JSON\"],\n ),\n NestedDictInput(\n name=\"docling_serve_opts\",\n display_name=\"Docling options\",\n advanced=True,\n required=False,\n info=(\n \"Optional dictionary of additional options. 
\"\n \"See https://github.com/docling-project/docling-serve/blob/main/docs/usage.md for more information.\"\n ),\n input_types=[\"Message\"],\n ),\n ]\n\n outputs = [\n *BaseFileComponent.get_base_outputs(),\n ]\n\n def _process_headers(self) -> dict[str, str]:\n \"\"\"Process the headers input into a valid dictionary.\"\"\"\n if not self.api_headers:\n return {}\n\n component_headers_dict = {}\n # TableInput normalizes to list\n items = self.api_headers if isinstance(self.api_headers, list) else [self.api_headers]\n\n for item in items:\n if not item:\n continue\n\n # Case 1: Data object\n if hasattr(item, \"data\") and isinstance(item.data, dict):\n data = item.data\n if \"key\" in data and \"value\" in data:\n component_headers_dict[str(data[\"key\"])] = str(data[\"value\"])\n else:\n # Fallback: merge all keys from Data object\n for k, v in data.items():\n if k not in (\"text_key\", \"default_value\"):\n component_headers_dict[str(k)] = str(v)\n\n # Case 2: Dictionary (Table row)\n elif isinstance(item, dict):\n if \"key\" in item and \"value\" in item:\n component_headers_dict[str(item[\"key\"])] = str(item[\"value\"])\n else:\n # Fallback: merge all keys\n for k, v in item.items():\n component_headers_dict[str(k)] = str(v)\n\n # Case 3: Message object\n elif hasattr(item, \"text\") and isinstance(item.text, str):\n try:\n parsed = json.loads(item.text)\n if isinstance(parsed, dict):\n for k, v in parsed.items():\n component_headers_dict[str(k)] = str(v)\n except json.JSONDecodeError:\n pass\n\n return component_headers_dict\n\n def update_build_config(\n self, build_config: dotdict, field_value: Any, field_name: str | None = None\n ) -> dotdict:\n if field_name == \"api_headers\":\n if isinstance(field_value, dict):\n # If it's a dict, convert to list of {key, value} pairs for TableInput\n # This handles migration from NestedDictInput to TableInput\n new_value = [{\"key\": k, \"value\": v} for k, v in field_value.items()]\n 
build_config[\"api_headers\"][\"value\"] = new_value\n return build_config\n if field_value is None:\n build_config[\"api_headers\"][\"value\"] = []\n return build_config\n\n # Default behavior\n return super().update_build_config(build_config, field_value, field_name)\n\n def _poll_and_fetch_result(\n self, client: httpx.Client, base_url: str, task_id: str, file_path: str | None = None\n ) -> Data | None:\n \"\"\"Poll for task completion and fetch the result.\n\n Args:\n client: The HTTP client to use for requests.\n base_url: The base URL of the Docling Serve API.\n task_id: The task ID to poll for.\n file_path: Optional file path to include in the result data.\n\n Returns:\n Data object with the DoclingDocument, or None if processing failed.\n \"\"\"\n http_failures = 0\n retry_status_start = 500\n retry_status_end = 600\n start_wait_time = time.monotonic()\n\n response = client.get(f\"{base_url}/status/poll/{task_id}\")\n response.raise_for_status()\n task = response.json()\n\n while task[\"task_status\"] not in (\"success\", \"failure\"):\n processing_time = time.monotonic() - start_wait_time\n if processing_time >= self.max_poll_timeout:\n msg = (\n f\"Processing time {processing_time=} exceeds the maximum poll timeout {self.max_poll_timeout=}.\"\n \"Please increase the max_poll_timeout parameter or review why the processing \"\n \"takes long on the server.\"\n )\n self.log(msg)\n raise RuntimeError(msg)\n\n time.sleep(2)\n response = client.get(f\"{base_url}/status/poll/{task_id}\")\n\n if retry_status_start <= response.status_code < retry_status_end:\n http_failures += 1\n if http_failures > self.MAX_500_RETRIES:\n self.log(\n f\"The status requests got a http response {response.status_code} too many times.\"\n )\n return None\n continue\n\n task = response.json()\n\n result_resp = client.get(f\"{base_url}/result/{task_id}\")\n result_resp.raise_for_status()\n result = result_resp.json()\n\n if result.get(\"status\") == \"failure\" or 
result.get(\"errors\"):\n errors = result.get(\"errors\", [])\n err_msg_list = []\n for err in errors:\n if isinstance(err, dict) and \"error_message\" in err:\n err_msg_list.append(err[\"error_message\"])\n elif isinstance(err, str):\n err_msg_list.append(err)\n\n err_details = \"; \".join(err_msg_list) if err_msg_list else \"Unknown Docling processing error\"\n\n msg = f\"Docling processing failed: {err_details}\"\n raise ValueError(msg)\n\n if \"json_content\" not in result[\"document\"] or result[\"document\"][\"json_content\"] is None:\n self.log(\"No JSON DoclingDocument found in the result.\")\n return None\n\n try:\n doc = DoclingDocument.model_validate(result[\"document\"][\"json_content\"])\n data_dict: dict[str, Any] = {\"doc\": doc}\n if file_path:\n data_dict[\"file_path\"] = file_path\n return Data(data=data_dict)\n except ValidationError as e:\n self.log(f\"Error validating the document. {e}\")\n return None\n\n def _process_task_id(self) -> list[Data]:\n \"\"\"Process an existing task by polling for status and retrieving results.\n\n Returns:\n List containing the result Data object, or empty list if processing failed.\n \"\"\"\n transformed_url = transform_localhost_url(self.api_url)\n base_url = f\"{transformed_url}/v1\"\n\n with httpx.Client(headers=self._process_headers()) as client:\n result = self._poll_and_fetch_result(client, base_url, self.task_id)\n return [result] if result else []\n\n def load_files_base(self) -> list[Data]:\n \"\"\"Load and process files, or poll an existing task if task_id is provided.\n\n Returns:\n list[Data]: Parsed data from the processed files or task.\n \"\"\"\n if self.task_id:\n return self._process_task_id()\n return super().load_files_base()\n\n def process_files(\n self, file_list: list[BaseFileComponent.BaseFile]\n ) -> list[BaseFileComponent.BaseFile]:\n transformed_url = transform_localhost_url(self.api_url)\n base_url = f\"{transformed_url}/v1\"\n\n def _convert_document(\n client: httpx.Client, 
file_path: Path, options: dict[str, Any]\n ) -> Data | None:\n encoded_doc = base64.b64encode(file_path.read_bytes()).decode()\n payload = {\n \"options\": options,\n \"sources\": [\n {\"kind\": \"file\", \"base64_string\": encoded_doc, \"filename\": file_path.name}\n ],\n }\n\n response = client.post(f\"{base_url}/convert/source/async\", json=payload)\n response.raise_for_status()\n task = response.json()\n\n return self._poll_and_fetch_result(client, base_url, task[\"task_id\"], str(file_path))\n\n docling_options = {\n \"to_formats\": [\"json\"],\n \"image_export_mode\": \"placeholder\",\n **(self.docling_serve_opts or {}),\n }\n\n processed_data: list[Data | None] = []\n with (\n httpx.Client(headers=self._process_headers()) as client,\n ThreadPoolExecutor(max_workers=self.max_concurrency) as executor,\n ):\n futures: list[tuple[int, Future]] = []\n for i, file in enumerate(file_list):\n if file.path is None:\n processed_data.append(None)\n continue\n\n futures.append(\n (i, executor.submit(_convert_document, client, file.path, docling_options))\n )\n\n for _index, future in futures:\n try:\n result_data = future.result()\n processed_data.append(result_data)\n except (httpx.HTTPStatusError, httpx.RequestError, KeyError, ValueError) as exc:\n self.log(f\"Docling remote processing failed: {exc}\")\n raise\n\n return self.rollup_data(file_list, processed_data)\n" | ||
| "value": "from __future__ import annotations\n\nimport base64\nimport json\nimport time\nfrom concurrent.futures import Future, ThreadPoolExecutor\nfrom pathlib import Path # noqa: TC003\nfrom typing import Any\n\nimport httpx\nfrom docling_core.types.doc import DoclingDocument\nfrom pydantic import ValidationError\n\nfrom lfx.base.data import BaseFileComponent\nfrom lfx.inputs import IntInput, NestedDictInput, StrInput, TableInput\nfrom lfx.inputs.inputs import FloatInput\nfrom lfx.schema import Data, dotdict\nfrom lfx.utils.util import transform_localhost_url\n\n\nclass DoclingRemoteComponent(BaseFileComponent):\n display_name = \"Docling Serve\"\n description = (\n \"Uses Docling to process input documents connecting to your instance of Docling Serve.\"\n )\n documentation = \"https://docling-project.github.io/docling/\"\n trace_type = \"tool\"\n icon = \"Docling\"\n name = \"DoclingRemote\"\n\n MAX_500_RETRIES = 5\n\n # https://docling-project.github.io/docling/usage/supported_formats/\n VALID_EXTENSIONS = [\n \"adoc\",\n \"asciidoc\",\n \"asc\",\n \"bmp\",\n \"csv\",\n \"dotx\",\n \"dotm\",\n \"docm\",\n \"docx\",\n \"htm\",\n \"html\",\n \"jpeg\",\n \"jpg\",\n \"json\",\n \"md\",\n \"pdf\",\n \"png\",\n \"potx\",\n \"ppsx\",\n \"pptm\",\n \"potm\",\n \"ppsm\",\n \"pptx\",\n \"tiff\",\n \"txt\",\n \"xls\",\n \"xlsx\",\n \"xhtml\",\n \"xml\",\n \"webp\",\n ]\n\n inputs = [\n *BaseFileComponent.get_base_inputs(),\n StrInput(\n name=\"api_url\",\n display_name=\"Server address\",\n info=\"URL of the Docling Serve instance.\",\n required=True,\n ),\n StrInput(\n name=\"task_id\",\n display_name=\"Task ID\",\n info=(\n \"Optional task ID from a previous Docling Serve upload. 
\"\n \"If provided, file input is ignored and the component polls for this task's results.\"\n ),\n required=False,\n ),\n IntInput(\n name=\"max_concurrency\",\n display_name=\"Concurrency\",\n info=\"Maximum number of concurrent requests for the server.\",\n advanced=True,\n value=2,\n input_types=[\"Message\"],\n ),\n FloatInput(\n name=\"max_poll_timeout\",\n display_name=\"Maximum poll time\",\n info=\"Maximum waiting time for the document conversion to complete.\",\n advanced=True,\n value=3600,\n input_types=[\"Message\"],\n ),\n TableInput(\n name=\"api_headers\",\n display_name=\"HTTP headers\",\n advanced=True,\n required=False,\n info=(\"Optional headers required for connecting to Docling Serve.\"),\n table_schema=[\n {\n \"name\": \"key\",\n \"display_name\": \"Key\",\n \"type\": \"string\",\n \"description\": \"Key name\",\n },\n {\n \"name\": \"value\",\n \"display_name\": \"Value\",\n \"load_from_db\": True,\n \"type\": \"string\",\n \"description\": \"Value of the header\",\n },\n ],\n value=[],\n real_time_refresh=True,\n input_types=[\"Data\", \"JSON\"],\n ),\n NestedDictInput(\n name=\"docling_serve_opts\",\n display_name=\"Docling options\",\n advanced=True,\n required=False,\n info=(\n \"Optional dictionary of additional options. 
\"\n \"See https://github.com/docling-project/docling-serve/blob/main/docs/usage.md for more information.\"\n ),\n input_types=[\"Message\"],\n ),\n StrInput(\n name=\"verify_ssl\",\n display_name=\"Verify SSL\",\n info=\"Whether to verify SSL certificates for Docling Serve.\",\n value=\"DOCLING_SERVE_VERIFY_SSL\",\n load_from_db=True,\n required=False,\n advanced=True,\n ),\n ]\n\n outputs = [\n *BaseFileComponent.get_base_outputs(),\n ]\n\n def _process_headers(self) -> dict[str, str]:\n \"\"\"Process the headers input into a valid dictionary.\"\"\"\n if not self.api_headers:\n return {}\n\n component_headers_dict = {}\n # TableInput normalizes to list\n items = self.api_headers if isinstance(self.api_headers, list) else [self.api_headers]\n\n for item in items:\n if not item:\n continue\n\n # Case 1: Data object\n if hasattr(item, \"data\") and isinstance(item.data, dict):\n data = item.data\n if \"key\" in data and \"value\" in data:\n component_headers_dict[str(data[\"key\"])] = str(data[\"value\"])\n else:\n # Fallback: merge all keys from Data object\n for k, v in data.items():\n if k not in (\"text_key\", \"default_value\"):\n component_headers_dict[str(k)] = str(v)\n\n # Case 2: Dictionary (Table row)\n elif isinstance(item, dict):\n if \"key\" in item and \"value\" in item:\n component_headers_dict[str(item[\"key\"])] = str(item[\"value\"])\n else:\n # Fallback: merge all keys\n for k, v in item.items():\n component_headers_dict[str(k)] = str(v)\n\n # Case 3: Message object\n elif hasattr(item, \"text\") and isinstance(item.text, str):\n try:\n parsed = json.loads(item.text)\n if isinstance(parsed, dict):\n for k, v in parsed.items():\n component_headers_dict[str(k)] = str(v)\n except json.JSONDecodeError:\n pass\n\n return component_headers_dict\n\n def update_build_config(\n self, build_config: dotdict, field_value: Any, field_name: str | None = None\n ) -> dotdict:\n if field_name == \"api_headers\":\n if isinstance(field_value, dict):\n # If it's a 
dict, convert to list of {key, value} pairs for TableInput\n # This handles migration from NestedDictInput to TableInput\n new_value = [{\"key\": k, \"value\": v} for k, v in field_value.items()]\n build_config[\"api_headers\"][\"value\"] = new_value\n return build_config\n if field_value is None:\n build_config[\"api_headers\"][\"value\"] = []\n return build_config\n\n # Default behavior\n return super().update_build_config(build_config, field_value, field_name)\n\n def _poll_and_fetch_result(\n self, client: httpx.Client, base_url: str, task_id: str, file_path: str | None = None\n ) -> Data | None:\n \"\"\"Poll for task completion and fetch the result.\n\n Args:\n client: The HTTP client to use for requests.\n base_url: The base URL of the Docling Serve API.\n task_id: The task ID to poll for.\n file_path: Optional file path to include in the result data.\n\n Returns:\n Data object with the DoclingDocument, or None if processing failed.\n \"\"\"\n http_failures = 0\n retry_status_start = 500\n retry_status_end = 600\n start_wait_time = time.monotonic()\n\n response = client.get(f\"{base_url}/status/poll/{task_id}\")\n response.raise_for_status()\n task = response.json()\n\n while task[\"task_status\"] not in (\"success\", \"failure\"):\n processing_time = time.monotonic() - start_wait_time\n if processing_time >= self.max_poll_timeout:\n msg = (\n f\"Processing time {processing_time=} exceeds the maximum poll timeout {self.max_poll_timeout=}.\"\n \"Please increase the max_poll_timeout parameter or review why the processing \"\n \"takes long on the server.\"\n )\n self.log(msg)\n raise RuntimeError(msg)\n\n time.sleep(2)\n response = client.get(f\"{base_url}/status/poll/{task_id}\")\n\n if retry_status_start <= response.status_code < retry_status_end:\n http_failures += 1\n if http_failures > self.MAX_500_RETRIES:\n self.log(\n f\"The status requests got a http response {response.status_code} too many times.\"\n )\n return None\n continue\n\n task = 
response.json()\n\n result_resp = client.get(f\"{base_url}/result/{task_id}\")\n result_resp.raise_for_status()\n result = result_resp.json()\n\n if result.get(\"status\") == \"failure\" or result.get(\"errors\"):\n errors = result.get(\"errors\", [])\n err_msg_list = []\n for err in errors:\n if isinstance(err, dict) and \"error_message\" in err:\n err_msg_list.append(err[\"error_message\"])\n elif isinstance(err, str):\n err_msg_list.append(err)\n\n err_details = \"; \".join(err_msg_list) if err_msg_list else \"Unknown Docling processing error\"\n\n msg = f\"Docling processing failed: {err_details}\"\n raise ValueError(msg)\n\n if \"json_content\" not in result[\"document\"] or result[\"document\"][\"json_content\"] is None:\n self.log(\"No JSON DoclingDocument found in the result.\")\n return None\n\n try:\n doc = DoclingDocument.model_validate(result[\"document\"][\"json_content\"])\n data_dict: dict[str, Any] = {\"doc\": doc}\n if file_path:\n data_dict[\"file_path\"] = file_path\n return Data(data=data_dict)\n except ValidationError as e:\n self.log(f\"Error validating the document. 
{e}\")\n return None\n\n def _get_verify_ssl(self) -> bool:\n \"\"\"Determine whether to verify SSL certificates for Docling Serve.\n\n Returns:\n bool: True if SSL verification should be enforced, False otherwise.\n \"\"\"\n verify = getattr(self, \"verify_ssl\", \"true\")\n if isinstance(verify, bool):\n return verify\n if isinstance(verify, str):\n return verify.lower() in (\"true\", \"1\", \"yes\")\n return True\n\n def _process_task_id(self) -> list[Data]:\n \"\"\"Process an existing task by polling for status and retrieving results.\n\n Returns:\n List containing the result Data object, or empty list if processing failed.\n \"\"\"\n transformed_url = transform_localhost_url(self.api_url)\n base_url = f\"{transformed_url}/v1\"\n\n with httpx.Client(headers=self._process_headers(), verify=self._get_verify_ssl()) as client:\n result = self._poll_and_fetch_result(client, base_url, self.task_id)\n return [result] if result else []\n\n def load_files_base(self) -> list[Data]:\n \"\"\"Load and process files, or poll an existing task if task_id is provided.\n\n Returns:\n list[Data]: Parsed data from the processed files or task.\n \"\"\"\n if self.task_id:\n return self._process_task_id()\n return super().load_files_base()\n\n def process_files(\n self, file_list: list[BaseFileComponent.BaseFile]\n ) -> list[BaseFileComponent.BaseFile]:\n transformed_url = transform_localhost_url(self.api_url)\n base_url = f\"{transformed_url}/v1\"\n\n def _convert_document(\n client: httpx.Client, file_path: Path, options: dict[str, Any]\n ) -> Data | None:\n encoded_doc = base64.b64encode(file_path.read_bytes()).decode()\n payload = {\n \"options\": options,\n \"sources\": [\n {\"kind\": \"file\", \"base64_string\": encoded_doc, \"filename\": file_path.name}\n ],\n }\n\n response = client.post(f\"{base_url}/convert/source/async\", json=payload)\n response.raise_for_status()\n task = response.json()\n\n return self._poll_and_fetch_result(client, base_url, task[\"task_id\"], 
str(file_path))\n\n docling_options = {\n \"to_formats\": [\"json\"],\n \"image_export_mode\": \"placeholder\",\n **(self.docling_serve_opts or {}),\n }\n\n processed_data: list[Data | None] = []\n with (\n httpx.Client(headers=self._process_headers(), verify=self._get_verify_ssl()) as client,\n ThreadPoolExecutor(max_workers=self.max_concurrency) as executor,\n ):\n futures: list[tuple[int, Future]] = []\n for i, file in enumerate(file_list):\n if file.path is None:\n processed_data.append(None)\n continue\n\n futures.append(\n (i, executor.submit(_convert_document, client, file.path, docling_options))\n )\n\n for _index, future in futures:\n try:\n result_data = future.result()\n processed_data.append(result_data)\n except (httpx.HTTPStatusError, httpx.RequestError, KeyError, ValueError) as exc:\n self.log(f\"Docling remote processing failed: {exc}\")\n raise\n\n return self.rollup_data(file_list, processed_data)\n" |
Preserve secure TLS defaults when the env placeholder is unresolved.
verify_ssl defaults to the literal string DOCLING_SERVE_VERIFY_SSL, but _get_verify_ssl() treats every string except "true", "1", and "yes" as False. That means an unset/unresolved env value silently disables certificate verification instead of keeping it enabled.
🔧 Suggested fix
 def _get_verify_ssl(self) -> bool:
-    verify = getattr(self, "verify_ssl", "true")
+    verify = getattr(self, "verify_ssl", True)
     if isinstance(verify, bool):
         return verify
+    if verify is None:
+        return True
     if isinstance(verify, str):
-        return verify.lower() in ("true", "1", "yes")
+        normalized = verify.strip()
+        if not normalized or normalized == "DOCLING_SERVE_VERIFY_SSL":
+            return True
+        normalized = normalized.lower()
+        if normalized in ("true", "1", "yes"):
+            return True
+        if normalized in ("false", "0", "no"):
+            return False
     return True
Also applies to: 1028-1047
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@flows/ingestion_flow.json` at line 739, The verify_ssl input defaults to the
literal placeholder "DOCLING_SERVE_VERIFY_SSL" and _get_verify_ssl() treats any
non-true string as False, so unresolved env placeholders disable TLS; fix by either
changing the StrInput default or handling the placeholder in _get_verify_ssl():
in function _get_verify_ssl(), detect the unresolved placeholder literal (e.g.,
"DOCLING_SERVE_VERIFY_SSL") and treat it as True (preserve secure default), also
treat None/empty string as True, keep existing boolean and "true"/"1"/"yes"
handling; update the StrInput declaration (name="verify_ssl") only if you prefer
to set a safer default value instead of relying on placeholder.
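The handling described in the prompt above can be sketched as a standalone function. This is an illustration under stated assumptions, not the merged implementation: `parse_verify_ssl` and `PLACEHOLDER` are hypothetical names, and the false tokens are the ones proposed in this review.

```python
PLACEHOLDER = "DOCLING_SERVE_VERIFY_SSL"  # literal default quoted in the review


def parse_verify_ssl(value: object) -> bool:
    """Fail-secure parse: only an explicit false token disables verification."""
    if isinstance(value, bool):
        return value
    if not isinstance(value, str):
        # None and any other non-string type keep verification enabled.
        return True
    normalized = value.strip()
    # Blank input and the unresolved placeholder keep the secure default.
    # (The final rule would also cover these; the check is kept explicit.)
    if not normalized or normalized == PLACEHOLDER:
        return True
    return normalized.lower() not in {"false", "0", "no"}


for sample in ("true", " No ", "", "ture", PLACEHOLDER, None, False):
    print(f"{sample!r} -> verify={parse_verify_ssl(sample)}")
```

With this shape a typo such as "ture" keeps verification on, and turning it off requires one of the recognized false tokens or a boolean False.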
@lucaseduoli @rodageve can we add it to the operator env also?
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@kubernetes/helm/openrag/templates/backend/backend-dotenv.yaml`:
- Around line 153-157: Helm and the operator disagree on the
DOCLING_SERVE_VERIFY_SSL default—Helm sets it to "true" while the operator sets
"false"—so update the operator to match the secure Helm default: in
kubernetes/operator/internal/controller/env.go change both places where
DOCLING_SERVE_VERIFY_SSL is set to default "false" to instead default to "true"
(these are the two env var definitions that populate the operator's pod env);
ensure no other code branches override it (note src/config/settings.py reads
this env var and src/services/docling_service.py uses it as httpx verify).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 6113d4cf-4108-4618-a7ac-dddcaf870bb4
📒 Files selected for processing (7)
kubernetes/helm/openrag/templates/backend/backend-dotenv.yaml
kubernetes/helm/openrag/templates/langflow/langflow-dotenv.yaml
kubernetes/helm/openrag/values.yaml
kubernetes/operator/api/v1alpha1/openrag_types.go
kubernetes/operator/internal/controller/env.go
kubernetes/operator/internal/controller/env_test.go
kubernetes/operator/internal/controller/openrag_controller.go
{{- if hasKey .Values.global.docling "verifySsl" }}
DOCLING_SERVE_VERIFY_SSL={{ ternary "true" "false" .Values.global.docling.verifySsl | quote }}
{{- else }}
DOCLING_SERVE_VERIFY_SSL="true"
{{- end }}
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Compare DOCLING_SERVE_VERIFY_SSL defaults in Helm vs operator
echo "=== Helm backend-dotenv default when verifySsl key absent ==="
rg -n 'DOCLING_SERVE_VERIFY_SSL.*"true"' kubernetes/helm/openrag/templates/backend/
echo ""
echo "=== Operator env.go default for DOCLING_SERVE_VERIFY_SSL ==="
rg -n 'DOCLING_SERVE_VERIFY_SSL.*"false"' kubernetes/operator/internal/controller/env.go
Repository: langflow-ai/openrag
Length of output: 541
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Helm backend-dotenv.yaml (around 148-160) ==="
nl -ba kubernetes/helm/openrag/templates/backend/backend-dotenv.yaml | sed -n '140,170p'
echo
echo "=== Operator env.go (around 70-110) ==="
nl -ba kubernetes/operator/internal/controller/env.go | sed -n '60,115p'
echo
echo "=== Operator env.go (around 135-170) ==="
nl -ba kubernetes/operator/internal/controller/env.go | sed -n '130,180p'
Repository: langflow-ai/openrag
Length of output: 174
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Helm backend-dotenv.yaml (around 148-160) ==="
cat -n kubernetes/helm/openrag/templates/backend/backend-dotenv.yaml | sed -n '145,165p'
echo
echo "=== Operator env.go (around 70-115) ==="
cat -n kubernetes/operator/internal/controller/env.go | sed -n '70,115p'
echo
echo "=== Operator env.go (around 135-180) ==="
cat -n kubernetes/operator/internal/controller/env.go | sed -n '135,180p'
Repository: langflow-ai/openrag
Length of output: 5931
🏁 Script executed:
rg -n "DOCLING_SERVE_VERIFY_SSL" .
Repository: langflow-ai/openrag
Length of output: 18841
Align DOCLING_SERVE_VERIFY_SSL default between Helm and the operator
Helm defaults DOCLING_SERVE_VERIFY_SSL to "true" when global.docling.verifySsl is absent (kubernetes/helm/openrag/templates/backend/backend-dotenv.yaml lines 153-157; also kubernetes/helm/openrag/templates/langflow/langflow-dotenv.yaml lines 111-113). The operator defaults it to "false" in both env var sets (kubernetes/operator/internal/controller/env.go lines 87 and 149). Since src/config/settings.py reads this env var and src/services/docling_service.py uses it as httpx’s verify flag, SSL verification will differ by deployment mode unless users set the value explicitly—update one side so the defaults match (or document the intended behavioral difference).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@kubernetes/helm/openrag/templates/backend/backend-dotenv.yaml` around lines
153 - 157, Helm and the operator disagree on the DOCLING_SERVE_VERIFY_SSL
default—Helm sets it to "true" while the operator sets "false"—so update the
operator to match the secure Helm default: in
kubernetes/operator/internal/controller/env.go change both places where
DOCLING_SERVE_VERIFY_SSL is set to default "false" to instead default to "true"
(these are the two env var definitions that populate the operator's pod env);
ensure no other code branches override it (note src/config/settings.py reads
this env var and src/services/docling_service.py uses it as httpx verify).
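To make the divergence concrete, here is a minimal sketch of how the two deployment modes could end up with different `verify` values. It assumes a hypothetical parser, `verify_flag_from_env`, that only disables verification on an explicit false token; the actual parsing in `src/config/settings.py` is not shown in this review and may differ.

```python
from typing import Mapping

# Assumption: only these explicit tokens turn verification off.
FALSE_TOKENS = {"false", "0", "no", "n", "off"}


def verify_flag_from_env(env: Mapping[str, str]) -> bool:
    """Return the boolean that would be handed to httpx as `verify`.

    Hypothetical helper: missing, blank, or unrecognised values keep
    verification enabled, so a typo never disables TLS checks.
    """
    raw = env.get("DOCLING_SERVE_VERIFY_SSL", "").strip().lower()
    return raw not in FALSE_TOKENS


# Environments as each deployment mode renders them when the user sets nothing.
helm_env = {"DOCLING_SERVE_VERIFY_SSL": "true"}       # Helm default
operator_env = {"DOCLING_SERVE_VERIFY_SSL": "false"}  # operator default

if __name__ == "__main__":
    print("helm verify:", verify_flag_from_env(helm_env))
    print("operator verify:", verify_flag_from_env(operator_env))
```

Under these assumptions the same image verifies certificates when installed through Helm and skips verification when installed through the operator, which is the mismatch the finding asks to resolve.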
🧹 Nitpick comments (1)
kubernetes/operator/config/crd/bases/openr.ag_openrags.yaml (1)
2187-2189: Consider the security implications of defaulting SSL verification to `false`.
The CRD correctly reflects the upstream Go type definition and kubebuilder annotations. However, defaulting `verifySsl` to `false` means SSL certificate verification is disabled by default when connecting to external Docling services over HTTPS.
Users should be aware that they must explicitly set `verifySsl: true` in their OpenRAG CR when connecting to production HTTPS Docling endpoints to ensure certificate validation and prevent MITM attacks.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@kubernetes/operator/config/crd/bases/openr.ag_openrags.yaml` around lines 2187 - 2189, The CRD sets the verifySsl field default to false which disables HTTPS certificate verification; update the OpenRAG CRD so verifySsl defaults to true (or remove the unsafe false default) and document that consumers should set verifySsl: true for production Docling endpoints; locate the verifySsl schema entry in the CRD (the verifySsl property under the relevant spec for OpenRAG) and change its default boolean value to true and ensure any generated Go/kubebuilder tags (e.g., the VerifySsl field on the OpenRAG API type) and controller code respect and validate this default.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@kubernetes/operator/config/crd/bases/openr.ag_openrags.yaml`:
- Around line 2187-2189: The CRD sets the verifySsl field default to false which
disables HTTPS certificate verification; update the OpenRAG CRD so verifySsl
defaults to true (or remove the unsafe false default) and document that
consumers should set verifySsl: true for production Docling endpoints; locate
the verifySsl schema entry in the CRD (the verifySsl property under the relevant
spec for OpenRAG) and change its default boolean value to true and ensure any
generated Go/kubebuilder tags (e.g., the VerifySsl field on the OpenRAG API
type) and controller code respect and validate this default.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: eff0fe81-6832-4794-b6fe-a8f2d9cb637e
📒 Files selected for processing (2)
kubernetes/operator/api/v1alpha1/zz_generated.deepcopy.go
kubernetes/operator/config/crd/bases/openr.ag_openrags.yaml
This pull request adds support for configuring SSL certificate verification when connecting to Docling Serve, making it possible to disable SSL verification if needed (for example, in local or development environments). The change introduces a new environment variable, input field, and logic for handling the SSL verification flag throughout the codebase.
Configuration and environment variable updates:
- Added the `DOCLING_SERVE_VERIFY_SSL` environment variable to `docker-compose.yml` and included it in the `LANGFLOW_VARIABLES_TO_GET_FROM_ENVIRONMENT` list, allowing users to control SSL verification via environment configuration.
Component and input enhancements:
- Added a `verify_ssl` input field in the `DoclingRemoteComponent`, making SSL verification configurable from the UI or environment.
- Added a `_get_verify_ssl` method to interpret the `verify_ssl` input and determine whether SSL verification should be enabled.
Networking and client usage:
- Updated `httpx.Client` in `DoclingRemoteComponent` to use the SSL verification flag by passing the result of `_get_verify_ssl()`, ensuring consistent behavior during API calls.
Summary by CodeRabbit
New Features
Chores
Tests