Skip to content

feat(deployment): Add OpenTelemetry Collector.#2278

Open
Nathan903 wants to merge 74 commits into
y-scope:mainfrom
Nathan903:dev
Open

feat(deployment): Add OpenTelemetry Collector.#2278
Nathan903 wants to merge 74 commits into
y-scope:mainfrom
Nathan903:dev

Conversation

@Nathan903
Copy link
Copy Markdown
Contributor

@Nathan903 Nathan903 commented May 14, 2026

Description

  • Adds an OpenTelemetry Collector sidecar to both Docker Compose and Helm deployments (only instantiated if there is telemetry profile/config flag)
  • Injects OTEL_SERVICE_NAME, OTEL_EXPORTER_OTLP_ENDPOINT, OTEL_RESOURCE_ATTRIBUTES, and CLP_DISABLE_TELEMETRY env vars into every service/component.
  • The DockerComposeController reads the CLP version, builds resource attributes (deployment ID, version, storage engine, OS, arch), and emits deployment topology gauges (worker replicas, concurrency) to the collector at startup.
  • Telemetry is opt-out. Set CLP_DISABLE_TELEMETRY=true (Docker Compose) or clpConfig.telemetry.disable: true (Helm) to disable.

Checklist

  • The PR satisfies the [contribution guidelines][yscope-contrib-guidelines].
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

Started CLP with the local telemetry backend (from the clp-telemetry-server repo). Verified otel-collector logs showed incoming OTLP data.

test end-to-end with the telemetry server:
1.

cd ~/clp/tools/deployment/telemetry_server
docker compose up -d
  1. change build/clp-package/etc/clp-config.yaml's endpoint to http://172.17.0.1:4318

  2. test docker compose with start-clp.sh.

  3. check telemetry:

cd ~/clp/tools/deployment/telemetry_server
docker exec telemetry_server-clickhouse-1 clickhouse-client --user default --password clickhouse --query "SELECT * FROM clp_telemetry.otel_metrics_gauge ORDER BY TimeUnix DESC LIMIT 5 FORMAT Vertical"
  1. test helm
cat <<EOF | kind create cluster --name clp --config=-
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  extraMounts:
  - hostPath: /home
    containerPath: /home
  extraPortMappings:
  - containerPort: 30000
    hostPort: 30000
  - containerPort: 30017
    hostPort: 30017
  - containerPort: 30301
    hostPort: 30301
  - containerPort: 30302
    hostPort: 30302
  - containerPort: 30306
    hostPort: 30306
  - containerPort: 30800
    hostPort: 30800
EOF

cd tools/deployment/package-helm
helm install clp . \
  --set clpConfig.telemetry.endpoint="http://172.17.0.1:4318" \
  --set clpConfig.logs_input.directory="/home" \
  --timeout 10m

kubectl wait --for=condition=ready pod --all --timeout=300s
  1. check telemetry:
cd ~/clp/tools/deployment/telemetry_server
docker exec telemetry_server-clickhouse-1 clickhouse-client --user default --password clickhouse --query "SELECT * FROM clp_telemetry.otel_metrics_gauge ORDER BY TimeUnix DESC LIMIT 5 FORMAT Vertical"

test helm uuid generation

# Host port mappings
export CLP_WEBUI_PORT=30000
export CLP_RESULTS_CACHE_PORT=30017
export CLP_API_SERVER_PORT=30301
export CLP_LOG_INGESTOR_PORT=30302
export CLP_DATABASE_PORT=30306
export CLP_MCP_SERVER_PORT=30800

# Credentials
export CLP_DB_PASS=$(openssl rand -hex 16)
export CLP_DB_ROOT_PASS=$(openssl rand -hex 16)
export CLP_QUEUE_PASS=$(openssl rand -hex 16)
export CLP_REDIS_PASS=$(openssl rand -hex 16)

# Create the `kind` cluster
cat <<EOF | kind create cluster --name clp --config=-
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  extraMounts:
  - hostPath: /home
    containerPath: /home
  extraPortMappings:
  - containerPort: $CLP_WEBUI_PORT
    hostPort: $CLP_WEBUI_PORT
  - containerPort: $CLP_RESULTS_CACHE_PORT
    hostPort: $CLP_RESULTS_CACHE_PORT
  - containerPort: $CLP_API_SERVER_PORT
    hostPort: $CLP_API_SERVER_PORT
  - containerPort: $CLP_LOG_INGESTOR_PORT
    hostPort: $CLP_LOG_INGESTOR_PORT
  - containerPort: $CLP_DATABASE_PORT
    hostPort: $CLP_DATABASE_PORT
  - containerPort: $CLP_MCP_SERVER_PORT
    hostPort: $CLP_MCP_SERVER_PORT
EOF


cd /home/nathanhung/clp/tools/deployment/package-helm

helm install clp . \
  --set clpConfig.telemetry.disable=false \
  --set clpConfig.webui.port="$CLP_WEBUI_PORT" \
  --set clpConfig.results_cache.port="$CLP_RESULTS_CACHE_PORT" \
  --set clpConfig.api_server.port="$CLP_API_SERVER_PORT" \
  --set clpConfig.log_ingestor.port="$CLP_LOG_INGESTOR_PORT" \
  --set clpConfig.database.port="$CLP_DATABASE_PORT" \
  --set clpConfig.mcp_server.port="$CLP_MCP_SERVER_PORT" \
  --set credentials.database.password="$CLP_DB_PASS" \
  --set credentials.database.root_password="$CLP_DB_ROOT_PASS" \
  --set credentials.queue.password="$CLP_QUEUE_PASS" \
  --set credentials.redis.password="$CLP_REDIS_PASS"

kubectl wait pods --all --for=condition=Ready --timeout=300s

# get the generated UUID from the new ConfigMap:
export FIRST_UUID=$(kubectl get configmap clp-instance-id -o jsonpath='{.data.instanceId}')
echo "Generated UUID is: $FIRST_UUID"

# check that a Deployment (e.g., api-server) successfully received the exact same UUID in its OTEL_RESOURCE_ATTRIBUTES:
kubectl set env deploy/clp-api-server --list | grep OTEL_RESOURCE_ATTRIBUTES
# Look for clp.deployment.id=... and confirm it matches FIRST_UUID
 
# verify the Topology Metrics Job was also generated with the same UUID:
kubectl get job clp-topology-metrics-emitter -o yaml | grep "clp.deployment.id"
# Confirm the stringValue matches FIRST_UUID

# Perform a Helm Upgrade
helm upgrade clp . \
  --set clpConfig.telemetry.disable=false \
  --set clpConfig.webui.port="$CLP_WEBUI_PORT" \
  --set clpConfig.results_cache.port="$CLP_RESULTS_CACHE_PORT" \
  --set clpConfig.api_server.port="$CLP_API_SERVER_PORT" \
  --set clpConfig.log_ingestor.port="$CLP_LOG_INGESTOR_PORT" \
  --set clpConfig.database.port="$CLP_DATABASE_PORT" \
  --set clpConfig.mcp_server.port="$CLP_MCP_SERVER_PORT" \
  --set credentials.database.password="$CLP_DB_PASS" \
  --set credentials.database.root_password="$CLP_DB_ROOT_PASS" \
  --set credentials.queue.password="$CLP_QUEUE_PASS" \
  --set credentials.redis.password="$CLP_REDIS_PASS"

# check UUID is same
export UPGRADE_UUID=$(kubectl get configmap clp-instance-id -o jsonpath='{.data.instanceId}')
echo "Post-upgrade UUID is: $UPGRADE_UUID"

if [ "$FIRST_UUID" = "$UPGRADE_UUID" ]; then
    echo "SUCCESS: The UUID persisted correctly!"
else
    echo "FAILURE: The UUID changed during upgrade!"
fi

kubectl set env deploy/clp-api-server --list | grep OTEL_RESOURCE_ATTRIBUTES


[yscope-contrib-guidelines]: https://docs.yscope.com/dev-guide/contrib-guides-overview.html


<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit

* **New Features**
  * Optional telemetry via an OpenTelemetry collector service and a telemetry compose/profile for gathering/exporting metrics.
  * Services now receive telemetry environment configuration (OTLP endpoint, resource/service attributes, disable toggle) and a development override exposing the collector port.
  * When enabled, runtime/service/version attributes are attached and topology metrics are emitted to the collector.

* **Chores**
  * Chart version bumped and Helm templates updated to support telemetry resources and conditional collector deployment.

<!-- review_stack_entry_start -->

[![Review Change Stack](https://storage.googleapis.com/coderabbit_public_assets/review-stack-in-coderabbit-ui.svg)](https://app.coderabbit.ai/change-stack/y-scope/clp/pull/2278)

<!-- review_stack_entry_end -->
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

@Nathan903 Nathan903 requested a review from a team as a code owner May 14, 2026 02:28
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 14, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • ✅ Review completed - (🔄 Check again to review again)

Walkthrough

Adds OpenTelemetry telemetry: controller toggles and OTEL env population, a new otel-collector service with config, OTEL env injection into fourteen services, and Compose-file wiring to enable the collector via a telemetry profile.

Changes

OpenTelemetry Telemetry Integration

Layer / File(s) Summary
Controller telemetry orchestration
components/clp-package-utils/clp_package_utils/controller.py
DockerComposeController stores instance_id, imports platform for OS/arch detection, conditionally sets CLP_DISABLE_TELEMETRY or populates OTEL_RESOURCE_ATTRIBUTES, OTEL_EXPORTER_OTLP_ENDPOINT, and TELEMETRY_ENDPOINT (reads VERSION, fallback "unknown"), and enables the Docker Compose telemetry profile when telemetry is enabled.
Telemetry instrumentation of application services
tools/deployment/package/docker-compose-all.yaml
Fourteen services (db-table-creator, results-cache-indices-creator, spider-scheduler, compression-scheduler, compression-worker, spider-compression-worker, webui, garbage-collector, query-scheduler, query-worker, reducer, mcp-server, api-server, log-ingestor) are updated with OTEL_EXPORTER_OTLP_ENDPOINT, OTEL_RESOURCE_ATTRIBUTES, OTEL_SERVICE_NAME, and CLP_DISABLE_TELEMETRY environment variables.
OpenTelemetry Collector service and configuration
tools/deployment/package/otel-collector-config.yaml, tools/deployment/package/docker-compose-all.yaml, tools/deployment/package/docker-compose.dev.yaml
Adds otel-collector service running otel/opentelemetry-collector-contrib:0.152.0, mounts otel-collector-config.yaml, configures an OTLP HTTP receiver on port 4318, a batch processor, and an otlphttp exporter using TELEMETRY_ENDPOINT. Dev override exposes port 4318.
Telemetry wiring across Compose deployment profiles
tools/deployment/package/docker-compose.yaml, tools/deployment/package/docker-compose-base.yaml, tools/deployment/package/docker-compose-spider-base.yaml, tools/deployment/package/docker-compose-spider.yaml, tools/deployment/package/docker-compose.dev.yaml
The otel-collector service is added/extended into multiple Compose files so the collector is available when the telemetry profile is activated.

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Title check ✅ Passed The title 'feat(deployment): Add OpenTelemetry Collector.' is a clear, concise Conventional Commits-style title that directly summarizes the main change: adding OpenTelemetry Collector support to the deployment configuration.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@components/clp-package-utils/clp_package_utils/controller.py`:
- Around line 1019-1045: The telemetry block should defensively handle collector
endpoint and normalize more architectures: when building env_vars (the same
place where OTEL_RESOURCE_ATTRIBUTES, OTEL_EXPORTER_OTLP_ENDPOINT and
TELEMETRY_ENDPOINT are set), 1) only set OTEL_EXPORTER_OTLP_ENDPOINT from a
configurable value (e.g., self._clp_config.telemetry.collector_endpoint) and/or
set SDK timeout/connection env vars (e.g., OTEL_EXPORTER_OTLP_TRACES_TIMEOUT or
a generic OTEL_EXPORTER_OTLP_TIMEOUT) so instrumented services won't hang if the
collector is unreachable; 2) expand the arch normalization that currently maps
platform.machine().lower() -> "amd64"/"arm64" in the controller to include
common values (i386, i686 -> "386"; armv7l -> "armv7"; ppc64le, s390x, etc.) and
keep a clear fallback for unknown values; and 3) only set TELEMETRY_ENDPOINT
from self._clp_config.telemetry.endpoint if it is present/non-empty. Update the
code around the env_vars assignments and the arch mapping logic to reference the
existing variables (_clp_config.telemetry, _clp_home, env_vars,
OTEL_RESOURCE_ATTRIBUTES, OTEL_EXPORTER_OTLP_ENDPOINT) so the changes are
localized to the telemetry block.

In `@tools/deployment/package/docker-compose-all.yaml`:
- Around line 715-726: Update the otel-collector service: change the image tag
from "otel/opentelemetry-collector-contrib:0.127.0" to a current stable version
(e.g., 0.152.0) and add a healthcheck using the existing *healthcheck_defaults
anchor so the service follows the same health pattern as other services; modify
the otel-collector service block (service name: otel-collector, image field) to
the new tag and append a healthcheck: *healthcheck_defaults entry under that
service.

In `@tools/deployment/package/otel-collector-config.yaml`:
- Around line 16-21: The config only defines a metrics pipeline
(service.pipelines.metrics with receivers: [otlp], processors: [batch],
exporters: [otlphttp]) so traces and logs will be dropped; add
service.pipelines.traces and service.pipelines.logs entries mirroring the
metrics pipeline (using the same receivers, processors and exporters or
appropriate variants) so all three OTEL signal types are handled, or explicitly
document why traces/logs should remain excluded.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 014a63d0-cea7-4ccf-9684-c9cf8b589f4a

📥 Commits

Reviewing files that changed from the base of the PR and between 8977316 and 14a46a2.

📒 Files selected for processing (8)
  • components/clp-package-utils/clp_package_utils/controller.py
  • tools/deployment/package/docker-compose-all.yaml
  • tools/deployment/package/docker-compose-base.yaml
  • tools/deployment/package/docker-compose-spider-base.yaml
  • tools/deployment/package/docker-compose-spider.yaml
  • tools/deployment/package/docker-compose.dev.yaml
  • tools/deployment/package/docker-compose.yaml
  • tools/deployment/package/otel-collector-config.yaml

Comment thread components/clp-package-utils/clp_package_utils/controller.py Outdated
Comment thread tools/deployment/package/docker-compose-all.yaml
Comment thread tools/deployment/package/otel-collector-config.yaml Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
tools/deployment/package/otel-collector-config.yaml (1)

20-26: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add traces and logs pipelines to complete OpenTelemetry support.

The collector configuration only defines a metrics pipeline. OpenTelemetry supports three signal types: metrics, traces, and logs. If instrumented services emit traces or logs (which many OTEL SDKs do by default), that telemetry data will be dropped or result in errors.

📊 Proposed fix to add all telemetry signals
 service:
   extensions: [health_check]
   pipelines:
     metrics:
       receivers: [otlp]
       processors: [batch]
       exporters: [otlphttp]
+    traces:
+      receivers: [otlp]
+      processors: [batch]
+      exporters: [otlphttp]
+    logs:
+      receivers: [otlp]
+      processors: [batch]
+      exporters: [otlphttp]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/deployment/package/otel-collector-config.yaml` around lines 20 - 26,
The config only defines a metrics pipeline so traces and logs will be dropped;
add pipelines named "traces" and "logs" under service.pipelines mirroring the
existing metrics pipeline (use receivers: [otlp], processors: [batch],
exporters: [otlphttp]) so that the OpenTelemetry collector handles all three
signal types; ensure the new pipeline names match the OpenTelemetry signal keys
("traces" and "logs") and follow the same structure as the existing "metrics"
block.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tools/deployment/package/docker-compose-all.yaml`:
- Around line 715-734: Apply the shared service defaults to the otel-collector
service by adding the YAML merge key for the existing service_defaults anchor
(i.e., include "<<: *service_defaults" under the otel-collector service), remove
the explicit restart: "${CLP_RESTART_POLICY...}" line (since restart is
inherited), and keep its existing healthcheck: block (which already uses
*healthcheck_defaults); this will ensure it inherits logging driver, user
(CLP_THIRD_PARTY_SERVICE_UID_GID), stop_grace_period, and extra_hosts like other
third-party services.

---

Duplicate comments:
In `@tools/deployment/package/otel-collector-config.yaml`:
- Around line 20-26: The config only defines a metrics pipeline so traces and
logs will be dropped; add pipelines named "traces" and "logs" under
service.pipelines mirroring the existing metrics pipeline (use receivers:
[otlp], processors: [batch], exporters: [otlphttp]) so that the OpenTelemetry
collector handles all three signal types; ensure the new pipeline names match
the OpenTelemetry signal keys ("traces" and "logs") and follow the same
structure as the existing "metrics" block.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 255f26d4-25a0-4fe6-a430-01cc93fdfe2c

📥 Commits

Reviewing files that changed from the base of the PR and between 45ef706 and 24fe923.

📒 Files selected for processing (2)
  • tools/deployment/package/docker-compose-all.yaml
  • tools/deployment/package/otel-collector-config.yaml

Comment thread tools/deployment/package/docker-compose-all.yaml Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tools/deployment/package/otel-collector-config.yaml`:
- Around line 3-4: Duplicate YAML key "endpoint" under the health_check block
causes ambiguity; remove one of the entries so only a single endpoint remains
(e.g., keep endpoint: "0.0.0.0:13133" and delete the other). Locate the
health_check section and ensure only one endpoint key is present and correctly
formatted, then validate the YAML to confirm no duplicate keys remain.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: e7ca0525-caf7-4399-8df5-cffcb5310704

📥 Commits

Reviewing files that changed from the base of the PR and between 24fe923 and 21d37ca.

📒 Files selected for processing (1)
  • tools/deployment/package/otel-collector-config.yaml

Comment thread tools/deployment/package/otel-collector-config.yaml Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 13

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@components/clp-package-utils/clp_package_utils/controller.py`:
- Around line 1176-1177: The function-level imports of json, time (and a
duplicate platform) should be moved to the module top: remove the in-function
import statements (the ones importing json and time around lines ~1176 and the
platform import around ~1217) and ensure json and time are imported alongside
the existing top-level platform import; update any functions that relied on
those local imports to use the module-level names (no other code changes
needed).
- Around line 1235-1257: The current code in controller.py spawns a docker
container via subprocess.run to POST metrics (the block that constructs the
docker run array referencing self._project_name and payload_str), which is
costly and fragile; replace that subprocess.run call with a direct HTTP POST
from the controller process (e.g., use requests.post or urllib) to
"http://otel-collector:4318/v1/metrics" with headers
{"Content-Type":"application/json"}, body payload_str, and timeout=5, preserve
the existing exception handling and logger.warning(e) behavior (the same message
used where you currently catch Exception as e) so failures are logged the same
way but without requiring the curl image or spawning containers.
- Around line 1213-1231: Duplicate logic: reading VERSION from
self._clp_home/VERSION and building resource_attrs is repeated between
set_up_env and _emit_topology_metrics; refactor by extracting this into a helper
or caching it on the instance. Specifically, in set_up_env populate
self._clp_version (from VERSION file) and self._resource_attrs (a dict/list
matching the current resource_attrs shape) and then update
_emit_topology_metrics to use self._clp_version and self._resource_attrs instead
of re-reading VERSION or reconstructing resource_attrs and appending directly to
payload["resourceMetrics"][0]["resource"]["attributes"]; ensure keys like
"clp.deployment.id", "service.version", "clp.deployment.method",
"clp.storage.engine", "os.type", "host.arch", and "service.name" are set once
and reused.

In `@tools/deployment/package-helm/templates/_helpers.tpl`:
- Around line 53-58: The clp.resourceAttributes template is incorrectly using
.Capabilities.KubeVersion for os.type (producing values like v1.28.0); update
the template (define "clp.resourceAttributes") to use a proper OS source such as
a new Values field or a static default, e.g., replace the os.type expression
with .Values.hostOS | default "linux" (or simply the literal "linux" if you
don't want a new value), ensuring the attribute reads os.type=<os> instead of
the Kubernetes version.

In `@tools/deployment/package-helm/templates/otel-collector-configmap.yaml`:
- Line 5: Replace the hardcoded ConfigMap name "otel-collector-config" with the
chart fullname helper so the resource uses {{ include "clp.fullname" . }} (i.e.,
set metadata.name to the clp.fullname helper) and update the corresponding
reference in otel-collector-deployment.yaml (the ConfigMap volume/envRef
reference at line ~34) to use the same include "clp.fullname" . helper so both
templates resolve to the same release-scoped name.

In `@tools/deployment/package-helm/templates/otel-collector-deployment.yaml`:
- Around line 6-7: Replace the hard-coded labels block in the otel-collector
Deployment template with the chart's standard Helm label helpers: use the
clp.labels helper for pod/metadata labels and clp.selectorLabels for the
selector/matchLabels so labels are consistent with the rest of the chart; update
the labels at the Deployment metadata and the pod template metadata (the blocks
currently using app: otel-collector) and the spec.selector.matchLabels to call
{{ include "clp.labels" . }} and {{ include "clp.selectorLabels" . }} (or the
chart's equivalent helper names) so selectors and labels remain in sync.
- Line 5: The Deployment name is hardcoded as "otel-collector"; change it to use
the chart helper so it follows the release-scoped naming pattern by replacing
the literal with the template helper call (use the same pattern as other
resources: {{ include "clp.fullname" . }}-otel-collector or similar) in the
otel-collector Deployment manifest and update any places that reference that
name (e.g., the corresponding Service and selector/ownerReferences) so all
references use the helper-derived name consistently; ensure you use the exact
helper symbol "clp.fullname" as used elsewhere in the chart.
- Around line 19-30: The otel-collector container (name: otel-collector) is
missing Kubernetes health probes; add both readinessProbe and livenessProbe to
the container spec using HTTP GET checks against the collector health endpoint
(e.g., path /health on port 13133) with sensible initialDelaySeconds,
periodSeconds and failureThreshold values; ensure your collector configuration
includes the health_check extension (or change the probe path/port to match your
config) so the probes succeed.
- Around line 21-22: The container command overrides the image ENTRYPOINT and
currently only supplies the "--config=/etc/otelcol-contrib/config.yaml" arg,
which prevents the otelcol binary from being executed; update the pod spec's
command array to include the binary name "/otelcontribcol" as the first element
followed by the existing "--config=/etc/otelcol-contrib/config.yaml" argument so
the process runs correctly (modify the command: array that currently contains
"--config=..." to be ["/otelcontribcol",
"--config=/etc/otelcol-contrib/config.yaml"]).
- Line 20: The template uses a guard checking .Values.clpConfig.telemetry but
then references .Values.telemetry.collector.image and
.Values.telemetry.collector.resources which are the wrong value paths; update
the image and resources references to use
.Values.clpConfig.telemetry.collector.image and
.Values.clpConfig.telemetry.collector.resources respectively so they match the
conditional (.Values.clpConfig.telemetry) and the configmap usage
(.Values.clpConfig.telemetry.endpoint).

In `@tools/deployment/package-helm/templates/otel-collector-service.yaml`:
- Around line 6-7: Replace the hardcoded labels and selector in the
otel-collector service template with the chart's standard label helpers: change
the literal "labels: app: otel-collector" block to use {{ include "clp.labels" .
}} for metadata.labels and replace the selector block to use {{ include
"clp.selectorLabels" . }} so the service uses the same controlled label/selector
helpers as the rest of the chart.
- Around line 10-12: The Service port block for the otel collector currently
lacks a name; add a descriptive name field (e.g., name: "otlp-http") to the port
definition where port: 4318 / protocol: TCP / targetPort: 4318 is declared so
the Service port is named (helps future multi-port exposure and follows k8s best
practices). Update the Service port entry in the otel-collector Service manifest
(the port definition with port: 4318, protocol: TCP, targetPort: 4318) to
include the name key.
- Line 5: The Service manifest currently hardcodes the name "otel-collector";
change it to use the chart helper clp.fullname so the Service name follows the
chart's naming convention and avoids collisions. Locate the Service resource
where name: otel-collector is set (the otel-collector Service template) and
replace the literal with the Helm template helper invocation that outputs the
release-scoped fullname (use the clp.fullname helper), ensuring any surrounding
quotes/templating match other templates in the chart.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 338898c2-d7e8-41c1-9d8c-544a0633c11a

📥 Commits

Reviewing files that changed from the base of the PR and between 21d37ca and e2e21ec.

📒 Files selected for processing (15)
  • components/clp-package-utils/clp_package_utils/controller.py
  • tools/deployment/package-helm/Chart.yaml
  • tools/deployment/package-helm/templates/_helpers.tpl
  • tools/deployment/package-helm/templates/api-server-deployment.yaml
  • tools/deployment/package-helm/templates/compression-scheduler-deployment.yaml
  • tools/deployment/package-helm/templates/compression-worker-deployment.yaml
  • tools/deployment/package-helm/templates/log-ingestor-deployment.yaml
  • tools/deployment/package-helm/templates/otel-collector-configmap.yaml
  • tools/deployment/package-helm/templates/otel-collector-deployment.yaml
  • tools/deployment/package-helm/templates/otel-collector-service.yaml
  • tools/deployment/package-helm/templates/query-scheduler-deployment.yaml
  • tools/deployment/package-helm/templates/query-worker-deployment.yaml
  • tools/deployment/package-helm/templates/reducer-deployment.yaml
  • tools/deployment/package/docker-compose-all.yaml
  • tools/deployment/package/otel-collector-config.yaml

Comment thread components/clp-package-utils/clp_package_utils/controller.py Outdated
Comment thread components/clp-package-utils/clp_package_utils/controller.py Outdated
Comment thread components/clp-package-utils/clp_package_utils/controller.py Outdated
Comment thread tools/deployment/package-helm/templates/_helpers.tpl
Comment thread tools/deployment/package-helm/templates/otel-collector-configmap.yaml Outdated
Comment thread tools/deployment/package-helm/templates/otel-collector-deployment.yaml Outdated
Comment thread tools/deployment/package-helm/templates/otel-collector-deployment.yaml Outdated
Comment thread tools/deployment/package-helm/templates/otel-collector-service.yaml Outdated
Comment thread tools/deployment/package-helm/templates/otel-collector-service.yaml Outdated
Comment thread tools/deployment/package-helm/templates/otel-collector-service.yaml Outdated
Nathan903 added 7 commits May 15, 2026 15:11
* Revert "Reapply "remove server""

This reverts commit c71e830.

* use resourcedetectionprocessor

* matching style quotes

* use resource detection

* move configmap.yaml

* persisting uuid

* follow deployment.yaml

* imagepull policy

* move file

* replica count method

* avoid hardcode

* remove dev.yaml

* healthchecks

* remove telemetry-server
Copy link
Copy Markdown
Member

@junhaoliao junhaoliao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very good. here are the very last batch of suggestions

for the title, how about simply:

feat(deployment): Add OpenTelemetry Collector.

Comment thread components/clp-package-utils/clp_package_utils/controller.py Outdated
Comment thread tools/deployment/package-helm/values.yaml Outdated
Comment thread components/clp-package-utils/clp_package_utils/controller.py Outdated
Comment thread components/clp-package-utils/clp_package_utils/controller.py Outdated
Comment thread components/clp-package-utils/clp_package_utils/controller.py
Comment thread tools/deployment/package-helm/templates/otel-collector-service.yaml Outdated
Comment thread tools/deployment/package/docker-compose-all.yaml
Comment thread tools/deployment/package/docker-compose-all.yaml Outdated
Comment thread components/clp-package-utils/clp_package_utils/controller.py Outdated
@Nathan903 Nathan903 changed the title feat(deployment): Add OpenTelemetry Collector and inject telemetry env vars in Docker Compose and Helm feat(deployment): Add OpenTelemetry Collector. May 19, 2026
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds optional (opt-out) OpenTelemetry telemetry support to CLP deployments by introducing an OpenTelemetry Collector in both Docker Compose and Helm, wiring OTEL environment variables into services, and emitting initial “deployment topology” gauge metrics at startup.

Changes:

  • Adds an otel-collector service to Docker Compose deployments and a conditional collector Deployment/Service to the Helm chart.
  • Injects OpenTelemetry-related env vars (OTEL_SERVICE_NAME, OTEL_EXPORTER_OTLP_ENDPOINT, OTEL_RESOURCE_ATTRIBUTES, CLP_DISABLE_TELEMETRY) into (some) components.
  • Emits topology metrics on Docker Compose startup and adds a Helm hook Job to emit a similar payload post-install/upgrade.

Reviewed changes

Copilot reviewed 23 out of 23 changed files in this pull request and generated 10 comments.

Show a summary per file
File Description
tools/deployment/package/docker-compose.yaml Includes otel-collector in the full Docker Compose deployment.
tools/deployment/package/docker-compose-spider.yaml Includes otel-collector in the spider full Docker Compose deployment.
tools/deployment/package/docker-compose-spider-base.yaml Includes otel-collector in the spider base Docker Compose deployment.
tools/deployment/package/docker-compose-base.yaml Includes otel-collector in the base Docker Compose deployment.
tools/deployment/package/docker-compose-all.yaml Adds shared telemetry env defaults, OTEL env injection to services, and defines the otel-collector service.
tools/deployment/package-helm/values.yaml Adds Helm values for collector image and telemetry config.
tools/deployment/package-helm/templates/topology-metrics-emitter-job.yaml New Helm hook Job that POSTs topology metrics to the in-cluster collector.
tools/deployment/package-helm/templates/reducer-deployment.yaml Injects OTEL env vars into reducer pod.
tools/deployment/package-helm/templates/query-worker-deployment.yaml Injects OTEL env vars into query-worker pod.
tools/deployment/package-helm/templates/query-scheduler-deployment.yaml Injects OTEL env vars into query-scheduler pod.
tools/deployment/package-helm/templates/otel-collector-service.yaml New Service for the in-cluster OTEL collector (conditional).
tools/deployment/package-helm/templates/otel-collector-deployment.yaml New Deployment for the in-cluster OTEL collector (conditional).
tools/deployment/package-helm/templates/log-ingestor-deployment.yaml Injects OTEL env vars into log-ingestor pod.
tools/deployment/package-helm/templates/instance-id-configmap.yaml New ConfigMap to persist a stable instance UUID across Helm upgrades.
tools/deployment/package-helm/templates/configmap.yaml Adds collector config content into the chart ConfigMap (conditional).
tools/deployment/package-helm/templates/compression-worker-deployment.yaml Injects OTEL env vars into compression-worker pod.
tools/deployment/package-helm/templates/compression-scheduler-deployment.yaml Injects OTEL env vars into compression-scheduler pod.
tools/deployment/package-helm/templates/api-server-deployment.yaml Injects OTEL env vars into api-server pod.
tools/deployment/package-helm/templates/_helpers.tpl Adds helpers for instanceId persistence, resource attributes, topology payload generation, and telemetry env var injection.
tools/deployment/package-helm/Chart.yaml Bumps chart version.
components/package-template/src/etc/otel-collector/config.yaml Adds a default OTEL collector config file to the package template (Docker Compose path).
components/clp-package-utils/clp_package_utils/general.py Adds a small HTTP helper with retries for emitting topology metrics.
components/clp-package-utils/clp_package_utils/controller.py Wires telemetry env setup, collector configuration, and topology metrics emission into DockerComposeController startup.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread components/clp-package-utils/clp_package_utils/general.py
Comment thread components/clp-package-utils/clp_package_utils/controller.py Outdated
Comment thread components/clp-package-utils/clp_package_utils/controller.py
Comment on lines +709 to +716
volumes:
- type: "bind"
source: "${CLP_OTEL_COLLECTOR_CONF_FILE_HOST:-./etc/otel-collector/config.yaml}"
target: "/etc/otelcol-contrib/config.yaml"
read_only: true
- type: "bind"
source: "/var/run/docker.sock"
target: "/var/run/docker.sock"
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removing this access would mean needing env vars to pass in the system info. Not sure if the security benefit is worth the complexity of changing. @junhaoliao can you please advise

Comment thread tools/deployment/package-helm/values.yaml
Comment thread tools/deployment/package-helm/templates/_helpers.tpl
Comment thread tools/deployment/package-helm/templates/_helpers.tpl
Comment thread tools/deployment/package-helm/templates/_helpers.tpl
Comment thread tools/deployment/package-helm/templates/_helpers.tpl
Comment on lines +99 to +112
{{/*
Provides environment variables for telemetry (except service.name).
*/}}
{{- define "clp.telemetryEnv" -}}
{{- if and .Values.clpConfig.telemetry .Values.clpConfig.telemetry.disable }}
- name: CLP_DISABLE_TELEMETRY
value: "true"
{{- else }}
- name: OTEL_EXPORTER_OTLP_ENDPOINT
value: "http://{{ include "clp.fullname" . }}-otel-collector:4318"
- name: OTEL_RESOURCE_ATTRIBUTES
value: {{ include "clp.resourceAttributes" . | quote }}
{{- end }}
{{- end -}}
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@junhaoliao we dont want to include anything not yet used right?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants