feat(deployment): Add OpenTelemetry metrics for Rust#2286
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughOpenTelemetry metrics added: clp-rust-utils exposes init/shutdown utilities and error variant; api-server and log-ingestor initialize telemetry, register meters and counters, emit startup metrics, and rely on guard drop for provider shutdown. Ingestion buffer records bytes and record counters. ChangesOpenTelemetry Metrics Infrastructure
Sequence DiagramsequenceDiagram
participant Service as Service (api-server/log-ingestor)
participant InitTel as init_telemetry()
participant OTLPExp as HTTP OTLP Exporter
participant Provider as SdkMeterProvider
participant Global as global::set_meter_provider
participant ShutdownTel as shutdown_telemetry()
Service->>InitTel: call with telemetry config and service_name
InitTel->>OTLPExp: create HTTP metric exporter
InitTel->>Provider: build SdkMeterProvider with exporter
InitTel->>Global: register as global meter provider
InitTel-->>Service: return Some(provider)
Service->>Service: create service meter and counters
Service->>Service: emit startup/operational metrics
Service->>ShutdownTel: call with provider on graceful exit (Drop)
ShutdownTel->>Provider: shutdown()
ShutdownTel-->>Service: return
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@components/api-server/src/bin/api_server.rs`:
- Around line 82-87: The
axum::serve(...).with_graceful_shutdown(shutdown_signal()).await currently uses
? which returns early and skips
clp_rust_utils::telemetry::shutdown_telemetry(tel_provider); change this to
capture the result (e.g., let serve_res =
axum::serve(...).with_graceful_shutdown(...).await;), then call
clp_rust_utils::telemetry::shutdown_telemetry(tel_provider); and finally
propagate the serve result (return or use serve_res?); ensure you reference the
axum::serve call, shutdown_signal, and
clp_rust_utils::telemetry::shutdown_telemetry(tel_provider) so telemetry always
runs on both success and error paths.
In `@components/clp-rust-utils/src/telemetry.rs`:
- Around line 26-30: The telemetry initialization currently panics via expect
when building the OTLP metric exporter (the call to
opentelemetry_otlp::MetricExporter::builder().with_http().with_endpoint(endpoint).build().expect(...)),
so change the telemetry init function to return a Result (or return an Option)
and handle exporter build failures instead of panicking: replace the expect call
with proper error handling (e.g., let exporter = ...build()? or match the Result
and log the error and disable telemetry by returning Ok(None) / an appropriate
Err), update the telemetry init function signature to return
Result<TelemetryHandle, Error> (or Option) and propagate or gracefully handle
the error where the function is called (ensuring code paths can run without
telemetry).
In `@components/log-ingestor/src/bin/log_ingestor.rs`:
- Around line 85-90: The call to
axum::serve(...).with_graceful_shutdown(shutdown_signal()).await? can return
early and skip clp_rust_utils::telemetry::shutdown_telemetry(tel_provider);
change it to first capture the serve result (e.g., let serve_result =
axum::serve(...).with_graceful_shutdown(...).await;), then always call
clp_rust_utils::telemetry::shutdown_telemetry(tel_provider); and finally
propagate the serve outcome (e.g., use serve_result? or return serve_result) so
telemetry is shut down regardless of server error.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 9b53f282-438c-467d-b5b0-7ba36000e79d
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (7)
components/api-server/Cargo.tomlcomponents/api-server/src/bin/api_server.rscomponents/clp-rust-utils/Cargo.tomlcomponents/clp-rust-utils/src/lib.rscomponents/clp-rust-utils/src/telemetry.rscomponents/log-ingestor/Cargo.tomlcomponents/log-ingestor/src/bin/log_ingestor.rs
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@components/api-server/src/bin/api_server.rs`:
- Around line 58-60: The startup metric is emitted too early; move the
startup_counter.add(1, &[opentelemetry::KeyValue::new("type", "start")]) call so
it runs only after all critical startup preconditions (binding/listening,
connecting dependencies, router initialization) have completed successfully.
Locate the meter/u64_counter creation (meter and startup_counter) and keep
creation where it is but remove the add() call from its current position and
place it immediately after the code path that confirms successful
bind/connect/router setup (the success/ready branch that actually starts the
server), ensuring any error paths do not call startup_counter.add.
- Around line 56-57: Wrap the result of
clp_rust_utils::telemetry::init_telemetry(&config.telemetry) (stored in
tel_provider) in a drop guard so telemetry::shutdown is always called on
unwind/early return; implement a small RAII guard type (e.g., TelemetryGuard)
whose Drop calls clp_rust_utils::telemetry::shutdown/tel_provider.shutdown and
store the guard next to tel_provider, or use the scopeguard crate to defer the
shutdown, and do the same for the second telemetry init (the later init at the
other call site) so shutdown runs on all exit paths, not just after serve.
In `@components/clp-rust-utils/src/telemetry.rs`:
- Around line 31-34: Replace the use of Resource::default() when building the
SdkMeterProvider so metrics carry an explicit service identity: build a Resource
via Resource::builder_empty().with_attributes(vec![KeyValue::new("service.name",
"<service-identifier>")]) and pass that resource to
SdkMeterProvider::builder().with_reader(reader).with_resource(...) before
.build(); use an appropriate concrete service identifier string in place of
"<service-identifier>" so SdkMeterProvider (and exported metrics) are
unambiguously attributed.
In `@components/log-ingestor/src/bin/log_ingestor.rs`:
- Around line 64-67: Counters _bytes_total and _records_total are created with
meter.u64_counter("clp.ingest.bytes_total") and
meter.u64_counter("clp.ingest.records_total") but never used; find the ingest
path (e.g., the function handling incoming records / the loop that
writes/forwards data) and call add(...) on those instruments after successful
processing—increment _bytes_total.add(n_bytes, &[...]) with the number of bytes
ingested and _records_total.add(1, &[...]) per record (include any relevant
attributes/tags you already use). Ensure you keep the variables in scope (remove
the leading underscore if necessary) and place the add(...) calls in the success
branch (not on errors) so the counters reflect actual ingested bytes and
records.
- Around line 62-63: The telemetry provider created by
clp_rust_utils::telemetry::init_telemetry (assigned to tel_provider) may be
skipped on early returns because main uses the ? operator; wrap tel_provider in
a RAII guard or use a defer-like Drop handler so shutdown always runs on any
return from main (including pre-serve failures). Concretely, create a small
guard type or use scopeguard::defer in main right after tel_provider is created
(or replace tel_provider with a guard that holds it) and call the telemetry
shutdown/flush in its Drop/closure so telemetry shutdown executes regardless of
where main returns (this should cover the code paths around the tel_provider
creation and the later return points around the serve/exit logic).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: bebc6afd-7b87-4f08-83c0-076b758311d0
📒 Files selected for processing (4)
components/api-server/src/bin/api_server.rscomponents/clp-rust-utils/src/error.rscomponents/clp-rust-utils/src/telemetry.rscomponents/log-ingestor/src/bin/log_ingestor.rs
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@components/log-ingestor/src/bin/log_ingestor.rs`:
- Line 62: The telemetry provider created by
clp_rust_utils::telemetry::init_telemetry (tel_provider) can be leaked on early
returns from TcpListener::bind, IngestionJobManagerState::from_config, or
create_router; wrap tel_provider in a guard (e.g., scopeguard::guard) or use an
RAII wrapper type from clp-rust-utils::telemetry that implements Drop to call
clp_rust_utils::telemetry::shutdown_telemetry automatically, and then remove the
explicit shutdown_telemetry call currently executed at the end of main so the
provider is always flushed and cleaned up on scope exit.
In `@components/log-ingestor/src/compression/buffer.rs`:
- Around line 81-82: The metrics calls to bytes_total.add(...) and
records_total.add(...) use empty attribute slices (&[]); if contextual labels
like job_id, source, or object_type are available in scope, construct an
attributes slice (e.g. using KeyValue::new(...) items) and pass it into both
bytes_total.add(entry.size, &attributes) and records_total.add(1, &attributes)
so the metrics become queryable; update the code around the bytes_total and
records_total uses to build and reuse the attributes array (or leave &[] if no
context is available).
- Around line 76-78: The meter and counters (meter, bytes_total, records_total)
are being created inside the hot add() path; move their creation out of add()
and into the Buffer struct as fields (e.g., add fields meter: Meter,
bytes_total: u64::Counter, records_total: u64::Counter or the correct
OpenTelemetry types) and initialize them once in Buffer::new (note Buffer::new
will no longer be const fn because instrument construction is runtime). Then
update add() to use these struct fields instead of rebuilding the instruments on
each call.
- Line 5: Remove the unused import opentelemetry::KeyValue from the top of the
file (it's unused because attributes are passed as empty slices around the code
where attributes are constructed at lines ~81–82); simply delete the `use
opentelemetry::KeyValue;` line to clean up imports and run a build/lint to
confirm no other references remain.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 88323138-4538-41c4-bfd5-2e420b819937
📒 Files selected for processing (2)
components/log-ingestor/src/bin/log_ingestor.rscomponents/log-ingestor/src/compression/buffer.rs
There was a problem hiding this comment.
♻️ Duplicate comments (1)
components/clp-rust-utils/src/telemetry.rs (1)
21-21:⚠️ Potential issue | 🟠 Major | ⚡ Quick winRestore caller-provided
service.nameon the meter provider.
Resource::default()makes service identity depend on env/default fallback, soapi-serverandlog-ingestorcan both end up asunknown_service:*whenOTEL_SERVICE_NAMEis unset. Please threadservice_name: &strthroughinit_telemetryand build theResourcefrom that instead of using the default.For opentelemetry-sdk Rust 0.27.x metrics, how should `service.name` be set on an `SdkMeterProvider`, and what fallback value is used when it is omitted?Based on learnings: In
components/clp-rust-utils/src/telemetry.rs,init_telemetryis intended to takeservice_name: &strfrom callers and use it to populate theservice.nameResourceattribute on theSdkMeterProvider; it should not hardcode the service name inside the utility.Also applies to: 36-39
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@components/clp-rust-utils/src/telemetry.rs` at line 21, The init_telemetry function currently uses Resource::default(), which lets OTEL fallbacks produce unknown_service:*; change init_telemetry(telemetry_config: &Telemetry) -> ... to accept an extra service_name: &str parameter and construct the Resource explicitly from that service_name (e.g., Resource::new(vec![KeyValue::new("service.name", service_name)]) or equivalent) before building the SdkMeterProvider; update any callers to pass their concrete service name so SdkMeterProvider contains the caller-provided service.name instead of relying on env/default fallback.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@components/clp-rust-utils/src/telemetry.rs`:
- Line 21: The init_telemetry function currently uses Resource::default(), which
lets OTEL fallbacks produce unknown_service:*; change
init_telemetry(telemetry_config: &Telemetry) -> ... to accept an extra
service_name: &str parameter and construct the Resource explicitly from that
service_name (e.g., Resource::new(vec![KeyValue::new("service.name",
service_name)]) or equivalent) before building the SdkMeterProvider; update any
callers to pass their concrete service name so SdkMeterProvider contains the
caller-provided service.name instead of relying on env/default fallback.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: ae23d7ed-7b45-46b8-b671-949a8fe800c8
📒 Files selected for processing (4)
components/api-server/src/bin/api_server.rscomponents/clp-rust-utils/src/telemetry.rscomponents/log-ingestor/src/bin/log_ingestor.rscomponents/log-ingestor/src/compression/buffer.rs
There was a problem hiding this comment.
Pull request overview
Adds OpenTelemetry metrics support to CLP’s Rust services by introducing a shared telemetry initializer in clp-rust-utils, wiring it into api-server and log-ingestor, and emitting basic counters for service startup and ingestion volume.
Changes:
- Introduce
clp-rust-utils::telemetrywithinit_telemetry(),TelemetryGuard, andshutdown_telemetry()for OTLP metrics export. - Emit metrics:
api-server:clp.service.eventcounter on startup.log-ingestor:clp.ingest.bytes_total/clp.ingest.records_totalcounters per ingested object inBuffer::add.
- Update Rust telemetry disabling behavior to check env var values (vs. mere existence) and propagate exporter build failures as errors.
Reviewed changes
Copilot reviewed 10 out of 11 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| components/log-ingestor/src/compression/buffer.rs | Adds OpenTelemetry counters to ingestion buffering to track bytes/records ingested. |
| components/log-ingestor/src/bin/log_ingestor.rs | Initializes telemetry at startup and ensures telemetry guard lifetime covers server execution. |
| components/log-ingestor/Cargo.toml | Adds opentelemetry dependency for metrics instrumentation. |
| components/clp-rust-utils/src/telemetry.rs | New shared telemetry initialization/shutdown module for OTLP metric export. |
| components/clp-rust-utils/src/lib.rs | Exposes the new telemetry module. |
| components/clp-rust-utils/src/error.rs | Adds a dedicated error variant for OTLP exporter build failures. |
| components/clp-rust-utils/Cargo.toml | Adds OpenTelemetry SDK + OTLP exporter dependencies. |
| components/api-server/src/error.rs | Maps the new telemetry exporter build error into ClientError. |
| components/api-server/src/bin/api_server.rs | Initializes telemetry and emits a startup event metric. |
| components/api-server/Cargo.toml | Adds opentelemetry dependency for metrics instrumentation. |
| Cargo.lock | Locks new OpenTelemetry deps and updates transitive crate versions. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Description
Adds telemetry metrics to CLP's Rust services (api-server, log-ingestor).
telemetrymodule inclp-rust-utilswithinit_telemetry(),TelemetryGuard, andshutdown_telemetry()api-server: emitsclp.service.eventcounter on startuplog-ingestor: emitsclp.ingest.bytes_totalandclp.ingest.records_totalcounters per ingested object inBuffer::addCLP_DISABLE_TELEMETRYnow checks the env var's value against["1", "true", "yes", "y"]instead of checking existenceChecklist
breaking change.
Validation performed
Started CLP with the local telemetry backend (from the
clp-telemetry-serverrepo). Verifiedotel-collectorlogs showed incoming OTLP data.test end-to-end with the telemetry server:
1.
change build/clp-package/etc/clp-config.yaml's endpoint to http://172.17.0.1:4318
test docker compose with start-clp.sh.
check telemetry:
Summary by CodeRabbit
New Features
Bug Fixes