Skip to content

feat(storage): emit per-RPC latency log for put/get handlers#95

Merged
jacderida merged 1 commit into
WithAutonomi:mainfrom
jacderida:feat-storage-rpc-latency-log
May 12, 2026
Merged

feat(storage): emit per-RPC latency log for put/get handlers#95
jacderida merged 1 commit into
WithAutonomi:mainfrom
jacderida:feat-storage-rpc-latency-log

Conversation

@jacderida
Copy link
Copy Markdown
Collaborator

Summary

Adds a structured info! log event per chunk-store PUT and GET RPC with duration_ms, outcome, and address (and chunk_size on PUT). The tracing target is ant_node::storage::rpc_latency so Elasticsearch / Kibana can build p50/p95/p99 store-RPC latency histograms from the existing telegraf log forwarding without any node-side metric pipeline.

Motivation: a recent testnet (DEV-01) saw aggregate upload throughput halve over 12h before any node failures, with file-size-sensitive degradation that strongly suggests CLOSE_GROUP quorum / slowest-peer-dominates cost on the client. Confirming requires both client-side per-chunk timing (in flight as WithAutonomi/ant-client#87) and node-side per-RPC service time — this PR adds the latter.

Changes

  • src/storage/handler.rs: extract the existing handle_put / handle_get bodies into _inner helpers; wrap with Instant::now() and emit a tracing event on every exit path, including the early-return validation paths. Existing per-RPC info!/warn! log lines are kept for human-readable diagnostics.

Test plan

  • cargo check --workspace clean
  • cargo clippy --workspace --tests clean
  • Existing storage handler tests pass
  • Spot-check on a dev node that the new event appears in journal output with the expected fields

🤖 Generated with Claude Code

Adds a structured `info!` log event per chunk-store PUT and GET RPC
with duration_ms, outcome, and address (and chunk_size on PUT). The
target is `ant_node::storage::rpc_latency` so Elasticsearch / Kibana
can build p50/p95/p99 store-RPC latency histograms from the existing
telegraf log forwarding without any node-side metric pipeline.

Motivates DEV-01 post-mortem: aggregate upload throughput halved
over 12h before any node failures, file-size sensitivity points at
CLOSE_GROUP quorum / slowest-peer cost on the client, but the
node-side per-RPC service time is currently unobservable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 12, 2026 19:55
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds structured per-RPC latency tracing for chunk-store PUT/GET handlers so latency histograms can be built from forwarded logs (without adding a metrics pipeline), by wrapping the existing handler logic and emitting a single info! event with duration_ms, outcome, and chunk address (plus chunk_size on PUT).

Changes:

  • Wrapped handle_put/handle_get with timing and a structured info! event targeting ant_node::storage::rpc_latency.
  • Extracted the previous PUT/GET logic into handle_put_inner / handle_get_inner so the wrapper can log on all return paths.
Comments suppressed due to low confidence (1)

src/storage/handler.rs:296

  • Same as PUT: Instant::now(), hex::encode(...), and duration_ms/outcome are computed regardless of whether the rpc_latency info event will actually be emitted. Please gate this work with crate::logging::enabled! (preferably for this target) and avoid the extra hex-encode in the wrapper (the inner handler already encodes for its debug/warn logs).
        let start = std::time::Instant::now();
        let addr_hex = hex::encode(request.address);
        let response = self.handle_get_inner(request).await;
        let duration_ms = u64::try_from(start.elapsed().as_millis()).unwrap_or(u64::MAX);
        let outcome: &'static str = match &response {

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/storage/handler.rs
Comment on lines +174 to +194
let start = std::time::Instant::now();
let addr_hex = hex::encode(request.address);
let chunk_size = request.content.len();
let response = self.handle_put_inner(request).await;
let duration_ms = u64::try_from(start.elapsed().as_millis()).unwrap_or(u64::MAX);
let outcome: &'static str = match &response {
ChunkPutResponse::Success { .. } => "success",
ChunkPutResponse::AlreadyExists { .. } => "already_exists",
ChunkPutResponse::PaymentRequired { .. } => "payment_required",
ChunkPutResponse::Error(_) => "error",
_ => "unknown",
};
info!(
target: "ant_node::storage::rpc_latency",
duration_ms,
chunk_size,
outcome,
addr = %addr_hex,
"put_rpc"
);
response
@jacderida jacderida merged commit d690d32 into WithAutonomi:main May 12, 2026
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants