feat: bulk_delete_data on Storage #199

Draft
sejori wants to merge 6 commits into main from feat/prometheus-user-throughput

sejori commented Mar 30, 2026

Changes:

  • src/daemon/mod.rs — Added email and completion_window labels to per-user Prometheus metrics (fusillade_user_requests_in_flight gauge, fusillade_user_requests_completed_total counter). Email falls back to user_id if not present in batch metadata.
  • src/manager/mod.rs — Added bulk_delete_data(creator_id, batch_size) to the Storage trait for chunked soft-deletion of a creator's batches and files with metadata nullification.
  • src/manager/postgres.rs — Implemented bulk_delete_data with two-stage chunked UPDATE using FOR UPDATE SKIP LOCKED: soft-deletes batches (nullifies metadata, cancels active ones) then soft-deletes files. Existing purge daemon handles child row cleanup.
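
For reviewers who want to reason about the chunking behaviour without a database, here is a minimal in-memory sketch of the batch stage described above. It is not the code in this PR: `BatchRow`, its fields, `soft_delete_chunk`, and `bulk_delete_batches` are hypothetical names, the loop-until-empty driver is an assumption about how callers use `batch_size`, and row locking (`FOR UPDATE SKIP LOCKED`) is not modelled at all.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for one row of a batches table.
#[derive(Debug, Clone, PartialEq)]
pub struct BatchRow {
    pub id: u64,
    pub creator_id: String,
    pub metadata: Option<HashMap<String, String>>,
    pub active: bool,
    pub cancelled: bool,
    pub deleted: bool,
}

/// Soft-delete at most `batch_size` not-yet-deleted rows owned by `creator_id`.
/// Returns the number of rows touched in this chunk.
pub fn soft_delete_chunk(rows: &mut [BatchRow], creator_id: &str, batch_size: usize) -> usize {
    let mut touched = 0;
    for row in rows
        .iter_mut()
        .filter(|r| r.creator_id == creator_id && !r.deleted)
        .take(batch_size)
    {
        row.metadata = None; // nullify metadata
        if row.active {
            // cancel batches that are still active
            row.active = false;
            row.cancelled = true;
        }
        row.deleted = true; // soft delete: the row itself stays in place
        touched += 1;
    }
    touched
}

/// Run chunks until a chunk touches nothing; returns the total rows soft-deleted.
pub fn bulk_delete_batches(rows: &mut [BatchRow], creator_id: &str, batch_size: usize) -> usize {
    assert!(batch_size > 0, "batch_size must be positive");
    let mut total = 0;
    loop {
        let touched = soft_delete_chunk(rows, creator_id, batch_size);
        if touched == 0 {
            break;
        }
        total += touched;
    }
    total
}
```

Bounding each chunk keeps the amount of work per step small, which is presumably the motivation for taking `batch_size` as a parameter. In the Postgres implementation, rows skipped because another transaction holds their lock would need a later pass, which this sketch does not capture.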

sejori marked this pull request as ready for review March 30, 2026 13:03
Copilot AI review requested due to automatic review settings March 30, 2026 13:03

Copilot AI left a comment

Pull request overview

Adds completion_window segmentation to per-user Prometheus metrics so dashboards can compare user throughput across SLA windows (e.g., 1h vs 24h).

Changes:

  • Extract completion_window from batch_metadata when spawning request processing tasks.
  • Add completion_window label to fusillade_user_requests_in_flight gauge (inc/dec) and fusillade_user_requests_completed_total counter (success/failed/cancelled paths).

// Spawn a processing task
let model_clone = model.clone();
let user_id = request.data.created_by.clone();
let completion_window = request.data.batch_metadata.get("completion_window").cloned().unwrap_or_default();

Copilot AI Mar 30, 2026

completion_window is pulled from request.data.batch_metadata, which is populated based on DaemonConfig.batch_metadata_fields (and can be empty, as in several tests). In that configuration this label becomes an empty string via unwrap_or_default(), so the new dashboard split-by-completion_window won’t work and metrics will silently aggregate under a blank value. Consider sourcing completion_window from an always-present field (e.g., carry batch_completion_window on RequestData when claiming) or at least emit a distinct fallback like "unknown" and log/warn when the metadata key is missing.

Suggested change
- let completion_window = request.data.batch_metadata.get("completion_window").cloned().unwrap_or_default();
+ let completion_window = match request.data.batch_metadata.get("completion_window") {
+     Some(value) if !value.is_empty() => value.clone(),
+     _ => {
+         tracing::warn!(
+             request_id = %request_id,
+             batch_id = %batch_id,
+             "Missing `completion_window` in batch_metadata; using \"unknown\" label for metrics"
+         );
+         "unknown".to_string()
+     }
+ };
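
The fallback in this suggestion could also be pulled out into pure functions so it can be unit tested without spinning up the daemon. The sketch below assumes `batch_metadata` behaves like a `HashMap<String, String>` and that the email is stored under an `"email"` key; both are guesses from the PR description, and `completion_window_label`, `user_label`, and `UNKNOWN_WINDOW` are hypothetical names. Logging is left to the caller.

```rust
use std::collections::HashMap;

/// Label value used when `completion_window` is missing or empty (assumed name).
pub const UNKNOWN_WINDOW: &str = "unknown";

/// Returns the `completion_window` label, never an empty string.
pub fn completion_window_label(batch_metadata: &HashMap<String, String>) -> String {
    match batch_metadata.get("completion_window") {
        Some(value) if !value.is_empty() => value.clone(),
        _ => UNKNOWN_WINDOW.to_string(),
    }
}

/// Returns the user label: the email if present in metadata, else the user id.
/// The "email" key name is an assumption, not taken from the diff.
pub fn user_label(batch_metadata: &HashMap<String, String>, user_id: &str) -> String {
    match batch_metadata.get("email") {
        Some(email) if !email.is_empty() => email.clone(),
        _ => user_id.to_string(),
    }
}
```

With helpers like these, the call site only decides whether to emit a warning when the returned label equals `UNKNOWN_WINDOW`.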

      .or_default()
      .fetch_add(1, Ordering::Relaxed);
- gauge!("fusillade_user_requests_in_flight", "user" => user_id.clone())
+ gauge!("fusillade_user_requests_in_flight", "user" => user_id.clone(), "completion_window" => completion_window.clone())

Copilot AI Mar 30, 2026

Adding raw completion_window as a label to high-volume per-user metrics (fusillade_user_requests_in_flight / fusillade_user_requests_completed_total) risks very high label cardinality because completion_window is user-provided and not normalized (e.g., 60m, 1h, 1h0m, etc. all become distinct series). This can significantly increase Prometheus/OTel metric storage and query cost. Consider normalizing to a canonical representation (e.g., seconds or a fixed set of buckets) before using it as a label, or restricting allowed values.

Suggested change
- gauge!("fusillade_user_requests_in_flight", "user" => user_id.clone(), "completion_window" => completion_window.clone())
+ gauge!("fusillade_user_requests_in_flight", "user" => user_id.clone())
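
An alternative to dropping the label entirely is the normalization mentioned in the comment: map every raw value onto a small fixed set before it reaches the metric. The sketch below is one possible shape, not something in this PR. The accepted duration syntax (`s`, `m`, `h`, `d` suffixes), the bucket names, and the function names `parse_window_secs` and `window_bucket` are all assumptions; the `1h` and `24h` buckets mirror the SLA windows named in the overview.

```rust
/// Parse strings such as "90s", "60m", "1h", "1h0m" or "1d" into total seconds.
/// Returns None for anything that does not fit this assumed mini-format.
pub fn parse_window_secs(raw: &str) -> Option<u64> {
    let mut total: u64 = 0;
    let mut digits = String::new();
    let mut saw_unit = false;
    for ch in raw.trim().chars() {
        if ch.is_ascii_digit() {
            digits.push(ch);
            continue;
        }
        let unit_secs: u64 = match ch {
            's' => 1,
            'm' => 60,
            'h' => 3_600,
            'd' => 86_400,
            _ => return None, // unknown unit or stray character
        };
        if digits.is_empty() {
            return None; // a unit with no number in front of it
        }
        let n: u64 = digits.parse().ok()?;
        total = total.checked_add(n.checked_mul(unit_secs)?)?;
        digits.clear();
        saw_unit = true;
    }
    if !digits.is_empty() || !saw_unit {
        return None; // trailing number without a unit, or empty input
    }
    Some(total)
}

/// Collapse a raw completion window into one of four label values,
/// so the label can never contribute more than four series per user.
pub fn window_bucket(raw: &str) -> &'static str {
    match parse_window_secs(raw) {
        Some(3_600) => "1h",
        Some(86_400) => "24h",
        Some(_) => "other",
        None => "unknown",
    }
}
```

With this in place, `60m`, `1h`, and `1h0m` all land on the same `1h` series, and unparseable or missing values share a single `unknown` series instead of a blank label.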

sejori changed the title from "Feat/prometheus user throughput" to "feat: bulk_delete_data on Storage" on Mar 30, 2026
sejori marked this pull request as draft March 31, 2026 10:03