[FEATURE] Improve Shipwright Builds Metrics #2148

Description

@avinal

Is there an existing feature request for this?

  • I have searched the existing feature requests

Is your feature request related to a problem or use-case? Please describe.

Builds currently exposes the following basic metrics:

| Metric | Type | What It Tracks |
| --- | --- | --- |
| build_builds_registered_total | Counter | Builds that pass validation |
| build_buildruns_completed_total | Counter | BuildRuns where an executor was created (misleading name) |
| build_buildrun_establish_duration_seconds | Histogram | Time from BuildRun creation to executor start |
| build_buildrun_completion_duration_seconds | Histogram | Time from BuildRun creation to executor completion |
| build_buildrun_rampup_duration_seconds | Histogram | Time from BuildRun creation to executor creation |
| build_buildrun_taskrun_rampup_duration_seconds | Histogram | Time from executor creation to pod creation |
| build_buildrun_taskrun_pod_rampup_duration_seconds | Histogram | Time from pod creation to last init container finish |
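
To make the intervals in the table above concrete, here is a minimal sketch that derives each histogram's observed duration from a set of timestamps. The `Timeline` type, its field names, and `exampleTimeline` are hypothetical and exist only for illustration; they are not Shipwright API types, and the real controller may source these timestamps differently.

```go
package main

import (
	"fmt"
	"time"
)

// Timeline holds the timestamps the existing histograms are derived from.
// Hypothetical type: the field names are assumptions, not Shipwright fields.
type Timeline struct {
	BuildRunCreated   time.Time
	ExecutorCreated   time.Time
	ExecutorStarted   time.Time
	PodCreated        time.Time
	LastInitFinished  time.Time
	ExecutorCompleted time.Time
}

// Durations maps each existing histogram name to the interval it observes,
// following the "What It Tracks" column of the table.
func Durations(t Timeline) map[string]time.Duration {
	return map[string]time.Duration{
		"build_buildrun_establish_duration_seconds":          t.ExecutorStarted.Sub(t.BuildRunCreated),
		"build_buildrun_completion_duration_seconds":         t.ExecutorCompleted.Sub(t.BuildRunCreated),
		"build_buildrun_rampup_duration_seconds":             t.ExecutorCreated.Sub(t.BuildRunCreated),
		"build_buildrun_taskrun_rampup_duration_seconds":     t.PodCreated.Sub(t.ExecutorCreated),
		"build_buildrun_taskrun_pod_rampup_duration_seconds": t.LastInitFinished.Sub(t.PodCreated),
	}
}

// exampleTimeline returns made-up timestamps for a single BuildRun.
func exampleTimeline() Timeline {
	start := time.Date(2025, 1, 1, 12, 0, 0, 0, time.UTC)
	return Timeline{
		BuildRunCreated:   start,
		ExecutorCreated:   start.Add(1 * time.Second),
		ExecutorStarted:   start.Add(2 * time.Second),
		PodCreated:        start.Add(3 * time.Second),
		LastInitFinished:  start.Add(8 * time.Second),
		ExecutorCompleted: start.Add(90 * time.Second),
	}
}

func main() {
	// Map iteration order is not fixed, so the lines may print in any order.
	for name, d := range Durations(exampleTimeline()) {
		fmt.Printf("%s: %.0fs\n", name, d.Seconds())
	}
}
```

Note that none of these intervals says whether the BuildRun succeeded, which is the gap the proposal below addresses.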

With the opt-in labels enabled, these metrics can track plenty of stats, but some use cases that I think would be useful to users remain uncovered.

Describe the solution that you would like.

I am proposing these enhancements to Builds metrics.

Missing labels (not available on any metric, even as opt-in)

| Label | Values | Why it's needed |
| --- | --- | --- |
| strategy_kind | BuildStrategy, ClusterBuildStrategy | Distinguish namespace-scoped (custom) vs cluster-scoped (Red Hat provided) strategies. Currently impossible to tell. |
| executor | TaskRun, PipelineRun | Distinguish executor type. PipelineRun support was recently added but no metric tracks which executor ran a build. |
| result | succeeded, failed, cancelled, timeout | Build outcome. No metric tracks success vs failure today. This is the single biggest gap. |
| source_type | Git, OCI, Local | Source code delivery method. No metric tracks this. |
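
As a rough sketch of how the four proposed labels could be derived for a finished BuildRun, assuming a simplified view of the object: `BuildRunInfo`, `ResultLabel`, and `Labels` are hypothetical names, and the reason strings used to detect cancellation and timeout are assumptions for illustration rather than confirmed Shipwright constants.

```go
package main

import "fmt"

// BuildRunInfo is a hypothetical, simplified view of a finished BuildRun.
type BuildRunInfo struct {
	StrategyKind string // "BuildStrategy" or "ClusterBuildStrategy"
	Executor     string // "TaskRun" or "PipelineRun"
	SourceType   string // "Git", "OCI" or "Local"
	Succeeded    bool
	Reason       string // failure reason; empty on success (values are assumed)
}

// ResultLabel maps a BuildRun outcome to one of the proposed result values.
// Anything that is not a success, cancellation or timeout counts as "failed".
func ResultLabel(b BuildRunInfo) string {
	switch {
	case b.Succeeded:
		return "succeeded"
	case b.Reason == "BuildRunCanceled": // assumed reason string
		return "cancelled"
	case b.Reason == "BuildRunTimeout": // assumed reason string
		return "timeout"
	default:
		return "failed"
	}
}

// Labels returns the four proposed label values for one BuildRun.
func Labels(b BuildRunInfo) map[string]string {
	return map[string]string{
		"strategy_kind": b.StrategyKind,
		"executor":      b.Executor,
		"result":        ResultLabel(b),
		"source_type":   b.SourceType,
	}
}

func main() {
	b := BuildRunInfo{
		StrategyKind: "ClusterBuildStrategy",
		Executor:     "TaskRun",
		SourceType:   "Git",
		Reason:       "BuildRunTimeout",
	}
	fmt.Println(Labels(b))
}
```

Every label here has a small, fixed set of values, so adding all four keeps series cardinality bounded.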

Missing metrics

| Metric | Type | What it would track |
| --- | --- | --- |
| build_buildrun_result_total | Counter | BuildRun completions with outcome, the metric that build_buildruns_completed_total should have been |
| build_buildrun_failure_reason_total | Counter | Failed BuildRuns broken down by failure reason (OOM, eviction, timeout, etc.) |
| build_buildruns_active | Gauge | Point-in-time count of currently running BuildRuns |
| build_buildstrategy_count | Gauge | Point-in-time count of BuildStrategy and ClusterBuildStrategy objects |
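
The intended semantics of two of these metrics, build_buildruns_active and build_buildrun_result_total, can be sketched as follows. This uses a plain in-memory struct as a stand-in for a real metrics registry so the example stays dependency-free; `Metrics`, `ResultKey`, `Started`, and `Completed` are hypothetical names, and an actual implementation would presumably use the project's existing Prometheus setup instead.

```go
package main

import "fmt"

// ResultKey identifies one series of the proposed build_buildrun_result_total.
type ResultKey struct {
	StrategyKind string
	Executor     string
	Result       string
}

// Metrics is a stand-in for a real metrics registry (hypothetical type).
type Metrics struct {
	Active      int               // build_buildruns_active (gauge)
	ResultTotal map[ResultKey]int // build_buildrun_result_total (counter)
}

// NewMetrics returns an empty Metrics value.
func NewMetrics() *Metrics {
	return &Metrics{ResultTotal: map[ResultKey]int{}}
}

// Started is called when the executor is created. Only the gauge moves;
// nothing is counted as completed yet.
func (m *Metrics) Started() {
	m.Active++
}

// Completed is called once the BuildRun reaches a terminal state. The gauge
// drops and the counter for the given outcome increases.
func (m *Metrics) Completed(k ResultKey) {
	m.Active--
	m.ResultTotal[k]++
}

func main() {
	m := NewMetrics()
	m.Started()
	m.Started()
	m.Completed(ResultKey{"ClusterBuildStrategy", "TaskRun", "failed"})
	fmt.Println("active:", m.Active)
	fmt.Println("results:", m.ResultTotal)
}
```

The key difference from build_buildruns_completed_total is the point of increment: the counter only moves in `Completed`, when the outcome is known.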

Existing metric issues

| Metric | Issue |
| --- | --- |
| build_buildruns_completed_total | Misleadingly named: it is incremented when the executor is created, not when it completes. It also has no outcome distinction. It should be deprecated in favor of build_buildrun_result_total. |
| build_builds_registered_total | Only incremented on successful validation. Builds that fail registration (e.g., strategy not found) are invisible. |
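
One possible way to make failed registrations visible is to count every validation attempt and split the series by outcome. This is only a sketch of that idea, not something the existing metric does: `RegistrationKey`, `RegistrationCounter`, and the `Result`/`Reason` dimensions are hypothetical, and the reason string in the example is illustrative.

```go
package main

import "fmt"

// RegistrationKey identifies one series of a hypothetical registration counter.
type RegistrationKey struct {
	Result string // "registered" or "failed"
	Reason string // validation failure reason; empty when registered
}

// RegistrationCounter counts Build validation attempts by outcome.
type RegistrationCounter map[RegistrationKey]int

// Observe records one validation attempt. An empty validationReason means
// the Build passed validation and was registered.
func (c RegistrationCounter) Observe(validationReason string) {
	if validationReason == "" {
		c[RegistrationKey{Result: "registered"}]++
		return
	}
	c[RegistrationKey{Result: "failed", Reason: validationReason}]++
}

func main() {
	c := RegistrationCounter{}
	c.Observe("")                      // Build passed validation
	c.Observe("BuildStrategyNotFound") // illustrative failure reason
	for k, n := range c {
		fmt.Printf("result=%s reason=%q count=%d\n", k.Result, k.Reason, n)
	}
}
```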

Describe alternatives you have considered.

No response

Anything else?

No response

Labels

kind/feature: Categorizes issue or PR as related to a new feature.
