Skip to content

fix(skills): correct perf tool-arg names + metric/runtime inaccuracies (code-review followup)#95

Merged
rlaope merged 1 commit into
masterfrom
fix/skills-review-followup
May 28, 2026
Merged

fix(skills): correct perf tool-arg names + metric/runtime inaccuracies (code-review followup)#95
rlaope merged 1 commit into
masterfrom
fix/skills-review-followup

Conversation

@rlaope
Copy link
Copy Markdown
Owner

@rlaope rlaope commented May 28, 2026

Summary

Applies the verified findings from an extra-high-effort multi-angle /code-review of the 11 skill playbooks merged in #94. All are doc-level corrections (no Go change); the most important class — wrong perf.* argument names — was confirmed by reading the actual tool schemas in internal/core/tools/perf/.

CRITICAL — perf tool calls used non-existent arg keys (silently dropped → wrong endpoint/default window)

  • node-runtime: v8_inspector_targets/cpu_profile used url= → real args are name / target_index / duration_seconds / top_n.
  • go-runtime: go_pprof_cpu used url=/seconds= → real args name / duration_seconds / top_n.
  • native-perf: linux_perf_record used duration= → real arg duration_seconds.

HIGH — PromQL / metric correctness

  • capacity-scheduling: kube_pod_container_resource_requests / kube_node_status_allocatable summed per node without a {resource="cpu"|"memory"} matcher mixes cores + bytes → added the filter.
  • slo-burn: time_to_exhaustion omitted the SLO window W (dimensionless result) → now W × remaining / burn_rate.
  • ai-inference: vllm:gpu_cache_usage_perc is a fraction in [0,1] despite the _perc suffix → thresholds go against 1.0, not 100.
  • go-runtime: mark-assist is concurrent inline work, not an STW pause (was "STW assist pauses"); throttle test now uses cfs_throttled_periods/cfs_periods, not a bare rate.

MEDIUM — runtime facts & routing

  • node-runtime: dropped the bare node trigger (hijacked k8s cluster-node queries like "node NotReady"); nodejs_active_* are gauges (non-recovering rise, not monotonic counter); --max-old-space-size default is memory-derived on Node ≥ 12, not a fixed ~1.5 GB.
  • dotnet-runtime: LOH threshold is 85,000 bytes (~83 KB), not "85 KB".

LOW — read-only guard / output-shape consistency

  • slo-burn: added a dedicated "never silence alerts / edit rules" read-only bullet; gated the fixed output shape on T/W being known rather than invented.
  • capacity-scheduling: marked the Recommend field as operator-applied/read-only (house style).

Test plan

  • go test ./internal/core/skills/... — all 24 skills parse/load
  • go test ./... — full suite green
  • grep confirms no remaining url=/seconds=/duration=<n> perf args in any skill body
  • perf arg names cross-checked against internal/core/tools/perf/{pprof_cpu,linux_perf,rbspy,v8,v8_cdp}.go

…rom xhigh code review

Verified findings from the multi-angle review of the 11 new skill playbooks
(perf tool arg names checked against internal/core/tools/perf/ source):

- perf invocations used non-existent arg keys (silently dropped at runtime):
  - node-runtime: v8_inspector_targets/cpu_profile used url= → name= / target_index=
  - go-runtime: go_pprof_cpu used url=/seconds= → name= / duration_seconds=
  - native-perf: linux_perf_record used duration= → duration_seconds=
- capacity-scheduling: kube_*_requests/allocatable summed without a
  {resource="cpu"|"memory"} matcher mixed cores and bytes — add the filter.
- slo-burn: time_to_exhaustion formula omitted the window W (dimensionless
  result); add explicit read-only "never silence/edit rules" bullet; gate the
  fixed output shape on T/W being known rather than invented.
- ai-inference: vllm:gpu_cache_usage_perc is a fraction in [0,1] despite the
  _perc suffix — thresholds go against 1.0, not 100.
- go-runtime: mark-assist is concurrent inline work, not an STW pause; throttle
  test now uses the cfs throttled/periods ratio, not a bare rate.
- node-runtime: drop the bare "node" trigger (hijacked k8s cluster-node
  queries); nodejs_active_* are gauges (non-recovering rise, not monotonic);
  --max-old-space-size default is memory-derived on Node ≥ 12, not ~1.5 GB.
- dotnet-runtime: LOH threshold is 85,000 bytes (~83 KB), not 85 KB.
- capacity-scheduling: mark the Recommend field as operator-applied/read-only.

Signed-off-by: rlaope <piyrw9754@gmail.com>
@rlaope rlaope merged commit 150b758 into master May 28, 2026
1 of 2 checks passed
@rlaope rlaope deleted the fix/skills-review-followup branch May 28, 2026 03:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant