feat(tools): correlate metric/log/trace symptoms onto the timeline (v2) #108
Merged
correlate.workload now folds symptom signals onto the change timeline and aligns them with the change that most likely caused them:

- metric: a PromQL breach (metric_query + metric_threshold args) emits a metric_breach event at the first sample over threshold (Prometheus range).
- log: Loki error-line bursts emit a log_error event at the first error line.
- trace: Jaeger error/slow spans emit trace_error/trace_slow at span start (adds a read-only JaegerClient.SearchErrorSpans; Tempo deferred to v3).
- candidate-cause v2: the most recent change strictly before the earliest symptom, falling back to the newest change when there are no symptoms.

Symptoms reuse change.ChangeEvent (symptom Kinds + Source) so everything merges on one newest-first timeline. Read-only, RiskLow.

Signed-off-by: rlaope <piyrw9754@gmail.com>
Seed peakValue from the first observed sample (havePeak flag) so a metric whose values are all negative reports a real peak instead of a spurious 0, and drop the stale comment describing a re-scan that never happened. Found in architect review; Summary-string only, breach detection unchanged.

Signed-off-by: rlaope <piyrw9754@gmail.com>
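A minimal sketch of the seeding fix, assuming the peak is tracked in a simple loop over sample values. The peak function is hypothetical; only the peakValue and havePeak names come from the commit message.

```go
package main

// peak returns the largest value in samples. It seeds peakValue from the
// first observed sample instead of starting from zero, so an all-negative
// series reports its true maximum. ok is false for an empty series.
func peak(samples []float64) (peakValue float64, ok bool) {
	havePeak := false
	for _, v := range samples {
		if !havePeak || v > peakValue {
			peakValue, havePeak = v, true
		}
	}
	return peakValue, havePeak
}
```

Starting peakValue at 0 and only comparing with `v > peakValue` would leave the result at 0 for a series such as -3, -1, -2; with the flag, the same series yields -1.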
Fix the log_source LogQL comment that promised an unimplemented line-filter fallback, and clarify in the namespace arg description that the metric/log/trace symptom sources are namespace-agnostic in v2. Comment/schema text only; no behavior change. Found in code review.

Signed-off-by: rlaope <piyrw9754@gmail.com>
Summary
correlate.workload v2: folds symptom signals onto the change timeline and aligns them with the change that most likely caused them.

- metric: a PromQL breach (metric_query + metric_threshold args) → metric_breach event at the first sample over threshold (Prometheus range query).
- log: Loki error-line bursts → log_error event at the first error line (with in-window count).
- trace: Jaeger error/slow spans → trace_error/trace_slow at span start (adds a read-only JaegerClient.SearchErrorSpans; Tempo deferred to v3).

Symptoms reuse change.ChangeEvent (symptom Kinds + Source) so everything merges on one newest-first timeline. Read-only, RiskLow. Tool name unchanged.

Follow-up to the Phase 4 correlate tool; closes the v2 scope I deferred there.
What's new
- internal/core/tools/correlate/{metric_source,log_source,trace_source,cause}.go (+ tests); tool.go/register.go extended for symptom sources + metric_query/metric_threshold args.
- internal/core/tools/trace/jaeger.go: read-only SearchErrorSpans (+ JaegerSpan) exposing span start time + error status.
- internal/wiring/tools.go: threads prom/log/trace clients into correlate.RegisterAll; correlate registers when any change OR symptom backend exists.

Test plan

- go build ./...
- go test ./... (metric breach/threshold; log error-burst onset; jaeger span→event; candidate-cause symptom-alignment incl. earliest-symptom + no-prior-change + fallback; Run renders symptom lines + partial-failure tolerance)
- golangci-lint v2.12 run ./... → 0 issues; gofmt clean

Deferred to v3 (noted): Tempo trace symptoms; ES log symptoms; consolidating the small isErrorLine/matchesWorkload helpers shared across docker-backed tools.

🤖 Generated with Claude Code