Skip to content

ref(crons): Reorganize incident creation / issue occurrence logic#10

Open
zaibkhan wants to merge 1 commit into
monitor-incident-refactor-beforefrom
monitor-incident-refactor-after
Open

ref(crons): Reorganize incident creation / issue occurrence logic#10
zaibkhan wants to merge 1 commit into
monitor-incident-refactor-beforefrom
monitor-incident-refactor-after

Conversation

@zaibkhan

@zaibkhan zaibkhan commented Sep 9, 2025

Copy link
Copy Markdown

Mirrors ai-code-review-evaluation#8 for like-for-like benchmarking.

  • Base: monitor-incident-refactor-before
  • Head: monitor-incident-refactor-after

Original PR excerpt:

Test 8

…0528)

Since we'll be doing more with issue occurrences split out the concept
of incidents into it's own logic module, as well as incident_occurrence
into it's own module

Part of GH-80527
@codoki-ai codoki-ai deleted a comment from codoki-pr-intelligence Bot Sep 10, 2025
@codoki-ai codoki-ai deleted a comment from codoki-pr-intelligence Bot Sep 10, 2025
@codoki-pr-intelligence

codoki-pr-intelligence Bot commented Sep 10, 2025

Copy link
Copy Markdown

Codoki PR Review

Summary: Refactor incidents/occurrence flow, fix context config usage
What’s good: - Clear separation of concerns: threshold logic moved to incidents.py and occurrence construction to incident_occurrence.py; mark_failed now delegates cleanly.

  • Efficient DB access: values(...).order_by(...).slice(...) minimizes fetched columns and rows.
  • Preserves side-effects (signal and Kafka occurrence) and existing guards (muted checks, status transitions).
    Review Status: ✅ Safe to merge

Issues (Medium)

Severity Issue Why it matters
Medium Correctness — Modified config is not returned in monitor context …/logic/incident_occurrence.py
The function creates a local copy of config and updates schedule_type to its display value, but returns the original unmodified config here. This drops the intended display mapping and may show raw/internal values to downstream consumers.

Showing up to 1 medium issue(s). See inline suggestions for more details.

Key Feedback (click to expand)
  • Needs improvement: - get_monitor_environment_context builds a modified config but returns the unmodified config; use the local copy to include schedule_type display.
  • Consider re-exporting moved helpers from their old module (mark_failed) to ease transition for any remaining call sites.
  • Testing: - Add a unit test for get_monitor_environment_context to assert that when schedule_type is present, the returned context['config']['schedule_type'] is the display value (e.g., 'Crontab' vs underlying code).
  • Add a test covering try_incident_threshold for both branches: (a) threshold > 1 with a successful check-in in the window (no incident), and (b) multiple failed statuses generating multiple occurrences.
  • Verify create_incident_occurrence populates last_successful_checkin correctly ('Never' vs ISO timestamp) and includes trace context only when trace_id is present.
  • Documentation: - Update any developer docs or inline comments referencing helpers in mark_failed to reflect their new locations (incidents.py and incident_occurrence.py).
  • Compatibility: - Backward compatibility: mark_failed no longer exposes get_failure_reason/get_monitor_environment_context; if other modules still import these, provide a transitional re-export or a deprecation path to avoid runtime ImportError.
  • Open questions: - Was any external/cron code importing get_failure_reason/get_monitor_environment_context from mark_failed? If yes, do we plan a follow-up to update those imports or provide a temporary shim?

Confidence: 4/5 — Looks good; minor fixes (1 medium)

Sequence Diagram

sequenceDiagram
    participant Caller
    participant MarkFailed
    participant Incidents
    participant Occurrence
    participant Kafka
    participant Signals
    Caller->>MarkFailed: mark_failed(failed_checkin, ts, received)
    MarkFailed->>Incidents: try_incident_threshold(failed_checkin, threshold, received)
    alt status OK/ACTIVE and threshold met
        Incidents-->>Incidents: set status ERROR, get_or_create incident
    else status ERROR
        Incidents-->>Incidents: use active incident
    else other statuses
        Incidents-->>MarkFailed: return False
    end
    alt not muted and incident exists
        loop for each checkin in window
            Incidents->>Occurrence: create_incident_occurrence(...)
            Occurrence->>Kafka: produce_occurrence_to_kafka(...)
        end
    end
    Incidents->>Signals: monitor_environment_failed.send(...)
    Incidents-->>MarkFailed: return True
Loading

React with 👍 or 👎 if you found this review useful.

"id": str(monitor_environment.monitor.guid),
"slug": str(monitor_environment.monitor.slug),
"name": monitor_environment.monitor.name,
"config": monitor_environment.monitor.config,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔷 Medium: The function creates a local copy of config and updates schedule_type to its display value, but returns the original unmodified config here. This drops the intended display mapping and may show raw/internal values to downstream consumers.

Suggested change
"config": monitor_environment.monitor.config,
```suggestion
"config": config,

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants