ref(crons): Reorganize incident creation / issue occurrence logic by ShashankFC · Pull Request #13 · AI-Code-Review-Evals/coderabbit-sentry

ShashankFC · 2026-01-05T07:31:09Z

Test 8

Summary by CodeRabbit

Release Notes

New Features
- Implemented automatic incident creation and detection for monitor failures, respecting configurable failure thresholds.
- Enhanced incident context with human-readable failure reason summaries, environment details, and tracking of last successful check-in times to aid in incident diagnosis and resolution.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

Replicated from ai-code-review-evaluation/sentry-coderabbit#8

…0528) Since we'll be doing more with issue occurrences split out the concept of incidents into it's own logic module, as well as incident_occurrence into it's own module Part of GH-80527

ShashankFC · 2026-01-06T09:26:03Z

@coderabbitai review

coderabbitai · 2026-01-06T09:26:08Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai · 2026-01-06T09:26:28Z

📝 Walkthrough

Walkthrough

A new incident occurrence module is introduced alongside refactored threshold-based incident detection logic. The changes establish a workflow for creating, serializing, and emitting monitor incident occurrences to Kafka when monitors fail, with comprehensive failure reason generation and monitor environment context capture.

Changes

Cohort / File(s)	Summary
Incident Occurrence Creation `src/sentry/monitors/logic/incident_occurrence.py`	New module creating IssueOccurrence objects with metadata (id, fingerprint, type, title, evidence display). Includes functions to construct human-readable failure reasons, serialize monitor environment context, and emit occurrences via Kafka. Defines localized failure status mappings for error/missed/timeout cases.
Incident Threshold Detection `src/sentry/monitors/logic/incidents.py`	New `try_incident_threshold` function determines incident creation/update based on monitor environment status and failure thresholds. Implements branching logic for OK, ACTIVE, and ERROR states; queries prior check-ins; manages status transitions; emits monitor environment signals.
Monitor Failure Refactoring `src/sentry/monitors/logic/mark_failed.py`	Delegates incident/threshold logic to `try_incident_threshold`, removing in-function branching. Retains timing field updates and environment refresh while consolidating incident creation handling.
Type System `src/sentry/monitors/types.py`	Introduces `SimpleCheckIn` TypedDict with id, date\_added, and status fields for check-in data representation.

Sequence Diagram

sequenceDiagram
    actor Monitor as Monitor System
    participant ME as MonitorEnvironment
    participant INC as Incident Handler
    participant KF as Kafka/Occurrence Producer
    
    Monitor->>ME: check_failed_checkin()
    ME-->>Monitor: status check (OK/ACTIVE/ERROR)
    
    alt Status: OK or ACTIVE
        Monitor->>INC: try_incident_threshold()
        activate INC
        INC->>INC: query prior check-ins up to threshold
        INC->>INC: validate check-in statuses
        INC->>ME: update status to ERROR
        INC->>INC: create or retrieve incident
        deactivate INC
    end
    
    alt Status: ERROR
        Monitor->>INC: try_incident_threshold()
        activate INC
        INC->>INC: retrieve active incident
        deactivate INC
    end
    
    alt Incident exists & not muted
        INC->>INC: create_incident_occurrence()
        activate INC
        INC->>INC: build IssueOccurrence (metadata, failure reason, context)
        INC->>KF: produce_occurrence_to_kafka()
        deactivate INC
    end
    
    KF-->>Monitor: ✓ Occurrence emitted

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A monitor fails, we hop into action swift,
With thresholds and status, we carefully sift,
Each incident born from a check-in's demise,
To Kafka it flows—oh, what graceful goodbyes!
New failures detected, the logic runs deep,
Our incident occurrences, cataloged and neat! 📋✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title directly summarizes the main change: reorganizing incident creation and issue occurrence logic across the codebase into separate modules.

✨ Finishing touches

📝 Generate docstrings

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

🤖 Fix all issues with AI Agents

In @src/sentry/monitors/logic/incident_occurrence.py:
- Around line 130-137: Docstring in get_failure_reason contains a typo "humam" —
change it to "human" in the triple-quoted description inside the
get_failure_reason function so the docstring reads "Builds a human readable
string from a list of failed check-ins." and ensure surrounding examples remain
unchanged.
- Around line 159-171: In get_monitor_environment_context, you copy and modify
config (variable config) but then return monitor_environment.monitor.config;
change the returned "config" field to use the modified local variable config so
the human-readable schedule_type is included; locate the function
get_monitor_environment_context and replace the returned config reference with
the local config variable.

🧹 Nitpick comments (2)

src/sentry/monitors/logic/incidents.py (1)
50-53: Use generator expression instead of list comprehension inside any().

any() short-circuits on the first truthy value, but the list comprehension evaluates all elements first. Use a generator expression for efficiency.
🔎 Proposed fix
-            if any([checkin["status"] == CheckInStatus.OK for checkin in previous_checkins]):
+            if any(checkin["status"] == CheckInStatus.OK for checkin in previous_checkins):
                 return False
src/sentry/monitors/logic/incident_occurrence.py (1)
145-146: Prefer next(iter(...)) over single-element slice.

Creating a list just to access the first element is less efficient than using an iterator directly.
🔎 Proposed fix
     if sum(status_counts.values()) == 1:
-        return SINGULAR_HUMAN_FAILURE_MAP[list(status_counts.keys())[0]]
+        return SINGULAR_HUMAN_FAILURE_MAP[next(iter(status_counts.keys()))]

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5fc9f67 and c3b32a7.

📒 Files selected for processing (4)

src/sentry/monitors/logic/incident_occurrence.py
src/sentry/monitors/logic/incidents.py
src/sentry/monitors/logic/mark_failed.py
src/sentry/monitors/types.py

🧰 Additional context used

🧬 Code graph analysis (3)

src/sentry/monitors/logic/incidents.py (3)

src/sentry/monitors/logic/incident_occurrence.py (1)

create_incident_occurrence (28-113)

src/sentry/monitors/models.py (3)

MonitorCheckIn (472-582)

MonitorIncident (747-795)

MonitorStatus (80-121)

src/sentry/monitors/types.py (1)

SimpleCheckIn (105-112)

src/sentry/monitors/logic/incident_occurrence.py (5)

src/sentry/eventstore/models.py (2)

datetime (112-120)

trace_id (90-94)

src/sentry/monitors/models.py (6)

timezone (340-341)

CheckInStatus (124-176)

MonitorCheckIn (472-582)

MonitorIncident (747-795)

get_last_successful_checkin (707-712)

get_schedule_type_display (343-344)

src/sentry/issues/grouptype.py (1)

MonitorIncidentType (528-536)

src/sentry/monitors/types.py (1)

SimpleCheckIn (105-112)

src/sentry/issues/producer.py (2)

PayloadType (25-34)

produce_occurrence_to_kafka (50-90)

src/sentry/monitors/logic/mark_failed.py (2)

src/sentry/monitors/logic/incidents.py (1)

try_incident_threshold (14-104)

src/sentry/monitors/models.py (3)

CheckInStatus (124-176)

MonitorCheckIn (472-582)

MonitorEnvironment (644-724)

🪛 Ruff (0.14.10)

src/sentry/monitors/logic/incident_occurrence.py

146-146: Prefer next(iter(status_counts.keys())) over single element slice

Replace with next(iter(status_counts.keys()))

(RUF015)

🔇 Additional comments (4)

src/sentry/monitors/types.py (1)

105-113: LGTM!

The SimpleCheckIn TypedDict cleanly encapsulates the minimal check-in data needed for incident processing, and its fields align with the .values() query in incidents.py.

src/sentry/monitors/logic/mark_failed.py (1)

78-79: Clean refactor separating concerns.

The delegation to try_incident_threshold after updating timing fields and refreshing from DB is a clean separation. This keeps mark_failed focused on monitor environment state updates while incident creation logic is centralized.

src/sentry/monitors/logic/incidents.py (1)

89-104: LGTM!

The occurrence creation logic correctly handles both initial incident creation (multiple occurrences for threshold > 1) and ongoing failures (single occurrence per check-in). The muting check and signal emission are appropriately placed.

src/sentry/monitors/logic/incident_occurrence.py (1)

28-113: LGTM!

The occurrence creation is well-structured with proper null handling, comprehensive evidence display, and appropriate Kafka event data including optional trace context.

coderabbitai · 2026-01-06T09:30:37Z

+def get_failure_reason(failed_checkins: Sequence[SimpleCheckIn]):
+    """
+    Builds a humam readible string from a list of failed check-ins.
+
+    "3 missed check-ins detected"
+    "2 missed check-ins, 1 timeout check-in and 1 error check-in were detected"
+    "A failed check-in was detected"
+    """


⚠️ Potential issue | 🟡 Minor

Fix typo: "humam" → "human".

🔎 Proposed fix

def get_failure_reason(failed_checkins: Sequence[SimpleCheckIn]): """ - Builds a humam readible string from a list of failed check-ins. + Builds a human readable string from a list of failed check-ins. "3 missed check-ins detected" "2 missed check-ins, 1 timeout check-in and 1 error check-in were detected" "A failed check-in was detected" """

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def get_failure_reason(failed_checkins: Sequence[SimpleCheckIn]):

"""

Builds a humam readible string from a list of failed check-ins.

"3 missed check-ins detected"

"2 missed check-ins, 1 timeout check-in and 1 error check-in were detected"

"A failed check-in was detected"

"""

def get_failure_reason(failed_checkins: Sequence[SimpleCheckIn]):

"""

Builds a human readable string from a list of failed check-ins.

"3 missed check-ins detected"

"2 missed check-ins, 1 timeout check-in and 1 error check-in were detected"

"A failed check-in was detected"

"""

🤖 Prompt for AI Agents

In @src/sentry/monitors/logic/incident_occurrence.py around lines 130 - 137, Docstring in get_failure_reason contains a typo "humam" — change it to "human" in the triple-quoted description inside the get_failure_reason function so the docstring reads "Builds a human readable string from a list of failed check-ins." and ensure surrounding examples remain unchanged.

coderabbitai · 2026-01-06T09:30:37Z

+    human_status = get_text_list(
+        [
+            "%(count)d %(status)s" % {"count": count, "status": HUMAN_FAILURE_STATUS_MAP[status]}
+            for status, count in status_counts.items()
+        ],
+        last_word=_("and"),
+    )
+
+    return _("%(problem_checkins)s check-ins detected") % {"problem_checkins": human_status}


⚠️ Potential issue | 🟡 Minor

Grammar issue: Plural formatting doesn't handle singular counts correctly.

The current format produces strings like "1 missed and 1 error check-ins detected" when counts are 1, which is grammatically incorrect. The docstring example shows "1 timeout check-in" (singular) which the implementation doesn't produce.

Consider including "check-in(s)" per status entry with proper pluralization, or simplifying the docstring to match actual output.

coderabbitai · 2026-01-06T09:30:37Z

+def get_monitor_environment_context(monitor_environment: MonitorEnvironment):
+    config = monitor_environment.monitor.config.copy()
+    if "schedule_type" in config:
+        config["schedule_type"] = monitor_environment.monitor.get_schedule_type_display()
+
+    return {
+        "id": str(monitor_environment.monitor.guid),
+        "slug": str(monitor_environment.monitor.slug),
+        "name": monitor_environment.monitor.name,
+        "config": monitor_environment.monitor.config,
+        "status": monitor_environment.get_status_display(),
+        "type": monitor_environment.monitor.get_type_display(),
+    }


⚠️ Potential issue | 🟠 Major

Bug: Modified config is not returned.

Lines 160-162 create a copy of config and update schedule_type to a human-readable display value, but line 168 returns the original monitor_environment.monitor.config instead of the modified config variable.

🔎 Proposed fix

def get_monitor_environment_context(monitor_environment: MonitorEnvironment): config = monitor_environment.monitor.config.copy() if "schedule_type" in config: config["schedule_type"] = monitor_environment.monitor.get_schedule_type_display() return { "id": str(monitor_environment.monitor.guid), "slug": str(monitor_environment.monitor.slug), "name": monitor_environment.monitor.name, - "config": monitor_environment.monitor.config, + "config": config, "status": monitor_environment.get_status_display(), "type": monitor_environment.monitor.get_type_display(), }

🤖 Prompt for AI Agents

In @src/sentry/monitors/logic/incident_occurrence.py around lines 159 - 171, In get_monitor_environment_context, you copy and modify config (variable config) but then return monitor_environment.monitor.config; change the returned "config" field to use the modified local variable config so the human-readable schedule_type is included; locate the function get_monitor_environment_context and replace the returned config reference with the local config variable.

ref(crons): Reorganize incident creation / issue occurrence logic (#8…

c3b32a7

…0528) Since we'll be doing more with issue occurrences split out the concept of incidents into it's own logic module, as well as incident_occurrence into it's own module Part of GH-80527

coderabbitai Bot reviewed Jan 6, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ref(crons): Reorganize incident creation / issue occurrence logic#13

ref(crons): Reorganize incident creation / issue occurrence logic#13
ShashankFC wants to merge 1 commit into
monitor-incident-refactor-beforefrom
monitor-incident-refactor-after

ShashankFC commented Jan 5, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

ShashankFC commented Jan 6, 2026

Uh oh!

coderabbitai Bot commented Jan 6, 2026

Uh oh!

coderabbitai Bot commented Jan 6, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Poem

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Jan 6, 2026

Uh oh!

coderabbitai Bot Jan 6, 2026

Uh oh!

coderabbitai Bot Jan 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

ShashankFC commented Jan 5, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Release Notes

Uh oh!

ShashankFC commented Jan 6, 2026

Uh oh!

coderabbitai Bot commented Jan 6, 2026

Uh oh!

coderabbitai Bot commented Jan 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Poem

Pre-merge checks and finishing touches

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jan 6, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jan 6, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jan 6, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ShashankFC commented Jan 5, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jan 6, 2026 •

edited

Loading