ref(crons): Reorganize incident creation / issue occurrence logic by Frawatson · Pull Request #3 · Frawatson/sentry-grapplepr

Frawatson · 2026-04-07T18:00:50Z

Benchmark test PR #8 - Replicating Greptile benchmark for Grapple PR evaluation.

…0528) Since we'll be doing more with issue occurrences split out the concept of incidents into it's own logic module, as well as incident_occurrence into it's own module Part of GH-80527

grapple-pr · 2026-04-07T18:46:25Z

🤖 Grapple PR Review

Summary

This PR reorganizes cron monitor incident creation and issue occurrence logic by extracting it from mark_failed.py into two new dedicated modules: incident_occurrence.py and incidents.py. This is a structural refactor to improve separation of concerns in the monitors subsystem.

Changes: 4 files (+289 / -264 lines)

🟢 2 files added: src/sentry/monitors/logic/incident_occurrence.py, src/sentry/monitors/logic/incidents.py
🟡 2 files modified: src/sentry/monitors/logic/mark_failed.py, src/sentry/monitors/types.py

Resolves: #8 (feat(upsampling) - Support upsampled error count with performance optimizations)

CI: ⏳ 0 passed, 0 failed

Impact Diagram

graph TD
  pr_0[/"🔀 PR Changes: 4 files"/]
  f1["🟢 incident_occurrence.py +171/-0"]
  pr_0 --> f1
  f2["🟢 incidents.py +104/-0"]
  pr_0 --> f2
  f3["🟡 mark_failed.py +4/-264"]
  pr_0 --> f3
  f4["🟡 types.py +10/-0"]
  pr_0 --> f4
  f1 -.- f2
  f2 -.- f3
  style f1 fill:#7c3aed,color:#fff,stroke:#7c3aed
  style f2 fill:#7c3aed,color:#fff,stroke:#7c3aed
  style f3 fill:#7c3aed,color:#fff,stroke:#7c3aed
  style f4 fill:#7c3aed,color:#fff,stroke:#7c3aed

3 major issues found — review recommended.

Severity	Count
🟠 Major	3
🟡 Minor	6
🔵 Info	5

Review details

Reviewed at: 2026-04-07T18:46:24.115Z
Agents used: security, logic, style, architecture, performance, verification
Context level: full
Execution time: 57.1s
Low-confidence findings suppressed: 1
Blast radius: 0 (contained)

Powered by Grapple PR — AI that understands your code.

grapple-pr · 2026-04-07T18:46:25Z

MAJOR — src/sentry/monitors/logic/incidents.py:1

Two new modules are added under src/sentry/monitors/logic/ but there is no change to an __init__.py to export them. If the logic package's __init__.py has explicit __all__ or explicit imports that control the public surface, these new modules won't be discoverable. This is a minor concern — Python can still import them directly — but it could violate existing package conventions.

grapple-pr · 2026-04-07T18:46:26Z

MAJOR — src/sentry/monitors/logic/incidents.py:79

A separate database query is issued for each check-in ID collected in previous_checkins, followed by a loop that calls create_incident_occurrence per check-in. When failure_issue_threshold is large, this results in N individual Kafka produce calls and an extra DB round-trip. The original code had the same pattern, so this is not a regression introduced by the refactor, but it is worth flagging.

grapple-pr · 2026-04-07T18:46:27Z

MAJOR — src/sentry/monitors/logic/mark_failed.py:6

After the refactor, MonitorEnvironment is still imported in mark_failed.py. While this may be used in the remaining code (lines not shown in the diff context), it's worth verifying. More importantly, MonitorStatus was removed from the import but is likely still needed by the remaining mark_failed function body which checks/sets monitor environment statuses. However, looking at the remaining code more carefully, the status checks appear to have been moved to incidents.py, so this is likely fine.

grapple-pr · 2026-04-07T18:46:27Z

MINOR — src/sentry/monitors/logic/incidents.py:14

The function was renamed from mark_failed_threshold to try_incident_threshold during the refactor. Any external callers or tests that imported mark_failed_threshold directly from mark_failed.py will break. While the intent says the public API of mark_failed (the top-level function) is preserved, the internal helper had a known name. If any test or module references mark_failed_threshold by name, this is a breaking change.

grapple-pr · 2026-04-07T18:46:28Z

MINOR — src/sentry/monitors/logic/incident_occurrence.py:108

get_monitor_environment_context copies monitor_environment.monitor.config and conditionally replaces schedule_type, but the result is never stored in occurrence — it is only placed in the event_data dict. The copy is cheap, but config is assigned into the local config variable yet the returned dict still uses monitor_environment.monitor.config (the original), not the modified copy. This is a logic issue that also represents unnecessary work.

grapple-pr · 2026-04-07T18:46:28Z

MINOR — src/sentry/monitors/logic/incident_occurrence.py:29

The function was renamed from create_issue_platform_occurrence to create_incident_occurrence during the refactor. Any external callers or tests that referenced the original name will break.

grapple-pr · 2026-04-07T18:46:29Z

MINOR — src/sentry/monitors/logic/mark_failed.py:7

After the refactor, mark_failed.py still imports MonitorEnvironment but it is unclear from the visible diff whether the remaining code (lines 1-76) still uses it. If the remaining mark_failed function body references MonitorEnvironment, this is fine; otherwise it's an unused import.

grapple-pr · 2026-04-07T18:46:29Z

MINOR — src/sentry/monitors/logic/incident_occurrence.py:20

incident_occurrence.py imports SimpleCheckIn from sentry.monitors.types, which is the new canonical location. This is correct. However, incidents.py defines its own import of SimpleCheckIn also from sentry.monitors.types. Both new modules correctly import from the new types.py location - this is fine.

grapple-pr · 2026-04-07T18:46:30Z

MINOR — src/sentry/monitors/logic/incident_occurrence.py:130

Typo in docstring: 'humam readible' should be 'human readable'

grapple-pr · 2026-04-07T18:46:31Z

INFO — src/sentry/monitors/logic/incidents.py:7

The dependency direction is mark_failed.py → incidents.py → incident_occurrence.py, which forms a clean unidirectional chain with no circular imports. types.py is a leaf dependency used by both new modules. This is a well-structured decomposition.

grapple-pr

🤖 Grapple PR Review

Summary

This PR reorganizes cron monitor incident creation and issue occurrence logic by extracting it from mark_failed.py into two new dedicated modules: incident_occurrence.py and incidents.py, reducing mark_failed.py by ~264 lines.

Changes: 4 files (+289 / -264 lines)

🟢 2 files added: src/sentry/monitors/logic/incident_occurrence.py, src/sentry/monitors/logic/incidents.py
🟡 2 files modified: src/sentry/monitors/logic/mark_failed.py, src/sentry/monitors/types.py

Resolves: #8 (feat(upsampling) - Support upsampled error count with performance optimizations)

CI: ⏳ 0 passed, 0 failed

Impact Diagram

graph TD
  pr_0[/"🔀 PR Changes: 4 files"/]
  f1["🟢 incident_occurrence.py +171/-0"]
  pr_0 --> f1
  f2["🟢 incidents.py +104/-0"]
  pr_0 --> f2
  f3["🟡 mark_failed.py +4/-264"]
  pr_0 --> f3
  f4["🟡 types.py +10/-0"]
  pr_0 --> f4
  f1 -.- f2
  f2 -.- f3
  style f1 fill:#7c3aed,color:#fff,stroke:#7c3aed
  style f2 fill:#7c3aed,color:#fff,stroke:#7c3aed
  style f3 fill:#7c3aed,color:#fff,stroke:#7c3aed
  style f4 fill:#7c3aed,color:#fff,stroke:#7c3aed

2 major issues found — review recommended.

Severity	Count
🟠 Major	2
🟡 Minor	3
🔵 Info	1

Review details

Reviewed at: 2026-04-07T21:07:19.372Z
Agents used: security, logic, style, architecture, performance, verification
Context level: full
Execution time: 40.2s
Low-confidence findings suppressed: 1
Blast radius: 0 (contained)

Powered by Grapple PR — AI that understands your code.

grapple-pr · 2026-04-07T21:07:19Z

-        "status": monitor_environment.get_status_display(),
-        "type": monitor_environment.monitor.get_type_display(),
-    }
+    return try_incident_threshold(failed_checkin, failure_issue_threshold, received)


🟠 MAJOR — Broken external imports (confidence: 100%)

The functions mark_failed_threshold, create_issue_platform_occurrence, get_failure_reason, get_monitor_environment_context, and the type SimpleCheckIn were previously importable from mark_failed.py. Any external callers importing these symbols (e.g., from sentry.monitors.logic.mark_failed import mark_failed_threshold or from sentry.monitors.logic.mark_failed import SimpleCheckIn) will now get ImportError. The PR does not re-export these symbols from mark_failed.py.

Evidence:

The original mark_failed.py defined mark_failed_threshold, SimpleCheckIn, create_issue_platform_occurrence, get_failure_reason, get_monitor_environment_context as module-level names.

After the refactor, none of these names are re-exported or aliased in mark_failed.py.

The acceptance criteria state: 'All public functions/methods previously accessible via mark_failed.py remain importable or are re-exported correctly'.

Test files or other modules in the codebase that import these symbols directly from mark_failed.py will break.

_{Agent: logic}

grapple-pr · 2026-04-07T21:07:19Z

+logger = logging.getLogger(__name__)
+
+
+def try_incident_threshold(


🟠 MAJOR — Function rename changes public API (confidence: 94%)

The original function was named mark_failed_threshold and has been renamed to try_incident_threshold. The original occurrence creation function was named create_issue_platform_occurrence and has been renamed to create_incident_occurrence. These are not just internal refactors — any code referencing the old names (tests, signal handlers, external modules) will break.

Evidence:

Original mark_failed.py had def mark_failed_threshold(...) — now def try_incident_threshold(...) in incidents.py

Original mark_failed.py had def create_issue_platform_occurrence(...) — now def create_incident_occurrence(...) in incident_occurrence.py

Acceptance criteria: 'The net behavior of mark_failed.py is identical before and after the refactor'

_{Agent: logic}

grapple-pr · 2026-04-07T21:07:19Z

+
+
+def get_monitor_environment_context(monitor_environment: MonitorEnvironment):
+    config = monitor_environment.monitor.config.copy()


🟡 MINOR — Code Organization (confidence: 100%)

Inconsistent handling of monitor config in get_monitor_environment_context: the function modifies a shallow copy of config but returns the original unmodified config in the dict

Evidence:

Line 156-157: config = monitor_environment.monitor.config.copy() followed by conditional mutation of config['schedule_type']

Line 162: 'config': monitor_environment.monitor.config, returns the original, not the modified config variable

This means the schedule_type transformation is lost in the returned context

_{Agent: style}

grapple-pr · 2026-04-07T21:07:19Z

+    MonitorEnvironment,
+    MonitorIncident,
+)
+from sentry.monitors.types import SimpleCheckIn


🟡 MINOR — Unused import in types.py consumer (confidence: 88%)

incident_occurrence.py imports SimpleCheckIn from sentry.monitors.types, which is correct. However, incidents.py imports SimpleCheckIn from sentry.monitors.types as well (line 9). Both new modules correctly use the new type from types.py. This is consistent — noting for completeness that the type is not orphaned.

Evidence:

incident_occurrence.py line 20: from sentry.monitors.types import SimpleCheckIn

incidents.py line 9: from sentry.monitors.types import SimpleCheckIn

Both modules use SimpleCheckIn in type annotations and dict construction

_{Agent: logic}

grapple-pr · 2026-04-07T21:07:19Z

+    "3 missed check-ins detected"
+    "2 missed check-ins, 1 timeout check-in and 1 error check-in were detected"
+    "A failed check-in was detected"
+    """


🟡 MINOR — Documentation (confidence: 76%)

Typo in docstring: 'humam readible' should be 'human readable'

Evidence:

The function get_failure_reason has a docstring with a spelling error: 'Builds a humam readible string...'

This is a public function that may be referenced in documentation or by other developers

_{Agent: style}

grapple-pr · 2026-04-07T21:07:19Z

-    MonitorIncident,
-    MonitorStatus,
-)
+from sentry.monitors.logic.incidents import try_incident_threshold


🔵 INFO — Dead import after refactor (confidence: 75%)

MonitorEnvironment is still imported in mark_failed.py. Verify it is still used in the remaining mark_failed() function body (lines not shown in the diff but presumably still reference it). If not used, it's a dead import.

Evidence:

Line 8: from sentry.monitors.models import CheckInStatus, MonitorCheckIn, MonitorEnvironment

The remaining mark_failed function likely still uses MonitorEnvironment for the query at the top of the function

_{Agent: logic}

grapple-pr

🔧 [Grapple PR] 3 auto-fix suggestions available

Apply individually or use GitHub's "Add suggestion to batch" → "Commit suggestions" to apply all at once.

_{Commits from Grapple PR are prefixed with [Grapple PR] for easy identification in merge conflicts.}

grapple-pr · 2026-04-07T21:08:16Z

+
+
+def get_monitor_environment_context(monitor_environment: MonitorEnvironment):
+    config = monitor_environment.monitor.config.copy()


✅ [Grapple PR] Auto-fix — style agent (Small fix (2 lines, 1 file))

Inconsistent handling of monitor config in get_monitor_environment_context: the function modifies a shallow copy of config but returns the original unmodified config in the dict

Suggested change

config = monitor_environment.monitor.config.copy()

"config": config,

_{🤖 Grapple PR auto-fix • minor • confidence: 100%}

grapple-pr · 2026-04-07T21:08:16Z

+    MonitorEnvironment,
+    MonitorIncident,
+)
+from sentry.monitors.types import SimpleCheckIn


✅ [Grapple PR] Auto-fix — logic agent (Small fix (2 lines, 1 file))

incident_occurrence.py imports SimpleCheckIn from sentry.monitors.types, which is correct. However, incidents.py imports SimpleCheckIn from sentry.monitors.types as well (line 9). Both new modules correctly use the new type from types.py. This is consistent — noting for completeness that the type is not orphaned.

Suggested change

from sentry.monitors.types import SimpleCheckIn

from sentry.monitors.types import SimpleCheckIn

_{🤖 Grapple PR auto-fix • minor • confidence: 88%}

grapple-pr · 2026-04-07T21:08:16Z

+    "3 missed check-ins detected"
+    "2 missed check-ins, 1 timeout check-in and 1 error check-in were detected"
+    "A failed check-in was detected"
+    """


✅ [Grapple PR] Auto-fix — style agent (Small fix (2 lines, 1 file))

Typo in docstring: 'humam readible' should be 'human readable'

Suggested change

"""

Builds a human readable string from a list of failed check-ins.

_{🤖 Grapple PR auto-fix • minor • confidence: 76%}

grapple-pr · 2026-04-07T21:08:17Z

-        "status": monitor_environment.get_status_display(),
-        "type": monitor_environment.monitor.get_type_display(),
-    }
+    return try_incident_threshold(failed_checkin, failure_issue_threshold, received)


🟡 [Grapple PR] Suggested fix — logic agent (Larger fix (33 lines, 1 file) — review recommended)

The functions mark_failed_threshold, create_issue_platform_occurrence, get_failure_reason, get_monitor_environment_context, and the type SimpleCheckIn were previously importable from mark_failed.py. Any external callers importing these symbols (e.g., from sentry.monitors.logic.mark_failed import mark_failed_threshold or from sentry.monitors.logic.mark_failed import SimpleCheckIn) will now get ImportError. The PR does not re-export these symbols from mark_failed.py.

--- a/src/sentry/monitors/logic/mark_failed.py +++ b/src/sentry/monitors/logic/mark_failed.py @@ -1,6 +1,7 @@ from __future__ import annotations import logging +from datetime import datetime # noqa: F401 - re-exported for backward compatibility from datetime import datetime from django.db.models import Q @@ -10,6 +11,28 @@ from sentry.monitors.models import CheckInStatus, MonitorCheckIn, MonitorEnviron logger = logging.getLogger(__name__) +# --------------------------------------------------------------------------- +# Backward-compatibility re-exports +# +# These symbols were previously defined in this module. They have been moved +# to dedicated sub-modules as part of a refactor. We re-export them here so +# that any existing import sites of the form: +# +# from sentry.monitors.logic.mark_failed import mark_failed_threshold +# from sentry.monitors.logic.mark_failed import SimpleCheckIn +# from sentry.monitors.logic.mark_failed import create_issue_platform_occurrence +# from sentry.monitors.logic.mark_failed import get_failure_reason +# from sentry.monitors.logic.mark_failed import get_monitor_environment_context +# +# continue to work without modification. +# --------------------------------------------------------------------------- +from sentry.monitors.logic.incident_occurrence import ( # noqa: F401, E402 + SimpleCheckIn, + create_issue_platform_occurrence, + get_failure_reason, + get_monitor_environment_context, +) +from sentry.monitors.logic.incidents import try_incident_threshold as mark_failed_threshold # noqa: F401, E402 + +__all__ = [ + "mark_failed", + "mark_failed_threshold", + "SimpleCheckIn", + "create_issue_platform_occurrence", + "get_failure_reason", + "get_monitor_environment_context", +] + def mark_failed(

_{🤖 Grapple PR auto-fix • major • Review this diff before applying}

grapple-pr · 2026-04-07T21:08:18Z

+logger = logging.getLogger(__name__)
+
+
+def try_incident_threshold(


🟡 [Grapple PR] Suggested fix — logic agent (Larger fix (22 lines, 1 file) — review recommended)

The original function was named mark_failed_threshold and has been renamed to try_incident_threshold. The original occurrence creation function was named create_issue_platform_occurrence and has been renamed to create_incident_occurrence. These are not just internal refactors — any code referencing the old names (tests, signal handlers, external modules) will break.

--- a/src/sentry/monitors/logic/incidents.py +++ b/src/sentry/monitors/logic/incidents.py @@ -1,6 +1,8 @@ from __future__ import annotations import logging +import warnings from datetime import datetime from typing import cast @@ -101,3 +103,23 @@ def try_incident_threshold( monitor_environment_failed.send(monitor_environment=monitor_env, sender=type(monitor_env)) return True + + +def mark_failed_threshold( + failed_checkin: MonitorCheckIn, + failure_issue_threshold: int, + received: datetime | None, +) -> bool: + """ + Deprecated: Use `try_incident_threshold` instead. + + Backward-compatible alias preserved because external callers (tests, + signal handlers, other modules) may still reference the original name + from mark_failed.py. This alias ensures the refactor does not break + the public API contract while callers are migrated. + """ + warnings.warn( + "mark_failed_threshold is deprecated; use try_incident_threshold instead.", + DeprecationWarning, + stacklevel=2, + ) + return try_incident_threshold(failed_checkin, failure_issue_threshold, received)

_{🤖 Grapple PR auto-fix • major • Review this diff before applying}

grapple-pr

🤖 Grapple PR Review

Summary

This PR reorganizes cron monitor incident creation and issue occurrence logic by extracting it from mark_failed.py into two new dedicated modules: incident_occurrence.py and incidents.py, reducing mark_failed.py by 264 lines while adding 275 lines across the new files.

Changes: 4 files (+289 / -264 lines)

🟢 2 files added: src/sentry/monitors/logic/incident_occurrence.py, src/sentry/monitors/logic/incidents.py
🟡 2 files modified: src/sentry/monitors/logic/mark_failed.py, src/sentry/monitors/types.py

Resolves: #8 (feat(upsampling) - Support upsampled error count with performance optimizations)

CI: ⏳ 0 passed, 0 failed

Impact Diagram

graph TD
  pr_0[/"🔀 PR Changes: 4 files"/]
  f1["🟢 incident_occurrence.py +171/-0"]
  pr_0 --> f1
  f2["🟢 incidents.py +104/-0"]
  pr_0 --> f2
  f3["🟡 mark_failed.py +4/-264"]
  pr_0 --> f3
  f4["🟡 types.py +10/-0"]
  pr_0 --> f4
  f1 -.- f2
  f2 -.- f3
  style f1 fill:#7c3aed,color:#fff,stroke:#7c3aed
  style f2 fill:#7c3aed,color:#fff,stroke:#7c3aed
  style f3 fill:#7c3aed,color:#fff,stroke:#7c3aed
  style f4 fill:#7c3aed,color:#fff,stroke:#7c3aed

1 critical issue found — please address before merging.

Severity	Count
🔴 Critical	1
🟠 Major	5
🟡 Minor	6
🔵 Info	3

Review details

Reviewed at: 2026-04-08T00:47:12.319Z
Agents used: security, logic, style, architecture, performance, verification
Context level: full
Execution time: 104.7s
Low-confidence findings suppressed: 4
Blast radius: 0 (contained)

Powered by Grapple PR — AI that understands your code.

grapple-pr · 2026-04-08T00:47:12Z

+        config["schedule_type"] = monitor_environment.monitor.get_schedule_type_display()
+
+    return {
+        "id": str(monitor_environment.monitor.guid),


🔴 CRITICAL — Data flow / Bug (confidence: 100%)

get_monitor_environment_context builds a local config variable with the schedule_type overwritten, but then the returned dictionary uses monitor_environment.monitor.config (the original, unmodified config) instead of the local config variable. This means the schedule_type transformation is dead code and the context will always contain the raw schedule_type value instead of the display value. This bug existed in the original code and was faithfully copied, but it is still a logic error.

Evidence:

Line 158: config = monitor_environment.monitor.config.copy()

Line 159-160: if 'schedule_type' in config: config['schedule_type'] = monitor_environment.monitor.get_schedule_type_display()

Line 165: 'config': monitor_environment.monitor.config, — uses the original config, not the local config variable

_{Agent: logic}

grapple-pr · 2026-04-08T00:47:12Z

+    failed_checkin: MonitorCheckIn,
+    failure_issue_threshold: int,
+    received: datetime | None,
+) -> bool:


🟠 MAJOR — API Contract / Naming Inconsistency (confidence: 100%)

The extracted function was renamed from mark_failed_threshold to try_incident_threshold, which changes the internal API surface. While mark_failed.py is updated to call the new name, any other callers in the codebase that directly imported or called mark_failed_threshold will break. The blast radius analysis only covers the 4 changed files, but a codebase-wide search for mark_failed_threshold references (tests, tasks, other modules) is needed to confirm no breakage.

Evidence:

Original function in mark_failed.py was named mark_failed_threshold

New function in incidents.py is named try_incident_threshold

No re-export or alias is provided in mark_failed.py for backward compatibility

Tests that directly import mark_failed_threshold from mark_failed will fail

_{Agent: architecture}

grapple-pr · 2026-04-08T00:47:12Z

+        config["schedule_type"] = monitor_environment.monitor.get_schedule_type_display()
+
+    return {
+        "id": str(monitor_environment.monitor.guid),


🟠 MAJOR — Logic Error / Data Integrity (confidence: 100%)

The config copy with humanized schedule_type is computed but never used. The returned dict references monitor_environment.monitor.config (the original dict) instead of the local config variable, so the schedule_type transformation is silently discarded.

Evidence:

Line 155: config = monitor_environment.monitor.config.copy() — copy is made and potentially modified

Line 157: config['schedule_type'] = monitor_environment.monitor.get_schedule_type_display() — written to local config

Line 160: 'config': monitor_environment.monitor.config — returns the ORIGINAL unmodified dict, not config

This bug exists verbatim in the original mark_failed.py and was faithfully copied, but it means the monitor context sent with every occurrence always has the raw integer schedule_type instead of the human-readable string

_{Agent: security}

grapple-pr · 2026-04-08T00:47:12Z

+    status_counts = Counter(
+        checkin["status"]
+        for checkin in failed_checkins
+        if checkin["status"] in HUMAN_FAILURE_STATUS_MAP.keys()


🟠 MAJOR — Edge case / KeyError risk (confidence: 100%)

In get_failure_reason, when sum(status_counts.values()) == 1, the code accesses SINGULAR_HUMAN_FAILURE_MAP[list(status_counts.keys())[0]]. If a failed check-in has a status that is NOT in HUMAN_FAILURE_STATUS_MAP (filtered out on line 138), status_counts could be empty (sum == 0), and the code falls through to the get_text_list path which would produce an empty string. However, if a check-in has a status that IS in HUMAN_FAILURE_STATUS_MAP but NOT in SINGULAR_HUMAN_FAILURE_MAP, this would raise a KeyError. Currently the maps have identical keys so this is safe, but there's no defensive check.

Evidence:

Lines 136-139: status_counts = Counter(checkin['status'] for checkin in failed_checkins if checkin['status'] in HUMAN_FAILURE_STATUS_MAP.keys())

If no check-ins match, status_counts is empty and sum(status_counts.values()) == 0

Falls through to get_text_list([], last_word=_('and')) which returns ''

Returns _('%(problem_checkins)s check-ins detected') % {'problem_checkins': ''} → ' check-ins detected'

This is then surfaced as the 'Failure reason' evidence in the issue occurrence shown to users

HUMAN_FAILURE_STATUS_MAP has keys: ERROR, MISSED, TIMEOUT

SINGULAR_HUMAN_FAILURE_MAP has keys: ERROR, MISSED, TIMEOUT

If these maps ever diverge, line 142 would raise KeyError

Also: if all check-ins are filtered out (status_counts is empty), sum == 0, so the code goes to get_text_list with an empty list, producing a malformed string like ' check-ins detected'

_{Agent: logic}

grapple-pr · 2026-04-08T00:47:13Z

+    incident: MonitorIncident,
+    received: datetime | None,
+) -> None:
+    from sentry.issues.issue_occurrence import IssueEvidence, IssueOccurrence


🟠 MAJOR — API Contract / Naming Inconsistency (confidence: 99%)

The occurrence creation function was renamed from create_issue_platform_occurrence to create_incident_occurrence. Any code outside mark_failed.py that imports or references create_issue_platform_occurrence by name (e.g., tests that mock it, or other callers) will break without a backward-compatible alias.

Evidence:

Original function was create_issue_platform_occurrence in mark_failed.py

New function is create_incident_occurrence in incident_occurrence.py

Tests commonly mock internal functions like create_issue_platform_occurrence by path; those mocks will silently pass (no-op) or fail

_{Agent: architecture}

grapple-pr · 2026-04-08T00:47:13Z

+    received: datetime | None,
+) -> bool:
+    from sentry.signals import monitor_environment_failed
+


🟡 MINOR — Naming conventions (confidence: 95%)

Function name try_incident_threshold uses a 'try_' prefix which typically suggests an operation that may fail gracefully and returns a success/failure boolean. However, the name doesn't clearly indicate it may create incidents. Consider if check_incident_threshold or handle_incident_threshold would be more precise.

Evidence:

The function both checks thresholds AND creates incidents/occurrences with side effects (database writes, Kafka calls)

The 'try_' prefix typically suggests non-throwing error handling, but this function performs substantial work

This naming pattern could be confusing to future maintainers about the function's scope

_{Agent: style}

grapple-pr · 2026-04-08T00:47:13Z

+
+    id: int
+    date_added: datetime
+    status: int


🟡 MINOR — Type Definition Location (confidence: 81%)

SimpleCheckIn is correctly defined in types.py but is also still used by incidents.py which imports it from types.py. Verify that no residual definition of SimpleCheckIn exists in the old mark_failed.py. The diff confirms it was removed — this is good. However, the docstring says 'A stripped down check in object' which could be more descriptive about its purpose (it represents check-ins used for incident threshold evaluation).

Evidence:

SimpleCheckIn moved from mark_failed.py to types.py

Imported correctly in both incidents.py (line 9) and incident_occurrence.py (line 20)

_{Agent: architecture}

grapple-pr · 2026-04-08T00:47:13Z

+
+    monitor_environment_failed.send(monitor_environment=monitor_env, sender=type(monitor_env))
+
+    return True


🔵 INFO — Refactor Completeness (confidence: 100%)

The function was renamed from mark_failed_threshold to try_incident_threshold during extraction. While this is a reasonable rename for clarity, any external callers or tests that directly imported mark_failed_threshold from mark_failed.py would break. The intent says 'The public API surface consumed by callers of mark_failed.py should remain unchanged or be backward compatible.' If mark_failed_threshold was only called internally within mark_failed, this is fine. But if any test or module imported it directly, a backward-compatible alias should be provided.

Evidence:

Original: def mark_failed_threshold(...) in mark_failed.py

New: def try_incident_threshold(...) in incidents.py

mark_failed.py now calls try_incident_threshold instead of the old internal function

The top-level mark_failed() function signature is preserved, which is the primary public API

_{Agent: logic}

grapple-pr · 2026-04-08T00:47:13Z

+from typing import cast
+
+from sentry.monitors.logic.incident_occurrence import create_incident_occurrence
+from sentry.monitors.models import CheckInStatus, MonitorCheckIn, MonitorIncident, MonitorStatus


🔵 INFO — Import Hygiene (confidence: 85%)

The MonitorStatus import is used in incidents.py but not explicitly imported in the file's imports section — actually it is imported on line 8. However, MonitorIncident is imported but only used as a type annotation for incident variable and in MonitorIncident.objects.get_or_create. This is fine, but note that CheckInStatus and MonitorCheckIn are also imported and used. All imports appear valid.

Evidence:

Line 8: from sentry.monitors.models import CheckInStatus, MonitorCheckIn, MonitorIncident, MonitorStatus

All four are used within the function body

_{Agent: logic}

grapple-pr · 2026-04-08T00:47:13Z

+    "2 missed check-ins, 1 timeout check-in and 1 error check-in were detected"
+    "A failed check-in was detected"
+    """
+


🔵 INFO — Documentation (confidence: 82%)

Typo in docstring: 'humam readible' should be 'human readable'

Evidence:

Line 130 contains: 'Builds a humam readible string from a list of failed check-ins.'

This is a documentation typo that should be corrected for clarity

_{Agent: style}

grapple-pr

🔧 [Grapple PR] 8 auto-fix suggestions available

Apply individually or use GitHub's "Add suggestion to batch" → "Commit suggestions" to apply all at once.

_{Commits from Grapple PR are prefixed with [Grapple PR] for easy identification in merge conflicts.}

grapple-pr · 2026-04-08T00:49:28Z

+        config["schedule_type"] = monitor_environment.monitor.get_schedule_type_display()
+
+    return {
+        "id": str(monitor_environment.monitor.guid),


✅ [Grapple PR] Auto-fix — logic agent (Small fix (2 lines, 1 file))

get_monitor_environment_context builds a local config variable with the schedule_type overwritten, but then the returned dictionary uses monitor_environment.monitor.config (the original, unmodified config) instead of the local config variable. This means the schedule_type transformation is dead code and the context will always contain the raw schedule_type value instead of the display value. This bug existed in the original code and was faithfully copied, but it is still a logic error.

Suggested change

"id": str(monitor_environment.monitor.guid),

"config": config,

_{🤖 Grapple PR auto-fix • critical • confidence: 100%}

grapple-pr · 2026-04-08T00:49:29Z

+        config["schedule_type"] = monitor_environment.monitor.get_schedule_type_display()
+
+    return {
+        "id": str(monitor_environment.monitor.guid),


✅ [Grapple PR] Auto-fix — security agent (Small fix (2 lines, 1 file))

The config copy with humanized schedule_type is computed but never used. The returned dict references monitor_environment.monitor.config (the original dict) instead of the local config variable, so the schedule_type transformation is silently discarded.

Suggested change

"id": str(monitor_environment.monitor.guid),

"config": config,

_{🤖 Grapple PR auto-fix • major • confidence: 100%}

grapple-pr · 2026-04-08T00:49:29Z

-from datetime import datetime, timezone
-from typing import TYPE_CHECKING, TypedDict, cast
+from datetime import datetime



✅ [Grapple PR] Auto-fix — logic agent (Small fix (2 lines, 1 file))

The MonitorEnvironment import is retained in mark_failed.py but after the refactor it is no longer used directly in this file. The mark_failed function receives MonitorCheckIn and accesses monitor_environment through it; MonitorEnvironment is not referenced as a type annotation or used anywhere in the remaining code of mark_failed.py.

Suggested change

from sentry.monitors.models import CheckInStatus, MonitorCheckIn

_{🤖 Grapple PR auto-fix • major • confidence: 89%}

grapple-pr · 2026-04-08T00:49:29Z

+    return _("%(problem_checkins)s check-ins detected") % {"problem_checkins": human_status}
+
+
+def get_monitor_environment_context(monitor_environment: MonitorEnvironment):


✅ [Grapple PR] Auto-fix — logic agent (Small fix (2 lines, 1 file))

In get_monitor_environment_context, the local variable config is computed and modified (line 152-153: config = monitor_environment.monitor.config.copy() and the schedule_type override), but the returned dict uses monitor_environment.monitor.config (the original, unmodified config) instead of the local config variable. This is a pre-existing bug that was faithfully moved from mark_failed.py, but it means the schedule_type display override is silently discarded.

Suggested change

def get_monitor_environment_context(monitor_environment: MonitorEnvironment):

"config": config,

_{🤖 Grapple PR auto-fix • minor • confidence: 100%}

grapple-pr · 2026-04-08T00:49:29Z

-

 def mark_failed(
    failed_checkin: MonitorCheckIn,


✅ [Grapple PR] Auto-fix — security agent (Small fix (5 lines, 1 file))

SimpleCheckIn TypedDict was previously exported from sentry.monitors.logic.mark_failed and is now only defined in sentry.monitors.types. Any code outside this PR that imported SimpleCheckIn from mark_failed will get an ImportError at runtime. No re-export shim is provided.

Suggested change

failed_checkin: MonitorCheckIn,

# Re-exported for backward compatibility: SimpleCheckIn was previously defined

# in this module. It has been moved to sentry.monitors.types as part of the

# incident logic refactor. External callers importing from this path will

# continue to work without modification.

from sentry.monitors.types import SimpleCheckIn as SimpleCheckIn # noqa: F401

_{🤖 Grapple PR auto-fix • minor • confidence: 100%}

grapple-pr · 2026-04-08T00:49:29Z

+                .order_by("-date_added")
+                .values("id", "date_added", "status"),
+            )
+


✅ [Grapple PR] Auto-fix — security agent (Small fix (4 lines, 1 file))

When failure_issue_threshold > 1, the DB query fetches check-ins filtered only by date_added__lte=failed_checkin.date_added with no upper bound on count before the query executes. The slice [:failure_issue_threshold] is applied in Python after the full queryset is evaluated via list(reversed(...)), meaning Django will fetch ALL matching check-ins into memory before slicing. This is a performance regression risk for monitors with long histories, and diverges from the intent of using .values() 'for performance reasons'.

Suggested change

.values("id", "date_added", "status")[:failure_issue_threshold],

previous_checkins = list(reversed(previous_checkins))

_{🤖 Grapple PR auto-fix • minor • confidence: 100%}

grapple-pr · 2026-04-08T00:49:29Z

+    received: datetime | None,
+) -> bool:
+    from sentry.signals import monitor_environment_failed
+


✅ [Grapple PR] Auto-fix — style agent (Small fix (2 lines, 1 file))

Function name try_incident_threshold uses a 'try_' prefix which typically suggests an operation that may fail gracefully and returns a success/failure boolean. However, the name doesn't clearly indicate it may create incidents. Consider if check_incident_threshold or handle_incident_threshold would be more precise.

Suggested change

def handle_incident_threshold(

_{🤖 Grapple PR auto-fix • minor • confidence: 95%}

grapple-pr · 2026-04-08T00:49:29Z

+
+    id: int
+    date_added: datetime
+    status: int


✅ [Grapple PR] Auto-fix — architecture agent (Small fix (4 lines, 1 file))

SimpleCheckIn is correctly defined in types.py but is also still used by incidents.py which imports it from types.py. Verify that no residual definition of SimpleCheckIn exists in the old mark_failed.py. The diff confirms it was removed — this is good. However, the docstring says 'A stripped down check in object' which could be more descriptive about its purpose (it represents check-ins used for incident threshold evaluation).

Suggested change

status: int

A minimal check-in representation used for incident threshold evaluation.

Contains only the fields required to assess monitor failure and recovery

conditions in incident and occurrence processing logic.

_{🤖 Grapple PR auto-fix • minor • confidence: 81%}

grapple-pr · 2026-04-08T00:49:31Z

+    failed_checkin: MonitorCheckIn,
+    failure_issue_threshold: int,
+    received: datetime | None,
+) -> bool:


🟡 [Grapple PR] Suggested fix — architecture agent (Small fix (7 lines, 1 file))

The extracted function was renamed from mark_failed_threshold to try_incident_threshold, which changes the internal API surface. While mark_failed.py is updated to call the new name, any other callers in the codebase that directly imported or called mark_failed_threshold will break. The blast radius analysis only covers the 4 changed files, but a codebase-wide search for mark_failed_threshold references (tests, tasks, other modules) is needed to confirm no breakage.

--- a/src/sentry/monitors/logic/mark_failed.py +++ b/src/sentry/monitors/logic/mark_failed.py @@ -1,6 +1,8 @@ from __future__ import annotations import logging from datetime import datetime +from sentry.monitors.logic.incidents import try_incident_threshold + logger = logging.getLogger(__name__) @@ -10,6 +12,10 @@ logger = logging.getLogger(__name__) +# Backward-compatibility alias: `mark_failed_threshold` was extracted and +# renamed to `try_incident_threshold` in incidents.py. Any existing callers +# (tests, tasks, other modules) that imported `mark_failed_threshold` from +# this module will continue to work without modification. +mark_failed_threshold = try_incident_threshold

_{🤖 Grapple PR auto-fix • major • Review this diff before applying}

grapple-pr · 2026-04-08T00:49:31Z

+    status_counts = Counter(
+        checkin["status"]
+        for checkin in failed_checkins
+        if checkin["status"] in HUMAN_FAILURE_STATUS_MAP.keys()


🟡 [Grapple PR] Suggested fix — logic agent (Larger fix (16 lines, 1 file) — review recommended)

In get_failure_reason, when sum(status_counts.values()) == 1, the code accesses SINGULAR_HUMAN_FAILURE_MAP[list(status_counts.keys())[0]]. If a failed check-in has a status that is NOT in HUMAN_FAILURE_STATUS_MAP (filtered out on line 138), status_counts could be empty (sum == 0), and the code falls through to the get_text_list path which would produce an empty string. However, if a check-in has a status that IS in HUMAN_FAILURE_STATUS_MAP but NOT in SINGULAR_HUMAN_FAILURE_MAP, this would raise a KeyError. Currently the maps have identical keys so this is safe, but there's no defensive check.

--- a/src/sentry/monitors/logic/incident_occurrence.py +++ b/src/sentry/monitors/logic/incident_occurrence.py @@ -139,10 +139,18 @@ def get_failure_reason(failed_checkins: Sequence[SimpleCheckIn]): status_counts = Counter( checkin["status"] for checkin in failed_checkins if checkin["status"] in HUMAN_FAILURE_STATUS_MAP.keys() ) + total = sum(status_counts.values()) + + # No check-ins with recognized failure statuses; return a generic fallback + # rather than producing a malformed string like ' check-ins detected'. + if total == 0: + return _("A failed check-in was detected") + - if sum(status_counts.values()) == 1: - return SINGULAR_HUMAN_FAILURE_MAP[list(status_counts.keys())[0]] + if total == 1: + status_key = list(status_counts.keys())[0] + # Defensive .get() in case SINGULAR_HUMAN_FAILURE_MAP and + # HUMAN_FAILURE_STATUS_MAP ever diverge (they currently share keys). + return SINGULAR_HUMAN_FAILURE_MAP.get( + status_key, _("A failed check-in was detected") + ) human_status = get_text_list( [

_{🤖 Grapple PR auto-fix • major • Review this diff before applying}

grapple-pr · 2026-04-08T00:49:32Z

+    incident: MonitorIncident,
+    received: datetime | None,
+) -> None:
+    from sentry.issues.issue_occurrence import IssueEvidence, IssueOccurrence


🟡 [Grapple PR] Suggested fix — architecture agent (Larger fix (11 lines, 1 file) — review recommended)

The occurrence creation function was renamed from create_issue_platform_occurrence to create_incident_occurrence. Any code outside mark_failed.py that imports or references create_issue_platform_occurrence by name (e.g., tests that mock it, or other callers) will break without a backward-compatible alias.

--- a/src/sentry/monitors/logic/incident_occurrence.py +++ b/src/sentry/monitors/logic/incident_occurrence.py @@ -168,3 +168,13 @@ "type": monitor_environment.monitor.get_type_display(), } + + +# Backward-compatible alias for the previous name of this function when it lived +# in mark_failed.py. Tests that mock or import `create_issue_platform_occurrence` +# by path (e.g. via `mock.patch("...incident_occurrence.create_issue_platform_occurrence")`) +# will continue to work without modification. +# +# Deprecated: use `create_incident_occurrence` directly. +# TODO: Remove this alias once all references to `create_issue_platform_occurrence` +# have been updated across tests and callers. +create_issue_platform_occurrence = create_incident_occurrence

_{🤖 Grapple PR auto-fix • major • Review this diff before applying}

grapple-pr · 2026-04-08T00:49:33Z

+from datetime import datetime
+from typing import cast
+
+from sentry.monitors.logic.incident_occurrence import create_incident_occurrence


🟡 [Grapple PR] Suggested fix — architecture agent (Larger fix (12 lines, 1 file) — review recommended)

incidents.py imports directly from incident_occurrence.py, creating a fixed coupling direction. This is acceptable for now, but if incident_occurrence.py ever needs to reference incident logic, a circular dependency will result. Consider whether the call to create_incident_occurrence should instead be injected or orchestrated from a higher-level caller (like mark_failed.py) to keep the two new modules independent.

--- a/src/sentry/monitors/logic/incidents.py +++ b/src/sentry/monitors/logic/incidents.py @@ -1,11 +1,11 @@ from __future__ import annotations import logging from datetime import datetime -from typing import cast +from typing import Callable, cast -from sentry.monitors.logic.incident_occurrence import create_incident_occurrence from sentry.monitors.models import CheckInStatus, MonitorCheckIn, MonitorIncident, MonitorStatus from sentry.monitors.types import SimpleCheckIn logger = logging.getLogger(__name__) def try_incident_threshold( failed_checkin: MonitorCheckIn, failure_issue_threshold: int, received: datetime | None, + # Injected to avoid a coupling dependency between incidents.py and + # incident_occurrence.py. If incident_occurrence.py ever needs to reference + # incident state, a circular import would result. The orchestrating caller + # (mark_failed.py) is responsible for wiring this dependency. + create_occurrence: Callable = None, ) -> bool: @@ -88,10 +88,13 @@ # Only create an occurrence if: # - We have an active incident and fingerprint # - The monitor and env are not muted - if not monitor_env.monitor.is_muted and not monitor_env.is_muted and incident: + if not monitor_env.monitor.is_muted and not monitor_env.is_muted and incident and create_occurrence is not None: checkins = MonitorCheckIn.objects.filter(id__in=[c["id"] for c in previous_checkins]) for checkin in checkins: - create_incident_occurrence( + create_occurrence( previous_checkins, checkin, incident, received=received, )

_{🤖 Grapple PR auto-fix • minor • Review this diff before applying}

Frawatson · 2026-04-12T01:51:05Z

@grapple-pr review

grapple-pr

🤖 Grapple PR Review

Summary

This PR reorganizes the cron monitor incident creation and issue occurrence logic by extracting it from mark_failed.py into two new dedicated modules: incident_occurrence.py and incidents.py. This is a refactoring change that improves code organization and separation of concerns.

Changes: 4 files (+289 / -264 lines)

🟢 2 files added: src/sentry/monitors/logic/incident_occurrence.py, src/sentry/monitors/logic/incidents.py
🟡 2 files modified: src/sentry/monitors/logic/mark_failed.py, src/sentry/monitors/types.py

Resolves: #8 (feat(upsampling) - Support upsampled error count with performance optimizations)

CI: ⏳ 0 passed, 0 failed

Impact Diagram

graph TD
  pr_0[/"🔀 PR Changes: 4 files"/]
  f1["🟢 incident_occurrence.py +171/-0"]
  pr_0 --> f1
  f2["🟢 incidents.py +104/-0"]
  pr_0 --> f2
  f3["🟡 mark_failed.py +4/-264"]
  pr_0 --> f3
  f4["🟡 types.py +10/-0"]
  pr_0 --> f4
  f1 -.- f2
  f2 -.- f3
  style f1 fill:#7c3aed,color:#fff,stroke:#7c3aed
  style f2 fill:#7c3aed,color:#fff,stroke:#7c3aed
  style f3 fill:#7c3aed,color:#fff,stroke:#7c3aed
  style f4 fill:#7c3aed,color:#fff,stroke:#7c3aed

5 major issues found — review recommended.

Severity	Count
🟠 Major	5
🟡 Minor	5
🔵 Info	1

Review details

Reviewed at: 2026-04-12T03:17:25.375Z
Agents used: security, logic, style, architecture, performance, verification
Context level: full
Execution time: 84.2s
Low-confidence findings suppressed: 3
Blast radius: 0 (contained)

Powered by Grapple PR — AI that understands your code.

grapple-pr · 2026-04-12T03:17:25Z

+        config["schedule_type"] = monitor_environment.monitor.get_schedule_type_display()
+
+    return {
+        "id": str(monitor_environment.monitor.guid),


🟠 MAJOR — Logic Error / Potential KeyError or Misleading Output (confidence: 100%)

get_failure_reason has two edge cases: (1) If all check-ins in failed_checkins have statuses not in HUMAN_FAILURE_STATUS_MAP, status_counts is empty, sum is 0, and the function falls through to produce a malformed/empty string like '0 check-ins detected'. (2) The docstring mentions 'A failed check-in was detected' as a possible output but this string does not exist in the code — it appears the original had a fallback that was not ported.

Evidence:

Line 140-144: Counter filters only statuses in HUMAN_FAILURE_STATUS_MAP (ERROR, MISSED, TIMEOUT)

Line 146: if sum(status_counts.values()) == 1 — only handles the singular case

No else/fallback branch for sum == 0 (empty status_counts)

Docstring at line 133 mentions 'A failed check-in was detected' but no code path produces that string — SINGULAR_HUMAN_FAILURE_MAP only covers ERROR, MISSED, TIMEOUT

If a check-in with status IN_PROGRESS (or any unrecognized status) is in failed_checkins, get_text_list receives an empty list

_{Agent: security}

grapple-pr · 2026-04-12T03:17:25Z

+        monitor_env.save(update_fields=("status",))
+
+        starting_checkin = previous_checkins[0]
+


🟠 MAJOR — Logic Error / Potential IndexError (confidence: 100%)

When failure_issue_threshold > 1, previous_checkins is populated from a DB query. If the query returns 0 rows (race condition, data deleted), the subsequent previous_checkins[0] at line 60 raises an IndexError, causing an unhandled exception. Also, the query does not filter out OK check-ins before slicing — it fetches the N most recent and then checks if any are OK, but if there are fewer than N check-ins total, the threshold check is applied to fewer items than intended.

Evidence:

Lines 42-50: DB query fetches check-ins with date_added__lte=failed_checkin.date_added, no guarantee of non-empty result

Line 52: previous_checkins = list(reversed(previous_checkins[:failure_issue_threshold])) — empty if DB returns nothing

Line 60: starting_checkin = previous_checkins[0] — IndexError if list is empty

No guard between lines 56 and 60 to handle empty previous_checkins

This was present in original code but is a real bug that should be fixed during this refactor

_{Agent: security}

grapple-pr · 2026-04-12T03:17:25Z

+        "config": monitor_environment.monitor.config,
+        "status": monitor_environment.get_status_display(),
+        "type": monitor_environment.monitor.get_type_display(),
+    }


🟠 MAJOR — Logic Error - Config Context Bug Preserved (confidence: 100%)

The get_monitor_environment_context function builds a config copy and conditionally modifies schedule_type, but then returns monitor_environment.monitor.config (the original) instead of the modified config copy. This bug existed in the original mark_failed.py and was faithfully copied, but it's worth noting as a pre-existing logic error where the config copy is created but never used in the return value.

Evidence:

Line 163: config = monitor_environment.monitor.config.copy()

Line 164-165: if 'schedule_type' in config: config['schedule_type'] = ... modifies the copy

Line 170: 'config': monitor_environment.monitor.config returns the ORIGINAL, not the modified copy

This means the schedule_type display transformation is silently discarded

_{Agent: logic}

grapple-pr · 2026-04-12T03:17:25Z

+logger = logging.getLogger(__name__)
+
+
+def try_incident_threshold(


🟠 MAJOR — Import path changes breaking test mocks (confidence: 94%)

The function mark_failed_threshold was renamed to try_incident_threshold and create_issue_platform_occurrence was renamed to create_incident_occurrence. Any existing tests that mock or patch these functions by their original names and module paths (e.g., @mock.patch('sentry.monitors.logic.mark_failed.mark_failed_threshold') or @mock.patch('sentry.monitors.logic.mark_failed.create_issue_platform_occurrence')) will silently stop patching and tests may pass or fail for the wrong reasons. The acceptance criteria states 'All existing tests for mark_failed.py continue to pass without modification', but this rename makes that unlikely if any tests reference the old function names/paths.

Evidence:

Original function in mark_failed.py was named mark_failed_threshold, renamed to try_incident_threshold in incidents.py

Original function create_issue_platform_occurrence renamed to create_incident_occurrence in incident_occurrence.py

The intent specification says 'No logic is lost or altered during the extraction' and 'All existing tests for mark_failed.py continue to pass without modification'

Test files are not included in this PR, so we cannot confirm whether mocks referencing old paths exist

_{Agent: logic}

grapple-pr · 2026-04-12T03:17:25Z

-        "status": monitor_environment.get_status_display(),
-        "type": monitor_environment.monitor.get_type_display(),
-    }
+    return try_incident_threshold(failed_checkin, failure_issue_threshold, received)


🟠 MAJOR — API contract / Test mock breakage (confidence: 94%)

The function mark_failed_threshold has been renamed to try_incident_threshold and moved to a new module. Similarly, create_issue_platform_occurrence has been renamed to create_incident_occurrence and moved. Any existing tests or callers that mock or patch these functions at their old paths (e.g., @mock.patch('sentry.monitors.logic.mark_failed.mark_failed_threshold'), @mock.patch('sentry.monitors.logic.mark_failed.create_issue_platform_occurrence')) will silently stop intercepting the real function and tests will either fail or pass incorrectly. The function renaming (not just moving) doubles the risk since even callers doing from sentry.monitors.logic.mark_failed import create_issue_platform_occurrence will get an ImportError.

Evidence:

Original: mark_failed_threshold in mark_failed.py → now try_incident_threshold in incidents.py

Original: create_issue_platform_occurrence in mark_failed.py → now create_incident_occurrence in incident_occurrence.py

The intent specification states 'All existing tests for mark_failed.py continue to pass without modification', but function renaming means any mock targeting old names/paths will break

No re-exports of old names are provided in mark_failed.py for backward compatibility

_{Agent: architecture}

grapple-pr · 2026-04-12T03:17:26Z

+        config["schedule_type"] = monitor_environment.monitor.get_schedule_type_display()
+
+    return {
+        "id": str(monitor_environment.monitor.guid),


🟡 MINOR — Logic Error / Stale Data (confidence: 100%)

get_monitor_environment_context returns monitor_environment.monitor.config (the original dict) instead of the local config copy that has schedule_type transformed to a display string. The copy is built and modified but then discarded.

Evidence:

Line 155: config = monitor_environment.monitor.config.copy() — a copy is made and potentially mutated

Line 157: config['schedule_type'] = monitor_environment.monitor.get_schedule_type_display() — human-readable value stored in copy

Line 163: 'config': monitor_environment.monitor.config — returns the original raw dict, not the transformed copy

The caller in create_incident_occurrence at line 86 embeds this context in event_data sent to Kafka, so consumers see the raw schedule_type integer instead of the display string

_{Agent: security}

grapple-pr · 2026-04-12T03:17:26Z

+    if sum(status_counts.values()) == 1:
+        return SINGULAR_HUMAN_FAILURE_MAP[list(status_counts.keys())[0]]
+
+    human_status = get_text_list(


🟡 MINOR — Edge Case - Empty failed_checkins (confidence: 100%)

The get_failure_reason function will raise a KeyError if failed_checkins contains only check-ins with statuses NOT in HUMAN_FAILURE_STATUS_MAP (e.g., all filtered out). In that case status_counts would be empty, sum(status_counts.values()) would be 0, which is not == 1, so it falls through to the get_text_list path with an empty list. While this is a pre-existing edge case, the extraction is a good opportunity to note it.

Evidence:

If all check-in statuses are filtered out by the if checkin['status'] in HUMAN_FAILURE_STATUS_MAP.keys() guard, status_counts is an empty Counter

sum of empty Counter values is 0, which != 1, so it proceeds to format an empty list

get_text_list with an empty list returns '' which then gets interpolated into the message as ' check-ins detected'

_{Agent: logic}

grapple-pr · 2026-04-12T03:17:26Z

+    "A failed check-in was detected"
+    """
+
+    status_counts = Counter(


🟡 MINOR — Documentation (confidence: 86%)

Typo in docstring: 'humam readible' should be 'human readable'

Evidence:

The docstring for get_failure_reason contains a spelling error that reduces code clarity

This is a public function with a docstring that should follow proper English conventions

_{Agent: style}

grapple-pr · 2026-04-12T03:17:26Z

+from datetime import datetime
+from typing import cast
+
+from sentry.monitors.logic.incident_occurrence import create_incident_occurrence


🟡 MINOR — Module boundaries / Dependency direction (confidence: 84%)

incidents.py directly imports from incident_occurrence.py, establishing a one-way dependency. This is architecturally fine for now, but if incident_occurrence.py ever needs to reference incident logic, a circular dependency would result. Consider whether the create_incident_occurrence call should instead be injected or dispatched from mark_failed.py (the orchestrator) to keep both new modules independent of each other.

Evidence:

incidents.py line 7: from sentry.monitors.logic.incident_occurrence import create_incident_occurrence

mark_failed.py already serves as the orchestrator; it could call both modules sequentially rather than having incidents.py call incident_occurrence.py

_{Agent: architecture}

grapple-pr · 2026-04-12T03:17:26Z

+
+    id: int
+    date_added: datetime
+    status: int


🔵 INFO — Type definition placement (confidence: 85%)

The SimpleCheckIn TypedDict was correctly moved from mark_failed.py to the shared types.py module. Both incidents.py and incident_occurrence.py import it from types.py, which is the right pattern for shared type definitions. Good architectural choice.

Evidence:

incident_occurrence.py line 20: from sentry.monitors.types import SimpleCheckIn

incidents.py line 9: from sentry.monitors.types import SimpleCheckIn

_{Agent: architecture}

grapple-pr

🔧 [Grapple PR] 6 auto-fix suggestions available

Apply individually or use GitHub's "Add suggestion to batch" → "Commit suggestions" to apply all at once.

_{Commits from Grapple PR are prefixed with [Grapple PR] for easy identification in merge conflicts.}

grapple-pr · 2026-04-12T03:19:29Z

+        config["schedule_type"] = monitor_environment.monitor.get_schedule_type_display()
+
+    return {
+        "id": str(monitor_environment.monitor.guid),


✅ [Grapple PR] Auto-fix — security agent (Small fix (3 lines, 1 file))

get_failure_reason has two edge cases: (1) If all check-ins in failed_checkins have statuses not in HUMAN_FAILURE_STATUS_MAP, status_counts is empty, sum is 0, and the function falls through to produce a malformed/empty string like '0 check-ins detected'. (2) The docstring mentions 'A failed check-in was detected' as a possible output but this string does not exist in the code — it appears the original had a fallback that was not ported.

Suggested change

"id": str(monitor_environment.monitor.guid),

if sum(status_counts.values()) == 0:

return _("A failed check-in was detected")

_{🤖 Grapple PR auto-fix • major • confidence: 100%}

grapple-pr · 2026-04-12T03:19:29Z

+        "config": monitor_environment.monitor.config,
+        "status": monitor_environment.get_status_display(),
+        "type": monitor_environment.monitor.get_type_display(),
+    }


✅ [Grapple PR] Auto-fix — logic agent (Small fix (2 lines, 1 file))

The get_monitor_environment_context function builds a config copy and conditionally modifies schedule_type, but then returns monitor_environment.monitor.config (the original) instead of the modified config copy. This bug existed in the original mark_failed.py and was faithfully copied, but it's worth noting as a pre-existing logic error where the config copy is created but never used in the return value.

Suggested change

}

"config": config,

_{🤖 Grapple PR auto-fix • major • confidence: 100%}

grapple-pr · 2026-04-12T03:19:29Z

+    MonitorEnvironment,
+    MonitorIncident,
+)
+from sentry.monitors.types import SimpleCheckIn


✅ [Grapple PR] Auto-fix — logic agent (Small fix (6 lines, 1 file))

The SimpleCheckIn type is imported from sentry.monitors.types which is correct, but any external code that previously imported SimpleCheckIn from sentry.monitors.logic.mark_failed will break since it has been removed from that module and no re-export is provided.

Suggested change

from sentry.monitors.types import SimpleCheckIn

# Re-export SimpleCheckIn for backward compatibility.

# The canonical definition has moved to sentry.monitors.types as part of the

# incident_occurrence.py refactor. External code importing SimpleCheckIn from

# this module will continue to work without modification.

from sentry.monitors.types import SimpleCheckIn as SimpleCheckIn # noqa: F401

_{🤖 Grapple PR auto-fix • minor • confidence: 100%}

grapple-pr · 2026-04-12T03:19:29Z

+        config["schedule_type"] = monitor_environment.monitor.get_schedule_type_display()
+
+    return {
+        "id": str(monitor_environment.monitor.guid),


✅ [Grapple PR] Auto-fix — security agent (Small fix (2 lines, 1 file))

get_monitor_environment_context returns monitor_environment.monitor.config (the original dict) instead of the local config copy that has schedule_type transformed to a display string. The copy is built and modified but then discarded.

Suggested change

"id": str(monitor_environment.monitor.guid),

"config": config,

_{🤖 Grapple PR auto-fix • minor • confidence: 100%}

grapple-pr · 2026-04-12T03:19:29Z

+    if sum(status_counts.values()) == 1:
+        return SINGULAR_HUMAN_FAILURE_MAP[list(status_counts.keys())[0]]
+
+    human_status = get_text_list(


✅ [Grapple PR] Auto-fix — logic agent (Small fix (7 lines, 1 file))

The get_failure_reason function will raise a KeyError if failed_checkins contains only check-ins with statuses NOT in HUMAN_FAILURE_STATUS_MAP (e.g., all filtered out). In that case status_counts would be empty, sum(status_counts.values()) would be 0, which is not == 1, so it falls through to the get_text_list path with an empty list. While this is a pre-existing edge case, the extraction is a good opportunity to note it.

Suggested change

human_status = get_text_list(

# Guard against the edge case where all check-ins have statuses not present

# in HUMAN_FAILURE_STATUS_MAP (e.g. unexpected/future statuses). Without this,

# get_text_list([]) returns '' and we'd produce the malformed string

# " check-ins detected".

if not status_counts:

return _("A failed check-in was detected")

_{🤖 Grapple PR auto-fix • minor • confidence: 100%}

grapple-pr · 2026-04-12T03:19:30Z

+    "A failed check-in was detected"
+    """
+
+    status_counts = Counter(


✅ [Grapple PR] Auto-fix — style agent (Small fix (2 lines, 1 file))

Typo in docstring: 'humam readible' should be 'human readable'

Suggested change

status_counts = Counter(

Builds a human readable string from a list of failed check-ins.

_{🤖 Grapple PR auto-fix • minor • confidence: 86%}

grapple-pr · 2026-04-12T03:19:31Z

+        monitor_env.save(update_fields=("status",))
+
+        starting_checkin = previous_checkins[0]
+


🟡 [Grapple PR] Suggested fix — security agent (Larger fix (12 lines, 1 file) — review recommended)

When failure_issue_threshold > 1, previous_checkins is populated from a DB query. If the query returns 0 rows (race condition, data deleted), the subsequent previous_checkins[0] at line 60 raises an IndexError, causing an unhandled exception. Also, the query does not filter out OK check-ins before slicing — it fetches the N most recent and then checks if any are OK, but if there are fewer than N check-ins total, the threshold check is applied to fewer items than intended.

--- a/src/sentry/monitors/logic/incidents.py +++ b/src/sentry/monitors/logic/incidents.py @@ -37,16 +37,25 @@ else: previous_checkins = cast( list[SimpleCheckIn], # Using .values for performance reasons MonitorCheckIn.objects.filter( monitor_environment=monitor_env, date_added__lte=failed_checkin.date_added ) + .exclude(status=CheckInStatus.OK) .order_by("-date_added") + [: failure_issue_threshold] .values("id", "date_added", "status"), ) # reverse the list after slicing in order to start with oldest check-in - previous_checkins = list(reversed(previous_checkins[:failure_issue_threshold])) + previous_checkins = list(reversed(previous_checkins)) - # If we have any successful check-ins within the threshold of - # commits we have NOT reached an incident state - if any([checkin["status"] == CheckInStatus.OK for checkin in previous_checkins]): + # If we don't have enough consecutive failures to meet the threshold, + # we have NOT reached an incident state. + # Also guard against an empty result (race condition / data deleted) + # which would cause IndexError at previous_checkins[0] below. + if len(previous_checkins) < failure_issue_threshold: return False # change monitor status + update fingerprint timestamp

_{🤖 Grapple PR auto-fix • major • Review this diff before applying}

grapple-pr · 2026-04-12T03:19:32Z

🔴 [Grapple PR] Partial fix — logic agent (Validation failed: No valid hunks found in patch)

The function mark_failed_threshold was renamed to try_incident_threshold and create_issue_platform_occurrence was renamed to create_incident_occurrence. Any existing tests that mock or patch these functions by their original names and module paths (e.g., @mock.patch('sentry.monitors.logic.mark_failed.mark_failed_threshold') or @mock.patch('sentry.monitors.logic.mark_failed.create_issue_platform_occurrence')) will silently stop patching and tests may pass or fail for the wrong reasons. The acceptance criteria states 'All existing tests for mark_failed.py continue to pass without modification', but this rename makes that unlikely if any tests reference the old function names/paths.

View suggested fix (needs manual review)

--- a/src/sentry/monitors/logic/incidents.py
+++ b/src/sentry/monitors/logic/incidents.py
@@ -1,6 +1,7 @@
 from __future__ import annotations
 
 import logging
 from datetime import datetime
 from typing import cast
 
 from sentry.monitors.logic.incident_occurrence import create_incident_occurrence
 from sentry.monitors.models import CheckInStatus, MonitorCheckIn, MonitorIncident, MonitorStatus
 from sentry.monitors.types import SimpleCheckIn
 
 logger = logging.getLogger(__name__)
 
 
 def try_incident_threshold(
     failed_checkin: MonitorCheckIn,
     failure_issue_threshold: int,
     received: datetime | None,
 ) -> bool:

⚠️ Validation failed: No valid hunks found in patch

_{🤖 Grapple PR auto-fix • major • This fix needs manual verification}

grapple-pr · 2026-04-12T03:19:32Z

-        "status": monitor_environment.get_status_display(),
-        "type": monitor_environment.monitor.get_type_display(),
-    }
+    return try_incident_threshold(failed_checkin, failure_issue_threshold, received)


🟡 [Grapple PR] Suggested fix — architecture agent (Larger fix (20 lines, 1 file) — review recommended)

The function mark_failed_threshold has been renamed to try_incident_threshold and moved to a new module. Similarly, create_issue_platform_occurrence has been renamed to create_incident_occurrence and moved. Any existing tests or callers that mock or patch these functions at their old paths (e.g., @mock.patch('sentry.monitors.logic.mark_failed.mark_failed_threshold'), @mock.patch('sentry.monitors.logic.mark_failed.create_issue_platform_occurrence')) will silently stop intercepting the real function and tests will either fail or pass incorrectly. The function renaming (not just moving) doubles the risk since even callers doing from sentry.monitors.logic.mark_failed import create_issue_platform_occurrence will get an ImportError.

--- a/src/sentry/monitors/logic/mark_failed.py +++ b/src/sentry/monitors/logic/mark_failed.py @@ -1,7 +1,7 @@ from __future__ import annotations import logging -from datetime import datetime +from datetime import datetime # noqa: F401 from django.db.models import Q -from sentry.monitors.logic.incidents import try_incident_threshold +from sentry.monitors.logic.incidents import try_incident_threshold +from sentry.monitors.logic.incident_occurrence import create_incident_occurrence from sentry.monitors.models import CheckInStatus, MonitorCheckIn, MonitorEnvironment logger = logging.getLogger(__name__) +# Backward-compatible re-exports: these old names are preserved so that any +# existing test mocks patching the old paths continue to work correctly: +# +# @mock.patch('sentry.monitors.logic.mark_failed.mark_failed_threshold') +# @mock.patch('sentry.monitors.logic.mark_failed.create_issue_platform_occurrence') +# +# Without these aliases, mocks targeting the old names would silently stop +# intercepting the real function, causing tests to either fail or pass +# incorrectly (depending on whether the mock was expected to prevent side effects +# or assert call counts). +# +# When all callers and test mocks have been updated to use the new names and +# module paths, these aliases can be safely removed. +mark_failed_threshold = try_incident_threshold +create_issue_platform_occurrence = create_incident_occurrence

_{🤖 Grapple PR auto-fix • major • Review this diff before applying}

grapple-pr · 2026-04-12T03:19:33Z

+from datetime import datetime
+from typing import cast
+
+from sentry.monitors.logic.incident_occurrence import create_incident_occurrence


🟡 [Grapple PR] Suggested fix — architecture agent (Larger fix (33 lines, 1 file) — review recommended)

incidents.py directly imports from incident_occurrence.py, establishing a one-way dependency. This is architecturally fine for now, but if incident_occurrence.py ever needs to reference incident logic, a circular dependency would result. Consider whether the create_incident_occurrence call should instead be injected or dispatched from mark_failed.py (the orchestrator) to keep both new modules independent of each other.

--- a/src/sentry/monitors/logic/incidents.py +++ b/src/sentry/monitors/logic/incidents.py @@ -1,10 +1,9 @@ from __future__ import annotations import logging from datetime import datetime from typing import cast -from sentry.monitors.logic.incident_occurrence import create_incident_occurrence from sentry.monitors.models import CheckInStatus, MonitorCheckIn, MonitorIncident, MonitorStatus from sentry.monitors.types import SimpleCheckIn @@ -14,7 +13,14 @@ logger = logging.getLogger(__name__) def try_incident_threshold( failed_checkin: MonitorCheckIn, failure_issue_threshold: int, received: datetime | None, -) -> bool: +) -> tuple[bool, list[SimpleCheckIn], MonitorIncident | None]: + """ + Evaluate whether the failure threshold has been reached and return the + incident context needed to create occurrences. + + Returns a 3-tuple of (threshold_reached, previous_checkins, incident). + The caller (mark_failed.py orchestrator) is responsible for calling + create_incident_occurrence so that incidents.py and incident_occurrence.py + remain independent of each other. + """ from sentry.signals import monitor_environment_failed monitor_env = failed_checkin.monitor_environment if monitor_env is None: - return False + return False, [], None # check to see if we need to update the status if monitor_env.status in [MonitorStatus.OK, MonitorStatus.ACTIVE]: @@ -63,7 +69,7 @@ def try_incident_threshold( # get the active incident from the monitor environment incident = monitor_env.active_incident else: # don't send occurrence for other statuses - return False + return False, [], None # Only create an occurrence if: # - We have an active incident and fingerprint # - The monitor and env are not muted - if not monitor_env.monitor.is_muted and not monitor_env.is_muted and incident: - checkins = MonitorCheckIn.objects.filter(id__in=[c["id"] for c in previous_checkins]) - for checkin in checkins: - create_incident_occurrence( - previous_checkins, - checkin, - incident, - received=received, - ) - monitor_environment_failed.send(monitor_environment=monitor_env, sender=type(monitor_env)) - return True + # Return context to the orchestrator so it can call create_incident_occurrence + # without incidents.py taking a direct dependency on incident_occurrence.py. + if not monitor_env.monitor.is_muted and not monitor_env.is_muted and incident: + return True, previous_checkins, incident + + return True, [], None

_{🤖 Grapple PR auto-fix • minor • Review this diff before applying}

ref(crons): Reorganize incident creation / issue occurrence logic (#8…

c3b32a7

…0528) Since we'll be doing more with issue occurrences split out the concept of incidents into it's own logic module, as well as incident_occurrence into it's own module Part of GH-80527

Frawatson mentioned this pull request Apr 7, 2026

feat(upsampling) - Support upsampled error count with performance optimizations #8

Open

Frawatson closed this Apr 7, 2026

Frawatson reopened this Apr 7, 2026

Frawatson closed this Apr 7, 2026

Frawatson reopened this Apr 7, 2026

grapple-pr Bot reviewed Apr 7, 2026

View reviewed changes

Frawatson closed this Apr 7, 2026

Frawatson reopened this Apr 7, 2026

grapple-pr Bot reviewed Apr 8, 2026

View reviewed changes

grapple-pr Bot reviewed Apr 12, 2026

View reviewed changes

		logger = logging.getLogger(__name__)


		def try_incident_threshold(



		def get_monitor_environment_context(monitor_environment: MonitorEnvironment):
		config = monitor_environment.monitor.config.copy()

	config = monitor_environment.monitor.config.copy()
	"config": config,

	from sentry.monitors.types import SimpleCheckIn
	from sentry.monitors.types import SimpleCheckIn

	"""
	Builds a human readable string from a list of failed check-ins.


		monitor_environment_failed.send(monitor_environment=monitor_env, sender=type(monitor_env))

		return True

	"id": str(monitor_environment.monitor.guid),
	"config": config,


	from sentry.monitors.models import CheckInStatus, MonitorCheckIn

		return _("%(problem_checkins)s check-ins detected") % {"problem_checkins": human_status}


		def get_monitor_environment_context(monitor_environment: MonitorEnvironment):

-    failed_checkin: MonitorCheckIn,
+# Re-exported for backward compatibility: SimpleCheckIn was previously defined
+# in this module. It has been moved to sentry.monitors.types as part of the
+# incident logic refactor. External callers importing from this path will
+# continue to work without modification.
+from sentry.monitors.types import SimpleCheckIn as SimpleCheckIn  # noqa: F401


	.values("id", "date_added", "status")[:failure_issue_threshold],
	previous_checkins = list(reversed(previous_checkins))

-    status: int
+    A minimal check-in representation used for incident threshold evaluation.
+    Contains only the fields required to assess monitor failure and recovery
+    conditions in incident and occurrence processing logic.

		monitor_env.save(update_fields=("status",))

		starting_checkin = previous_checkins[0]

-        "id": str(monitor_environment.monitor.guid),
+    if sum(status_counts.values()) == 0:
+        return _("A failed check-in was detected")

-from sentry.monitors.types import SimpleCheckIn
+# Re-export SimpleCheckIn for backward compatibility.
+# The canonical definition has moved to sentry.monitors.types as part of the
+# incident_occurrence.py refactor. External code importing SimpleCheckIn from
+# this module will continue to work without modification.
+from sentry.monitors.types import SimpleCheckIn as SimpleCheckIn  # noqa: F401

-    human_status = get_text_list(
+    # Guard against the edge case where all check-ins have statuses not present
+    # in HUMAN_FAILURE_STATUS_MAP (e.g. unexpected/future statuses). Without this,
+    # get_text_list([]) returns '' and we'd produce the malformed string
+    # " check-ins detected".
+    if not status_counts:
+        return _("A failed check-in was detected")

	status_counts = Counter(
	Builds a human readable string from a list of failed check-ins.

Conversation

Frawatson commented Apr 7, 2026

Uh oh!

grapple-pr Bot commented Apr 7, 2026

🤖 Grapple PR Review

Summary

Impact Diagram

Uh oh!

grapple-pr Bot commented Apr 7, 2026

Uh oh!

grapple-pr Bot commented Apr 7, 2026

Uh oh!

grapple-pr Bot commented Apr 7, 2026

Uh oh!

grapple-pr Bot commented Apr 7, 2026

Uh oh!

grapple-pr Bot commented Apr 7, 2026

Uh oh!

grapple-pr Bot commented Apr 7, 2026

Uh oh!

grapple-pr Bot commented Apr 7, 2026

Uh oh!

grapple-pr Bot commented Apr 7, 2026

Uh oh!

grapple-pr Bot commented Apr 7, 2026

Uh oh!

grapple-pr Bot commented Apr 7, 2026

Uh oh!

grapple-pr Bot left a comment

Choose a reason for hiding this comment

🤖 Grapple PR Review

Summary

Impact Diagram

Uh oh!

grapple-pr Bot Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

grapple-pr Bot Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

grapple-pr Bot Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

grapple-pr Bot Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

grapple-pr Bot Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

grapple-pr Bot Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

grapple-pr Bot left a comment

Choose a reason for hiding this comment

Uh oh!

grapple-pr Bot Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

grapple-pr Bot Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

grapple-pr Bot Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

grapple-pr Bot Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

grapple-pr Bot Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

grapple-pr Bot left a comment

Choose a reason for hiding this comment

🤖 Grapple PR Review

Summary

Impact Diagram

Uh oh!

grapple-pr Bot Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

grapple-pr Bot Apr 8, 2026