-
Notifications
You must be signed in to change notification settings - Fork 0
ref(crons): Reorganize incident creation / issue occurrence logic #3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: monitor-incident-refactor-before
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,171 @@ | ||||||||||||||||||||||
| from __future__ import annotations | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| import logging | ||||||||||||||||||||||
| import uuid | ||||||||||||||||||||||
| from collections import Counter | ||||||||||||||||||||||
| from collections.abc import Mapping, Sequence | ||||||||||||||||||||||
| from datetime import datetime, timezone | ||||||||||||||||||||||
| from typing import TYPE_CHECKING | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| from django.utils.text import get_text_list | ||||||||||||||||||||||
| from django.utils.translation import gettext_lazy as _ | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| from sentry.issues.grouptype import MonitorIncidentType | ||||||||||||||||||||||
| from sentry.monitors.models import ( | ||||||||||||||||||||||
| CheckInStatus, | ||||||||||||||||||||||
| MonitorCheckIn, | ||||||||||||||||||||||
| MonitorEnvironment, | ||||||||||||||||||||||
| MonitorIncident, | ||||||||||||||||||||||
| ) | ||||||||||||||||||||||
| from sentry.monitors.types import SimpleCheckIn | ||||||||||||||||||||||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ✅
Suggested change
🤖 Grapple PR auto-fix • minor • confidence: 88% There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🟡 MINOR — Incomplete type export (confidence: 100%) The Evidence:
Agent: logic There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ✅
Suggested change
🤖 Grapple PR auto-fix • minor • confidence: 100% |
||||||||||||||||||||||
|
|
||||||||||||||||||||||
| if TYPE_CHECKING: | ||||||||||||||||||||||
| from django.utils.functional import _StrPromise | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| logger = logging.getLogger(__name__) | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
|
|
||||||||||||||||||||||
| def create_incident_occurrence( | ||||||||||||||||||||||
| failed_checkins: Sequence[SimpleCheckIn], | ||||||||||||||||||||||
| failed_checkin: MonitorCheckIn, | ||||||||||||||||||||||
| incident: MonitorIncident, | ||||||||||||||||||||||
| received: datetime | None, | ||||||||||||||||||||||
| ) -> None: | ||||||||||||||||||||||
| from sentry.issues.issue_occurrence import IssueEvidence, IssueOccurrence | ||||||||||||||||||||||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🟠 MAJOR — API Contract / Naming Inconsistency (confidence: 99%) The occurrence creation function was renamed from Evidence:
Agent: architecture There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🟡
--- a/src/sentry/monitors/logic/incident_occurrence.py
+++ b/src/sentry/monitors/logic/incident_occurrence.py
@@ -168,3 +168,13 @@
"type": monitor_environment.monitor.get_type_display(),
}
+
+
+# Backward-compatible alias for the previous name of this function when it lived
+# in mark_failed.py. Tests that mock or import `create_issue_platform_occurrence`
+# by path (e.g. via `mock.patch("...incident_occurrence.create_issue_platform_occurrence")`)
+# will continue to work without modification.
+#
+# Deprecated: use `create_incident_occurrence` directly.
+# TODO: Remove this alias once all references to `create_issue_platform_occurrence`
+# have been updated across tests and callers.
+create_issue_platform_occurrence = create_incident_occurrence🤖 Grapple PR auto-fix • major • Review this diff before applying |
||||||||||||||||||||||
| from sentry.issues.producer import PayloadType, produce_occurrence_to_kafka | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| monitor_env = failed_checkin.monitor_environment | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| if monitor_env is None: | ||||||||||||||||||||||
| return | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| current_timestamp = datetime.now(timezone.utc) | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| # Get last successful check-in to show in evidence display | ||||||||||||||||||||||
| last_successful_checkin_timestamp = "Never" | ||||||||||||||||||||||
| last_successful_checkin = monitor_env.get_last_successful_checkin() | ||||||||||||||||||||||
| if last_successful_checkin: | ||||||||||||||||||||||
| last_successful_checkin_timestamp = last_successful_checkin.date_added.isoformat() | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| occurrence = IssueOccurrence( | ||||||||||||||||||||||
| id=uuid.uuid4().hex, | ||||||||||||||||||||||
| resource_id=None, | ||||||||||||||||||||||
| project_id=monitor_env.monitor.project_id, | ||||||||||||||||||||||
| event_id=uuid.uuid4().hex, | ||||||||||||||||||||||
| fingerprint=[incident.grouphash], | ||||||||||||||||||||||
| type=MonitorIncidentType, | ||||||||||||||||||||||
| issue_title=f"Monitor failure: {monitor_env.monitor.name}", | ||||||||||||||||||||||
| subtitle="Your monitor has reached its failure threshold.", | ||||||||||||||||||||||
| evidence_display=[ | ||||||||||||||||||||||
| IssueEvidence( | ||||||||||||||||||||||
| name="Failure reason", | ||||||||||||||||||||||
| value=str(get_failure_reason(failed_checkins)), | ||||||||||||||||||||||
| important=True, | ||||||||||||||||||||||
| ), | ||||||||||||||||||||||
| IssueEvidence( | ||||||||||||||||||||||
| name="Environment", | ||||||||||||||||||||||
| value=monitor_env.get_environment().name, | ||||||||||||||||||||||
| important=False, | ||||||||||||||||||||||
| ), | ||||||||||||||||||||||
| IssueEvidence( | ||||||||||||||||||||||
| name="Last successful check-in", | ||||||||||||||||||||||
| value=last_successful_checkin_timestamp, | ||||||||||||||||||||||
| important=False, | ||||||||||||||||||||||
| ), | ||||||||||||||||||||||
| ], | ||||||||||||||||||||||
| evidence_data={}, | ||||||||||||||||||||||
| culprit="", | ||||||||||||||||||||||
| detection_time=current_timestamp, | ||||||||||||||||||||||
| level="error", | ||||||||||||||||||||||
| assignee=monitor_env.monitor.owner_actor, | ||||||||||||||||||||||
| ) | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| if failed_checkin.trace_id: | ||||||||||||||||||||||
| trace_id = failed_checkin.trace_id.hex | ||||||||||||||||||||||
| else: | ||||||||||||||||||||||
| trace_id = None | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| event_data = { | ||||||||||||||||||||||
| "contexts": {"monitor": get_monitor_environment_context(monitor_env)}, | ||||||||||||||||||||||
| "environment": monitor_env.get_environment().name, | ||||||||||||||||||||||
| "event_id": occurrence.event_id, | ||||||||||||||||||||||
| "fingerprint": [incident.grouphash], | ||||||||||||||||||||||
| "platform": "other", | ||||||||||||||||||||||
| "project_id": monitor_env.monitor.project_id, | ||||||||||||||||||||||
| # We set this to the time that the checkin that triggered the occurrence was written to relay if available | ||||||||||||||||||||||
| "received": (received if received else current_timestamp).isoformat(), | ||||||||||||||||||||||
| "sdk": None, | ||||||||||||||||||||||
| "tags": { | ||||||||||||||||||||||
| "monitor.id": str(monitor_env.monitor.guid), | ||||||||||||||||||||||
| "monitor.slug": str(monitor_env.monitor.slug), | ||||||||||||||||||||||
| "monitor.incident": str(incident.id), | ||||||||||||||||||||||
| }, | ||||||||||||||||||||||
| "timestamp": current_timestamp.isoformat(), | ||||||||||||||||||||||
| } | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| if trace_id: | ||||||||||||||||||||||
| event_data["contexts"]["trace"] = {"trace_id": trace_id, "span_id": None} | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| produce_occurrence_to_kafka( | ||||||||||||||||||||||
| payload_type=PayloadType.OCCURRENCE, | ||||||||||||||||||||||
| occurrence=occurrence, | ||||||||||||||||||||||
| event_data=event_data, | ||||||||||||||||||||||
| ) | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
|
|
||||||||||||||||||||||
| HUMAN_FAILURE_STATUS_MAP: Mapping[int, _StrPromise] = { | ||||||||||||||||||||||
| CheckInStatus.ERROR: _("error"), | ||||||||||||||||||||||
| CheckInStatus.MISSED: _("missed"), | ||||||||||||||||||||||
| CheckInStatus.TIMEOUT: _("timeout"), | ||||||||||||||||||||||
| } | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| # Exists due to the vowel differences (A vs An) in the statuses | ||||||||||||||||||||||
| SINGULAR_HUMAN_FAILURE_MAP: Mapping[int, _StrPromise] = { | ||||||||||||||||||||||
| CheckInStatus.ERROR: _("An error check-in was detected"), | ||||||||||||||||||||||
| CheckInStatus.MISSED: _("A missed check-in was detected"), | ||||||||||||||||||||||
| CheckInStatus.TIMEOUT: _("A timeout check-in was detected"), | ||||||||||||||||||||||
| } | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
|
|
||||||||||||||||||||||
| def get_failure_reason(failed_checkins: Sequence[SimpleCheckIn]): | ||||||||||||||||||||||
| """ | ||||||||||||||||||||||
| Builds a humam readible string from a list of failed check-ins. | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| "3 missed check-ins detected" | ||||||||||||||||||||||
| "2 missed check-ins, 1 timeout check-in and 1 error check-in were detected" | ||||||||||||||||||||||
| "A failed check-in was detected" | ||||||||||||||||||||||
| """ | ||||||||||||||||||||||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🟡 MINOR — Documentation (confidence: 76%) Typo in docstring: 'humam readible' should be 'human readable' Evidence:
Agent: style There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ✅
Suggested change
🤖 Grapple PR auto-fix • minor • confidence: 76% |
||||||||||||||||||||||
|
|
||||||||||||||||||||||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🔵 INFO — Documentation (confidence: 82%) Typo in docstring: 'humam readible' should be 'human readable' Evidence:
Agent: style |
||||||||||||||||||||||
| status_counts = Counter( | ||||||||||||||||||||||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🟡 MINOR — Documentation (confidence: 86%) Typo in docstring: 'humam readible' should be 'human readable' Evidence:
Agent: style There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ✅
Suggested change
🤖 Grapple PR auto-fix • minor • confidence: 86% |
||||||||||||||||||||||
| checkin["status"] | ||||||||||||||||||||||
| for checkin in failed_checkins | ||||||||||||||||||||||
| if checkin["status"] in HUMAN_FAILURE_STATUS_MAP.keys() | ||||||||||||||||||||||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🟠 MAJOR — Edge case / KeyError risk (confidence: 100%) In get_failure_reason, when sum(status_counts.values()) == 1, the code accesses SINGULAR_HUMAN_FAILURE_MAP[list(status_counts.keys())[0]]. If a failed check-in has a status that is NOT in HUMAN_FAILURE_STATUS_MAP (filtered out on line 138), status_counts could be empty (sum == 0), and the code falls through to the get_text_list path which would produce an empty string. However, if a check-in has a status that IS in HUMAN_FAILURE_STATUS_MAP but NOT in SINGULAR_HUMAN_FAILURE_MAP, this would raise a KeyError. Currently the maps have identical keys so this is safe, but there's no defensive check. Evidence:
Agent: logic There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🟡
--- a/src/sentry/monitors/logic/incident_occurrence.py
+++ b/src/sentry/monitors/logic/incident_occurrence.py
@@ -139,10 +139,18 @@ def get_failure_reason(failed_checkins: Sequence[SimpleCheckIn]):
status_counts = Counter(
checkin["status"]
for checkin in failed_checkins
if checkin["status"] in HUMAN_FAILURE_STATUS_MAP.keys()
)
+ total = sum(status_counts.values())
+
+ # No check-ins with recognized failure statuses; return a generic fallback
+ # rather than producing a malformed string like ' check-ins detected'.
+ if total == 0:
+ return _("A failed check-in was detected")
+
- if sum(status_counts.values()) == 1:
- return SINGULAR_HUMAN_FAILURE_MAP[list(status_counts.keys())[0]]
+ if total == 1:
+ status_key = list(status_counts.keys())[0]
+ # Defensive .get() in case SINGULAR_HUMAN_FAILURE_MAP and
+ # HUMAN_FAILURE_STATUS_MAP ever diverge (they currently share keys).
+ return SINGULAR_HUMAN_FAILURE_MAP.get(
+ status_key, _("A failed check-in was detected")
+ )
human_status = get_text_list(
[🤖 Grapple PR auto-fix • major • Review this diff before applying |
||||||||||||||||||||||
| ) | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| if sum(status_counts.values()) == 1: | ||||||||||||||||||||||
| return SINGULAR_HUMAN_FAILURE_MAP[list(status_counts.keys())[0]] | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| human_status = get_text_list( | ||||||||||||||||||||||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🟡 MINOR — Edge Case - Empty failed_checkins (confidence: 100%) The Evidence:
Agent: logic There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ✅
Suggested change
🤖 Grapple PR auto-fix • minor • confidence: 100% |
||||||||||||||||||||||
| [ | ||||||||||||||||||||||
| "%(count)d %(status)s" % {"count": count, "status": HUMAN_FAILURE_STATUS_MAP[status]} | ||||||||||||||||||||||
| for status, count in status_counts.items() | ||||||||||||||||||||||
| ], | ||||||||||||||||||||||
| last_word=_("and"), | ||||||||||||||||||||||
| ) | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| return _("%(problem_checkins)s check-ins detected") % {"problem_checkins": human_status} | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
|
|
||||||||||||||||||||||
| def get_monitor_environment_context(monitor_environment: MonitorEnvironment): | ||||||||||||||||||||||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🟡 MINOR — Unused Variable (confidence: 100%) In Evidence:
Agent: logic There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ✅
Suggested change
🤖 Grapple PR auto-fix • minor • confidence: 100% |
||||||||||||||||||||||
| config = monitor_environment.monitor.config.copy() | ||||||||||||||||||||||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🟡 MINOR — Code Organization (confidence: 100%) Inconsistent handling of monitor config in Evidence:
Agent: style There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ✅
Suggested change
🤖 Grapple PR auto-fix • minor • confidence: 100% |
||||||||||||||||||||||
| if "schedule_type" in config: | ||||||||||||||||||||||
| config["schedule_type"] = monitor_environment.monitor.get_schedule_type_display() | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| return { | ||||||||||||||||||||||
| "id": str(monitor_environment.monitor.guid), | ||||||||||||||||||||||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🔴 CRITICAL — Data flow / Bug (confidence: 100%) get_monitor_environment_context builds a local Evidence:
Agent: logic There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🟠 MAJOR — Logic Error / Data Integrity (confidence: 100%) The Evidence:
Agent: security There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ✅
Suggested change
🤖 Grapple PR auto-fix • critical • confidence: 100% There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ✅
Suggested change
🤖 Grapple PR auto-fix • major • confidence: 100% There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🟠 MAJOR — Logic Error / Potential KeyError or Misleading Output (confidence: 100%) get_failure_reason has two edge cases: (1) If all check-ins in failed_checkins have statuses not in HUMAN_FAILURE_STATUS_MAP, status_counts is empty, sum is 0, and the function falls through to produce a malformed/empty string like '0 check-ins detected'. (2) The docstring mentions 'A failed check-in was detected' as a possible output but this string does not exist in the code — it appears the original had a fallback that was not ported. Evidence:
Agent: security There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🟡 MINOR — Logic Error / Stale Data (confidence: 100%) get_monitor_environment_context returns monitor_environment.monitor.config (the original dict) instead of the local Evidence:
Agent: security There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ✅
Suggested change
🤖 Grapple PR auto-fix • major • confidence: 100% There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ✅
Suggested change
🤖 Grapple PR auto-fix • minor • confidence: 100% |
||||||||||||||||||||||
| "slug": str(monitor_environment.monitor.slug), | ||||||||||||||||||||||
| "name": monitor_environment.monitor.name, | ||||||||||||||||||||||
| "config": monitor_environment.monitor.config, | ||||||||||||||||||||||
| "status": monitor_environment.get_status_display(), | ||||||||||||||||||||||
| "type": monitor_environment.monitor.get_type_display(), | ||||||||||||||||||||||
| } | ||||||||||||||||||||||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🟠 MAJOR — Logic Error - Config Context Bug Preserved (confidence: 100%) The Evidence:
Agent: logic There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ✅
Suggested change
🤖 Grapple PR auto-fix • major • confidence: 100% |
||||||||||||||||||||||
| Original file line number | Diff line number | Diff line change | ||||||
|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,104 @@ | ||||||||
| from __future__ import annotations | ||||||||
|
|
||||||||
| import logging | ||||||||
| from datetime import datetime | ||||||||
| from typing import cast | ||||||||
|
|
||||||||
| from sentry.monitors.logic.incident_occurrence import create_incident_occurrence | ||||||||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🟡 MINOR — Module Boundaries / Import Direction (confidence: 100%) incidents.py imports directly from incident_occurrence.py, creating a fixed coupling direction. This is acceptable for now, but if incident_occurrence.py ever needs to reference incident logic, a circular dependency will result. Consider whether the call to Evidence:
Agent: architecture There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🟡
--- a/src/sentry/monitors/logic/incidents.py
+++ b/src/sentry/monitors/logic/incidents.py
@@ -1,11 +1,11 @@
from __future__ import annotations
import logging
from datetime import datetime
-from typing import cast
+from typing import Callable, cast
-from sentry.monitors.logic.incident_occurrence import create_incident_occurrence
from sentry.monitors.models import CheckInStatus, MonitorCheckIn, MonitorIncident, MonitorStatus
from sentry.monitors.types import SimpleCheckIn
logger = logging.getLogger(__name__)
def try_incident_threshold(
failed_checkin: MonitorCheckIn,
failure_issue_threshold: int,
received: datetime | None,
+ # Injected to avoid a coupling dependency between incidents.py and
+ # incident_occurrence.py. If incident_occurrence.py ever needs to reference
+ # incident state, a circular import would result. The orchestrating caller
+ # (mark_failed.py) is responsible for wiring this dependency.
+ create_occurrence: Callable = None,
) -> bool:
@@ -88,10 +88,13 @@
# Only create an occurrence if:
# - We have an active incident and fingerprint
# - The monitor and env are not muted
- if not monitor_env.monitor.is_muted and not monitor_env.is_muted and incident:
+ if not monitor_env.monitor.is_muted and not monitor_env.is_muted and incident and create_occurrence is not None:
checkins = MonitorCheckIn.objects.filter(id__in=[c["id"] for c in previous_checkins])
for checkin in checkins:
- create_incident_occurrence(
+ create_occurrence(
previous_checkins,
checkin,
incident,
received=received,
)🤖 Grapple PR auto-fix • minor • Review this diff before applying There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🟡 MINOR — Module boundaries / Dependency direction (confidence: 84%)
Evidence:
Agent: architecture There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🟡
--- a/src/sentry/monitors/logic/incidents.py
+++ b/src/sentry/monitors/logic/incidents.py
@@ -1,10 +1,9 @@
from __future__ import annotations
import logging
from datetime import datetime
from typing import cast
-from sentry.monitors.logic.incident_occurrence import create_incident_occurrence
from sentry.monitors.models import CheckInStatus, MonitorCheckIn, MonitorIncident, MonitorStatus
from sentry.monitors.types import SimpleCheckIn
@@ -14,7 +13,14 @@ logger = logging.getLogger(__name__)
def try_incident_threshold(
failed_checkin: MonitorCheckIn,
failure_issue_threshold: int,
received: datetime | None,
-) -> bool:
+) -> tuple[bool, list[SimpleCheckIn], MonitorIncident | None]:
+ """
+ Evaluate whether the failure threshold has been reached and return the
+ incident context needed to create occurrences.
+
+ Returns a 3-tuple of (threshold_reached, previous_checkins, incident).
+ The caller (mark_failed.py orchestrator) is responsible for calling
+ create_incident_occurrence so that incidents.py and incident_occurrence.py
+ remain independent of each other.
+ """
from sentry.signals import monitor_environment_failed
monitor_env = failed_checkin.monitor_environment
if monitor_env is None:
- return False
+ return False, [], None
# check to see if we need to update the status
if monitor_env.status in [MonitorStatus.OK, MonitorStatus.ACTIVE]:
@@ -63,7 +69,7 @@ def try_incident_threshold(
# get the active incident from the monitor environment
incident = monitor_env.active_incident
else:
# don't send occurrence for other statuses
- return False
+ return False, [], None
# Only create an occurrence if:
# - We have an active incident and fingerprint
# - The monitor and env are not muted
- if not monitor_env.monitor.is_muted and not monitor_env.is_muted and incident:
- checkins = MonitorCheckIn.objects.filter(id__in=[c["id"] for c in previous_checkins])
- for checkin in checkins:
- create_incident_occurrence(
- previous_checkins,
- checkin,
- incident,
- received=received,
- )
-
monitor_environment_failed.send(monitor_environment=monitor_env, sender=type(monitor_env))
- return True
+ # Return context to the orchestrator so it can call create_incident_occurrence
+ # without incidents.py taking a direct dependency on incident_occurrence.py.
+ if not monitor_env.monitor.is_muted and not monitor_env.is_muted and incident:
+ return True, previous_checkins, incident
+
+ return True, [], None🤖 Grapple PR auto-fix • minor • Review this diff before applying |
||||||||
| from sentry.monitors.models import CheckInStatus, MonitorCheckIn, MonitorIncident, MonitorStatus | ||||||||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🔵 INFO — Import Hygiene (confidence: 85%) The Evidence:
Agent: logic |
||||||||
| from sentry.monitors.types import SimpleCheckIn | ||||||||
|
|
||||||||
| logger = logging.getLogger(__name__) | ||||||||
|
|
||||||||
|
|
||||||||
| def try_incident_threshold( | ||||||||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🟠 MAJOR — Function rename changes public API (confidence: 94%) The original function was named Evidence:
Agent: logic There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🟡
--- a/src/sentry/monitors/logic/incidents.py
+++ b/src/sentry/monitors/logic/incidents.py
@@ -1,6 +1,8 @@
from __future__ import annotations
import logging
+import warnings
from datetime import datetime
from typing import cast
@@ -101,3 +103,23 @@ def try_incident_threshold(
monitor_environment_failed.send(monitor_environment=monitor_env, sender=type(monitor_env))
return True
+
+
+def mark_failed_threshold(
+ failed_checkin: MonitorCheckIn,
+ failure_issue_threshold: int,
+ received: datetime | None,
+) -> bool:
+ """
+ Deprecated: Use `try_incident_threshold` instead.
+
+ Backward-compatible alias preserved because external callers (tests,
+ signal handlers, other modules) may still reference the original name
+ from mark_failed.py. This alias ensures the refactor does not break
+ the public API contract while callers are migrated.
+ """
+ warnings.warn(
+ "mark_failed_threshold is deprecated; use try_incident_threshold instead.",
+ DeprecationWarning,
+ stacklevel=2,
+ )
+ return try_incident_threshold(failed_checkin, failure_issue_threshold, received)🤖 Grapple PR auto-fix • major • Review this diff before applying There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🟠 MAJOR — Import path changes breaking test mocks (confidence: 94%) The function Evidence:
Agent: logic |
||||||||
| failed_checkin: MonitorCheckIn, | ||||||||
| failure_issue_threshold: int, | ||||||||
| received: datetime | None, | ||||||||
| ) -> bool: | ||||||||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🟠 MAJOR — API Contract / Naming Inconsistency (confidence: 100%) The extracted function was renamed from Evidence:
Agent: architecture There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🟡
--- a/src/sentry/monitors/logic/mark_failed.py
+++ b/src/sentry/monitors/logic/mark_failed.py
@@ -1,6 +1,8 @@
from __future__ import annotations
import logging
from datetime import datetime
+from sentry.monitors.logic.incidents import try_incident_threshold
+
logger = logging.getLogger(__name__)
@@ -10,6 +12,10 @@ logger = logging.getLogger(__name__)
+# Backward-compatibility alias: `mark_failed_threshold` was extracted and
+# renamed to `try_incident_threshold` in incidents.py. Any existing callers
+# (tests, tasks, other modules) that imported `mark_failed_threshold` from
+# this module will continue to work without modification.
+mark_failed_threshold = try_incident_threshold🤖 Grapple PR auto-fix • major • Review this diff before applying |
||||||||
| from sentry.signals import monitor_environment_failed | ||||||||
|
|
||||||||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🟡 MINOR — Naming conventions (confidence: 95%) Function name Evidence:
Agent: style There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ✅
Suggested change
🤖 Grapple PR auto-fix • minor • confidence: 95% |
||||||||
| monitor_env = failed_checkin.monitor_environment | ||||||||
|
|
||||||||
| if monitor_env is None: | ||||||||
| return False | ||||||||
|
|
||||||||
| # check to see if we need to update the status | ||||||||
| if monitor_env.status in [MonitorStatus.OK, MonitorStatus.ACTIVE]: | ||||||||
| if failure_issue_threshold == 1: | ||||||||
| previous_checkins: list[SimpleCheckIn] = [ | ||||||||
| { | ||||||||
| "id": failed_checkin.id, | ||||||||
| "date_added": failed_checkin.date_added, | ||||||||
| "status": failed_checkin.status, | ||||||||
| } | ||||||||
| ] | ||||||||
| else: | ||||||||
| previous_checkins = cast( | ||||||||
| list[SimpleCheckIn], | ||||||||
| # Using .values for performance reasons | ||||||||
| MonitorCheckIn.objects.filter( | ||||||||
| monitor_environment=monitor_env, date_added__lte=failed_checkin.date_added | ||||||||
| ) | ||||||||
| .order_by("-date_added") | ||||||||
| .values("id", "date_added", "status"), | ||||||||
| ) | ||||||||
|
|
||||||||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🟡 MINOR — Logic Error / Missing Filter (confidence: 100%) When Evidence:
Agent: security There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ✅
Suggested change
🤖 Grapple PR auto-fix • minor • confidence: 100% |
||||||||
| # reverse the list after slicing in order to start with oldest check-in | ||||||||
| previous_checkins = list(reversed(previous_checkins[:failure_issue_threshold])) | ||||||||
|
|
||||||||
| # If we have any successful check-ins within the threshold of | ||||||||
| # commits we have NOT reached an incident state | ||||||||
| if any([checkin["status"] == CheckInStatus.OK for checkin in previous_checkins]): | ||||||||
| return False | ||||||||
|
|
||||||||
| # change monitor status + update fingerprint timestamp | ||||||||
| monitor_env.status = MonitorStatus.ERROR | ||||||||
| monitor_env.save(update_fields=("status",)) | ||||||||
|
|
||||||||
| starting_checkin = previous_checkins[0] | ||||||||
|
|
||||||||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🟠 MAJOR — Logic Error / Potential IndexError (confidence: 100%) When failure_issue_threshold > 1, previous_checkins is populated from a DB query. If the query returns 0 rows (race condition, data deleted), the subsequent Evidence:
Agent: security There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🟡
--- a/src/sentry/monitors/logic/incidents.py
+++ b/src/sentry/monitors/logic/incidents.py
@@ -37,16 +37,25 @@
else:
previous_checkins = cast(
list[SimpleCheckIn],
# Using .values for performance reasons
MonitorCheckIn.objects.filter(
monitor_environment=monitor_env, date_added__lte=failed_checkin.date_added
)
+ .exclude(status=CheckInStatus.OK)
.order_by("-date_added")
+ [: failure_issue_threshold]
.values("id", "date_added", "status"),
)
# reverse the list after slicing in order to start with oldest check-in
- previous_checkins = list(reversed(previous_checkins[:failure_issue_threshold]))
+ previous_checkins = list(reversed(previous_checkins))
- # If we have any successful check-ins within the threshold of
- # commits we have NOT reached an incident state
- if any([checkin["status"] == CheckInStatus.OK for checkin in previous_checkins]):
+ # If we don't have enough consecutive failures to meet the threshold,
+ # we have NOT reached an incident state.
+ # Also guard against an empty result (race condition / data deleted)
+ # which would cause IndexError at previous_checkins[0] below.
+ if len(previous_checkins) < failure_issue_threshold:
return False
# change monitor status + update fingerprint timestamp🤖 Grapple PR auto-fix • major • Review this diff before applying |
||||||||
| incident: MonitorIncident | None | ||||||||
| incident, _ = MonitorIncident.objects.get_or_create( | ||||||||
| monitor_environment=monitor_env, | ||||||||
| resolving_checkin=None, | ||||||||
| defaults={ | ||||||||
| "monitor": monitor_env.monitor, | ||||||||
| "starting_checkin_id": starting_checkin["id"], | ||||||||
| "starting_timestamp": starting_checkin["date_added"], | ||||||||
| }, | ||||||||
| ) | ||||||||
|
|
||||||||
| elif monitor_env.status == MonitorStatus.ERROR: | ||||||||
| # if monitor environment has a failed status, use the failed | ||||||||
| # check-in and send occurrence | ||||||||
| previous_checkins = [ | ||||||||
| { | ||||||||
| "id": failed_checkin.id, | ||||||||
| "date_added": failed_checkin.date_added, | ||||||||
| "status": failed_checkin.status, | ||||||||
| } | ||||||||
| ] | ||||||||
|
|
||||||||
| # get the active incident from the monitor environment | ||||||||
| incident = monitor_env.active_incident | ||||||||
| else: | ||||||||
| # don't send occurrence for other statuses | ||||||||
| return False | ||||||||
|
|
||||||||
| # Only create an occurrence if: | ||||||||
| # - We have an active incident and fingerprint | ||||||||
| # - The monitor and env are not muted | ||||||||
| if not monitor_env.monitor.is_muted and not monitor_env.is_muted and incident: | ||||||||
| checkins = MonitorCheckIn.objects.filter(id__in=[c["id"] for c in previous_checkins]) | ||||||||
| for checkin in checkins: | ||||||||
| create_incident_occurrence( | ||||||||
| previous_checkins, | ||||||||
| checkin, | ||||||||
| incident, | ||||||||
| received=received, | ||||||||
| ) | ||||||||
|
|
||||||||
| monitor_environment_failed.send(monitor_environment=monitor_env, sender=type(monitor_env)) | ||||||||
|
|
||||||||
| return True | ||||||||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🔵 INFO — Refactor Completeness (confidence: 100%) The function was renamed from Evidence:
Agent: logic |
||||||||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🟡 MINOR — Unused import in types.py consumer (confidence: 88%)
incident_occurrence.py imports SimpleCheckIn from sentry.monitors.types, which is correct. However, incidents.py imports SimpleCheckIn from sentry.monitors.types as well (line 9). Both new modules correctly use the new type from types.py. This is consistent — noting for completeness that the type is not orphaned.
Evidence:
from sentry.monitors.types import SimpleCheckInfrom sentry.monitors.types import SimpleCheckInAgent: logic