ref(crons): Reorganize incident creation / issue occurrence logic#7
ref(crons): Reorganize incident creation / issue occurrence logic#7rupakInfinitiBit wants to merge 1 commit into
Conversation
…0528) Since we'll be doing more with issue occurrences split out the concept of incidents into it's own logic module, as well as incident_occurrence into it's own module Part of GH-80527
🔍 Reviewing PR #7
Since we'll be doing more with issue occurrences split out the concept of incidents into it's own logic module, as well as incident_occurrence into it's own module Part of GH-80527 ⏳ Analyzing changes — review incoming shortly. |
|
@rupakInfinitiBit 👋 I've started reviewing this pull request. I'll post a detailed review once I'm done — this may take a moment. |
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🔍 Automated PR Review
PR Title: ref(crons): Reorganize incident creation / issue occurrence logic (#8…
Overall Quality Score: 7/10
Recommendation: REQUEST_CHANGES
🧠 Summary
The code refactoring successfully reorganizes incident creation logic but contains several quality issues including typos, performance concerns with database queries, and missing documentation that should be addressed.
⚠️ Critical Issues
No critical issues detected in this review.
✅ Key Suggestions
• Fix typo in docstring: Change "humam" to "human" in the docstring at line 131 of incident_occurrence.py
• Optimize database queries: The loop creating occurrences performs multiple individual database queries which could cause performance issues. Consider batching these operations instead
• Add input validation: The failed_checkin parameter in create_incident_occurrence should be validated to ensure it's a valid MonitorCheckIn instance
• Improve function naming: Consider renaming try_incident_threshold to handle_incident_threshold or process_incident_threshold for better clarity about its actual behavior
• Use assigned variables: The config variable is assigned but not used - either utilize it in the return dictionary or remove the assignment
• Add comprehensive documentation: Both new modules and key functions are missing docstrings that would help developers understand their purpose and usage
• Modernize string formatting: Replace old-style string formatting with f-strings for better readability and consistency
There was a problem hiding this comment.
✅ Approved
████████░░ 8.5/10 · 4 files reviewed · 1 issues found
Key concerns:
- Minor typo in docstring that should be corrected
📝 Walkthrough (4 files)
| File | Changes | Summary |
|---|---|---|
src/sentry/monitors/logic/incident_occurrence.py |
✏️ +171/-0 | New module extracting incident occurrence creation logic from mark_failed.py, with proper separation of concerns for issue platform integration. |
src/sentry/monitors/logic/incidents.py |
✏️ +104/-0 | New module extracting incident threshold logic from mark_failed.py, handling monitor status transitions and incident creation. |
src/sentry/monitors/logic/mark_failed.py |
✏️ +4/-264 | Significantly refactored to focus on core failure marking logic, delegating incident handling to the new incidents module. |
src/sentry/monitors/types.py |
✏️ +10/-0 | Added SimpleCheckIn TypedDict to centralize the type definition previously inline in mark_failed.py. |
🔍 Issues
🔵 src/sentry/monitors/logic/incident_occurrence.py
-
🔵 Line 130 — Typo in docstring
There's a typo in the docstring: 'humam readible' should be 'human readable'.
💡 Suggestion: Builds a human readable string from a list of failed check-ins.
✨ Strengths
- ✅ Excellent separation of concerns improving code organization
- ✅ Clean refactoring with no functional changes
- ✅ Proper type definitions and imports maintained throughout
🤖 PRFlow · 🔴 Critical 🟠 Important 🟡 Suggestion 🔵 Minor ℹ️ Note
🔍 Reviewing PR #7
⏳ Analyzing changes — review incoming shortly. |
There was a problem hiding this comment.
🔧 Changes Requested
██████░░░░ 6.5/10 · 4 files reviewed · 6 issues found
Key concerns:
- Truncated function call in produce_occurrence_to_kafka will cause syntax error
- Incomplete incident creation logic in incidents.py
- Missing error handling for empty failure counters
- Lack of trace_id validation before calling .hex method
📝 Walkthrough (4 files)
| File | Changes | Summary |
|---|---|---|
src/sentry/monitors/logic/incident_occurrence.py |
✏️ +171/-0 | New module for handling incident occurrence creation with issue evidence, monitoring context, and Kafka event production. |
src/sentry/monitors/logic/incidents.py |
✏️ +104/-0 | New module containing incident threshold evaluation logic extracted from mark_failed.py. |
src/sentry/monitors/logic/mark_failed.py |
✏️ +4/-264 | Refactored to extract incident and occurrence logic into separate modules, keeping only the core failure marking functionality. |
src/sentry/monitors/types.py |
✏️ +10/-0 | Added SimpleCheckIn TypedDict to centralize type definitions for monitor check-ins. |
🔍 Issues
🟠 src/sentry/monitors/logic/incident_occurrence.py
-
🟡 Line 87 — Missing trace_id validation
The code checks if failed_checkin.trace_id exists but doesn't validate that it's a proper UUID format before calling .hex. This could raise an AttributeError if trace_id is not a UUID object.
if failed_checkin.trace_id and hasattr(failed_checkin.trace_id, 'hex'):
trace_id = failed_checkin.trace_id.hex
else:
trace_id = None
-
🟠 Line 118 — Truncated function call in produce_occurrence_to_kafka
The function call to produce_occurrence_to_kafka appears to be truncated at 'even...' which would cause a syntax error. The 'event_data' parameter is incomplete.
produce_occurrence_to_kafka(
payload_type=PayloadType.OCCURRENCE,
occurrence=occurrence,
event_data=event_data,
)
-
🟡 Line 149 — Potential KeyError in failure_reason_counter access
The function accesses failure_reason_counter.most_common(1)[0] without checking if the counter is empty, which could raise an IndexError if failed_checkins is empty.
most_common = failure_reason_counter.most_common(1)
if not most_common:
return _("Unknown")
return most_common[0][0]
🟡 src/sentry/monitors/logic/incidents.py
-
🟡 Line 65 — Incomplete incident creation logic
The incident creation logic appears to be truncated with 'incident, _' assignment incomplete. This will cause a syntax error and the function won't work properly.
incident, created = MonitorIncident.objects.get_or_create(
monitor_environment=monitor_env,
resolving_checkin=None,
defaults={
"starting_checkin_id": starting_checkin["id"],
"starting_timestamp": starting_checkin["date_added"],
"grouphash": monitor_env.monitor.get_incident_grouphash(),
},
)
🔵 src/sentry/monitors/logic/mark_failed.py
-
🔵 Line 79 — Good separation of concerns
The refactoring successfully separates the incident threshold logic into its own module, making the mark_failed function more focused and maintainable.
ℹ️ src/sentry/monitors/types.py
-
ℹ️ Line 10 — Good type definition placement
Moving the SimpleCheckIn TypedDict to the types module improves code organization and makes the type available for reuse across modules.
✨ Strengths
- ✅ Good separation of concerns by extracting incident logic into dedicated modules
- ✅ Improved code organization with centralized type definitions
- ✅ Maintains existing functionality while improving maintainability
🤖 PRFlow · 🔴 Critical 🟠 Important 🟡 Suggestion 🔵 Minor ℹ️ Note
| trace_id = failed_checkin.trace_id.hex | ||
| else: | ||
| trace_id = None | ||
|
|
There was a problem hiding this comment.
🔧 Improvement | 🟡 Suggestion
Missing trace_id validation
The code checks if failed_checkin.trace_id exists but doesn't validate that it's a proper UUID format before calling .hex. This could raise an AttributeError if trace_id is not a UUID object.
| if failed_checkin.trace_id and hasattr(failed_checkin.trace_id, 'hex'): | |
| trace_id = failed_checkin.trace_id.hex | |
| else: | |
| trace_id = None |
🤖 Prompt for AI Agents
In `@src/sentry/monitors/logic/incident_occurrence.py` around line 87, The code checks if failed_checkin.trace_id exists but doesn't validate that it's a proper UUID format before calling .hex. This could raise an AttributeError if trace_id is not a UUID object.; apply the proposed fix to resolve this error handling.
|
|
||
| HUMAN_FAILURE_STATUS_MAP: Mapping[int, _StrPromise] = { | ||
| CheckInStatus.ERROR: _("error"), | ||
| CheckInStatus.MISSED: _("missed"), |
There was a problem hiding this comment.
Important: Truncated function call in produce_occurrence_to_kafka
The function call to produce_occurrence_to_kafka appears to be truncated at 'even...' which would cause a syntax error. The 'event_data' parameter is incomplete.
| CheckInStatus.MISSED: _("missed"), | |
| produce_occurrence_to_kafka( | |
| payload_type=PayloadType.OCCURRENCE, | |
| occurrence=occurrence, | |
| event_data=event_data, | |
| ) |
🤖 Prompt for AI Agents
In `@src/sentry/monitors/logic/incident_occurrence.py` around line 118, The function call to produce_occurrence_to_kafka appears to be truncated at 'even...' which would cause a syntax error. The 'event_data' parameter is incomplete.; apply the proposed fix to resolve this bug.
| return SINGULAR_HUMAN_FAILURE_MAP[list(status_counts.keys())[0]] | ||
|
|
||
| human_status = get_text_list( | ||
| [ |
There was a problem hiding this comment.
🔧 Improvement | 🟡 Suggestion
Potential KeyError in failure_reason_counter access
The function accesses failure_reason_counter.most_common(1)[0] without checking if the counter is empty, which could raise an IndexError if failed_checkins is empty.
| [ | |
| most_common = failure_reason_counter.most_common(1) | |
| if not most_common: | |
| return _("Unknown") | |
| return most_common[0][0] |
🤖 Prompt for AI Agents
In `@src/sentry/monitors/logic/incident_occurrence.py` around line 149, The function accesses failure_reason_counter.most_common(1)[0] without checking if the counter is empty, which could raise an IndexError if failed_checkins is empty.; apply the proposed fix to resolve this logic.
| incident, _ = MonitorIncident.objects.get_or_create( | ||
| monitor_environment=monitor_env, | ||
| resolving_checkin=None, | ||
| defaults={ |
There was a problem hiding this comment.
🔧 Improvement | 🟡 Suggestion
Incomplete incident creation logic
The incident creation logic appears to be truncated with 'incident, _' assignment incomplete. This will cause a syntax error and the function won't work properly.
| defaults={ | |
| incident, created = MonitorIncident.objects.get_or_create( | |
| monitor_environment=monitor_env, | |
| resolving_checkin=None, | |
| defaults={ | |
| "starting_checkin_id": starting_checkin["id"], | |
| "starting_timestamp": starting_checkin["date_added"], | |
| "grouphash": monitor_env.monitor.get_incident_grouphash(), | |
| }, | |
| ) |
🤖 Prompt for AI Agents
In `@src/sentry/monitors/logic/incidents.py` around line 65, The incident creation logic appears to be truncated with 'incident, _' assignment incomplete. This will cause a syntax error and the function won't work properly.; apply the proposed fix to resolve this error handling.
Since we'll be doing more with issue occurrences split out the concept of incidents into it's own logic module, as well as incident_occurrence into it's own module