Replays Self-Serve Bulk Delete System #5
This validates both the [Working Draft](https://www.w3.org/TR/reporting-1/#concept-reports) and the [Editor's Draft](https://w3c.github.io/reporting/#concept-reports) formats. Fixes [ID-730 - Accept current and upcoming data model](https://linear.app/getsentry/issue/ID-730/accept-current-and-upcoming-data-model).
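A minimal sketch of how a single check might accept both report shapes. It assumes, going by the `age` and `timestamp` validators quoted in the review further down, that the two drafts differ in which of those fields is present and that exactly one of them must be supplied. `validate_report_time` and its error wording are hypothetical and are not the PR's serializer.

```python
from typing import Any, Mapping


def validate_report_time(report: Mapping[str, Any]) -> str:
    """Return which time field the report uses: "age" or "timestamp".

    Raises ValueError when both or neither field is present, so a caller
    sees one error instead of one per field.
    """
    has_age = report.get("age") is not None
    has_timestamp = report.get("timestamp") is not None

    if has_age and has_timestamp:
        raise ValueError("Provide either age or timestamp, not both")
    if not has_age and not has_timestamp:
        raise ValueError("One of age or timestamp is required")
    return "age" if has_age else "timestamp"
```

Checking `is not None` rather than truthiness keeps an `age` of 0 from being treated as absent.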
…o 'low' (#93927)" This reverts commit 8d04522. Co-authored-by: roaga <47861399+roaga@users.noreply.github.com>
Missed in the initial commit, leading to some relevant logs being unannotated.
We have had a few tasks get killed at 10% rollout.
Also add a test, so that this doesn't happen again
Fixes DE-129 and DE-156 --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com>
These transitions should match.
…` (#93946) Use `project_id` on the replay record instead of the URL (where it does not always exist). --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: getsantry[bot] <66042841+getsantry[bot]@users.noreply.github.com>
Also fixed `replay.view_html` -> `replay.view-html` --------- Co-authored-by: Michelle Zhang <56095982+michellewzhang@users.noreply.github.com>
…948) gets `npx @typescript/native-preview` passing again
This adds a mode for all things tracing. This encompasses transactions/metrics/spans. Taken from https://github.com/getsentry/sentry/blob/feeaf393deeca8b97675bff23039c6320270aab5/src/sentry/runner/commands/devserver.py#L370
The conditions associated with a DCG can change over time, and it's good if we can be completely confident that they're consistent within a given task execution.
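One way to get that guarantee is to read the conditions once at the start of the task and evaluate everything against that copy. The sketch below only illustrates the idea: `ConditionSnapshot`, `run_task`, and the `load_conditions` callback are hypothetical names, and conditions are modelled as plain strings.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class ConditionSnapshot:
    """Immutable copy of a group's conditions, taken once per task run."""

    group_id: int
    conditions: tuple[str, ...]


def run_task(
    group_id: int,
    load_conditions: Callable[[int], Iterable[str]],
    events: Iterable[str],
) -> list[tuple[str, tuple[str, ...]]]:
    # Read the conditions exactly once, before any event is processed.
    snapshot = ConditionSnapshot(group_id, tuple(load_conditions(group_id)))
    # Every event is paired with the same snapshot, even if the underlying
    # store changes while the task is still running.
    return [(event, snapshot.conditions) for event in events]
```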
This is unused and most regex experiments have required broader changes to ensure that regexes are evaluated in a specific order (ex: traceparent). Removing this for now to simplify the code and very slightly improve runtime performance.
From some testing (on feedback lists of all different lengths), this prompt seems to work better. It doesn't write overly long sentences and also does a better job at "summarizing" versus just mentioning a few specific topics and leaving out others.
Just remove a couple of custom Flex* classes in favor of the Flex primitive.
This has been killed a few times. Refs SENTRY-42M7
…n table (#93892) [ticket](https://linear.app/getsentry/issue/ID-156/grouping-info-remove-type-field-from-ui) The Type field in the Grouping Info section of the issue details page was redundant. This removes the Type row from all variant types while keeping the underlying data structure intact.
### Changes

Related to this PR: getsentry/sentry#93810. This is part 1 of the change, which is pulling out the new component and just adding it to the repo. Also includes some simplification of the logic in the base component. Part 2 will be replacing tables in widgets.

### Before/After

There is no UI change as the table is not being used yet. There is a new story page for the component.
…93943) to prevent this issue from becoming too noisy, add a noise config
Unfortunately, 'event_data' went from being the variable for current event context to being the complete parsed data from Redis, and we continued logging it per group. That's more data than we should be logging even arguably once, let alone per group.
Co-authored-by: Abdullah Khan <abdullahkhan@PG9Y57YDXQ.local>
Adds some simple analytics to our endpoint so we can begin building a dashboard in Amplitude.
Previously, explore supported multiple y-axes per chart, so each visualize supported multiple y-axes. That functionality has since been removed for simplicity, so this updates the types to match. Keep in mind that saved queries still store them as an array, so serializing and deserializing must still treat the value as an array.
We'll need the `useGetTraceItemAttributeKeys` hook in other places, so this refactors it so that it can be exported.
- getsentry/sentry#93894 removed usage
- getsentry/sentry-options-automator#4243 removed the last override
When the max segment ID is null the process fails. We should exit early since if there aren't any segments to delete there's nothing to do.
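The early exit described above might look like the following sketch. The function name is hypothetical, and the code assumes segment IDs run contiguously from 0 through the maximum, which the commit message does not state.

```python
from typing import Optional


def segment_ids_to_delete(max_segment_id: Optional[int]) -> list[int]:
    """Return the segment IDs to delete, or an empty list when there are none."""
    if max_segment_id is None:
        # No segments were recorded, so there is nothing to delete.
        return []
    # Assumption: segment IDs are contiguous from 0 through max_segment_id.
    return list(range(max_segment_id + 1))
```

Returning an empty list keeps the caller's loop a no-op instead of failing on a null value.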
Claude finished @ShashankFC's task in 2m 19s

### Code Review Complete

I've reviewed this PR for the Replays Self-Serve Bulk Delete System. This is a large PR with 100 files changed (+2312/-981 lines) across multiple areas of the codebase. Below are my findings organized by severity.

### Critical Issues

#### 1. Security: Oversized Thread Pool in Replay Deletion

File: […]

    with cf.ThreadPoolExecutor(max_workers=100) as pool:
        pool.map(_delete_if_exists, _make_recording_filenames(project_id, row))

Issue: The thread pool creates up to 100 threads per replay deletion. For replays with many segments, this could exhaust system resources. If multiple deletions happen concurrently, this could create thousands of threads.

Recommendation: Limit `max_workers` to a more reasonable number (e.g., 10-20) and rely on batching multiple replays instead of parallel thread execution per replay.

#### 2. Potential Data Loss: Silent Exception Handling

File: […]

    def _delete_if_exists(filename: str) -> None:
        """Delete the blob if it exists or silence the 404."""
        try:
            storage_kv.delete(filename)
        except NotFound:
            pass

Issue: The handler itself only swallows `NotFound`, but any other failure from `storage_kv.delete()` (permissions, network errors, etc.) raised inside a worker is still lost: the iterator returned by `pool.map` is never consumed, so those exceptions are never re-raised to the caller. Failed deletions can therefore pass silently and leave orphaned data.

Recommendation: Be more specific with exception handling:

    def _delete_if_exists(filename: str) -> None:
        try:
            storage_kv.delete(filename)
        except NotFound:
            pass  # File doesn't exist, which is fine
        except Exception as e:
            # "filename" is a reserved LogRecord attribute, so log under another key.
            logger.warning("Failed to delete replay segment", extra={"blob": filename, "error": str(e)})
            sentry_sdk.capture_exception(e)

#### 3. TODO in Production Code with Detector Logic

File: […]

    if (
        evaluation.logic_result and workflow_id is None
    ):  # TODO: detector trigger passes. do something like create issue
        pass

Issue: This TODO indicates incomplete detector trigger logic. When a detector condition passes but there's no `workflow_id`, nothing happens. This could be a silent failure case.

Recommendation: Either implement the detector trigger logic or add logging/metrics to track when this condition occurs.

### High Priority Issues

#### 4. Performance: N+1 Query Pattern in Delayed Workflow

File: […]

    Organization.objects.filter(project__id=project_id).get().id

Issue: This query is executed inside a retry loop within […].

Recommendation: Pass `organization_id` as a parameter or cache it to avoid repeated queries.

#### 5. XXX Comments Indicating Technical Debt

Multiple XXX comments in the codebase indicate areas needing attention:

File: […]

    # XXX: I have to deserialize this request so it can be "automatically" reserialized by the
    # paginate method. This is less than ideal.
    return json.loads(make_seer_request(request_data).decode("utf-8"))

File: […]

    # XXX: Request isn't streaming. Limitation of Seer authentication. Would be much faster if we
    # could stream the request data since the GCS download will (likely) dominate latency.

These indicate performance and architecture concerns that should be tracked.

#### 6. Missing Permission Checks in Replay Endpoints

File: […]

The endpoint checks for feature flags but doesn't appear to check user permissions beyond basic authentication. Ensure replay access permissions are properly validated.

### Medium Priority Issues

#### 7. Browser Reporting Validation Logic Issue

File: […]

    def validate_timestamp(self, value: int) -> int:
        """Validate that age is absent, but timestamp is present."""
        if self.initial_data.get("age"):
            raise serializers.ValidationError("If timestamp is present, age must be absent")
        return value

    def validate_age(self, value: int) -> int:
        """Validate that age is present, but not timestamp."""
        if self.initial_data.get("timestamp"):
            raise serializers.ValidationError("If age is present, timestamp must be absent")
        return value

Issue: Both validators raise errors if the other field is present, which will result in duplicate error messages when both fields are provided (as shown in test lines 168-175). This is confusing for API consumers.

Recommendation: Use a single […]

#### 8. Inconsistent Error Handling in Replay Summarization

File: […]

The […]

#### 9. Hardcoded Sample Rate

File: […]

    sample_rate=1.0,  # XXX: Remove this once we have a ballpark figure

This 100% sampling rate should be reduced once initial data collection is complete to avoid metric explosion.

### Low Priority / Code Quality Issues

#### 10. TODO Comments Throughout Codebase

Multiple TODO comments indicate future work: […]
#### 11. Deprecated Test File

File: […]

Good: this test file was properly deleted, indicating the migration is now stable.

#### 12. Type Safety in EventInstance Validation

File: […]

The validator for […]

### Positive Observations

[…]

### Testing Recommendations

[…]

### Summary

This PR implements a significant feature for replay deletion but has some critical issues around resource management and error handling that should be addressed before merging. The most important fixes are: […]

The codebase shows good practices overall with feature flags, comprehensive testing, and proper logging, but the technical debt indicated by XXX/TODO comments should be tracked and addressed.
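The resource-management and error-handling concerns raised in the review can be sketched together as follows. This is an illustration under stated assumptions, not the PR's code: `delete_all`, the local `NotFound` class, and the injected `delete` callable are hypothetical stand-ins for the storage backend, and the default of 10 workers is only the lower end of the range the review suggests.

```python
import concurrent.futures as cf
import logging
from typing import Callable, Iterable

logger = logging.getLogger(__name__)


class NotFound(Exception):
    """Stand-in for the storage backend's missing-blob error (hypothetical)."""


def delete_all(
    filenames: Iterable[str],
    delete: Callable[[str], None],
    max_workers: int = 10,
) -> list[str]:
    """Delete every filename and return the ones that failed."""

    def _delete_one(filename: str) -> bool:
        try:
            delete(filename)
        except NotFound:
            pass  # Already gone, which is the desired end state.
        except Exception:
            # "filename" is reserved on LogRecord, so log under another key.
            logger.warning("Failed to delete blob", extra={"blob": filename}, exc_info=True)
            return False
        return True

    names = list(filenames)
    with cf.ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Consuming the iterator collects each worker's result.
        results = list(pool.map(_delete_one, names))
    return [name for name, ok in zip(names, results) if not ok]
```

Returning the failed names lets a caller retry or report them instead of assuming the deletion succeeded.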
Test 5

---

Replicated from ai-code-review-evaluation/sentry-greptile#5