Database Performance Optimizations#4
Conversation
… deadlocks on MySQL (#80329) * Split subquery when cleaning annotations * update comment * Raise batch size, now that we pay attention to it * Iterate in batches * Separate cancellable batch implementation to allow for multi-statement callbacks, add overload for single-statement use * Use split-out utility in outer batching loop so it respects context cancellation * guard against empty queries * Use SQL parameters * Use same approach for tags * drop unused function * Work around parameter limit on sqlite for large batches * Bulk insert test data in DB * Refactor test to customise test data creation * Add test for catching SQLITE_MAX_VARIABLE_NUMBER limit * Turn annotation cleanup test to integration tests * lint --------- Co-authored-by: Sofia Papagiannaki <1632407+papagian@users.noreply.github.com>
|
@CodeAnt-AI: review |
|
CodeAnt AI is running the review. |
Nitpicks 🔍
|
| r.log.Error("Annotations to clean by time", "count", len(ids), "ids", ids, "cond", cond, "err", err) | ||
|
|
||
| affected, err := r.executeUntilDoneOrCancelled(ctx, sql) | ||
| x, y := r.deleteByIDs(ctx, "annotation", ids) | ||
| r.log.Error("cleaned annotations by time", "count", len(ids), "affected", x, "err", y) |
There was a problem hiding this comment.
Suggestion: In the time-based cleanup branch, every batch logs the full slice of annotation IDs at error level even on successful execution; with large batch sizes this can generate extremely large log entries and unnecessary error-level noise, impacting performance and log usability. The fix is to log only aggregate information (like counts and conditions), use debug-level for normal operation, and reserve error-level logging for actual errors from the delete call. [performance]
Severity Level: Major ⚠️
- ⚠️ Annotation cleanup job emits frequent error-level log entries.
- ⚠️ Large ID slices increase logging overhead and allocations.
- ⚠️ Error logs polluted, obscuring genuine annotation cleanup failures.| r.log.Error("Annotations to clean by time", "count", len(ids), "ids", ids, "cond", cond, "err", err) | |
| affected, err := r.executeUntilDoneOrCancelled(ctx, sql) | |
| x, y := r.deleteByIDs(ctx, "annotation", ids) | |
| r.log.Error("cleaned annotations by time", "count", len(ids), "affected", x, "err", y) | |
| r.log.Debug("Annotations to clean by time", "count", len(ids), "cond", cond) | |
| x, y := r.deleteByIDs(ctx, "annotation", ids) | |
| if y != nil { | |
| r.log.Error("Failed to clean annotations by time", "count", len(ids), "err", y) | |
| } else { | |
| r.log.Debug("Cleaned annotations by time", "count", len(ids), "affected", x) | |
| } |
Steps of Reproduction ✅
1. Execute `TestIntegrationAnnotationCleanUp` in
`pkg/services/annotations/annotationsimpl/cleanup_test.go:18-149`, focusing on the test
case `"should remove annotations created before cut off point"` defined at lines 53-66,
which configures non-zero `MaxAge` and sets up cleanup batch size.
2. Inside that test, `createTestAnnotations` at `cleanup_test.go:232-292` bulk-inserts
annotations into the `annotation` table, then `ProvideCleanupService` is called at
`cleanup_test.go:131` to construct a `CleanupServiceImpl` with a real `xormRepositoryImpl`
store.
3. The test calls `cleaner.Run` at `cleanup_test.go:132`, which in
`CleanupServiceImpl.Run` (`cleanup.go:35-57`) invokes `cs.store.CleanAnnotations(ctx,
cfg.AlertingAnnotationCleanupSetting, alertAnnotationType)` when `MaxAge > 0`, entering
`xormRepositoryImpl.CleanAnnotations` at `xorm_store.go:519`.
4. In `CleanAnnotations`, the time-based branch at `xorm_store.go:521-544` calls
`untilDoneOrCancelled` with a batch callback defined at `xorm_store.go:528-539`; on each
batch, after fetching IDs at `xorm_store.go:530-533`, the logger at `xorm_store.go:534`
emits an error-level log `"Annotations to clean by time"` including the full `ids` slice,
and after deletion at `xorm_store.go:536` it logs `"cleaned annotations by time"` at
`xorm_store.go:537` as another error-level entry, even when `err`/`y` are nil. With many
annotations and configured batch size (as in the test), this produces repeated, large
error logs containing all IDs under completely normal operation.Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** pkg/services/annotations/annotationsimpl/xorm_store.go
**Line:** 534:537
**Comment:**
*Performance: In the time-based cleanup branch, every batch logs the full slice of annotation IDs at error level even on successful execution; with large batch sizes this can generate extremely large log entries and unnecessary error-level noise, impacting performance and log usability. The fix is to log only aggregate information (like counts and conditions), use debug-level for normal operation, and reserve error-level logging for actual errors from the delete call.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.| r.log.Error("Annotations to clean by count", "count", len(ids), "ids", ids, "cond", cond, "err", err) | ||
|
|
||
| x, y := r.deleteByIDs(ctx, "annotation", ids) | ||
| r.log.Error("cleaned annotations by count", "count", len(ids), "affected", x, "err", y) |
There was a problem hiding this comment.
Suggestion: In the max-count cleanup branch, the code again logs the entire slice of annotation IDs at error level for every batch, which can be tens of thousands of IDs and represents normal operation rather than an error; this will bloat logs and slow cleanup. Adjust the logging to only record counts/conditions at debug level and use error logging only when the delete operation actually fails. [performance]
Severity Level: Major ⚠️
- ⚠️ Max-count cleanup emits repeated error-level logs per batch.
- ⚠️ Logging all IDs increases CPU and log volume.
- ⚠️ Genuine annotation-count issues harder to spot in logs.| r.log.Error("Annotations to clean by count", "count", len(ids), "ids", ids, "cond", cond, "err", err) | |
| x, y := r.deleteByIDs(ctx, "annotation", ids) | |
| r.log.Error("cleaned annotations by count", "count", len(ids), "affected", x, "err", y) | |
| r.log.Debug("Annotations to clean by count", "count", len(ids), "cond", cond) | |
| x, y := r.deleteByIDs(ctx, "annotation", ids) | |
| if y != nil { | |
| r.log.Error("Failed to clean annotations by count", "count", len(ids), "err", y) | |
| } else { | |
| r.log.Debug("Cleaned annotations by count", "count", len(ids), "affected", x) | |
| } |
Steps of Reproduction ✅
1. Execute `TestIntegrationAnnotationCleanUp` in
`pkg/services/annotations/annotationsimpl/cleanup_test.go:18-149`, focusing on the test
case `"should only keep three annotations"` at lines 68-81, which configures `MaxCount=3`
in `settingsFn` and sets `annotationCleanupJobBatchSize` at line 71.
2. As in the test, `createTestAnnotations` (`cleanup_test.go:232-292`) inserts many
annotations, and `ProvideCleanupService` is used at `cleanup_test.go:131` to create a
`CleanupServiceImpl` backed by `xormRepositoryImpl`.
3. The call `cleaner.Run(context.Background(), test.cfg)` at `cleanup_test.go:132` invokes
`CleanupServiceImpl.Run` (`cleanup.go:35-57`), which, for this config, calls
`cs.store.CleanAnnotations` with `cfg.AlertingAnnotationCleanupSetting`,
`cfg.APIAnnotationCleanupSettings`, and `cfg.DashboardAnnotationCleanupSettings` where
`MaxCount > 0`, entering `xormRepositoryImpl.CleanAnnotations` at `xorm_store.go:519`.
4. In `CleanAnnotations`, the count-based branch at `xorm_store.go:546-564` executes
`untilDoneOrCancelled` with a batch callback at `xorm_store.go:548-559`; on each batch,
after `r.fetchIDs` at `xorm_store.go:550-553`, the logger at `xorm_store.go:554` logs
`"Annotations to clean by count"` at error level including the full `ids` slice and
condition, then after `r.deleteByIDs` at `xorm_store.go:556-557` another error-level
`"cleaned annotations by count"` log is emitted. With many annotations and small
`MaxCount`, multiple batches are processed, so normal operation produces many large
error-level log entries containing all IDs.Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** pkg/services/annotations/annotationsimpl/xorm_store.go
**Line:** 554:557
**Comment:**
*Performance: In the max-count cleanup branch, the code again logs the entire slice of annotation IDs at error level for every batch, which can be tens of thousands of IDs and represents normal operation rather than an error; this will bloat logs and slow cleanup. Adjust the logging to only record counts/conditions at debug level and use error logging only when the delete operation actually fails.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.| r.log.Error("Tags to clean", "count", len(ids), "ids", ids, "cond", cond, "err", err) | ||
|
|
||
| x, y := r.deleteByIDs(ctx, "annotation_tag", ids) | ||
| r.log.Error("cleaned tags", "count", len(ids), "affected", x, "err", y) |
There was a problem hiding this comment.
Suggestion: The orphaned-tag cleanup also logs the entire list of tag IDs at error level during normal operation, which for large batches can flood logs and slow the job without any actual error; it should instead log summary information at debug level and only emit error-level logs when the delete operation returns an error. This keeps logging proportional and prevents performance degradation from stringifying huge ID slices. [performance]
Severity Level: Major ⚠️
- ⚠️ Orphaned-tag cleanup logs large ID lists as errors.
- ⚠️ Normal tag maintenance floods logs with error entries.
- ⚠️ Tag-related failures harder to distinguish from noise.| r.log.Error("Tags to clean", "count", len(ids), "ids", ids, "cond", cond, "err", err) | |
| x, y := r.deleteByIDs(ctx, "annotation_tag", ids) | |
| r.log.Error("cleaned tags", "count", len(ids), "affected", x, "err", y) | |
| r.log.Debug("Tags to clean", "count", len(ids), "cond", cond) | |
| x, y := r.deleteByIDs(ctx, "annotation_tag", ids) | |
| if y != nil { | |
| r.log.Error("Failed to clean tags", "count", len(ids), "err", y) | |
| } else { | |
| r.log.Debug("Cleaned tags", "count", len(ids), "affected", x) | |
| } |
Steps of Reproduction ✅
1. Run `TestIntegrationAnnotationCleanUp` in
`pkg/services/annotations/annotationsimpl/cleanup_test.go:18-149` with any case where
annotations are actually deleted (e.g., `"should remove annotations created before cut off
point"` at lines 53-66 or `"should only keep three annotations"` at lines 68-81).
2. The test inserts annotations and corresponding tag rows via `createTestAnnotations`
(`cleanup_test.go:232-292`), then constructs a `CleanupServiceImpl` with
`ProvideCleanupService` (`cleanup_test.go:131`) and calls `cleaner.Run` at
`cleanup_test.go:132`.
3. In `CleanupServiceImpl.Run` (`cleanup.go:35-57`), after three calls to
`cs.store.CleanAnnotations` for different annotation types, if `totalCleanedAnnotations >
0` the code at `cleanup.go:54-56` calls `cs.store.CleanOrphanedAnnotationTags(ctx)`, which
dispatches to `xormRepositoryImpl.CleanOrphanedAnnotationTags` at `xorm_store.go:569`.
4. `CleanOrphanedAnnotationTags` uses `untilDoneOrCancelled` (`xorm_store.go:569-582`)
with a batch callback that builds `cond` at `xorm_store.go:571`, then calls `r.fetchIDs`
at `xorm_store.go:572-575`; for each batch, the logger at `xorm_store.go:576` logs `"Tags
to clean"` at error level including the entire `ids` slice and condition, followed by
another error-level `"cleaned tags"` log at `xorm_store.go:579` after `r.deleteByIDs`.
When many orphaned tags exist (e.g., as created in the integration test), this results in
multiple large error-level log entries during normal tag cleanup.Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** pkg/services/annotations/annotationsimpl/xorm_store.go
**Line:** 576:579
**Comment:**
*Performance: The orphaned-tag cleanup also logs the entire list of tag IDs at error level during normal operation, which for large batches can flood logs and slow the job without any actual error; it should instead log summary information at debug level and only emit error-level logs when the delete operation returns an error. This keeps logging proportional and prevents performance degradation from stringifying huge ID slices.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.|
CodeAnt AI finished running the review. |
|
This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 2 weeks if no further activity occurs. Please feel free to give a status update or ping for review. Thank you for your contributions! |
|
This pull request has been automatically closed because it has not had any further activity in the last 2 weeks. Thank you for your contributions! |
User description
Test 7nnn## Summary by CodeRabbitnn* Bug Fixesn * Improved annotation cleanup reliability by implementing a batching mechanism to prevent deadlocks during bulk deletion operations.nn* Choresn * Increased cleanup task frequency from 10-minute to 1-minute intervals for faster maintenance of stale annotations.nn✏️ Tip: You can customize this high-level summary in your review settings.nnn---nReplicated from ai-code-review-evaluation/grafana-coderabbit#7
CodeAnt-AI Description
Avoid deadlocks during annotation cleanup and run cleanup more frequently
What Changed
Impact
✅ Fewer deadlocks during annotation cleanup✅ Clearer cleanup behavior for very large SQLite batches✅ Shorter time to remove stale annotations💡 Usage Guide
Checking Your Pull Request
Every time you make a pull request, our system automatically looks through it. We check for security issues, mistakes in how you're setting up your infrastructure, and common code problems. We do this to make sure your changes are solid and won't cause any trouble later.
Talking to CodeAnt AI
Got a question or need a hand with something in your pull request? You can easily get in touch with CodeAnt AI right here. Just type the following in a comment on your pull request, and replace "Your question here" with whatever you want to ask:
This lets you have a chat with CodeAnt AI about your pull request, making it easier to understand and improve your code.
Example
Preserve Org Learnings with CodeAnt
You can record team preferences so CodeAnt AI applies them in future reviews. Reply directly to the specific CodeAnt AI suggestion (in the same thread) and replace "Your feedback here" with your input:
This helps CodeAnt AI learn and adapt to your team's coding style and standards.
Example
Retrigger review
Ask CodeAnt AI to review the PR again, by typing:
Check Your Repository Health
To analyze the health of your code repository, visit our dashboard at https://app.codeant.ai. This tool helps you identify potential issues and areas for improvement in your codebase, ensuring your repository maintains high standards of code health.