Skip to content

Database Performance Optimizations#4

Closed
ShashankFC wants to merge 1 commit into
db-cleanup-baselinefrom
db-cleanup-optimized
Closed

Database Performance Optimizations#4
ShashankFC wants to merge 1 commit into
db-cleanup-baselinefrom
db-cleanup-optimized

Conversation

@ShashankFC

@ShashankFC ShashankFC commented Feb 13, 2026

Copy link
Copy Markdown

User description

Test 7nnn## Summary by CodeRabbitnn* Bug Fixesn * Improved annotation cleanup reliability by implementing a batching mechanism to prevent deadlocks during bulk deletion operations.nn* Choresn * Increased cleanup task frequency from 10-minute to 1-minute intervals for faster maintenance of stale annotations.nn✏️ Tip: You can customize this high-level summary in your review settings.nnn---nReplicated from ai-code-review-evaluation/grafana-coderabbit#7


CodeAnt-AI Description

Avoid deadlocks during annotation cleanup and run cleanup more frequently

What Changed

  • Annotation cleanup now loads IDs in bounded batches and deletes by ID lists instead of using nested batched subqueries, preventing deadlocks with concurrent inserts.
  • Cleanup of orphaned annotation tags uses the same batched-ID approach and shares the same robust loop-until-done behavior.
  • SQLite parameter limits are handled: very large batches either split into multiple statements or inline IDs when required, so cleanup does not fail for large datasets.
  • The global cleanup loop runs every 1 minute instead of every 10 minutes.
  • Tests were converted to integration tests, create annotations/tags in batched inserts, and include a case for very large batches over SQLite parameter limits.

Impact

✅ Fewer deadlocks during annotation cleanup
✅ Clearer cleanup behavior for very large SQLite batches
✅ Shorter time to remove stale annotations

💡 Usage Guide

Checking Your Pull Request

Every time you make a pull request, our system automatically looks through it. We check for security issues, mistakes in how you're setting up your infrastructure, and common code problems. We do this to make sure your changes are solid and won't cause any trouble later.

Talking to CodeAnt AI

Got a question or need a hand with something in your pull request? You can easily get in touch with CodeAnt AI right here. Just type the following in a comment on your pull request, and replace "Your question here" with whatever you want to ask:

@codeant-ai ask: Your question here

This lets you have a chat with CodeAnt AI about your pull request, making it easier to understand and improve your code.

Example

@codeant-ai ask: Can you suggest a safer alternative to storing this secret?

Preserve Org Learnings with CodeAnt

You can record team preferences so CodeAnt AI applies them in future reviews. Reply directly to the specific CodeAnt AI suggestion (in the same thread) and replace "Your feedback here" with your input:

@codeant-ai: Your feedback here

This helps CodeAnt AI learn and adapt to your team's coding style and standards.

Example

@codeant-ai: Do not flag unused imports.

Retrigger review

Ask CodeAnt AI to review the PR again, by typing:

@codeant-ai: review

Check Your Repository Health

To analyze the health of your code repository, visit our dashboard at https://app.codeant.ai. This tool helps you identify potential issues and areas for improvement in your codebase, ensuring your repository maintains high standards of code health.

… deadlocks on MySQL (#80329)

* Split subquery when cleaning annotations

* update comment

* Raise batch size, now that we pay attention to it

* Iterate in batches

* Separate cancellable batch implementation to allow for multi-statement callbacks, add overload for single-statement use

* Use split-out utility in outer batching loop so it respects context cancellation

* guard against empty queries

* Use SQL parameters

* Use same approach for tags

* drop unused function

* Work around parameter limit on sqlite for large batches

* Bulk insert test data in DB

* Refactor test to customise test data creation

* Add test for catching SQLITE_MAX_VARIABLE_NUMBER limit

* Turn annotation cleanup test to integration tests

* lint

---------

Co-authored-by: Sofia Papagiannaki <1632407+papagian@users.noreply.github.com>
@ShashankFC

Copy link
Copy Markdown
Author

@CodeAnt-AI: review

@codeant-ai

codeant-ai Bot commented Feb 13, 2026

Copy link
Copy Markdown

CodeAnt AI is running the review.

@codeant-ai codeant-ai Bot added the size:L This PR changes 100-499 lines, ignoring generated files label Feb 13, 2026
@codeant-ai

codeant-ai Bot commented Feb 13, 2026

Copy link
Copy Markdown

Nitpicks 🔍

🔒 No security issues identified
⚡ Recommended areas for review

  • SQLite parameter limit
    deleteByIDs decides whether to inline IDs into the SQL based on the configured batch size (r.cfg.AnnotationCleanupJobBatchSize) instead of the actual len(ids). This can cause incorrect behavior when the actual number of ids in the batch differs from the configured batch size; for SQLite the correct check should be against the actual slice length (len(ids)) and the SQLite parameter limit.

  • Resource Leak
    The new tighter cleanup frequency (1 minute) creates a ticker in Run but does not stop it. If Run ever returns, the ticker should be stopped (ticker.Stop()) to avoid leaking the underlying timer/goroutine.

  • Fragile test ID handling
    The test helper createTestAnnotations now sets ID manually on each annotations.Item and uses InsertMulti. Depending on the DB dialect and how multi-inserts treat explicit ID fields vs. autoincrement, this can produce flaky tests or wrong annotation_tag.annotation_id values. Tests should avoid assuming write semantics of InsertMulti across DBs or should explicitly query for inserted IDs.

  • Log level / noisy logging
    Several normal control-flow debug messages use r.log.Error(...) (e.g., when reporting counts of IDs to delete). This may flood error logs with expected behavior; consider using Debug or Info levels for these diagnostic messages.

Comment on lines +534 to +537
r.log.Error("Annotations to clean by time", "count", len(ids), "ids", ids, "cond", cond, "err", err)

affected, err := r.executeUntilDoneOrCancelled(ctx, sql)
x, y := r.deleteByIDs(ctx, "annotation", ids)
r.log.Error("cleaned annotations by time", "count", len(ids), "affected", x, "err", y)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: In the time-based cleanup branch, every batch logs the full slice of annotation IDs at error level even on successful execution; with large batch sizes this can generate extremely large log entries and unnecessary error-level noise, impacting performance and log usability. The fix is to log only aggregate information (like counts and conditions), use debug-level for normal operation, and reserve error-level logging for actual errors from the delete call. [performance]

Severity Level: Major ⚠️
- ⚠️ Annotation cleanup job emits frequent error-level log entries.
- ⚠️ Large ID slices increase logging overhead and allocations.
- ⚠️ Error logs polluted, obscuring genuine annotation cleanup failures.
Suggested change
r.log.Error("Annotations to clean by time", "count", len(ids), "ids", ids, "cond", cond, "err", err)
affected, err := r.executeUntilDoneOrCancelled(ctx, sql)
x, y := r.deleteByIDs(ctx, "annotation", ids)
r.log.Error("cleaned annotations by time", "count", len(ids), "affected", x, "err", y)
r.log.Debug("Annotations to clean by time", "count", len(ids), "cond", cond)
x, y := r.deleteByIDs(ctx, "annotation", ids)
if y != nil {
r.log.Error("Failed to clean annotations by time", "count", len(ids), "err", y)
} else {
r.log.Debug("Cleaned annotations by time", "count", len(ids), "affected", x)
}
Steps of Reproduction ✅
1. Execute `TestIntegrationAnnotationCleanUp` in
`pkg/services/annotations/annotationsimpl/cleanup_test.go:18-149`, focusing on the test
case `"should remove annotations created before cut off point"` defined at lines 53-66,
which configures non-zero `MaxAge` and sets up cleanup batch size.

2. Inside that test, `createTestAnnotations` at `cleanup_test.go:232-292` bulk-inserts
annotations into the `annotation` table, then `ProvideCleanupService` is called at
`cleanup_test.go:131` to construct a `CleanupServiceImpl` with a real `xormRepositoryImpl`
store.

3. The test calls `cleaner.Run` at `cleanup_test.go:132`, which in
`CleanupServiceImpl.Run` (`cleanup.go:35-57`) invokes `cs.store.CleanAnnotations(ctx,
cfg.AlertingAnnotationCleanupSetting, alertAnnotationType)` when `MaxAge > 0`, entering
`xormRepositoryImpl.CleanAnnotations` at `xorm_store.go:519`.

4. In `CleanAnnotations`, the time-based branch at `xorm_store.go:521-544` calls
`untilDoneOrCancelled` with a batch callback defined at `xorm_store.go:528-539`; on each
batch, after fetching IDs at `xorm_store.go:530-533`, the logger at `xorm_store.go:534`
emits an error-level log `"Annotations to clean by time"` including the full `ids` slice,
and after deletion at `xorm_store.go:536` it logs `"cleaned annotations by time"` at
`xorm_store.go:537` as another error-level entry, even when `err`/`y` are nil. With many
annotations and configured batch size (as in the test), this produces repeated, large
error logs containing all IDs under completely normal operation.
Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** pkg/services/annotations/annotationsimpl/xorm_store.go
**Line:** 534:537
**Comment:**
	*Performance: In the time-based cleanup branch, every batch logs the full slice of annotation IDs at error level even on successful execution; with large batch sizes this can generate extremely large log entries and unnecessary error-level noise, impacting performance and log usability. The fix is to log only aggregate information (like counts and conditions), use debug-level for normal operation, and reserve error-level logging for actual errors from the delete call.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
👍 | 👎

Comment on lines +554 to +557
r.log.Error("Annotations to clean by count", "count", len(ids), "ids", ids, "cond", cond, "err", err)

x, y := r.deleteByIDs(ctx, "annotation", ids)
r.log.Error("cleaned annotations by count", "count", len(ids), "affected", x, "err", y)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: In the max-count cleanup branch, the code again logs the entire slice of annotation IDs at error level for every batch, which can be tens of thousands of IDs and represents normal operation rather than an error; this will bloat logs and slow cleanup. Adjust the logging to only record counts/conditions at debug level and use error logging only when the delete operation actually fails. [performance]

Severity Level: Major ⚠️
- ⚠️ Max-count cleanup emits repeated error-level logs per batch.
- ⚠️ Logging all IDs increases CPU and log volume.
- ⚠️ Genuine annotation-count issues harder to spot in logs.
Suggested change
r.log.Error("Annotations to clean by count", "count", len(ids), "ids", ids, "cond", cond, "err", err)
x, y := r.deleteByIDs(ctx, "annotation", ids)
r.log.Error("cleaned annotations by count", "count", len(ids), "affected", x, "err", y)
r.log.Debug("Annotations to clean by count", "count", len(ids), "cond", cond)
x, y := r.deleteByIDs(ctx, "annotation", ids)
if y != nil {
r.log.Error("Failed to clean annotations by count", "count", len(ids), "err", y)
} else {
r.log.Debug("Cleaned annotations by count", "count", len(ids), "affected", x)
}
Steps of Reproduction ✅
1. Execute `TestIntegrationAnnotationCleanUp` in
`pkg/services/annotations/annotationsimpl/cleanup_test.go:18-149`, focusing on the test
case `"should only keep three annotations"` at lines 68-81, which configures `MaxCount=3`
in `settingsFn` and sets `annotationCleanupJobBatchSize` at line 71.

2. As in the test, `createTestAnnotations` (`cleanup_test.go:232-292`) inserts many
annotations, and `ProvideCleanupService` is used at `cleanup_test.go:131` to create a
`CleanupServiceImpl` backed by `xormRepositoryImpl`.

3. The call `cleaner.Run(context.Background(), test.cfg)` at `cleanup_test.go:132` invokes
`CleanupServiceImpl.Run` (`cleanup.go:35-57`), which, for this config, calls
`cs.store.CleanAnnotations` with `cfg.AlertingAnnotationCleanupSetting`,
`cfg.APIAnnotationCleanupSettings`, and `cfg.DashboardAnnotationCleanupSettings` where
`MaxCount > 0`, entering `xormRepositoryImpl.CleanAnnotations` at `xorm_store.go:519`.

4. In `CleanAnnotations`, the count-based branch at `xorm_store.go:546-564` executes
`untilDoneOrCancelled` with a batch callback at `xorm_store.go:548-559`; on each batch,
after `r.fetchIDs` at `xorm_store.go:550-553`, the logger at `xorm_store.go:554` logs
`"Annotations to clean by count"` at error level including the full `ids` slice and
condition, then after `r.deleteByIDs` at `xorm_store.go:556-557` another error-level
`"cleaned annotations by count"` log is emitted. With many annotations and small
`MaxCount`, multiple batches are processed, so normal operation produces many large
error-level log entries containing all IDs.
Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** pkg/services/annotations/annotationsimpl/xorm_store.go
**Line:** 554:557
**Comment:**
	*Performance: In the max-count cleanup branch, the code again logs the entire slice of annotation IDs at error level for every batch, which can be tens of thousands of IDs and represents normal operation rather than an error; this will bloat logs and slow cleanup. Adjust the logging to only record counts/conditions at debug level and use error logging only when the delete operation actually fails.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
👍 | 👎

Comment on lines +576 to +579
r.log.Error("Tags to clean", "count", len(ids), "ids", ids, "cond", cond, "err", err)

x, y := r.deleteByIDs(ctx, "annotation_tag", ids)
r.log.Error("cleaned tags", "count", len(ids), "affected", x, "err", y)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: The orphaned-tag cleanup also logs the entire list of tag IDs at error level during normal operation, which for large batches can flood logs and slow the job without any actual error; it should instead log summary information at debug level and only emit error-level logs when the delete operation returns an error. This keeps logging proportional and prevents performance degradation from stringifying huge ID slices. [performance]

Severity Level: Major ⚠️
- ⚠️ Orphaned-tag cleanup logs large ID lists as errors.
- ⚠️ Normal tag maintenance floods logs with error entries.
- ⚠️ Tag-related failures harder to distinguish from noise.
Suggested change
r.log.Error("Tags to clean", "count", len(ids), "ids", ids, "cond", cond, "err", err)
x, y := r.deleteByIDs(ctx, "annotation_tag", ids)
r.log.Error("cleaned tags", "count", len(ids), "affected", x, "err", y)
r.log.Debug("Tags to clean", "count", len(ids), "cond", cond)
x, y := r.deleteByIDs(ctx, "annotation_tag", ids)
if y != nil {
r.log.Error("Failed to clean tags", "count", len(ids), "err", y)
} else {
r.log.Debug("Cleaned tags", "count", len(ids), "affected", x)
}
Steps of Reproduction ✅
1. Run `TestIntegrationAnnotationCleanUp` in
`pkg/services/annotations/annotationsimpl/cleanup_test.go:18-149` with any case where
annotations are actually deleted (e.g., `"should remove annotations created before cut off
point"` at lines 53-66 or `"should only keep three annotations"` at lines 68-81).

2. The test inserts annotations and corresponding tag rows via `createTestAnnotations`
(`cleanup_test.go:232-292`), then constructs a `CleanupServiceImpl` with
`ProvideCleanupService` (`cleanup_test.go:131`) and calls `cleaner.Run` at
`cleanup_test.go:132`.

3. In `CleanupServiceImpl.Run` (`cleanup.go:35-57`), after three calls to
`cs.store.CleanAnnotations` for different annotation types, if `totalCleanedAnnotations >
0` the code at `cleanup.go:54-56` calls `cs.store.CleanOrphanedAnnotationTags(ctx)`, which
dispatches to `xormRepositoryImpl.CleanOrphanedAnnotationTags` at `xorm_store.go:569`.

4. `CleanOrphanedAnnotationTags` uses `untilDoneOrCancelled` (`xorm_store.go:569-582`)
with a batch callback that builds `cond` at `xorm_store.go:571`, then calls `r.fetchIDs`
at `xorm_store.go:572-575`; for each batch, the logger at `xorm_store.go:576` logs `"Tags
to clean"` at error level including the entire `ids` slice and condition, followed by
another error-level `"cleaned tags"` log at `xorm_store.go:579` after `r.deleteByIDs`.
When many orphaned tags exist (e.g., as created in the integration test), this results in
multiple large error-level log entries during normal tag cleanup.
Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** pkg/services/annotations/annotationsimpl/xorm_store.go
**Line:** 576:579
**Comment:**
	*Performance: The orphaned-tag cleanup also logs the entire list of tag IDs at error level during normal operation, which for large batches can flood logs and slow the job without any actual error; it should instead log summary information at debug level and only emit error-level logs when the delete operation returns an error. This keeps logging proportional and prevents performance degradation from stringifying huge ID slices.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
👍 | 👎

@codeant-ai

codeant-ai Bot commented Feb 13, 2026

Copy link
Copy Markdown

CodeAnt AI finished running the review.

@github-actions

Copy link
Copy Markdown
Contributor

This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 2 weeks if no further activity occurs. Please feel free to give a status update or ping for review. Thank you for your contributions!

@github-actions github-actions Bot added the stale label Mar 16, 2026
@github-actions

Copy link
Copy Markdown
Contributor

This pull request has been automatically closed because it has not had any further activity in the last 2 weeks. Thank you for your contributions!

@github-actions github-actions Bot closed this Mar 30, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:L This PR changes 100-499 lines, ignoring generated files stale

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants