Database Performance Optimizations#7
Conversation
… deadlocks on MySQL (#80329) * Split subquery when cleaning annotations * update comment * Raise batch size, now that we pay attention to it * Iterate in batches * Separate cancellable batch implementation to allow for multi-statement callbacks, add overload for single-statement use * Use split-out utility in outer batching loop so it respects context cancellation * guard against empty queries * Use SQL parameters * Use same approach for tags * drop unused function * Work around parameter limit on sqlite for large batches * Bulk insert test data in DB * Refactor test to customise test data creation * Add test for catching SQLITE_MAX_VARIABLE_NUMBER limit * Turn annotation cleanup test to integration tests * lint --------- Co-authored-by: Sofia Papagiannaki <1632407+papagian@users.noreply.github.com>
WalkthroughThe annotation cleanup logic was refactored to avoid MySQL deadlocks by switching from single-statement batched deletes to a two-step process: fetching IDs in batches and deleting them iteratively. Integration tests for cleanup were expanded for efficiency and coverage, including large batch scenarios. The cleanup service now runs every minute instead of every ten minutes. Changes
Sequence Diagram(s)sequenceDiagram
participant Scheduler as CleanupService
participant Repo as xormRepositoryImpl
participant DB as Database
Scheduler->>Repo: Run cleanup (every 1 min)
loop Until no more records or cancelled
Repo->>DB: Fetch batch of IDs to delete
DB-->>Repo: Return IDs
Repo->>DB: Delete records by IDs
DB-->>Repo: Confirm deletion
end
Repo-->>Scheduler: Report total deleted
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.2.2)Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/product/migration-guide for migration instructions ✨ Finishing Touches
🧪 Generate unit tests
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (4)
pkg/services/annotations/annotationsimpl/xorm_store.go (3)
13-13: Import organization: Consider grouping importsThe new
migratorimport is correctly added but consider grouping it with other grafana imports for better organization.
584-595: Good safety check, consider adding documentationThe empty condition check prevents accidentally fetching all records. Since this method builds SQL dynamically, consider adding a comment documenting that the
conditionparameter must be sanitized and should never contain user input directly.+// fetchIDs retrieves IDs from the specified table matching the given condition. +// The condition parameter must be pre-sanitized and should never contain direct user input. +// An empty condition will return an error to prevent fetching the entire table. func (r *xormRepositoryImpl) fetchIDs(ctx context.Context, table, condition string) ([]int64, error) {
608-613: Consider SQL query size limits for large batchesWhile the SQLite parameter limit workaround is necessary, constructing SQL strings with thousands of IDs could hit query size limits or cause memory issues. Consider chunking large batches even for SQLite.
if r.db.GetDBType() == migrator.SQLite && r.cfg.AnnotationCleanupJobBatchSize > sqliteParameterLimit { + // Further chunk if batch size is extremely large to avoid query size limits + const maxIDsPerQuery = 10000 + if len(ids) > maxIDsPerQuery { + var totalAffected int64 + for i := 0; i < len(ids); i += maxIDsPerQuery { + end := i + maxIDsPerQuery + if end > len(ids) { + end = len(ids) + } + affected, err := r.deleteByIDs(ctx, table, ids[i:end]) + totalAffected += affected + if err != nil { + return totalAffected, err + } + } + return totalAffected, nil + } values := fmt.Sprint(ids[0])pkg/services/annotations/annotationsimpl/cleanup_test.go (1)
273-291: Efficient bulk insert implementationThe batched insert approach with a batch size of 500 is much more efficient than individual inserts. Good use of
minfunction to handle the last batch.Consider extracting the batch size as a constant to avoid duplication:
+ const insertBatchSize = 500 err := store.WithDbSession(context.Background(), func(sess *db.Session) error { - batchsize := 500 - for i := 0; i < len(newAnnotations); i += batchsize { - _, err := sess.InsertMulti(newAnnotations[i:min(i+batchsize, len(newAnnotations))]) + for i := 0; i < len(newAnnotations); i += insertBatchSize { + _, err := sess.InsertMulti(newAnnotations[i:min(i+insertBatchSize, len(newAnnotations))])
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
pkg/services/annotations/annotationsimpl/cleanup_test.go(9 hunks)pkg/services/annotations/annotationsimpl/xorm_store.go(2 hunks)pkg/services/cleanup/cleanup.go(1 hunks)
🔇 Additional comments (5)
pkg/services/annotations/annotationsimpl/xorm_store.go (1)
640-661: Well-designed batch processing helperThe
untilDoneOrCancelledfunction provides a clean abstraction for the repeated batch processing pattern. Good error handling and context cancellation support.pkg/services/cleanup/cleanup.go (1)
77-77: Significant increase in cleanup frequency - verify performance impactChanging the cleanup interval from 10 minutes to 1 minute is a 10x increase in frequency. While the refactored cleanup logic is more efficient, this could still impact database performance, especially on busy systems.
Consider:
- Has this been load tested with realistic data volumes?
- Should this be configurable rather than hard-coded?
- Are there metrics in place to monitor the cleanup job duration and database impact?
pkg/services/annotations/annotationsimpl/cleanup_test.go (3)
18-22: Good practice: Integration test naming and skip conditionRenaming to include "Integration" prefix and adding skip for short test runs follows Go testing best practices.
97-111: Excellent test coverage for SQLite edge caseThis test case effectively verifies that the implementation can handle batch sizes exceeding SQLite's parameter limit (999). The choice of 32767 for batch size and 40003 total annotations ensures thorough testing of the batching logic.
121-127: Good error handling with errors.JoinUsing
errors.Jointo combine both deletion errors ensures no error is lost during cleanup. This is a best practice for handling multiple potential errors.
| r.log.Error("Annotations to clean by time", "count", len(ids), "ids", ids, "cond", cond, "err", err) | ||
|
|
There was a problem hiding this comment.
Use appropriate log levels for debugging information
The code uses Error log level for what appears to be debugging information. This will pollute error logs in production. Consider using Debug or Info level instead.
- r.log.Error("Annotations to clean by time", "count", len(ids), "ids", ids, "cond", cond, "err", err)
+ r.log.Debug("Annotations to clean by time", "count", len(ids), "ids", ids, "cond", cond)
- r.log.Error("cleaned annotations by time", "count", len(ids), "affected", x, "err", y)
+ if y != nil {
+ r.log.Error("Failed to clean annotations by time", "count", len(ids), "affected", x, "err", y)
+ } else {
+ r.log.Debug("Cleaned annotations by time", "count", len(ids), "affected", x)
+ }Also applies to: 537-538
🤖 Prompt for AI Agents
In pkg/services/annotations/annotationsimpl/xorm_store.go around lines 534 to
535 and also lines 537 to 538, the log statements currently use the Error log
level for what seems to be debugging or informational messages. Change these log
calls from Error to Debug or Info level to prevent cluttering error logs in
production environments. Ensure the log level accurately reflects the message
severity.
| r.log.Error("Annotations to clean by count", "count", len(ids), "ids", ids, "cond", cond, "err", err) | ||
|
|
There was a problem hiding this comment.
Use appropriate log levels for debugging information (MaxCount section)
Same issue with log levels in the MaxCount cleanup section.
- r.log.Error("Annotations to clean by count", "count", len(ids), "ids", ids, "cond", cond, "err", err)
+ r.log.Debug("Annotations to clean by count", "count", len(ids), "ids", ids, "cond", cond)
- r.log.Error("cleaned annotations by count", "count", len(ids), "affected", x, "err", y)
+ if y != nil {
+ r.log.Error("Failed to clean annotations by count", "count", len(ids), "affected", x, "err", y)
+ } else {
+ r.log.Debug("Cleaned annotations by count", "count", len(ids), "affected", x)
+ }Also applies to: 557-558
🤖 Prompt for AI Agents
In pkg/services/annotations/annotationsimpl/xorm_store.go around lines 554 to
558, the log statements in the MaxCount cleanup section are using the Error log
level for what appears to be debugging or informational messages. Change these
log statements to use a more appropriate log level such as Debug or Info instead
of Error to better reflect the nature of the messages and avoid misleading error
logs.
| if err != nil { | ||
| return 0, err | ||
| } | ||
| r.log.Error("Tags to clean", "count", len(ids), "ids", ids, "cond", cond, "err", err) |
There was a problem hiding this comment.
Use appropriate log levels in CleanOrphanedAnnotationTags
Same logging level issue in the orphaned tags cleanup.
- r.log.Error("Tags to clean", "count", len(ids), "ids", ids, "cond", cond, "err", err)
+ r.log.Debug("Tags to clean", "count", len(ids), "ids", ids, "cond", cond)
- r.log.Error("cleaned tags", "count", len(ids), "affected", x, "err", y)
+ if y != nil {
+ r.log.Error("Failed to clean tags", "count", len(ids), "affected", x, "err", y)
+ } else {
+ r.log.Debug("Cleaned tags", "count", len(ids), "affected", x)
+ }Also applies to: 579-579
🤖 Prompt for AI Agents
In pkg/services/annotations/annotationsimpl/xorm_store.go at lines 576 and 579,
the log statements for cleaning orphaned annotation tags use the Error log level
inappropriately. Change these log calls to use a more suitable log level such as
Info or Debug, depending on the importance of the message, to better reflect the
severity and avoid misleading error reporting.
|
This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 2 weeks if no further activity occurs. Please feel free to give a status update or ping for review. Thank you for your contributions! |
|
This pull request has been automatically closed because it has not had any further activity in the last 2 weeks. Thank you for your contributions! |
Test 7
Summary by CodeRabbit