Dual Storage Architecture by ShashankFC · Pull Request #5 · AI-Code-Review-Evals/codeant-grafana

ShashankFC · 2026-02-13T05:09:24Z

User description

Test 6

## Summary by CodeRabbit

## Release Notes

* New Features
  * Added list functionality for dual-write mode 3 operations to support reading from storage.

* Tests
  * Enhanced test coverage for dual-write mode 3 with comprehensive test cases across Create, Get, List, Delete, and Update operations.
  * Added new integration tests for dual-write mode 3 across multiple storage backends.

✏️ Tip: You can customize this high-level summary in your review settings.

---
Replicated from ai-code-review-evaluation/grafana-coderabbit#6

CodeAnt-AI Description

Dual-write mode 3: read from unified Storage, async legacy writes with timeouts and metrics

What Changed

Mode 3 now performs reads (Get, List) from the unified Storage and writes (Create, Update, Delete, DeleteCollection) to Storage synchronously, while legacy storage writes run asynchronously with a 10s timeout so API requests are not blocked by legacy backend delays.
Added structured logs and timing/metrics recording for Storage and Legacy operations so operation success/failure and durations are tracked and visible.
Implemented Mode 3 behavior in tests and added integration test runs for Mode 3 across file, unified, and etcd (skipped) backends to validate Create/Get/List/Delete/Update/DeleteCollection paths.
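
The write path described above can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: the `store` type, `mode3Writer`, `runDemo`, and the 200ms simulated legacy latency are all hypothetical stand-ins, and only the standard library is used. It shows the storage write completing synchronously while the legacy write runs in a goroutine bounded by a 10s timeout.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"time"
)

// store is a hypothetical in-memory backend standing in for either storage.
type store struct {
	mu    sync.Mutex
	delay time.Duration // simulated backend latency
	names []string
}

// Create waits for the simulated latency, or gives up when ctx is done.
func (s *store) Create(ctx context.Context, name string) error {
	select {
	case <-time.After(s.delay):
	case <-ctx.Done():
		return context.Cause(ctx)
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	s.names = append(s.names, name)
	return nil
}

func (s *store) count() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.names)
}

// mode3Writer is a simplified, assumed shape of a mode 3 dual writer.
type mode3Writer struct {
	storage, legacy *store
	legacyTimeout   time.Duration
	background      sync.WaitGroup // only so the demo can wait for legacy writes
}

// Create writes to storage synchronously and to legacy in the background.
func (d *mode3Writer) Create(ctx context.Context, name string) error {
	if err := d.storage.Create(ctx, name); err != nil {
		return err
	}
	d.background.Add(1)
	go func() {
		defer d.background.Done()
		legacyCtx, cancel := context.WithTimeoutCause(ctx, d.legacyTimeout, errors.New("legacy create timeout"))
		defer cancel()
		if err := d.legacy.Create(legacyCtx, name); err != nil {
			fmt.Println("legacy create failed:", err)
		}
	}()
	return nil
}

// runDemo reports whether Create returned before the slow legacy write finished.
func runDemo() (returnedEarly bool, legacyWrites int) {
	legacy := &store{delay: 200 * time.Millisecond}
	d := &mode3Writer{storage: &store{}, legacy: legacy, legacyTimeout: 10 * time.Second}
	start := time.Now()
	if err := d.Create(context.Background(), "playlist-1"); err != nil {
		return false, 0
	}
	returnedEarly = time.Since(start) < legacy.delay
	d.background.Wait()
	return returnedEarly, legacy.count()
}

func main() {
	early, writes := runDemo()
	fmt.Println("returned before legacy finished:", early)
	fmt.Println("legacy writes:", writes)
}
```

The `sync.WaitGroup` exists only so the demo can observe the background write; a real handler would normally not wait on it.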

Impact

✅ Shorter create/update/delete latency for API requests
✅ Fewer API timeouts caused by slow legacy storage
✅ Test coverage for dual-write mode 3 across storage backends

* Dual writer: mode 3
* Add integration tests for playlits in mode 3
* Remove todo
* Update pkg/apiserver/rest/dualwriter_mode3.go

  Co-authored-by: Arati R. <33031346+suntala@users.noreply.github.com>
* Admin: Fixes an issue where user accounts could not be enabled (#88117)

  Fix: unable to enable user
* [REVIEW] FInish mode 3 and add tests
* Improve logging
* Update dependencies
* Update pkg/apiserver/rest/dualwriter_mode3_test.go

  Co-authored-by: maicon <maiconscosta@gmail.com>
* remove test assertion
* Use mode log when dual writer is initiated

---------

Co-authored-by: Arati R. <33031346+suntala@users.noreply.github.com>
Co-authored-by: gonvee <gonvee@qq.com>
Co-authored-by: maicon <maiconscosta@gmail.com>

ShashankFC · 2026-02-13T06:04:53Z

@CodeAnt-AI: review

codeant-ai · 2026-02-13T06:04:57Z

CodeAnt AI is running the review.

codeant-ai · 2026-02-13T06:08:56Z

Nitpicks 🔍

🔒 No security issues identified
⚡ Recommended areas for review

- Logging / Context bug: In Delete the code sets the klog context with `d.Log` instead of the per-call logger `log`. That loses the WithValues fields (name/kind/method) from the context and breaks structured logs attached to ctx.
- Metrics mix-up: Several metric recording calls are using the wrong helper (recordLegacyDuration vs recordStorageDuration) and in one goroutine the storage metric is recorded for a legacy operation. This will surface wrong telemetry (storage errors counted as legacy and vice-versa). Also the code sometimes records metrics using the resource name where the helper expects a kind.
- Wrong metric argument: A call to `recordStorageDuration` in Delete passes the resource `name` where other calls pass `options.Kind`. This is inconsistent with other calls and likely a bug (metrics expect kind).
- OpenTelemetry version divergence: Multiple OpenTelemetry components are added/updated to different versions (v0.37.0, v1.26.0, v0.39.0 etc.). Mixed major/minor versions across otel modules can cause subtle runtime incompatibilities. Verify all otel modules are compatible and test telemetry-related code paths.
- Dependency upgrades: The PR adds several new/updated modules (Grafana Azure SDK, plugin SDK, OpenTelemetry components). These upgrades can introduce breaking API or behavior changes and increase the transitive dependency surface; ensure the corresponding go.mod/go.work updates were made intentionally and verify compatibility across all build targets.
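
The logging/context point can be illustrated with a small sketch. It does not use klog: `newContext` and `fromContext` are hypothetical stand-ins for the klog context helpers, built on the standard library's `log/slog`, and the field values are made up. It shows how code that pulls its logger from the context loses the name/kind/method fields when the base logger, rather than the per-call logger, is stored there.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"log/slog"
	"strings"
)

type loggerKey struct{}

// newContext and fromContext are hypothetical stand-ins for context-scoped
// logger helpers; they are not the klog API.
func newContext(ctx context.Context, l *slog.Logger) context.Context {
	return context.WithValue(ctx, loggerKey{}, l)
}

func fromContext(ctx context.Context) *slog.Logger {
	if l, ok := ctx.Value(loggerKey{}).(*slog.Logger); ok {
		return l
	}
	return slog.Default()
}

// logDownstream plays the role of code further down the call chain that only
// has access to ctx.
func logDownstream(ctx context.Context) {
	fromContext(ctx).Info("deleting object")
}

// demo returns the log line emitted downstream for the chosen logger.
func demo(usePerCallLogger bool) string {
	var buf bytes.Buffer
	base := slog.New(slog.NewTextHandler(&buf, nil))
	// Per-call logger carrying the structured fields for this request.
	perCall := base.With("name", "playlist-1", "kind", "playlist", "method", "delete")

	ctx := context.Background()
	if usePerCallLogger {
		ctx = newContext(ctx, perCall) // fields travel with ctx
	} else {
		ctx = newContext(ctx, base) // flagged pattern: fields are dropped
	}
	logDownstream(ctx)
	return buf.String()
}

// hasCallFields reports whether a log line still carries the method field.
func hasCallFields(line string) bool {
	return strings.Contains(line, "method=delete")
}

func main() {
	fmt.Println("base logger in ctx keeps fields:", hasCallFields(demo(false)))
	fmt.Println("per-call logger in ctx keeps fields:", hasCallFields(demo(true)))
}
```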

codeant-ai · 2026-02-13T06:09:29Z

 	created, err := d.Storage.Create(ctx, obj, createValidation, options)
 	if err != nil {
 		log.Error(err, "unable to create object in storage")
+		d.recordLegacyDuration(true, mode3Str, options.Kind, method, startStorage)


Suggestion: In the create path, storage failures are recorded using the legacy duration metric instead of the storage duration metric, so errors in the primary storage are misclassified and legacy metrics are updated even though legacy wasn't called. [logic error]

Severity Level: Major ⚠️

- ⚠️ Storage write failures misclassified as legacy latency in metrics.
- ⚠️ Dashboards using `dual_writer_storage_duration_seconds` under-report errors.
- ⚠️ Troubleshooting Mode3 storage issues becomes harder and misleading.

Suggested change

-		d.recordLegacyDuration(true, mode3Str, options.Kind, method, startStorage)
+		d.recordStorageDuration(true, mode3Str, options.Kind, method, startStorage)

Steps of Reproduction ✅

1. Configure a DualWriter in mode 3 by calling `NewDualWriter(Mode3, legacy, storage, reg)` in `pkg/apiserver/rest/dualwriter.go:105-120`, which internally constructs `newDualWriterMode3` from `pkg/apiserver/rest/dualwriter_mode3.go:24-25`.
2. Use this DualWriter as the storage implementation for some Kubernetes resource so that its REST create handler calls `DualWriterMode3.Create` at `pkg/apiserver/rest/dualwriter_mode3.go:36-60`.
3. Trigger a create operation (e.g., via the resource's HTTP/REST create endpoint) while the primary `Storage.Create` backend is failing so that `d.Storage.Create` at line 42 returns a non-nil `err`.
4. Observe that the error branch at lines 43-47 calls `d.recordLegacyDuration(true, mode3Str, options.Kind, method, startStorage)` instead of `recordStorageDuration`, so the failure is recorded in the `DualWriterLegacyDuration` metric (defined in `metrics.go:25-31`) rather than `DualWriterStorageDuration` (defined in `metrics.go:17-23`).
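
The reproduction steps above can be approximated in isolation. The sketch below is an assumption-laden illustration: `recorder` is a hypothetical counter standing in for the two Prometheus histograms, its methods take only the error flag rather than the real helpers' full argument list, and `createBuggy`/`createFixed` are simplified versions of the error branch. It shows which backend's metric receives a storage failure before and after the suggested change.

```go
package main

import (
	"errors"
	"fmt"
)

// recorder is a hypothetical stand-in for the dual writer metrics; it only
// counts error observations per backend.
type recorder struct {
	storageErrors int
	legacyErrors  int
}

func (r *recorder) recordStorageDuration(isError bool) {
	if isError {
		r.storageErrors++
	}
}

func (r *recorder) recordLegacyDuration(isError bool) {
	if isError {
		r.legacyErrors++
	}
}

// createBuggy mirrors the flagged branch: a storage failure is recorded on
// the legacy metric even though legacy was never called.
func createBuggy(r *recorder, storageErr error) error {
	if storageErr != nil {
		r.recordLegacyDuration(true)
		return storageErr
	}
	r.recordStorageDuration(false)
	return nil
}

// createFixed applies the suggested change: the failure is attributed to storage.
func createFixed(r *recorder, storageErr error) error {
	if storageErr != nil {
		r.recordStorageDuration(true)
		return storageErr
	}
	r.recordStorageDuration(false)
	return nil
}

// errorsSeen simulates one failing storage create and returns the counters.
func errorsSeen(fixed bool) (storageErrs, legacyErrs int) {
	r := &recorder{}
	failure := errors.New("storage unavailable")
	if fixed {
		_ = createFixed(r, failure)
	} else {
		_ = createBuggy(r, failure)
	}
	return r.storageErrors, r.legacyErrors
}

func main() {
	s, l := errorsSeen(false)
	fmt.Printf("buggy: storage=%d legacy=%d\n", s, l)
	s, l = errorsSeen(true)
	fmt.Printf("fixed: storage=%d legacy=%d\n", s, l)
}
```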

Prompt for AI Agent 🤖

This is a comment left during a code review.
**Path:** pkg/apiserver/rest/dualwriter_mode3.go
**Line:** 45:45
**Comment:** *Logic Error*: In the create path, storage failures are recorded using the legacy duration metric instead of the storage duration metric, so errors in the primary storage are misclassified and legacy metrics are updated even though legacy wasn't called.

Validate the correctness of the flagged issue. If correct, how can I resolve this? If you propose a fix, implement it and please make it concise.


codeant-ai · 2026-02-13T06:09:29Z

-		upstream: objInfo,
-		updated:  updated,
+		log.Error(err, "unable to update in storage")
+		d.recordLegacyDuration(true, mode3Str, options.Kind, method, startStorage)


Suggestion: In the update path, failures in the primary storage update are recorded using the legacy duration metric with the storage start time, so storage errors are misattributed and legacy metrics are polluted even when the legacy store is never called. [logic error]

Severity Level: Major ⚠️

- ⚠️ Storage update failures reported as legacy latency in metrics.
- ⚠️ Error-rate dashboards by backend (legacy vs storage) are skewed.
- ⚠️ Investigations into Mode3 update issues may chase wrong backend.

Suggested change

-		d.recordLegacyDuration(true, mode3Str, options.Kind, method, startStorage)
+		d.recordStorageDuration(true, mode3Str, options.Kind, method, startStorage)

Steps of Reproduction ✅

1. Instantiate a Mode3 DualWriter using `NewDualWriter(Mode3, legacy, storage, reg)` in `pkg/apiserver/rest/dualwriter.go:105-120`, which returns `newDualWriterMode3` (`dualwriter_mode3.go:24-25`).
2. Wire this DualWriter into a resource so its update handler invokes `DualWriterMode3.Update` at `pkg/apiserver/rest/dualwriter_mode3.go:120-143`.
3. Execute an update operation (e.g., via the resource's HTTP/REST update endpoint) while the primary `Storage.Update` backend is failing in a way that causes `d.Storage.Update` at line 126 to return an error.
4. On error, the branch at lines 127-130 calls `d.recordLegacyDuration(true, mode3Str, options.Kind, method, startStorage)` instead of `recordStorageDuration`, which records the storage failure in the legacy histogram `DualWriterLegacyDuration` (`metrics.go:25-31`) instead of the storage histogram `DualWriterStorageDuration` (`metrics.go:17-23`).

Prompt for AI Agent 🤖

This is a comment left during a code review.
**Path:** pkg/apiserver/rest/dualwriter_mode3.go
**Line:** 129:129
**Comment:** *Logic Error*: In the update path, failures in the primary storage update are recorded using the legacy duration metric with the storage start time, so storage errors are misattributed and legacy metrics are polluted even when the legacy store is never called.

Validate the correctness of the flagged issue. If correct, how can I resolve this? If you propose a fix, implement it and please make it concise.


codeant-ai · 2026-02-13T06:09:30Z

+		ctx, cancel := context.WithTimeoutCause(ctx, time.Second*10, errors.New("legacy deletecollection timeout"))
+		defer cancel()
+		_, err := d.Legacy.DeleteCollection(ctx, deleteValidation, options, listOptions)
+		d.recordStorageDuration(err != nil, mode3Str, options.Kind, method, startLegacy)


Suggestion: In DeleteCollection, the legacy delete operation's duration is recorded using the storage duration metric instead of the legacy duration metric, so legacy work is misreported as storage time. [logic error]

Severity Level: Major ⚠️

- ⚠️ Legacy DeleteCollection latency wrongly attributed to storage backend.
- ⚠️ Legacy vs storage performance comparisons are distorted for collections.
- ⚠️ Alerting based on legacy duration metrics may miss slow paths.

Suggested change

-		d.recordStorageDuration(err != nil, mode3Str, options.Kind, method, startLegacy)
+		d.recordLegacyDuration(err != nil, mode3Str, options.Kind, method, startLegacy)

Steps of Reproduction ✅

1. Initialize a Mode3 DualWriter using `NewDualWriter(Mode3, legacy, storage, reg)` in `pkg/apiserver/rest/dualwriter.go:105-120` so that `DualWriterMode3.DeleteCollection` at `pkg/apiserver/rest/dualwriter_mode3.go:147-170` is used for collection deletes on a resource.
2. Execute a DeleteCollection operation through that resource (e.g., a list-based delete), which calls `d.Storage.DeleteCollection` at lines 152-159 and then spawns the goroutine at lines 161-167.
3. The goroutine issues `d.Legacy.DeleteCollection(ctx, deleteValidation, options, listOptions)` at line 165 using the legacy backend, measuring its duration from `startLegacy` at line 162.
4. Observe that the metric for this legacy operation is recorded via `d.recordStorageDuration(err != nil, mode3Str, options.Kind, method, startLegacy)` at line 166, which writes into `DualWriterStorageDuration` (`metrics.go:17-23,59-62`) instead of `DualWriterLegacyDuration` (`metrics.go:25-31,54-57`), misreporting legacy latency as storage latency.
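
The goroutine described in these steps can be sketched as follows, under stated assumptions: `slowLegacyDelete`, `observation`, and `deleteCollectionLegacy` are hypothetical names, the timeout is shortened to 20ms so the example finishes quickly, and the metric is modelled as a plain struct tagged with its backend. It shows `context.WithTimeoutCause` surfacing the custom timeout error through `context.Cause`, with the result attributed to the legacy backend as the suggestion proposes.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errLegacyTimeout is the cause attached to the legacy call's deadline.
var errLegacyTimeout = errors.New("legacy deletecollection timeout")

// slowLegacyDelete is a hypothetical legacy call that takes `work` to finish
// unless its context ends first.
func slowLegacyDelete(ctx context.Context, work time.Duration) error {
	select {
	case <-time.After(work):
		return nil
	case <-ctx.Done():
		// Cause returns the custom error set by WithTimeoutCause.
		return context.Cause(ctx)
	}
}

// observation is a simplified stand-in for one recorded metric sample.
type observation struct {
	backend string // "legacy" or "storage"
	isError bool
	cause   error
}

// deleteCollectionLegacy runs the legacy call in the background with a
// timeout and reports the sample under the legacy backend.
func deleteCollectionLegacy(ctx context.Context, timeout, work time.Duration) <-chan observation {
	out := make(chan observation, 1)
	go func() {
		ctx, cancel := context.WithTimeoutCause(ctx, timeout, errLegacyTimeout)
		defer cancel()
		err := slowLegacyDelete(ctx, work)
		out <- observation{backend: "legacy", isError: err != nil, cause: err}
	}()
	return out
}

// timedOutWithCause reports whether a slow legacy call is recorded as a
// legacy error carrying the timeout cause.
func timedOutWithCause() bool {
	obs := <-deleteCollectionLegacy(context.Background(), 20*time.Millisecond, time.Second)
	return obs.backend == "legacy" && obs.isError && errors.Is(obs.cause, errLegacyTimeout)
}

func main() {
	fmt.Println("legacy timeout recorded with cause:", timedOutWithCause())
}
```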

Prompt for AI Agent 🤖

This is a comment left during a code review.
**Path:** pkg/apiserver/rest/dualwriter_mode3.go
**Line:** 166:166
**Comment:** *Logic Error*: In DeleteCollection, the legacy delete operation's duration is recorded using the storage duration metric instead of the legacy duration metric, so legacy work is misreported as storage time.

Validate the correctness of the flagged issue. If correct, how can I resolve this? If you propose a fix, implement it and please make it concise.


codeant-ai · 2026-02-13T06:09:33Z

CodeAnt AI finished running the review.

github-actions · 2026-03-16T04:20:28Z

This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 2 weeks if no further activity occurs. Please feel free to give a status update or ping for review. Thank you for your contributions!

github-actions · 2026-03-30T04:26:05Z

This pull request has been automatically closed because it has not had any further activity in the last 2 weeks. Thank you for your contributions!

codeant-ai Bot added the size:XL label (This PR changes 500-999 lines, ignoring generated files) Feb 13, 2026

codeant-ai Bot reviewed Feb 13, 2026

github-actions Bot added the stale label Mar 16, 2026

github-actions Bot closed this Mar 30, 2026

ShashankFC wants to merge 1 commit into dual-storage-baseline from dual-storage-enhanced
