Skip to content

Dual Storage Architecture#7

Draft
ShashankFC wants to merge 1 commit into
dual-storage-baselinefrom
dual-storage-enhanced
Draft

Dual Storage Architecture#7
ShashankFC wants to merge 1 commit into
dual-storage-baselinefrom
dual-storage-enhanced

Conversation

@ShashankFC

@ShashankFC ShashankFC commented Dec 31, 2025

Copy link
Copy Markdown
Collaborator

Test 6nn

Summary by CodeRabbit

  • New Features

    • Added List operation support to DualWriterMode3.
  • Improvements

    • Enhanced logging for DualWriterMode3 operations with mode-specific context.
    • Improved asynchronous handling of legacy storage operations with timeout management.
  • Tests

    • Added comprehensive test suite for DualWriterMode3 CRUD operations.
    • Introduced integration tests for dual-write mode across multiple storage backends (file, unified, etcd).
nn---n*Replicated from [ai-code-review-evaluation/grafana-coderabbit#6](https://github.com/ai-code-review-evaluation/grafana-coderabbit/pull/6)*

* Dual writer: mode 3

* Add integration tests for playlits in mode 3

* Remove todo

* Update pkg/apiserver/rest/dualwriter_mode3.go

Co-authored-by: Arati R. <33031346+suntala@users.noreply.github.com>

* Admin: Fixes an issue where user accounts could not be enabled (#88117)

Fix: unable to enable user

* [REVIEW] FInish mode 3 and add tests

* Improve logging

* Update dependencies

* Update pkg/apiserver/rest/dualwriter_mode3_test.go

Co-authored-by: maicon <maiconscosta@gmail.com>

* remove test assertion

* Use mode log when dual writer is initiated

---------

Co-authored-by: Arati R. <33031346+suntala@users.noreply.github.com>
Co-authored-by: gonvee <gonvee@qq.com>
Co-authored-by: maicon <maiconscosta@gmail.com>
@ShashankFC

Copy link
Copy Markdown
Collaborator Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jan 12, 2026

Copy link
Copy Markdown
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai

coderabbitai Bot commented Jan 12, 2026

Copy link
Copy Markdown
📝 Walkthrough

Walkthrough

The changes implement DualWriterMode3 functionality with synchronous primary storage operations and asynchronous legacy operations using timeouts. Mode-specific logging is added throughout, and comprehensive test coverage is provided for dual-writer interactions across multiple storage backends.

Changes

Cohort / File(s) Summary
DualWriterMode3 Implementation
pkg/apiserver/rest/dualwriter_mode3.go
Introduced mode-specific logging with mode3Str context. Refactored Create, Get, Delete, Update, and DeleteCollection methods to execute primary operations against Storage synchronously while spawning asynchronous Legacy operations with 10-second timeouts. Added new List method. Updated imports to include errors and time packages. Modified duration recording to track both storage and legacy operation timings separately.
DualWriterMode3 Test Suite
pkg/apiserver/rest/dualwriter_mode3_test.go
Replaced skeletal test scaffold with comprehensive test suite covering Create, Get, List, Delete, DeleteCollection, and Update operations. Added real test cases with mocks exercising dual-writer interactions, error handling paths, and assertions on results, resource versions, and object identity across legacy and storage interactions.
Integration Tests for Mode3
pkg/tests/apis/playlist/playlist_test.go
Added integration test cases for DualWriterMode3 across three storage backends: file storage, unified storage, and etcd. Each variant configures appropriate flags and storage backends while exercising the dual-write pattern with pre-test cleanup for etcd collections.
Test Infrastructure Cleanup
pkg/apiserver/rest/dualwriter_mode1_test.go
Removed redundant local Prometheus registry creation in TestMode1_Get, relying on package-level registry variable instead.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant DualWriterMode3
    participant Storage
    participant Legacy
    participant Timeout

    Client->>DualWriterMode3: Create/Update/Delete request
    DualWriterMode3->>Storage: Execute operation synchronously
    activate Storage
    Storage-->>DualWriterMode3: Result + Duration
    deactivate Storage
    DualWriterMode3->>DualWriterMode3: Record storage duration
    DualWriterMode3-->>Client: Return result immediately
    
    par Asynchronous Legacy Operation
        DualWriterMode3->>Timeout: Create 10s timeout context
        activate Timeout
        DualWriterMode3->>Legacy: Execute operation asynchronously
        activate Legacy
        Legacy-->>DualWriterMode3: Result or timeout
        deactivate Legacy
        deactivate Timeout
        DualWriterMode3->>DualWriterMode3: Record legacy duration
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 A dual-writer hops with grace,
Storage first in sync's embrace,
Legacy follows in the night,
Asynchronous with timeout's might,
Mode3 dances left and right! 🎭

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 11.11% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title 'Dual Storage Architecture' is vague and generic, using broad terminology that doesn't convey the specific primary change in the changeset. Consider a more specific title like 'Implement DualWriterMode3 with async legacy operations' or 'Add DualWriterMode3 with timeout-based async writes and comprehensive tests' to clearly indicate the main changes.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch dual-storage-enhanced

Tip

Issue Planner is now in beta. Read the docs and try it out! Share your feedback on Discord.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

Copy link
Copy Markdown
Contributor

This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 2 weeks if no further activity occurs. Please feel free to give a status update or ping for review. Thank you for your contributions!

@github-actions github-actions Bot added the stale label Feb 12, 2026
@ShashankFC

Copy link
Copy Markdown
Collaborator Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Feb 23, 2026

Copy link
Copy Markdown
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🧹 Nitpick comments (3)
pkg/apiserver/rest/dualwriter_mode3_test.go (2)

49-54: Shared *mock.Mock across legacy and storage mocks conflates call expectations.

Both legacyStoreMock and storageMock use the same *mock.Mock instance, making it impossible to verify that a call was routed to the correct store. For instance, in the Create error case (Lines 40-41), the expectation is set up via setupLegacyFn on the shared mock, but it's actually Storage.Create that matches it at runtime.

This pattern exists in all test functions in this file. Consider using separate *mock.Mock instances for each store so you can independently assert which backend was invoked.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/apiserver/rest/dualwriter_mode3_test.go` around lines 49 - 54, The tests
use a single *mock.Mock instance for both legacyStoreMock and storageMock which
conflates expectations; create two distinct mocks (e.g., legacyMock and
storageMockInst) and pass them into legacyStoreMock and storageMock respectively
so expectations set via setupLegacyFn attach only to legacyMock and expectations
for Storage.Create/other methods attach only to storageMockInst; update all test
functions in this file that currently construct ls := legacyStoreMock{m, l} and
us := storageMock{m, s} to use separate mock instances and adjust the setup
calls (setupLegacyFn, setupStorageFn / Storage.Create expectations) accordingly
so each backend’s calls can be asserted independently.

72-74: ResourceVersion assertion is fragile and depends on mock internals.

assert.Equal(t, acc.GetResourceVersion(), "1") assumes that the mock exampleObj always has ResourceVersion "1". This doesn't actually test that Mode3 sets or preserves the resource version correctly — it just mirrors whatever the mock returns. This assertion adds little value and will break if the shared test fixture changes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/apiserver/rest/dualwriter_mode3_test.go` around lines 72 - 74, The
ResourceVersion assertion using acc.GetResourceVersion() is fragile because it
depends on the mock fixture (exampleObj) and doesn't validate Mode3 behavior;
instead, remove the hardcoded assert.Equal on acc.GetResourceVersion() or
replace it with a behavior-driven check: if you need to assert preservation,
capture the resource version from meta.Accessor(obj) before invoking the Mode3
operation and then assert the post-operation
meta.Accessor(obj).GetResourceVersion() equals that captured value; otherwise
simply drop the ResourceVersion assertion to avoid coupling tests to mock
internals.
pkg/apiserver/rest/dualwriter_mode3.go (1)

25-25: Long initialization line hinders readability.

Consider breaking the struct literal across multiple lines.

♻️ Suggested refactor
-	return &DualWriterMode3{Legacy: legacy, Storage: storage, Log: klog.NewKlogr().WithName("DualWriterMode3").WithValues("mode", mode3Str), dualWriterMetrics: dwm}
+	return &DualWriterMode3{
+		Legacy:            legacy,
+		Storage:           storage,
+		Log:               klog.NewKlogr().WithName("DualWriterMode3").WithValues("mode", mode3Str),
+		dualWriterMetrics: dwm,
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/apiserver/rest/dualwriter_mode3.go` at line 25, The return line
constructs a DualWriterMode3 with a very long single-line struct literal; split
the struct literal across multiple lines for readability by placing each field
on its own line (e.g., Legacy: legacy, Storage: storage, Log:
klog.NewKlogr().WithName("DualWriterMode3").WithValues("mode", mode3Str),
dualWriterMetrics: dwm) so the fields of DualWriterMode3 are easy to scan and
maintain; ensure trailing commas and proper indentation are used.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/apiserver/rest/dualwriter_mode3_test.go`:
- Line 63: Tests in this file inconsistently reference the Prometheus registry
variable p (TestMode3_Get defines a local p while TestMode3_Create,
TestMode3_List, TestMode3_Delete, TestMode3_DeleteCollection, and
TestMode3_Update rely on a package-level p), so make the usage consistent:
either remove the local p in TestMode3_Get and use the package-level registry
everywhere, or preferably define a local prometheus registry in each test that
currently uses p (e.g., add p := prometheus.NewRegistry() at the top of
TestMode3_Create, TestMode3_List, TestMode3_Delete, TestMode3_DeleteCollection,
and TestMode3_Update) so NewDualWriter(Mode3, ls, us, p) always receives a
locally defined registry and tests no longer depend on a package-level variable.
- Around line 47-77: The test can race with Mode3's async legacy goroutines
spawned by NewDualWriter(...).Create; modify the test to wait for those
goroutines to finish before returning: either add a short synchronization (e.g.,
a sync.WaitGroup or a channel) to your legacy mock (legacyStoreMock / mock.Mock)
and have the legacy call decrement/close it, then in the test invoke
dw.Create(...) and wait (wg.Wait() or receive on the channel) before asserting
and exiting, or as a minimal change add a short time.Sleep after dw.Create to
let the background goroutine complete; ensure the synchronization points are
wired into the setupLegacyFn so the async legacy invocation is awaited.

In `@pkg/apiserver/rest/dualwriter_mode3.go`:
- Around line 43-46: The error branch after Storage.Create currently calls
d.recordLegacyDuration, misattributing storage failures; replace that call with
d.recordStorageDuration using the same parameters (pass the error flag,
mode3Str, options.Kind, method, startStorage) so storage Create failures are
recorded to the storage metric instead of the legacy metric (update in the same
function/block where Storage.Create is handled).
- Around line 94-106: In Delete (DualWriterMode3.Delete) fix two bugs: replace
klog.NewContext(ctx, d.Log) with klog.NewContext(ctx, log) so the enriched
logger (log) is used in the context, and on the success path call
d.recordStorageDuration(false, mode3Str, options.Kind, method, startStorage)
instead of passing name so the metric's kind label uses options.Kind; keep all
other args (mode3Str, method, startStorage) unchanged.
- Around line 126-131: When d.Storage.Update fails, the code incorrectly calls
d.recordLegacyDuration(...) instead of the storage metric function; replace the
call to d.recordLegacyDuration(true, mode3Str, options.Kind, method,
startStorage) with d.recordStorageDuration(true, mode3Str, options.Kind, method,
startStorage) so storage errors are attributed to the correct metric (look for
the failure handling around d.Storage.Update and the existing
recordStorageDuration usage elsewhere to mirror parameters).
- Around line 161-167: The async goroutine invoking d.Legacy.DeleteCollection is
incorrectly calling d.recordStorageDuration; update that call to
d.recordLegacyDuration so the metric is attributed to the legacy path. In the
goroutine that creates ctx/cancel and calls d.Legacy.DeleteCollection (look for
startLegacy, ctx := context.WithTimeoutCause(...), and the DeleteCollection
call), replace the d.recordStorageDuration(err != nil, mode3Str, options.Kind,
method, startLegacy) invocation with d.recordLegacyDuration(err != nil,
mode3Str, options.Kind, method, startLegacy) to record legacy duration metrics.
- Around line 50-57: The goroutine for async legacy writes uses
context.WithTimeoutCause(ctx, ...) which inherits the request context and gets
cancelled when the handler returns; update the code to wrap the request context
with context.WithoutCancel(ctx) before calling context.WithTimeoutCause so the
10s timeout remains effective (e.g., replace context.WithTimeoutCause(ctx, ...)
with context.WithTimeoutCause(context.WithoutCancel(ctx), ...) in the Create
goroutine that calls d.Legacy.Create), and apply the same pattern in the other
affected async legacy write goroutines (Delete, Update, DeleteCollection) so
their d.Legacy.<Create|Delete|Update|DeleteCollection> calls use the decoupled
timeout context.

---

Nitpick comments:
In `@pkg/apiserver/rest/dualwriter_mode3_test.go`:
- Around line 49-54: The tests use a single *mock.Mock instance for both
legacyStoreMock and storageMock which conflates expectations; create two
distinct mocks (e.g., legacyMock and storageMockInst) and pass them into
legacyStoreMock and storageMock respectively so expectations set via
setupLegacyFn attach only to legacyMock and expectations for
Storage.Create/other methods attach only to storageMockInst; update all test
functions in this file that currently construct ls := legacyStoreMock{m, l} and
us := storageMock{m, s} to use separate mock instances and adjust the setup
calls (setupLegacyFn, setupStorageFn / Storage.Create expectations) accordingly
so each backend’s calls can be asserted independently.
- Around line 72-74: The ResourceVersion assertion using
acc.GetResourceVersion() is fragile because it depends on the mock fixture
(exampleObj) and doesn't validate Mode3 behavior; instead, remove the hardcoded
assert.Equal on acc.GetResourceVersion() or replace it with a behavior-driven
check: if you need to assert preservation, capture the resource version from
meta.Accessor(obj) before invoking the Mode3 operation and then assert the
post-operation meta.Accessor(obj).GetResourceVersion() equals that captured
value; otherwise simply drop the ResourceVersion assertion to avoid coupling
tests to mock internals.

In `@pkg/apiserver/rest/dualwriter_mode3.go`:
- Line 25: The return line constructs a DualWriterMode3 with a very long
single-line struct literal; split the struct literal across multiple lines for
readability by placing each field on its own line (e.g., Legacy: legacy,
Storage: storage, Log:
klog.NewKlogr().WithName("DualWriterMode3").WithValues("mode", mode3Str),
dualWriterMetrics: dwm) so the fields of DualWriterMode3 are easy to scan and
maintain; ensure trailing commas and proper indentation are used.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5e3a5b3 and 67b74e1.

⛔ Files ignored due to path filters (1)
  • go.work.sum is excluded by !**/*.sum
📒 Files selected for processing (4)
  • pkg/apiserver/rest/dualwriter_mode1_test.go
  • pkg/apiserver/rest/dualwriter_mode3.go
  • pkg/apiserver/rest/dualwriter_mode3_test.go
  • pkg/tests/apis/playlist/playlist_test.go
💤 Files with no reviewable changes (1)
  • pkg/apiserver/rest/dualwriter_mode1_test.go

Comment on lines +47 to +77
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
l := (LegacyStorage)(nil)
s := (Storage)(nil)
m := &mock.Mock{}

ls := legacyStoreMock{m, l}
us := storageMock{m, s}

if tt.setupLegacyFn != nil {
tt.setupLegacyFn(m, tt.input)
}
if tt.setupStorageFn != nil {
tt.setupStorageFn(m)
}

dw := NewDualWriter(Mode3, ls, us, p)

obj, err := dw.Create(context.Background(), tt.input, func(context.Context, runtime.Object) error { return nil }, &metav1.CreateOptions{})

if tt.wantErr {
assert.Error(t, err)
return
}

acc, err := meta.Accessor(obj)
assert.NoError(t, err)
assert.Equal(t, acc.GetResourceVersion(), "1")
assert.NotEqual(t, obj, anotherObj)
})
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Async legacy goroutines may still be running when the test completes, causing data races.

Mode3's Create (and Delete, Update, DeleteCollection) fires async goroutines for legacy operations. The test doesn't wait for these goroutines to finish before the test ends and the mock is garbage-collected. Under -race, this can produce data-race reports on the shared mock.

Consider adding a brief time.Sleep or, better, restructuring the mock so the goroutine's legacy call can be awaited (e.g., using mock.Mock.Called channel or sync.WaitGroup injected into the test).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/apiserver/rest/dualwriter_mode3_test.go` around lines 47 - 77, The test
can race with Mode3's async legacy goroutines spawned by
NewDualWriter(...).Create; modify the test to wait for those goroutines to
finish before returning: either add a short synchronization (e.g., a
sync.WaitGroup or a channel) to your legacy mock (legacyStoreMock / mock.Mock)
and have the legacy call decrement/close it, then in the test invoke
dw.Create(...) and wait (wg.Wait() or receive on the channel) before asserting
and exiting, or as a minimal change add a short time.Sleep after dw.Create to
let the background goroutine complete; ensure the synchronization points are
wired into the setupLegacyFn so the async legacy invocation is awaited.

tt.setupStorageFn(m)
}

dw := NewDualWriter(Mode3, ls, us, p)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

p (Prometheus registry) is not defined in TestMode3_Create; relies on a package-level variable.

In TestMode3_Get (Line 119), a local p := prometheus.NewRegistry() is created, but TestMode3_Create, TestMode3_List, TestMode3_Delete, TestMode3_DeleteCollection, and TestMode3_Update all reference p without a local definition. This likely resolves to a package-level var p defined in another test file (e.g., dualwriter_mode1_test.go).

This inconsistency is fragile — if the package-level p is removed or renamed, five of the six tests break. Either use the package-level registry consistently in all tests (removing the local in TestMode3_Get) or define it locally in each test.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/apiserver/rest/dualwriter_mode3_test.go` at line 63, Tests in this file
inconsistently reference the Prometheus registry variable p (TestMode3_Get
defines a local p while TestMode3_Create, TestMode3_List, TestMode3_Delete,
TestMode3_DeleteCollection, and TestMode3_Update rely on a package-level p), so
make the usage consistent: either remove the local p in TestMode3_Get and use
the package-level registry everywhere, or preferably define a local prometheus
registry in each test that currently uses p (e.g., add p :=
prometheus.NewRegistry() at the top of TestMode3_Create, TestMode3_List,
TestMode3_Delete, TestMode3_DeleteCollection, and TestMode3_Update) so
NewDualWriter(Mode3, ls, us, p) always receives a locally defined registry and
tests no longer depend on a package-level variable.

Comment on lines 43 to 46
if err != nil {
log.Error(err, "unable to create object in storage")
d.recordLegacyDuration(true, mode3Str, options.Kind, method, startStorage)
return created, err

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Bug: Recording wrong metric on storage Create failure.

When Storage.Create fails, d.recordLegacyDuration is called instead of d.recordStorageDuration. This misattributes storage errors to the legacy metric.

🐛 Proposed fix
 	if err != nil {
 		log.Error(err, "unable to create object in storage")
-		d.recordLegacyDuration(true, mode3Str, options.Kind, method, startStorage)
+		d.recordStorageDuration(true, mode3Str, options.Kind, method, startStorage)
 		return created, err
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if err != nil {
log.Error(err, "unable to create object in storage")
d.recordLegacyDuration(true, mode3Str, options.Kind, method, startStorage)
return created, err
if err != nil {
log.Error(err, "unable to create object in storage")
d.recordStorageDuration(true, mode3Str, options.Kind, method, startStorage)
return created, err
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/apiserver/rest/dualwriter_mode3.go` around lines 43 - 46, The error
branch after Storage.Create currently calls d.recordLegacyDuration,
misattributing storage failures; replace that call with d.recordStorageDuration
using the same parameters (pass the error flag, mode3Str, options.Kind, method,
startStorage) so storage Create failures are recorded to the storage metric
instead of the legacy metric (update in the same function/block where
Storage.Create is handled).

Comment on lines +50 to +57
go func() {
ctx, cancel := context.WithTimeoutCause(ctx, time.Second*10, errors.New("legacy create timeout"))
defer cancel()

startLegacy := time.Now()
_, errObjectSt := d.Legacy.Create(ctx, obj, createValidation, options)
d.recordLegacyDuration(errObjectSt != nil, mode3Str, options.Kind, method, startLegacy)
}()

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check Go version used in the project to verify if context.WithoutCancel is available (Go 1.21+)
cat go.mod | head -5

# Check if other dual writer modes handle this differently
rg -n "go func" pkg/apiserver/rest/dualwriter_mode*.go -A5

Repository: AI-Code-Review-Evals/coderabbit-grafana

Length of output: 5755


🏁 Script executed:

#!/bin/bash
# Read the full methods around the problematic lines to understand the control flow
echo "=== dualwriter_mode3.go Create method (lines 40-70) ==="
sed -n '40,70p' pkg/apiserver/rest/dualwriter_mode3.go

echo ""
echo "=== dualwriter_mode3.go Delete method (lines 95-120) ==="
sed -n '95,120p' pkg/apiserver/rest/dualwriter_mode3.go

echo ""
echo "=== Check if there's any WaitGroup or sync mechanism ==="
rg -n "sync\." pkg/apiserver/rest/dualwriter_mode3.go

echo ""
echo "=== Check context handling in the broader file ==="
rg -n "context\." pkg/apiserver/rest/dualwriter_mode3.go | head -20

Repository: AI-Code-Review-Evals/coderabbit-grafana

Length of output: 4324


🏁 Script executed:

#!/bin/bash
# Check if the methods are part of an HTTP handler context
echo "=== Check method signatures and callers ==="
rg -n "DualWriterMode3" pkg/apiserver/rest/ --type go | grep -E "(Create|Delete|Update|DeleteCollection)" | head -10

echo ""
echo "=== Check if there's any shutdown/graceful handling ==="
rg -n "shutdown\|graceful\|wait" pkg/apiserver/rest/dualwriter_mode3.go

echo ""
echo "=== Verify that this is indeed called from HTTP handlers ==="
fd -t f "*.go" pkg/apiserver/rest/ | xargs rg -l "DualWriterMode3" | head -5

echo ""
echo "=== Read full file length to understand scope ==="
wc -l pkg/apiserver/rest/dualwriter_mode3.go
</parameter>

Repository: AI-Code-Review-Evals/coderabbit-grafana

Length of output: 1819


🏁 Script executed:

#!/bin/bash
# Check the dualwriter.go to understand the architecture
echo "=== Read dualwriter.go to understand context flow ==="
wc -l pkg/apiserver/rest/dualwriter.go
head -100 pkg/apiserver/rest/dualwriter.go

echo ""
echo "=== Check where DualWriterMode3 is instantiated and used ==="
rg -n "DualWriterMode3" pkg/apiserver/rest/dualwriter.go | head -20

Repository: AI-Code-Review-Evals/coderabbit-grafana

Length of output: 3665


🏁 Script executed:

#!/bin/bash
# Check how these Storage interface methods are typically called in Kubernetes API server
echo "=== Search for HTTP handler usage pattern ==="
rg -n "rest.Storage" pkg/apiserver/rest/dualwriter.go -A2 -B2 | head -40

echo ""
echo "=== Check if there's any documentation about the intended behavior ==="
head -80 pkg/apiserver/rest/dualwriter_mode3.go | tail -50

echo ""
echo "=== Verify the issue: test if context cancellation would stop the goroutine ==="
# This is a conceptual check - context.WithTimeoutCause behavior with parent cancellation
echo "=== Context behavior check (documentation confirmation) ==="
rg -n "WithTimeoutCause\|WithoutCancel" pkg/apiserver/rest/
</parameter>

Repository: AI-Code-Review-Evals/coderabbit-grafana

Length of output: 3126


🏁 Script executed:

#!/bin/bash
# Check context cancellation behavior
echo "=== Check how context is passed through ==="
grep -n "ctx.*context" pkg/apiserver/rest/dualwriter_mode3.go | head -15

echo ""
echo "=== Verify: Does WithTimeoutCause inherit parent's cancellation? ==="
echo "According to Go documentation, context.WithTimeoutCause creates a child context."
echo "If the parent context is cancelled, all child contexts are cancelled as well."
echo "This is documented in the context package behavior."

echo ""
echo "=== Check if there's any graceful shutdown mechanism ==="
rg -n "Shutdown\|GracefulStop\|Wait" pkg/apiserver/rest/ -l

echo ""
echo "=== Read the complete Create, Delete, Update methods to confirm the flow ==="
sed -n '36,62p' pkg/apiserver/rest/dualwriter_mode3.go
echo "---"
sed -n '94,119p' pkg/apiserver/rest/dualwriter_mode3.go
echo "---"
sed -n '120,146p' pkg/apiserver/rest/dualwriter_mode3.go
</parameter>

Repository: AI-Code-Review-Evals/coderabbit-grafana

Length of output: 5872


Use context.WithoutCancel() to prevent premature cancellation of async legacy writes.

Goroutines launching legacy write operations derive their timeout context from the request context. Since the HTTP handler returns immediately after launching the goroutine (without waiting), the request context gets cancelled when the response is sent. Because context.WithTimeoutCause creates a child context that inherits parent cancellation, these background operations will be cancelled as soon as the parent is, rendering the 10-second timeout ineffective.

Replace context.WithTimeoutCause(ctx, ...) with context.WithTimeoutCause(context.WithoutCancel(ctx), ...) to decouple the background operation from the request lifecycle. This pattern appears in the Create, Delete, Update, and DeleteCollection methods.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/apiserver/rest/dualwriter_mode3.go` around lines 50 - 57, The goroutine
for async legacy writes uses context.WithTimeoutCause(ctx, ...) which inherits
the request context and gets cancelled when the handler returns; update the code
to wrap the request context with context.WithoutCancel(ctx) before calling
context.WithTimeoutCause so the 10s timeout remains effective (e.g., replace
context.WithTimeoutCause(ctx, ...) with
context.WithTimeoutCause(context.WithoutCancel(ctx), ...) in the Create
goroutine that calls d.Legacy.Create), and apply the same pattern in the other
affected async legacy write goroutines (Delete, Update, DeleteCollection) so
their d.Legacy.<Create|Delete|Update|DeleteCollection> calls use the decoupled
timeout context.

Comment on lines +94 to +106
func (d *DualWriterMode3) Delete(ctx context.Context, name string, deleteValidation rest.ValidateObjectFunc, options *metav1.DeleteOptions) (runtime.Object, bool, error) {
var method = "delete"
log := d.Log.WithValues("name", name, "kind", options.Kind, "method", method)
ctx = klog.NewContext(ctx, d.Log)

startStorage := time.Now()
res, async, err := d.Storage.Delete(ctx, name, deleteValidation, options)
if err != nil {
log.Error(err, "unable to delete object in storage")
d.recordStorageDuration(true, mode3Str, options.Kind, method, startStorage)
return res, async, err
}
d.recordStorageDuration(false, mode3Str, name, method, startStorage)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Two bugs in Delete: wrong logger context and wrong metric label.

  1. Line 97: klog.NewContext(ctx, d.Log) uses the base d.Log instead of the enriched log variable (which includes name, kind, method). Every other method passes log.

  2. Line 106: On the success path, d.recordStorageDuration(false, mode3Str, name, method, startStorage) passes name (the object name) as the kind parameter, while Line 103 (error path) correctly passes options.Kind. This will produce incorrect/meaningless metric labels on success.

🐛 Proposed fix
-	ctx = klog.NewContext(ctx, d.Log)
+	ctx = klog.NewContext(ctx, log)
-	d.recordStorageDuration(false, mode3Str, name, method, startStorage)
+	d.recordStorageDuration(false, mode3Str, options.Kind, method, startStorage)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/apiserver/rest/dualwriter_mode3.go` around lines 94 - 106, In Delete
(DualWriterMode3.Delete) fix two bugs: replace klog.NewContext(ctx, d.Log) with
klog.NewContext(ctx, log) so the enriched logger (log) is used in the context,
and on the success path call d.recordStorageDuration(false, mode3Str,
options.Kind, method, startStorage) instead of passing name so the metric's kind
label uses options.Kind; keep all other args (mode3Str, method, startStorage)
unchanged.

Comment on lines +126 to 131
res, async, err := d.Storage.Update(ctx, name, objInfo, createValidation, updateValidation, forceAllowCreate, options)
if err != nil {
log.WithValues("object", updated).Error(err, "could not update or create object")
return nil, false, err
}
objInfo = &updateWrapper{
upstream: objInfo,
updated: updated,
log.Error(err, "unable to update in storage")
d.recordLegacyDuration(true, mode3Str, options.Kind, method, startStorage)
return res, async, err
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Bug: Recording wrong metric on storage Update failure.

Same issue as Create: when Storage.Update fails, d.recordLegacyDuration is called instead of d.recordStorageDuration. This misattributes storage errors to the legacy metric.

🐛 Proposed fix
 	if err != nil {
 		log.Error(err, "unable to update in storage")
-		d.recordLegacyDuration(true, mode3Str, options.Kind, method, startStorage)
+		d.recordStorageDuration(true, mode3Str, options.Kind, method, startStorage)
 		return res, async, err
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
res, async, err := d.Storage.Update(ctx, name, objInfo, createValidation, updateValidation, forceAllowCreate, options)
if err != nil {
log.WithValues("object", updated).Error(err, "could not update or create object")
return nil, false, err
}
objInfo = &updateWrapper{
upstream: objInfo,
updated: updated,
log.Error(err, "unable to update in storage")
d.recordLegacyDuration(true, mode3Str, options.Kind, method, startStorage)
return res, async, err
}
res, async, err := d.Storage.Update(ctx, name, objInfo, createValidation, updateValidation, forceAllowCreate, options)
if err != nil {
log.Error(err, "unable to update in storage")
d.recordStorageDuration(true, mode3Str, options.Kind, method, startStorage)
return res, async, err
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/apiserver/rest/dualwriter_mode3.go` around lines 126 - 131, When
d.Storage.Update fails, the code incorrectly calls d.recordLegacyDuration(...)
instead of the storage metric function; replace the call to
d.recordLegacyDuration(true, mode3Str, options.Kind, method, startStorage) with
d.recordStorageDuration(true, mode3Str, options.Kind, method, startStorage) so
storage errors are attributed to the correct metric (look for the failure
handling around d.Storage.Update and the existing recordStorageDuration usage
elsewhere to mirror parameters).

Comment on lines +161 to +167
go func() {
startLegacy := time.Now()
ctx, cancel := context.WithTimeoutCause(ctx, time.Second*10, errors.New("legacy deletecollection timeout"))
defer cancel()
_, err := d.Legacy.DeleteCollection(ctx, deleteValidation, options, listOptions)
d.recordStorageDuration(err != nil, mode3Str, options.Kind, method, startLegacy)
}()

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Bug: Legacy DeleteCollection goroutine records to storage metric instead of legacy metric.

d.recordStorageDuration is called for the async legacy DeleteCollection operation. This should be d.recordLegacyDuration to correctly attribute the metric.

🐛 Proposed fix
 	go func() {
 		startLegacy := time.Now()
 		ctx, cancel := context.WithTimeoutCause(ctx, time.Second*10, errors.New("legacy deletecollection timeout"))
 		defer cancel()
 		_, err := d.Legacy.DeleteCollection(ctx, deleteValidation, options, listOptions)
-		d.recordStorageDuration(err != nil, mode3Str, options.Kind, method, startLegacy)
+		d.recordLegacyDuration(err != nil, mode3Str, options.Kind, method, startLegacy)
 	}()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
go func() {
startLegacy := time.Now()
ctx, cancel := context.WithTimeoutCause(ctx, time.Second*10, errors.New("legacy deletecollection timeout"))
defer cancel()
_, err := d.Legacy.DeleteCollection(ctx, deleteValidation, options, listOptions)
d.recordStorageDuration(err != nil, mode3Str, options.Kind, method, startLegacy)
}()
go func() {
startLegacy := time.Now()
ctx, cancel := context.WithTimeoutCause(ctx, time.Second*10, errors.New("legacy deletecollection timeout"))
defer cancel()
_, err := d.Legacy.DeleteCollection(ctx, deleteValidation, options, listOptions)
d.recordLegacyDuration(err != nil, mode3Str, options.Kind, method, startLegacy)
}()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/apiserver/rest/dualwriter_mode3.go` around lines 161 - 167, The async
goroutine invoking d.Legacy.DeleteCollection is incorrectly calling
d.recordStorageDuration; update that call to d.recordLegacyDuration so the
metric is attributed to the legacy path. In the goroutine that creates
ctx/cancel and calls d.Legacy.DeleteCollection (look for startLegacy, ctx :=
context.WithTimeoutCause(...), and the DeleteCollection call), replace the
d.recordStorageDuration(err != nil, mode3Str, options.Kind, method, startLegacy)
invocation with d.recordLegacyDuration(err != nil, mode3Str, options.Kind,
method, startLegacy) to record legacy duration metrics.

@github-actions github-actions Bot removed the stale label Feb 24, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants