Dual Storage Architecture#7
Conversation
* Dual writer: mode 3 * Add integration tests for playlits in mode 3 * Remove todo * Update pkg/apiserver/rest/dualwriter_mode3.go Co-authored-by: Arati R. <33031346+suntala@users.noreply.github.com> * Admin: Fixes an issue where user accounts could not be enabled (#88117) Fix: unable to enable user * [REVIEW] FInish mode 3 and add tests * Improve logging * Update dependencies * Update pkg/apiserver/rest/dualwriter_mode3_test.go Co-authored-by: maicon <maiconscosta@gmail.com> * remove test assertion * Use mode log when dual writer is initiated --------- Co-authored-by: Arati R. <33031346+suntala@users.noreply.github.com> Co-authored-by: gonvee <gonvee@qq.com> Co-authored-by: maicon <maiconscosta@gmail.com>
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
📝 WalkthroughWalkthroughThe changes implement DualWriterMode3 functionality with synchronous primary storage operations and asynchronous legacy operations using timeouts. Mode-specific logging is added throughout, and comprehensive test coverage is provided for dual-writer interactions across multiple storage backends. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant DualWriterMode3
participant Storage
participant Legacy
participant Timeout
Client->>DualWriterMode3: Create/Update/Delete request
DualWriterMode3->>Storage: Execute operation synchronously
activate Storage
Storage-->>DualWriterMode3: Result + Duration
deactivate Storage
DualWriterMode3->>DualWriterMode3: Record storage duration
DualWriterMode3-->>Client: Return result immediately
par Asynchronous Legacy Operation
DualWriterMode3->>Timeout: Create 10s timeout context
activate Timeout
DualWriterMode3->>Legacy: Execute operation asynchronously
activate Legacy
Legacy-->>DualWriterMode3: Result or timeout
deactivate Legacy
deactivate Timeout
DualWriterMode3->>DualWriterMode3: Record legacy duration
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Tip Issue Planner is now in beta. Read the docs and try it out! Share your feedback on Discord. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 2 weeks if no further activity occurs. Please feel free to give a status update or ping for review. Thank you for your contributions! |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 7
🧹 Nitpick comments (3)
pkg/apiserver/rest/dualwriter_mode3_test.go (2)
49-54: Shared*mock.Mockacross legacy and storage mocks conflates call expectations.Both
legacyStoreMockandstorageMockuse the same*mock.Mockinstance, making it impossible to verify that a call was routed to the correct store. For instance, in the Create error case (Lines 40-41), the expectation is set up viasetupLegacyFnon the shared mock, but it's actuallyStorage.Createthat matches it at runtime.This pattern exists in all test functions in this file. Consider using separate
*mock.Mockinstances for each store so you can independently assert which backend was invoked.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/apiserver/rest/dualwriter_mode3_test.go` around lines 49 - 54, The tests use a single *mock.Mock instance for both legacyStoreMock and storageMock which conflates expectations; create two distinct mocks (e.g., legacyMock and storageMockInst) and pass them into legacyStoreMock and storageMock respectively so expectations set via setupLegacyFn attach only to legacyMock and expectations for Storage.Create/other methods attach only to storageMockInst; update all test functions in this file that currently construct ls := legacyStoreMock{m, l} and us := storageMock{m, s} to use separate mock instances and adjust the setup calls (setupLegacyFn, setupStorageFn / Storage.Create expectations) accordingly so each backend’s calls can be asserted independently.
72-74: ResourceVersion assertion is fragile and depends on mock internals.
assert.Equal(t, acc.GetResourceVersion(), "1")assumes that the mockexampleObjalways has ResourceVersion"1". This doesn't actually test that Mode3 sets or preserves the resource version correctly — it just mirrors whatever the mock returns. This assertion adds little value and will break if the shared test fixture changes.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/apiserver/rest/dualwriter_mode3_test.go` around lines 72 - 74, The ResourceVersion assertion using acc.GetResourceVersion() is fragile because it depends on the mock fixture (exampleObj) and doesn't validate Mode3 behavior; instead, remove the hardcoded assert.Equal on acc.GetResourceVersion() or replace it with a behavior-driven check: if you need to assert preservation, capture the resource version from meta.Accessor(obj) before invoking the Mode3 operation and then assert the post-operation meta.Accessor(obj).GetResourceVersion() equals that captured value; otherwise simply drop the ResourceVersion assertion to avoid coupling tests to mock internals.pkg/apiserver/rest/dualwriter_mode3.go (1)
25-25: Long initialization line hinders readability.Consider breaking the struct literal across multiple lines.
♻️ Suggested refactor
- return &DualWriterMode3{Legacy: legacy, Storage: storage, Log: klog.NewKlogr().WithName("DualWriterMode3").WithValues("mode", mode3Str), dualWriterMetrics: dwm} + return &DualWriterMode3{ + Legacy: legacy, + Storage: storage, + Log: klog.NewKlogr().WithName("DualWriterMode3").WithValues("mode", mode3Str), + dualWriterMetrics: dwm, + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/apiserver/rest/dualwriter_mode3.go` at line 25, The return line constructs a DualWriterMode3 with a very long single-line struct literal; split the struct literal across multiple lines for readability by placing each field on its own line (e.g., Legacy: legacy, Storage: storage, Log: klog.NewKlogr().WithName("DualWriterMode3").WithValues("mode", mode3Str), dualWriterMetrics: dwm) so the fields of DualWriterMode3 are easy to scan and maintain; ensure trailing commas and proper indentation are used.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/apiserver/rest/dualwriter_mode3_test.go`:
- Line 63: Tests in this file inconsistently reference the Prometheus registry
variable p (TestMode3_Get defines a local p while TestMode3_Create,
TestMode3_List, TestMode3_Delete, TestMode3_DeleteCollection, and
TestMode3_Update rely on a package-level p), so make the usage consistent:
either remove the local p in TestMode3_Get and use the package-level registry
everywhere, or preferably define a local prometheus registry in each test that
currently uses p (e.g., add p := prometheus.NewRegistry() at the top of
TestMode3_Create, TestMode3_List, TestMode3_Delete, TestMode3_DeleteCollection,
and TestMode3_Update) so NewDualWriter(Mode3, ls, us, p) always receives a
locally defined registry and tests no longer depend on a package-level variable.
- Around line 47-77: The test can race with Mode3's async legacy goroutines
spawned by NewDualWriter(...).Create; modify the test to wait for those
goroutines to finish before returning: either add a short synchronization (e.g.,
a sync.WaitGroup or a channel) to your legacy mock (legacyStoreMock / mock.Mock)
and have the legacy call decrement/close it, then in the test invoke
dw.Create(...) and wait (wg.Wait() or receive on the channel) before asserting
and exiting, or as a minimal change add a short time.Sleep after dw.Create to
let the background goroutine complete; ensure the synchronization points are
wired into the setupLegacyFn so the async legacy invocation is awaited.
In `@pkg/apiserver/rest/dualwriter_mode3.go`:
- Around line 43-46: The error branch after Storage.Create currently calls
d.recordLegacyDuration, misattributing storage failures; replace that call with
d.recordStorageDuration using the same parameters (pass the error flag,
mode3Str, options.Kind, method, startStorage) so storage Create failures are
recorded to the storage metric instead of the legacy metric (update in the same
function/block where Storage.Create is handled).
- Around line 94-106: In Delete (DualWriterMode3.Delete) fix two bugs: replace
klog.NewContext(ctx, d.Log) with klog.NewContext(ctx, log) so the enriched
logger (log) is used in the context, and on the success path call
d.recordStorageDuration(false, mode3Str, options.Kind, method, startStorage)
instead of passing name so the metric's kind label uses options.Kind; keep all
other args (mode3Str, method, startStorage) unchanged.
- Around line 126-131: When d.Storage.Update fails, the code incorrectly calls
d.recordLegacyDuration(...) instead of the storage metric function; replace the
call to d.recordLegacyDuration(true, mode3Str, options.Kind, method,
startStorage) with d.recordStorageDuration(true, mode3Str, options.Kind, method,
startStorage) so storage errors are attributed to the correct metric (look for
the failure handling around d.Storage.Update and the existing
recordStorageDuration usage elsewhere to mirror parameters).
- Around line 161-167: The async goroutine invoking d.Legacy.DeleteCollection is
incorrectly calling d.recordStorageDuration; update that call to
d.recordLegacyDuration so the metric is attributed to the legacy path. In the
goroutine that creates ctx/cancel and calls d.Legacy.DeleteCollection (look for
startLegacy, ctx := context.WithTimeoutCause(...), and the DeleteCollection
call), replace the d.recordStorageDuration(err != nil, mode3Str, options.Kind,
method, startLegacy) invocation with d.recordLegacyDuration(err != nil,
mode3Str, options.Kind, method, startLegacy) to record legacy duration metrics.
- Around line 50-57: The goroutine for async legacy writes uses
context.WithTimeoutCause(ctx, ...) which inherits the request context and gets
cancelled when the handler returns; update the code to wrap the request context
with context.WithoutCancel(ctx) before calling context.WithTimeoutCause so the
10s timeout remains effective (e.g., replace context.WithTimeoutCause(ctx, ...)
with context.WithTimeoutCause(context.WithoutCancel(ctx), ...) in the Create
goroutine that calls d.Legacy.Create), and apply the same pattern in the other
affected async legacy write goroutines (Delete, Update, DeleteCollection) so
their d.Legacy.<Create|Delete|Update|DeleteCollection> calls use the decoupled
timeout context.
---
Nitpick comments:
In `@pkg/apiserver/rest/dualwriter_mode3_test.go`:
- Around line 49-54: The tests use a single *mock.Mock instance for both
legacyStoreMock and storageMock which conflates expectations; create two
distinct mocks (e.g., legacyMock and storageMockInst) and pass them into
legacyStoreMock and storageMock respectively so expectations set via
setupLegacyFn attach only to legacyMock and expectations for
Storage.Create/other methods attach only to storageMockInst; update all test
functions in this file that currently construct ls := legacyStoreMock{m, l} and
us := storageMock{m, s} to use separate mock instances and adjust the setup
calls (setupLegacyFn, setupStorageFn / Storage.Create expectations) accordingly
so each backend’s calls can be asserted independently.
- Around line 72-74: The ResourceVersion assertion using
acc.GetResourceVersion() is fragile because it depends on the mock fixture
(exampleObj) and doesn't validate Mode3 behavior; instead, remove the hardcoded
assert.Equal on acc.GetResourceVersion() or replace it with a behavior-driven
check: if you need to assert preservation, capture the resource version from
meta.Accessor(obj) before invoking the Mode3 operation and then assert the
post-operation meta.Accessor(obj).GetResourceVersion() equals that captured
value; otherwise simply drop the ResourceVersion assertion to avoid coupling
tests to mock internals.
In `@pkg/apiserver/rest/dualwriter_mode3.go`:
- Line 25: The return line constructs a DualWriterMode3 with a very long
single-line struct literal; split the struct literal across multiple lines for
readability by placing each field on its own line (e.g., Legacy: legacy,
Storage: storage, Log:
klog.NewKlogr().WithName("DualWriterMode3").WithValues("mode", mode3Str),
dualWriterMetrics: dwm) so the fields of DualWriterMode3 are easy to scan and
maintain; ensure trailing commas and proper indentation are used.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
go.work.sumis excluded by!**/*.sum
📒 Files selected for processing (4)
pkg/apiserver/rest/dualwriter_mode1_test.gopkg/apiserver/rest/dualwriter_mode3.gopkg/apiserver/rest/dualwriter_mode3_test.gopkg/tests/apis/playlist/playlist_test.go
💤 Files with no reviewable changes (1)
- pkg/apiserver/rest/dualwriter_mode1_test.go
| for _, tt := range tests { | ||
| t.Run(tt.name, func(t *testing.T) { | ||
| l := (LegacyStorage)(nil) | ||
| s := (Storage)(nil) | ||
| m := &mock.Mock{} | ||
|
|
||
| ls := legacyStoreMock{m, l} | ||
| us := storageMock{m, s} | ||
|
|
||
| if tt.setupLegacyFn != nil { | ||
| tt.setupLegacyFn(m, tt.input) | ||
| } | ||
| if tt.setupStorageFn != nil { | ||
| tt.setupStorageFn(m) | ||
| } | ||
|
|
||
| dw := NewDualWriter(Mode3, ls, us, p) | ||
|
|
||
| obj, err := dw.Create(context.Background(), tt.input, func(context.Context, runtime.Object) error { return nil }, &metav1.CreateOptions{}) | ||
|
|
||
| if tt.wantErr { | ||
| assert.Error(t, err) | ||
| return | ||
| } | ||
|
|
||
| acc, err := meta.Accessor(obj) | ||
| assert.NoError(t, err) | ||
| assert.Equal(t, acc.GetResourceVersion(), "1") | ||
| assert.NotEqual(t, obj, anotherObj) | ||
| }) | ||
| } |
There was a problem hiding this comment.
Async legacy goroutines may still be running when the test completes, causing data races.
Mode3's Create (and Delete, Update, DeleteCollection) fires async goroutines for legacy operations. The test doesn't wait for these goroutines to finish before the test ends and the mock is garbage-collected. Under -race, this can produce data-race reports on the shared mock.
Consider adding a brief time.Sleep or, better, restructuring the mock so the goroutine's legacy call can be awaited (e.g., using mock.Mock.Called channel or sync.WaitGroup injected into the test).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/apiserver/rest/dualwriter_mode3_test.go` around lines 47 - 77, The test
can race with Mode3's async legacy goroutines spawned by
NewDualWriter(...).Create; modify the test to wait for those goroutines to
finish before returning: either add a short synchronization (e.g., a
sync.WaitGroup or a channel) to your legacy mock (legacyStoreMock / mock.Mock)
and have the legacy call decrement/close it, then in the test invoke
dw.Create(...) and wait (wg.Wait() or receive on the channel) before asserting
and exiting, or as a minimal change add a short time.Sleep after dw.Create to
let the background goroutine complete; ensure the synchronization points are
wired into the setupLegacyFn so the async legacy invocation is awaited.
| tt.setupStorageFn(m) | ||
| } | ||
|
|
||
| dw := NewDualWriter(Mode3, ls, us, p) |
There was a problem hiding this comment.
p (Prometheus registry) is not defined in TestMode3_Create; relies on a package-level variable.
In TestMode3_Get (Line 119), a local p := prometheus.NewRegistry() is created, but TestMode3_Create, TestMode3_List, TestMode3_Delete, TestMode3_DeleteCollection, and TestMode3_Update all reference p without a local definition. This likely resolves to a package-level var p defined in another test file (e.g., dualwriter_mode1_test.go).
This inconsistency is fragile — if the package-level p is removed or renamed, five of the six tests break. Either use the package-level registry consistently in all tests (removing the local in TestMode3_Get) or define it locally in each test.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/apiserver/rest/dualwriter_mode3_test.go` at line 63, Tests in this file
inconsistently reference the Prometheus registry variable p (TestMode3_Get
defines a local p while TestMode3_Create, TestMode3_List, TestMode3_Delete,
TestMode3_DeleteCollection, and TestMode3_Update rely on a package-level p), so
make the usage consistent: either remove the local p in TestMode3_Get and use
the package-level registry everywhere, or preferably define a local prometheus
registry in each test that currently uses p (e.g., add p :=
prometheus.NewRegistry() at the top of TestMode3_Create, TestMode3_List,
TestMode3_Delete, TestMode3_DeleteCollection, and TestMode3_Update) so
NewDualWriter(Mode3, ls, us, p) always receives a locally defined registry and
tests no longer depend on a package-level variable.
| if err != nil { | ||
| log.Error(err, "unable to create object in storage") | ||
| d.recordLegacyDuration(true, mode3Str, options.Kind, method, startStorage) | ||
| return created, err |
There was a problem hiding this comment.
Bug: Recording wrong metric on storage Create failure.
When Storage.Create fails, d.recordLegacyDuration is called instead of d.recordStorageDuration. This misattributes storage errors to the legacy metric.
🐛 Proposed fix
if err != nil {
log.Error(err, "unable to create object in storage")
- d.recordLegacyDuration(true, mode3Str, options.Kind, method, startStorage)
+ d.recordStorageDuration(true, mode3Str, options.Kind, method, startStorage)
return created, err
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if err != nil { | |
| log.Error(err, "unable to create object in storage") | |
| d.recordLegacyDuration(true, mode3Str, options.Kind, method, startStorage) | |
| return created, err | |
| if err != nil { | |
| log.Error(err, "unable to create object in storage") | |
| d.recordStorageDuration(true, mode3Str, options.Kind, method, startStorage) | |
| return created, err | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/apiserver/rest/dualwriter_mode3.go` around lines 43 - 46, The error
branch after Storage.Create currently calls d.recordLegacyDuration,
misattributing storage failures; replace that call with d.recordStorageDuration
using the same parameters (pass the error flag, mode3Str, options.Kind, method,
startStorage) so storage Create failures are recorded to the storage metric
instead of the legacy metric (update in the same function/block where
Storage.Create is handled).
| go func() { | ||
| ctx, cancel := context.WithTimeoutCause(ctx, time.Second*10, errors.New("legacy create timeout")) | ||
| defer cancel() | ||
|
|
||
| startLegacy := time.Now() | ||
| _, errObjectSt := d.Legacy.Create(ctx, obj, createValidation, options) | ||
| d.recordLegacyDuration(errObjectSt != nil, mode3Str, options.Kind, method, startLegacy) | ||
| }() |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check Go version used in the project to verify if context.WithoutCancel is available (Go 1.21+)
cat go.mod | head -5
# Check if other dual writer modes handle this differently
rg -n "go func" pkg/apiserver/rest/dualwriter_mode*.go -A5Repository: AI-Code-Review-Evals/coderabbit-grafana
Length of output: 5755
🏁 Script executed:
#!/bin/bash
# Read the full methods around the problematic lines to understand the control flow
echo "=== dualwriter_mode3.go Create method (lines 40-70) ==="
sed -n '40,70p' pkg/apiserver/rest/dualwriter_mode3.go
echo ""
echo "=== dualwriter_mode3.go Delete method (lines 95-120) ==="
sed -n '95,120p' pkg/apiserver/rest/dualwriter_mode3.go
echo ""
echo "=== Check if there's any WaitGroup or sync mechanism ==="
rg -n "sync\." pkg/apiserver/rest/dualwriter_mode3.go
echo ""
echo "=== Check context handling in the broader file ==="
rg -n "context\." pkg/apiserver/rest/dualwriter_mode3.go | head -20Repository: AI-Code-Review-Evals/coderabbit-grafana
Length of output: 4324
🏁 Script executed:
#!/bin/bash
# Check if the methods are part of an HTTP handler context
echo "=== Check method signatures and callers ==="
rg -n "DualWriterMode3" pkg/apiserver/rest/ --type go | grep -E "(Create|Delete|Update|DeleteCollection)" | head -10
echo ""
echo "=== Check if there's any shutdown/graceful handling ==="
rg -n "shutdown\|graceful\|wait" pkg/apiserver/rest/dualwriter_mode3.go
echo ""
echo "=== Verify that this is indeed called from HTTP handlers ==="
fd -t f "*.go" pkg/apiserver/rest/ | xargs rg -l "DualWriterMode3" | head -5
echo ""
echo "=== Read full file length to understand scope ==="
wc -l pkg/apiserver/rest/dualwriter_mode3.go
</parameter>Repository: AI-Code-Review-Evals/coderabbit-grafana
Length of output: 1819
🏁 Script executed:
#!/bin/bash
# Check the dualwriter.go to understand the architecture
echo "=== Read dualwriter.go to understand context flow ==="
wc -l pkg/apiserver/rest/dualwriter.go
head -100 pkg/apiserver/rest/dualwriter.go
echo ""
echo "=== Check where DualWriterMode3 is instantiated and used ==="
rg -n "DualWriterMode3" pkg/apiserver/rest/dualwriter.go | head -20Repository: AI-Code-Review-Evals/coderabbit-grafana
Length of output: 3665
🏁 Script executed:
#!/bin/bash
# Check how these Storage interface methods are typically called in Kubernetes API server
echo "=== Search for HTTP handler usage pattern ==="
rg -n "rest.Storage" pkg/apiserver/rest/dualwriter.go -A2 -B2 | head -40
echo ""
echo "=== Check if there's any documentation about the intended behavior ==="
head -80 pkg/apiserver/rest/dualwriter_mode3.go | tail -50
echo ""
echo "=== Verify the issue: test if context cancellation would stop the goroutine ==="
# This is a conceptual check - context.WithTimeoutCause behavior with parent cancellation
echo "=== Context behavior check (documentation confirmation) ==="
rg -n "WithTimeoutCause\|WithoutCancel" pkg/apiserver/rest/
</parameter>Repository: AI-Code-Review-Evals/coderabbit-grafana
Length of output: 3126
🏁 Script executed:
#!/bin/bash
# Check context cancellation behavior
echo "=== Check how context is passed through ==="
grep -n "ctx.*context" pkg/apiserver/rest/dualwriter_mode3.go | head -15
echo ""
echo "=== Verify: Does WithTimeoutCause inherit parent's cancellation? ==="
echo "According to Go documentation, context.WithTimeoutCause creates a child context."
echo "If the parent context is cancelled, all child contexts are cancelled as well."
echo "This is documented in the context package behavior."
echo ""
echo "=== Check if there's any graceful shutdown mechanism ==="
rg -n "Shutdown\|GracefulStop\|Wait" pkg/apiserver/rest/ -l
echo ""
echo "=== Read the complete Create, Delete, Update methods to confirm the flow ==="
sed -n '36,62p' pkg/apiserver/rest/dualwriter_mode3.go
echo "---"
sed -n '94,119p' pkg/apiserver/rest/dualwriter_mode3.go
echo "---"
sed -n '120,146p' pkg/apiserver/rest/dualwriter_mode3.go
</parameter>Repository: AI-Code-Review-Evals/coderabbit-grafana
Length of output: 5872
Use context.WithoutCancel() to prevent premature cancellation of async legacy writes.
Goroutines launching legacy write operations derive their timeout context from the request context. Since the HTTP handler returns immediately after launching the goroutine (without waiting), the request context gets cancelled when the response is sent. Because context.WithTimeoutCause creates a child context that inherits parent cancellation, these background operations will be cancelled as soon as the parent is, rendering the 10-second timeout ineffective.
Replace context.WithTimeoutCause(ctx, ...) with context.WithTimeoutCause(context.WithoutCancel(ctx), ...) to decouple the background operation from the request lifecycle. This pattern appears in the Create, Delete, Update, and DeleteCollection methods.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/apiserver/rest/dualwriter_mode3.go` around lines 50 - 57, The goroutine
for async legacy writes uses context.WithTimeoutCause(ctx, ...) which inherits
the request context and gets cancelled when the handler returns; update the code
to wrap the request context with context.WithoutCancel(ctx) before calling
context.WithTimeoutCause so the 10s timeout remains effective (e.g., replace
context.WithTimeoutCause(ctx, ...) with
context.WithTimeoutCause(context.WithoutCancel(ctx), ...) in the Create
goroutine that calls d.Legacy.Create), and apply the same pattern in the other
affected async legacy write goroutines (Delete, Update, DeleteCollection) so
their d.Legacy.<Create|Delete|Update|DeleteCollection> calls use the decoupled
timeout context.
| func (d *DualWriterMode3) Delete(ctx context.Context, name string, deleteValidation rest.ValidateObjectFunc, options *metav1.DeleteOptions) (runtime.Object, bool, error) { | ||
| var method = "delete" | ||
| log := d.Log.WithValues("name", name, "kind", options.Kind, "method", method) | ||
| ctx = klog.NewContext(ctx, d.Log) | ||
|
|
||
| startStorage := time.Now() | ||
| res, async, err := d.Storage.Delete(ctx, name, deleteValidation, options) | ||
| if err != nil { | ||
| log.Error(err, "unable to delete object in storage") | ||
| d.recordStorageDuration(true, mode3Str, options.Kind, method, startStorage) | ||
| return res, async, err | ||
| } | ||
| d.recordStorageDuration(false, mode3Str, name, method, startStorage) |
There was a problem hiding this comment.
Two bugs in Delete: wrong logger context and wrong metric label.
-
Line 97:
klog.NewContext(ctx, d.Log)uses the based.Loginstead of the enrichedlogvariable (which includesname,kind,method). Every other method passeslog. -
Line 106: On the success path,
d.recordStorageDuration(false, mode3Str, name, method, startStorage)passesname(the object name) as thekindparameter, while Line 103 (error path) correctly passesoptions.Kind. This will produce incorrect/meaningless metric labels on success.
🐛 Proposed fix
- ctx = klog.NewContext(ctx, d.Log)
+ ctx = klog.NewContext(ctx, log)- d.recordStorageDuration(false, mode3Str, name, method, startStorage)
+ d.recordStorageDuration(false, mode3Str, options.Kind, method, startStorage)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/apiserver/rest/dualwriter_mode3.go` around lines 94 - 106, In Delete
(DualWriterMode3.Delete) fix two bugs: replace klog.NewContext(ctx, d.Log) with
klog.NewContext(ctx, log) so the enriched logger (log) is used in the context,
and on the success path call d.recordStorageDuration(false, mode3Str,
options.Kind, method, startStorage) instead of passing name so the metric's kind
label uses options.Kind; keep all other args (mode3Str, method, startStorage)
unchanged.
| res, async, err := d.Storage.Update(ctx, name, objInfo, createValidation, updateValidation, forceAllowCreate, options) | ||
| if err != nil { | ||
| log.WithValues("object", updated).Error(err, "could not update or create object") | ||
| return nil, false, err | ||
| } | ||
| objInfo = &updateWrapper{ | ||
| upstream: objInfo, | ||
| updated: updated, | ||
| log.Error(err, "unable to update in storage") | ||
| d.recordLegacyDuration(true, mode3Str, options.Kind, method, startStorage) | ||
| return res, async, err | ||
| } |
There was a problem hiding this comment.
Bug: Recording wrong metric on storage Update failure.
Same issue as Create: when Storage.Update fails, d.recordLegacyDuration is called instead of d.recordStorageDuration. This misattributes storage errors to the legacy metric.
🐛 Proposed fix
if err != nil {
log.Error(err, "unable to update in storage")
- d.recordLegacyDuration(true, mode3Str, options.Kind, method, startStorage)
+ d.recordStorageDuration(true, mode3Str, options.Kind, method, startStorage)
return res, async, err
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| res, async, err := d.Storage.Update(ctx, name, objInfo, createValidation, updateValidation, forceAllowCreate, options) | |
| if err != nil { | |
| log.WithValues("object", updated).Error(err, "could not update or create object") | |
| return nil, false, err | |
| } | |
| objInfo = &updateWrapper{ | |
| upstream: objInfo, | |
| updated: updated, | |
| log.Error(err, "unable to update in storage") | |
| d.recordLegacyDuration(true, mode3Str, options.Kind, method, startStorage) | |
| return res, async, err | |
| } | |
| res, async, err := d.Storage.Update(ctx, name, objInfo, createValidation, updateValidation, forceAllowCreate, options) | |
| if err != nil { | |
| log.Error(err, "unable to update in storage") | |
| d.recordStorageDuration(true, mode3Str, options.Kind, method, startStorage) | |
| return res, async, err | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/apiserver/rest/dualwriter_mode3.go` around lines 126 - 131, When
d.Storage.Update fails, the code incorrectly calls d.recordLegacyDuration(...)
instead of the storage metric function; replace the call to
d.recordLegacyDuration(true, mode3Str, options.Kind, method, startStorage) with
d.recordStorageDuration(true, mode3Str, options.Kind, method, startStorage) so
storage errors are attributed to the correct metric (look for the failure
handling around d.Storage.Update and the existing recordStorageDuration usage
elsewhere to mirror parameters).
| go func() { | ||
| startLegacy := time.Now() | ||
| ctx, cancel := context.WithTimeoutCause(ctx, time.Second*10, errors.New("legacy deletecollection timeout")) | ||
| defer cancel() | ||
| _, err := d.Legacy.DeleteCollection(ctx, deleteValidation, options, listOptions) | ||
| d.recordStorageDuration(err != nil, mode3Str, options.Kind, method, startLegacy) | ||
| }() |
There was a problem hiding this comment.
Bug: Legacy DeleteCollection goroutine records to storage metric instead of legacy metric.
d.recordStorageDuration is called for the async legacy DeleteCollection operation. This should be d.recordLegacyDuration to correctly attribute the metric.
🐛 Proposed fix
go func() {
startLegacy := time.Now()
ctx, cancel := context.WithTimeoutCause(ctx, time.Second*10, errors.New("legacy deletecollection timeout"))
defer cancel()
_, err := d.Legacy.DeleteCollection(ctx, deleteValidation, options, listOptions)
- d.recordStorageDuration(err != nil, mode3Str, options.Kind, method, startLegacy)
+ d.recordLegacyDuration(err != nil, mode3Str, options.Kind, method, startLegacy)
}()📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| go func() { | |
| startLegacy := time.Now() | |
| ctx, cancel := context.WithTimeoutCause(ctx, time.Second*10, errors.New("legacy deletecollection timeout")) | |
| defer cancel() | |
| _, err := d.Legacy.DeleteCollection(ctx, deleteValidation, options, listOptions) | |
| d.recordStorageDuration(err != nil, mode3Str, options.Kind, method, startLegacy) | |
| }() | |
| go func() { | |
| startLegacy := time.Now() | |
| ctx, cancel := context.WithTimeoutCause(ctx, time.Second*10, errors.New("legacy deletecollection timeout")) | |
| defer cancel() | |
| _, err := d.Legacy.DeleteCollection(ctx, deleteValidation, options, listOptions) | |
| d.recordLegacyDuration(err != nil, mode3Str, options.Kind, method, startLegacy) | |
| }() |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/apiserver/rest/dualwriter_mode3.go` around lines 161 - 167, The async
goroutine invoking d.Legacy.DeleteCollection is incorrectly calling
d.recordStorageDuration; update that call to d.recordLegacyDuration so the
metric is attributed to the legacy path. In the goroutine that creates
ctx/cancel and calls d.Legacy.DeleteCollection (look for startLegacy, ctx :=
context.WithTimeoutCause(...), and the DeleteCollection call), replace the
d.recordStorageDuration(err != nil, mode3Str, options.Kind, method, startLegacy)
invocation with d.recordLegacyDuration(err != nil, mode3Str, options.Kind,
method, startLegacy) to record legacy duration metrics.
Test 6nn
Summary by CodeRabbit
-
- Added List operation support to DualWriterMode3.
-
- Enhanced logging for DualWriterMode3 operations with mode-specific context.
- Improved asynchronous handling of legacy storage operations with timeout management.
-
- Added comprehensive test suite for DualWriterMode3 CRUD operations.
- Introduced integration tests for dual-write mode across multiple storage backends (file, unified, etcd).
nn---n*Replicated from [ai-code-review-evaluation/grafana-coderabbit#6](https://github.com/ai-code-review-evaluation/grafana-coderabbit/pull/6)*New Features
Improvements
Tests