Skip to content

Unified Storage Performance Optimizations#1

Closed
ShashankFC wants to merge 1 commit into
performance-optimization-baselinefrom
unified-storage-enhancements
Closed

Unified Storage Performance Optimizations#1
ShashankFC wants to merge 1 commit into
performance-optimization-baselinefrom
unified-storage-enhancements

Conversation

@ShashankFC

@ShashankFC ShashankFC commented Feb 13, 2026

Copy link
Copy Markdown

User description

Test 10nn

Summary by CodeRabbit

  • Bug Fixes

    • Fixed context propagation in distributed tracing for API operations.
  • Tests

    • Added conditional skip for Postgres-based test cases.
  • Chores

    • Optimized search index building with refined resource locking.
    • Added search index metrics tracking and logging.

✏️ Tip: You can customize this high-level summary in your review settings.

nn---n*Replicated from [ai-code-review-evaluation/grafana-coderabbit#10](https://github.com/ai-code-review-evaluation/grafana-coderabbit/pull/10)*

CodeAnt-AI Description

Initialize unified resource server at startup; speed up index builds and improve tracing and logging

What Changed

  • Resource server now runs full initialization during creation (startup), and initialization failures surface immediately instead of being invoked lazily on each API call.
  • Removed repeated runtime Init checks from RPC handlers (Create, Update, Delete, Read, List, Watch, Search, History, Origin, IsHealthy, blob operations) so requests no longer trigger initialization; startup must succeed for the server to accept requests.
  • Index building no longer holds the global cache lock while constructing an index; the lock is taken only when inserting the finished index into the cache, reducing contention and speeding concurrent index builds.
  • Tracing context propagation corrected for several search and SQL operations so spans use the returned context; search init now logs duration and total documents after initialization.
  • Integration test tweak: a storage-related integration test is skipped for Postgres in CI where it is known to fail.
  • High index-latency warnings now use the server logger consistently.

Impact

✅ Faster index builds under concurrent load
✅ Fewer runtime initialization surprises (startup fails fast if init fails)
✅ Clearer distributed traces for search and SQL operations

💡 Usage Guide

Checking Your Pull Request

Every time you make a pull request, our system automatically looks through it. We check for security issues, mistakes in how you're setting up your infrastructure, and common code problems. We do this to make sure your changes are solid and won't cause any trouble later.

Talking to CodeAnt AI

Got a question or need a hand with something in your pull request? You can easily get in touch with CodeAnt AI right here. Just type the following in a comment on your pull request, and replace "Your question here" with whatever you want to ask:

@codeant-ai ask: Your question here

This lets you have a chat with CodeAnt AI about your pull request, making it easier to understand and improve your code.

Example

@codeant-ai ask: Can you suggest a safer alternative to storing this secret?

Preserve Org Learnings with CodeAnt

You can record team preferences so CodeAnt AI applies them in future reviews. Reply directly to the specific CodeAnt AI suggestion (in the same thread) and replace "Your feedback here" with your input:

@codeant-ai: Your feedback here

This helps CodeAnt AI learn and adapt to your team's coding style and standards.

Example

@codeant-ai: Do not flag unused imports.

Retrigger review

Ask CodeAnt AI to review the PR again, by typing:

@codeant-ai: review

Check Your Repository Health

To analyze the health of your code repository, visit our dashboard at https://app.codeant.ai. This tool helps you identify potential issues and areas for improvement in your codebase, ensuring your repository maintains high standards of code health.

…#97529)

* dont lazy init unified storage

* Inits index when creating new resource server. Fixes trace propagation by passing span ctx. Update some logging.

* Use finer grained cache locking when building indexes to speed things up. Locking the whole function was slowing things down.

* formatting

* linter fix

* go mod

* make update-workspace

* fix workspaces check error

* update dependency owner in mod file

* wait 1 second before querying metrics

* try with big timeout, see if fixes CI. Wont fail locally.

* skips postgres integration test. Only fails in drone. Will fix later.

* put delay back to 500 ms
@ShashankFC

Copy link
Copy Markdown
Author

@CodeAnt-AI: review

@codeant-ai

codeant-ai Bot commented Feb 13, 2026

Copy link
Copy Markdown

CodeAnt AI is running the review.

@codeant-ai codeant-ai Bot added the size:S This PR changes 10-29 lines, ignoring generated files label Feb 13, 2026
@codeant-ai

codeant-ai Bot commented Feb 13, 2026

Copy link
Copy Markdown

Sequence Diagram

The PR moves ResourceServer initialization to startup (calling Init during NewResourceServer), ensures tracing context is propagated, and initializes the search index and watcher eagerly. Runtime RPC handlers no longer call Init on each request — they assume the server is already initialized. The diagram shows the startup init path and the simplified request path.

sequenceDiagram
    participant Operator
    participant ResourceServer
    participant SearchSupport
    participant Backend
    participant Client

    Operator->>ResourceServer: NewResourceServer(opts) (startup)
    ResourceServer->>ResourceServer: Init(ctx)  // eager init during construction
    ResourceServer->>SearchSupport: init(ctx)  // build indexes & start watcher
    SearchSupport->>Backend: GetResourceStats / BuildIndex (index documents)
    Backend-->>SearchSupport: Index ready
    SearchSupport-->>ResourceServer: Search initialized (metrics logged, watcher running)

    Client->>ResourceServer: API call (Create/Read/List/Search)
    ResourceServer-->>Backend: Handle request (no Init check)
Loading

Generated by CodeAnt AI

@codeant-ai

codeant-ai Bot commented Feb 13, 2026

Copy link
Copy Markdown

Nitpicks 🔍

🔒 No security issues identified
⚡ Recommended areas for review

  • Loop variable capture
    The goroutine started inside the for-loop captures the loop variable info. That causes all goroutines to observe the final info value instead of each iteration's value. Also totalBatchesIndexed++ is incremented from multiple goroutines without synchronization (data race). Fix by capturing a local copy of info per iteration and making the batch counter thread-safe.

  • Concurrent build race
    The implementation removed the earlier top-level lock around BuildIndex, and now only writes to the cache at the end with a lock. This allows concurrent builds for the same key — multiple goroutines may create duplicate indices concurrently, leading to wasted work and subtle races. Consider serializing builds per-key (singleflight or per-key mutex) or re-introducing a stronger lock around the check/build/store sequence.

  • Context propagation
    The PR changed GetResourceStats to capture the returned context from tracer.Start (fixing propagation). However many other methods still use "_, span := b.tracer.Start(...)" and thus do not use the returned context. Verify that all call-sites that start a span also propagate the returned context into downstream calls (DB calls, goroutines, nested operations) so tracing context isn't lost.

  • Watcher context propagation
    The watch loop uses watchctx := context.Background() rather than deriving from the startup ctx. This drops cancellation and tracing context from the caller. If you expect the watcher to inherit cancellation and trace/span info, use the parent ctx (or a child of it) instead of a new background context.

  • Tracing metadata & error reporting
    The new span in GetResourceStats can be improved by attaching relevant attributes (namespace, minCount) and by recording/setting span status on errors from DB operations. Without this, traces will lack searchable metadata and errors won't be reflected in span status.

if err := s.Init(ctx); err != nil {
return nil, err
}
return s.search.History(ctx, req)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: The History RPC calls the search backend unconditionally without checking if search has been configured, which will cause a nil pointer dereference when the server is created without a search backend (for example when the unified storage search feature flag is disabled), unlike the Search RPC which returns a clear error in this case. [null pointer]

Severity Level: Critical 🚨
- ❌ ResourceIndex.History RPC panics when search feature disabled.
- ❌ Unified storage gRPC process can crash on history calls.
- ⚠️ API history endpoints relying on unified index become unusable.
Suggested change
return s.search.History(ctx, req)
if s.search == nil {
return nil, fmt.Errorf("search index not configured")
}
Steps of Reproduction ✅
1. Start the unified storage gRPC service using `ProvideUnifiedStorageGrpcService` at
`pkg/storage/unified/sql/service.go:103-150`, which calls `NewResourceServer` in
`pkg/storage/unified/sql/server.go:21-82`.

2. Run with `featuremgmt.FlagUnifiedStorageSearch` disabled so the `if
features.IsEnabledGlobally(featuremgmt.FlagUnifiedStorageSearch)` block at
`pkg/storage/unified/sql/server.go:59-69` is skipped, leaving `opts.Search` at its zero
value and never setting `opts.Search.Backend` or `opts.Search.Resources`.

3. `NewResourceServer` at `pkg/storage/unified/resource/server.go:182-234` sees
`opts.Search.Resources == nil` and therefore never calls `newSearchSupport`; as a result,
the constructed `server` instance has `s.search == nil`, but the gRPC
`ResourceIndexServer` is still registered unconditionally in
`pkg/storage/unified/sql/service.go:127-131` via
`resource.RegisterResourceIndexServer(srv, server)`.

4. Trigger a History RPC call, for example via the apistore history handler which calls
`r.index.History(ctx, &resource.HistoryRequest{...})` at
`pkg/storage/unified/apistore/history.go:76` (found via Grep); this routes to
`(*server).History` at `pkg/storage/unified/resource/server.go:924-926`, which executes
`return s.search.History(ctx, req)` on a nil `s.search`, causing a `panic: runtime error:
invalid memory address or nil pointer dereference` in the unified storage server.
Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** pkg/storage/unified/resource/server.go
**Line:** 925:925
**Comment:**
	*Null Pointer: The History RPC calls the search backend unconditionally without checking if search has been configured, which will cause a nil pointer dereference when the server is created without a search backend (for example when the unified storage search feature flag is disabled), unlike the Search RPC which returns a clear error in this case.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
👍 | 👎

if err := s.Init(ctx); err != nil {
return nil, err
}
return s.search.Origin(ctx, req)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: The Origin RPC also calls the search backend without checking if search is configured, so on servers started without a search backend this will dereference a nil pointer instead of returning a clear "not configured" error, unlike the Search RPC which already performs this guard. [null pointer]

Severity Level: Critical 🚨
- ❌ ResourceIndex.Origin RPC panics when search feature disabled.
- ❌ Dashboard origin queries can crash unified storage service.
- ⚠️ Any Origin-based provisioning fails instead of returning clear error.
Suggested change
return s.search.Origin(ctx, req)
if s.search == nil {
return nil, fmt.Errorf("search index not configured")
}
Steps of Reproduction ✅
1. Start the unified storage gRPC service via `ProvideUnifiedStorageGrpcService` at
`pkg/storage/unified/sql/service.go:103-150`, which creates the SQL-backed
`ResourceServer` using `NewResourceServer` at `pkg/storage/unified/sql/server.go:21-82`.

2. Ensure `featuremgmt.FlagUnifiedStorageSearch` is disabled so the search setup block at
`pkg/storage/unified/sql/server.go:59-69` is not executed; this leaves `opts.Search` with
a nil `Backend` and `Resources`, so `NewResourceServer` in
`pkg/storage/unified/resource/server.go:182-234` does not construct a `searchSupport`,
leaving `s.search == nil`.

3. The same `server` instance is still registered as a `ResourceIndexServer` in
`pkg/storage/unified/sql/service.go:127-131` via
`resource.RegisterResourceIndexServer(srv, server)`, exposing the `Origin` RPC even though
no search backend is configured.

4. Invoke the Origin RPC, e.g. from dashboard legacy storage which defines an `Origin`
method using `*resource.OriginRequest` at
`pkg/registry/apis/dashboard/legacy/storage.go:333` (via Grep); this call is routed to
`(*server).Origin` at `pkg/storage/unified/resource/server.go:929-931`, which executes
`return s.search.Origin(ctx, req)` on a nil `s.search`, causing a nil-pointer panic in the
unified storage server.
Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** pkg/storage/unified/resource/server.go
**Line:** 930:930
**Comment:**
	*Null Pointer: The Origin RPC also calls the search backend without checking if search is configured, so on servers started without a search backend this will dereference a nil pointer instead of returning a clear "not configured" error, unlike the Search RPC which already performs this guard.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
👍 | 👎

@codeant-ai

codeant-ai Bot commented Feb 13, 2026

Copy link
Copy Markdown

CodeAnt AI finished running the review.

@github-actions

Copy link
Copy Markdown
Contributor

This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 2 weeks if no further activity occurs. Please feel free to give a status update or ping for review. Thank you for your contributions!

@github-actions github-actions Bot added the stale label Mar 16, 2026
@github-actions

Copy link
Copy Markdown
Contributor

This pull request has been automatically closed because it has not had any further activity in the last 2 weeks. Thank you for your contributions!

@github-actions github-actions Bot closed this Mar 30, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:S This PR changes 10-29 lines, ignoring generated files stale

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants