Skip to content

Unified Storage Performance Optimizations#1

Open
zaibkhan wants to merge 1 commit into
performance-optimization-baselinefrom
unified-storage-enhancements
Open

Unified Storage Performance Optimizations#1
zaibkhan wants to merge 1 commit into
performance-optimization-baselinefrom
unified-storage-enhancements

Conversation

@zaibkhan

@zaibkhan zaibkhan commented Sep 5, 2025

Copy link
Copy Markdown

This PR implements performance optimizations for Grafana’s unified storage system, focusing on reducing cold-start latency, improving indexing throughput, and tightening error handling.

Key changes:

  • Refactors ResourceServer initialization to reduce request latency.
  • Moves indexing tasks into background workers with improved tracing.
  • Adds enhanced error handling and observability around storage operations.

…#97529)

* dont lazy init unified storage

* Inits index when creating new resource server. Fixes trace propagation by passing span ctx. Update some logging.

* Use finer grained cache locking when building indexes to speed things up. Locking the whole function was slowing things down.

* formatting

* linter fix

* go mod

* make update-workspace

* fix workspaces check error

* update dependency owner in mod file

* wait 1 second before querying metrics

* try with big timeout, see if fixes CI. Wont fail locally.

* skips postgres integration test. Only fails in drone. Will fix later.

* put delay back to 500 ms
@codoki-pr-intelligence

codoki-pr-intelligence Bot commented Sep 5, 2025

Copy link
Copy Markdown

Codoki PR Review

Summary: Restore safe index build locking, prevent concurrent corruption
What’s good: Eager server initialization removes per-RPC Init calls, reducing request latency and making readiness deterministic. Search init precedes watcher start, which likely avoids indexing partial state during startup. Tracing and logging improvements in search add useful observability.
Review Status: ❌ Requires changes
Overall Priority: High

Issues (Critical & High only)

Severity Issue Why it matters
High Correctness — Concurrent BuildIndex may race and corrupt on-disk index …/search/bleve.go
Removing the function-wide lock introduces a race for same-key builds: two goroutines can concurrently create the same on-disk index directory (bleve.New) and then both write the cache entry, causing errors or waste. Until a per-key singleflight is added, keep the coarse lock to serialize builds and avoid filesystem conflicts and duplicate work.

Showing top 1 issues. Critical: 0, High: 1. See inline suggestions for more.

Key Feedback (click to expand)
  • Needs improvement: The removal of the coarse lock in the Bleve BuildIndex path introduces a race for same-key builds that can lead to on-disk index creation conflicts and wasted work. Consider per-key singleflight or reintroducing the coarse lock until a finer-grained guard is added.
  • Testing: Please add a concurrent test for BuildIndex that starts two goroutines building the same key with a file-backed index and asserts only one index is created and cached, and no error is returned (detect race or path conflict).
  • Documentation: Consider a brief comment on BuildIndex clarifying concurrency guarantees (e.g., 'BuildIndex may be called concurrently; construction for the same key must be serialized or de-duplicated').
  • Compatibility: No Gorilla Mux or Go 1.23.1 compatibility issues observed in the changed code.
  • Performance: Eager initialization should reduce first-request latency and stabilize runtime behavior. However, concurrent duplicate builds for the same key will waste CPU and I/O; a per-key singleflight would preserve parallelism across keys while avoiding duplication.
  • Security: No new authentication/authorization surface changes in the diff; startup context uses a service account for watches as before.
  • Open questions: Is BuildIndex ever invoked concurrently for the same NamespacedResource (e.g., via multiple WorkerThreads or rebuilding after events)? If yes, do we have an upstream deduplication mechanism to guarantee single builder per key?

Confidence: 3/5 — Needs work before merge (1 high · status: Requires changes)

Sequence Diagram

sequenceDiagram
    participant Caller
    participant ResourceServer
    participant Search
    participant Watcher
    Caller->>ResourceServer: NewResourceServer(opts)
    ResourceServer->>ResourceServer: Init(ctx)
    alt search configured
        ResourceServer->>Search: init(ctx)
    end
    ResourceServer->>Watcher: initWatcher()
    ResourceServer-->>Caller: server instance
Loading

React with 👍 or 👎 if you found this review useful.

@@ -85,9 +85,6 @@ func (b *bleveBackend) BuildIndex(ctx context.Context,
// The builder will write all documents before returning

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ High: Removing the function-wide lock introduces a race for same-key builds: two goroutines can concurrently create the same on-disk index directory (bleve.New) and then both write the cache entry, causing errors or waste. Until a per-key singleflight is added, keep the coarse lock to serialize builds and avoid filesystem conflicts and duplicate work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants