Conversation
Make modelsdev.Store a process-wide singleton via sync.OnceValues so the models.dev database is loaded at most once and shared across all callers (teamloader, runtime, bedrock, embedding, config tests). Key changes: - NewStore() returns the same *Store instance for the entire process - Add ETag support: store the ETag from models.dev responses and send If-None-Match on refresh; a 304 Not Modified skips the download - Switch from json.Decoder to ReadFile/ReadAll + json.Unmarshal to avoid the decoder's intermediate buffering overhead - Hoist store creation out of per-model loops in teamloader to make the singleton intent explicit and handle errors gracefully - Persist ETag in the cache file for reuse across process restarts This eliminates ~50-60MB of redundant heap allocations per run (the 3MB JSON was being parsed into ~66MB of Go objects up to 6 times). Assisted-By: docker-agent
Replace json.NewDecoder streaming decode with os.ReadFile + string
splitting + strconv.Unquote. The history file stores one JSON-quoted
string per line; Unquote handles the same escapes without the full
JSON state machine and reflection overhead.
Pre-size slices by counting newlines upfront and use map[string]struct{}
for deduplication to reduce allocation count.
Benchmarked on a 658KB / 7348-line history file:
Before: 3.05ms, 15703 allocs, 2.32MB
After: 1.41ms, 2606 allocs, 2.96MB
Assisted-By: docker-agent
There was a problem hiding this comment.
Review Summary
Assessment: 🔴 CRITICAL
This PR introduces performance optimizations through singleton patterns and optimized parsing. While the optimizations are well-intentioned, there are several issues that need attention:
Critical Issues (1):
- Singleton initialization failure is permanent and cannot recover
Notable Issues (5):
- Silent error handling may cause data loss in history
- Cache inconsistency if disk write fails
- Store initialization errors are silently ignored (2 instances)
- No validation of cached data structure
Key Findings
The singleton pattern for the models.dev store introduces a critical availability issue: if initialization fails during a transient condition (e.g., network mount not ready), the error is cached permanently and the application cannot recover without a restart.
Additionally, error handling has been weakened in several places - the history loader now silently skips malformed lines, and store initialization errors are never logged or returned to callers.
Recommendations
- Consider making the singleton recoverable or adding retry logic
- Add logging for skipped history entries
- Log store initialization errors at appropriate levels
- Validate cached data structure before returning it
- Handle saveToCache failures more gracefully
| continue | ||
| } | ||
|
|
||
| message, err := strconv.Unquote(line) |
There was a problem hiding this comment.
🟡 MEDIUM: Silent error handling in strconv.Unquote may cause data loss
When strconv.Unquote fails on line 243, the error is silently ignored with continue. This means corrupt or malformed lines will be skipped without logging. If the history file contains invalid JSON strings due to disk corruption, application bugs, or manual editing, users will silently lose history entries. The previous implementation with json.Decoder.Decode would have failed and returned an error on the first invalid line, alerting the user.
Recommendation: Consider at minimum logging the skipped line or counting skipped entries and returning an error if too many are skipped.
| // on the first call to NewStore. All subsequent calls return the same value. | ||
| var singleton = sync.OnceValues(func() (*Store, error) { | ||
| homeDir, err := os.UserHomeDir() | ||
| if err != nil { |
There was a problem hiding this comment.
🔴 HIGH: Singleton initialization failure is permanent and cannot recover
The singleton pattern using sync.OnceValues means that if os.UserHomeDir() or os.MkdirAll fails during the first call, the error is cached permanently. If the first call happens during startup when the home directory is temporarily unavailable (e.g., network mount not ready, Docker volume not attached), ALL future calls will fail with the same error even after the condition is resolved. The singleton cannot be reset, requiring a process restart.
For a long-running process, this is a severe availability issue.
Recommendation: Consider making the singleton recoverable by using a mutex-protected initialization that can retry on failure, or document this limitation clearly and ensure proper error handling at startup.
|
|
||
| s.db = db | ||
| s.etag = etag | ||
| return db, nil |
There was a problem hiding this comment.
🟡 MEDIUM: Cache inconsistency if saveToCache fails after updating in-memory fields
If saveToCache fails after updating s.db and s.etag in memory, the in-memory cache and disk cache are out of sync. Subsequent calls will return the cached db but the disk cache wasn't updated, so after a restart, the old cache is loaded. This leads to unnecessary API calls after restart.
Recommendation: Consider updating the in-memory cache only after successful disk write, or add logic to detect and recover from this inconsistency.
| thinkingConfigured := false | ||
|
|
||
| // Obtain the singleton store once, outside the loop. | ||
| modelsStore, modelsStoreErr := modelsdev.NewStore() |
There was a problem hiding this comment.
🟡 MEDIUM: Singleton store initialization error is silently ignored
The modelsStore is obtained once before the loop. If modelsStoreErr is not nil, it's stored but never returned to the caller or logged. The code continues and skips the models.dev lookup when modelsStoreErr != nil. If the user hasn't configured max_tokens and modelsStoreErr is not nil, the code silently falls back to defaultMaxTokens without informing the user that the models.dev lookup failed.
This makes debugging harder when users encounter unexpected token limit behavior.
Recommendation: Log the error at an appropriate level (at least debug, preferably warn) so users can diagnose issues.
| slog.Warn("Warning: failed to save to cache", "error", err) | ||
| // Save the fresh data to cache. | ||
| if saveErr := saveToCache(cacheFile, database, newETag); saveErr != nil { | ||
| slog.Warn("Failed to save to cache", "error", saveErr) |
There was a problem hiding this comment.
🟡 MEDIUM: No validation of cached data before returning after 304 response
When handling a 304 Not Modified response, the code returns cached.Database without validating that it's structurally sound. If the cache file was corrupted or partially written, json.Unmarshal might succeed but produce a Database with nil Providers, causing nil pointer panics in GetModel.
Recommendation: Add validation of the cached data structure (e.g., check that Providers is not nil) before returning it. This is good defensive programming against cache corruption.
| var fallbackModels []provider.Provider | ||
|
|
||
| // Obtain the singleton store once, outside the loop. | ||
| modelsStore, modelsStoreErr := modelsdev.NewStore() |
There was a problem hiding this comment.
🟡 MEDIUM: Store initialization error in fallback models is never logged
In getFallbackModelsForAgent, modelsStore initialization error is silently ignored. If NewStore() fails, the error is never logged. If the user has configured fallback models that depend on models.dev for max_tokens, they'll silently get defaultMaxTokens without knowing why.
This could lead to unexpected behavior where models hit token limits earlier than expected.
Recommendation: Log the error at an appropriate level (at least debug, preferably warn) to help users diagnose issues.
No description provided.