Optimize start time by dgageot · Pull Request #2138 · docker/docker-agent

dgageot · 2026-03-16T21:11:08Z

No description provided.

Make modelsdev.Store a process-wide singleton via sync.OnceValues so the models.dev database is loaded at most once and shared across all callers (teamloader, runtime, bedrock, embedding, config tests). Key changes: - NewStore() returns the same *Store instance for the entire process - Add ETag support: store the ETag from models.dev responses and send If-None-Match on refresh; a 304 Not Modified skips the download - Switch from json.Decoder to ReadFile/ReadAll + json.Unmarshal to avoid the decoder's intermediate buffering overhead - Hoist store creation out of per-model loops in teamloader to make the singleton intent explicit and handle errors gracefully - Persist ETag in the cache file for reuse across process restarts This eliminates ~50-60MB of redundant heap allocations per run (the 3MB JSON was being parsed into ~66MB of Go objects up to 6 times). Assisted-By: docker-agent

Replace json.NewDecoder streaming decode with os.ReadFile + string splitting + strconv.Unquote. The history file stores one JSON-quoted string per line; Unquote handles the same escapes without the full JSON state machine and reflection overhead. Pre-size slices by counting newlines upfront and use map[string]struct{} for deduplication to reduce allocation count. Benchmarked on a 658KB / 7348-line history file: Before: 3.05ms, 15703 allocs, 2.32MB After: 1.41ms, 2606 allocs, 2.96MB Assisted-By: docker-agent

docker-agent

Review Summary

Assessment: 🔴 CRITICAL

This PR introduces performance optimizations through singleton patterns and optimized parsing. While the optimizations are well-intentioned, there are several issues that need attention:

Critical Issues (1):

Singleton initialization failure is permanent and cannot recover

Notable Issues (5):

Silent error handling may cause data loss in history
Cache inconsistency if disk write fails
Store initialization errors are silently ignored (2 instances)
No validation of cached data structure

Key Findings

The singleton pattern for the models.dev store introduces a critical availability issue: if initialization fails during a transient condition (e.g., network mount not ready), the error is cached permanently and the application cannot recover without a restart.

Additionally, error handling has been weakened in several places - the history loader now silently skips malformed lines, and store initialization errors are never logged or returned to callers.

Recommendations

Consider making the singleton recoverable or adding retry logic
Add logging for skipped history entries
Log store initialization errors at appropriate levels
Validate cached data structure before returning it
Handle saveToCache failures more gracefully

docker-agent · 2026-03-16T21:16:39Z

pkg/history/history.go

+			continue
+		}
+
+		message, err := strconv.Unquote(line)


🟡 MEDIUM: Silent error handling in strconv.Unquote may cause data loss

When strconv.Unquote fails on line 243, the error is silently ignored with continue. This means corrupt or malformed lines will be skipped without logging. If the history file contains invalid JSON strings due to disk corruption, application bugs, or manual editing, users will silently lose history entries. The previous implementation with json.Decoder.Decode would have failed and returned an error on the first invalid line, alerting the user.

Recommendation: Consider at minimum logging the skipped line or counting skipped entries and returning an error if too many are skipped.

docker-agent · 2026-03-16T21:16:39Z

pkg/modelsdev/store.go

+// on the first call to NewStore. All subsequent calls return the same value.
+var singleton = sync.OnceValues(func() (*Store, error) {
 	homeDir, err := os.UserHomeDir()
 	if err != nil {


🔴 HIGH: Singleton initialization failure is permanent and cannot recover

The singleton pattern using sync.OnceValues means that if os.UserHomeDir() or os.MkdirAll fails during the first call, the error is cached permanently. If the first call happens during startup when the home directory is temporarily unavailable (e.g., network mount not ready, Docker volume not attached), ALL future calls will fail with the same error even after the condition is resolved. The singleton cannot be reset, requiring a process restart.

For a long-running process, this is a severe availability issue.

Recommendation: Consider making the singleton recoverable by using a mutex-protected initialization that can retry on failure, or document this limitation clearly and ensure proper error handling at startup.

docker-agent · 2026-03-16T21:16:39Z

pkg/modelsdev/store.go


 	s.db = db
+	s.etag = etag
 	return db, nil


🟡 MEDIUM: Cache inconsistency if saveToCache fails after updating in-memory fields

If saveToCache fails after updating s.db and s.etag in memory, the in-memory cache and disk cache are out of sync. Subsequent calls will return the cached db but the disk cache wasn't updated, so after a restart, the old cache is loaded. This leads to unnecessary API calls after restart.

Recommendation: Consider updating the in-memory cache only after successful disk write, or add logic to detect and recover from this inconsistency.

docker-agent · 2026-03-16T21:16:39Z

pkg/teamloader/teamloader.go

 	thinkingConfigured := false

+	// Obtain the singleton store once, outside the loop.
+	modelsStore, modelsStoreErr := modelsdev.NewStore()


🟡 MEDIUM: Singleton store initialization error is silently ignored

The modelsStore is obtained once before the loop. If modelsStoreErr is not nil, it's stored but never returned to the caller or logged. The code continues and skips the models.dev lookup when modelsStoreErr != nil. If the user hasn't configured max_tokens and modelsStoreErr is not nil, the code silently falls back to defaultMaxTokens without informing the user that the models.dev lookup failed.

This makes debugging harder when users encounter unexpected token limit behavior.

Recommendation: Log the error at an appropriate level (at least debug, preferably warn) so users can diagnose issues.

docker-agent · 2026-03-16T21:16:39Z

pkg/modelsdev/store.go

-		slog.Warn("Warning: failed to save to cache", "error", err)
+	// Save the fresh data to cache.
+	if saveErr := saveToCache(cacheFile, database, newETag); saveErr != nil {
+		slog.Warn("Failed to save to cache", "error", saveErr)


🟡 MEDIUM: No validation of cached data before returning after 304 response

When handling a 304 Not Modified response, the code returns cached.Database without validating that it's structurally sound. If the cache file was corrupted or partially written, json.Unmarshal might succeed but produce a Database with nil Providers, causing nil pointer panics in GetModel.

Recommendation: Add validation of the cached data structure (e.g., check that Providers is not nil) before returning it. This is good defensive programming against cache corruption.

docker-agent · 2026-03-16T21:16:39Z

pkg/teamloader/teamloader.go

 	var fallbackModels []provider.Provider

+	// Obtain the singleton store once, outside the loop.
+	modelsStore, modelsStoreErr := modelsdev.NewStore()


🟡 MEDIUM: Store initialization error in fallback models is never logged

In getFallbackModelsForAgent, modelsStore initialization error is silently ignored. If NewStore() fails, the error is never logged. If the user has configured fallback models that depend on models.dev for max_tokens, they'll silently get defaultMaxTokens without knowing why.

This could lead to unexpected behavior where models hit token limits earlier than expected.

Recommendation: Log the error at an appropriate level (at least debug, preferably warn) to help users diagnose issues.

dgageot added 2 commits March 16, 2026 21:44

dgageot requested a review from a team as a code owner March 16, 2026 21:11

docker-agent bot reviewed Mar 16, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize start time#2138

Optimize start time#2138
dgageot wants to merge 2 commits intodocker:mainfrom
dgageot:board/run-docker-agent-run-examples-gopher-yam-ecbcff53

dgageot commented Mar 16, 2026

Uh oh!

docker-agent bot left a comment

Uh oh!

docker-agent bot Mar 16, 2026

Uh oh!

docker-agent bot Mar 16, 2026

Uh oh!

docker-agent bot Mar 16, 2026

Uh oh!

docker-agent bot Mar 16, 2026

Uh oh!

docker-agent bot Mar 16, 2026

Uh oh!

docker-agent bot Mar 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

dgageot commented Mar 16, 2026

Uh oh!

docker-agent bot left a comment

Choose a reason for hiding this comment

Review Summary

Assessment: 🔴 CRITICAL

Key Findings

Recommendations

Uh oh!

docker-agent bot Mar 16, 2026

Choose a reason for hiding this comment

Uh oh!

docker-agent bot Mar 16, 2026

Choose a reason for hiding this comment

Uh oh!

docker-agent bot Mar 16, 2026

Choose a reason for hiding this comment

Uh oh!

docker-agent bot Mar 16, 2026

Choose a reason for hiding this comment

Uh oh!

docker-agent bot Mar 16, 2026

Choose a reason for hiding this comment

Uh oh!

docker-agent bot Mar 16, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant