feat: remote container features for cluster mode #44

Closed
Will-Luck wants to merge 9 commits into main from dev-testing

Conversation

@Will-Luck

Owner

Summary

  • Remote container logs via gRPC streaming
  • Remote rollback via update history lookup
  • Host-scoped hooks and notification preferences (fixes key collision bug)
  • Remote GHCR switch
  • Remote dependency graph with cross-host support
  • Agent auto-update with cluster setting, version comparison, and race condition fixes

Test plan

  • Gitea CI passed
  • GitHub Actions CI passed
  • Test on .60/.61/.62 cluster

🤖 Generated with Claude Code

Add FetchLogsRequest/FetchLogsResult proto messages, agent handler,
server sync method, and wire through ClusterProvider to the web API.
Remote container logs now stream via the gRPC channel instead of
returning the "not available" stub.

Add RollbackRemoteContainer to ClusterProvider interface, ClusterController
proxy, and clusterAdapter. The adapter looks up the most recent successful
update from server-side history (scoped by hostID::containerName) to find
the old image, then sends an UpdateContainerSync to the agent with that
image as the target.

The API handler (apiRollback) now checks for ?host= query param and routes
to the remote rollback path when cluster mode is active, mirroring the
pattern used by restart/stop/start/update handlers. On success, the
rollback policy is applied using the host-scoped key.
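
The rollback lookup described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `UpdateRecord` type, its fields, and the `rollbackTarget` helper are hypothetical, and the sketch assumes history records are stored oldest-first. Only the `hostID::containerName` key format comes from the commit message.

```go
package main

import (
	"errors"
	"fmt"
)

// UpdateRecord is a hypothetical history entry; the real schema is not shown in this PR.
type UpdateRecord struct {
	Key      string // host-scoped key, e.g. "host-a::nginx"
	OldImage string // image the container ran before the update
	NewImage string // image the update moved to
	Success  bool
}

var errNoHistory = errors.New("no successful update in history")

// scopedKey builds the hostID::containerName key so that containers with the
// same name on different hosts do not collide.
func scopedKey(hostID, name string) string {
	return hostID + "::" + name
}

// rollbackTarget walks the history newest-first (assuming records are appended
// in chronological order) and returns the old image of the most recent
// successful update for the given host and container.
func rollbackTarget(history []UpdateRecord, hostID, name string) (string, error) {
	key := scopedKey(hostID, name)
	for i := len(history) - 1; i >= 0; i-- {
		r := history[i]
		if r.Key == key && r.Success {
			return r.OldImage, nil
		}
	}
	return "", errNoHistory
}

func main() {
	history := []UpdateRecord{
		{Key: "host-a::nginx", OldImage: "nginx:1.24", NewImage: "nginx:1.25", Success: true},
		{Key: "host-b::nginx", OldImage: "nginx:1.20", NewImage: "nginx:1.21", Success: true},
		{Key: "host-a::nginx", OldImage: "nginx:1.25", NewImage: "nginx:1.26", Success: false},
	}
	img, err := rollbackTarget(history, "host-a", "nginx")
	fmt.Println(img, err) // prints "nginx:1.24 <nil>"
}
```

The failed update on host-a is skipped, and the host-b record is never considered because its key differs. In the PR, the image found this way is then sent to the agent as the target of an UpdateContainerSync.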

Remote containers from cluster agents are now included when building
the dependency graph, enabling cross-host dependency visualisation.
The single-container deps endpoint also supports ?host= scoping.

Add a cluster setting "Auto-update agents" that, when enabled, detects
version mismatches between the server and connected agents after each
scan cycle. When a mismatch is found, the server sends an
UpdateContainerRequest to recreate the agent's sentinel container with
the server's version tag.

- New constant SettingClusterAutoUpdateAgents in store/bolt.go
- GET/POST handlers in api_settings.go read and persist the setting
- New auto_update.go with CheckAgentVersions, replaceImageTag, baseVersion
- Post-scan callback in main.go triggers the check when enabled
- Toggle added to cluster settings UI in settings.html + settings-cluster.js
- Add sync.RWMutex to agentStream for version/features field access
- Fix replaceImageTag for digest refs and port-only registries
- Add atomic.Bool guard to prevent overlapping auto-update runs
- Snapshot containers slice in updateAgentContainer to avoid races
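
Two of the fixes listed above are easy to get subtly wrong, so here is a hedged sketch of each. The function name `replaceImageTag` comes from the commit message, but its real signature and behaviour are not shown in this PR; the body below is one plausible way to handle digest references and registries with a port. `runExclusive` and `updating` are hypothetical names illustrating an `atomic.Bool` guard against overlapping runs.

```go
package main

import (
	"fmt"
	"strings"
	"sync/atomic"
)

// replaceImageTag returns ref with its tag replaced by tag.
// Assumptions: a digest suffix ("@sha256:...") is dropped, and a colon that
// appears before the last slash is a registry port, not a tag separator.
func replaceImageTag(ref, tag string) string {
	if i := strings.Index(ref, "@"); i >= 0 {
		ref = ref[:i] // strip the digest
	}
	lastSlash := strings.LastIndex(ref, "/")
	lastColon := strings.LastIndex(ref, ":")
	if lastColon > lastSlash {
		ref = ref[:lastColon] // strip the existing tag
	}
	return ref + ":" + tag
}

// updating is true while an auto-update run is in progress.
var updating atomic.Bool

// runExclusive runs fn unless another run is already in progress.
// It reports whether fn was started.
func runExclusive(fn func()) bool {
	if !updating.CompareAndSwap(false, true) {
		return false
	}
	defer updating.Store(false)
	fn()
	return true
}

func main() {
	for _, ref := range []string{
		"ghcr.io/org/agent:v1.0.0",
		"localhost:5000/agent",
		"ghcr.io/org/agent@sha256:abc123",
	} {
		fmt.Println(replaceImageTag(ref, "v2.0.0"))
	}

	started := runExclusive(func() {
		// A second attempt while the first is running is rejected.
		fmt.Println("nested run started:", runExclusive(func() {}))
	})
	fmt.Println("outer run started:", started)
}
```

Comparing the position of the last colon with the last slash is what keeps `localhost:5000/agent` from being read as image `localhost` with tag `5000/agent`.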
Will-Luck closed this Mar 1, 2026
Will-Luck deleted the dev-testing branch March 2, 2026 15:53
Will-Luck pushed a commit that referenced this pull request Mar 2, 2026
Security (critical):
- Constant-time comparison for webhook secret (#10) and CSRF token (#30)
- Settings key allowlist prevents arbitrary store writes (#11)
- Add oidc_client_secret to sensitive keys (#20)
- Remove shell injection vectors in self-update script (#12)
- Fatal error + rollback on network connect failure in self-update (#36)
- Deep copy from Registry.Get() prevents live state mutation (#13)
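
To illustrate the last item, here is a minimal sketch of a getter that returns a deep copy so callers cannot mutate shared state. The `ContainerState` type and its fields are hypothetical; only the idea of copying inside `Registry.Get()` comes from the commit message.

```go
package main

import (
	"fmt"
	"sync"
)

// ContainerState is a hypothetical registry entry with reference-typed fields.
type ContainerState struct {
	Name   string
	Labels map[string]string
	Ports  []int
}

// clone copies the struct and its map and slice, so the copy shares no
// mutable memory with the original.
func (s *ContainerState) clone() *ContainerState {
	c := *s
	c.Labels = make(map[string]string, len(s.Labels))
	for k, v := range s.Labels {
		c.Labels[k] = v
	}
	c.Ports = append([]int(nil), s.Ports...)
	return &c
}

// Registry guards its entries with a read-write mutex.
type Registry struct {
	mu    sync.RWMutex
	items map[string]*ContainerState
}

// Get returns a deep copy of the entry, never the stored pointer.
func (r *Registry) Get(name string) (*ContainerState, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	s, ok := r.items[name]
	if !ok {
		return nil, false
	}
	return s.clone(), true
}

func main() {
	r := &Registry{items: map[string]*ContainerState{
		"web": {Name: "web", Labels: map[string]string{"tier": "front"}, Ports: []int{80}},
	}}

	got, _ := r.Get("web")
	got.Labels["tier"] = "changed" // mutates only the copy
	got.Ports[0] = 8080

	again, _ := r.Get("web")
	fmt.Println(again.Labels["tier"], again.Ports[0]) // prints "front 80"
}
```

A plain struct copy would not be enough here, because the map and slice headers would still point at the registry's own data.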

Concurrency & reliability:
- Mutex-protect gRPC stream.Send() in cluster agent (#14)
- Split MQTT WaitTimeout/Error checks in HA discovery (#15)
- MQTT auto-reconnect with persistent session (#42, #46)
- SMTP Send() now respects context with 30s timeout (#16)
- Password change invalidates all sessions, re-creates current (#17)
- Reject passwords >72 bytes (bcrypt truncation) (#18)
- Rate limiter background cleanup goroutine with Stop() (#19)
- BoltDB: collect keys before delete, never mutate during iteration (#23, #24)
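
The first item reflects a general gRPC-Go rule: sending on the same stream from several goroutines at once is not safe. A minimal sketch of the wrapper pattern, assuming a hypothetical `sender` interface in place of the real generated stream type:

```go
package main

import (
	"fmt"
	"sync"
)

// sender is a hypothetical stand-in for the send half of a gRPC stream.
type sender interface {
	Send(msg string) error
}

// lockedSender serialises Send calls on the wrapped stream.
type lockedSender struct {
	mu sync.Mutex
	s  sender
}

func (l *lockedSender) Send(msg string) error {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.s.Send(msg)
}

// countingSender has no locking of its own, like a raw stream.
type countingSender struct{ sent int }

func (c *countingSender) Send(string) error {
	c.sent++
	return nil
}

// sendConcurrently issues n sends from n goroutines and waits for all of them.
func sendConcurrently(s sender, n int) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = s.Send("heartbeat")
		}()
	}
	wg.Wait()
}

func main() {
	raw := &countingSender{}
	sendConcurrently(&lockedSender{s: raw}, 100)
	fmt.Println(raw.sent) // prints 100
}
```

Without the wrapper, the unsynchronised counter would be a data race, which `go run -race` reports.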

Auth & crypto:
- SHA-256 hash recovery codes before storage, constant-time verify (#31)
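
A minimal sketch of this scheme, using hypothetical helper names and hex-encoded hashes as the stored form (the PR does not show the actual storage format):

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
)

// hashRecoveryCode returns the hex-encoded SHA-256 of a recovery code.
// Only this value is stored, never the code itself.
func hashRecoveryCode(code string) string {
	sum := sha256.Sum256([]byte(code))
	return hex.EncodeToString(sum[:])
}

// verifyRecoveryCode checks the candidate against every stored hash without
// returning early, so the comparison time does not reveal which entry matched.
func verifyRecoveryCode(stored []string, candidate string) bool {
	h := []byte(hashRecoveryCode(candidate))
	match := 0
	for _, s := range stored {
		match |= subtle.ConstantTimeCompare([]byte(s), h)
	}
	return match == 1
}

func main() {
	stored := []string{
		hashRecoveryCode("alpha-1234"),
		hashRecoveryCode("bravo-5678"),
	}
	fmt.Println(verifyRecoveryCode(stored, "bravo-5678")) // prints true
	fmt.Println(verifyRecoveryCode(stored, "wrong-0000")) // prints false
}
```

An unsalted fast hash is reasonable for recovery codes only if they are long random strings; that is an assumption of this sketch, not something the commit message states.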

Engine:
- Queue prune includes remote container keys (hostID::name) (#22)
- Fix unlock-before-persist race in queue Remove (#35)
- Cap SCAN_CONCURRENCY at 50 in getter and setter (#52)

Web API:
- Duration validation with time.ParseDuration before saving (#21)
- HA discovery returns JSON instead of 204 (#51)
- Consistent JSON errors in containers API (#32)
- SSRF validation for Portainer URL (#33) and registry test (#34)
- Webhook URL scheme validation (http/https only) (#47)
- Replace http.DefaultClient with 10s timeout client in releases (#48)
- Limit Portainer response body reads to 1MB (#41)
- Document session token URL pattern (#49)
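
Two of the validation items above can be sketched with the standard library alone. The helper names are hypothetical, and the extra rules (rejecting non-positive durations and URLs without a host) are assumptions of this sketch rather than behaviour confirmed by the commit.

```go
package main

import (
	"fmt"
	"net/url"
	"time"
)

// validateDuration parses s with time.ParseDuration before it is saved.
// Rejecting zero and negative values is an assumption of this sketch.
func validateDuration(s string) (time.Duration, error) {
	d, err := time.ParseDuration(s)
	if err != nil {
		return 0, fmt.Errorf("invalid duration %q: %w", s, err)
	}
	if d <= 0 {
		return 0, fmt.Errorf("duration must be positive, got %s", d)
	}
	return d, nil
}

// validateWebhookURL accepts only http and https URLs with a host.
func validateWebhookURL(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("invalid URL: %w", err)
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return fmt.Errorf("unsupported scheme %q", u.Scheme)
	}
	if u.Host == "" {
		return fmt.Errorf("URL has no host")
	}
	return nil
}

func main() {
	for _, s := range []string{"6h", "90s", "6 hours", "10"} {
		_, err := validateDuration(s)
		fmt.Printf("%-10q valid=%v\n", s, err == nil)
	}
	for _, s := range []string{"https://example.com/hook", "file:///etc/passwd"} {
		fmt.Printf("%s valid=%v\n", s, validateWebhookURL(s) == nil)
	}
}
```

Scheme validation does not replace the SSRF checks listed above, since an `http` URL can still point at an internal address.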

Cluster:
- Random 32-byte HMAC key persisted to disk, replaces CA-derived key (#25)
- Revoke new cert if UpdateCertSerial fails (#38)
- Agent hooks audit logging with hooks_allowed config flag (#39)

Main:
- Write credentials to file instead of stdout (#43)
- Guard defers with sync.Once to prevent double-close (#44)
- 30s timeout on shutdown context (#45)
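
The `sync.Once` guard mentioned above can be sketched as follows. The `connection` type is hypothetical; the point is that a deferred close and an explicit close on a shutdown path can both run without closing the underlying resource twice.

```go
package main

import (
	"fmt"
	"sync"
)

// connection is a hypothetical resource that must be closed exactly once.
type connection struct {
	once   sync.Once
	closes int // counts how often the underlying close actually ran
}

// Close is safe to call any number of times; only the first call does work.
func (c *connection) Close() {
	c.once.Do(func() {
		c.closes++
	})
}

func main() {
	c := &connection{}
	defer c.Close() // deferred cleanup on normal exit

	c.Close() // explicit close, for example from a signal handler
	c.Close() // a repeated call is a no-op

	fmt.Println("underlying close calls:", c.closes) // prints 1
}
```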

Frontend:
- Remove duplicate escapeHtml from queue/swarm/notifications, import from utils (#29, #53)
- Add single-quote escaping to portainer inline escapeHtml (#57)
- SSE reconnect triggers full dashboard reload (#27)
- Remove duplicate EventSource in cluster.html and portainer.html (#28)
- Add CSRF headers to agent.html fetch calls (#55)
- Document why auth.js/webauthn.js duplicate CSRF helpers (#54)

Closes #10, #11, #12, #13, #14, #15, #16, #17, #18, #19, #20, #21, #22,
#23, #24, #25, #27, #28, #29, #30, #31, #32, #33, #34, #35, #36, #38,
#39, #41, #42, #43, #44, #45, #46, #47, #48, #49, #51, #52, #53, #54,
#55, #57
Will-Luck pushed a commit that referenced this pull request Apr 17, 2026