fix(ccem-agent): resolve memory leak and main-thread stall#3

Merged
peguesj merged 3 commits into main from fix/ccem-agent-memory-leak on Mar 16, 2026

@peguesj peguesj commented Mar 16, 2026

Summary

Fixes four root causes that drove CCEMAgent to 74.9% CPU and a 2.5 GB memory footprint over multi-day sessions, causing periodic stalls in the macOS menubar UI.

Root Causes & Fixes

| # | Component | Root Cause | Fix |
|---|-----------|------------|-----|
| 1 | APMClient.swift | URLSessionConfiguration.default accumulated ~170k cached HTTP responses to disk across 4-day sessions, growing to 2.5GB | Switch to .ephemeral config; add explicit 5s/10s timeouts |
| 2 | MultiServerManager.swift | New APMClient (and URLSession) created on every checkHealth() call | Cache one APMClient per server UUID; prune on server removal |
| 3 | APMServerManager.swift | checkRunning() called launchctl/kill via Process.waitUntilExit() on @MainActor, blocking the UI thread | Dispatch to Task.detached(priority: .utility); await result back on main actor |
| 4 | EnvironmentMonitor.swift | seenNotificationIds Set grows unbounded over long sessions | Cap at 2000 entries, trim to 1000 |
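
Rows 1 and 2 can be illustrated with a minimal sketch. It assumes a simplified `APMClient` that only owns a session, plus a hypothetical `APMClientCache` type; neither is the actual CCEMAgent code. Only the `.ephemeral` configuration, the 5 s request / 10 s resource timeouts, and the one-client-per-UUID idea come from this PR.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking
#endif

// Simplified stand-in for the real APMClient (assumption: the real type has more state).
final class APMClient {
    let baseURL: URL
    let session: URLSession

    init(baseURL: URL) {
        self.baseURL = baseURL
        // .ephemeral keeps caches, cookies, and credentials in memory only,
        // so cached responses do not pile up on disk over long sessions.
        let config = URLSessionConfiguration.ephemeral
        config.timeoutIntervalForRequest = 5    // seconds
        config.timeoutIntervalForResource = 10  // seconds
        self.session = URLSession(configuration: config)
    }
}

// Hypothetical cache: one client (and one URLSession) per server UUID.
final class APMClientCache {
    private var clients: [UUID: APMClient] = [:]

    var count: Int { clients.count }

    // Returns the cached client for a server, creating it only on first use.
    func client(for id: UUID, baseURL: URL) -> APMClient {
        if let existing = clients[id] { return existing }
        let created = APMClient(baseURL: baseURL)
        clients[id] = created
        return created
    }

    // Drops clients whose server is no longer configured.
    func prune(keeping liveIDs: Set<UUID>) {
        let stale = clients.keys.filter { !liveIDs.contains($0) }
        for id in stale {
            clients[id]?.session.invalidateAndCancel()
            clients[id] = nil
        }
    }
}
```

With a cache like this, repeated health checks for the same server reuse one session instead of allocating a new one per call, and removing a server releases its session explicitly.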

Files Changed

  • CCEMAgent/Sources/CCEMAgent/CCEMAgentApp.swift — await async checkRunning() callers
  • CCEMAgent/Sources/CCEMAgent/Services/APMClient.swift — ephemeral URLSession config
  • CCEMAgent/Sources/CCEMAgent/Services/APMServerManager.swift — async checkRunning via Task.detached
  • CCEMAgent/Sources/CCEMAgent/Services/EnvironmentMonitor.swift — cap seenNotificationIds
  • CCEMAgent/Sources/CCEMAgent/Services/MultiServerManager.swift — reuse APMClient per server
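
The `checkRunning()` change might look roughly like the sketch below. `commandSucceeds`, `ServerStatusModel`, and the `kill -0` probe are illustrative assumptions, not the code in APMServerManager.swift. The point is only that the blocking `waitUntilExit()` runs inside `Task.detached(priority: .utility)` and the result is assigned back on the main actor.

```swift
import Foundation

// Hypothetical helper: runs a command synchronously and reports whether it
// exited with status 0. It blocks, so it must only be called off the main actor.
func commandSucceeds(_ executable: String, _ arguments: [String]) -> Bool {
    let process = Process()
    process.executableURL = URL(fileURLWithPath: executable)
    process.arguments = arguments
    process.standardOutput = FileHandle.nullDevice
    process.standardError = FileHandle.nullDevice
    do {
        try process.run()
    } catch {
        return false  // executable missing or not launchable
    }
    process.waitUntilExit()
    return process.terminationStatus == 0
}

// Hypothetical main-actor model standing in for the server manager.
@MainActor
final class ServerStatusModel {
    private(set) var isRunning = false

    // Callers now have to `await` this, which is why the call sites changed too.
    func checkRunning(pid: Int32) async {
        // The detached task does not inherit main-actor isolation,
        // so the blocking shell call runs on a background thread.
        let alive = await Task.detached(priority: .utility) {
            commandSucceeds("/bin/kill", ["-0", String(pid)])
        }.value
        // Execution resumes on the main actor here, so UI state is safe to update.
        isRunning = alive
    }
}
```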

Before / After

| Metric | Before | After |
|--------|--------|-------|
| CPU (idle) | 74.9% | 0% |
| RSS | 143 MB | 57 MB |
| Physical footprint (peak) | 2.5 GB | 57 MB |

Test Plan

  • Build CCEMAgent: bash CCEMAgent/build-app.sh — zero errors
  • Launch menubar app; confirm 0% CPU at idle
  • Run for 24+ hours; confirm memory stays below 100 MB RSS
  • Confirm APM server connect/disconnect notifications still fire
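
The long-running memory check depends in part on `seenNotificationIds` staying bounded. A minimal sketch of a cap-and-trim policy follows, under two assumptions not stated in this PR: that IDs are strings, and that the oldest entries are the ones dropped. A plain `Set` has no order, so the sketch tracks insertion order separately. `BoundedIDSet` is a hypothetical name.

```swift
// Hypothetical bounded ID store; the real EnvironmentMonitor may trim differently.
struct BoundedIDSet {
    private var ids = Set<String>()
    private var order: [String] = []  // insertion order, oldest first
    let cap: Int
    let trimTo: Int

    init(cap: Int = 2000, trimTo: Int = 1000) {
        precondition(trimTo >= 0 && trimTo <= cap)
        self.cap = cap
        self.trimTo = trimTo
    }

    var count: Int { ids.count }

    func contains(_ id: String) -> Bool { ids.contains(id) }

    // Returns true if the ID had not been seen before.
    @discardableResult
    mutating func insert(_ id: String) -> Bool {
        guard ids.insert(id).inserted else { return false }
        order.append(id)
        if ids.count > cap {
            // Over the cap: drop the oldest entries until only `trimTo` remain.
            let overflow = order.count - trimTo
            for old in order.prefix(overflow) { ids.remove(old) }
            order.removeFirst(overflow)
        }
        return true
    }
}
```

Trimming to half the cap rather than evicting one entry per insert keeps the trim work infrequent, at the cost of occasionally forgetting IDs that are still relevant.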

peguesj and others added 2 commits March 12, 2026 08:11
…-025)

Adds interactive SVG architecture diagram showing @ccem/core and @ccem/apm
package ecosystem with consumers, GenServer topology, and data flow edges.
Showcase.js updated with System/npm tab switching in architecture panel.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Four root causes identified and fixed:

1. URLSession cache accumulation (APMClient.swift)
   - Switch URLSessionConfiguration from .default to .ephemeral
   - Default config accumulated ~170k cached HTTP responses to disk
     over 4-day sessions, growing to 2.5GB physical footprint
   - Add explicit 5s request / 10s resource timeouts

2. URLSession per-call churn (MultiServerManager.swift)
   - Cache one APMClient instance per server UUID instead of
     creating a new URLSession on every checkHealth() call
   - Prune stale clients when servers are removed

3. Synchronous shell execution on main actor (APMServerManager.swift)
   - checkRunning() called launchctl/kill via Process.waitUntilExit()
     directly on @MainActor, blocking the UI thread
   - Dispatched to Task.detached(priority: .utility) — shell work
     now runs on a background thread; result promoted back to main actor
   - Updated callers in CCEMAgentApp.swift to await the async call

4. Unbounded seenNotificationIds Set (EnvironmentMonitor.swift)
   - Cap Set at 2000 entries, trim to 1000 to prevent unbounded growth
     in long-running sessions

Result: 0% CPU idle, 57MB RSS vs 74.9% CPU / 2.5GB peak before fix
- showcase/client/showcase.js: APM_BASE reads window.CCEM_APM_BASE_URL || 'http://localhost:3032'
- apm-v4: bump submodule pointer to include /docs/upm/status endpoint
@peguesj peguesj merged commit 392f7e4 into main Mar 16, 2026
@peguesj peguesj deleted the fix/ccem-agent-memory-leak branch March 16, 2026 14:22