Summary
The agent creates its long-lived background threads (and makes blocking network calls) during application boot. On a forking app server with preload_app!, those threads are started in the master process before it forks workers. This is unsafe and produces intermittent, hard-to-diagnose boot failures, and in thread/pid/memory-constrained environments thread allocation can fail outright. This issue documents the observed behaviors and proposes fixes.
Observed behaviors
1. Threads are started in the preloading master, before fork
The Railtie eagerly calls Agent#install → start during Rails initialization (lib/scout_apm.rb:221-228). Under a forking server with preload_app!, this runs in the master during preload. start then unconditionally spawns:
the AppServerLoad thread, which makes a blocking HTTP POST (lib/scout_apm/agent.rb:82 → lib/scout_apm/app_server_load.rb:12)
the metrics background worker thread (lib/scout_apm/agent.rb:84)
the error-service background worker thread (lib/scout_apm/agent.rb:85)
the BackgroundRecorder thread when async_recording: true (lib/scout_apm/agent_context.rb:236-243 → lib/scout_apm/background_recorder.rb:21)
Forking a process that has live threads is unsafe: only the calling thread survives in the child, and any lock held by another thread at fork() time (resolver, OpenSSL, malloc arena, Logger mutex, etc.) is inherited locked with no owner. The result is intermittent worker boot deadlocks — the worker never finishes booting / the server never reaches a listening state, so deploys behind a health check hang and roll back. It is timing-dependent (a retry sometimes succeeds), which matches a fork/thread race. Disabling monitoring (monitor: false) eliminates it entirely.
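The "only the calling thread survives" behavior can be observed directly. The following is a minimal sketch, assuming CRuby on a platform that supports fork; the method name thread_state_across_fork is hypothetical, and the sleeping thread only stands in for one of the agent's workers.

```ruby
# Minimal demonstration (assumes CRuby on a platform with fork).
# thread_state_across_fork is a hypothetical name used for illustration.
def thread_state_across_fork
  worker = Thread.new { sleep } # stand-in for a long-lived agent thread
  reader, writer = IO.pipe

  pid = fork do
    reader.close
    # In the child, report how many Ruby threads exist and whether the
    # parent's worker thread is still alive.
    writer.write("#{Thread.list.size},#{worker.alive?}")
    writer.close
    exit!(0) # skip at_exit handlers in the child
  end

  writer.close
  child_report = reader.read
  reader.close
  Process.wait(pid)

  parent_count = Thread.list.size
  worker.kill
  worker.join
  { parent_threads: parent_count, child_report: child_report }
end
```

The child is expected to see only the thread that called fork, with the worker no longer alive. A lock the worker held at fork time would have no remaining owner in the child, which is the hazard described above.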
start_background_worker? already exists and returns !forking? (lib/scout_apm/agent.rb:131-134), but start does not consult it, and the Railtie calls start regardless.
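One way start could honor the existing predicate is sketched below. This is an illustration under stated assumptions, not the agent's actual code: SketchAgent, after_fork_start, and the :deferred/:started return values are hypothetical, and the sleeping thread is a placeholder for the real reporting loop.

```ruby
# Hypothetical stand-in for the agent; not the real ScoutApm::Agent.
class SketchAgent
  attr_reader :threads

  def initialize(forking:)
    @forking = forking
    @threads = []
  end

  def forking?
    @forking
  end

  # Mirrors the predicate described above: true only when not forking.
  def start_background_worker?
    !forking?
  end

  # Called during boot. Spawns threads only if this process will not fork.
  def start
    return :deferred unless start_background_worker?
    spawn_workers
    :started
  end

  # Intended to be called from a post-fork hook (before_worker_boot above).
  def after_fork_start
    spawn_workers
    :started
  end

  def stop
    @threads.each(&:kill).each(&:join)
    @threads.clear
  end

  private

  def spawn_workers
    @threads << Thread.new { sleep } # placeholder for the reporting loop
  end
end
```

With this shape, a preloading master would get :deferred from start and hold no agent threads, while each worker would create its threads only after the fork.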
2. ThreadError: can't alloc thread in constrained processes
In a non-forking but thread/pid/memory-constrained process (e.g. a background-job container running as PID 1), agent startup can fail with:
INFO : Failed Sending Application Startup Info - can't alloc thread
can't alloc thread is EAGAIN from pthread_create — the process hit RLIMIT_NPROC, the cgroup pids.max, or ran out of memory for a new thread stack. Because the agent adds ~3 always-on threads on top of the host application's own thread pool, it can push a near-ceiling process over the edge — and starve the host application of the threads it needs to start, so the failure is not isolated to the agent.
3. App-server detection silently degrades to null
Puma is only detected when the process name starts with puma (lib/scout_apm/server_integrations/puma.rb:23: File.basename($0) =~ /\Apuma/). When the server is launched via bin/rails server, $0 is rails, no integration matches, and detection falls through to Null (lib/scout_apm/server_integrations/null.rb), which reports forking? => false and installs no before_worker_boot hook.
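Consequences:
forking? is wrong, so even a forking?-aware deferral (proposed fix 1 below) would not trigger.
The before_worker_boot hook that is supposed to (re)start the worker post-fork is never registered, so a forked worker only starts the agent lazily on first request.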
The agent is nonetheless started in the master because PRECONDITION_DETECTED_SERVER passes whenever any app-server or background-job integration is found (lib/scout_apm/agent/preconditions.rb:27-36) — e.g. when a background-job framework is present — independent of the null app-server result.
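The gap in name-based detection can be shown with the same regular expression quoted above. The helper names below (puma_by_process_name?, puma_loaded?, puma_present?) are hypothetical and not part of scout_apm. Note that the presence of the Puma constant alone does not prove cluster mode or preload_app!, so this sketch covers only the "is Puma loaded" half of the check.

```ruby
# Hypothetical helpers for illustration; not part of scout_apm.

# The current approach: match on the program name only.
def puma_by_process_name?(program_name)
  !!(File.basename(program_name) =~ /\Apuma/)
end

# An additional signal: has the Puma constant been loaded?
# defined? returns nil when the constant does not exist.
def puma_loaded?
  !defined?(::Puma).nil?
end

# Combined check: either signal counts as "Puma is present".
def puma_present?(program_name = $0)
  puma_by_process_name?(program_name) || puma_loaded?
end
```

For a server launched via bin/rails server, puma_by_process_name? is false because the basename is rails, while puma_loaded? would still be true once Puma has been required.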
4. No HTTP timeouts on reporting (related)
Reporter#http builds its Net::HTTP client with no open_timeout/read_timeout (lib/scout_apm/reporter.rb:121-133), so a slow/unreachable host leaves a reporting thread blocked inside a native call for up to Net::HTTP's 60s default — widening the dangerous window in (1) and stalling graceful shutdown (the at_exit handler joins the worker thread). Addressed in #617.
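For reference, setting explicit timeouts on a Net::HTTP client takes two assignments. This is a sketch only: build_reporting_client is a hypothetical helper, and the 5s and 10s values are placeholders rather than values taken from #617.

```ruby
require "net/http"

# Hypothetical helper; host, port, and timeout values are placeholders.
def build_reporting_client(host, port, open_timeout: 5, read_timeout: 10)
  http = Net::HTTP.new(host, port) # does not open a connection yet
  http.open_timeout = open_timeout # seconds to wait for the TCP connect
  http.read_timeout = read_timeout # seconds to wait for each read
  http
end
```

Shorter timeouts bound how long a reporting thread can sit inside a blocking call, which narrows both the fork-time window and the shutdown stall described above.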
5. The error-service thread starts unconditionally
start_error_service_background_worker is called from start (lib/scout_apm/agent.rb:85) and is not gated by errors_enabled (lib/scout_apm/agent.rb:206-215). The thread is created even when the error service is disabled, contributing to (2).
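A gate of the kind this implies could look like the following sketch. The method name start_error_service_worker and the hash-based config are assumptions made for illustration; the real agent reads errors_enabled through its own configuration object.

```ruby
# Hypothetical sketch; the real agent reads errors_enabled from its config.
def start_error_service_worker(config)
  # Create no thread at all when the error service is disabled.
  return nil unless config[:errors_enabled]

  Thread.new { sleep } # placeholder for the error-service loop
end
```

Returning nil when disabled keeps one thread out of every process that does not use the error service.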
Proposed fixes
Do not create background threads (or make network calls) during app boot in a forking/preloading master. Defer all thread creation to the post-fork hook so threads only ever exist in a process that will not fork again. Honor the existing start_background_worker?/forking? signal in start, and have the Railtie defer when running under a forking server.
Make forking detection reliable. Don't rely solely on $0 to detect Puma — also detect when running under Puma (e.g. defined?(::Puma) + cluster/preload context) so forking? is correct regardless of launcher, and ensure the before_worker_boot hook is installed in that case. At minimum, treat "preloaded app, app server unknown" conservatively (defer thread start).
Reduce always-on thread footprint and make threads lazy.
Gate the error-service worker behind errors_enabled (observed behavior 5).
Consider not spawning reporting threads until there is data to report.
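Guard Thread.new call sites so a ThreadError is logged and survivable rather than surfacing as an opaque failure.
Set HTTP timeouts on the reporting connection (#617), and bound the background-worker join on shutdown so drain cannot hang.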
Acceptance
With a forking app server + preload_app!, no agent threads are started in the master; Puma emits no "Detected N Thread(s) started in app boot" warning naming scout_apm.
Worker boot is deterministic (no fork/thread race); repeated deploys succeed.
Agent startup degrades gracefully (logs and continues) if a thread cannot be allocated.
forking? is correct under Puma regardless of launch command.