Add uwsgi-style process manager with heartbeat watchdog #282

Draft
eyalr wants to merge 2 commits into master from implement-uwsgi-style-process-management

eyalr commented Apr 28, 2026

Summary

Adds a robust process supervision layer to the PySOA standalone server, modeled after uwsgi's worker management approach. The parent process now monitors child workers via a shared-memory heartbeat and automatically kills any worker that hangs mid-request, then respawns a replacement.
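
A minimal sketch of the shared-memory heartbeat, assuming only the standard library: the parent allocates one multiprocessing.Value('d') per worker, the child overwrites it with time.monotonic(), and the parent compares that reading against its own clock. Heartbeat, ping, reset, and seconds_since_ping are hypothetical names for illustration, not the PR's API. Comparing monotonic readings across processes assumes the platform's monotonic clock is system-wide, which holds in practice on Linux, macOS, and Windows even though the Python docs leave its reference point undefined.

```python
import multiprocessing
import time


class Heartbeat:
    """Hypothetical wrapper around one shared-memory heartbeat slot."""

    def __init__(self):
        # 'd' is a C double in shared memory; 0.0 means "no ping yet".
        # Create it before the child starts and pass it to the child
        # (e.g. as a Process argument) so both sides share the same memory.
        self.slot = multiprocessing.Value('d', 0.0)

    def ping(self):
        # Child side: record "still alive" using the monotonic clock.
        self.slot.value = time.monotonic()

    def reset(self):
        # Parent side: clear the slot before (re)spawning a child.
        self.slot.value = 0.0

    def seconds_since_ping(self):
        # Parent side: None until the current child has pinged at least once.
        last = self.slot.value
        if last == 0.0:
            return None
        return time.monotonic() - last
```

In use, the parent would create the Heartbeat before spawning, hand it to the worker as a Process argument, call reset() before each (re)spawn, and poll seconds_since_ping() from its monitor loop.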

What changed

standalone.py — process manager overhaul

  • Changed the default --fork-processes from 0 (no forking) to 1, so the server now forks one supervised worker by default. Use --fork 0 to run directly in the current process.
  • New CLI arguments:
    • --ping-timeout (default 10 s): per-request watchdog; kills a child that hasn't pinged within this window.
    • --startup-timeout (default 60 s): how long a freshly-spawned child has to send its first ping before being killed.
    • --process-shutdown-timeout (default 30 s): grace period after SIGTERM before escalating to SIGKILL on shutdown.
  • _ProcessMonitor now polls with join(timeout=1) instead of a blocking join(), so it can react promptly to shutdown signals and detect hung workers.
  • Added _kill_process() helper for SIGKILL with logging and a final join().
  • Ping state (timestamp + process start time) is reset in _start_process() before every (re)spawn to prevent a stale timestamp from a crashed child from instantly killing the replacement.
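
The watchdog rules above (a startup window until the first ping, the ping timeout afterwards, a slot reset before every spawn, and a short-polling join) can be sketched as follows. This is a hypothetical illustration, not the PR's _ProcessMonitor: ChildWatch and watch are invented names, the slot is assumed to be anything with a float .value (such as multiprocessing.Value('d')), and the defaults mirror --ping-timeout and --startup-timeout.

```python
import time


class ChildWatch:
    """Hypothetical parent-side watchdog state for one worker.

    slot is anything with a float .value, e.g. multiprocessing.Value('d').
    """

    def __init__(self, slot, ping_timeout=10.0, startup_timeout=60.0):
        self.slot = slot
        self.ping_timeout = ping_timeout
        self.startup_timeout = startup_timeout
        self.started_at = 0.0

    def reset_for_spawn(self, now):
        # Call right before (re)spawning. Clearing the shared slot keeps a
        # stale ping left by a crashed child from killing its replacement.
        self.slot.value = 0.0
        self.started_at = now

    def should_kill(self, now):
        last_ping = self.slot.value
        if last_ping == 0.0:
            # No ping since this spawn: only the startup timeout applies.
            return now - self.started_at > self.startup_timeout
        # After the first ping, the per-request ping timeout takes over.
        return now - last_ping > self.ping_timeout


def watch(process, child_watch, poll_interval=1.0):
    """Poll a started multiprocessing.Process until it exits or hangs."""
    while True:
        # Short poll instead of a blocking join(), so hangs are noticed.
        process.join(timeout=poll_interval)
        if not process.is_alive():
            return process.exitcode
        if child_watch.should_kill(time.monotonic()):
            process.kill()  # SIGKILL on POSIX
            process.join()  # reap the killed child
            return process.exitcode
```

A caller would run reset_for_spawn(time.monotonic()), start the process, call watch(), and respawn when it returns; a real monitor would also check a shutdown flag inside the loop. Keeping should_kill() a pure function of the current time makes the startup-window and stale-timestamp cases easy to test with fixed clock values.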

server.py — child-side heartbeat

  • Added _ping_timestamp (a multiprocessing.Value('d') supplied by the parent) and _ping_parent() which writes time.monotonic() into it.
  • handle_next_request() calls _ping_parent() at three points: on MessageReceiveTimeout (idle), immediately after a request is dequeued, and in the finally block after the request completes.
  • New _validate_receive_timeout_vs_ping_timeout() classmethod raises ValueError at startup if the Redis transport's receive_timeout_in_seconds exceeds ping_timeout / 2, which would cause spurious kills of healthy workers blocked in a long BLPOP.
  • main() accepts explicit _ping_timestamp and _ping_timeout keyword arguments so the values are available in spawn-mode child processes (the default start method on Windows, and on macOS since Python 3.8), where sys.argv is reset.
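
On the child side, the three ping points and the startup check can be sketched like this. WorkerSketch, ReceiveTimeout, and validate_receive_timeout are hypothetical stand-ins for the server's handle_next_request() loop, PySOA's MessageReceiveTimeout, and _validate_receive_timeout_vs_ping_timeout(); the receive and handle callables replace the real transport. The ping_timeout / 2 threshold is taken from the description above.

```python
import time


class ReceiveTimeout(Exception):
    """Hypothetical stand-in for PySOA's MessageReceiveTimeout."""


class WorkerSketch:
    """Hypothetical, stripped-down stand-in for the server's request loop."""

    def __init__(self, ping_timestamp=None):
        # Shared multiprocessing.Value('d') from the parent, or None when
        # there is no supervising parent (pinging is then a no-op).
        self._ping_timestamp = ping_timestamp

    def _ping_parent(self):
        if self._ping_timestamp is not None:
            self._ping_timestamp.value = time.monotonic()

    def handle_next_request(self, receive, handle):
        try:
            request = receive()
        except ReceiveTimeout:
            self._ping_parent()  # idle but alive: nothing arrived this round
            return
        self._ping_parent()      # a request was dequeued
        try:
            handle(request)
        finally:
            self._ping_parent()  # request finished, even if it raised


def validate_receive_timeout(receive_timeout, ping_timeout):
    # An idle worker pings only once per receive timeout, so that timeout
    # must leave headroom inside the ping window.
    if receive_timeout > ping_timeout / 2:
        raise ValueError(
            'receive timeout {}s exceeds half the ping timeout ({}s)'.format(
                receive_timeout, ping_timeout))
```

With the default 10 s ping timeout, a 5 s receive timeout passes and anything longer is rejected, because a worker blocked in BLPOP cannot ping until the receive call returns.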

Test coverage

  • test_handle_next_request.py:
    • _ping_parent is a no-op without shared memory; timestamp monotonicity
    • ping calls on the idle, receive, complete, and exception paths
    • receive-timeout validation: boundary conditions, Redis-default fallback, non-Redis skip
    • _ping_timeout kwarg takes priority over sys.argv
  • test_standalone.py:
    • default fork count and CLI argument defaults
    • _ProcessMonitor watchdog: stale ping → SIGKILL; fresh ping → no kill
    • SIGTERM → SIGKILL escalation; graceful exit → no SIGKILL
    • respawn after a ping-kill; a stale timestamp from a dead child doesn't kill its replacement
    • startup timeout, startup-window protection, and the ping timeout taking over after the first ping
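
The shutdown-escalation cases in test_standalone.py (SIGTERM → SIGKILL escalation, graceful exit → no SIGKILL) can be exercised without real processes by driving a fake. This sketch assumes the parent stops each child with Process.terminate() followed by Process.kill(); stop_child and FakeProcess are hypothetical names, not the PR's test code, and shutdown_timeout stands in for --process-shutdown-timeout.

```python
import unittest


def stop_child(process, shutdown_timeout=30.0):
    """Hypothetical helper: SIGTERM, wait, then escalate to SIGKILL.

    Returns True if SIGKILL was needed.
    """
    process.terminate()                     # SIGTERM on POSIX
    process.join(timeout=shutdown_timeout)  # grace period
    if process.is_alive():
        process.kill()                      # SIGKILL on POSIX
        process.join()
        return True
    return False


class FakeProcess:
    """Stands in for multiprocessing.Process; records signals instead of sending them."""

    def __init__(self, exits_on_sigterm):
        self.exits_on_sigterm = exits_on_sigterm
        self.calls = []
        self._alive = True

    def terminate(self):
        self.calls.append('SIGTERM')
        if self.exits_on_sigterm:
            self._alive = False

    def kill(self):
        self.calls.append('SIGKILL')
        self._alive = False

    def join(self, timeout=None):
        pass  # a real join() would block for up to `timeout` seconds

    def is_alive(self):
        return self._alive


class StopChildTests(unittest.TestCase):
    def test_graceful_exit_does_not_sigkill(self):
        proc = FakeProcess(exits_on_sigterm=True)
        self.assertFalse(stop_child(proc, shutdown_timeout=0))
        self.assertEqual(proc.calls, ['SIGTERM'])

    def test_hung_child_is_escalated_to_sigkill(self):
        proc = FakeProcess(exits_on_sigterm=False)
        self.assertTrue(stop_child(proc, shutdown_timeout=0))
        self.assertEqual(proc.calls, ['SIGTERM', 'SIGKILL'])
```

Saved as a module, these tests run under python -m unittest or pytest; the fake's no-op join() keeps them fast regardless of the configured grace period.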

eyalr changed the title from "Improve resilience by adding a uwsgi-style process manager" to "Add uwsgi-style process manager with heartbeat watchdog" on Apr 29, 2026