Audit session/container metrics for drift after exec cancel path lands (#3) #11

@Gandalf-Le-Dev

Problem

Before #3, `manager.Exec` could park indefinitely on SSH disconnect when the in-container process was silent, meaning:

  • `defer s.manager.SessionDisconnect(...)` at `gateway/server.go:329` never ran.
  • `metrics.ActiveSessionsTotal` / `metrics.BoxActiveSessions` never decremented.
  • Idle timer never armed; container stayed up forever.

After #3, the cancel path forces `Exec` to return on ctx cancel, so in theory the deferred `SessionDisconnect` now runs, both gauges decrement, and the idle timer arms on last disconnect. This issue is to verify that claim empirically and close any remaining gaps.
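
For reference, a minimal sketch of the failure mode and of why returning on ctx cancel should fix it. This is not hopboxd code: `activeSessions`, `execBlocking`, `handleSession`, and `runDisconnectCycles` are hypothetical stand-ins for the gauge, `manager.Exec`, the session handler around `gateway/server.go:329`, and a disconnect-heavy workload.

```go
package main

import (
	"context"
	"fmt"
	"sync/atomic"
	"time"
)

// activeSessions is a hypothetical stand-in for a gauge such as
// metrics.ActiveSessionsTotal.
var activeSessions atomic.Int64

// execBlocking simulates an exec against a silent in-container process.
// It returns only when the process finishes (done) or ctx is cancelled.
// Without the ctx.Done() case it would park forever on disconnect.
func execBlocking(ctx context.Context, done <-chan struct{}) error {
	select {
	case <-done:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// handleSession increments on connect and defers the decrement, so the
// gauge only drops if execBlocking actually returns.
func handleSession(ctx context.Context, done <-chan struct{}) error {
	activeSessions.Add(1)
	defer activeSessions.Add(-1) // stands in for SessionDisconnect
	return execBlocking(ctx, done)
}

// runDisconnectCycles opens n sessions against a process that never
// finishes, cancels each one to simulate an SSH disconnect, and returns
// the gauge value once all cycles are done.
func runDisconnectCycles(n int) int64 {
	silent := make(chan struct{}) // never closed: the process stays silent
	for i := 0; i < n; i++ {
		ctx, cancel := context.WithCancel(context.Background())
		errc := make(chan error, 1)
		go func() { errc <- handleSession(ctx, silent) }()
		cancel() // simulated disconnect
		select {
		case <-errc:
		case <-time.After(time.Second):
			// The handler is still parked; the gauge will show drift.
		}
	}
	return activeSessions.Load()
}

func main() {
	fmt.Println("active sessions after 100 cycles:", runDisconnectCycles(100))
}
```

If the `ctx.Done()` case is removed from `execBlocking`, every cycle times out and the gauge ends at `n` instead of zero, which is the drift this issue is looking for.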

Scope

  • Deploy (or run locally) hopboxd with the fix from #3 (Handle SIGWINCH cleanly).
  • Scrape `/metrics` before and after a disconnect-heavy workload: connect, disconnect, reconnect, repeat many times, across TTY and non-TTY sessions (scp, one-shot ssh commands, zellij attach, etc.).
  • Verify gauges return to zero when all sessions close.
  • Verify idle timer fires for real on last-disconnect.
  • If drift remains, identify the leak path (e.g. a forced SIGKILL of hopboxd still leaves metric counters undecremented, but that's expected; what about graceful paths?).
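
The "gauges return to zero" check could be scripted along these lines. This is a sketch only: it parses Prometheus text exposition output that has already been fetched from `/metrics`, and the metric names `hopbox_active_sessions_total` and `hopbox_box_active_sessions` are assumptions. The real exported names for `metrics.ActiveSessionsTotal` and `metrics.BoxActiveSessions` need to be checked against the code.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// hasAnyPrefix reports whether s starts with any of the given prefixes.
func hasAnyPrefix(s string, prefixes []string) bool {
	for _, p := range prefixes {
		if strings.HasPrefix(s, p) {
			return true
		}
	}
	return false
}

// nonZeroGauges scans Prometheus text exposition output and returns every
// sample whose series name starts with one of the prefixes and whose value
// is not zero. After all sessions close, an empty result means no drift.
func nonZeroGauges(exposition string, prefixes []string) map[string]float64 {
	drift := make(map[string]float64)
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blank lines and HELP/TYPE comments
		}
		// A sample line is: name{labels} value [timestamp].
		// Split after the closing brace so spaces in label values are safe.
		series, rest := line, ""
		if i := strings.LastIndex(line, "}"); i >= 0 {
			series, rest = line[:i+1], line[i+1:]
		} else if i := strings.IndexAny(line, " \t"); i >= 0 {
			series, rest = line[:i], line[i:]
		}
		fields := strings.Fields(rest)
		if len(fields) == 0 || !hasAnyPrefix(series, prefixes) {
			continue
		}
		v, err := strconv.ParseFloat(fields[0], 64)
		if err != nil || v == 0 {
			continue
		}
		drift[series] = v
	}
	return drift
}

func main() {
	// Hypothetical scrape taken after every session has disconnected.
	scrape := `# TYPE hopbox_active_sessions_total gauge
hopbox_active_sessions_total 0
# TYPE hopbox_box_active_sessions gauge
hopbox_box_active_sessions{box="dev"} 1
`
	names := []string{"hopbox_active_sessions_total", "hopbox_box_active_sessions"}
	for series, v := range nonZeroGauges(scrape, names) {
		fmt.Printf("drift: %s = %v\n", series, v)
	}
}
```

Running this against a scrape taken before the workload and another taken after the last disconnect gives a quick pass/fail signal; any series still reported after the last disconnect points at a leak path worth tracing.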

Out of scope

  • Rewriting metric plumbing.
  • Adding new metrics — only audit and fix drift in existing ones.

Effort

Medium. Mostly runtime verification; small code fixes if any leak is found.

Metadata

Assignees

No one assigned

Labels

enhancement
