fix(ant-dev): clean up orphan anvil/antnode and stale node identities on stop #81
Open
Nic-dorman wants to merge 1 commit into
… on stop

ant-devnet keeps anvil alive past Testnet::new scope via std::mem::forget on the AnvilInstance, then relies on graceful Drop at process exit to clean it up. SIGTERM/SIGKILL skip destructors, so every ant dev stop leaks one anvil child and one ~/.local/share/ant/nodes/<peer_id>/ tree for each of the spawned nodes. After a handful of start/stop or killed-mid-startup cycles, the LXC accumulates orphan anvils plus 100+ stale node dirs, and subsequent ant dev start runs flake or hang.

This is a workaround at the ant-dev layer (Option B in #73). The proper fix lives in ant-devnet itself (Option A: tempfile::TempDir + tokio signal handler, mirroring how ant-client's MiniTestnet and ant-node's tests/e2e/testnet.rs already do it) and will be a separate PR against WithAutonomi/ant-node.

In ant dev stop now:
- pkill anvil and antnode in addition to ant-devnet
- rm -rf ~/.local/share/ant/nodes and ~/.local/share/ant/spill so the next start begins from a clean state
- Centralise the pkill calls into a _pkill() helper

No behaviour change on Windows (the pkill / rm paths are POSIX-only).

Closes #16 (local task); helps mitigate #73 (upstream).
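The stop-time cleanup described in the commit message could look roughly like the sketch below. It is not the code from this PR: the implementation language, the function name stop(), the injectable run/platform/data_dirs parameters, and the exact match patterns are assumptions made for illustration (the PR description gives the antnode pattern only as an elided path). Only _pkill(), the pkill -9 -f invocation, and the two data directories come from the text above.

```python
import shutil
import subprocess
import sys
from pathlib import Path

# Assumption: patterns and paths mirror the names used in the PR text; the
# real ant-dev code may match on fuller command lines (e.g. a path to antnode).
PROCESS_PATTERNS = ["ant-devnet", "anvil", "antnode"]
DATA_DIRS = [
    Path.home() / ".local/share/ant/nodes",
    Path.home() / ".local/share/ant/spill",
]


def _pkill(pattern, run=subprocess.run):
    """Send SIGKILL to every process whose command line matches `pattern`.

    Returns True if pkill reported at least one match (exit status 0).
    """
    result = run(["pkill", "-9", "-f", pattern], check=False)
    return result.returncode == 0


def stop(platform=sys.platform, run=subprocess.run, data_dirs=DATA_DIRS):
    """Kill leftover processes and wipe stale node state (POSIX only)."""
    if platform.startswith("win"):
        return []  # this sketch has no Windows equivalent of pkill / rm -rf
    killed = [p for p in PROCESS_PATTERNS if _pkill(p, run=run)]
    for d in data_dirs:
        # Like rm -rf: a directory that is already missing is not an error.
        shutil.rmtree(d, ignore_errors=True)
    return killed
```

Note that pkill -f matches against the full command line, so a short pattern such as anvil would also hit unrelated processes that merely mention it; a narrower pattern is safer on a shared machine.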
Helps mitigate #73 (Option B path).
ant-devnet keeps anvil alive past Testnet::new's scope with std::mem::forget(testnet) and relies on graceful Drop at process exit to clean it up. SIGTERM/SIGKILL skip destructors, so every ant dev stop leaks one anvil child and one ~/.local/share/ant/nodes/<peer_id>/ directory per spawned node (25 dirs on the default preset). After a handful of start/stop cycles, and especially after kill-mid-startup events, the LXC accumulates orphan anvils plus 100+ stale node dirs, and subsequent ant dev start runs flake or hang.

This is the Option B workaround proposed in #73 (the band-aid at the ant-dev layer). The proper fix is Option A: change ant-devnet/main.rs to use tempfile::TempDir + a tokio signal handler, mirroring how ant-client's MiniTestnet and ant-node's tests/e2e/testnet.rs already do it. That lives in WithAutonomi/ant-node and will go up as a separate PR there.

Changes in ant dev stop

- pkill -9 -f anvil and pkill -9 -f .../antnode in addition to the existing ant-devnet pkill
- rm -rf ~/.local/share/ant/{nodes,spill} so the next ant dev start begins from a clean slate
- _pkill() helper for readability

No behaviour change on Windows: pkill and the data-dir cleanup are POSIX-only branches.

Test plan

- ant dev start followed by ant dev stop left an orphan anvil and 25 dirs in ~/.local/share/ant/nodes/. Reproducible every run.
- start → stop leaves zero processes and an empty data dir:
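One way to script the check from the test plan is sketched below. It is a hypothetical helper rather than part of this PR: the function names leftover_processes() and leftover_node_dirs() and the injectable run parameter are assumptions. It relies only on pgrep -f exiting with status 0 when at least one process matches.

```python
import subprocess
from pathlib import Path


def leftover_processes(patterns, run=subprocess.run):
    """Return the patterns that still match a running process.

    pgrep exits with status 0 when at least one process matches.
    """
    return [
        p
        for p in patterns
        if run(["pgrep", "-f", p], capture_output=True, check=False).returncode == 0
    ]


def leftover_node_dirs(nodes_dir):
    """Return the entries still present under the node data directory."""
    nodes_dir = Path(nodes_dir)
    if not nodes_dir.is_dir():
        return []  # a removed directory counts as clean
    return sorted(child.name for child in nodes_dir.iterdir())
```

After a start followed by a stop, both leftover_processes(["anvil", "antnode", "ant-devnet"]) and leftover_node_dirs(Path.home() / ".local/share/ant/nodes") would be expected to return empty lists if the cleanup worked.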