feat: safe scripted bot restart (#100) by fitz123 · Pull Request #105 · fitz123/claude-code-bot

fitz123 · 2026-04-19T12:28:57Z

Summary

New bot/scripts/restart-bot.sh — single supported restart entry point
- Default mode: graceful SIGTERM, waits up to 60s drain, returns new PID
- --plist mode: race-free bootout+bootstrap that reflects on-disk plist after success
- Pre-restart config validation aborts without sending SIGTERM
- Never SIGKILL, never kickstart -k
- Tolerates variable drain time without fixed sleep
- HOME fallback guard for restricted environments
New bot/src/__tests__/restart-bot.test.ts — 13 tests covering graceful path, plist mode, slow-shutdown stub (5s drain), broken-config abort, malformed plist
.claude/rules/platform/bot-operations.md and .claude/skills/bot-operations/SKILL.md — operators use the script
README.md — operator docs point at the script

Rebased clean onto main after PR #104 merged. Supersedes closed PR #103. Contains only the restart-bot specific changes (media-store changes from the original 103 branch were superseded by PR #104's in-flight protection).

Closes #100

Test plan

npx tsx --test bot/src/__tests__/restart-bot.test.ts (13/13 pass)
cd bot && npx tsc --noEmit
shellcheck bot/scripts/restart-bot.sh clean
codex + claude + copilot reviews: addressed

Single supported restart entry point for the launchd-managed bot. - bot/scripts/restart-bot.sh - Default mode: graceful SIGTERM, waits up to 60s drain, returns new PID - --plist mode: race-free bootout+bootstrap that reflects on-disk plist - Pre-restart config validation aborts without sending SIGTERM - Never SIGKILL, never kickstart -k - Tolerates variable drain time without fixed sleep - HOME fallback guard for restricted environments - bot/src/__tests__/restart-bot.test.ts: 13 tests covering both modes, slow-shutdown drain (5s), broken-config abort, malformed plist - .claude/rules/platform/bot-operations.md + bot-operations skill: operators use restart-bot.sh instead of raw launchctl - README.md: operator docs point at the script Closes #100 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Copilot

Pull request overview

Adds a single supported restart-bot.sh entry point for safe launchd restarts (graceful by default, race-free plist reload via --plist), wires operator documentation to use it, and adds automated tests for the restart behavior.

Changes:

Added bot/scripts/restart-bot.sh to perform graceful SIGTERM restarts and safe bootout+bootstrap restarts for plist-on-disk changes, with config validation and polling (no fixed sleeps).
Added bot/src/__tests__/restart-bot.test.ts to cover graceful/--plist paths, slow shutdown, config validation aborts, and malformed plist handling.
Updated operator docs (.claude/*, README.md) to direct restarts through the script.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
`bot/scripts/restart-bot.sh`	New canonical restart script implementing graceful + plist restart flows with validation and polling.
`bot/src/__tests__/restart-bot.test.ts`	New test harness using mocked `launchctl`/`plutil` to validate script behavior across scenarios.
`.claude/rules/platform/bot-operations.md`	Updates operator rule to use the script and explains why `--plist` is required for plist edits.
`.claude/skills/bot-operations/SKILL.md`	Updates operational skill instructions to use the script and preserves safety warnings.
`README.md`	Updates restart documentation to use `bot/scripts/restart-bot.sh` and `--plist`.
`docs/plans/2026-04-18-issue-100-safe-restart.md`	Planning/trace doc for the change and addressed prior review items.
`docs/plans/completed/2026-04-18-issue-100-safe-restart.md`	Completed plan record for the same work.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-19T12:33:17Z

+  local plist_label
+  if ! plist_label=$("$PLUTIL_BIN" -extract Label raw "$BOT_PLIST" 2>/dev/null); then
+    err "plist is missing 'Label' key: $BOT_PLIST"
+    return 1
+  fi
+  if [ "$plist_label" != "$BOT_LABEL" ]; then
+    err "plist Label '$plist_label' does not match expected '$BOT_LABEL'"
+    return 1


plutil -extract is invoked without specifying an output target. On macOS, plutil -extract <key> raw expects -o - (stdout) or an output file; otherwise it fails and will make --plist mode abort as if the Label key were missing. Update the command to include -o - (and adjust the mock plutil + tests accordingly).

Copilot · 2026-04-19T12:33:17Z

+    file="\${4:-}"
+    if [ "\$key" != "Label" ] || [ "\$fmt" != "raw" ] || [ -z "\$file" ] || [ ! -f "\$file" ]; then


The mock plutil fixture only supports -extract Label raw <file> and will not match the real plutil -extract CLI once the script is corrected to pass -o -. Update this mock to accept (and ideally assert) the real argument pattern so the tests remain representative of production behavior.

Suggested change

file="\${4:-}"

if [ "\$key" != "Label" ] || [ "\$fmt" != "raw" ] || [ -z "\$file" ] || [ ! -f "\$file" ]; then

out_flag="\${4:-}"

out_target="\${5:-}"

file="\${6:-}"

if [ "\$key" != "Label" ] || [ "\$fmt" != "raw" ] || [ "\$out_flag" != "-o" ] || [ "\$out_target" != "-" ] || [ -z "\$file" ] || [ ! -f "\$file" ]; then

Copilot · 2026-04-19T12:33:17Z

+if [ -z "${HOME:-}" ]; then
+  if command -v dscl >/dev/null 2>&1; then
+    HOME="$(dscl . -read "/Users/$(id -un)" NFSHomeDirectory 2>/dev/null | awk '{print $2}')"
+  fi
+fi
+if [ -z "${HOME:-}" ]; then
+  if command -v getent >/dev/null 2>&1; then
+    HOME="$(getent passwd "$(id -un)" 2>/dev/null | cut -d: -f6)"
+  fi
+fi
+export HOME
+if [ -z "$HOME" ]; then
+  echo "[restart-bot] Error: could not determine HOME from environment or fallback lookups" >&2
+  exit 1
+fi


The script exits early if it cannot determine HOME, but this happens before argument parsing. That means restart-bot.sh --help (and unknown-arg usage) can fail in restricted environments even though help output shouldn’t depend on HOME. Consider parsing --help/invalid args before the HOME fallback/guard (or only requiring HOME when actually needed for default plist path resolution).

Copilot · 2026-04-19T12:33:17Z

+  if ! old_pid=$(get_pid); then
+    err "service $BOT_LABEL is not registered with launchd; run: restart-bot.sh --plist"


get_pid documents exit status 2 as “launchctl query failed (unknown state)”, but graceful_restart treats any non-zero from get_pid as “service not registered”. This can misdiagnose transient/permission errors from launchctl list. Handle rc=2 explicitly (e.g., emit a distinct error like “launchctl list failed” and abort) so operators aren’t pointed at --plist when the real issue is an unreadable launchctl state.

Suggested change

if ! old_pid=$(get_pid); then

err "service $BOT_LABEL is not registered with launchd; run: restart-bot.sh --plist"

local get_pid_rc=0

if old_pid=$(get_pid); then

:

else

get_pid_rc=$?

case "$get_pid_rc" in

1)

err "service $BOT_LABEL is not registered with launchd; run: restart-bot.sh --plist"

;;

2)

err "launchctl list failed while querying service $BOT_LABEL; unable to determine launchd state"

;;

*)

err "failed to query PID for service $BOT_LABEL (get_pid exit status: $get_pid_rc)"

;;

esac

Copilot · 2026-04-19T12:33:18Z

+  if is_registered; then
+    log "Unregistering $SERVICE (launchctl bootout)…"
+    # bootout may return non-zero even when the teardown is in progress;
+    # we rely on polling below, not the exit code.
+    "$LAUNCHCTL_BIN" bootout "$SERVICE" >/dev/null 2>&1 || true


is_registered treats get_pid rc=2 (launchctl query failure/unknown state) the same as “not registered”. In --plist mode that can cause the script to skip bootout and go straight to bootstrap against a potentially still-registered service (often EIO). Consider detecting rc=2 explicitly and aborting with a clear “launchctl list failed” diagnostic instead of proceeding.

Copilot AI review requested due to automatic review settings April 19, 2026 12:28

Copilot started reviewing on behalf of fitz123 April 19, 2026 12:29 View session

fitz123 force-pushed the issue-100-safe-restart branch from 20c5632 to 7064772 Compare April 19, 2026 12:30

fitz123 force-pushed the issue-100-safe-restart branch from 7064772 to 40a1533 Compare April 19, 2026 12:30

fitz123 merged commit 32e386f into main Apr 19, 2026
1 check passed

fitz123 deleted the issue-100-safe-restart branch April 19, 2026 12:31

Copilot AI reviewed Apr 19, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: safe scripted bot restart (#100)#105

feat: safe scripted bot restart (#100)#105
fitz123 merged 1 commit intomainfrom
issue-100-safe-restart

fitz123 commented Apr 19, 2026

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 19, 2026

Uh oh!

Copilot AI Apr 19, 2026

Uh oh!

Copilot AI Apr 19, 2026

Uh oh!

Copilot AI Apr 19, 2026

Uh oh!

Copilot AI Apr 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		file="\${4:-}"
		if [ "\$key" != "Label" ] \|\| [ "\$fmt" != "raw" ] \|\| [ -z "\$file" ] \|\| [ ! -f "\$file" ]; then

-    file="\${4:-}"
-    if [ "\$key" != "Label" ] || [ "\$fmt" != "raw" ] || [ -z "\$file" ] || [ ! -f "\$file" ]; then
+    out_flag="\${4:-}"
+    out_target="\${5:-}"
+    file="\${6:-}"
+    if [ "\$key" != "Label" ] || [ "\$fmt" != "raw" ] || [ "\$out_flag" != "-o" ] || [ "\$out_target" != "-" ] || [ -z "\$file" ] || [ ! -f "\$file" ]; then

		if ! old_pid=$(get_pid); then
		err "service $BOT_LABEL is not registered with launchd; run: restart-bot.sh --plist"

-  if ! old_pid=$(get_pid); then
-    err "service $BOT_LABEL is not registered with launchd; run: restart-bot.sh --plist"
+  local get_pid_rc=0
+  if old_pid=$(get_pid); then
+    :
+  else
+    get_pid_rc=$?
+    case "$get_pid_rc" in
+)
+        err "service $BOT_LABEL is not registered with launchd; run: restart-bot.sh --plist"
+        ;;
+)
+        err "launchctl list failed while querying service $BOT_LABEL; unable to determine launchd state"
+        ;;
+      *)
+        err "failed to query PID for service $BOT_LABEL (get_pid exit status: $get_pid_rc)"
+        ;;
+    esac

Conversation

fitz123 commented Apr 19, 2026

Summary

Test plan

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 19, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 19, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 19, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 19, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 19, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants