Skip to content

feat: add /stop command to cancel running Claude calls#116

Closed
F1orian wants to merge 16 commits intoRichardAtCT:mainfrom
F1orian:feat/stop-command
Closed

feat: add /stop command to cancel running Claude calls#116
F1orian wants to merge 16 commits intoRichardAtCT:mainfrom
F1orian:feat/stop-command

Conversation

@F1orian
Copy link
Contributor

@F1orian F1orian commented Feb 26, 2026

Summary

  • Adds /stop command that immediately cancels a running Claude call
  • Enables concurrent_updates(True) on the PTB Application so /stop can run while a long-running Claude call is in progress (PTB defaults to sequential update processing, which blocked /stop behind the active handler)
  • Tracks active calls per-thread in _active_calls dict keyed by thread state_key, so /stop in one project thread doesn't affect another
  • Replaces single _active_pid with per-call PID tracking (_active_pids dict) and process-tree killing via /proc/<pid>/task/<pid>/children to safely terminate the CLI and all its subagents without killing the bot's process group
  • Registered in both agentic and classic modes with bot menu entry "Stop running Claude call"

Changes

  • src/bot/core.py — enable concurrent_updates(True)
  • src/bot/orchestrator.py — per-thread task tracking via _active_calls, _thread_key() helper
  • src/bot/handlers/message.py — same per-thread tracking for classic mode
  • src/bot/handlers/command.py/stop resolves correct thread, uses abort_call(call_id)
  • src/claude/facade.py — thread call_id through run_command, add abort_call()
  • src/claude/sdk_integration.py — per-call PID dict, process-tree kill (_get_descendants, _kill_pid), abort() for shutdown, abort_call() for targeted stop

Test plan

  • Updated orchestrator tests for new command counts (agentic: 5→6, classic: 13→14)
  • All orchestrator tests pass (9/9)
  • Manual test: /stop cancels running call immediately (confirmed via logs: success: true)
  • Manual test: /stop with nothing running → "Nothing running." response
  • Manual test: concurrent calls in two project threads, /stop in one does not affect the other
  • Manual test: bot stays alive after /stop (no process group kill)
  • Subagent processes terminated via process-tree walking

🤖 Generated with Claude Code

F1orian and others added 4 commits February 26, 2026 17:11
Allows users to immediately cancel an in-progress Claude call.
The command cancels the asyncio task wrapping run_command(),
cleans up the progress message, and logs the cancellation.

Registered in both agentic and classic modes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
task.cancel() only cancels the asyncio wrapper but leaves the
Claude CLI subprocess running. Add abort() to ClaudeSDKManager
and ClaudeIntegration facade that calls client.disconnect(),
which terminates the underlying subprocess via SIGTERM.

The /stop handler now calls abort() before task.cancel() to
ensure the CLI process is killed immediately.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The original /stop implementation could never actually stop a running
Claude call because PTB's concurrent_updates defaults to False, meaning
the /stop handler was queued behind the still-running message handler.

This commit fixes three issues:

1. Enable concurrent update processing (concurrent_updates=True) so
   /stop can run while a Claude call is in progress.

2. Track active calls per-thread instead of a single _claude_task on
   context.user_data. Each project thread gets its own entry in an
   _active_calls dict, so /stop in one channel doesn't affect another.

3. Replace the single _active_pid with a per-call PID dict and use
   process-tree killing (via /proc/*/children) instead of killpg to
   safely terminate the CLI and all its subagents without killing the
   bot's own process group.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@FridayOpenClawBot
Copy link

PR Review
Reviewed head: 1c8e6e78ba2aa95cb931432e0e0e18dc78d714ee

Summary

  • Adds /stop command to cancel an in-flight Claude call per-thread without affecting other concurrent threads
  • Enables concurrent_updates(True) on the PTB Application so /stop can run while a long-running handler is active
  • Tracks active calls in context.user_data["_active_calls"] keyed by thread state_key; uses per-call PID tracking and process-tree kill via abort_call()

What looks good

  • Per-thread call isolation is correct — /stop in one topic won't cancel work in another
  • abort_call() kills the subprocess tree before cancelling the asyncio task, preventing zombie processes
  • CancelledError is caught cleanly in both orchestrator.py and message.py, deleting the progress message gracefully
  • Test counts updated to reflect the new command

Issues / questions

  1. [Important] src/bot/handlers/command.pystop_command reads context.user_data.get("_thread_context") to derive thread_key. With concurrent_updates(True) enabled, context.user_data is shared and could be updated by a concurrent handler between the read and the cancel. Consider snapshotting thread_key before the first await or using the same _update_thread_states dict pattern introduced in PR Enable concurrent processing of messages across topics #103.
  2. [Important] src/claude/sdk_integration.pyabort_call(call_id) and abort() rely on /proc/<pid>/task/<pid>/children for process-tree walking, which is Linux-only. If any contributors run on macOS for testing, this will silently fail. A comment noting the Linux requirement (or a platform guard) would prevent confusion.
  3. [Nit] call_id = id(object()) is slightly fragile — CPython reuses addresses for short-lived objects. uuid.uuid4() or a simple monotonic counter would be more robust.

Suggested tests

  • Unit test for stop_command when _active_calls is empty (should reply "Nothing running." without error)
  • Unit test confirming /stop in thread A does not cancel the task registered for thread B

Verdict
⚠️ Merge after fixes — the core logic is sound, but the context.user_data race under concurrent updates and the Linux-only process walk are worth addressing before ship.

Friday, AI assistant to @RichardAtCT

- Document concurrent_updates safety: all user_data reads in stop_command
  happen synchronously before the first await
- Add platform guard in _get_descendants(): return [] on non-Linux so
  _kill_pid gracefully falls back to killing only the root process
- Replace id(object()) with itertools.count() monotonic counter for
  call_id (avoids CPython address reuse)
- Add unit tests: /stop with nothing running, thread isolation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@F1orian
Copy link
Contributor Author

F1orian commented Feb 27, 2026

Thanks for the thorough review! Pushed fixes for all three points:

1. context.user_data race (confirmed safe + documented):
All reads from user_data (_thread_context, _active_calls, task, call_id) happen synchronously before the first await in stop_command, so no concurrent handler can mutate the dict between read and cancel. Added a comment documenting this invariant.

2. Linux-only /proc process-tree walking:
Added a platform.system() != "Linux" guard in _get_descendants() — returns [] on non-Linux so _kill_pid gracefully falls back to killing only the root process. Docstring updated.

3. id(object()) → monotonic counter:
Replaced with itertools.count(1) in both orchestrator.py (class-level _call_counter) and message.py (module-level _call_counter). No address-reuse risk.

Tests added:

  • test_stop_command_nothing_running — verifies "Nothing running." reply when _active_calls is empty
  • test_stop_command_thread_isolation — verifies /stop in thread A cancels only thread A's task, leaving thread B untouched

@FridayOpenClawBot
Copy link

PR Review
Reviewed head: 27c9b73d0e67ce66f2fbe72d7507b9d53f66b468

Summary

  • Adds /stop command that cancels a running Claude call in the current thread
  • Enables concurrent_updates(True) on the PTB application so /stop can be processed while a long-running handler is active
  • Tracks per-thread active calls via _active_calls dict; kills the underlying subprocess tree via /proc on Linux

What looks good

  • Thread isolation is solid: _thread_key() helper is consistent across both classic and orchestrator paths, and the tests explicitly verify that stopping thread A doesn't affect thread B
  • Process-tree kill logic is careful: uses killpg only when the process is its own group leader, falls back to individual kills otherwise — avoids nuking the bot's own process group
  • The pre-await snapshot of thread_key / call_info before the first await is a nice touch to prevent race conditions under concurrent updates

Issues / questions

  1. [Important] src/claude/sdk_integration.py — The _get_descendants helper using /proc/<pid>/task/<pid>/children is Linux-only (correctly guarded), but the _ensure_dead SIGKILL follow-up thread imports time and threading inside the function. Minor style issue, but more relevantly: if the bot is running on macOS (dev/test), the SIGKILL fallback is a no-op because _get_descendants returns [] and only target_pid itself gets checked. That's probably fine for local dev, just worth a note in comments or docs.

  2. [Important] src/bot/handlers/command.py (stop_command) — claude_integration is fetched from bot_data but could be None if the bot starts in a mode where it's not injected. You guard with if claude_integration and call_id is not None before calling abort_call, which is correct. However, if claude_integration is None but a task is running, you call task.cancel() without first killing the subprocess — the subprocess may outlive the asyncio wrapper. Worth documenting this as a known limitation, or asserting that claude_integration is always set when a call is active.

  3. [Nit] src/bot/handlers/message.py_call_counter = itertools.count(1) is a module-level singleton shared across all classic-mode calls. MessageOrchestrator._call_counter is class-level for agentic mode. Both are fine, but mixing the patterns could confuse future contributors. Consider making message.py's counter a module-level constant with a comment explaining why it's separate.

Suggested tests (if needed)

  • A test for the case where claude_integration is None but a task exists (ensures task.cancel() is still called without crashing)
  • Integration-style test verifying the subprocess is actually reaped after /stop (may be out of scope for unit tests)

Verdict
⚠️ Merge after fixes — the two Important items are low-risk but worth addressing before ship. The core logic is sound.

Friday, AI assistant to @RichardAtCT

F1orian and others added 4 commits February 28, 2026 01:02
Introduce CommandPaletteScanner that reads ~/.claude/ to discover
installed plugins, their skills/commands, and custom user skills.
Includes data models (ActionType, PaletteItem, PluginInfo), a YAML
frontmatter parser, plugin toggle support, and 43 comprehensive tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implement MenuBuilder (builds top-level and category sub-menus with short
IDs for Telegram's 64-byte callback_data limit), menu_command and
menu_callback handlers, and wire them into the agentic orchestrator.

- MenuBuilder: top-level menu (Bot, plugins 2-per-row, custom skills,
  Plugin Store), category sub-menus with toggle and back buttons,
  single-item plugins execute directly, multi-item open categories
- menu_command: scans filesystem via CommandPaletteScanner, sends
  inline keyboard, stores builder in user_data for callback resolution
- menu_callback: handles back/cat/run/tog/store actions, re-scans
  on navigation, sets pending skill/command for future Claude injection
- Orchestrator: registers /menu CommandHandler + menu: CallbackQueryHandler,
  adds BotCommand("menu") to Telegram command menu
- 39 menu tests + 3 orchestrator integration tests (78 total, all pass)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When users tap skill buttons (/commit, /review-pr, custom skills),
the menu now actually calls Claude via ClaudeIntegration.run_command()
instead of just storing a pending value. Direct commands (/new, /status,
/stop, /verbose, /repo) are executed inline with appropriate feedback.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@RichardAtCT
Copy link
Owner

Just catching up with my clanker!
Point 1 on the latest comment needs to be addressed because a user can run the tool in production on MacOS (I do!).

- Add get_enabled_plugin_paths() to command_palette.py to resolve
  installed plugin paths from ~/.claude/ config files
- Pass enabled plugins to ClaudeAgentOptions via SdkPluginConfig so
  skills/commands from plugins are available in CLI sessions
- Change setting_sources from ["project"] to ["project", "user"] so
  custom user skills (~/.claude/skills/) are also discovered
- Fix menu skill invocation to send slash commands directly instead
  of hacky description-based prompts
- Add message_thread_id to all send_message calls in menu handler
  to fix wrong-channel responses in supergroup forum topics
- Add fallback confirmation message when skill produces no visible
  text output
- Include design docs for the dynamic command menu feature

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@FridayOpenClawBot
Copy link

PR Review
Reviewed head: 3bef1c4557c865813a3708de9c30696db60aa0ca

Summary

  • Adds /stop command that cancels running Claude calls via per-thread task tracking and process-tree killing
  • Enables concurrent_updates(True) in PTB so /stop can run alongside an active handler
  • Also ships a substantial /menu dynamic command palette (scans ~/.claude/ for skills/plugins) — this is a large additional feature beyond what the title suggests

What looks good

  • The per-call-ID tracking + process-tree kill (_get_descendants via /proc) is solid: avoids killing the bot's own process group, gracefully degrades on non-Linux
  • Reading thread_key and call_info synchronously before the first await in stop_command is a correct and explicit defence against the concurrency race introduced by concurrent_updates
  • Scoping abort_call to a call_id (rather than a global abort) correctly isolates stop to the right thread

Issues / questions

  1. [Important] src/bot/handlers/command.pystop_command reads context.user_data["_active_calls"] without a lock. With concurrent_updates(True), a message handler running concurrently could be writing to _active_calls while /stop reads it. Python's GIL makes individual dict ops atomic, but check whether PTB serialises user_data access internally when concurrent_updates=True; if not, a threading.Lock stored in user_data may be needed.

  2. [Important] src/bot/handlers/message.py — if the Claude task raises (not cancels), active.pop(thread_key, None) in the finally block correctly cleans up. But what if the progress message deletion in the CancelledError handler itself raises? The bare except Exception: pass swallows it silently, which is fine — just confirm there's no cleanup that should still happen (e.g. removing the active_call entry). Currently the finally block handles it, so this is fine — just worth a comment.

  3. [Nit] src/claude/sdk_integration.py_get_descendants uses a bare open() on /proc paths; wrapping in Path(...).read_text() is more idiomatic and auto-closes even on partial reads, though the with statement here is already safe.

  4. [Nit] The PR title says "add /stop command" but it also includes 388-line command_palette.py, a 602-line menu.py, and ~1725-line plan docs. If /menu is still in progress, consider splitting to keep review surface manageable and avoid blocking the /stop feature.

Suggested tests (if needed)

  • Unit test for stop_command when _active_calls has a task for the current thread vs. a different thread — verify only the correct one is cancelled
  • Test _get_descendants returns empty list on non-Linux (mock platform.system())

Verdict
⚠️ Merge after fixes — the concurrent user_data access question (#1) is worth confirming before merge, but the core stop logic is clean.

Friday, AI assistant to @RichardAtCT

F1orian and others added 6 commits March 2, 2026 23:16
Design for intercepting AskUserQuestion via PreToolUse SDK hooks
and routing questions through Telegram inline keyboards.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
8-task plan with TDD steps for intercepting AskUserQuestion via
PreToolUse hooks and routing through Telegram inline keyboards.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…oard builders

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…interactive questions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…strator, and menu

Thread telegram_context from Telegram handlers through facade.py and into
sdk_integration.py so the PreToolUse hook for AskUserQuestion can send
interactive inline keyboards to the user's chat.

- sdk_integration.py: accept telegram_context param, register HookMatcher
  for AskUserQuestion, cancel pending questions in finally block
- facade.py: thread telegram_context through run_command and _execute
- orchestrator.py: build TelegramContext in agentic_text, register askq
  callback and other-text handlers in _register_agentic_handlers
- menu.py: build TelegramContext in INJECT_SKILL section, move thread_id
  extraction before run_command
- interactive_questions.py: use ApplicationHandlerStop instead of return
  True to prevent double-handling of consumed messages
- Lazy imports in sdk_integration.py to avoid circular import with
  bot.features

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update test_setting_sources_includes_project to match the
setting_sources=["project", "user"] change. Apply black and isort
formatting to all interactive questions files.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@FridayOpenClawBot
Copy link

Review — new commits since last review (interactive AskUserQuestion feature)

The interactive question routing via PreToolUse hooks is a solid approach, and the implementation is clean. A few things worth flagging:

Race condition in multi-question flow

_pending is keyed by (user_id, chat_id) and each question in a multi-question loop register_pending + await future sequentially. That's fine for serial questions, but if Claude fires two AskUserQuestion tool calls concurrently (possible with concurrent_updates=True now enabled), the second register_pending will overwrite the first and the first future will never resolve. Consider keying by (user_id, chat_id, call_id) or using a queue.

q_idx is per-loop-iteration, not globally unique

build_single_select_keyboard encodes q_idx in callback_data (askq:{q_idx}:{i}), but q_idx resets to 0 on each run_command. If two concurrent hooks are active, the askq_callback routing by q_idx will be ambiguous. Since pq is already looked up by (user_id, chat_id), q_idx isn't actually used in askq_callback — the # noqa: F841 comment acknowledges this. You could remove q_idx from callback_data entirely and simplify to askq:{i} / askq:t{i} / askq:done / askq:other.

"Other..." free-text: no timeout

If a user taps "Other..." and then abandons the session, pq.awaiting_other stays True indefinitely and will intercept their next real message. Consider cancelling after a timeout, or at least clearing awaiting_other when the session ends (alongside the existing cancel_pending in the finally block).

cancel_pending in finally only cancels one question

The finally in _run_client calls cancel_pending once, but a multi-question flow could have left a PendingQuestion in the registry mid-stream. This is already handled if the future is awaited before the next register_pending, but worth a comment for clarity.

Minor: lazy imports inside hot paths

from ..bot.features.interactive_questions import TelegramContext is imported inside agentic_text (called on every message) and inside _run_client. These are fine functionally (Python caches module imports), but could be moved to module level now that TYPE_CHECKING is already used in sdk_integration.py.

Overall the design is sound and the test coverage looks good. The race condition is the only one I'd call a real bug worth fixing before merge.

Friday, AI assistant to @RichardAtCT

@RichardAtCT
Copy link
Owner

Closing — superseded by #122 which implements the same functionality (cancel running requests) via an inline Stop button rather than a /stop command. The inline approach provides better UX and a smaller diff. Thanks for the contribution!

@RichardAtCT RichardAtCT closed this Mar 4, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants