Skip to content

Harden agent extension review findings#22

Open
koriym wants to merge 5 commits into
codex/agent-history-completionfrom
codex/review-hardening
Open

Harden agent extension review findings#22
koriym wants to merge 5 commits into
codex/agent-history-completionfrom
codex/review-hardening

Conversation

@koriym
Copy link
Copy Markdown
Member

@koriym koriym commented May 13, 2026

Summary

  • Always reject tool calls that are not present in the current LLM request tool list.
  • Preserve public interface compatibility while adding option-aware agent interfaces for per-run AgentOptions.
  • Harden Agent-as-Tool behavior: wire subagent dispatchers, apply profile options, catch subagent exceptions, fallback unknown ask_* calls, and default-deny confirmable subagent tools without a pool handler.
  • Tighten output processor invariants so tool-use history and streaming tool-use control events cannot be corrupted.
  • Correct ALPS safe/idempotent semantics and update fixtures/docs.

Validation

  • zsh -ic 'sphp85; composer tests'
  • zsh -ic 'sphp85; XDEBUG_MODE=coverage vendor/bin/phpunit --coverage-text' (100% classes, methods, lines)

Stacked on #21.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 13, 2026

Review Change Stack

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5a1a7d44-a163-4c56-bd96-5f3aa6a3343d

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR introduces per-run AgentOptions support through new OptionAwareAgentInterface and OptionAwareStreamingAgentInterface, removes conditional tool enforcement in favor of unconditional validation against ToolList, adds output processor data-preservation assertions, introduces DenyConfirmationHandler for safer subagent defaults, and refactors AgentPool and AgentFactory to support tool caching and profile-based option application.

Changes

Per-run options and tool validation

Layer / File(s) Summary
Option-aware agent interfaces and base contract changes
src/Runtime/OptionAwareAgentInterface.php, src/Runtime/OptionAwareStreamingAgentInterface.php, src/Runtime/AgentInterface.php, src/Runtime/StreamingAgentInterface.php
New OptionAwareAgentInterface and OptionAwareStreamingAgentInterface extend base interfaces with per-run AgentOptions support; base AgentInterface and StreamingAgentInterface signatures simplified by removing options parameters.
Agent implementation with unconditional tool validation
src/Runtime/Agent.php, src/Runtime/StreamingAgent.php, tests/Fake/FakeOutputProcessor.php, tests/Runtime/AgentProcessorTest.php
Agent and StreamingAgent implement option-aware interfaces; tool validation is unconditional—any tool not in ToolList produces an error; enforceToolList() removed; FakeOutputProcessor preserves non-text blocks on tool-use responses; tests validate processor preservation and unadvertised tool rejection.
Output processor validation and data preservation
src/Runtime/AgentOptions.php, src/Runtime/OutputProcessorInterface.php
AgentOptions enforces output processor compliance by validating tool-use content blocks and stream event control data are preserved; documentation clarifies type-preservation requirements and matching semantics.
Denial confirmation handler and safer subagent defaults
src/Runtime/DenyConfirmationHandler.php, src/Runtime/AgentPool.php, tests/Runtime/AgentPoolTest.php
New DenyConfirmationHandler unconditionally denies confirmable tool calls; AgentPool::create() defaults to this handler when none is configured; tests validate denial behavior, tool caching per profile, and profile option application.
ProfiledAgent wrapper and pool-based agent creation
src/Runtime/ProfiledAgent.php, src/Runtime/AgentPool.php
ProfiledAgent wraps agents and applies profile-default options; AgentPool::create() caches tools per profile, constructs agents with denial defaults, wraps in ProfiledAgent, and returns option-aware interface.
AgentDelegator exception handling and AgentFactory wiring
src/Runtime/AgentDelegator.php, src/Runtime/AgentFactory.php, tests/Runtime/AgentPoolTest.php
AgentDelegator converts subagent exceptions to tool errors, reformats context as XML blocks, removes profile-option passing; AgentFactory makes dispatcher mutable, wraps pools in delegator, updates return types to option-aware interfaces; tests validate delegator fallback and factory subagent wiring.
Documentation, configuration, and test fixtures
README.md, README.ja.md, composer.json, src/Runtime/AgentProfile.php, src/Runtime/AgentResponse.php, src/Runtime/AlpsContextInputProcessor.php, src/Runtime/AlpsToolPolicyInputProcessor.php, src/Schema/AlpsDescriptorIndex.php, tests/Fake/alps-profile.json, tests/Fake/alps-profile.xml, tests/Runtime/AlpsContextInputProcessorTest.php, tests/Runtime/AlpsToolPolicyInputProcessorTest.php, tests/Schema/AlpsSemanticDictionaryTest.php
Composer scripts separated into phpstan and psalm; READMEs clarify output processor preservation, ALPS policy filtering, and subagent confirmation semantics; ALPS fixtures updated with new state/transition semantics; processor documentation expanded; AgentProfile context schema permits additional properties; type annotations enhanced; test helpers reorganized.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • bearsunday/BEAR.ToolUse#8: Introduces StreamingAgent and StreamingAgentInterface around StreamEvent handling; this PR refactors StreamingAgent to use OptionAwareStreamingAgentInterface and updates dispatch semantics.
  • bearsunday/BEAR.ToolUse#11: Both PRs update ALPS semantic dictionary behavior; this PR changes test fixtures and assertions to reflect goUserList exclusion from semantic dictionary.
  • bearsunday/BEAR.ToolUse#4: Introduces ConfirmationHandlerInterface for tool confirmation; this PR adds DenyConfirmationHandler to establish safer defaults for subagent confirmation handling.
🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 26.26% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title 'Harden agent extension review findings' is vague and non-descriptive, using generic terminology that doesn't clearly convey the main changes in the changeset. Consider a more specific title that highlights the primary change, such as 'Implement OptionAwareAgent interfaces and strengthen tool validation' or 'Add option-aware agent interfaces with stricter tool enforcement'.
✅ Passed checks (3 passed)
Check name Status Explanation
Description check ✅ Passed The description is well-structured and clearly related to the changeset, detailing multiple key changes including tool validation hardening, interface additions, and output processor constraints.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch codex/review-hardening

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (2e69e50) to head (fa69ae7).

Additional details and impacted files
@@                        Coverage Diff                         @@
##             codex/agent-history-completion       #22   +/-   ##
==================================================================
  Coverage                            100.00%   100.00%           
- Complexity                              373       405   +32     
==================================================================
  Files                                    33        35    +2     
  Lines                                  1016      1078   +62     
==================================================================
+ Hits                                   1016      1078   +62     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@koriym koriym force-pushed the codex/review-hardening branch from 0e3cb5b to b8d9191 Compare May 13, 2026 01:52
@koriym
Copy link
Copy Markdown
Member Author

koriym commented May 13, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 13, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/Runtime/AgentOptions.php`:
- Around line 132-133: The current preservation check calls
$this->assertToolUseContentIsPreserved($output) which only validates the
post-processed $output and allows a processor to mutate both toolCalls and
content together; change the assertion to compare the post-processed $output
against the original $response (pass both objects into the assertion or call a
new method) and explicitly verify that control fields are unchanged: stopReason,
each tool call's id, name and input (the $output->toolCalls entries must match
the corresponding $response->toolCalls entries and $output->content must not
replace a tool-use marker with a different tool call), updating the assertion
used around the lines referencing $output->toolCalls and $output->content so the
original $response is the source of truth.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0ea5d8b8-22bb-4ed0-9de3-e4e9aa637415

📥 Commits

Reviewing files that changed from the base of the PR and between 2e69e50 and 88f4380.

📒 Files selected for processing (29)
  • README.ja.md
  • README.md
  • composer.json
  • src/Runtime/Agent.php
  • src/Runtime/AgentDelegator.php
  • src/Runtime/AgentFactory.php
  • src/Runtime/AgentInterface.php
  • src/Runtime/AgentOptions.php
  • src/Runtime/AgentPool.php
  • src/Runtime/AgentProfile.php
  • src/Runtime/AgentResponse.php
  • src/Runtime/AlpsContextInputProcessor.php
  • src/Runtime/AlpsToolPolicyInputProcessor.php
  • src/Runtime/DenyConfirmationHandler.php
  • src/Runtime/OptionAwareAgentInterface.php
  • src/Runtime/OptionAwareStreamingAgentInterface.php
  • src/Runtime/OutputProcessorInterface.php
  • src/Runtime/ProfiledAgent.php
  • src/Runtime/StreamingAgent.php
  • src/Runtime/StreamingAgentInterface.php
  • src/Schema/AlpsDescriptorIndex.php
  • tests/Fake/FakeOutputProcessor.php
  • tests/Fake/alps-profile.json
  • tests/Fake/alps-profile.xml
  • tests/Runtime/AgentPoolTest.php
  • tests/Runtime/AgentProcessorTest.php
  • tests/Runtime/AlpsContextInputProcessorTest.php
  • tests/Runtime/AlpsToolPolicyInputProcessorTest.php
  • tests/Schema/AlpsSemanticDictionaryTest.php

Comment thread src/Runtime/AgentOptions.php Outdated
@koriym
Copy link
Copy Markdown
Member Author

koriym commented May 14, 2026

Conductor issue: #24 (Semantic Hypermedia Agent Runtime — merge / release plan for the #18#22 stack).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant