Harden agent extension review findings by koriym · Pull Request #22 · bearsunday/BEAR.ToolUse

koriym · 2026-05-13T01:50:10Z

Summary

Always reject tool calls that are not present in the current LLM request tool list.
Preserve public interface compatibility while adding option-aware agent interfaces for per-run AgentOptions.
Harden Agent-as-Tool behavior: wire subagent dispatchers, apply profile options, catch subagent exceptions, fallback unknown ask_* calls, and default-deny confirmable subagent tools without a pool handler.
Tighten output processor invariants so tool-use history and streaming tool-use control events cannot be corrupted.
Correct ALPS safe/idempotent semantics and update fixtures/docs.

Validation

zsh -ic 'sphp85; composer tests'
zsh -ic 'sphp85; XDEBUG_MODE=coverage vendor/bin/phpunit --coverage-text' (100% classes, methods, lines)

Stacked on #21.

coderabbitai · 2026-05-13T01:50:17Z

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5a1a7d44-a163-4c56-bd96-5f3aa6a3343d

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

📝 Walkthrough

Walkthrough

This PR introduces per-run AgentOptions support through new OptionAwareAgentInterface and OptionAwareStreamingAgentInterface, removes conditional tool enforcement in favor of unconditional validation against ToolList, adds output processor data-preservation assertions, introduces DenyConfirmationHandler for safer subagent defaults, and refactors AgentPool and AgentFactory to support tool caching and profile-based option application.

Changes

Per-run options and tool validation

Layer / File(s)	Summary
Option-aware agent interfaces and base contract changes `src/Runtime/OptionAwareAgentInterface.php`, `src/Runtime/OptionAwareStreamingAgentInterface.php`, `src/Runtime/AgentInterface.php`, `src/Runtime/StreamingAgentInterface.php`	New `OptionAwareAgentInterface` and `OptionAwareStreamingAgentInterface` extend base interfaces with per-run `AgentOptions` support; base `AgentInterface` and `StreamingAgentInterface` signatures simplified by removing options parameters.
Agent implementation with unconditional tool validation `src/Runtime/Agent.php`, `src/Runtime/StreamingAgent.php`, `tests/Fake/FakeOutputProcessor.php`, `tests/Runtime/AgentProcessorTest.php`	`Agent` and `StreamingAgent` implement option-aware interfaces; tool validation is unconditional—any tool not in `ToolList` produces an error; `enforceToolList()` removed; `FakeOutputProcessor` preserves non-text blocks on tool-use responses; tests validate processor preservation and unadvertised tool rejection.
Output processor validation and data preservation `src/Runtime/AgentOptions.php`, `src/Runtime/OutputProcessorInterface.php`	`AgentOptions` enforces output processor compliance by validating tool-use content blocks and stream event control data are preserved; documentation clarifies type-preservation requirements and matching semantics.
Denial confirmation handler and safer subagent defaults `src/Runtime/DenyConfirmationHandler.php`, `src/Runtime/AgentPool.php`, `tests/Runtime/AgentPoolTest.php`	New `DenyConfirmationHandler` unconditionally denies confirmable tool calls; `AgentPool::create()` defaults to this handler when none is configured; tests validate denial behavior, tool caching per profile, and profile option application.
ProfiledAgent wrapper and pool-based agent creation `src/Runtime/ProfiledAgent.php`, `src/Runtime/AgentPool.php`	`ProfiledAgent` wraps agents and applies profile-default options; `AgentPool::create()` caches tools per profile, constructs agents with denial defaults, wraps in `ProfiledAgent`, and returns option-aware interface.
AgentDelegator exception handling and AgentFactory wiring `src/Runtime/AgentDelegator.php`, `src/Runtime/AgentFactory.php`, `tests/Runtime/AgentPoolTest.php`	`AgentDelegator` converts subagent exceptions to tool errors, reformats context as XML blocks, removes profile-option passing; `AgentFactory` makes dispatcher mutable, wraps pools in delegator, updates return types to option-aware interfaces; tests validate delegator fallback and factory subagent wiring.
Documentation, configuration, and test fixtures `README.md`, `README.ja.md`, `composer.json`, `src/Runtime/AgentProfile.php`, `src/Runtime/AgentResponse.php`, `src/Runtime/AlpsContextInputProcessor.php`, `src/Runtime/AlpsToolPolicyInputProcessor.php`, `src/Schema/AlpsDescriptorIndex.php`, `tests/Fake/alps-profile.json`, `tests/Fake/alps-profile.xml`, `tests/Runtime/AlpsContextInputProcessorTest.php`, `tests/Runtime/AlpsToolPolicyInputProcessorTest.php`, `tests/Schema/AlpsSemanticDictionaryTest.php`	Composer scripts separated into `phpstan` and `psalm`; READMEs clarify output processor preservation, ALPS policy filtering, and subagent confirmation semantics; ALPS fixtures updated with new state/transition semantics; processor documentation expanded; `AgentProfile` context schema permits additional properties; type annotations enhanced; test helpers reorganized.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

bearsunday/BEAR.ToolUse#8: Introduces StreamingAgent and StreamingAgentInterface around StreamEvent handling; this PR refactors StreamingAgent to use OptionAwareStreamingAgentInterface and updates dispatch semantics.
bearsunday/BEAR.ToolUse#11: Both PRs update ALPS semantic dictionary behavior; this PR changes test fixtures and assertions to reflect goUserList exclusion from semantic dictionary.
bearsunday/BEAR.ToolUse#4: Introduces ConfirmationHandlerInterface for tool confirmation; this PR adds DenyConfirmationHandler to establish safer defaults for subagent confirmation handling.

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 26.26% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check	❓ Inconclusive	The title 'Harden agent extension review findings' is vague and non-descriptive, using generic terminology that doesn't clearly convey the main changes in the changeset.	Consider a more specific title that highlights the primary change, such as 'Implement OptionAwareAgent interfaces and strengthen tool validation' or 'Add option-aware agent interfaces with stricter tool enforcement'.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description check	✅ Passed	The description is well-structured and clearly related to the changeset, detailing multiple key changes including tool validation hardening, interface additions, and output processor constraints.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch codex/review-hardening

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

codecov · 2026-05-13T01:50:27Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (2e69e50) to head (fa69ae7).

Additional details and impacted files

@@                        Coverage Diff                         @@
##             codex/agent-history-completion       #22   +/-   ##
==================================================================
  Coverage                            100.00%   100.00%           
- Complexity                              373       405   +32     
==================================================================
  Files                                    33        35    +2     
  Lines                                  1016      1078   +62     
==================================================================
+ Hits                                   1016      1078   +62

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

koriym · 2026-05-13T02:24:03Z

@coderabbitai review

coderabbitai · 2026-05-13T02:24:08Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/Runtime/AgentOptions.php`:
- Around line 132-133: The current preservation check calls
$this->assertToolUseContentIsPreserved($output) which only validates the
post-processed $output and allows a processor to mutate both toolCalls and
content together; change the assertion to compare the post-processed $output
against the original $response (pass both objects into the assertion or call a
new method) and explicitly verify that control fields are unchanged: stopReason,
each tool call's id, name and input (the $output->toolCalls entries must match
the corresponding $response->toolCalls entries and $output->content must not
replace a tool-use marker with a different tool call), updating the assertion
used around the lines referencing $output->toolCalls and $output->content so the
original $response is the source of truth.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0ea5d8b8-22bb-4ed0-9de3-e4e9aa637415

📥 Commits

Reviewing files that changed from the base of the PR and between 2e69e50 and 88f4380.

📒 Files selected for processing (29)

README.ja.md
README.md
composer.json
src/Runtime/Agent.php
src/Runtime/AgentDelegator.php
src/Runtime/AgentFactory.php
src/Runtime/AgentInterface.php
src/Runtime/AgentOptions.php
src/Runtime/AgentPool.php
src/Runtime/AgentProfile.php
src/Runtime/AgentResponse.php
src/Runtime/AlpsContextInputProcessor.php
src/Runtime/AlpsToolPolicyInputProcessor.php
src/Runtime/DenyConfirmationHandler.php
src/Runtime/OptionAwareAgentInterface.php
src/Runtime/OptionAwareStreamingAgentInterface.php
src/Runtime/OutputProcessorInterface.php
src/Runtime/ProfiledAgent.php
src/Runtime/StreamingAgent.php
src/Runtime/StreamingAgentInterface.php
src/Schema/AlpsDescriptorIndex.php
tests/Fake/FakeOutputProcessor.php
tests/Fake/alps-profile.json
tests/Fake/alps-profile.xml
tests/Runtime/AgentPoolTest.php
tests/Runtime/AgentProcessorTest.php
tests/Runtime/AlpsContextInputProcessorTest.php
tests/Runtime/AlpsToolPolicyInputProcessorTest.php
tests/Schema/AlpsSemanticDictionaryTest.php

koriym · 2026-05-14T17:15:43Z

Conductor issue: #24 (Semantic Hypermedia Agent Runtime — merge / release plan for the #18 → #22 stack).

Harden agent extension review findings

b8d9191

koriym force-pushed the codex/review-hardening branch from 0e3cb5b to b8d9191 Compare May 13, 2026 01:52

koriym added 2 commits May 13, 2026 10:58

Address review follow-ups

32474a2

Set static analysis memory limits

88f4380

coderabbitai Bot requested changes May 13, 2026

View reviewed changes

Comment thread src/Runtime/AgentOptions.php Outdated

Preserve original tool-use control fields

61ddb77

coderabbitai Bot approved these changes May 13, 2026

View reviewed changes

Cover tool-use call count preservation

fa69ae7

koriym mentioned this pull request May 16, 2026

Preview: Semantic Hypermedia Agent Runtime integration #25

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Harden agent extension review findings#22

Harden agent extension review findings#22
koriym wants to merge 5 commits into
codex/agent-history-completionfrom
codex/review-hardening

koriym commented May 13, 2026

Uh oh!

coderabbitai Bot commented May 13, 2026 •

edited

Loading

Review skipped

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

❌ Failed checks (1 warning, 1 inconclusive)

Uh oh!

codecov Bot commented May 13, 2026 •

edited

Loading

Uh oh!

koriym commented May 13, 2026

Uh oh!

coderabbitai Bot commented May 13, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

koriym commented May 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

koriym commented May 13, 2026

Summary

Validation

Uh oh!

coderabbitai Bot commented May 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

❌ Failed checks (1 warning, 1 inconclusive)

Uh oh!

codecov Bot commented May 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

koriym commented May 13, 2026

Uh oh!

coderabbitai Bot commented May 13, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

koriym commented May 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai Bot commented May 13, 2026 •

edited

Loading

codecov Bot commented May 13, 2026 •

edited

Loading