feat(sdk): add ask_oracle tool#3673
Conversation
This comment was marked as resolved.
This comment was marked as resolved.
|
📁 PR Artifacts Notice This PR contains a
|
Python API breakage checks — ✅ PASSEDResult: ✅ PASSED |
REST API breakage checks (OpenAPI) — ✅ PASSEDResult: ✅ PASSED |
This comment was marked as outdated.
This comment was marked as outdated.
Co-authored-by: openhands <openhands@all-hands.dev>
08d4edd to
9c2b227
Compare
Coverage Report •
|
||||||||||||||||||||||||||||||
|
✅ Review complete. This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here. |
Updated the description of the Oracle to clarify its purpose and capabilities.
This comment was marked as outdated.
This comment was marked as outdated.
Co-authored-by: openhands <openhands@all-hands.dev>
This comment was marked as outdated.
This comment was marked as outdated.
Co-authored-by: openhands <openhands@all-hands.dev>
enyst
left a comment
There was a problem hiding this comment.
🟡 Taste Rating: Acceptable — the core feature is small and mostly sane, but there is a real configuration footgun hiding in the data flow.
[CRITICAL ISSUES]
-
[
openhands-sdk/openhands/sdk/settings/model.py, lines 1154-1163] Data Structure / two sources of truth:oracle_llm_profileis supposed to be the canonical setting that powersask_oracle, butcreate_agent()refuses to wire that profile ifself.toolsalready contains anask_oracleentry. That silently creates an unconfigured Oracle even when the user explicitly setoracle_llm_profile="oracle":OpenHandsAgentSettings( oracle_llm_profile="oracle", tools=[Tool(name="ask_oracle")], ).create_agent() # ask_oracle params stay {}
Then the agent sees the tool, calls it, and gets “The Oracle is not configured.” That's exactly the kind of special-case behavior this settings field should eliminate. Pick one source of truth: when
oracle_llm_profileis set, normalize/replace the existingask_oracletool params with{"profile_name": ...}, or reject the conflicting configuration loudly. Add a regression test fororacle_llm_profileplus a pre-existingTool(name="ask_oracle").
[IMPROVEMENT OPPORTUNITIES]
- [
openhands-sdk/openhands/sdk/tool/builtins/ask_oracle.py, line 215] Tool metadata lies:openWorldHint=Falsesays the tool has a closed interaction domain, but this tool calls an external LLM provider and sends the agent's question/context out of process. That is open-world behavior. Mark itTrueunless there is a very deliberate reason to pretend an LLM network call is closed-world.
[TESTING GAPS]
- The tests cover the happy-path append and omission cases, but not the configuration collision above. This is not a mock-the-world nit: it is a real
OpenHandsAgentSettings.create_agent()behavior regression test that should fail if the settings field stops being the actual source of truth.
[RISK ASSESSMENT]
- [Overall PR]
⚠️ Risk Assessment: 🟡 MEDIUM
This adds a new agent-visible tool that can call another LLM and influence agent behavior, so the blast radius is larger than the code size. The implementation is mostly contained, and there is unit/live evidence, but the config precedence bug can make a user-visible “configured but unusable” state. Also, because this touches tool calling / agent behavior, I am leaving a COMMENT review rather than approving; per repo guidelines, a human maintainer should decide after appropriate eval/CI validation.
VERDICT:
❌ Needs rework: fix the config precedence footgun before merge.
KEY INSIGHT:
The feature is only simple if oracle_llm_profile is the single source of truth; right now tools can silently override it into a broken half-configured state.
This PR review comment was generated by an AI agent (OpenHands) on behalf of the user.
This comment was marked as duplicate.
This comment was marked as duplicate.
Co-authored-by: openhands <openhands@all-hands.dev>
|
@enyst Should we add this in tools instead that in the builtin ones? |
|
@OpenHands understand and do this please: #3673 (comment) |
|
I'm on it! enyst can track my progress at all-hands.dev |
Co-authored-by: openhands <openhands@all-hands.dev>
|
@VascoSch92 @enyst I looked into this. I don't think Why:
I pushed 994c508 to clarify the This comment was created by an AI agent (OpenHands) on behalf of the user. |
|
OpenHands encountered an error: **Failed to send message to agent server: HTTP 503 error: no available server See the conversation for more information. |
|
mmmh... so why do we have openhands-tools? I'm not against the tool. Just could be cool to have some order: what goes where? :-) |
HUMAN:
This PR proposes an
Oracletool, for the agent to ask a more capable LLM when it encounters a difficulty, when it needs a second opinion, or when the user tells it to.AGENT:
Why
Agents sometimes need a second opinion from a stronger or more specialized saved LLM profile without permanently switching the active conversation profile. This adds a minimal
ask_oracletool powered by anOpenHandsAgentSettings.oracle_llm_profileprofile name so the agent can consult that Oracle profile statelessly and continue with its current LLM.Summary
ask_oracletool that asks a saved Oracle LLM profile for stateless second-opinion guidance.OpenHandsAgentSettings.oracle_llm_profileso users can declare the saved profile that powers the Oracle tool..pr/evidence for reviewers.Closes #3672.
Evidence
The
.pr/directory is intentional. Per this repository's PR artifact policy, temporary design notes, live-test evidence, JSON results, and reviewer-facing validation summaries that should not merge tomainbelong under.pr/during review. The source for that policy is the repository guidance inAGENTS.md, sectionPR_ARTIFACTS: it says.pr/is for PR-specific documents/scripts/artifacts, that reviewers are notified when it exists, and that the directory is automatically removed by workflow when the PR is approved. In other words, approving this PR will not merge the.pr/artifacts; approval triggers the cleanup workflow to remove them before merge.What the evidence proves:
.pr/ask_oracle_live_validation.jsonshows a live run where the regular profile used OpenAI directopenai/gpt-5-nano, the Oracle profile usedlitellm_proxy/openai/gpt-5-minithroughhttps://llm-proxy.eval.all-hands.dev, andask_oraclereturned a successful non-errorresponsefrom the Oracle..pr/ask_oracle_live_validation.pyis the exact script used to create that JSON. It creates an isolated temporary profile store, saves anoracleprofile, buildsOpenHandsAgentSettings(oracle_llm_profile="oracle"), executesask_oracle, records the response, and removes the temporary profile store infinally..pr/ask_oracle_test_results.jsonrecords the targeted pytest, example pytest, pre-commit, and live-validation commands/results..pr/ask_oracle_validation_summary.mdsummarizes the behavior for reviewers: the tool consults the saved Oracle profile statelessly, sends only the Oracle system prompt plus the agent's question/context (no conversation history or tools), and does not switch the active conversation LLM.How to Test
uv run pre-commit run --files openhands-sdk/openhands/sdk/settings/model.py openhands-sdk/openhands/sdk/tool/builtins/__init__.py openhands-sdk/openhands/sdk/tool/builtins/ask_oracle.py tests/sdk/tool/test_ask_oracle.py tests/sdk/test_settings.py tests/examples/test_examples.py examples/01_standalone_sdk/54_ask_oracle_tool/main.py .pr/ask_oracle_live_validation.py .pr/ask_oracle_live_validation.json .pr/ask_oracle_test_results.json .pr/ask_oracle_validation_summary.mduv run pytest tests/sdk/tool/test_ask_oracle.py tests/sdk/tool/test_builtins.py tests/sdk/test_settings.py::test_llm_agent_settings_export_schema_groups_sections tests/examples/test_examples.py::test_directory_example_is_discovereduv run pytest tests/examples/test_examples.py --run-examples -k 54_ask_oracle_toolCI=true uv run python -m pytest -q tests/sdk.pr/ask_oracle_live_validation.jsonusingopenai/gpt-5-nanoas the regular profile andlitellm_proxy/openai/gpt-5-minithrough the eval proxy as the Oracle profile.This PR was created by an AI agent (OpenHands) on behalf of the user.
@enyst can click here to continue refining the PR
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
eclipse-temurin:17-jdknikolaik/python-nodejs:python3.13-nodejs22-slimgolang:1.21-bookwormPull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64 docker pull ghcr.io/openhands/agent-server:994c508-pythonRun
All tags pushed for this build
About Multi-Architecture Support
994c508-python) is a multi-arch manifest supporting both amd64 and arm64994c508-python-amd64) are also available if needed