refactor(sdk): remove Jinja prompts superseded by the Python prompt registry#3796
refactor(sdk): remove Jinja prompts superseded by the Python prompt registry#3796VascoSch92 wants to merge 3 commits into
Conversation
Python API breakage checks — ✅ PASSEDResult: ✅ PASSED |
REST API breakage checks (OpenAPI) — ✅ PASSEDResult: ✅ PASSED |
|
✅ Review complete. This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here. |
all-hands-bot
left a comment
There was a problem hiding this comment.
⚠️ QA Report: PASS WITH ISSUES
I verified the SDK prompt-rendering paths as a real user would; the main refactor works for default, disabled, normal custom-policy, custom-template, and planning-template usage, but an empty custom policy file now leaks the built-in default policy.
Does this PR achieve its stated goal?
Mostly yes. The PR set out to remove redundant built-in Jinja prompts while keeping the Python prompt registry behavior equivalent and routing custom security policies through it; running Agent(...).static_system_message on base vs. the PR showed identical default Claude/GPT-5 prompt hashes and working normal custom/disabled policy behavior. However, a zero-byte custom security_policy_filename regresses: base rendered no default policy for that custom file, while the PR falls back to the built-in security policy.
| Phase | Result |
|---|---|
| Environment Setup | ✅ make build completed successfully. |
| CI Status | 🟡 No failures observed; many checks pass, with coverage/QA/build jobs still pending when checked. |
| Functional Verification |
Functional Verification
Test 1: Default prompt parity and normal custom-policy behavior
Step 1 — Establish baseline on origin/main:
Ran git checkout --quiet origin/main && uv run python /tmp/qa_prompt_check.py:
default_claude: len=14504 sha=c9d7d3f665687ed7 security_tag=True default_policy=True self_doc=True important=True gpt5_inline_review=False claude_specific=True
gpt5: len=15202 sha=0ecf2532b0abacfc security_tag=True default_policy=True self_doc=True important=True gpt5_inline_review=True claude_specific=False
disabled_policy: security_tag=True default_policy=False len=13020 sha=6ac4a8b90cd27349
custom_policy: custom_rule=True jinja_variable_literal=False jinja_variable_rendered=False default_policy_leaked=False security_tag_count=1 len=13062 sha=adb80e715e640b47
custom_template: CUSTOM_TEMPLATE model_family=openai_gpt policy=security_policy.j2
planning_template: len=5249 sha=9dfe768574be2daf planning_marker=True
This establishes the old user-facing behavior: default prompts include the expected sections/model-specific instructions; custom policy content replaces the default policy; custom system templates and the planning template render through Jinja.
Step 2 — Apply the PR's changes:
Checked out vasco/clean-jinja at b140712e05c8d9c77b40c1f6116e14223ffd2da5.
Step 3 — Re-run with the PR in place:
Ran git checkout --quiet vasco/clean-jinja && uv run python /tmp/qa_prompt_check.py:
default_claude: len=14504 sha=c9d7d3f665687ed7 security_tag=True default_policy=True self_doc=True important=True gpt5_inline_review=False claude_specific=True
gpt5: len=15202 sha=0ecf2532b0abacfc security_tag=True default_policy=True self_doc=True important=True gpt5_inline_review=True claude_specific=False
disabled_policy: security_tag=False default_policy=False len=12991 sha=5b8bface75f397d6
custom_policy: custom_rule=True jinja_variable_literal=True jinja_variable_rendered=False default_policy_leaked=False security_tag_count=1 len=13073 sha=27cb578ec84d66b9
custom_template: CUSTOM_TEMPLATE model_family=openai_gpt policy=security_policy.j2
planning_template: len=5249 sha=9dfe768574be2daf planning_marker=True
This confirms the main refactor goal for normal usage: default Claude and GPT-5 prompts are byte-identical by hash, normal custom policy content is included without default-policy leakage, disabled policy no longer emits empty <SECURITY> tags, and retained Jinja escape hatches still render.
Test 2: Edge case — empty custom security policy file
Step 1 — Establish baseline on origin/main:
Ran git checkout --quiet origin/main && uv run python /tmp/qa_empty_policy_check.py:
empty_custom_policy: security_tag=True default_policy=False policy_path=/tmp/tmp9zhd0u84/empty_policy.j2 len=13022
This shows that, before the PR, choosing a custom policy file did not silently reintroduce the built-in policy even when the file was empty.
Step 2 — Apply the PR's changes:
Checked out vasco/clean-jinja at b140712e05c8d9c77b40c1f6116e14223ffd2da5.
Step 3 — Re-run with the PR in place:
Ran git checkout --quiet vasco/clean-jinja && uv run python /tmp/qa_empty_policy_check.py:
empty_custom_policy: security_tag=True default_policy=True policy_path=/tmp/tmp3c64fja7/empty_policy.j2 len=14504
This shows a regression: a user-supplied empty custom policy file is treated like no custom content and the built-in default policy appears instead.
Issues Found
- 🟠 Issue: Empty custom security policy files now fall back to the built-in default security policy instead of honoring the selected custom file content.
This review was created by an AI agent (OpenHands) on behalf of the user.
|
|
||
| def render(self, ctx: PromptContext) -> str | None: | ||
| content = ctx.template_kwargs.get("security_policy_content") | ||
| if content: |
There was a problem hiding this comment.
🟠 Important: Functional QA found that a zero-byte custom security_policy_filename now falls through to self.body, which reintroduces the built-in default policy. I exercised this via Agent(..., security_policy_filename=<empty file>).static_system_message: on origin/main the output had default_policy=False, but on this PR it has default_policy=True. Since the PR states custom policy files are inserted verbatim and should not leak the default policy, please distinguish an explicitly provided empty custom policy from missing custom content (for example, by checking content is not None rather than truthiness).
all-hands-bot
left a comment
There was a problem hiding this comment.
🟡 Taste Rating: Acceptable — the refactor removes a dead second prompt implementation, but the custom security-policy path now changes a public “template filename” behavior and has a small fallback bug.
[CRITICAL ISSUES]
- None found that require a blocking review from this automation.
[IMPROVEMENT OPPORTUNITIES]
- See the inline comments on custom security policy rendering. The data-flow simplification is good, but custom policy files should either preserve the documented Jinja-template contract or the breaking change should be made explicit with docs/deprecation and stronger tests.
[TESTING GAPS]
- The PR description includes targeted prompt-render evidence and relevant test commands. I did not see eval-monitor evidence plus human confirmation; because this changes prompt assembly/agent behavior, the repo-specific review guide says not to approve yet and to defer to a human maintainer/eval validation.
[RISK ASSESSMENT]
- [Overall PR]
⚠️ Risk Assessment: 🟡 MEDIUM
This removes legacy prompt templates and changes the custom security-policy path in agent prompt construction. The deletion itself is low-complexity and covered by snapshots/tests, but prompt assembly is eval-sensitive and can affect benchmark/agent behavior; custom-policy compatibility is the main user-facing risk.
VERDICT:
✅ Worth merging after maintainer validation: The core cleanup direction is sound, but please address/acknowledge the custom policy compatibility concerns and have a human maintainer decide after the appropriate lightweight evals.
KEY INSIGHT:
The refactor is only safe if the Python registry remains a drop-in replacement not just for default prompts, but also for documented custom security-policy template behavior.
Improve this review? If any feedback above seems incorrect or irrelevant to this repository, you can teach the reviewer to do better:
- Add a
.agents/skills/custom-codereview-guide.mdfile to your branch (or edit it if one already exists) with the/codereviewtrigger and the context the reviewer is missing (e.g., "Security concerns about X do not apply here because Y"). See the customization docs for the required frontmatter format.- Re-request a review - the reviewer reads guidelines from the PR branch, so your changes take effect immediately.
- When your PR is merged, the guideline file goes through normal code review by repository maintainers.
Resolve with AI? Install the iterate skill in your agent and run
/iterateto automatically drive this PR through CI, review, and QA until it's merge-ready.Was this review helpful? React with 👍 or 👎 to give feedback.
This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation
| filename = self.security_policy_filename | ||
| if not filename or filename == "security_policy.j2": | ||
| return None | ||
| return (Path(self.prompt_dir) / filename).read_text(encoding="utf-8") |
There was a problem hiding this comment.
🟠 Important: This changes the documented custom security policy from a Jinja template into raw file contents. The public field still says Security policy template filename, and the old path rendered custom policies through Jinja with the normal template context, so existing policies using {{ ... }}, {% include %}, or filters will now be injected literally. Either keep custom policies rendered through render_template(...) before passing the content into the registry, or make this an explicit breaking contract change with docs/deprecation and tests that prove the new raw-text behavior is intentional.
|
|
||
| def render(self, ctx: PromptContext) -> str | None: | ||
| content = ctx.template_kwargs.get("security_policy_content") | ||
| if content: |
There was a problem hiding this comment.
🟠 Important: This truthiness check makes an empty custom security policy fall back to the built-in default policy. _read_custom_security_policy() already uses None as the sentinel for default/disabled policy, so a configured custom file whose rendered/read content is "" should not silently receive the default security instructions. Check for is not None (or key presence) instead and add a regression test for an empty custom policy.
HUMAN:
Deleteing old jinja prompts not used anymore.
AGENT:
End-to-end evidence — instantiated real agents and rendered the full system prompt
(not just unit tests). Before deleting
system_prompt.j2, I confirmed the registryreproduces its Jinja render byte-for-byte for the default and custom policies
(including the Windows shell refinement). Post-deletion render:
Suites:
ruffclean; fullagent/,context/, fncall-converter,delegate/, and theAPI-breakage signature test pass (1260 passed in the broad sweep).
Why
The default system prompt is now assembled from the pure-Python section registry
(#3610 port), but the original Jinja templates were left behind — mostly unused or
duplicating the registry. Keeping both is a drift/maintenance hazard: two sources of
truth for the same prompt.
Summary
.j2templates: 6 dead (unreferenced{% include %}wrappers,Python-ported in-context examples, an orphaned delegate description) and 4
model_specific/*(byte-for-byte duplicated inModelSpecificSection).{% include %}— intoSecuritySection(AgentBase._read_custom_security_policy), then deletesystem_prompt.j2+self_documentation.j2+security_risk_assessment.j2+security_policy.j2; a customsecurity_policy_filenamenow routes through the registry.render_templateandsystem_prompt_planning.j2areretained for the planning preset, custom user templates, and subclasses.
Issue Number
Relates to #3610 (prompt-registry port).
How to Test
Then render a prompt directly:
Video/Screenshots
N/A — library/CLI change with no UI. Render output included in the AGENT section above.
Type
Notes
Minor behavior changes for custom security policies (default-policy agents are
unaffected — already on the registry): content is identical, but inter-section spacing
is normalized to single blank lines; a disabled policy (
security_policy_filename="")no longer emits empty
<SECURITY></SECURITY>tags; and custom policy files are insertedverbatim (no longer Jinja-rendered). The static "matches-legacy" test oracle had become
self-referential (default agents take the registry path), so it's removed along with
system_prompt.j2;test_prompt_snapshot.py(byte-exact registry snapshots) carries thestatic regression guard, and the dynamic oracle against
system_message_suffix.j2stays.Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
eclipse-temurin:17-jdknikolaik/python-nodejs:python3.13-nodejs22-slimgolang:1.21-bookwormPull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64 docker pull ghcr.io/openhands/agent-server:b140712-pythonRun
All tags pushed for this build
About Multi-Architecture Support
b140712-python) is a multi-arch manifest supporting both amd64 and arm64b140712-python-amd64) are also available if needed