refactor(sdk): remove Jinja prompts superseded by the Python prompt registry by VascoSch92 · Pull Request #3796 · OpenHands/software-agent-sdk

VascoSch92 · 2026-06-18T15:58:50Z

HUMAN:

Deleteing old jinja prompts not used anymore.

AGENT:

End-to-end evidence — instantiated real agents and rendered the full system prompt
(not just unit tests). Before deleting system_prompt.j2, I confirmed the registry
reproduces its Jinja render byte-for-byte for the default and custom policies
(including the Windows shell refinement). Post-deletion render:

$ uv run python -c "<build agents, print static_system_message markers>"
default(claude): len=14504  <ROLE>=True  <SECURITY>=True  default-policy=True  <IMPORTANT>=True
gpt-5: model-specific block present = True
custom policy: CUSTOM_RULE present =True  default-policy leaked =False
disabled policy: any <SECURITY> tag = False

Suites: ruff clean; full agent/, context/, fncall-converter, delegate/, and the
API-breakage signature test pass (1260 passed in the broad sweep).

Why

The default system prompt is now assembled from the pure-Python section registry
(#3610 port), but the original Jinja templates were left behind — mostly unused or
duplicating the registry. Keeping both is a drift/maintenance hazard: two sources of
truth for the same prompt.

Summary

Delete 10 redundant .j2 templates: 6 dead (unreferenced {% include %} wrappers,
Python-ported in-context examples, an orphaned delegate description) and 4
model_specific/* (byte-for-byte duplicated in ModelSpecificSection).
Port the last Jinja-only behavior — the custom security-policy {% include %} — into
SecuritySection (AgentBase._read_custom_security_policy), then delete
system_prompt.j2 + self_documentation.j2 + security_risk_assessment.j2 +
security_policy.j2; a custom security_policy_filename now routes through the registry.
Net −743/+80 across 20 files; render_template and system_prompt_planning.j2 are
retained for the planning preset, custom user templates, and subclasses.

Issue Number

Relates to #3610 (prompt-registry port).

How to Test

uv run pytest tests/sdk/context/prompts/ \
  tests/sdk/context/test_prompt_model_spec.py \
  tests/sdk/context/test_prompt_absolute_path.py \
  tests/sdk/agent/test_system_prompt.py \
  tests/sdk/agent/test_security_policy_integration.py

Then render a prompt directly:

uv run python -c "
from openhands.sdk.agent import Agent
from openhands.sdk.llm import LLM
print('SECURITY' in Agent(llm=LLM(model='claude-sonnet-4-5', usage_id='t'), tools=[]).static_system_message)
"

Video/Screenshots

N/A — library/CLI change with no UI. Render output included in the AGENT section above.

Type

Notes

Minor behavior changes for custom security policies (default-policy agents are
unaffected — already on the registry): content is identical, but inter-section spacing
is normalized to single blank lines; a disabled policy (security_policy_filename="")
no longer emits empty <SECURITY></SECURITY> tags; and custom policy files are inserted
verbatim (no longer Jinja-rendered). The static "matches-legacy" test oracle had become
self-referential (default agents take the registry path), so it's removed along with
system_prompt.j2; test_prompt_snapshot.py (byte-exact registry snapshots) carries the
static regression guard, and the dynamic oracle against system_message_suffix.j2 stays.

Agent Server images for this PR

• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant	Architectures	Base Image	Docs / Tags
java	amd64, arm64	`eclipse-temurin:17-jdk`	Link
python	amd64, arm64	`nikolaik/python-nodejs:python3.13-nodejs22-slim`	Link
golang	amd64, arm64	`golang:1.21-bookworm`	Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:b140712-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-b140712-python \
  ghcr.io/openhands/agent-server:b140712-python

All tags pushed for this build

ghcr.io/openhands/agent-server:b140712-golang-amd64
ghcr.io/openhands/agent-server:b140712e05c8d9c77b40c1f6116e14223ffd2da5-golang-amd64
ghcr.io/openhands/agent-server:vasco-clean-jinja-golang-amd64
ghcr.io/openhands/agent-server:b140712-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:b140712-golang-arm64
ghcr.io/openhands/agent-server:b140712e05c8d9c77b40c1f6116e14223ffd2da5-golang-arm64
ghcr.io/openhands/agent-server:vasco-clean-jinja-golang-arm64
ghcr.io/openhands/agent-server:b140712-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:b140712-java-amd64
ghcr.io/openhands/agent-server:b140712e05c8d9c77b40c1f6116e14223ffd2da5-java-amd64
ghcr.io/openhands/agent-server:vasco-clean-jinja-java-amd64
ghcr.io/openhands/agent-server:b140712-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:b140712-java-arm64
ghcr.io/openhands/agent-server:b140712e05c8d9c77b40c1f6116e14223ffd2da5-java-arm64
ghcr.io/openhands/agent-server:vasco-clean-jinja-java-arm64
ghcr.io/openhands/agent-server:b140712-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:b140712-python-amd64
ghcr.io/openhands/agent-server:b140712e05c8d9c77b40c1f6116e14223ffd2da5-python-amd64
ghcr.io/openhands/agent-server:vasco-clean-jinja-python-amd64
ghcr.io/openhands/agent-server:b140712-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-amd64
ghcr.io/openhands/agent-server:b140712-python-arm64
ghcr.io/openhands/agent-server:b140712e05c8d9c77b40c1f6116e14223ffd2da5-python-arm64
ghcr.io/openhands/agent-server:vasco-clean-jinja-python-arm64
ghcr.io/openhands/agent-server:b140712-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-arm64
ghcr.io/openhands/agent-server:b140712-golang
ghcr.io/openhands/agent-server:b140712e05c8d9c77b40c1f6116e14223ffd2da5-golang
ghcr.io/openhands/agent-server:vasco-clean-jinja-golang
ghcr.io/openhands/agent-server:b140712-golang_tag_1.21-bookworm
ghcr.io/openhands/agent-server:b140712-java
ghcr.io/openhands/agent-server:b140712e05c8d9c77b40c1f6116e14223ffd2da5-java
ghcr.io/openhands/agent-server:vasco-clean-jinja-java
ghcr.io/openhands/agent-server:b140712-eclipse-temurin_tag_17-jdk
ghcr.io/openhands/agent-server:b140712-python
ghcr.io/openhands/agent-server:b140712e05c8d9c77b40c1f6116e14223ffd2da5-python
ghcr.io/openhands/agent-server:vasco-clean-jinja-python
ghcr.io/openhands/agent-server:b140712-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim

About Multi-Architecture Support

Each variant tag (e.g., b140712-python) is a multi-arch manifest supporting both amd64 and arm64
Docker automatically pulls the correct architecture for your platform
Individual architecture tags (e.g., b140712-python-amd64) are also available if needed

…egistry

github-actions · 2026-06-18T15:59:34Z

Python API breakage checks — ✅ PASSED

Result: ✅ PASSED

Action log

github-actions · 2026-06-18T15:59:42Z

REST API breakage checks (OpenAPI) — ✅ PASSED

Result: ✅ PASSED

Action log

github-actions · 2026-06-18T16:02:15Z

Coverage Report •

File	Stmts	Miss	Cover	Missing
openhands-sdk/openhands/sdk/agent
base.py	354	37	89%	63, 67, 76, 79, 82, 87, 129–130, 268, 299, 325, 329, 333–334, 355, 451, 630, 677–679, 706, 716, 724–725, 767, 769–770, 842–844, 846–848, 884–885, 895–896
TOTAL	32471	9002	72%

all-hands-bot · 2026-06-18T16:09:17Z

✅ Review complete.

This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here.

all-hands-bot

⚠️ QA Report: PASS WITH ISSUES

I verified the SDK prompt-rendering paths as a real user would; the main refactor works for default, disabled, normal custom-policy, custom-template, and planning-template usage, but an empty custom policy file now leaks the built-in default policy.

Does this PR achieve its stated goal?

Mostly yes. The PR set out to remove redundant built-in Jinja prompts while keeping the Python prompt registry behavior equivalent and routing custom security policies through it; running Agent(...).static_system_message on base vs. the PR showed identical default Claude/GPT-5 prompt hashes and working normal custom/disabled policy behavior. However, a zero-byte custom security_policy_filename regresses: base rendered no default policy for that custom file, while the PR falls back to the built-in security policy.

Phase	Result
Environment Setup	✅ `make build` completed successfully.
CI Status	🟡 No failures observed; many checks pass, with coverage/QA/build jobs still pending when checked.
Functional Verification	⚠️ Core paths pass; empty custom policy file regression found.

Functional Verification

Test 1: Default prompt parity and normal custom-policy behavior

Step 1 — Establish baseline on origin/main:
Ran git checkout --quiet origin/main && uv run python /tmp/qa_prompt_check.py:

default_claude: len=14504 sha=c9d7d3f665687ed7 security_tag=True default_policy=True self_doc=True important=True gpt5_inline_review=False claude_specific=True
gpt5: len=15202 sha=0ecf2532b0abacfc security_tag=True default_policy=True self_doc=True important=True gpt5_inline_review=True claude_specific=False
disabled_policy: security_tag=True default_policy=False len=13020 sha=6ac4a8b90cd27349
custom_policy: custom_rule=True jinja_variable_literal=False jinja_variable_rendered=False default_policy_leaked=False security_tag_count=1 len=13062 sha=adb80e715e640b47
custom_template: CUSTOM_TEMPLATE model_family=openai_gpt policy=security_policy.j2
planning_template: len=5249 sha=9dfe768574be2daf planning_marker=True

This establishes the old user-facing behavior: default prompts include the expected sections/model-specific instructions; custom policy content replaces the default policy; custom system templates and the planning template render through Jinja.

Step 2 — Apply the PR's changes:
Checked out vasco/clean-jinja at b140712e05c8d9c77b40c1f6116e14223ffd2da5.

Step 3 — Re-run with the PR in place:
Ran git checkout --quiet vasco/clean-jinja && uv run python /tmp/qa_prompt_check.py:

default_claude: len=14504 sha=c9d7d3f665687ed7 security_tag=True default_policy=True self_doc=True important=True gpt5_inline_review=False claude_specific=True
gpt5: len=15202 sha=0ecf2532b0abacfc security_tag=True default_policy=True self_doc=True important=True gpt5_inline_review=True claude_specific=False
disabled_policy: security_tag=False default_policy=False len=12991 sha=5b8bface75f397d6
custom_policy: custom_rule=True jinja_variable_literal=True jinja_variable_rendered=False default_policy_leaked=False security_tag_count=1 len=13073 sha=27cb578ec84d66b9
custom_template: CUSTOM_TEMPLATE model_family=openai_gpt policy=security_policy.j2
planning_template: len=5249 sha=9dfe768574be2daf planning_marker=True

This confirms the main refactor goal for normal usage: default Claude and GPT-5 prompts are byte-identical by hash, normal custom policy content is included without default-policy leakage, disabled policy no longer emits empty <SECURITY> tags, and retained Jinja escape hatches still render.

Test 2: Edge case — empty custom security policy file

Step 1 — Establish baseline on origin/main:
Ran git checkout --quiet origin/main && uv run python /tmp/qa_empty_policy_check.py:

empty_custom_policy: security_tag=True default_policy=False policy_path=/tmp/tmp9zhd0u84/empty_policy.j2 len=13022

This shows that, before the PR, choosing a custom policy file did not silently reintroduce the built-in policy even when the file was empty.

Step 2 — Apply the PR's changes:
Checked out vasco/clean-jinja at b140712e05c8d9c77b40c1f6116e14223ffd2da5.

Step 3 — Re-run with the PR in place:
Ran git checkout --quiet vasco/clean-jinja && uv run python /tmp/qa_empty_policy_check.py:

empty_custom_policy: security_tag=True default_policy=True policy_path=/tmp/tmp3c64fja7/empty_policy.j2 len=14504

This shows a regression: a user-supplied empty custom policy file is treated like no custom content and the built-in default policy appears instead.

Issues Found

🟠 Issue: Empty custom security policy files now fall back to the built-in default security policy instead of honoring the selected custom file content.

This review was created by an AI agent (OpenHands) on behalf of the user.

all-hands-bot · 2026-06-18T16:12:51Z


+    def render(self, ctx: PromptContext) -> str | None:
+        content = ctx.template_kwargs.get("security_policy_content")
+        if content:


🟠 Important: Functional QA found that a zero-byte custom security_policy_filename now falls through to self.body, which reintroduces the built-in default policy. I exercised this via Agent(..., security_policy_filename=<empty file>).static_system_message: on origin/main the output had default_policy=False, but on this PR it has default_policy=True. Since the PR states custom policy files are inserted verbatim and should not leak the default policy, please distinguish an explicitly provided empty custom policy from missing custom content (for example, by checking content is not None rather than truthiness).

all-hands-bot

🟡 Taste Rating: Acceptable — the refactor removes a dead second prompt implementation, but the custom security-policy path now changes a public “template filename” behavior and has a small fallback bug.

[CRITICAL ISSUES]

None found that require a blocking review from this automation.

[IMPROVEMENT OPPORTUNITIES]

See the inline comments on custom security policy rendering. The data-flow simplification is good, but custom policy files should either preserve the documented Jinja-template contract or the breaking change should be made explicit with docs/deprecation and stronger tests.

[TESTING GAPS]

The PR description includes targeted prompt-render evidence and relevant test commands. I did not see eval-monitor evidence plus human confirmation; because this changes prompt assembly/agent behavior, the repo-specific review guide says not to approve yet and to defer to a human maintainer/eval validation.

[RISK ASSESSMENT]

[Overall PR] ⚠️ Risk Assessment: 🟡 MEDIUM
This removes legacy prompt templates and changes the custom security-policy path in agent prompt construction. The deletion itself is low-complexity and covered by snapshots/tests, but prompt assembly is eval-sensitive and can affect benchmark/agent behavior; custom-policy compatibility is the main user-facing risk.

VERDICT:
✅ Worth merging after maintainer validation: The core cleanup direction is sound, but please address/acknowledge the custom policy compatibility concerns and have a human maintainer decide after the appropriate lightweight evals.

KEY INSIGHT:
The refactor is only safe if the Python registry remains a drop-in replacement not just for default prompts, but also for documented custom security-policy template behavior.

Improve this review? If any feedback above seems incorrect or irrelevant to this repository, you can teach the reviewer to do better:

Add a .agents/skills/custom-codereview-guide.md file to your branch (or edit it if one already exists) with the /codereview trigger and the context the reviewer is missing (e.g., "Security concerns about X do not apply here because Y"). See the customization docs for the required frontmatter format.

Re-request a review - the reviewer reads guidelines from the PR branch, so your changes take effect immediately.

When your PR is merged, the guideline file goes through normal code review by repository maintainers.

Resolve with AI? Install the iterate skill in your agent and run /iterate to automatically drive this PR through CI, review, and QA until it's merge-ready.

Was this review helpful? React with 👍 or 👎 to give feedback.

This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation

all-hands-bot · 2026-06-18T16:13:33Z

+        filename = self.security_policy_filename
+        if not filename or filename == "security_policy.j2":
+            return None
+        return (Path(self.prompt_dir) / filename).read_text(encoding="utf-8")


🟠 Important: This changes the documented custom security policy from a Jinja template into raw file contents. The public field still says Security policy template filename, and the old path rendered custom policies through Jinja with the normal template context, so existing policies using {{ ... }}, {% include %}, or filters will now be injected literally. Either keep custom policies rendered through render_template(...) before passing the content into the registry, or make this an explicit breaking contract change with docs/deprecation and tests that prove the new raw-text behavior is intentional.

all-hands-bot · 2026-06-18T16:13:33Z


+    def render(self, ctx: PromptContext) -> str | None:
+        content = ctx.template_kwargs.get("security_policy_content")
+        if content:


🟠 Important: This truthiness check makes an empty custom security policy fall back to the built-in default policy. _read_custom_security_policy() already uses None as the sentinel for default/disabled policy, so a configured custom file whose rendered/read content is "" should not silently receive the default security instructions. Check for is not None (or key presence) instead and add a regression test for an empty custom policy.

refactor(sdk): remove Jinja prompts superseded by the Python prompt r…

25768b2

…egistry

VascoSch92 and others added 2 commits June 18, 2026 18:06

refactor(sdk): read custom security policy via pathlib.Path

982c9fe

Merge branch 'main' into vasco/clean-jinja

b140712

VascoSch92 requested a review from all-hands-bot June 18, 2026 16:07

all-hands-bot reviewed Jun 18, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor(sdk): remove Jinja prompts superseded by the Python prompt registry#3796

refactor(sdk): remove Jinja prompts superseded by the Python prompt registry#3796
VascoSch92 wants to merge 3 commits into
mainfrom
vasco/clean-jinja

VascoSch92 commented Jun 18, 2026 •

edited by github-actions Bot

Loading

Uh oh!

github-actions Bot commented Jun 18, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 18, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 18, 2026 •

edited

Loading

Uh oh!

all-hands-bot commented Jun 18, 2026 •

edited

Loading

Uh oh!

all-hands-bot left a comment

Uh oh!

all-hands-bot Jun 18, 2026

Uh oh!

all-hands-bot left a comment

Uh oh!

all-hands-bot Jun 18, 2026

Uh oh!

all-hands-bot Jun 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

VascoSch92 commented Jun 18, 2026 • edited by github-actions Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Why

Summary

Issue Number

How to Test

Video/Screenshots

Type

Notes

Uh oh!

github-actions Bot commented Jun 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Python API breakage checks — ✅ PASSED

Uh oh!

github-actions Bot commented Jun 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

REST API breakage checks (OpenAPI) — ✅ PASSED

Uh oh!

github-actions Bot commented Jun 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

all-hands-bot commented Jun 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

all-hands-bot left a comment

Choose a reason for hiding this comment

⚠️ QA Report: PASS WITH ISSUES

Does this PR achieve its stated goal?

Test 1: Default prompt parity and normal custom-policy behavior

Test 2: Edge case — empty custom security policy file

Issues Found

Uh oh!

all-hands-bot Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

all-hands-bot left a comment

Choose a reason for hiding this comment

Uh oh!

all-hands-bot Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

all-hands-bot Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

VascoSch92 commented Jun 18, 2026 •

edited by github-actions Bot

Loading

github-actions Bot commented Jun 18, 2026 •

edited

Loading

github-actions Bot commented Jun 18, 2026 •

edited

Loading

github-actions Bot commented Jun 18, 2026 •

edited

Loading

all-hands-bot commented Jun 18, 2026 •

edited

Loading