fix(acp): mask injected secrets in tool-call output and streamed text#3463
Conversation
Python API breakage checks — ✅ PASSEDResult: ✅ PASSED |
REST API breakage checks (OpenAPI) — ✅ PASSEDResult: ✅ PASSED |
Coverage Report •
|
||||||||||||||||||||
|
✅ Review complete. This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here. |
all-hands-bot
left a comment
There was a problem hiding this comment.
Code Review — fix(acp): mask injected secrets in tool-call output and streamed text
This is a solid, well-structured P0 security fix. The core strategy is correct: mask at ingestion (per streaming chunk, per tool-call entry) and re-mask at the persistence boundary after joining accumulated chunks — the only point where a secret split across two streamed chunks can be caught. The _mask_json_value recursive walker is clean, the defensive error-handling in _mask_value (fall back to original value, log, never crash) matches the existing terminal tool contract, and the test coverage is thorough, including the split-chunk edge case.
A few observations flagged inline below.
This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation
all-hands-bot
left a comment
There was a problem hiding this comment.
✅ QA Report: PASS
Verified the ACPAgent path with real SDK Conversations and temporary ACP stdio subprocesses: injected ACP env secrets now mask in streamed text, tool-call raw fields, and persisted turn output.
Does this PR achieve its stated goal?
Yes. The PR set out to prevent ACP subprocess output from leaking injected credentials into streamed callbacks, tool-call events, and persisted finish/observation text. I reproduced the leak on main with a real Conversation(ACPAgent(...)) talking to an ACP stdio server that echoed ACP_QA_TOKEN, then reran the same scenarios on a2eec14e; the secret was replaced with <secret-hidden> everywhere expected, including when the secret was split across two streamed chunks and only reassembled at final persistence.
| Phase | Result |
|---|---|
| Environment Setup | ✅ uv environment usable; SDK and ACP subprocess flow ran successfully |
| CI Status | Python API check is failing and qa-changes was still in progress when checked |
| Functional Verification | ✅ Real ACPAgent conversations verified direct stream/tool output and split-chunk persistence masking |
Functional Verification
Test 1: ACP subprocess echoes an injected env secret in streamed text and tool-call fields
Step 1 — Reproduce / establish baseline (without the fix):
Checked out main and ran a real SDK conversation using ACPAgent(acp_env={"ACP_QA_TOKEN": "QA_SUPER_SECRET_12345"}) against /tmp/fake_acp_echo_agent.py, a small ACP stdio subprocess that read the env var and emitted it in AgentMessageChunk, ToolCallStart.rawInput, ToolCallStart.rawOutput, and ToolCallProgress.rawOutput.
Ran OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/acp_conversation_qa.py:
token_outputs=['stream:QA_SUPER_SECRET_12345']
tool_event={'status': 'in_progress', 'raw_input': {'command': 'printf QA_SUPER_SECRET_12345'}, 'raw_output': 'starting QA_SUPER_SECRET_12345'}
tool_event={'status': 'completed', 'raw_input': {'command': 'printf QA_SUPER_SECRET_12345'}, 'raw_output': 'done QA_SUPER_SECRET_12345'}
action_event=message='stream:QA_SUPER_SECRET_12345' kind='FinishAction'
observation=content=[TextContent(cache_prompt=False, type='text', text='stream:QA_SUPER_SECRET_12345')] is_error=False kind='FinishObservation'
event_count=6
This confirms the original bug: an ACP subprocess can echo an injected credential and it reaches streamed token callbacks, ACP tool-call events, and final turn output in cleartext.
Step 2 — Apply the PR's changes:
Checked out fix-acp-mask-injected-secrets at a2eec14ed9f5b7ba36780abb74c170060283e653.
Step 3 — Re-run with the fix in place:
Ran the same command:
token_outputs=['stream:<secret-hidden>']
tool_event={'status': 'in_progress', 'raw_input': {'command': 'printf <secret-hidden>'}, 'raw_output': 'starting <secret-hidden>'}
tool_event={'status': 'completed', 'raw_input': {'command': 'printf <secret-hidden>'}, 'raw_output': 'done <secret-hidden>'}
action_event=message='stream:<secret-hidden>' kind='FinishAction'
observation=content=[TextContent(cache_prompt=False, type='text', text='stream:<secret-hidden>')] is_error=False kind='FinishObservation'
event_count=6
This shows the PR masks the injected env secret before it reaches streamed callbacks, raw tool-call input/output fields, and the final finish/observation events.
Test 2: Secret split across two stream chunks is masked at final persistence
Step 1 — Reproduce / establish baseline (without the fix):
Checked out main and ran /tmp/acp_split_conversation_qa.py against /tmp/fake_acp_split_agent.py, which emits the injected secret as two AgentMessageChunks: stream:QA_SUPER_ and SECRET_12345.
Ran OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/acp_split_conversation_qa.py:
token_outputs=['stream:QA_SUPER_', 'SECRET_12345']
action_event=message='stream:QA_SUPER_SECRET_12345' kind='FinishAction'
observation=content=[TextContent(cache_prompt=False, type='text', text='stream:QA_SUPER_SECRET_12345')] is_error=False kind='FinishObservation'
event_count=4
This confirms the baseline leak at the persistence boundary: the full secret is reconstructed in FinishAction / FinishObservation.
Step 2 — Apply the PR's changes:
Checked out fix-acp-mask-injected-secrets at a2eec14ed9f5b7ba36780abb74c170060283e653.
Step 3 — Re-run with the fix in place:
Ran the same command:
token_outputs=['stream:QA_SUPER_', 'SECRET_12345']
action_event=message='stream:<secret-hidden>' kind='FinishAction'
observation=content=[TextContent(cache_prompt=False, type='text', text='stream:<secret-hidden>')] is_error=False kind='FinishObservation'
event_count=4
This verifies the re-mask at finalization catches a secret that individual streaming chunks cannot match.
Issues Found
None.
This QA review was created by an AI agent (OpenHands) on behalf of the user.
Final verdict: PASS
- Add a minimum-length guard to SecretRegistry.track_exported_value so trivially short non-secret config injected via acp_env (e.g. VERBOSITY=1) is not registered for masking — avoids destructive over-masking of common substrings. Real credentials clear it trivially. - Mask the tool-call `title` field too (a misbehaving ACP server could echo a credential in the title). - Reword _mask_value docstring to accurately describe the crash-safe fallback (may transiently leak on mask failure rather than "never"). - Expand the split-chunk test comment explaining the intentional per-chunk masking bypass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Addressed all five review suggestions in 41d1b73 (min-length guard on track_exported_value, mask tool-call |
|
✅ Review complete. This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here. |
all-hands-bot
left a comment
There was a problem hiding this comment.
Code Review — fix(acp): mask injected secrets in tool-call output and streamed text
This is a well-executed, layered P0 security fix. The strategy is correct: mask at ingestion per-chunk, mask tool-call fields at accumulation time in the _OpenHandsACPBridge accumulator, and re-mask at the persistence boundary to catch secrets split across two streamed chunks. All four feedback items from the prior review have been addressed in 41d1b73 (min-length guard, title field in _mask_tool_call_entry, docstring wording, test comment). A few additional observations below.
Summary
| Correctness | ✅ Masking is applied at every emission path: streamed text, thought chunks, tool-call start/progress, and the persistence join. |
| Test coverage | ✅ Tests cover the recursive masker, each bridge path, split-chunk join-boundary masking, injection seeding, no-op when unset, and the crash-safe fallback contract. |
| Known gap | ask_agent() fork path is explicitly deferred — acceptable; per PR description it is a smaller, separate surface. |
| Temporary seam | 🟡 acp_env / agent_context.secrets bypass-seeding acknowledged as temporary pending #1022. |
Observations
1. _MIN_TRACKED_VALUE_LENGTH = 4 does not protect against common 4-char config strings
The comment motivates the guard with VERBOSITY=1 (1 char), but >= 4 still admits very common 4-char non-secret config values — "true", "prod", "info", "main", "home" — that could easily appear in acp_env. If registered in the mask set, those substrings would be destructively redacted everywhere in ACP output. This is a known and documented trade-off, but the comment currently frames the constant as a general solution rather than a short-lived pragmatic guard. Noting explicitly that #1022 is the correct fix (not raising the constant) would make the intent clearer to future readers.
2. ToolCallProgress masking relies on merge-before-mask ordering — not visible in the diff
The new _mask_tool_call_entry(target) call (line 729) is placed correctly after the existing merge loop that writes progress fields into tc (raw_input, raw_output, content, etc.) and before _emit_tool_call_event(target). This ordering is essential for correctness, but the merge loop is above the diff hunk and invisible to diff-only reviewers. The test test_tool_call_progress_masks_new_raw_output validates it, but a one-line comment ("merge loop above has written progress fields into target; mask before emitting") would make the invariant self-documenting.
3. agent_context.secrets values already in env are neither injected nor seeded
The if name in env: continue guard skips names already present in env. Such values are not passed to track_exported_value, so if they originated from os.environ (rather than acp_env) their actual value would be in the subprocess environment without being registered in the mask set. In practice this is a narrow gap (values from acp_env are handled by the prior loop; ambient os.environ values were not injected by the agent), but a comment would help future readers understand the exemption.
This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation
all-hands-bot
left a comment
There was a problem hiding this comment.
⚠️ QA Report: PASS WITH ISSUES
The main ACP secret-masking behavior works in a real ACP conversation, but split streamed chunks still expose reconstructable secret fragments on the live token path.
Does this PR achieve its stated goal?
Partially. I verified with a real Conversation using ACPAgent against a temporary ACP stdio server that secrets injected through both acp_env and AgentContext.secrets are masked from ACP tool-call events, full streamed chunks, thought accumulation, and persisted FinishAction output. However, when the ACP server streams the same secret split across two consecutive message chunks, the persisted event is masked but the live token callback still receives adjacent fragments (split-start:qaSecretV, then alue12345:split-end), which is still reconstructable on the streamed/WebSocket path.
| Phase | Result |
|---|---|
| Environment Setup | ✅ make build completed successfully; dependencies installed into .venv |
| CI Status | Python API breakage checks/Python API), 3 pending, 17 skipped |
| Functional Verification |
Functional Verification
Test 1: ACP output leaking injected secrets before the PR
Step 1 — Reproduce / establish baseline without the fix:
I created a temporary ACP stdio server that reads QA_ACP_SECRET from its environment and emits it through:
AgentMessageChunkstreamed textAgentThoughtChunktextToolCallStarttitle/raw input/raw outputToolCallProgressraw output- two consecutive split
AgentMessageChunkchunks
Ran:
git checkout origin/main
uv run python /tmp/qa_acp_mask_probe.pyRelevant output excerpt:
[
{
"mode": "acp_env",
"secret_in_callback_events": true,
"secret_in_state_events": true,
"secret_in_tokens": true,
"accumulated_thoughts": ["thought-full:qaSecretValue12345\n"],
"tokens": ["streamed-full:qaSecretValue12345\n", "split-start:qaSecretV", "alue12345:split-end"],
"callback_event_summaries": [
{"type": "ACPToolCallEvent", "title": "Run echo qaSecretValue12345", "raw_output": "started:qaSecretValue12345"},
{"type": "ActionEvent", "action": {"message": "streamed-full:qaSecretValue12345\nsplit-start:qaSecretValue12345:split-end"}}
]
},
{
"mode": "agent_context",
"secret_in_callback_events": true,
"secret_in_state_events": true,
"secret_in_tokens": true
}
]This confirms the original bug: an ACP subprocess echoing an injected value put the secret into live callbacks, persisted state events, tool-call fields, thought text, and final response text.
Step 2 — Apply the PR's changes:
Ran:
git checkout fix-acp-mask-injected-secretsStep 3 — Re-run with the fix in place:
Ran:
uv run python /tmp/qa_acp_mask_probe.pyRelevant output excerpt:
[
{
"mode": "acp_env",
"secret_in_callback_events": false,
"secret_in_state_events": false,
"secret_in_tokens": false,
"accumulated_thoughts": ["thought-full:<secret-hidden>\n"],
"tokens": ["streamed-full:<secret-hidden>\n", "split-start:qaSecretV", "alue12345:split-end"],
"callback_event_summaries": [
{"type": "ACPToolCallEvent", "title": "Run echo <secret-hidden>", "raw_input": {"cmd": "echo <secret-hidden>", "nested": ["<secret-hidden>", 7]}, "raw_output": "started:<secret-hidden>"},
{"type": "ACPToolCallEvent", "raw_output": "completed:<secret-hidden>"},
{"type": "ActionEvent", "action": {"message": "streamed-full:<secret-hidden>\nsplit-start:<secret-hidden>:split-end"}}
]
},
{
"mode": "agent_context",
"secret_in_callback_events": false,
"secret_in_state_events": false,
"secret_in_tokens": false,
"accumulated_thoughts": ["thought-full:<secret-hidden>\n"]
}
]This shows the PR fixes the primary leak for both injection channels and masks persisted state/event output. The remaining caveat is visible in the token list: split chunks still reveal the two pieces on the live token path, although the final persisted FinishAction is re-masked after joining.
Issues Found
- 🟠 Issue: Secrets split across consecutive ACP streamed message chunks are still emitted as reconstructable fragments through the live token callback/WebSocket path, even though the persisted
FinishActionis masked after joining.
This QA review was created by an AI agent (OpenHands) on behalf of the user.
41d1b73 to
ac8e20c
Compare
|
Rebased onto current
The previously-failing |
|
✅ Review complete. This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here. |
all-hands-bot
left a comment
There was a problem hiding this comment.
Code Review — fix(acp): mask injected secrets in tool-call output and streamed text
This is a well-layered P0 security fix. Two prior automated reviews have already been posted and the author has addressed all five items from the first pass (min-length guard, title masking, docstring wording, acp_env min-length, test comment). The second review raised four additional observations (IDs 3342204848, 3342204856, 3342204874, 3342207690) that are still unresolved. This review focuses on what remains open and adds one additional observation not yet raised.
Status of Open Items from Prior Reviews
| # | Thread | Severity | Status |
|---|---|---|---|
| 1 | _MIN_TRACKED_VALUE_LENGTH = 4 — common 4-char strings like "prod", "true", "info" remain susceptible to over-masking |
🟡 Design note | Unresolved — acknowledged as a pre-existing trade-off; #1022 is the correct long-term fix |
| 2 | Merge-before-mask ordering at _mask_tool_call_entry(target) is implicit from the diff — a one-line comment would make the invariant explicit |
🟡 Nit | Unresolved |
| 3 | if name in env: continue skips track_exported_value for names already present from os.environ — narrow gap worth a comment |
🟡 Minor | Unresolved |
| 4 | Functional QA: a secret split across two consecutive on_token stream chunks still crosses the network as reconstructable plaintext fragments before the persistence-boundary re-mask fires |
🟠 Important | Unresolved — no author response yet |
New Observation
_fork_accumulated_text is not masked (ask_agent fork path)
The ask_agent() fork path accumulates text chunks without masking:
# session_update, lines ~690-694 (current file)
if self._fork_session_id is not None and session_id == self._fork_session_id:
if isinstance(update, AgentMessageChunk):
if isinstance(update.content, TextContentBlock):
self._fork_accumulated_text.append(update.content.text) # no masking
returnThe PR description explicitly defers this ("The ask_agent() fork path is intentionally out of scope"), which is a reasonable scoping call. However, depending on how _fork_accumulated_text is later consumed (returned to the caller, persisted, or relayed), a credential in the fork response could also appear in cleartext. A one-line comment at the fork early-return would make the deliberate deferral discoverable for future readers — right now the only documentation of this decision is in the PR description, which is harder to find during a code audit.
Summary Assessment
The core design is sound and the test suite covers the essential correctness properties well (per-chunk masking, join-boundary re-mask, injection seeding, error fallback). The main actionable item before merge is resolving the question from item 4 above:
Does the security goal explicitly include preventing split-secret fragments from crossing the WebSocket/network boundary via the live streaming path, or is the guarantee limited to the persisted event stream?
If the answer is "persisted stream only," a code comment or PR description update scoping this out would close the question cleanly. If the answer is "includes live streaming," the on_token path needs buffered/rolling masking (e.g. maintaining a sliding window across the last N bytes) — a non-trivial change worth a follow-up issue.
Everything else looks correct: defensive error handling, lifecycle cleanup of mask in reset() and _clear_turn_callbacks(), both step() and astep() call sites, and the retry paths inside each are all handled consistently.
This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation
all-hands-bot
left a comment
There was a problem hiding this comment.
✅ QA Report: PASS
Verified ACP secret masking behavior with real SDK/ACP schema objects; the PR removes plaintext secret leakage from streamed text, thought accumulation, tool-call events, and tool-call accumulators in the exercised paths.
Does this PR achieve its stated goal?
Yes. The stated goal is to prevent injected ACP secrets from landing in streamed text/tool-call output and to seed exported injected values into the mask set. On the base commit, the same ACP session updates leaked acp-secret-12345 through token callbacks, accumulated text/thoughts, emitted ACPToolCallEvents, and the in-memory tool-call accumulator; on the PR commit, the plaintext secret was absent and every exercised sink contained <secret-hidden> instead. The new track_exported_value path was also available and masked the registered injected value.
| Phase | Result |
|---|---|
| Environment Setup | ✅ make build completed successfully; no test/lint/typecheck commands were run. |
| CI Status | gh pr view: core Run tests jobs green; unresolved-review-threads failing and several Agent Server/QA jobs still in progress at observation time. |
| Functional Verification | ✅ Before/after SDK script reproduced the leak on base and verified masking on PR head. |
Functional Verification
Test 1: ACP streamed text and tool-call output are masked before relay/persistence sinks
Step 1 — Reproduce / establish baseline without the fix:
Ran git switch --detach 1c385741d98b3733209984db6a2fdd1883a8e856 && OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_acp_mask_behavior.py.
Key output:
SUMMARY {"leak_locations": ["tokens", "accumulated_text", "accumulated_thoughts", "event_payloads", "accumulated_tool_calls"], "mask_sentinel_present": false, "plaintext_secret_present": true}
"tokens": ["streamed token acp-secret-12345"]
"raw_output": {"stdout": "final stdout acp-secret-12345", "lines": ["acp-secret-12345"]}
"track_exported_value_available": falseThis confirms the baseline bug: even when the secret registry has the secret in its mask set, ACP streamed tokens, thoughts, emitted tool-call events, and accumulated tool-call data carried plaintext.
Step 2 — Apply the PR's changes:
Checked out PR head ac8e20c57546ce3b209b61dc0217e7daa92a4c5b.
Step 3 — Re-run with the fix in place:
Ran git switch --detach ac8e20c57546ce3b209b61dc0217e7daa92a4c5b && OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_acp_mask_behavior.py.
Key output:
SUMMARY {"leak_locations": [], "mask_sentinel_present": true, "plaintext_secret_present": false}
"tokens": ["streamed token <secret-hidden>"]
"raw_input": {"command": "echo <secret-hidden>", "nested": ["safe", "<secret-hidden>"]}
"raw_output": {"stdout": "final stdout <secret-hidden>", "lines": ["<secret-hidden>"]}
"track_exported_value_available": true
"track_exported_value_mask_result": "registered=<secret-hidden>"This shows the fix works for the exercised user-visible sinks: streamed token relay, thought accumulation, start/progress ACPToolCallEvent payloads, nested JSON raw input/output, and the accumulator all contain the mask sentinel and no plaintext secret.
Issues Found
None.
This review was created by an AI agent (OpenHands) on behalf of the user.
|
@all-hands-bot re: the QA "PASS WITH ISSUES" — split-chunk live-relay fragments. Acknowledged and accepted as a documented best-effort boundary for this PR:
Filed as a follow-up: OpenHands/agent-canvas#1049. |
ac8e20c to
17f899f
Compare
|
Pushed |
|
✅ Review complete. This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here. |
all-hands-bot
left a comment
There was a problem hiding this comment.
✅ QA Report: PASS
Verified ACP secret masking behavior before/after: the base branch leaked an injected secret through ACP streamed text, tool-call fields, and the finalized turn message, while this PR masks those values with <secret-hidden>.
Does this PR achieve its stated goal?
Yes. The PR sets out to prevent injected ACP secrets from landing in streamed text/tool-call event output and persisted turn text. I exercised the ACP session-update path with real acp.schema message/thought/tool-call objects and a SecretRegistry seeded through update_secrets() + get_secret_value(): base leaked supersecret, while the PR produced leaked_secret=False, masked streamed text/tool-call fields, and masked a split secret after finalizing the turn.
| Phase | Result |
|---|---|
| Environment Setup | ✅ make build completed successfully; dependencies installed via uv sync --dev. |
| CI Status | ⏳ Observed 27 successful, 1 skipped, 4 pending, 0 failing checks via gh pr checks. |
| Functional Verification | ✅ Before/after SDK execution confirmed the security leak is fixed for the exercised ACP paths. |
Functional Verification
Test 1: ACP streamed text and tool-call event fields are masked
Step 1 — Reproduce / establish baseline (without the fix):
Ran git switch --detach origin/main && OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/acp_mask_qa.py with a script that sends ACP AgentMessageChunk, AgentThoughtChunk, ToolCallStart, and ToolCallProgress objects containing supersecret through _OpenHandsACPBridge after seeding SecretRegistry:
leaked_secret= True
mask_count= 0
streamed_tokens= ['message token=supersecret']
start_event_raw_input= {'argv': ['echo', 'supersecret'], 'command': 'echo supersecret'}
terminal_event_raw_output= {'stdout': 'done supersecret'}
This confirms the bug exists on base: streamed tokens and tool-call event fields preserve the raw injected secret.
Step 2 — Apply the PR's changes:
Checked out fix-acp-mask-injected-secrets at 17f899f88d817afbab31d00fe3af684feec5b0f3.
Step 3 — Re-run with the fix in place:
Ran OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/acp_mask_qa.py:
leaked_secret= False
mask_count= 19
streamed_tokens= ['message token=<secret-hidden>']
start_event_raw_input= {'argv': ['echo', '<secret-hidden>'], 'command': 'echo <secret-hidden>'}
terminal_event_raw_output= {'stdout': 'done <secret-hidden>'}
This shows the changed behavior works: streamed message text and nested tool-call raw input/output are masked before reaching the emitted/accumulated ACP event data.
Test 2: Finalized ACP turn masks a secret split across chunks
Step 1 — Reproduce / establish baseline (without the fix):
The same baseline run populated an ACP agent client's accumulated response with joined split super + secret value, then finalized the turn:
finish_messages= ['joined split supersecret value']
This confirms the persistence-boundary bug exists on base: the joined final response reassembles and leaks the secret.
Step 2 — Apply the PR's changes:
Checked out fix-acp-mask-injected-secrets at 17f899f88d817afbab31d00fe3af684feec5b0f3.
Step 3 — Re-run with the fix in place:
The PR run produced:
finish_messages= ['joined split <secret-hidden> value']
This confirms the final turn text is re-masked after joining chunks, covering the split-secret case described in the PR.
Issues Found
None.
This review was created by an AI agent (OpenHands) on behalf of the user.
all-hands-bot
left a comment
There was a problem hiding this comment.
Summary
This PR addresses a real security vulnerability: injected secrets in the ACP subprocess environment could leak back in cleartext through the persisted event stream and live on_token / on_event sinks. The fix is well-structured with a layered approach:
- Per-chunk masking at ingestion in
session_update(best-effort, handles most cases) - Per-frame re-masking of tool-call entries (handles cumulative progress frames)
- Join-boundary re-masking in
_finalize_successful_turn(catches secrets split across chunk boundaries)
The recursive _mask_json_value helper handles arbitrary JSON-structured tool call payloads correctly. Test coverage is comprehensive and the defensive error handling in _mask_value matches the existing masking contract.
One security gap remains that should be addressed before merging.
🔴 Critical: ask_agent() fork path does not mask accumulated text
The _fork_accumulated_text list (used by ask_agent()) is populated at line 693 without masking:
# Line 693 — unchanged by this PR, but within scope of the fix
self._fork_accumulated_text.append(update.content.text) # ← not maskedThe result is returned directly to the caller at ask_agent() line 2656:
result = "".join(client._fork_accumulated_text) # ← may contain plaintext secretWhile the fork result is not persisted to the event stream in the same way as the main turn, ask_agent() is documented as a "sidebar reply" that is typically displayed in the UI and relayed to network consumers. A secret echoed by the ACP server in the fork session would reach whoever receives the ask_agent() return value in cleartext.
Fix: Apply self._mask_value() when appending to _fork_accumulated_text:
if self._fork_session_id is not None and session_id == self._fork_session_id:
if isinstance(update, AgentMessageChunk):
if isinstance(update.content, TextContentBlock):
self._fork_accumulated_text.append(self._mask_value(update.content.text))
returnNote that self.mask is already set by _reset_client_for_turn() before step()/astep() runs, and ask_agent() runs concurrently on the shared self._client — so self.mask is already available when the fork path executes.
A test mirroring test_message_chunk_masked_in_relay_and_accumulation but routing through _fork_session_id would complete coverage.
🟡 Acknowledged residual risk: per-chunk on_token leaks split secrets
The code comment correctly documents that a secret split across two streaming chunks will reach on_token in cleartext (each half doesn't match the pattern alone). The join-boundary re-mask catches it before persistence. This is a reasonable trade-off for streaming UX. Worth keeping the comment as documentation of the known limitation.
This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation
The regular terminal masks every observation via secret_registry.mask_secrets_in_output(); ACP masked nothing, so a subprocess that echoed an injected credential (env, a failing curl printing an Authorization header) landed in the persisted event stream and WebSocket relay in cleartext — the largest credential-leak surface, widened on cloud where subscription tokens are injected. Bind the registry masker onto the bridge once in _start_acp_server (conversation-stable; a pure read of _exported_values with none of the cross-thread/state-lock hazards that make on_event/on_token per-turn) and route ACP output through it before emitting: - streamed AgentMessageChunk text (on_token + accumulated_text) and AgentThoughtChunk text - tool-call title/raw_input/raw_output/content, masked at ingestion so the started event, the single terminal event (#3465 dedup), and the flush/supersede path all carry masked values and the accumulator never holds plaintext - the ask_agent() fork accumulator + its joined return value, which is relayed to the caller/UI (binding the masker once is what makes this reachable even when ask_agent runs while no turn is active) - the joined response_text/thought_text in _finalize_successful_turn (persistence boundary), which also catches secrets split across chunks The mask set is the registry's _exported_values, seeded by get_secret_value when _start_acp_server reads state.secret_registry — the canonical channel that #3464 routes provider creds and (on the caller/cloud path) agent_context.secrets through. No new SecretRegistry API; the deprecated acp_env / canvas-local drain are out of scope. A failing masker falls back to the original value (crash-safe), matching the terminal tool contract. The live on_token relay still forwards a secret split mid-value across chunks as fragments; the persisted stream is fully masked (follow-up: OpenHands/agent-canvas#1049). Implements pre-cloud refactor P0-2 (OpenHands/agent-canvas#1023). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
17f899f to
ec5d309
Compare
|
Addressed the CHANGES_REQUESTED in ec5d309: masked the |
|
🔍 Review in progress… We are performing the review through OpenHands Cloud Automation. You can log in and view the conversation here. |
all-hands-bot
left a comment
There was a problem hiding this comment.
✅ QA Report: PASS
ACP secret masking worked across streamed text, tool-call event fields, and persisted FinishAction output when exercised with real ACP schema updates and an ACPAgent.step() turn.
Does this PR achieve its stated goal?
Yes. The goal was to prevent ACP-emitted injected secrets from reaching streamed token output, tool-call output, or persisted turn output in cleartext. On the base branch, the same verification script exposed supersecret in all three sinks; on PR commit ec5d30926bf9e7a3e30a6f2aed37a653ba97524a, those values were replaced with <secret-hidden>, including a secret split across two response chunks and only reassembled at finalization.
| Phase | Result |
|---|---|
| Environment Setup | ✅ make build completed successfully and installed project dependencies. |
| CI Status | ⏳ Observed 20 passing, 8 pending, and 3 skipped checks; no failing checks at observation time. |
| Functional Verification | ✅ Baseline leak reproduced; PR masked the exercised stream, tool-call, and persisted outputs. |
Functional Verification
Test 1: ACP stream, tool-call events, and persisted turn output
I used a temporary SDK script (/tmp/qa_acp_secret_masking.py) that:
- sends real
acp.schema.AgentMessageChunk,ToolCallStart, andToolCallProgressupdates through_OpenHandsACPBridge.session_update(); - captures the same
on_token/on_eventsinks that are relayed or persisted by the SDK; and - runs
ACPAgent.step()with a simulated ACP turn whose final response is split assuper+secretbefore the persistedFinishActionis emitted.
Step 1 — Reproduce baseline without the fix:
Ran git checkout --quiet origin/main && OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_acp_secret_masking.py:
STREAM_TOKEN= visible token: supersecret
STREAM_ACCUMULATED= visible token: supersecret
TOOL_START_TITLE= Running echo supersecret
TOOL_START_RAW_INPUT= {'command': 'echo supersecret', 'nested': ['arg=supersecret']}
TOOL_TERMINAL_RAW_OUTPUT= completed with supersecret
TOOL_ACCUMULATED= {'tool_call_id': 'tc-1', 'title': 'Running echo supersecret', 'tool_kind': 'execute', 'status': 'completed', 'raw_input': {'command': 'echo supersecret', 'nested': ['arg=supersecret']}, 'raw_output': 'completed with supersecret', 'content': None}
FINISH_MESSAGE= split value: supersecret done
This confirms the bug: the ACP bridge and final ACP turn output could expose the injected secret in cleartext.
Step 2 — Apply the PR's changes:
Checked out fix-acp-mask-injected-secrets at ec5d30926bf9e7a3e30a6f2aed37a653ba97524a.
Step 3 — Re-run with the fix in place:
Ran OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_acp_secret_masking.py:
STREAM_TOKEN= visible token: <secret-hidden>
STREAM_ACCUMULATED= visible token: <secret-hidden>
TOOL_START_TITLE= Running echo <secret-hidden>
TOOL_START_RAW_INPUT= {'command': 'echo <secret-hidden>', 'nested': ['arg=<secret-hidden>']}
TOOL_TERMINAL_RAW_OUTPUT= completed with <secret-hidden>
TOOL_ACCUMULATED= {'tool_call_id': 'tc-1', 'title': 'Running echo <secret-hidden>', 'tool_kind': 'execute', 'status': 'completed', 'raw_input': {'command': 'echo <secret-hidden>', 'nested': ['arg=<secret-hidden>']}, 'raw_output': 'completed with <secret-hidden>', 'content': None}
FINISH_MESSAGE= split value: <secret-hidden> done
This shows the fix works for the exercised behavior: streamed text and accumulated text are masked, nested tool-call JSON fields are masked before event emission/accumulation, and the final persisted message catches the split secret at the join boundary.
Issues Found
None.
This PR review was created by an AI agent (OpenHands) on behalf of the user.
all-hands-bot
left a comment
There was a problem hiding this comment.
Code Review — fix(acp): mask injected secrets in tool-call output and streamed text
🟢 Good taste — Well-structured security fix with proper defense-in-depth layering and solid test coverage.
Summary
This PR closes a real P0 gap: the ACP agent path previously forwarded subprocess output verbatim into the persisted event stream and live on_token/on_event sinks, while the regular terminal tool always ran output through secret_registry.mask_secrets_in_output. The fix routes ACP output through the same masker with three complementary tiers:
- Per-chunk masking at ingestion —
AgentMessageChunk,AgentThoughtChunk, and fork-session text are masked before touchingaccumulated_text/accumulated_thoughts/_fork_accumulated_text. - Per-frame tool-call masking —
_mask_tool_call_entrymutates the accumulator entry in place so that the start event, every progress frame, the flush path, and the supersede path (_cancel_inflight_tool_calls) all carry masked values without needing separate masking in each emitter. - Join-boundary re-masking —
_finalize_successful_turnand theask_agent()fork join both re-mask the concatenated string, catching any secret split across two consecutive chunks.
The recursive _mask_json_value helper correctly handles string, dict, list, and scalar leaves. The defensive _mask_value fallback (log at DEBUG, return original on exception) matches the terminal tool masking contract and avoids crashing the agent on a broken masker.
Note on the prior CHANGES_REQUESTED review
The existing CHANGES_REQUESTED review cites a "🔴 Critical" gap — that the ask_agent() fork path does not mask _fork_accumulated_text. This issue is already addressed by the current PR. The fix is present in two places:
- At ingestion (
session_update, fork branch):self._fork_accumulated_text.append(self._mask_value(update.content.text)) - At the join boundary (
ask_agent(), fork result):result = client._mask_value("".join(client._fork_accumulated_text))
A dedicated test (test_fork_session_text_masked) covers the fork ingestion path. The CHANGES_REQUESTED verdict no longer reflects the current code; this PR is ready for re-evaluation.
Minor observations (non-blocking)
1. Stale comment in _OpenHandsACPBridge.__init__ — "set per turn" contradicts the actual lifetime
The comment at self.mask = None (lines 597–602) says "set per turn by ACPAgent", but mask is bound once in _start_acp_server for the full conversation lifetime and explicitly not cleared by reset(). The _reset_client_for_turn docstring correctly says "The secret masker is bound once in _start_acp_server (conversation-stable), not here." The __init__ comment should match that wording.
2. reset() silently skips mask — intent not documented
reset() clears on_token, on_event, and on_activity, but says nothing about mask. A future contributor cannot distinguish "forgotten" from "by design" without reading _start_acp_server and the test. A one-liner like # mask is conversation-lifetime (bound once in _start_acp_server) — NOT cleared here makes the invariant self-documenting.
Known / acknowledged residual gap
The live on_token relay can still forward individual chunks of a secret that straddles a streaming frame boundary (each fragment alone does not match the pattern; only the join reassembles it). The PR description and inline comments correctly document this and link a follow-up issue. The join-boundary re-mask fully protects the persisted stream.
Tests
Coverage is comprehensive: TestMaskJsonValue (scalar, nested dict/list), TestACPBridgeMasking (message chunk, thought chunk, tool-call start + accumulator, tool-call progress + terminal event, no-op when mask is None, error fallback, reset() preserves mask, fork-session text), TestACPStepMasksPersistedTurn (end-to-end: secret split across chunks is caught at the persistence boundary). All tests exercise real code paths without excessive mocking.
[RISK ASSESSMENT]
- [Overall PR]
⚠️ Risk Assessment: 🟢 LOW
Changes are additive and entirely within the ACP agent path. No existing API surfaces are modified. The defensive fallback in _mask_value prevents the masking logic from breaking the agent on unexpected errors.
VERDICT: ✅ Worth merging — security fix is correct and complete; the two comment-only nits above can be addressed in a follow-up or a quick fixup commit.
KEY INSIGHT: Setting mask once at conversation start (in _start_acp_server) rather than per-turn avoids a race between ask_agent() fork sessions (which run concurrently on the shared _client) and the main turn's reset() / callback-wiring cycle — a subtle but important design choice.
This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation
Problem
Priority: P0 (security hole, widened on cloud). Implements pre-cloud refactor P0-2 — OpenHands/agent-canvas#1023, part of OpenHands/agent-canvas#988.
The regular terminal masks every observation through
secret_registry.mask_secrets_in_output(). The ACP agent masked nothing — so a subprocess that echoes an injected credential (env, a failingcurlprinting anAuthorizationheader) landed in the persisted event stream + WebSocket relay in cleartext.What this does
Routes ACP output through
state.secret_registry.mask_secrets_in_output()before emitting:AgentMessageChunktext (on_token+accumulated_text) andAgentThoughtChunktext.title/raw_input/raw_output/content, masked at ingestion insession_updateso the early "started" event, the single terminal-transition event (post-refactor(acp): dedup tool-call progress at the source (O(n²)→O(1)) #3465 dedup), and the flush/supersede path all carry masked values, and the accumulator never holds plaintext. A recursive_mask_json_valuehelper masks string leaves of the arbitrary-JSON fields.response_text/thought_textin_finalize_successful_turnis re-masked at the persistence boundary, which also catches a secret split across two stream chunks (each chunk alone won't match; only the join reassembles it).The bridge gains a per-turn
maskcallable (set in_reset_client_for_turnto the livemask_secrets_in_outputbound method, cleared inreset()/_clear_turn_callbacks). No-op when unset; falls back to the original value if it ever raises — matching the terminal tool's contract.Mask set / scope
The mask set is the registry's
_exported_values, seeded byget_secret_valuewhen_start_acp_serverreadsstate.secret_registry— the canonical channel that #3464 routes provider creds (and, on the caller/cloud path,agent_context.secrets) through. NoSecretRegistryAPI changes. The deprecatedacp_envand the canvas-localagent_context.secretsdrain bypassget_secret_valueand are out of scope here (both are transitional/deprecated; seeding them would mean masking general non-secret env too).The live
on_tokenrelay still forwards a secret split mid-value across chunks as fragments; the persisted stream is fully masked. Follow-up for streaming prefix-redaction: OpenHands/agent-canvas#1049.Tests
test_acp_agent.py: recursive masker; bridge masking of message/thought chunks and tool-call start/terminal fields (incl. accumulator stays masked); no-op whenmaskunset; mask-error fallback;reset()clearsmask; end-to-endstep()masks a secret split across chunks in the persistedFinishAction(seeded via the canonicalupdate_secrets+get_secret_valuepath).Verification
ruff+pyrightclean;tests/sdk/agent/+tests/sdk/conversation/green.Docs follow-up (acp_env deprecation guidance): OpenHands/docs#544.
🤖 Generated with Claude Code
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
eclipse-temurin:17-jdknikolaik/python-nodejs:python3.13-nodejs22-slimgolang:1.21-bookwormPull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64 docker pull ghcr.io/openhands/agent-server:ec5d309-pythonRun
All tags pushed for this build
About Multi-Architecture Support
ec5d309-python) is a multi-arch manifest supporting both amd64 and arm64ec5d309-python-amd64) are also available if needed