Skip to content

MUL-2539 fix(pi): strip leaked tool-call markup safely#2956

Merged
Bohan-J merged 1 commit into
multica-ai:mainfrom
drybalka-s:drybalka/0.3.4-daemon-context-reduction
May 22, 2026
Merged

MUL-2539 fix(pi): strip leaked tool-call markup safely#2956
Bohan-J merged 1 commit into
multica-ai:mainfrom
drybalka-s:drybalka/0.3.4-daemon-context-reduction

Conversation

@drybalka-s
Copy link
Copy Markdown
Contributor

@drybalka-s drybalka-s commented May 20, 2026

Refs #3047

Summary

  • Strip leaked Pi call:*, response:*, <tool_call|>, and control-token markup from assistant text.
  • Buffer text deltas so markup split across Pi streaming chunks is still removed.
  • Preserve ordinary assistant text that merely ends with unmatched call: or response: prefixes.

Problem solved

Pi can emit raw tool-call markup through text_delta events. That markup can leak into visible assistant output and issue comments, especially when it is split across streaming deltas. The sanitizer must remove confirmed markup without dropping legitimate plain text.

Validation

  • go test ./pkg/agent -run 'TestStripPiToolCallMarkup|TestDrainPiTextBuffer|TestFlushPiTextBuffer'

@vercel
Copy link
Copy Markdown

vercel Bot commented May 20, 2026

Someone is attempting to deploy a commit to the IndexLabs Team on Vercel.

A member of the Team first needs to authorize it.

@drybalka-s drybalka-s force-pushed the drybalka/0.3.4-daemon-context-reduction branch from e1b89b2 to 5a0a6a5 Compare May 21, 2026 12:17
@multica-eve multica-eve changed the title fix(daemon): reduce comment context usage [MUL-2539] fix(daemon): reduce comment context usage May 22, 2026
@Bohan-J
Copy link
Copy Markdown
Collaborator

Bohan-J commented May 22, 2026

Thanks for the detailed write-up. After internal discussion, here's where we landed — split by the three concerns in this PR.

1. Comment-read context reduction — please don't change CLI defaults here

We agree with the underlying problem (long-issue context bloat in comment-triggered runs), but we don't want to take this PR's CLI-side approach (changing multica issue comment list no-arg default to --recent 10, introducing --all / --summary). The reason is that we've already addressed the same problem in the daemon runtime over the past couple of days, and we'd rather not pay the cost of a CLI default-behavior change on top.

What's already in main:

  • Comment-triggered runs no longer dump the full flat timeline. The entry path reads the triggering thread's most recent 30 replies via --thread <id> --tail 30 (the root is always included, even at --tail 0).
  • If 30 replies aren't enough, the agent walks older replies in the same thread one page at a time using the Next reply cursor: --before <ts> --before-id <reply-id> line emitted on stderr.
  • When cross-thread background is genuinely needed, the agent uses --recent 20 to pull the most recently active threads, with the same --before / --before-id cursor walking older threads (stderr label: Next thread cursor).
  • The unfiltered multica issue comment list <issue-id> --output json form is explicitly discouraged in the runtime prompt for long-running issues.

That covers the "don't default to ingesting the entire timeline" goal without changing the CLI's public contract. Since multica issue comment list is also used by humans and scripts, we'd prefer to keep the no-arg behavior stable and keep agent context governance in the daemon prompt / runtime workflow layer rather than as a CLI default change. So please drop the CLI default change and the new --all / --summary surface from this PR (or split it into its own proposal with a clear breaking-change note + migration docs if you want to push it further).

2. Agent-authored comment output hardening (HEREDOC / --content-file) — directionally fine, evaluate separately

Replacing inline --content "..." in daemon reply prompts with --content-file / single-quoted HEREDOC stdin is reasonable — the failure mode (shell expansion of $, backticks, quotes inside agent-authored comment bodies) is real. We're happy to evaluate this on its own, just not bundled with the CLI-default change above. Please pull it into a separate PR (or at minimum a clearly separable commit) so it can land independently.

3. Pi sanitizer (server/pkg/agent/pi.go) — must-fix before this can merge

The current sanitizer has a real correctness bug: when the buffered assistant text starts with call: or response: but the trailing content is not a complete tool-markup structure, the EOF flush path can drop legitimate plain text (e.g. an assistant message ending in response: see below). Existing tests cover the complete-markup and split-token cases, but there's no regression coverage for unmatched call: / response: plain text at EOF.

Please:

  • Tighten the sanitizer so it only drops a buffered fragment when it has positively confirmed it's a complete tool-call / response markup structure; otherwise flush the buffered text verbatim.
  • Add a regression test covering the unmatched call: / response: EOF case so this doesn't regress.

Merge recommendation

🔄 Request changes, but the path forward is straightforward:

  • Drop the CLI-default change for multica issue comment list (and the --all / --summary surface) from this PR.
  • Split the HEREDOC / --content-file hardening into a separate PR so we can land it on its own.
  • Fix the Pi sanitizer bug + add the regression test described above.

Once those are addressed we're good to move forward. Thanks again.

@drybalka-s drybalka-s force-pushed the drybalka/0.3.4-daemon-context-reduction branch from bd2c604 to 2535cea Compare May 22, 2026 07:32
@drybalka-s drybalka-s changed the title [MUL-2539] fix(daemon): reduce comment context usage fix(pi): strip leaked tool-call markup safely May 22, 2026
@drybalka-s
Copy link
Copy Markdown
Contributor Author

Thanks for the review. I followed the requested split:

  • Dropped the CLI default change for multica issue comment list, including the --all / --summary surface, from this PR.
  • Removed the daemon comment-posting hardening from this PR and moved it to a separate PR: fix(daemon): harden agent comment posting (MUL-2567) #3066.
  • Kept this PR focused only on the Pi sanitizer.
  • Fixed the EOF/plain-text bug for unmatched call: / response: prefixes and added regression coverage for that case.

Validation for this PR:

go test ./pkg/agent -run 'TestStripPiToolCallMarkup|TestDrainPiTextBuffer|TestFlushPiTextBuffer'

@multica-eve multica-eve changed the title fix(pi): strip leaked tool-call markup safely MUL-2539 fix(pi): strip leaked tool-call markup safely May 22, 2026
Copy link
Copy Markdown
Collaborator

@Bohan-J Bohan-J left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clean follow-up, @drybalka-s — scoping this PR down to just the Pi sanitizer is exactly what we asked for, and the implementation looks solid.

What I verified on HEAD 2535ceaf:

  • Scope is right. Only server/pkg/agent/pi.go and server/pkg/agent/pi_test.go change. The CLI default-behavior change and the HEREDOC / --content-file work are out of this PR, as agreed.
  • The previous must-fix is resolved. flushPiTextBuffer now applies only the control-token regex to leftover pending text, and safePiTextEmitLen only holds back trailing bytes that could still complete a markup prefix. Plain assistant text ending in unmatched call: / response: is preserved verbatim, with a regression test covering it (TestFlushPiTextBufferKeepsUnmatchedToolPrefixes).
  • Streaming correctness. Split tool-call and split control-token across chunks are both covered (TestDrainPiTextBufferSplitToolCall, TestDrainPiTextBufferSplitControlToken).
  • Tests pass locally: go test ./pkg/agent -run 'TestStripPiToolCallMarkup|TestDrainPiTextBuffer|TestFlushPiTextBuffer' — all green.
  • CI is green (backend, frontend, installer × 2).

One non-blocking nit for a future PR if you feel like it: when buffered text starts with call: / response: but the parser can't yet decide if it's markup (e.g. an assistant reply that legitimately says call: see below ...), drainPiSanitizedText holds it in the buffer until EOF. The final output is correct, but the live-stream emit gets delayed for that segment. Splitting scanPiToolMarkupEnd into not-markup vs incomplete-markup so the obvious-not-markup case can keep streaming would be nice — but it's polish, not a blocker.

Approving and merging. Thanks again!

@Bohan-J Bohan-J merged commit 5bc77f2 into multica-ai:main May 22, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants