web_fetch_content tool #749

Open
p12tic wants to merge 6 commits into Zoo-Code-Org:main from p12tic:web-fetch-content-tool

Conversation

@p12tic

@p12tic p12tic commented Jun 27, 2026

Contributor

Related GitHub Issue

Closes: #748

Description

Implements the fetch_web_content tool.

Test Procedure

Manual testing.

Pre-Submission Checklist

  • Issue Linked: This PR is linked to an approved GitHub Issue (see "Related GitHub Issue" above).
  • Scope: My changes are focused on the linked issue (one major feature/fix per PR).
  • Self-Review: I have performed a thorough self-review of my code.
  • Testing: New and/or updated tests have been added to cover my changes (if applicable).
  • Documentation Impact: I have considered if my changes require documentation updates (see "Documentation Updates" section below).
  • Contribution Guidelines: I have read and agree to the Contributor Guidelines.

Screenshots / Videos

image

Documentation Updates

Additional Notes

Please let me know if this is something that can be added to ZooCode and I will finish polishing it. I suspect at least docs will need updates.

Summary by CodeRabbit

  • New Features

    • Added support for fetching web content as a new tool, including URL-based requests and optional prompt context.
    • Web content actions now appear in the chat UI with a clearer label and target URL display.
    • Added a new web tool group and updated available modes to include web access.
  • Bug Fixes

    • Improved handling of web-fetch tool calls in progress updates and final results.
    • Added localized chat messages and tool labels across supported languages.

@coderabbitai

coderabbitai Bot commented Jun 27, 2026

Contributor

Review Change Stack

Important

Review skipped

No new commits to review since the last review.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 6c02f66b-2d3c-49f9-b4ee-34bb57af11f1

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

Adds a new built-in fetch_web_content native tool that fetches HTTP/HTTPS URLs with user-approval gating, byte and character limits, HTML-to-text conversion via Cheerio, and JSON pretty-printing. The tool is registered in the OpenAI tool list, wired into the assistant message parser and dispatcher, exposed as a new web tool group in the code and debug modes, rendered in the chat UI with a Globe icon, and localized across the 18 locales listed below.

Changes

fetch_web_content built-in tool

| Layer | File(s) | Summary |
| --- | --- | --- |
| Type and schema contracts | packages/types/src/tool.ts, packages/types/src/mode.ts, packages/types/src/vscode-extension-host.ts, src/shared/tools.ts, schemas/roomodes.json | Adds "web" to toolGroups, "fetch_web_content" to toolNames, a typed NativeToolArgs entry, a fetchWebContent variant on ClineSayTool with optional url, a new TOOL_GROUPS.web entry, "web" in code and debug mode groups, and expands the roomodes JSON schema to allow "web" in groups enums. |
| OpenAI tool definition and registry | src/core/prompts/tools/native-tools/fetch_web_content.ts, src/core/prompts/tools/native-tools/index.ts, src/package.json | Creates the OpenAI function tool definition with a strict url/prompt parameter schema, registers it in getNativeTools(), and adds cheerio as a runtime dependency. |
| FetchWebContentTool core | src/core/tools/FetchWebContentTool.ts | Implements FetchWebContentTool with URL validation, HTTP/HTTPS-only enforcement, AbortController timeout, streaming response reads capped at MAX_RESPONSE_BYTES, htmlToText via Cheerio, JSON pretty-printing, structured result formatting, and handlePartial URL-stabilization logic. Exports a singleton fetchWebContentTool. |
| Parser and dispatcher wiring | src/core/assistant-message/NativeToolCallParser.ts, src/core/assistant-message/presentAssistantMessage.ts, src/core/auto-approval/tools.ts | Adds fetch_web_content branches to createPartialToolUse and parseToolCall, wires tool dispatch and description in presentAssistantMessage, and adds fetchWebContent to the read-only auto-approval list. |
| Tests | src/core/tools/__tests__/fetchWebContentTool.spec.ts, src/core/assistant-message/__tests__/NativeToolCallParser.spec.ts, src/shared/__tests__/modes.spec.ts | Adds a 724-line spec for FetchWebContentTool covering execute, handlePartial, and htmlToText; adds NativeToolCallParser tests for parse/streaming/finalization of fetch_web_content; updates the modes spec to assert "web" in debug mode groups. |
| Chat UI rendering | webview-ui/src/components/chat/ChatRow.tsx, webview-ui/src/components/chat/__tests__/ChatRow.fetch-web-content.spec.tsx | Adds a fetchWebContent case to ChatRow using the Globe icon and wantsToFetch/didFetch localized strings, with URL display in a ToolUseBlockHeader; adds UI tests asserting icon, prompt text, and URL rendering. |
| i18n across 18 locales | webview-ui/src/i18n/locales/*/chat.json, webview-ui/src/i18n/locales/*/prompts.json | Adds webFetch.wantsToFetch and webFetch.didFetch to chat locale files and tools.toolNames.web to prompts locale files for ca, de, en, es, fr, hi, id, it, ja, ko, nl, pl, pt-BR, ru, tr, vi, zh-CN, zh-TW. |

Sequence Diagram(s)

sequenceDiagram
  participant NativeToolCallParser
  participant presentAssistantMessage
  participant FetchWebContentTool
  participant User as User Approval
  participant fetch as HTTP fetch

  NativeToolCallParser->>presentAssistantMessage: parsed fetch_web_content ToolUse (url, prompt)
  presentAssistantMessage->>FetchWebContentTool: handle(params, task, callbacks)
  FetchWebContentTool->>FetchWebContentTool: validate URL + enforce HTTP/HTTPS
  FetchWebContentTool->>User: askApproval(fetchWebContent, url)
  User-->>FetchWebContentTool: approved
  FetchWebContentTool->>fetch: GET url (AbortController timeout, max bytes)
  fetch-->>FetchWebContentTool: streaming response body
  FetchWebContentTool->>FetchWebContentTool: htmlToText / JSON.stringify / raw text
  FetchWebContentTool-->>presentAssistantMessage: pushToolResult(formatted content)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested labels

awaiting-review

Suggested reviewers

  • JamesRobert20
  • navedmerchant
  • hannesrudolph
  • edelauna
  • taltas

Poem

🐇 A rabbit hops across the web,
fetching pages, ebb by ebb,
HTML stripped to plain-text treasure,
JSON pretty-printed for good measure,
Globe icon glows in the chat UI—
no MCP needed, just Zoo nearby! 🌐

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 37.50% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly names the new fetch_web_content tool introduced by this PR. |
| Description check | ✅ Passed | The PR description includes the required issue link, summary, testing, checklist, screenshot, and docs notes, though briefly. |
| Linked Issues check | ✅ Passed | The changes match #748 by adding a built-in web content fetch tool plus the wiring, parsing, UI, and tests it needs. |
| Out of Scope Changes check | ✅ Passed | The changes stay focused on the new fetch_web_content feature and its supporting types, tests, UI, and locales. |

@p12tic p12tic force-pushed the web-fetch-content-tool branch 3 times, most recently from 7599218 to fc37493, June 27, 2026 19:38
@codecov

codecov Bot commented Jun 27, 2026


@p12tic p12tic force-pushed the web-fetch-content-tool branch 2 times, most recently from 0767e1d to 6b7c05a, June 27, 2026 21:00
@p12tic p12tic force-pushed the web-fetch-content-tool branch from 6b7c05a to ba0ca59, June 28, 2026 19:13
@p12tic p12tic marked this pull request as ready for review June 28, 2026 19:16
@p12tic

p12tic commented Jun 28, 2026

Contributor Author

@coderabbitai please review

@coderabbitai

coderabbitai Bot commented Jun 28, 2026

Contributor

@p12tic The PR description image shows the Zoo Code UI. At the top, there’s a user prompt: “Please fetch bbc.com and tell me what news are there”. Below that, the interface shows an “API Request” section, then a message from Zoo: “No skills apply to this request — it's a simple web content fetch task.” A second section says “Zoo wants to fetch web content” and displays the URL https://www.bbc.com. Lower down, a green “Task Completed” status appears with returned content headed “Here are the current top news stories from BBC.com (June 27, 2026):”. Visible headlines include “Venezuela Earthquake Death Toll Tops 1,400”, “US Strikes Iran After Attack on Cargo Ship”, “Burkina Faso Severs Diplomatic Ties with France”, “Meloni and Trump: A Very Public Fall-Out”, and “Europe's Deadly Heatwave Breaks German Record”.

I’ll review the PR now.

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/shared/__tests__/modes.spec.ts (1)

607-617: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Add the matching code-mode assertion for web.

The new contract is that the web group is available in both debug and code, but this spec only pins debug. If code drops web, this layer will still stay green.

As per coding guidelines, "Use package-local unit tests for pure logic, parsing, state transitions, validation, serialization, request construction, retry decisions, and error handling."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/shared/__tests__/modes.spec.ts` around lines 607 - 617, The debug mode
spec only asserts that the `web` group exists on `debug`, so it can miss
regressions in `code`. Update `modes.spec.ts` by adding a matching assertion for
the `code` mode using the same `modes.find(...)`/`toMatchObject(...)` pattern,
and verify its `groups` also include `web` alongside the existing `debug`
checks.

Source: Coding guidelines

🧹 Nitpick comments (1)
src/shared/tools.ts (1)

119-119: 🗄️ Data Integrity & Integration | 🔵 Trivial | ⚡ Quick win

Align prompt nullability with the native args contract.

FetchWebContentTool.execute and the new tool schema both accept prompt: null, but NativeToolArgs.fetch_web_content narrows it to string | undefined. The parser currently casts through that mismatch, so future callers lose type protection here. Prefer prompt?: string | null if null remains part of the wire format.
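
A minimal sketch of the aligned shape, assuming simplified stand-ins for the real types. `WireArgs`, `AlignedArgs`, `fromWire`, and `normalizePrompt` are hypothetical names for illustration, not identifiers from this PR:

```typescript
// Hypothetical stand-ins; the real NativeToolArgs entry lives in src/shared/tools.ts.
// WireArgs models what a strict tool schema can send: prompt is present but may be null.
type WireArgs = { url: string; prompt: string | null }

// Suggested shape: optional and nullable, so wire input is accepted without a cast.
// (The narrower `prompt?: string` would reject WireArgs at compile time.)
type AlignedArgs = { url: string; prompt?: string | null }

// No type assertion is needed here because WireArgs is assignable to AlignedArgs.
function fromWire(args: WireArgs): AlignedArgs {
    return args
}

// Callers that want a plain optional string can normalize explicitly.
function normalizePrompt(args: AlignedArgs): string | undefined {
    return args.prompt ?? undefined
}
```

With this shape the parser can pass the wire value through unchanged, and the place where `null` collapses to `undefined` becomes a visible, typed step.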

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/shared/tools.ts` at line 119, The `NativeToolArgs.fetch_web_content` type
is narrower than the actual `FetchWebContentTool.execute` and tool schema
contract because it excludes `null` for `prompt`. Update the `fetch_web_content`
entry in `NativeToolArgs` to allow `prompt?: string | null`, and make sure the
parser and any related type references in `FetchWebContentTool` continue to
align with that wire format so callers keep proper type safety.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/core/assistant-message/__tests__/NativeToolCallParser.spec.ts`:
- Around line 376-389: The NativeToolCallParser streaming tests only cover a
complete JSON payload in one call, so they miss accumulator and partial-JSON
behavior in NativeToolCallParser.processStreamingChunk. Update the
fetch_web_content and related streaming specs in NativeToolCallParser.spec.ts to
split url and prompt across multiple fragmented chunks, asserting intermediate
partial results and final assembled nativeArgs so the accumulator logic in
startStreamingToolCall/processStreamingChunk is exercised. Use the existing
NativeToolCallParser and processStreamingChunk test cases as the place to add
these multi-chunk assertions.

In `@src/core/tools/FetchWebContentTool.ts`:
- Around line 175-215: The FetchWebContentTool validation currently only blocks
non-http(s) schemes, so it still allows SSRF targets like localhost,
private/link-local IPs, and metadata endpoints; add host-target filtering in
FetchWebContentTool before the fetch call and in the redirect path. Use the
existing url/parsedUrl handling in FetchWebContentTool to resolve the hostname,
reject internal or loopback destinations, and re-check each redirect destination
before following it. Also update src/core/auto-approval/tools.ts so
fetchWebContent is not treated as read-only auto-approval until these
network-safety checks are in place.

In `@webview-ui/src/components/chat/__tests__/ChatRow.fetch-web-content.spec.tsx`:
- Around line 60-136: Add a Vitest case in ChatRow.fetch-web-content.spec.tsx to
cover the completed fetch state handled by ChatRow’s fetchWebContent rendering
path. Right now the suite only exercises type: "ask" with ask: "tool", so the
didFetch branch for type: "say" and say: "tool" remains untested; add a local
render assertion that the completed web content UI still shows the expected
URL/message. Use the existing ChatRow test helpers and keep the coverage under
webview-ui/src/components/chat/__tests__ rather than adding an e2e test.

In `@webview-ui/src/components/chat/ChatRow.tsx`:
- Around line 1019-1027: The `didFetch` branch in `ChatRow` is unreachable
because `fetchWebContent` is handled only in the `message.ask === "tool"` path,
while completed fetch events are routed through the later `message.type ===
"say"` switch and never rendered by `case "tool"`. Update the message routing so
`fetchWebContent` is handled in the `message.type === "say"` tool-rendering flow
as well, or otherwise make the `fetchWebContent` case reachable for completed
tool fetches. Use the existing `case "fetchWebContent"` and `case "tool"`
branches in `ChatRow` to keep the want-to-fetch/did-fetch states aligned.

---

Outside diff comments:
In `@src/shared/__tests__/modes.spec.ts`:
- Around line 607-617: The debug mode spec only asserts that the `web` group
exists on `debug`, so it can miss regressions in `code`. Update `modes.spec.ts`
by adding a matching assertion for the `code` mode using the same
`modes.find(...)`/`toMatchObject(...)` pattern, and verify its `groups` also
include `web` alongside the existing `debug` checks.

---

Nitpick comments:
In `@src/shared/tools.ts`:
- Line 119: The `NativeToolArgs.fetch_web_content` type is narrower than the
actual `FetchWebContentTool.execute` and tool schema contract because it
excludes `null` for `prompt`. Update the `fetch_web_content` entry in
`NativeToolArgs` to allow `prompt?: string | null`, and make sure the parser and
any related type references in `FetchWebContentTool` continue to align with that
wire format so callers keep proper type safety.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: a036f0c0-1591-4915-a7c7-f740d0b9ce47

📥 Commits

Reviewing files that changed from the base of the PR and between 83fc6bb and ba0ca59.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (53)
  • packages/types/src/mode.ts
  • packages/types/src/tool.ts
  • packages/types/src/vscode-extension-host.ts
  • schemas/roomodes.json
  • src/core/assistant-message/NativeToolCallParser.ts
  • src/core/assistant-message/__tests__/NativeToolCallParser.spec.ts
  • src/core/assistant-message/presentAssistantMessage.ts
  • src/core/auto-approval/tools.ts
  • src/core/prompts/tools/native-tools/fetch_web_content.ts
  • src/core/prompts/tools/native-tools/index.ts
  • src/core/tools/FetchWebContentTool.ts
  • src/core/tools/__tests__/fetchWebContentTool.spec.ts
  • src/package.json
  • src/shared/__tests__/modes.spec.ts
  • src/shared/tools.ts
  • webview-ui/src/components/chat/ChatRow.tsx
  • webview-ui/src/components/chat/__tests__/ChatRow.fetch-web-content.spec.tsx
  • webview-ui/src/i18n/locales/ca/chat.json
  • webview-ui/src/i18n/locales/ca/prompts.json
  • webview-ui/src/i18n/locales/de/chat.json
  • webview-ui/src/i18n/locales/de/prompts.json
  • webview-ui/src/i18n/locales/en/chat.json
  • webview-ui/src/i18n/locales/en/prompts.json
  • webview-ui/src/i18n/locales/es/chat.json
  • webview-ui/src/i18n/locales/es/prompts.json
  • webview-ui/src/i18n/locales/fr/chat.json
  • webview-ui/src/i18n/locales/fr/prompts.json
  • webview-ui/src/i18n/locales/hi/chat.json
  • webview-ui/src/i18n/locales/hi/prompts.json
  • webview-ui/src/i18n/locales/id/chat.json
  • webview-ui/src/i18n/locales/id/prompts.json
  • webview-ui/src/i18n/locales/it/chat.json
  • webview-ui/src/i18n/locales/it/prompts.json
  • webview-ui/src/i18n/locales/ja/chat.json
  • webview-ui/src/i18n/locales/ja/prompts.json
  • webview-ui/src/i18n/locales/ko/chat.json
  • webview-ui/src/i18n/locales/ko/prompts.json
  • webview-ui/src/i18n/locales/nl/chat.json
  • webview-ui/src/i18n/locales/nl/prompts.json
  • webview-ui/src/i18n/locales/pl/chat.json
  • webview-ui/src/i18n/locales/pl/prompts.json
  • webview-ui/src/i18n/locales/pt-BR/chat.json
  • webview-ui/src/i18n/locales/pt-BR/prompts.json
  • webview-ui/src/i18n/locales/ru/chat.json
  • webview-ui/src/i18n/locales/ru/prompts.json
  • webview-ui/src/i18n/locales/tr/chat.json
  • webview-ui/src/i18n/locales/tr/prompts.json
  • webview-ui/src/i18n/locales/vi/chat.json
  • webview-ui/src/i18n/locales/vi/prompts.json
  • webview-ui/src/i18n/locales/zh-CN/chat.json
  • webview-ui/src/i18n/locales/zh-CN/prompts.json
  • webview-ui/src/i18n/locales/zh-TW/chat.json
  • webview-ui/src/i18n/locales/zh-TW/prompts.json

Comment on lines +376 to +389
describe("fetch_web_content tool", () => {
    it("should emit a partial ToolUse with nativeArgs.url during streaming", () => {
        const id = "toolu_streaming_fetch_1"
        NativeToolCallParser.startStreamingToolCall(id, "fetch_web_content")

        const fullArgs = JSON.stringify({ url: "https://example.com", prompt: "Find info" })
        const result = NativeToolCallParser.processStreamingChunk(id, fullArgs)

        expect(result).not.toBeNull()
        expect(result?.nativeArgs).toBeDefined()
        const nativeArgs = result?.nativeArgs as { url: string; prompt?: string }
        expect(nativeArgs.url).toBe("https://example.com")
        expect(nativeArgs.prompt).toBe("Find info")
    })

Contributor

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Exercise fragmented streaming chunks here.

These cases only pass a complete JSON payload in one processStreamingChunk() call, so they never hit the accumulator/partial-JSON behavior in src/core/assistant-message/NativeToolCallParser.ts:250-288. A bug that only appears when url/prompt arrive across multiple chunks would still pass this suite.

As per coding guidelines, "Use package-local unit tests for pure logic, parsing, state transitions, validation, serialization, request construction, retry decisions, and error handling."

Also applies to: 423-468

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/core/assistant-message/__tests__/NativeToolCallParser.spec.ts` around
lines 376 - 389, The NativeToolCallParser streaming tests only cover a complete
JSON payload in one call, so they miss accumulator and partial-JSON behavior in
NativeToolCallParser.processStreamingChunk. Update the fetch_web_content and
related streaming specs in NativeToolCallParser.spec.ts to split url and prompt
across multiple fragmented chunks, asserting intermediate partial results and
final assembled nativeArgs so the accumulator logic in
startStreamingToolCall/processStreamingChunk is exercised. Use the existing
NativeToolCallParser and processStreamingChunk test cases as the place to add
these multi-chunk assertions.

Source: Coding guidelines
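
To illustrate what a fragmented-chunk test exercises, here is a sketch built on a toy accumulator. `ChunkAccumulator`, `ParsedArgs`, and `runFragmentedExample` are hypothetical and far simpler than the real NativeToolCallParser (which may also surface partial values before the JSON is complete); the point is only that the payload is split mid-value across several calls:

```typescript
type ParsedArgs = { url?: string; prompt?: string }

// Toy stand-in for a streaming parser: buffers fragments until they form valid JSON.
class ChunkAccumulator {
    private buffer = ""

    // Append a fragment; return parsed args once the buffer parses, otherwise null.
    push(fragment: string): ParsedArgs | null {
        this.buffer += fragment
        try {
            return JSON.parse(this.buffer) as ParsedArgs
        } catch {
            return null // buffer is still an incomplete JSON document
        }
    }
}

// Feed one payload in three pieces, cutting inside the url value and inside a key.
function runFragmentedExample(): Array<ParsedArgs | null> {
    const full = JSON.stringify({ url: "https://example.com", prompt: "Find info" })
    const acc = new ChunkAccumulator()
    return [acc.push(full.slice(0, 12)), acc.push(full.slice(12, 30)), acc.push(full.slice(30))]
}
```

A real spec would make the same three calls against `processStreamingChunk` and assert on whatever intermediate results the parser is expected to emit.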

Comment on lines +175 to +215
// Only allow http and https protocols
if (!["http:", "https:"].includes(parsedUrl.protocol)) {
    task.consecutiveMistakeCount++
    task.recordToolError("fetch_web_content")
    task.didToolFailInCurrentTurn = true
    pushToolResult(
        formatResponse.toolError(`Invalid protocol: ${parsedUrl.protocol}. Only http and https are supported.`),
    )
    return
}

task.consecutiveMistakeCount = 0

// Build the approval message
const sharedMessageProps: ClineSayTool = {
    tool: "fetchWebContent",
    url: url,
}

const completeMessage = JSON.stringify(sharedMessageProps satisfies ClineSayTool)
const didApprove = await askApproval("tool", completeMessage)

if (!didApprove) {
    return
}

// Execute the fetch
const controller = new AbortController()
const timeout = setTimeout(() => controller.abort(), DEFAULT_TIMEOUT_MS)

try {
    const response = await fetch(url, {
        method: "GET",
        headers: {
            "User-Agent": "Mozilla/5.0 (compatible; ZooCode/1.0.0)",
            Accept: "text/html,application/xhtml+xml,application/xml;q=0.9,text/plain;q=0.8,*/*;q=0.7",
            "Accept-Language": "en-US,en;q=0.9",
        },
        redirect: "follow",
        signal: controller.signal,
    })

Contributor

🔒 Security & Privacy | 🟠 Major | 🏗️ Heavy lift

Block private-network and metadata targets before fetching.

The current validation only checks scheme, so this still permits SSRF to localhost, RFC1918/private ranges, link-local addresses, and cloud metadata endpoints, and then returns that response body back into the chat. That is especially risky here because src/core/auto-approval/tools.ts now treats fetchWebContent as read-only, so auto-approve can bypass the manual consent gate entirely. Deny internal/loopback/link-local targets, resolve and re-check every redirect destination, and keep this tool out of read-only auto-approval until those checks exist.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/core/tools/FetchWebContentTool.ts` around lines 175 - 215, The
FetchWebContentTool validation currently only blocks non-http(s) schemes, so it
still allows SSRF targets like localhost, private/link-local IPs, and metadata
endpoints; add host-target filtering in FetchWebContentTool before the fetch
call and in the redirect path. Use the existing url/parsedUrl handling in
FetchWebContentTool to resolve the hostname, reject internal or loopback
destinations, and re-check each redirect destination before following it. Also
update src/core/auto-approval/tools.ts so fetchWebContent is not treated as
read-only auto-approval until these network-safety checks are in place.
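
As a rough sketch of the host filtering described above, the helper below rejects obviously internal targets by inspecting the literal hostname. `isBlockedHost` is a hypothetical name. The sketch does not resolve DNS and does not handle IPv4-mapped IPv6 literals, so it would only be a first layer, not a complete SSRF defense:

```typescript
// Hypothetical helper: true when the hostname is an obviously internal target.
// It only looks at the literal hostname; a fuller check would also resolve DNS
// and validate every resolved address (and every redirect hop).
function isBlockedHost(hostname: string): boolean {
    // URL.hostname keeps brackets around IPv6 literals, so strip them first.
    const host = hostname.toLowerCase().replace(/^\[|\]$/g, "")
    if (host === "localhost" || host.endsWith(".localhost")) return true

    if (host.includes(":")) {
        // IPv6 literal: loopback, unspecified, link-local (fe80::/10), unique-local (fc00::/7).
        return host === "::1" || host === "::" || /^fe[89ab]/.test(host) || /^f[cd]/.test(host)
    }

    const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host)
    if (!m) return false // a DNS name; resolution is out of scope for this sketch
    const a = Number(m[1])
    const b = Number(m[2])
    return (
        a === 0 ||
        a === 10 ||
        a === 127 ||
        (a === 169 && b === 254) || // link-local, includes the 169.254.169.254 metadata address
        (a === 172 && b >= 16 && b <= 31) ||
        (a === 192 && b === 168)
    )
}
```

It could be called as `isBlockedHost(parsedUrl.hostname)` right after the protocol check, before the approval prompt is shown.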

Comment on lines +60 to +136
describe("ChatRow - fetchWebContent tool", () => {
    beforeEach(() => {
        vi.clearAllMocks()
    })

    it("should display fetchWebContent ask message with URL", () => {
        const message: any = {
            type: "ask",
            ask: "tool",
            ts: Date.now(),
            text: JSON.stringify({
                tool: "fetchWebContent",
                url: "https://example.com",
            }),
            partial: false,
        }

        renderChatRowWithProviders(message)

        expect(screen.getByText("Zoo wants to fetch web content")).toBeInTheDocument()
        expect(screen.getByText("https://example.com")).toBeInTheDocument()
    })

    it("should display the Globe icon for fetchWebContent", () => {
        const message: any = {
            type: "ask",
            ask: "tool",
            ts: Date.now(),
            text: JSON.stringify({
                tool: "fetchWebContent",
                url: "https://docs.example.com/api",
            }),
            partial: false,
        }

        renderChatRowWithProviders(message)

        expect(screen.getByLabelText("Web fetch icon")).toBeInTheDocument()
    })

    it("should display the URL in the tool use block", () => {
        const message: any = {
            type: "ask",
            ask: "tool",
            ts: Date.now(),
            text: JSON.stringify({
                tool: "fetchWebContent",
                url: "https://api.github.com/repos/owner/repo",
            }),
            partial: false,
        }

        renderChatRowWithProviders(message)

        expect(screen.getByText("https://api.github.com/repos/owner/repo")).toBeInTheDocument()
    })

    it("should not return null for fetchWebContent tool (regression test)", () => {
        const message: any = {
            type: "ask",
            ask: "tool",
            ts: Date.now(),
            text: JSON.stringify({
                tool: "fetchWebContent",
                url: "https://www.delfi.lt",
            }),
            partial: false,
        }

        const { container } = renderChatRowWithProviders(message)

        // The container should have rendered content (not null)
        expect(container.innerHTML).not.toBe("")
        expect(screen.getByText("Zoo wants to fetch web content")).toBeInTheDocument()
        expect(screen.getByText("https://www.delfi.lt")).toBeInTheDocument()
    })
})

Contributor

📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Add a say/tool coverage case for the completed fetch state.

Every test here uses ask/tool, so the didFetch path is untested. A local Vitest case for type: "say", say: "tool" would catch the current regression and matches the webview testing guidance. As per coding guidelines, "Prefer local webview-ui tests for React/webview behavior such as component rendering, local state, hooks, form dirty-state, validation, or prop wiring. Add or update Vitest coverage under webview-ui/src/**/__tests__ instead of reaching for apps/vscode-e2e."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@webview-ui/src/components/chat/__tests__/ChatRow.fetch-web-content.spec.tsx`
around lines 60 - 136, Add a Vitest case in ChatRow.fetch-web-content.spec.tsx
to cover the completed fetch state handled by ChatRow’s fetchWebContent
rendering path. Right now the suite only exercises type: "ask" with ask: "tool",
so the didFetch branch for type: "say" and say: "tool" remains untested; add a
local render assertion that the completed web content UI still shows the
expected URL/message. Use the existing ChatRow test helpers and keep the
coverage under webview-ui/src/components/chat/__tests__ rather than adding an
e2e test.

Source: Coding guidelines

Comment on lines +1019 to +1027
case "fetchWebContent":
    return (
        <>
            <div style={headerStyle}>
                <Globe className="w-4 shrink-0" aria-label="Web fetch icon" />
                <span style={{ fontWeight: "bold" }}>
                    {message.type === "ask"
                        ? t("chat:webFetch.wantsToFetch")
                        : t("chat:webFetch.didFetch")}

Contributor

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

didFetch is unreachable here.

This branch sits behind tool, but tool is only parsed for message.ask === "tool". say/tool fetch events still go through the later message.type === "say" switch, whose case "tool" does not render fetchWebContent, so completed fetches disappear from the timeline.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@webview-ui/src/components/chat/ChatRow.tsx` around lines 1019 - 1027, The
`didFetch` branch in `ChatRow` is unreachable because `fetchWebContent` is
handled only in the `message.ask === "tool"` path, while completed fetch events
are routed through the later `message.type === "say"` switch and never rendered
by `case "tool"`. Update the message routing so `fetchWebContent` is handled in
the `message.type === "say"` tool-rendering flow as well, or otherwise make the
`fetchWebContent` case reachable for completed tool fetches. Use the existing
`case "fetchWebContent"` and `case "tool"` branches in `ChatRow` to keep the
want-to-fetch/did-fetch states aligned.

@edelauna edelauna left a comment

Contributor

While reviewing this I noticed a couple of implementation and security details we should consider, but I also just realized: how is this feature any different from asking the agent to just curl a website?

I wonder if this feature should instead be more standalone, a kind of structured web parser.

Based on this article: https://mikhail.io/2025/10/claude-code-web-tools/ it seems that Claude fetches a page, converts it to markdown, and then has a different model parse the contents. I wonder if this feature should work like the code embeddings feature, where a user can specify a provider and model that perform structured parsing of a web fetch?

"searchFiles",
"codebaseSearch",
"runSlashCommand",
"fetchWebContent",

Contributor

Does grouping an outbound network fetch with file reads make sense here? With alwaysAllowReadOnly on, every fetch_web_content call auto-approves with no user prompt. I think network calls warrant their own opt-in flag, or a setting to always require explicit approval?
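
One possible shape for a separate opt-in, as a sketch. `alwaysAllowWebFetch`, `ApprovalSettings`, and `isAutoApproved` are assumed names for illustration and do not exist in the codebase as far as this thread shows:

```typescript
// Hypothetical settings shape; `alwaysAllowWebFetch` is an assumed flag name.
interface ApprovalSettings {
    alwaysAllowReadOnly: boolean
    alwaysAllowWebFetch: boolean
}

// Local, read-only tools (taken from the list in the diff, minus fetchWebContent).
const READ_ONLY_TOOLS = new Set(["searchFiles", "codebaseSearch", "runSlashCommand"])

// Network fetches are gated by their own flag instead of riding on alwaysAllowReadOnly.
function isAutoApproved(tool: string, settings: ApprovalSettings): boolean {
    if (tool === "fetchWebContent") return settings.alwaysAllowWebFetch
    return READ_ONLY_TOOLS.has(tool) && settings.alwaysAllowReadOnly
}
```

With this split, turning on read-only auto-approval leaves web fetches behind the manual prompt unless the user opts in separately.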

signal: controller.signal,
})

clearTimeout(timeout)

Contributor

The timeout is cancelled here as soon as headers arrive, so the body-read loop below runs with no deadline. A server that sends headers quickly but streams the body slowly could hold the connection open indefinitely — the 30s limit only covers time-to-first-byte. Should the timer stay active through the full read?
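
A sketch of a deadline that covers the whole body read. `ChunkReader` and `readWithDeadline` are hypothetical; the reader interface is shaped like the object returned by `response.body.getReader()` so the idea can be exercised without a network. A simpler alternative in the real tool might be to move `clearTimeout(timeout)` into a `finally` that runs after the read loop, since aborting the signal passed to `fetch` also aborts the body stream:

```typescript
// Minimal reader shape, modeled on ReadableStreamDefaultReader<Uint8Array>.
interface ChunkReader {
    read(): Promise<{ done: boolean; value?: Uint8Array }>
    cancel(): Promise<void>
}

// Read all chunks under a single deadline that stays armed until the body ends.
async function readWithDeadline(reader: ChunkReader, deadlineMs: number): Promise<Uint8Array[]> {
    const chunks: Uint8Array[] = []
    let timer: ReturnType<typeof setTimeout> | undefined
    const deadline = new Promise<never>((_, reject) => {
        timer = setTimeout(() => reject(new Error("body read timed out")), deadlineMs)
    })
    try {
        for (;;) {
            // Every chunk read races against the same deadline.
            const { done, value } = await Promise.race([reader.read(), deadline])
            if (done) return chunks
            if (value) chunks.push(value)
        }
    } catch (err) {
        // Release the connection on timeout or read failure, then rethrow.
        await reader.cancel().catch(() => undefined)
        throw err
    } finally {
        clearTimeout(timer) // cleared only once the read has finished or failed
    }
}
```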

Accept: "text/html,application/xhtml+xml,application/xml;q=0.9,text/plain;q=0.8,*/*;q=0.7",
"Accept-Language": "en-US,en;q=0.9",
},
redirect: "follow",

Contributor

With redirect: "follow", a server can redirect to a private IP or cloud metadata endpoint (169.254.169.254, localhost, etc.) and the redirect destination is never validated — only the original URL was checked. Would limiting redirects to the same host work here?

Suggested change
- redirect: "follow",
+ redirect: "manual",

(Then check `response.status >= 300 && response.status < 400` and only follow if `new URL(response.headers.get('location')!, url).origin === new URL(url).origin`)
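
A sketch of the manual redirect loop this suggestion implies. `fetchSameOrigin`, `Fetcher`, and `RedirectResponse` are hypothetical names, and the fetch call is injected so the logic can be exercised without a network:

```typescript
// Minimal response shape so the sketch does not depend on a real network call.
interface RedirectResponse {
    status: number
    headers: { get(name: string): string | null }
}
type Fetcher = (url: string) => Promise<RedirectResponse>

// Follow redirects manually, refusing any hop that leaves the original origin.
async function fetchSameOrigin(url: string, fetcher: Fetcher, maxHops = 5): Promise<RedirectResponse> {
    const startOrigin = new URL(url).origin
    let current = url
    for (let hop = 0; hop <= maxHops; hop++) {
        const response = await fetcher(current)
        if (response.status < 300 || response.status >= 400) return response
        const target = response.headers.get("location")
        if (!target) return response // 3xx without a Location header: nothing to follow
        const next = new URL(target, current) // resolves relative Location values
        if (next.origin !== startOrigin) {
            throw new Error(`Cross-origin redirect blocked: ${next.origin}`)
        }
        current = next.href
    }
    throw new Error("Too many redirects")
}
```

In the tool this might be wired as `fetchSameOrigin(url, (u) => fetch(u, { redirect: "manual", signal: controller.signal }))`. One trade-off to weigh: a strict same-origin rule also blocks common benign redirects such as http to https or a bare domain to its www host, because those are different origins.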

content = text
}
} else {
content = text

Contributor

If the server responds with image/jpeg, application/pdf, or another binary type (the Accept header allows */*), this falls through and returns the raw decoded bytes to the model as text. Should binary MIME types be rejected before attempting to decode?
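
A possible allow-list check, as a sketch. `isTextualContentType` is a hypothetical helper and the set of accepted types is an assumption for illustration, not something taken from the PR:

```typescript
// Hypothetical allow-list: accept only MIME types that decode sensibly as text.
function isTextualContentType(contentType: string): boolean {
    // Drop parameters such as "; charset=utf-8" before comparing.
    const mime = (contentType.split(";")[0] ?? "").trim().toLowerCase()
    if (mime === "") return false // unknown type: treated as non-text in this sketch
    return (
        mime.startsWith("text/") ||
        mime === "application/json" ||
        mime === "application/xml" ||
        mime === "application/xhtml+xml" ||
        mime.endsWith("+json") ||
        mime.endsWith("+xml")
    )
}
```

The tool could run this on the Content-Type header right after the `response.ok` check and return a tool error for anything that fails, before reading or decoding the body. Whether a missing Content-Type should be rejected or treated as text is a judgment call; this sketch rejects it.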

`Size: ${totalSize} bytes`,
``,
`--- Content ---`,
content.slice(0, MAX_CONTENT_CHARS),

Contributor

Web content lands directly in the model's context with no trust boundary marker. If a page contains visible text like "Ignore all previous instructions...", it passes through as-is. Would wrapping this in something like <untrusted_web_content source="..."> (and noting in the tool description that content inside this tag is third-party data, not instructions) help?
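
A sketch of such a wrapper. `wrapUntrusted` is a hypothetical helper; the tag name follows the suggestion above. A delimiter like this is a mitigation rather than a guarantee, since the model still has to honor it:

```typescript
// Wrap fetched text in a delimiter that marks it as third-party data.
function wrapUntrusted(source: string, content: string): string {
    // Keep the attribute well-formed if the URL contains a double quote.
    const safeSource = source.replace(/"/g, "%22")
    // Neutralize closing tags embedded in the page so they cannot end the block early.
    const safeContent = content.replace(/<\/untrusted_web_content/gi, "<\\/untrusted_web_content")
    return `<untrusted_web_content source="${safeSource}">\n${safeContent}\n</untrusted_web_content>`
}
```

The tool description would then need a matching sentence telling the model that anything inside the tag is data to analyze, not instructions to follow.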

}

if (prompt) {
outputLines.push(``, `--- Analysis Request ---`, `Prompt: ${prompt}`)

Contributor

The user's analysis prompt arrives after up to 50k chars of untrusted content. A malicious page could also embed --- Analysis Request --- in its visible text before the real one appears. Would putting the prompt instruction before the content block help with both ordering and injection resistance?
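
A sketch of the reordered output. `buildResult` is a hypothetical stand-in for the tool's result formatting and the section labels are illustrative:

```typescript
// Put the analysis request ahead of the fetched text so page content cannot pre-empt it.
function buildResult(url: string, content: string, prompt?: string | null): string {
    const lines: string[] = [`URL: ${url}`]
    if (prompt) {
        lines.push("", "--- Analysis Request ---", `Prompt: ${prompt}`)
    }
    // Everything after this marker is untrusted page text.
    lines.push("", "--- Content (third-party data, not instructions) ---", content)
    return lines.join("\n")
}
```

Ordering alone does not stop a page from imitating the `--- Analysis Request ---` marker further down, so it probably works best combined with an explicit delimiter around the content.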

clearTimeout(timeout)

if (!response.ok) {
pushToolResult(formatResponse.toolError(`HTTP ${response.status}: ${response.statusText}`))

Contributor

Other tools (e.g. GenerateImageTool) call task.say("error", ...) before pushToolResult so the error surfaces as a visible bubble in chat — without it, only the model sees this. Same gap exists at the size-limit path a bit further down.
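
The pattern described above might look roughly like this. `TaskLike`, `PushToolResult`, and `reportFetchError` are hypothetical minimal shapes; the real Task type is richer, and the real code would format the result with `formatResponse.toolError` rather than the plain string used here:

```typescript
// Hypothetical minimal shapes for illustration only.
interface TaskLike {
    say(type: "error", text: string): Promise<void>
}
type PushToolResult = (result: string) => void

// Surface the failure to the user first, then hand the same message to the model.
async function reportFetchError(task: TaskLike, pushToolResult: PushToolResult, message: string): Promise<void> {
    await task.say("error", message) // visible error bubble in chat
    pushToolResult(`Error: ${message}`) // tool result that only the model sees
}
```

Routing both the HTTP-status path and the size-limit path through one helper like this would keep the two error sites consistent.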

// Process content based on type
let content: string
if (contentType.includes("text/html") || contentType.includes("application/xhtml")) {
content = htmlToText(text)

Contributor

cheerio.load() is synchronous. For responses near the 5 MB limit, parsing can block the extension host event loop for hundreds of milliseconds, freezing IntelliSense and file watchers. Would slicing the HTML to a smaller cap before parsing (e.g. 500 KB) be a reasonable trade-off, since MAX_CONTENT_CHARS will truncate the output anyway?
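
A sketch of capping the input before parsing. `capHtmlForParsing` and `MAX_HTML_PARSE_CHARS` are hypothetical, and the cap counts characters rather than bytes, which only approximates the 500 KB figure:

```typescript
// Assumed cap, roughly the 500 KB suggested above (measured in characters here).
const MAX_HTML_PARSE_CHARS = 500000

// Truncate HTML before handing it to a synchronous parser such as cheerio.load().
function capHtmlForParsing(
    html: string,
    maxChars: number = MAX_HTML_PARSE_CHARS,
): { html: string; truncated: boolean } {
    if (html.length <= maxChars) return { html, truncated: false }
    // Cut at the last ">" inside the window so a tag is less likely to be split in half.
    const cut = html.lastIndexOf(">", maxChars - 1)
    return { html: html.slice(0, cut >= 0 ? cut + 1 : maxChars), truncated: true }
}
```

The `truncated` flag lets the tool note in its result that the page was cut before parsing, so the model does not assume it saw the whole document.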

@github-actions github-actions Bot added the awaiting-author label Jun 29, 2026

Labels

awaiting-author: PR is waiting for the author to address requested changes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fetch_web_content tool

2 participants