Add live web-search status streaming for chat completions by adambalogh · Pull Request #84 · OpenGradient/tee-gateway

adambalogh · 2026-06-01T11:32:04Z

Summary

Adds real-time "searching the web" status indicators to streaming chat completions when providers (Anthropic, OpenAI) perform native web searches mid-stream. Clients can now display live UI feedback while the model browses the web, improving perceived responsiveness.

Changes

New function extract_web_search_events() in llm_backend.py:
- Detects web-search blocks in streamed message chunks (both Anthropic server_tool_use and OpenAI web_search_call formats)
- Extracts search queries when available (from input or action fields)
- Deduplicates blocks across chunks using stable IDs/indices to emit one event per actual search
- Returns lightweight event dicts for UI consumption
Helper function _web_search_query_from_block() in llm_backend.py:
- Best-effort extraction of search query from web-search content blocks
- Handles both Anthropic and OpenAI block structures
- Returns None during streaming when the query has not yet been fully accumulated
Web-search status frame emission in chat_controller.py:
- Tracks seen web-search blocks per stream session to avoid duplicate status frames
- Emits lightweight SSE frames with "web_search": {"status": "searching", "query": <str|null>} when new searches are detected
- Status frames carry no content delta and do not affect response signing or billing
Test coverage in test_web_search.py:
- TestExtractWebSearchEvents: 10 unit tests covering None/plain-text inputs, both provider formats, query extraction, deduplication, and edge cases
- TestChatControllerWebSearchStreaming: 2 integration tests verifying status frames are emitted exactly once per search and not emitted when web search is disabled

Implementation Details

- Status frames are UI-only signals, independent of extract_web_search_count() (which counts completed searches for billing)
- Deduplication uses (block_type, id_or_index) tuples to handle Anthropic's incremental JSON delta streaming, where the same block reappears across chunks
- Query extraction handles partial or incomplete blocks during streaming by returning None when the query is not yet available
- No changes to response signing, billing, or core LLM routing logic
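To illustrate why status frames leave signing untouched, here is a minimal sketch of a streaming loop that emits one status frame per newly seen search while feeding only content deltas into the output hash. The chunk shape (`blocks`/`delta` keys), the OpenAI-style `choices[].delta.content` frame, the `stream_chat` and `_new_searches` names, and SHA-256 as the hash are hypothetical stand-ins, not the gateway's actual controller or signing scheme.

```python
import hashlib
import json
from typing import Any, Iterable, Iterator


def _new_searches(blocks: list[dict[str, Any]], seen: set[tuple[str, Any]]) -> list[Any]:
    """Hypothetical stand-in for extract_web_search_events: queries of unseen blocks."""
    queries = []
    for index, block in enumerate(blocks):
        block_type = block.get("type")
        if block_type not in ("server_tool_use", "web_search_call"):
            continue
        block_id = block.get("id")
        key = (block_type, block_id if block_id is not None else index)
        if key in seen:
            continue
        seen.add(key)
        source = block.get("input") or block.get("action") or {}
        queries.append(source.get("query"))
    return queries


def _sse(payload: dict[str, Any]) -> str:
    """Format one server-sent event frame."""
    return f"data: {json.dumps(payload)}\n\n"


def stream_chat(
    chunks: Iterable[dict[str, Any]], web_search_enabled: bool, output_hash: Any
) -> Iterator[str]:
    """Yield SSE frames; only content deltas are fed into output_hash."""
    seen: set[tuple[str, Any]] = set()  # per-stream dedup state
    for chunk in chunks:
        if web_search_enabled:
            for query in _new_searches(chunk.get("blocks", []), seen):
                # Status frame: no content delta, so the signed hash is untouched.
                yield _sse({"web_search": {"status": "searching", "query": query}})
        delta = chunk.get("delta")
        if delta:
            output_hash.update(delta.encode("utf-8"))  # only content is hashed
            yield _sse({"choices": [{"delta": {"content": delta}}]})
```

Running the same chunks with web search on and off yields identical digests in this sketch; the only difference in the output is the extra `web_search` frame.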

https://claude.ai/code/session_01N5NjTA5rQhVGuLzNDmAs86

When a provider runs a native web search mid-stream, surface it to the client so the UI can show a "searching the web" indicator instead of a silent pause.

- llm_backend: add extract_web_search_events(), a best-effort per-chunk detector for web_search_call (OpenAI) / server_tool_use (Anthropic) blocks, deduplicated by block id/index across chunks, with best-effort query extraction
- chat_controller: in the streaming loop, emit a lightweight SSE frame ({"web_search": {"status": "searching", "query"?}}) for each newly seen search. The frame carries no content delta, so it is excluded from the signed output hash and does not affect per-search billing
- tests: per-chunk detection/dedup/query extraction and streaming-path emission (one frame per deduplicated search; none when web_search is off)

https://claude.ai/code/session_01N5NjTA5rQhVGuLzNDmAs86
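On the client side, a consumer only needs to tell status frames apart from content frames. The sketch below assumes OpenAI-style `data:` lines terminated by a `[DONE]` sentinel, and `render_stream` is a hypothetical helper rather than anything shipped with the gateway.

```python
import json
from typing import Iterable, Optional


def render_stream(lines: Iterable[str]) -> tuple[str, list[Optional[str]]]:
    """Split an SSE stream into assistant text and web-search status updates."""
    text_parts: list[str] = []
    searches: list[Optional[str]] = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank separators and non-data lines
        data = line[len("data: "):]
        if data.strip() == "[DONE]":  # assumed end-of-stream sentinel
            break
        frame = json.loads(data)
        status = frame.get("web_search")
        if status is not None:
            # A real UI would show a "searching the web" indicator here.
            searches.append(status.get("query"))
            continue
        for choice in frame.get("choices", []):
            text_parts.append(choice.get("delta", {}).get("content") or "")
    return "".join(text_parts), searches
```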


claude added 2 commits June 1, 2026 11:30

Merge remote-tracking branch 'origin/main' into claude/zealous-hypati…

d3b36e9


adambalogh marked this pull request as ready for review June 3, 2026 13:21

adambalogh wants to merge 2 commits into main from claude/zealous-hypatia-CKC4E

adambalogh commented Jun 1, 2026


2 participants
