
Add live web-search status streaming for chat completions #84

Open
adambalogh wants to merge 2 commits into main from claude/zealous-hypatia-CKC4E

@adambalogh
Contributor

Summary

Adds real-time "searching the web" status indicators to streaming chat completions when providers (Anthropic, OpenAI) perform native web searches mid-stream. Clients can now display live UI feedback while the model browses the web, improving perceived responsiveness.

Changes

  • New function extract_web_search_events() in llm_backend.py:

    • Detects web-search blocks in streamed message chunks (both Anthropic server_tool_use and OpenAI web_search_call formats)
    • Extracts search queries when available (from input or action fields)
    • Deduplicates blocks across chunks using stable IDs/indices to emit one event per actual search
    • Returns lightweight event dicts for UI consumption
  • Helper function _web_search_query_from_block() in llm_backend.py:

    • Best-effort extraction of search query from web-search content blocks
    • Handles both Anthropic and OpenAI block structures
    • Returns None during streaming when query is not yet fully accumulated
  • Web-search status frame emission in chat_controller.py:

    • Tracks seen web-search blocks per stream session to avoid duplicate status frames
    • Emits lightweight SSE frames with "web_search": {"status": "searching", "query": <str|null>} when new searches are detected
    • Status frames carry no content delta and do not affect response signing or billing
  • Comprehensive test coverage in test_web_search.py:

    • TestExtractWebSearchEvents: 10 unit tests covering None/plain-text inputs, both provider formats, query extraction, deduplication, and edge cases
    • TestChatControllerWebSearchStreaming: 2 integration tests verifying status frames are emitted exactly once per search and not emitted when web search is disabled
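As an illustration of the detection and deduplication described above, here is a minimal sketch, not the code in this PR. It assumes each chunk's content is a list of dict blocks, that Anthropic blocks look like `{"type": "server_tool_use", "name": "web_search", "id": ..., "input": {...}}`, and that OpenAI blocks look like `{"type": "web_search_call", "id": ..., "action": {...}}`. Passing the `seen` set in as a parameter and the exact return shape are also assumptions made for the example.

```python
from typing import Any, Optional

WEB_SEARCH_BLOCK_TYPES = {"server_tool_use", "web_search_call"}


def web_search_query_from_block(block: dict[str, Any]) -> Optional[str]:
    """Return the search query if the block already carries one, else None."""
    # Assumed layout: Anthropic keeps the query under "input", OpenAI under "action".
    for field in ("input", "action"):
        payload = block.get(field)
        if isinstance(payload, dict):
            query = payload.get("query")
            if isinstance(query, str) and query:
                return query
    return None  # query not yet accumulated in this chunk


def extract_web_search_events(
    content: Any, seen: set[tuple[str, Any]]
) -> list[dict[str, Any]]:
    """Return one event per web-search block that is not already in `seen`."""
    if not isinstance(content, list):
        return []  # None or plain-text content carries no tool blocks
    events: list[dict[str, Any]] = []
    for position, block in enumerate(content):
        if not isinstance(block, dict):
            continue
        block_type = block.get("type")
        if block_type not in WEB_SEARCH_BLOCK_TYPES:
            continue
        # Assumption: server_tool_use is shared by several server tools,
        # so only blocks named "web_search" count here.
        if block_type == "server_tool_use" and block.get("name") != "web_search":
            continue
        # Prefer a stable id, then an explicit index, then the list position.
        block_id = block.get("id")
        if block_id is None:
            block_id = block.get("index", position)
        key = (block_type, block_id)
        if key in seen:
            continue  # same block reappearing in a later chunk
        seen.add(key)
        events.append(
            {"status": "searching", "query": web_search_query_from_block(block)}
        )
    return events
```

In this sketch the caller owns `seen`, which keeps the function free of hidden state and lets the controller scope deduplication to a single stream session.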

Implementation Details

  • Status frames are UI-only signals independent of extract_web_search_count() (which counts completed searches for billing)
  • Deduplication uses (block_type, id_or_index) tuples to handle Anthropic's incremental JSON delta streaming where the same block reappears across chunks
  • Query extraction gracefully handles partial/incomplete blocks during streaming by returning None when query is not yet available
  • No changes to response signing, billing, or core LLM routing logic
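To make the separation between status frames and signed content concrete, the following is a rough sketch of a streaming loop under stated assumptions; it is not the code in `chat_controller.py`. The chunk shape (`"searches"` and `"text"` keys), the `"delta"` frame layout, the helper names, and the use of SHA-256 as a stand-in for the signed output hash are all hypothetical. Only the `"web_search"` frame shape comes from the description above.

```python
import hashlib
import json
from typing import Any, Iterable, Iterator


def sse_frame(payload: dict[str, Any]) -> str:
    """Serialize a payload as a single server-sent event frame."""
    return f"data: {json.dumps(payload)}\n\n"


def stream_with_search_status(
    chunks: Iterable[dict[str, Any]], output_hash: Any
) -> Iterator[str]:
    """Yield SSE frames, emitting one status frame per newly seen search."""
    seen: set[str] = set()  # per-stream session, so ids never leak across requests
    for chunk in chunks:
        # Hypothetical chunk shape: already-extracted searches under "searches".
        for search in chunk.get("searches", []):
            if search["id"] in seen:
                continue
            seen.add(search["id"])
            # Status frame: no content delta, so the hash is left untouched.
            yield sse_frame(
                {"web_search": {"status": "searching", "query": search.get("query")}}
            )
        text = chunk.get("text")
        if text:
            # Only real content feeds the hash that stands in for response signing.
            output_hash.update(text.encode("utf-8"))
            yield sse_frame({"delta": {"content": text}})
```

A short usage example: the same search id appears in two chunks, yet only one status frame is produced and the hash covers the text alone.

```python
digest = hashlib.sha256()
chunks = [
    {"text": "Let me check. "},
    {"searches": [{"id": "ws_1", "query": None}]},
    {"searches": [{"id": "ws_1", "query": "weather"}]},
    {"text": "Sunny."},
]
frames = list(stream_with_search_status(chunks, digest))
```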

https://claude.ai/code/session_01N5NjTA5rQhVGuLzNDmAs86

claude added 2 commits June 1, 2026 11:30
When a provider runs a native web search mid-stream, surface it to the
client so the UI can show a "searching the web" indicator instead of a
silent pause.

- llm_backend: add extract_web_search_events(), a per-chunk best-effort
  detector for web_search_call (OpenAI) / server_tool_use (Anthropic)
  blocks, deduped by block id/index across chunks, with best-effort query
  extraction
- chat_controller: in the streaming loop, emit a lightweight SSE frame
  ({"web_search": {"status": "searching", "query"?}}) on each newly-seen
  search. The frame carries no content delta, so it is excluded from the
  signed output hash and does not affect per-search billing
- tests: per-chunk detection/dedup/query extraction and streaming-path
  emission (one frame per deduped search; none when web_search is off)

https://claude.ai/code/session_01N5NjTA5rQhVGuLzNDmAs86
adambalogh marked this pull request as ready for review June 3, 2026 13:21

2 participants