ECA becomes unresponsive when tool output causes LLM context overflow #391
Description
When a tool (especially shell commands) produces very large output, ECA hits the LLM provider's token limit with:
```
LLM response status: 400 body: {"error":{"message":"prompt token count of 174837 exceeds the limit of 168000","code":"model_max_prompt_tokens_exceeded"}}
```
After this error, the ECA server and Emacs become unresponsive/frozen, requiring a restart.
Root Cause Analysis
The issue involves multiple contributing factors on both server and client side.
Server-side
- No proactive token budget check before API call: The server sends the full message history to the LLM provider without estimating token usage. Context overflow is only detected reactively (HTTP 400 response) and recovered via prune + auto-compact. While the recovery logic works, it doesn't prevent the initial expensive roundtrip or the cascade of effects on the client.
- Tool output truncation exists but may be insufficient: `outputTruncation` defaults to 2000 lines / 50 KB, but a single tool result at the maximum (50 KB ≈ 12,500 tokens) combined with a long conversation history can still exceed the context limit. The truncation also doesn't account for the cumulative token budget across all messages.
- Potential infinite recovery loop: If auto-compact succeeds but the subsequent resume prompt still triggers context overflow, `auto-compacting?` has already been reset to `false`, allowing another compact cycle. There is no counter or `recovery-attempted?` guard to prevent Compact → Resume → Overflow → Compact → … loops (analogous to the existing `auto-continued?` guard for truncated responses).
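The first and third points could be addressed together: estimate the token cost before the API call, and bound how many compaction cycles are allowed. A minimal sketch in Python (ECA itself is Clojure; `MAX_RECOVERY_ATTEMPTS`, `prepare_request`, and the chars/4 heuristic are illustrative assumptions, not the actual implementation):

```python
MODEL_TOKEN_LIMIT = 168_000      # limit from the provider error above
MAX_RECOVERY_ATTEMPTS = 3        # bounds the Compact -> Resume -> Overflow cycle

def estimate_tokens(messages):
    # Rough heuristic: ~4 characters per token for English text.
    return sum(len(m["content"]) for m in messages) // 4

def prepare_request(messages, compact):
    """Proactively shrink history before calling the provider instead of
    waiting for an HTTP 400. `compact` is a caller-supplied function that
    summarizes/prunes older messages and returns a shorter history."""
    attempts = 0
    while estimate_tokens(messages) > MODEL_TOKEN_LIMIT:
        if attempts >= MAX_RECOVERY_ATTEMPTS:
            # Give up loudly instead of looping forever.
            raise RuntimeError("context still over budget after compaction")
        messages = compact(messages)
        attempts += 1
    return messages
```

The attempt counter plays the same role as the existing `auto-continued?` guard: it converts a potentially infinite recovery loop into a bounded one with an explicit failure mode.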
Client-side (Emacs)
- Synchronous JSON parsing of large messages: The process filter in `eca-process.el` parses incoming JSON messages synchronously with no size limit. Before the overflow error, the server sends `toolCalled` notifications containing the full tool output. Parsing a multi-MB JSON message blocks the Emacs main thread.
- `font-lock-ensure` refontifies the entire buffer: After each text content insertion (`eca-chat.el`), `font-lock-ensure` runs over the entire chat buffer. In `gfm-mode` with `markdown-fontify-code-blocks-natively`, this becomes very expensive for large buffers.
- `align-tables`/`beautify-tables` scan the entire buffer: On `"finished"` status, both functions are called with `(point-min)`, scanning the entire chat buffer for markdown tables; this is expensive for long conversations.
- Expandable content has no size limit: Tool call outputs are stored and rendered in expandable overlays without any truncation. Large outputs bloat the buffer.
These client-side issues compound: large tool output → expensive JSON parse → expensive rendering → expensive fontification → expensive table alignment, all running synchronously on the Emacs main thread.
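Capping payload size before it reaches the parse/render pipeline would bound every stage of that cascade at once. A minimal sketch in Python (the actual client is Emacs Lisp; the function name is hypothetical, and the 2000-line / 50 KB defaults mirror the `outputTruncation` defaults mentioned above):

```python
def truncate_output(text: str, max_lines: int = 2000, max_bytes: int = 50_000) -> str:
    """Cap tool output by line count and byte size before rendering."""
    lines = text.splitlines()
    truncated = False
    if len(lines) > max_lines:
        lines = lines[:max_lines]
        truncated = True
    result = "\n".join(lines)
    if len(result.encode("utf-8")) > max_bytes:
        # Cut on the byte budget, then drop any split multibyte character.
        result = result.encode("utf-8")[:max_bytes].decode("utf-8", errors="ignore")
        truncated = True
    if truncated:
        result += "\n[... output truncated ...]"
    return result
```

Applying a cap like this at the client boundary keeps the JSON parse, fontification, and table-alignment passes proportional to a fixed budget rather than to the raw tool output.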
Steps to Reproduce
- Start an ECA chat session with a model that has a moderate context window (e.g., 128K-168K tokens)
- Have a conversation long enough to consume a significant portion of the context
- Execute a tool/shell command that produces very large output (e.g., `find / -name "*.log"` or similar)
- The LLM provider returns HTTP 400 (token limit exceeded)
- Emacs becomes unresponsive