
ECA becomes unresponsive when tool output causes LLM context overflow #391

@snoopier

Description

When a tool (especially a shell command) produces very large output, ECA hits the LLM provider's token limit with:

LLM response status: 400 body: {"error":{"message":"prompt token count of 174837 exceeds the limit of 168000","code":"model_max_prompt_tokens_exceeded"}}

After this error, the ECA server and Emacs become unresponsive/frozen, requiring a restart.

Root Cause Analysis

The issue involves multiple contributing factors on both the server and client sides.

Server-side

  1. No proactive token budget check before API call: The server sends the full message history to the LLM provider without estimating token usage. Context overflow is only detected reactively (HTTP 400 response) and recovered via prune + auto-compact. While the recovery logic works, it doesn't prevent the initial expensive roundtrip and the cascade of effects on the client.

  2. Tool output truncation exists but may be insufficient: outputTruncation defaults to 2000 lines / 50 KB, but a single tool result at the maximum (50 KB ≈ 12,500 tokens) combined with a long conversation history can still exceed the context limit. The truncation also doesn't account for the cumulative token budget across all messages.

  3. Potential infinite recovery loop: If auto-compact succeeds but the subsequent resume prompt still triggers context overflow, auto-compacting? has already been reset to false, allowing another compact cycle. There is no counter or recovery-attempted? guard to prevent Compact → Resume → Overflow → Compact → … loops (analogous to the existing auto-continued? guard for truncated responses).
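The first and third server-side points could be addressed together. The following is a minimal sketch, not ECA's actual code: all names (`estimate_tokens`, `within_budget`, `MAX_COMPACT_ATTEMPTS`, the ~4 chars/token heuristic) are illustrative assumptions showing a proactive budget check plus a bounded compaction retry, analogous to the existing `auto-continued?` guard.

```python
# Hypothetical sketch: proactive token-budget check before the API call,
# plus a bounded retry counter so Compact -> Resume -> Overflow cannot loop forever.
# Limits and function names are illustrative, not ECA's real API.

MAX_PROMPT_TOKENS = 168_000   # provider limit from the error message above
MAX_COMPACT_ATTEMPTS = 2      # guard against infinite compact cycles

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text and code.
    return len(text) // 4

def within_budget(messages: list[str], reserve: int = 8_000) -> bool:
    # Keep headroom ("reserve") for the model's reply.
    total = sum(estimate_tokens(m) for m in messages)
    return total + reserve <= MAX_PROMPT_TOKENS

def send_with_recovery(messages, call_llm, compact):
    attempts = 0
    while True:
        if not within_budget(messages):
            if attempts >= MAX_COMPACT_ATTEMPTS:
                # Fail loudly instead of looping; the client stays responsive.
                raise RuntimeError("context overflow: compaction did not help")
            messages = compact(messages)
            attempts += 1
            continue
        return call_llm(messages)
```

The key difference from the current behavior is that the overflow is detected locally before the expensive HTTP roundtrip, and the compaction loop is bounded by a counter rather than relying on the single `auto-compacting?` flag.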

Client-side (Emacs)

  1. Synchronous JSON parsing of large messages: The process filter in eca-process.el parses incoming JSON messages synchronously with no size limit. Before the overflow error, the server sends toolCalled notifications containing the full tool output. Parsing a multi-MB JSON message blocks the Emacs main thread.

  2. font-lock-ensure refontifies entire buffer: After each text content insertion (eca-chat.el), font-lock-ensure runs over the entire chat buffer. In gfm-mode with markdown-fontify-code-blocks-natively, this becomes very expensive for large buffers.

  3. align-tables / beautify-tables scan entire buffer: On "finished" status, both functions are called with (point-min), scanning the entire chat buffer for markdown tables — expensive for long conversations.

  4. Expandable content has no size limit: Tool call outputs are stored and rendered in expandable overlays without any truncation. Large outputs bloat the buffer.

These client-side issues compound: large tool output → expensive JSON parse → expensive rendering → expensive fontification → expensive table alignment, all running synchronously on the Emacs main thread.
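Client-side fix 4 could reuse a cap like the server's `outputTruncation` defaults (2000 lines / 50 KB). A minimal sketch, with the same limits borrowed purely for illustration (`cap_tool_output` and the truncation marker are hypothetical, not part of `eca-chat.el`):

```python
# Hypothetical sketch: cap tool output before it is stored in an expandable
# overlay, so a single result cannot bloat the chat buffer.
MAX_LINES = 2000        # mirrors the server's outputTruncation line default
MAX_BYTES = 50 * 1024   # mirrors the server's outputTruncation size default

def cap_tool_output(text: str) -> str:
    # Truncate first by line count, then by byte size, and mark the cut
    # so the user knows the full output was elided.
    lines = text.splitlines()
    truncated = False
    if len(lines) > MAX_LINES:
        lines = lines[:MAX_LINES]
        truncated = True
    out = "\n".join(lines)
    data = out.encode("utf-8")
    if len(data) > MAX_BYTES:
        out = data[:MAX_BYTES].decode("utf-8", errors="ignore")
        truncated = True
    if truncated:
        out += "\n[output truncated]"
    return out
```

Capping at insertion time also shrinks the inputs to every downstream step in the chain above (JSON parse excepted), since fontification and table alignment then operate on a bounded buffer.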

Steps to Reproduce

  1. Start an ECA chat session with a model that has a moderate context window (e.g., 128K-168K tokens)
  2. Have a conversation long enough to consume a significant portion of the context
  3. Execute a tool/shell command that produces very large output (e.g., find / -name "*.log" or similar)
  4. The LLM provider returns HTTP 400 (token limit exceeded)
  5. Emacs becomes unresponsive
