Skip to content

Fail fast on oversized plain LM prompts and clarify llm_query semantics#143

Open
taivu1998 wants to merge 1 commit into
alexzhang13:mainfrom
taivu1998:codex/issue-42-context-window-guard
Open

Fail fast on oversized plain LM prompts and clarify llm_query semantics#143
taivu1998 wants to merge 1 commit into
alexzhang13:mainfrom
taivu1998:codex/issue-42-context-window-guard

Conversation

@taivu1998
Copy link
Copy Markdown

Summary

This PR adds a fail-fast context-window guard for plain LM calls and clarifies when models should use llm_query(...) versus rlm_query(...).

Closes #42.

Problem

Issue #42 reports context-window failures when large prompts are sent through sub-calls. After reviewing the code path, the main practical gap was not child RLM prompt inheritance, but plain llm_query(...) and leaf LM calls: they could send oversized prompts directly to provider SDKs with no preflight validation, which led to provider-specific failures and inconsistent error messages.

What Changed

  • Added ContextWindowExceededError for oversized prompt validation failures.
  • Added shared prompt-fit utilities in rlm.utils.token_utils:
    • estimate_text_tokens(...)
    • count_prompt_tokens(...)
    • validate_prompt_fits_context_window(...)
  • Enforced prompt-size validation before provider SDK calls in all built-in LM clients:
    • OpenAI
    • Anthropic
    • Gemini
    • Azure OpenAI
    • Portkey
  • Reused the existing LMHandler and REPL error propagation path so validation errors surface cleanly inside llm_query(...) / rlm_query(...) fallback behavior.
  • Re-exported ContextWindowExceededError from the top-level rlm package.
  • Updated system prompt and docs to clarify:
    • llm_query(...) is for prompts that already fit the target model's context window.
    • rlm_query(...) is the deeper/offloaded path where recursive subcalls are available.

Design Notes

  • This keeps the fix intentionally narrow and low-complexity.
  • There is no auto-chunking, auto-routing, or fallback summarization in this PR.
  • Existing batch semantics are preserved.
  • The validation is shared and provider-agnostic, while still using the repo's existing model limit table and token estimation helpers.

Tests

Added and updated tests for:

  • token estimation and validation helpers
  • client-side preflight validation before SDK calls
  • LMHandler propagation of context-window failures
  • LocalREPL / rlm_query(...) fallback error surfacing
  • max-depth plain LM fallback behavior in _subcall

Verification

UV_CACHE_DIR=/tmp/uv-cache uv run pytest -q

Result:

  • 292 passed, 7 skipped

@taivu1998
Copy link
Copy Markdown
Author

Hi @alexzhang13, could you help review it when you have time? Thanks!

1 similar comment
@taivu1998
Copy link
Copy Markdown
Author

Hi @alexzhang13, could you help review it when you have time? Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ContextWindowExceededError when sub-calls inherit large prompts in RLM

1 participant