fix: add max_retries param to ModuleLLM to bound retry attempts #275

Closed
PewterZz wants to merge 1 commit into mesa:main from PewterZz:fix/bounded-retry-max-retries

@PewterZz

Pre-PR Checklist

  • This PR is a bug fix, not a new feature or enhancement.

Summary

ModuleLLM.generate() and agenerate() retried transient errors (APIConnectionError, Timeout, RateLimitError) with exponential backoff but no upper bound, causing simulations to appear hung indefinitely on persistent API failures.

Bug / Issue

Closes #266.

The @retry decorator in module_llm.py used wait_exponential with no stop condition. A persistent Timeout or RateLimitError would retry forever, blocking the simulation with no indication of what was happening.

Implementation

  • Added max_retries: int = 5 parameter to ModuleLLM.__init__
  • Refactored generate() from a class-level @retry decorator to inline Retrying() so self.max_retries is accessible at call time
  • Added stop=stop_after_attempt(self.max_retries) to both generate() and agenerate()
  • The default of 5 requires no call-site changes; users who do not pass max_retries now get bounded retries instead of the previous unlimited ones

Testing

  • Updated 2 existing tests that used generate.__wrapped__ (which no longer exists since the decorator was removed); replaced with max_retries=1 to bypass retry
  • Added 1 new test test_generate_stops_after_max_retries verifying that generate() raises after exactly max_retries attempts
  • 269 tests pass (pytest tests/ --ignore=tests/test_integration)
  • black and ruff clean

Additional Notes

The agenerate() path already used inline AsyncRetrying() so the change there is a single line adding stop=stop_after_attempt(self.max_retries).

Previously generate() and agenerate() used unbounded exponential backoff
with no stop condition, causing simulations to appear hung indefinitely
on persistent API errors (issue mesa#266).

Add max_retries: int = 5 parameter to ModuleLLM.__init__. Refactor
generate() from class-level @retry decorator to inline Retrying() so
self.max_retries is accessible at call time. Update agenerate() to pass
stop_after_attempt(self.max_retries) to AsyncRetrying.

Update existing tests to use max_retries=1 (bypassing decorator
__wrapped__ which no longer exists). Add test verifying generate()
raises after exactly max_retries attempts.
@coderabbitai
Contributor

coderabbitai bot commented Mar 30, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

@PewterZz
Author

Closing in favour of #267 which was opened earlier and covers the same fix. Sorry for the duplicate.

@PewterZz PewterZz closed this Mar 30, 2026

Development

Successfully merging this pull request may close these issues.

Issues with Gemini quickstart flow (retry behavior, model errors, and tool usage