
Improved Ollama integration with async + retry support #274

Open

Zzeelan7 wants to merge 5 commits into mesa:main from Zzeelan7:feature/ollama-optimization

Conversation

@Zzeelan7

Thanks for opening a PR! Please click the Preview tab and select a PR template:

Fixes #214 - Memory Decay & Salience Pruning in STLTMemory
Related to #178 - Expose Token Usage via LiteLLM
Related to #200 - Critical Performance Issues

Feature/enhancement PRs require prior maintainer approval in an issue or discussion before they are accepted.

Copilot AI review requested due to automatic review settings March 30, 2026 07:00
@coderabbitai
Contributor

coderabbitai bot commented Mar 30, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3ca44484-430a-42ed-a9e9-887c9bcd7fc7

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Comment @coderabbitai help to get the list of available commands and usage tips.


@chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: de89bf4318

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +377 to +380

    if self.short_term_memory:
        entry = self.short_term_memory.popleft()
        if entry not in evicted:
            evicted.append(entry)

P1: Avoid consolidating when consolidation is disabled

When consolidation is disabled (consolidation_capacity=0, normalized to None), this overflow branch still appends the removed entry to evicted, which causes process_step() to call _update_long_term_memory() and issue an LLM summarization anyway. That changes the documented/previous semantics of "discard oldest without consolidation," and it introduces unexpected API calls, cost, and long-term memory mutations for users who intentionally turned consolidation off.

Useful? React with 👍 / 👎.
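A minimal sketch of the fix the reviewer implies: only queue the evicted entry for consolidation when the feature is enabled. Class and attribute names are taken from the snippet above and may differ from the actual PR; this is an illustrative fragment, not the PR's implementation.

```python
from collections import deque


class STLTMemorySketch:
    """Illustrative fragment only; not the PR's actual class."""

    def __init__(self, capacity, consolidation_capacity):
        self.short_term_memory = deque()
        self.capacity = capacity
        # consolidation_capacity=0 is treated as "disabled" (None),
        # meaning: discard oldest entries without LLM consolidation.
        self.consolidation_capacity = consolidation_capacity or None

    def _evict_overflow(self):
        evicted = []
        while len(self.short_term_memory) > self.capacity:
            entry = self.short_term_memory.popleft()
            # Only collect the entry for later summarization when
            # consolidation is enabled; otherwise drop it silently,
            # so no LLM call is ever triggered downstream.
            if self.consolidation_capacity is not None and entry not in evicted:
                evicted.append(entry)
        return evicted
```

With consolidation disabled, `_evict_overflow()` returns an empty list, so a downstream `process_step()` that consolidates only non-empty eviction lists never issues the summarization call.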

Comment on lines +348 to +351

    target_tokens = current_tokens * 0.75  # Target 75% of max

    for salience, tokens, entry in entries_with_salience:
        if tokens_freed >= (current_tokens - target_tokens):

P2: Enforce token target from max_tokens, not current usage

This computes target_tokens as current_tokens * 0.75, so pruning only removes 25% of the current footprint rather than driving memory under the configured max_tokens threshold. If memory is far over limit (for example, 3000 tokens with max_tokens=1000), one pass can still leave it massively above budget, so the advertised token cap is not actually enforced and prompt size can remain over context constraints.

Useful? React with 👍 / 👎.
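As a hedged sketch of the suggested correction: derive the pruning target from the configured cap rather than from current usage, so one pass drives memory under max_tokens even when it is far over budget. The function and parameter names here are illustrative, not the PR's actual API; entries are assumed sorted ascending by salience so the least important are pruned first.

```python
def prune_to_budget(entries_with_salience, current_tokens, max_tokens, headroom=0.75):
    """Drop lowest-salience entries until usage fits under the cap.

    entries_with_salience: list of (salience, tokens, entry) tuples,
    assumed sorted ascending by salience.
    Targets headroom * max_tokens (not a fraction of current usage),
    so a single pass lands below the configured budget.
    """
    target_tokens = max_tokens * headroom  # derived from the cap, not current usage
    tokens_freed = 0
    kept, pruned = [], []
    for salience, tokens, entry in entries_with_salience:
        if current_tokens - tokens_freed > target_tokens:
            tokens_freed += tokens
            pruned.append(entry)
        else:
            kept.append(entry)
    return kept, pruned
```

On the reviewer's example (3000 tokens used, max_tokens=1000), this prunes until usage falls below 750 tokens, whereas the original `current_tokens * 0.75` target would have stopped at 2250, still far over the cap.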


Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feature Request: Memory Decay & Salience Pruning in STLTMemory

1 participant