Proposal: Message Id Refactor (DO NOT MERGE) #122

Open
L-u-k-e wants to merge 14 commits into Mirix-AI:main from LiaoJianhe:lp/message-id-refactor

L-u-k-e commented Mar 18, 2026

https://jira.cloud.intuit.com/jira/software/c/projects/VEPAGE/boards/3735?selectedIssue=VEPAGE-435

Problem being addressed:

MIRIX manages agent conversation history through a message_ids JSON column on the agents table. This is a flat array of message IDs representing the agent's "in-context memory." The design has several scaling and correctness problems:

Scaling bottleneck. One meta-agent (and its sub-agents) exists per client, shared across all end-users. The message_ids array on a single agent row accumulates message IDs for every user. Every message operation (append, trim, clear) requires a read-modify-write of this array on the agent row, creating write contention when multiple workers process messages for different users concurrently.

Lost message state updates under concurrent load. Because the message_ids array is read-modify-written as a whole, concurrent processing of messages for different users on the same agent causes lost updates. Worker A reads message_ids, Worker B reads the same message_ids, both append their respective message IDs, and whichever writes last silently overwrites the other's changes. This is a classic lost-update anomaly. Under production load — where a single agent processes messages for hundreds of millions of users simultaneously — this means message references are silently dropped, leading to missing conversation context, orphaned message rows, and non-deterministic agent behavior.
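
To make the interleaving concrete, here is a minimal sketch of the lost update. It is not MIRIX code: `agent_row`, `read_ids`, and `write_ids` are hypothetical stand-ins for the agents row and its read-modify-write path, and the two "workers" are simulated as one fixed interleaving rather than real threads.

```python
import json

# Hypothetical stand-in for one row of the agents table:
# message_ids is stored as a serialized JSON array.
agent_row = {"message_ids": json.dumps(["msg-0"])}


def read_ids(row):
    """Read step: deserialize the whole array."""
    return json.loads(row["message_ids"])


def write_ids(row, ids):
    """Write step: overwrite the whole array."""
    row["message_ids"] = json.dumps(ids)


# Both workers read before either one writes.
ids_a = read_ids(agent_row)  # Worker A, handling user A
ids_b = read_ids(agent_row)  # Worker B, handling user B

ids_a.append("msg-user-a")
ids_b.append("msg-user-b")

write_ids(agent_row, ids_a)
write_ids(agent_row, ids_b)  # last writer wins; Worker A's append is gone

final_ids = read_ids(agent_row)
print(final_ids)  # ['msg-0', 'msg-user-b']
```

Neither write fails, which is why the drop is silent: the row for `msg-user-a` would still exist in the messages table, but nothing references it any more.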

Eager loading hazard. The messages relationship on the Agent ORM uses lazy="selectin", which means loading an agent eagerly loads all of its messages into memory. For an agent serving millions of users, this is a ticking time bomb.
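
A rough illustration of the difference in row counts, using plain `sqlite3` rather than the actual ORM. The schema, column names, and the per-user query are assumptions made for this sketch; the first query only approximates the shape of what a selectin load of the relationship fetches (every message tied to the agent).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema, not the real MIRIX messages table.
conn.execute(
    "CREATE TABLE messages ("
    "id TEXT PRIMARY KEY, agent_id TEXT, user_id TEXT, created_at INTEGER)"
)

# One shared agent, 100 users, 5 messages each.
rows = [
    (f"msg-{u}-{i}", "agent-1", f"user-{u}", i)
    for u in range(100)
    for i in range(5)
]
conn.executemany("INSERT INTO messages VALUES (?, ?, ?, ?)", rows)

# Eager-style load: everything attached to the agent, for every user.
all_for_agent = conn.execute(
    "SELECT id FROM messages WHERE agent_id = ?", ("agent-1",)
).fetchall()

# Scoped load: only the most recent messages for one (agent, user) pair.
scoped = conn.execute(
    "SELECT id FROM messages "
    "WHERE agent_id = ? AND user_id = ? "
    "ORDER BY created_at DESC LIMIT ?",
    ("agent-1", "user-7", 3),
).fetchall()

print(len(all_for_agent), len(scoped))  # 500 3
```

The eager-style result grows with the total number of users on the agent, while the scoped query stays bounded by the limit.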

Redundant system message storage. The system prompt is stored twice: once in agent.system (a column on the agent row) and once as a Message row at position 0 of message_ids. The code reads the system prompt from the message row, enriches it with memories, and sends it to the LLM. The agent.system column is the source of truth, but the message row is what gets used at runtime, so the two copies have to be kept in sync.

Write-then-delete churn. For the ECMS memory extraction path (the production use case), every agent step persists messages to the database, then immediately deletes them when CLEAR_HISTORY_AFTER_MEMORY_UPDATE fires. Sub-agents each store their own copies of the input messages, their LLM responses, tool results, and heartbeat messages -- all of which are deleted moments later. This is pure I/O overhead.
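
A toy count of the I/O involved, under stated assumptions: `CountingStore` is a hypothetical in-memory stand-in that only tallies writes and deletes, and the two step functions are simplified caricatures of "persist then clear" versus "keep the working set in memory", not the real agent step.

```python
class CountingStore:
    """Hypothetical message store that only counts writes and deletes."""

    def __init__(self):
        self.rows = {}
        self.writes = 0
        self.deletes = 0

    def insert(self, msg_id, body):
        self.rows[msg_id] = body
        self.writes += 1

    def delete_all(self):
        self.deletes += len(self.rows)
        self.rows.clear()


def step_with_churn(store, messages):
    """Persist every message, then clear history right after the step."""
    for i, body in enumerate(messages):
        store.insert(f"m{i}", body)
    store.delete_all()


def step_in_memory(store, messages):
    """Keep the working set in memory; nothing is written or deleted."""
    return list(messages)


step_messages = ["input", "llm response", "tool result", "heartbeat"]

churn = CountingStore()
step_with_churn(churn, step_messages)

lean = CountingStore()
step_in_memory(lean, step_messages)

print(churn.writes, churn.deletes, lean.writes, lean.deletes)  # 4 4 0 0
```

Both variants end with an empty store; the first one just pays for four writes and four deletes to get there, and that cost repeats for each sub-agent.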


View the Plan as formatted markdown here: https://github.com/LiaoJianhe/MIRIX_Intuit/blob/2b21637b5acbf4df6d88b32dc7930d749452518f/.cursor/plans/message_ids_refactor_plan_9eb1b08f.plan.md

Use the diff to make comments


No mid-step DB writes are needed. The summarizer works entirely on the in-memory list, and the final state — including the summary message — is what gets persisted at the end.

For retention = 0, summarization never fires — there are no retained messages to accumulate, and memory extraction agents run a single step.
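
One way to read the two paragraphs above, as a sketch only: `finalize_step`, `summarize_threshold`, and the `summarize` callback are hypothetical names, and the trimming policy (summarize everything older than the last `retention` messages once a threshold is crossed) is an assumption for illustration, not necessarily what the plan specifies.

```python
from typing import Callable, List


def finalize_step(
    working: List[str],
    retention: int,
    summarize_threshold: int,
    summarize: Callable[[List[str]], str],
) -> List[str]:
    """Return the messages to persist once, at the end of a step."""
    if retention == 0:
        # Nothing is retained, so there is nothing to summarize or persist.
        return []
    overflow = working[:-retention]
    if len(working) > summarize_threshold and overflow:
        # Collapse the older messages into one summary, entirely in memory.
        return [summarize(overflow)] + working[-retention:]
    return list(working)


def fake_summarize(msgs: List[str]) -> str:
    # Placeholder for an LLM call.
    return f"summary of {len(msgs)}"


history = ["m1", "m2", "m3", "m4", "m5"]
retained = finalize_step(history, 2, 4, fake_summarize)
nothing = finalize_step(history, 0, 4, fake_summarize)
print(retained, nothing)  # ['summary of 3', 'm4', 'm5'] []
```

The caller would write `retained` to the database in a single pass after the step completes; the summarizer itself never touches the store.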
Contributor Author


I actually am leaning towards simplifying the system to remove summarization. There is no need to retain more than like 10 chat messages for the purposes of memory extraction. Summarization should never be necessary with most models.

I think we'd have a more maintainable and easy to understand system by removing summarization. WDYT?

LiaoJianhe and others added 13 commits March 20, 2026 09:38
* Fix the missing greenlet issue

* Fix the Github CI pipeline errors

* Further fix Github CI pipeline errors on Redis test
Resolved conflicts preserving the message_ids refactor changes:
- agent.py: kept removal of should_clear_history block
- orm/agent.py: took re-org's greenlet-safe tools local var, dropped message_ids
- agent_manager.py: kept hard_delete-based reset_messages; restored _generate_initial_message_sequence and append_initial_message_sequence_to_in_context_messages from re-org; added missing MessageCreate/PydanticMessage imports
- client_manager.py: kept removal of message_ids manipulation
- message_manager.py: kept get_messages_for_agent_user over removed cleanup_all_detached_messages

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…sage_sequence_to_in_context_messages

These were accidentally restored from re-org during merge conflict resolution.
No callers remain (sdk.py no longer passes add_default_initial_messages).
Also removes the dead if add_default_initial_messages block from reset_messages
and the now-unused MessageCreate/PydanticMessage imports.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>