
fix(mates): delta-based DM digests to prevent content loss past 6k chars#282

Draft
leiyangyou wants to merge 1 commit into chadbyte:main from leiyangyou:fix/delta-dm-digests

@leiyangyou
Contributor

Problem

The current DM digest system collects the full conversation from index 0 on every turn, capped at 6000 chars. Once a DM session exceeds ~5 turns:

  • Every subsequent digest captures the same first 6k chars
  • New turns are truncated and never reach the memory summary
  • The debounce (_dmDigestPending) can skip turns entirely
  • The lastResponseText fallback only leaks ~200 chars per digest
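To make the first two bullets concrete, here is a minimal model of the current behavior. It is a sketch, not the code in lib/project.js: the function name `collectFullConversation` and the `{ role, text }` turn shape are hypothetical. Because the history is always read from index 0 and then cut at 6000 chars, every digest after the cap is reached contains the same prefix, and newer turns never appear.

```javascript
// Hypothetical model of the pre-PR DM digest collection (illustrative names only).
const DIGEST_CAP = 6000; // chars, per the PR description

function collectFullConversation(history, cap = DIGEST_CAP) {
  let text = "";
  // Always starts from index 0, regardless of what was already digested.
  for (const turn of history) {
    text += `${turn.role}: ${turn.text}\n`;
  }
  // Truncation keeps the oldest content, so the newest turns fall off the end.
  return text.slice(0, cap);
}
```

With turns of roughly 1.5k chars each, the digest taken after turn 4 and the one taken after turn 10 are identical strings, and nothing from turns 5 to 10 survives.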

This hasn't been a problem yet because all existing digests come from @mentions (which use a separate code path with a fixed 2k-char cap per question and per response), but it would affect any long-running DM session with a mate.

Proposed approach: delta-based collection

Instead of re-reading the full conversation each time, only collect new turns since the last successful digest. This means:

  • Each delta is small (1-3 turns), well under the 6k cap
  • The concurrency debounce naturally batches skipped turns into the next delta
  • No content is ever lost
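A minimal sketch of delta collection, assuming a per-session `fromIndex` that points just past the last successfully digested entry. The names `collectDelta` and `nextIndex`, the reset-to-0 rule for a stale index, and the convention that checkpoint entries have no `role` are all assumptions for illustration, not the PR's actual code. One design choice worth noting: the sketch stops at a turn boundary when the cap would be exceeded and leaves the remaining turns for the next delta, which is what makes "no content is ever lost" hold even if a delta happens to be larger than expected.

```javascript
// Hypothetical sketch of delta-based collection; names are illustrative.
const DIGEST_CAP = 6000; // chars, matching the existing cap

function collectDelta(history, fromIndex, cap = DIGEST_CAP) {
  // Assumed trim/rewind guard: an index past the end of history is stale, so start over.
  let i = fromIndex <= history.length ? fromIndex : 0;
  let text = "";
  while (i < history.length) {
    const entry = history[i];
    // Skip non-turn entries such as checkpoints (assumed to carry no `role`).
    if (!entry.role) {
      i++;
      continue;
    }
    const line = `${entry.role}: ${entry.text}\n`;
    if (text.length + line.length > cap) {
      // Guarantee progress: a single oversized turn is truncated rather than retried forever.
      if (text.length === 0) {
        text = line.slice(0, cap);
        i++;
      }
      break; // remaining turns wait for the next delta
    }
    text += line;
    i++;
  }
  // The caller should persist nextIndex only after the digest succeeds.
  return { text, nextIndex: i };
}
```

Calling this repeatedly, feeding each `nextIndex` back in as the next `fromIndex`, walks the whole history in cap-sized slices with every turn appearing exactly once.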

Key design decisions (want your input on these):

  1. Delta vs full-conversation: Each digest now covers a small slice rather than the full conversation. This changes the nature of what gets stored in session-digests.jsonl — fragments vs self-contained summaries. BM25 search quality may be affected since individual entries have less context.

  2. Prior summary injection: To give Haiku context for small deltas like "User: sounds good / Mate: proceeding", we inject the existing memory-summary.md into the digest worker prompt. This lets Haiku produce properly contextualized digests but increases prompt size.

  3. Checkpoint persistence: We write digest_checkpoint entries into session.history so the index survives server restarts. These are filtered from replayHistory so they don't leak to the UI.

  4. onDone(err) convention: The checkpoint only advances on success (err === null). On Haiku failures or parse errors, the index stays put so the turns can be retried.
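Decisions 3 and 4 can be sketched together: recover the index from the latest checkpoint in `session.history` on startup, and only append a new checkpoint when the digest callback reports `err === null`. The session shape (`history`, `dmDigestIndex`), the checkpoint entry shape `{ type: "digest_checkpoint", index }`, and both function names are assumptions made for illustration.

```javascript
// Hypothetical sketch; session and checkpoint shapes are assumed, not taken from the PR.

// On startup, the most recent checkpoint in history wins.
function recoverDigestIndex(session) {
  for (let i = session.history.length - 1; i >= 0; i--) {
    const entry = session.history[i];
    if (entry.type === "digest_checkpoint") return entry.index;
  }
  return 0; // no checkpoint yet: digest from the beginning
}

// onDone(err) convention: the index advances only when err === null.
function makeDigestCallback(session, nextIndex) {
  return function onDone(err) {
    if (err !== null) return; // Haiku or parse failure: keep the index so the turns are retried
    session.dmDigestIndex = nextIndex;
    // Persisting the checkpoint in history lets it survive a server restart.
    session.history.push({ type: "digest_checkpoint", index: nextIndex });
  };
}
```

Note that the strict `err !== null` check treats `undefined` as a failure, so every success path has to call `onDone(null)` explicitly.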

Changes

  • lib/project.js: Delta collection with per-session index, prior summary injection, parse failure guard, history trim guard, onDone(err) for digest worker callbacks
  • lib/sessions.js: Filter digest_checkpoint from replayHistory
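The lib/sessions.js change amounts to skipping bookkeeping entries when history is replayed to a client. The sketch below is a hypothetical stand-in: the real `replayHistory` signature is not shown in the PR, so the `send` callback and the returned count are assumptions.

```javascript
// Hypothetical stand-in for replayHistory in lib/sessions.js; the signature is assumed.
function isInternalEntry(entry) {
  return entry.type === "digest_checkpoint";
}

function replayHistory(history, send) {
  let sent = 0;
  for (const entry of history) {
    if (isInternalEntry(entry)) continue; // bookkeeping only, never shown in the chat UI
    send(entry);
    sent++;
  }
  return sent; // number of entries actually delivered
}
```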

Open questions

  • Is delta-based the right direction, or was the full-conversation approach intentional?
  • Should digests be time-debounced (flush after N minutes of quiet) rather than concurrency-debounced?
  • Is storing checkpoints in session history the right place, or should this be a separate file?

Test plan

  • Verify DM digest produces entries in session-digests.jsonl
  • Verify long DM sessions (>6k chars) still capture all turns
  • Verify session switch doesn't misalign the digest index
  • Verify server restart recovers the checkpoint from history
  • Verify history rewind resets the digest index
  • Verify digest_checkpoint entries don't appear in chat UI
  • Verify mention digests are unaffected
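The session-switch and rewind checks depend on the index being tracked per session rather than globally. A minimal sketch, assuming a `Map` keyed by a hypothetical session id; the class name `DigestIndexStore` and the rule that a rewind resets the index to 0 are illustrative assumptions.

```javascript
// Hypothetical per-session digest index store; names are illustrative.
class DigestIndexStore {
  constructor() {
    this.indices = new Map(); // sessionId -> next undigested history index
  }

  get(sessionId) {
    return this.indices.get(sessionId) ?? 0; // unseen sessions start at 0
  }

  advance(sessionId, nextIndex) {
    this.indices.set(sessionId, nextIndex);
  }

  reset(sessionId) {
    // Assumed rewind behavior: the next delta starts from the beginning.
    this.indices.delete(sessionId);
  }
}
```

Switching between sessions then only changes which key is read, so one session's progress can never shift another's.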

@leiyangyou leiyangyou force-pushed the fix/delta-dm-digests branch from 19f179e to 4116b46 on April 7, 2026 21:47
@chadbyte
Owner

chadbyte commented Apr 8, 2026

This is heading in a great direction. Delta-based digests feel like the right next step for session memory. Looking forward to it.

I've been watching your PRs and it's clear you use Mates a lot. I'm curious: can you walk me through what a typical session with a Mate looks like for you?

@chadbyte chadbyte marked this pull request as ready for review April 8, 2026 06:01
@chadbyte chadbyte marked this pull request as draft April 8, 2026 06:01
@leiyangyou
Contributor Author

Thanks! Honestly, my usage is pretty varied: I jump between a few different projects, and the thing I appreciate most is how fluidly I can switch between different projects and sessions. It's a much better flow than juggling CLI tabs.

In the downtime between projects I'll circle back to Clay to add features or fixes I've been wanting. I also access Clay remotely through mobile sometimes, which has been surprisingly useful.

Mates is probably the most interesting feature to me right now. I'm still exploring what's possible with it — the delta digest thing wasn't really a bug I hit, more something Claude Code spotted when I was poking around to understand what mates can do, and it seemed worth digging into.

@leiyangyou
Contributor Author

Also worth mentioning — I do intend at some point to invite some of my colleagues into Clay. The collaboration angle is a big plus. Things like working within the same project together, or opening things up for non-tech folks to do data analysis on a project I'm maintaining through Claude Code. That's a really compelling use case.

@leiyangyou leiyangyou force-pushed the fix/delta-dm-digests branch 3 times, most recently from b9f4590 to 94a85f8 on April 10, 2026 11:25
Only collect new turns since the last successful digest instead of
re-reading from index 0. Checkpoint persisted in session history
so the index survives restarts. Prior memory summary injected into
digest prompt for context on small deltas.
@leiyangyou leiyangyou force-pushed the fix/delta-dm-digests branch from 94a85f8 to 2d45be8 on April 10, 2026 11:31
@chadbyte
Owner

Thanks for the thorough writeup and the implementation. Answers to your open questions:

1. Delta vs full-conversation: The full-conversation approach wasn't intentional design, just the initial implementation. Delta is the right direction. No concerns here.

2. Debounce: Current concurrency-based approach is fine. With the delta change, skipped turns get picked up by the next delta via the index, so there's no content loss. Time-based debounce would add timer complexity without a clear benefit.

3. Checkpoint in session history: This is the right place. Keeping it in history means it naturally stays in sync with trim/rewind operations, and you've already handled the trim reset in project-sessions.js. A separate file would introduce sync issues between the checkpoint and the actual history state.
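The point about concurrency debouncing can be shown with a small model: while a digest is in flight, further requests are dropped, and the next run starts from the persisted index, so the skipped turns are batched into that delta. The names `createDigester`, `runDigest`, and `dmDigestIndex` are hypothetical; the `pending` flag plays the role of `_dmDigestPending`.

```javascript
// Hypothetical model of a concurrency-debounced digester; names are illustrative.
function createDigester(session, runDigest) {
  let pending = false; // analogous to _dmDigestPending

  return function requestDigest() {
    if (pending) return false; // skipped: these turns stay undigested for now
    const from = session.dmDigestIndex;
    const to = session.history.length;
    if (from >= to) return false; // nothing new to digest
    pending = true;
    runDigest(session.history.slice(from, to), (err) => {
      pending = false;
      // Advance only on success; on failure the same turns are retried next time.
      if (err === null) session.dmDigestIndex = to;
    });
    return true;
  };
}
```

No timer is involved: a request that arrives during an in-flight digest is simply dropped, and its turns are still covered because the next delta is defined by the index rather than by the request that triggered it.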

Looks good overall. Ready to take it out of draft when the test plan is checked off.


Labels

None yet

Projects

None yet


2 participants