Skip to content

Consider per-turn memory injection via pi.on("context") to preserve prompt caching #23

@gebeer

Description

@gebeer

Currently pi-memory has two injection modes:

  1. Default startup injection:

    • On session_start, injects one hidden custom_message with fallback memory.
    • Good for stable prompt caching.
    • But it has no user prompt yet, so it must inject broad memory buckets (pref.*, tool.*, project facts, lessons, etc.).
    • This can load memories that are not relevant to the actual task.
  2. perTurnInjection: true:

    • Skips the startup dump.
    • On before_agent_start, searches memory using the current user prompt.
    • Injects a more relevant <memory> block.
    • But it does so by appending to event.systemPrompt, which mutates the system prompt every turn and likely hurts provider prompt caching.

Could we add a third mode that uses Pi’s context hook instead?

Pi docs describe:

pi.on("context", async (event, ctx) => {
  // event.messages - deep copy, safe to modify
  return { messages: modifiedMessages };
});

Idea:

  • Keep the system prompt stable.
  • On each LLM call, retrieve relevant memory from the current user prompt / latest context.
  • Insert a hidden/custom memory message into event.messages, ideally immediately before the latest user message.
  • Return modified messages for that call only.
  • Do not persist duplicate memory messages into session history.
  • Keep the latest user message as the final message, avoiding the old v1.1.x issue where injected memory landed after the user prompt and the model responded to the memory block instead of the user.

Possible config shape:

{
  "memory": {
    "injectionMode": "startup" | "systemPromptPerTurn" | "contextPerTurn",
    "lessonInjection": "selective"
  }
}

Or perhaps keep backward compatibility:

{
  "memory": {
    "perTurnInjection": true,
    "perTurnInjectionTarget": "context"
  }
}

Benefits:

  • More relevant than startup fallback injection.
  • Avoids mutating systemPrompt each turn.
  • Should preserve provider/system prompt cache behavior better.
  • Avoids persistent session-history spam if done through context.
  • Preserves correct ordering by injecting before the current user message.

Open questions:

  • Does Pi’s provider caching benefit from a stable system prompt if messages change per turn? Presumably yes, at least compared to changing the system prompt itself.
  • Should context-per-turn memory be hidden from session history entirely, or visible/debuggable somehow?
  • Should retrieval use only event.prompt, or also inspect recent messages in event.messages?
  • Should we dedupe memories already injected in earlier turns, or is per-call ephemeral injection enough?

I’m happy to prototype this if the approach sounds compatible with pi’s lifecycle and you haven't tried that yet.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions