Consider per-turn memory injection via pi.on("context") to preserve prompt caching

Currently pi-memory has two injection modes:

1. Default startup injection:
   - On `session_start`, injects one hidden `custom_message` with fallback memory.
   - Good for stable prompt caching.
   - But it has no user prompt yet, so it must inject broad memory buckets (`pref.*`, `tool.*`, project facts, lessons, etc.).
   - This can load memories that are not relevant to the actual task.

2. `perTurnInjection: true`:
   - Skips the startup dump.
   - On `before_agent_start`, searches memory using the current user prompt.
   - Injects a more relevant `<memory>` block.
   - But it does so by appending to `event.systemPrompt`, which mutates the system prompt every turn and likely hurts provider prompt caching.

Could we add a third mode that uses Pi’s `context` hook instead?

Pi docs describe:

```ts
pi.on("context", async (event, ctx) => {
  // event.messages - deep copy, safe to modify
  return { messages: modifiedMessages };
});
```

Idea:

- Keep the system prompt stable.
- On each LLM call, retrieve relevant memory from the current user prompt / latest context.
- Insert a hidden/custom memory message into `event.messages`, ideally immediately **before the latest user message**.
- Return modified messages for that call only.
- Do not persist duplicate memory messages into session history.
- Keep the latest user message as the final message, avoiding the old v1.1.x issue where injected memory landed after the user prompt and the model responded to the memory block instead of the user.

Possible config shape:

```json
{
  "memory": {
    "injectionMode": "startup" | "systemPromptPerTurn" | "contextPerTurn",
    "lessonInjection": "selective"
  }
}
```

Or perhaps keep backward compatibility:

```json
{
  "memory": {
    "perTurnInjection": true,
    "perTurnInjectionTarget": "context"
  }
}
```

Benefits:

- More relevant than startup fallback injection.
- Avoids mutating `systemPrompt` each turn.
- Should preserve provider/system prompt cache behavior better.
- Avoids persistent session-history spam if done through `context`.
- Preserves correct ordering by injecting before the current user message.

Open questions:

- Does Pi’s provider caching benefit from a stable system prompt if messages change per turn? Presumably yes, at least compared to changing the system prompt itself.
- Should context-per-turn memory be hidden from session history entirely, or visible/debuggable somehow?
- Should retrieval use only `event.prompt`, or also inspect recent messages in `event.messages`?
- Should we dedupe memories already injected in earlier turns, or is per-call ephemeral injection enough?

I’m happy to prototype this if the approach sounds compatible with pi’s lifecycle and you haven't tried that yet.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Consider per-turn memory injection via pi.on("context") to preserve prompt caching #23

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Consider per-turn memory injection via pi.on("context") to preserve prompt caching #23

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions