From f7e0c0fd05e2f16d07df925a2d596d89a35be187 Mon Sep 17 00:00:00 2001 From: opspawn Date: Sun, 15 Feb 2026 11:30:55 +0000 Subject: [PATCH 1/5] docs: add task/session/event data lifecycle documentation Addresses #1199 by documenting: - Data model and table relationships - Default storage configuration (SQLite emptyDir) - Lack of built-in retention/cleanup mechanisms - Which tables grow fastest and are safe to prune - Production recommendations (PostgreSQL, retention policies, monitoring) Signed-off-by: opspawn --- docs/architecture/data-lifecycle.md | 211 ++++++++++++++++++++++++++++ 1 file changed, 211 insertions(+) create mode 100644 docs/architecture/data-lifecycle.md diff --git a/docs/architecture/data-lifecycle.md b/docs/architecture/data-lifecycle.md new file mode 100644 index 000000000..cb1f08086 --- /dev/null +++ b/docs/architecture/data-lifecycle.md @@ -0,0 +1,211 @@ +# Task, Session, and Event Data Lifecycle + +This document describes how kagent stores task, session, and event data, the default storage configuration, and recommendations for production deployments. + +## Overview + +kagent persists all task, session, and event data in a relational database managed by GORM. By default it uses SQLite backed by a memory-based `emptyDir` volume. PostgreSQL is also supported for production or high-availability deployments. + +## Data Model + +kagent stores the following entities: + +| Entity | Table | Description | +|--------|-------|-------------| +| **Session** | `session` | A conversation context between a user and an agent | +| **Task** | `task` | An A2A task (agent invocation) within a session | +| **Event** | `event` | A message or event within a session | +| **Agent** | `agent` | Agent configuration | +| **Push Notification** | `push_notification` | Push notification config for a task | +| **Feedback** | `feedback` | User feedback on agent responses | +| **Tool** | `tool` | Tool metadata from MCP servers | +| **ToolServer** | `toolserver` | Registered MCP tool servers | +| **LangGraph Checkpoint** | `lg_checkpoint` | LangGraph agent state checkpoints | +| **LangGraph Checkpoint Write** | `lg_checkpoint_write` | Individual write operations for checkpoints | +| **CrewAI Agent Memory** | `crewai_agent_memory` | Long-term memory for CrewAI agents | +| **CrewAI Flow State** | `crewai_flow_state` | Flow execution state for CrewAI agents | + +All tables include `created_at`, `updated_at`, and `deleted_at` (soft-delete) timestamp columns managed by GORM. + +### How Data Is Created + +1. **Sessions** are created when a user starts a new conversation with an agent via the API (`POST /api/sessions`). +2. **Tasks** are created when an A2A task is submitted within a session. The full task payload is serialized as JSON in the `data` column. +3. **Events** are created for each message exchanged during a session (user messages, agent responses, tool calls). The `data` column stores the serialized `protocol.Message`. +4. **Checkpoints** and **memory** records are created by the agent framework (LangGraph or CrewAI) during task execution to persist intermediate state. + +### Data Relationships + +```text +Agent (1) ──── (*) Session (1) ──── (*) Task + │ │ + └──── (*) Event └──── (*) PushNotification +``` + +> **Note:** There are no foreign key cascade constraints between sessions, tasks, and events. Deleting a session does **not** automatically delete its associated tasks or events. The only cascade constraint is on `Feedback.MessageID` (`OnDelete:CASCADE`). + +## Default Storage Configuration + +### Kubernetes (Helm Chart) + +By default, the Helm chart provisions SQLite on a **memory-backed `emptyDir`** volume: + +```yaml +# helm/kagent/values.yaml +database: + type: sqlite + sqlite: + databaseName: kagent.db +``` + +The deployment template mounts this as: + +```yaml +volumes: + - name: sqlite-volume + emptyDir: + sizeLimit: 500Mi + medium: Memory +``` + +- **Mount path:** `/sqlite-volume/` +- **Database file:** `/sqlite-volume/kagent.db` +- **Size limit:** 500Mi (from RAM) +- **Persistence:** Data is **lost** when the pod restarts + +This configuration is suitable for quick demos and local experimentation but **not for production use**. + +### Environment Variables + +| Variable | Default | Description | +|----------|---------|-------------| +| `DATABASE_TYPE` | `sqlite` | Database backend (`sqlite` or `postgres`) | +| `SQLITE_DATABASE_PATH` | `/sqlite-volume/kagent.db` (Helm) or `./kagent.db` (binary) | Path to SQLite file | +| `POSTGRES_DATABASE_URL` | `postgres://postgres:kagent@pgsql-postgresql.kagent.svc.cluster.local:5432/postgres` | PostgreSQL connection URL | + +### CLI Flags + +| Flag | Default | Description | +|------|---------|-------------| +| `--database-type` | `sqlite` | Database backend | +| `--sqlite-database-path` | `./kagent.db` | SQLite file path | +| `--postgres-database-url` | (see above) | PostgreSQL connection URL | + +## Data Retention + +### Current Behavior + +**kagent has no built-in data retention, cleanup, or garbage collection mechanisms.** All task, session, event, checkpoint, and memory data grows indefinitely unless explicitly managed by the operator. + +Available manual deletion operations: + +| Operation | API Endpoint | What It Deletes | +|-----------|-------------|-----------------| +| Delete session | `DELETE /api/sessions/{session_id}` | Session record only (soft delete) | +| Delete task | `DELETE /api/tasks/{task_id}` | Task record only (soft delete) | + +> **Important:** GORM soft-deletes set `deleted_at` but do not remove rows from the database. The data still occupies storage. Soft-deleted records are excluded from queries but remain on disk. + +### What Grows Fastest + +In typical usage, the **events table** grows the fastest because every message in every conversation creates a new row with a full JSON payload. The **LangGraph checkpoint** tables can also grow significantly since each agent execution step may create checkpoint records. + +### Safe-to-Prune Data + +The following categories of data can generally be pruned without affecting active operations: + +- **Soft-deleted records** (`deleted_at IS NOT NULL`) — already excluded from queries +- **Old events** — historical conversation messages no longer needed for active sessions +- **Old checkpoints** (`lg_checkpoint`, `lg_checkpoint_write`) — only the most recent checkpoint is needed for resuming an agent; older checkpoints can be removed +- **Completed tasks** — tasks in a terminal state that are no longer referenced + +> **Caution:** Always verify that sessions are not actively in use before pruning their associated events. Deleting events from an active session will cause data loss. + +## Production Recommendations + +### 1. Switch to PostgreSQL + +For any deployment beyond local experimentation, migrate to PostgreSQL: + +```yaml +# values.yaml +database: + type: postgres + postgres: + url: postgres://user:password@your-postgres-host:5432/kagent +``` + +Benefits: + +- Persistent storage that survives pod restarts +- Better concurrency and query performance +- Standard backup and replication tools (`pg_dump`, WAL archiving, etc.) +- Ability to run multiple controller replicas (SQLite only supports `replicas: 1`) + +### 2. Switch SQLite to Disk-Backed Storage + +If you must use SQLite, at minimum switch from memory-backed to disk-backed storage: + +**Option A — Disk-backed emptyDir** (data still lost on pod deletion, but survives container restarts and does not consume RAM): + +```yaml +volumes: + - name: sqlite-volume + emptyDir: + sizeLimit: 10Gi +``` + +**Option B — PersistentVolumeClaim** (data persists across pod restarts): + +```yaml +controller: + volumes: + - name: sqlite-volume + persistentVolumeClaim: + claimName: kagent-sqlite-pvc + volumeMounts: + - name: sqlite-volume + mountPath: /sqlite-volume +``` + +### 3. Implement External Retention Policies + +Since kagent does not include built-in retention, operators should implement their own. Example SQL for PostgreSQL: + +```sql +-- Delete soft-deleted records older than 30 days +DELETE FROM event WHERE deleted_at IS NOT NULL AND deleted_at < NOW() - INTERVAL '30 days'; +DELETE FROM task WHERE deleted_at IS NOT NULL AND deleted_at < NOW() - INTERVAL '30 days'; +DELETE FROM session WHERE deleted_at IS NOT NULL AND deleted_at < NOW() - INTERVAL '30 days'; + +-- Prune events older than 90 days (adjust to your needs) +DELETE FROM event WHERE created_at < NOW() - INTERVAL '90 days'; + +-- Prune old checkpoints +DELETE FROM lg_checkpoint WHERE created_at < NOW() - INTERVAL '30 days'; +DELETE FROM lg_checkpoint_write WHERE created_at < NOW() - INTERVAL '30 days'; +``` + +For SQLite deployments where direct SQL access may be limited in Kubernetes, use the REST API to periodically list and delete old sessions. + +### 4. Monitor Storage Usage + +- Monitor the SQLite database file size or PostgreSQL table sizes +- Set alerts when storage exceeds thresholds (e.g., 80% of the `emptyDir` size limit) +- The default 500Mi memory-backed `emptyDir` can fill up quickly under normal usage + +### 5. Back Up Your Data + +- **PostgreSQL:** Use standard tools (`pg_dump`, WAL archiving, managed database backups) +- **SQLite with PVC:** Schedule periodic copies of the database file +- **SQLite with emptyDir:** Data cannot be reliably backed up (it is ephemeral) + +## Related Files + +- [models.go](../../go/pkg/database/models.go) — Database models (schema definitions) +- [client.go](../../go/internal/database/client.go) — Database client implementation +- [manager.go](../../go/internal/database/manager.go) — Database connection and initialization +- [service.go](../../go/internal/database/service.go) — Database service helpers +- [app.go](../../go/pkg/app/app.go) — Application configuration (database flags and env vars) +- [values.yaml](../../helm/kagent/values.yaml) — Helm chart default values +- [controller-deployment.yaml](../../helm/kagent/templates/controller-deployment.yaml) — Controller deployment template From d65a0f2b1b61c402b1c7a3c31b51e231efd1858f Mon Sep 17 00:00:00 2001 From: opspawn Date: Sun, 15 Feb 2026 13:30:35 +0000 Subject: [PATCH 2/5] fix: correct PostgreSQL default URL to match code The documented default for --postgres-database-url in the CLI Flags section was "(see above)" referencing the Helm chart value, but the actual CLI binary default in go/pkg/app/app.go:158 is postgres://postgres:kagent@db.kagent.svc.cluster.local:5432/crud. Update the CLI Flags section to show the actual code default, and clarify the Environment Variables section to distinguish between Helm and binary defaults (matching the existing pattern for SQLITE_DATABASE_PATH). Signed-off-by: opspawn Signed-off-by: opspawn --- docs/architecture/data-lifecycle.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/architecture/data-lifecycle.md b/docs/architecture/data-lifecycle.md index cb1f08086..6e7b0949d 100644 --- a/docs/architecture/data-lifecycle.md +++ b/docs/architecture/data-lifecycle.md @@ -81,7 +81,7 @@ This configuration is suitable for quick demos and local experimentation but **n |----------|---------|-------------| | `DATABASE_TYPE` | `sqlite` | Database backend (`sqlite` or `postgres`) | | `SQLITE_DATABASE_PATH` | `/sqlite-volume/kagent.db` (Helm) or `./kagent.db` (binary) | Path to SQLite file | -| `POSTGRES_DATABASE_URL` | `postgres://postgres:kagent@pgsql-postgresql.kagent.svc.cluster.local:5432/postgres` | PostgreSQL connection URL | +| `POSTGRES_DATABASE_URL` | `postgres://postgres:kagent@pgsql-postgresql.kagent.svc.cluster.local:5432/postgres` (Helm) or `postgres://postgres:kagent@db.kagent.svc.cluster.local:5432/crud` (binary) | PostgreSQL connection URL | ### CLI Flags @@ -89,7 +89,7 @@ This configuration is suitable for quick demos and local experimentation but **n |------|---------|-------------| | `--database-type` | `sqlite` | Database backend | | `--sqlite-database-path` | `./kagent.db` | SQLite file path | -| `--postgres-database-url` | (see above) | PostgreSQL connection URL | +| `--postgres-database-url` | `postgres://postgres:kagent@db.kagent.svc.cluster.local:5432/crud` | PostgreSQL connection URL | ## Data Retention From ad26b4525054e234198646df944218d5223b8740 Mon Sep 17 00:00:00 2001 From: opspawn Date: Fri, 20 Feb 2026 22:00:32 +0000 Subject: [PATCH 3/5] chore: retrigger CI Signed-off-by: opspawn From c2b6d6ceb88242672cb2150357b2761e967cdd10 Mon Sep 17 00:00:00 2001 From: fl-sean03 Date: Fri, 27 Feb 2026 05:42:16 +0000 Subject: [PATCH 4/5] docs: rewrite data lifecycle to describe current state only Rewrites the data lifecycle doc to focus on what currently exists: tables, schemas, default configuration, and data flow. Removes all prescriptive production recommendations, SQL examples, and "you should" language per reviewer feedback. Signed-off-by: Sean Florez --- docs/architecture/data-lifecycle.md | 216 ++++++++-------------------- 1 file changed, 58 insertions(+), 158 deletions(-) diff --git a/docs/architecture/data-lifecycle.md b/docs/architecture/data-lifecycle.md index 6e7b0949d..1e6b54e41 100644 --- a/docs/architecture/data-lifecycle.md +++ b/docs/architecture/data-lifecycle.md @@ -1,40 +1,31 @@ # Task, Session, and Event Data Lifecycle -This document describes how kagent stores task, session, and event data, the default storage configuration, and recommendations for production deployments. - -## Overview - -kagent persists all task, session, and event data in a relational database managed by GORM. By default it uses SQLite backed by a memory-based `emptyDir` volume. PostgreSQL is also supported for production or high-availability deployments. +This document describes how kagent currently stores task, session, and event data. ## Data Model -kagent stores the following entities: - -| Entity | Table | Description | -|--------|-------|-------------| -| **Session** | `session` | A conversation context between a user and an agent | -| **Task** | `task` | An A2A task (agent invocation) within a session | -| **Event** | `event` | A message or event within a session | -| **Agent** | `agent` | Agent configuration | -| **Push Notification** | `push_notification` | Push notification config for a task | -| **Feedback** | `feedback` | User feedback on agent responses | -| **Tool** | `tool` | Tool metadata from MCP servers | -| **ToolServer** | `toolserver` | Registered MCP tool servers | -| **LangGraph Checkpoint** | `lg_checkpoint` | LangGraph agent state checkpoints | -| **LangGraph Checkpoint Write** | `lg_checkpoint_write` | Individual write operations for checkpoints | -| **CrewAI Agent Memory** | `crewai_agent_memory` | Long-term memory for CrewAI agents | -| **CrewAI Flow State** | `crewai_flow_state` | Flow execution state for CrewAI agents | +kagent persists data in a relational database managed by [GORM](https://gorm.io/). The schema is defined in Go structs and tables are created via GORM's `AutoMigrate` at startup (no versioned migrations). + +### Tables + +| Table | Primary Key | Description | +|-------|-------------|-------------| +| `agent` | `id` | Agent configuration. Stores agent type and an optional JSON `config` blob. | +| `session` | `(id, user_id)` | A conversation context. Optionally linked to an agent via `agent_id`. | +| `task` | `id` | An A2A task within a session. The full `protocol.Task` is JSON-serialized in the `data` column. | +| `event` | `(id, user_id)` | A message or event within a session. The `data` column stores a JSON-serialized `protocol.Message`. Indexed on `session_id`. | +| `push_notification` | `id` | Push notification configuration for a task. Indexed on `task_id`. | +| `feedback` | `(id, user_id)` | User feedback on agent responses. Has an `OnDelete:CASCADE` constraint on `message_id`. | +| `tool` | `(id, server_name, group_kind)` | Tool metadata discovered from MCP servers. | +| `toolserver` | `(name, group_kind)` | Registered MCP tool servers. | +| `lg_checkpoint` | `(user_id, thread_id, checkpoint_ns, checkpoint_id)` | LangGraph agent state checkpoints. | +| `lg_checkpoint_write` | `(user_id, thread_id, checkpoint_ns, checkpoint_id, write_idx)` | Individual write operations for LangGraph checkpoints. | +| `crewai_agent_memory` | `(user_id, thread_id)` | Long-term memory for CrewAI agents. | +| `crewai_flow_state` | `(user_id, thread_id, method_name)` | Flow execution state for CrewAI agents. | All tables include `created_at`, `updated_at`, and `deleted_at` (soft-delete) timestamp columns managed by GORM. -### How Data Is Created - -1. **Sessions** are created when a user starts a new conversation with an agent via the API (`POST /api/sessions`). -2. **Tasks** are created when an A2A task is submitted within a session. The full task payload is serialized as JSON in the `data` column. -3. **Events** are created for each message exchanged during a session (user messages, agent responses, tool calls). The `data` column stores the serialized `protocol.Message`. -4. **Checkpoints** and **memory** records are created by the agent framework (LangGraph or CrewAI) during task execution to persist intermediate state. - -### Data Relationships +### Relationships ```text Agent (1) ──── (*) Session (1) ──── (*) Task @@ -42,13 +33,17 @@ Agent (1) ──── (*) Session (1) ──── (*) Task └──── (*) Event └──── (*) PushNotification ``` -> **Note:** There are no foreign key cascade constraints between sessions, tasks, and events. Deleting a session does **not** automatically delete its associated tasks or events. The only cascade constraint is on `Feedback.MessageID` (`OnDelete:CASCADE`). +There are no foreign key cascade constraints between sessions, tasks, and events. Deleting a session does not automatically delete its tasks or events. The only cascade constraint is `Feedback.MessageID` (`OnDelete:CASCADE`). + +### Write Semantics + +All `Store*` methods use GORM's `OnConflict{UpdateAll: true}` clause, giving upsert behavior — a record is created if it does not exist, or updated in place if it does. ## Default Storage Configuration -### Kubernetes (Helm Chart) +### Kubernetes (Helm) -By default, the Helm chart provisions SQLite on a **memory-backed `emptyDir`** volume: +The Helm chart defaults to SQLite on a **memory-backed `emptyDir`** volume: ```yaml # helm/kagent/values.yaml @@ -58,7 +53,7 @@ database: databaseName: kagent.db ``` -The deployment template mounts this as: +The controller deployment mounts: ```yaml volumes: @@ -66,146 +61,51 @@ volumes: emptyDir: sizeLimit: 500Mi medium: Memory -``` - -- **Mount path:** `/sqlite-volume/` -- **Database file:** `/sqlite-volume/kagent.db` -- **Size limit:** 500Mi (from RAM) -- **Persistence:** Data is **lost** when the pod restarts - -This configuration is suitable for quick demos and local experimentation but **not for production use**. - -### Environment Variables - -| Variable | Default | Description | -|----------|---------|-------------| -| `DATABASE_TYPE` | `sqlite` | Database backend (`sqlite` or `postgres`) | -| `SQLITE_DATABASE_PATH` | `/sqlite-volume/kagent.db` (Helm) or `./kagent.db` (binary) | Path to SQLite file | -| `POSTGRES_DATABASE_URL` | `postgres://postgres:kagent@pgsql-postgresql.kagent.svc.cluster.local:5432/postgres` (Helm) or `postgres://postgres:kagent@db.kagent.svc.cluster.local:5432/crud` (binary) | PostgreSQL connection URL | - -### CLI Flags - -| Flag | Default | Description | -|------|---------|-------------| -| `--database-type` | `sqlite` | Database backend | -| `--sqlite-database-path` | `./kagent.db` | SQLite file path | -| `--postgres-database-url` | `postgres://postgres:kagent@db.kagent.svc.cluster.local:5432/crud` | PostgreSQL connection URL | - -## Data Retention - -### Current Behavior - -**kagent has no built-in data retention, cleanup, or garbage collection mechanisms.** All task, session, event, checkpoint, and memory data grows indefinitely unless explicitly managed by the operator. - -Available manual deletion operations: - -| Operation | API Endpoint | What It Deletes | -|-----------|-------------|-----------------| -| Delete session | `DELETE /api/sessions/{session_id}` | Session record only (soft delete) | -| Delete task | `DELETE /api/tasks/{task_id}` | Task record only (soft delete) | - -> **Important:** GORM soft-deletes set `deleted_at` but do not remove rows from the database. The data still occupies storage. Soft-deleted records are excluded from queries but remain on disk. - -### What Grows Fastest - -In typical usage, the **events table** grows the fastest because every message in every conversation creates a new row with a full JSON payload. The **LangGraph checkpoint** tables can also grow significantly since each agent execution step may create checkpoint records. - -### Safe-to-Prune Data -The following categories of data can generally be pruned without affecting active operations: - -- **Soft-deleted records** (`deleted_at IS NOT NULL`) — already excluded from queries -- **Old events** — historical conversation messages no longer needed for active sessions -- **Old checkpoints** (`lg_checkpoint`, `lg_checkpoint_write`) — only the most recent checkpoint is needed for resuming an agent; older checkpoints can be removed -- **Completed tasks** — tasks in a terminal state that are no longer referenced - -> **Caution:** Always verify that sessions are not actively in use before pruning their associated events. Deleting events from an active session will cause data loss. - -## Production Recommendations - -### 1. Switch to PostgreSQL - -For any deployment beyond local experimentation, migrate to PostgreSQL: - -```yaml -# values.yaml -database: - type: postgres - postgres: - url: postgres://user:password@your-postgres-host:5432/kagent +volumeMounts: + - name: sqlite-volume + mountPath: /sqlite-volume ``` -Benefits: +The database file is `/sqlite-volume/kagent.db`. Because `medium: Memory` maps to tmpfs, **all data is lost when the pod restarts or is rescheduled.** -- Persistent storage that survives pod restarts -- Better concurrency and query performance -- Standard backup and replication tools (`pg_dump`, WAL archiving, etc.) -- Ability to run multiple controller replicas (SQLite only supports `replicas: 1`) +### PostgreSQL -### 2. Switch SQLite to Disk-Backed Storage +Setting `database.type: postgres` switches to an external PostgreSQL instance. The default connection URL in the Helm chart is: -If you must use SQLite, at minimum switch from memory-backed to disk-backed storage: - -**Option A — Disk-backed emptyDir** (data still lost on pod deletion, but survives container restarts and does not consume RAM): - -```yaml -volumes: - - name: sqlite-volume - emptyDir: - sizeLimit: 10Gi ``` - -**Option B — PersistentVolumeClaim** (data persists across pod restarts): - -```yaml -controller: - volumes: - - name: sqlite-volume - persistentVolumeClaim: - claimName: kagent-sqlite-pvc - volumeMounts: - - name: sqlite-volume - mountPath: /sqlite-volume +postgres://postgres:kagent@pgsql-postgresql.kagent.svc.cluster.local:5432/postgres ``` -### 3. Implement External Retention Policies +### Configuration -Since kagent does not include built-in retention, operators should implement their own. Example SQL for PostgreSQL: +| Source | Variable / Flag | Default | +|--------|----------------|---------| +| Env | `DATABASE_TYPE` | `sqlite` | +| Env | `SQLITE_DATABASE_PATH` | `/sqlite-volume/kagent.db` (Helm) or `./kagent.db` (binary) | +| Env | `POSTGRES_DATABASE_URL` | *(see Helm values)* | +| Flag | `--database-type` | `sqlite` | +| Flag | `--sqlite-database-path` | `./kagent.db` | +| Flag | `--postgres-database-url` | `postgres://postgres:kagent@db.kagent.svc.cluster.local:5432/crud` | -```sql --- Delete soft-deleted records older than 30 days -DELETE FROM event WHERE deleted_at IS NOT NULL AND deleted_at < NOW() - INTERVAL '30 days'; -DELETE FROM task WHERE deleted_at IS NOT NULL AND deleted_at < NOW() - INTERVAL '30 days'; -DELETE FROM session WHERE deleted_at IS NOT NULL AND deleted_at < NOW() - INTERVAL '30 days'; - --- Prune events older than 90 days (adjust to your needs) -DELETE FROM event WHERE created_at < NOW() - INTERVAL '90 days'; - --- Prune old checkpoints -DELETE FROM lg_checkpoint WHERE created_at < NOW() - INTERVAL '30 days'; -DELETE FROM lg_checkpoint_write WHERE created_at < NOW() - INTERVAL '30 days'; -``` - -For SQLite deployments where direct SQL access may be limited in Kubernetes, use the REST API to periodically list and delete old sessions. +## Data Retention -### 4. Monitor Storage Usage +kagent has no built-in data retention, cleanup, or garbage collection. All rows grow indefinitely. -- Monitor the SQLite database file size or PostgreSQL table sizes -- Set alerts when storage exceeds thresholds (e.g., 80% of the `emptyDir` size limit) -- The default 500Mi memory-backed `emptyDir` can fill up quickly under normal usage +The API exposes soft-delete endpoints: -### 5. Back Up Your Data +| Endpoint | Effect | +|----------|--------| +| `DELETE /api/sessions/{id}` | Sets `deleted_at` on the session row (does not cascade) | +| `DELETE /api/tasks/{id}` | Sets `deleted_at` on the task row | -- **PostgreSQL:** Use standard tools (`pg_dump`, WAL archiving, managed database backups) -- **SQLite with PVC:** Schedule periodic copies of the database file -- **SQLite with emptyDir:** Data cannot be reliably backed up (it is ephemeral) +Soft-deleted rows remain on disk; they are excluded from queries but not removed. ## Related Files -- [models.go](../../go/pkg/database/models.go) — Database models (schema definitions) -- [client.go](../../go/internal/database/client.go) — Database client implementation -- [manager.go](../../go/internal/database/manager.go) — Database connection and initialization -- [service.go](../../go/internal/database/service.go) — Database service helpers -- [app.go](../../go/pkg/app/app.go) — Application configuration (database flags and env vars) -- [values.yaml](../../helm/kagent/values.yaml) — Helm chart default values -- [controller-deployment.yaml](../../helm/kagent/templates/controller-deployment.yaml) — Controller deployment template +- [`models.go`](../../go/pkg/database/models.go) — GORM struct definitions (schema source of truth) +- [`client.go`](../../go/internal/database/client.go) — Database client implementation +- [`manager.go`](../../go/internal/database/manager.go) — Database connection and `AutoMigrate` +- [`app.go`](../../go/pkg/app/app.go) — CLI flags and environment variable mapping +- [`values.yaml`](../../helm/kagent/values.yaml) — Helm chart defaults +- [`controller-deployment.yaml`](../../helm/kagent/templates/controller-deployment.yaml) — Volume and mount definitions From e316784c1f273c195a0e7b008b3f164d837140bf Mon Sep 17 00:00:00 2001 From: fl-sean03 Date: Fri, 27 Feb 2026 10:48:19 +0000 Subject: [PATCH 5/5] docs: focus data lifecycle on current state per reviewer feedback MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Re: EItanya's review — "start with just the current state" Restructured the data lifecycle doc to be purely descriptive: - Added "Where Data Lives" section distinguishing Kubernetes CRDs (etcd) from runtime database tables — kagent stores data in both places - Added explicit "Deletion Behavior" section documenting what happens when sessions, tasks, messages, and CRDs are deleted (no cascades between sessions/tasks/events; only Feedback cascades on message delete; CRD deletion does not clean up database rows) - Added "Conversation History" section describing how events accumulate per session with no automatic pruning - Removed all prescriptive language ("not for production use", "you should") - Kept only factual descriptions of the current implementation Signed-off-by: opspawn --- docs/architecture/data-lifecycle.md | 54 ++++++++++++++++------------- 1 file changed, 30 insertions(+), 24 deletions(-) diff --git a/docs/architecture/data-lifecycle.md b/docs/architecture/data-lifecycle.md index 1e6b54e41..da2b08347 100644 --- a/docs/architecture/data-lifecycle.md +++ b/docs/architecture/data-lifecycle.md @@ -1,19 +1,23 @@ -# Task, Session, and Event Data Lifecycle +# Data Lifecycle -This document describes how kagent currently stores task, session, and event data. +This document describes the current state of data storage in kagent: what data is created, where it lives, and what happens when resources are deleted. -## Data Model +## Where Data Lives -kagent persists data in a relational database managed by [GORM](https://gorm.io/). The schema is defined in Go structs and tables are created via GORM's `AutoMigrate` at startup (no versioned migrations). +kagent stores data in two places: -### Tables +1. **Kubernetes (etcd)** — Agent, ToolServer, RemoteMCPServer, ModelConfig, ModelProviderConfig, and Memory custom resources are stored as CRDs managed by the Kubernetes API server. These follow standard Kubernetes lifecycle: they persist until explicitly deleted via `kubectl delete` or the API, and are subject to the cluster's etcd storage. + +2. **Relational database** — Runtime data (sessions, tasks, events, checkpoints, feedback) is stored in a relational database managed by [GORM](https://gorm.io/). Tables are created via `AutoMigrate` at startup (no versioned migrations). + +## Database Tables | Table | Primary Key | Description | |-------|-------------|-------------| -| `agent` | `id` | Agent configuration. Stores agent type and an optional JSON `config` blob. | +| `agent` | `id` | Agent configuration synced from CRDs. Stores agent type and an optional JSON `config` blob. | | `session` | `(id, user_id)` | A conversation context. Optionally linked to an agent via `agent_id`. | | `task` | `id` | An A2A task within a session. The full `protocol.Task` is JSON-serialized in the `data` column. | -| `event` | `(id, user_id)` | A message or event within a session. The `data` column stores a JSON-serialized `protocol.Message`. Indexed on `session_id`. | +| `event` | `(id, user_id)` | A message or event within a session. Stores a JSON-serialized `protocol.Message`. Indexed on `session_id`. | | `push_notification` | `id` | Push notification configuration for a task. Indexed on `task_id`. | | `feedback` | `(id, user_id)` | User feedback on agent responses. Has an `OnDelete:CASCADE` constraint on `message_id`. | | `tool` | `(id, server_name, group_kind)` | Tool metadata discovered from MCP servers. | @@ -33,12 +37,27 @@ Agent (1) ──── (*) Session (1) ──── (*) Task └──── (*) Event └──── (*) PushNotification ``` -There are no foreign key cascade constraints between sessions, tasks, and events. Deleting a session does not automatically delete its tasks or events. The only cascade constraint is `Feedback.MessageID` (`OnDelete:CASCADE`). +There are no foreign key cascade constraints between sessions, tasks, and events. The only cascade constraint is `Feedback.MessageID` (`OnDelete:CASCADE`). ### Write Semantics All `Store*` methods use GORM's `OnConflict{UpdateAll: true}` clause, giving upsert behavior — a record is created if it does not exist, or updated in place if it does. +## Deletion Behavior + +- **Deleting a session** (`DELETE /api/sessions/{id}`): Sets `deleted_at` on the session row. Associated tasks and events are **not** deleted or modified. +- **Deleting a task** (`DELETE /api/tasks/{id}`): Sets `deleted_at` on the task row. Push notifications for the task are **not** deleted. +- **Deleting a message**: Sets `deleted_at` on the event row. Feedback referencing the message **is** cascade-deleted (the only cascade in the schema). +- **Deleting a Kubernetes CRD** (e.g., `kubectl delete agent my-agent`): Removes the resource from etcd. The corresponding database `agent` row and its sessions/events are **not** automatically cleaned up. + +Soft-deleted rows remain in the database. They are excluded from queries by GORM's default scoping but are not physically removed. + +## Conversation History + +Conversation history is stored as `event` rows linked to a session via `session_id`. Each event contains a JSON-serialized `protocol.Message` (user messages, agent responses, tool calls). Events grow with each interaction and are never automatically pruned or rotated. + +LangGraph checkpoints (`lg_checkpoint`, `lg_checkpoint_write`) store intermediate agent state during task execution. These also accumulate over time with no automatic cleanup. + ## Default Storage Configuration ### Kubernetes (Helm) @@ -61,13 +80,9 @@ volumes: emptyDir: sizeLimit: 500Mi medium: Memory - -volumeMounts: - - name: sqlite-volume - mountPath: /sqlite-volume ``` -The database file is `/sqlite-volume/kagent.db`. Because `medium: Memory` maps to tmpfs, **all data is lost when the pod restarts or is rescheduled.** +The database file is `/sqlite-volume/kagent.db`. Because `medium: Memory` maps to tmpfs, all data is lost when the pod restarts or is rescheduled. ### PostgreSQL @@ -77,7 +92,7 @@ Setting `database.type: postgres` switches to an external PostgreSQL instance. T postgres://postgres:kagent@pgsql-postgresql.kagent.svc.cluster.local:5432/postgres ``` -### Configuration +### Configuration Reference | Source | Variable / Flag | Default | |--------|----------------|---------| @@ -90,16 +105,7 @@ postgres://postgres:kagent@pgsql-postgresql.kagent.svc.cluster.local:5432/postgr ## Data Retention -kagent has no built-in data retention, cleanup, or garbage collection. All rows grow indefinitely. - -The API exposes soft-delete endpoints: - -| Endpoint | Effect | -|----------|--------| -| `DELETE /api/sessions/{id}` | Sets `deleted_at` on the session row (does not cascade) | -| `DELETE /api/tasks/{id}` | Sets `deleted_at` on the task row | - -Soft-deleted rows remain on disk; they are excluded from queries but not removed. +kagent has no built-in data retention, cleanup, or garbage collection. All rows grow indefinitely until manually deleted through the API or direct database access. ## Related Files