Improve topic_key generation: sorted segments + hierarchical sub-paths #118

@carlosjortiz

📋 Pre-flight Checks

  • I have searched existing issues and this is not a duplicate
  • I understand this issue needs status:approved before a PR can be opened

🔍 Problem Description

The current topic_key generation in SuggestTopicKey() has two limitations:

1. Word order sensitivity causes silent duplicates

The segment portion of the key preserves the original word order from the title. This means observations about the same topic can produce different keys depending on phrasing:

  • "Auth model refactor" → architecture/auth-model-refactor
  • "Refactor auth model" → architecture/refactor-auth-model

Since topic_key drives upsert logic, these become two separate observations instead of updating the same one. Over time this silently fragments memory, especially with AI agents as primary writers.

2. Flat structure limits organization

The current format is family/segment (e.g., architecture/auth-model). There's no way to group related observations under a domain:

  • architecture/auth-model and architecture/auth-middleware have no explicit relationship
  • Querying "everything about auth architecture" requires FTS, not a simple prefix match

💡 Proposed Solution

A. Sort words alphabetically in segment generation

In normalizeTopicSegment(), sort the words before joining:

words := strings.Fields(strings.ToLower(segment))
sort.Strings(words)
segment = strings.Join(words, "-")

Before: "Auth model refactor" → auth-model-refactor
After: "Auth model refactor" → auth-model-refactor (same)
After: "Refactor auth model" → auth-model-refactor (same! was different before)

This makes upsert matching robust regardless of how the agent phrases the title.
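
To make the effect concrete, here is a minimal, runnable sketch of the sorted-segment idea. It is not the project's actual normalizeTopicSegment(); the signature and the choice to split on any non-alphanumeric character (rather than only whitespace) are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"unicode"
)

// normalizeTopicSegment is a hypothetical stand-in for the real function.
// It lowercases the input, splits it into words, sorts the words
// alphabetically, and joins them with hyphens.
func normalizeTopicSegment(segment string) string {
	// Assumption: treat every non-letter, non-digit rune as a separator,
	// so "auth-model refactor" and "auth model refactor" split the same way.
	words := strings.FieldsFunc(strings.ToLower(segment), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	sort.Strings(words)
	return strings.Join(words, "-")
}

func main() {
	// Both phrasings collapse to the same segment.
	fmt.Println(normalizeTopicSegment("Auth model refactor")) // auth-model-refactor
	fmt.Println(normalizeTopicSegment("Refactor auth model")) // auth-model-refactor
}
```

Because the words are sorted after lowercasing, any permutation of the same words yields an identical segment, which is what lets the upsert match an existing observation.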

B. Support hierarchical sub-paths (family/domain/segment)

Allow an optional middle level in the key hierarchy:

architecture/auth/model
architecture/auth/middleware
bug/payments/nil-panic
decision/api/versioning-strategy

This enables prefix queries like:

SELECT * FROM observations WHERE topic_key LIKE 'architecture/auth/%'

No schema change required: topic_key is already a TEXT column. This is purely a convention + generation logic change.
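
As a rough sketch of how the convention could look in code, the helpers below build a key with an optional domain level and derive a LIKE pattern for prefix queries. Both buildTopicKey and prefixPattern are hypothetical names, not existing functions in the project, and the backslash escaping assumes the query is run with an explicit ESCAPE '\' clause.

```go
package main

import (
	"fmt"
	"strings"
)

// buildTopicKey (hypothetical) joins family, an optional domain, and the
// segment with "/". An empty domain falls back to the flat family/segment form.
func buildTopicKey(family, domain, segment string) string {
	parts := []string{family}
	if domain != "" {
		parts = append(parts, domain)
	}
	parts = append(parts, segment)
	return strings.Join(parts, "/")
}

// prefixPattern (hypothetical) turns a key prefix into a LIKE pattern that
// matches everything beneath it. LIKE wildcards inside the prefix are escaped
// with a backslash, so the query must declare ESCAPE '\'.
func prefixPattern(prefix string) string {
	escaper := strings.NewReplacer(`\`, `\\`, `%`, `\%`, `_`, `\_`)
	return escaper.Replace(strings.TrimSuffix(prefix, "/")) + "/%"
}

func main() {
	fmt.Println(buildTopicKey("architecture", "auth", "model")) // architecture/auth/model
	fmt.Println(buildTopicKey("architecture", "", "auth-model")) // architecture/auth-model
	fmt.Println(prefixPattern("architecture/auth"))              // architecture/auth/%
}
```

The pattern would then be passed as a bound parameter, for example `SELECT * FROM observations WHERE topic_key LIKE ? ESCAPE '\'`, rather than being concatenated into the SQL string.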

📦 Affected Area

Store (database, queries)

🔄 Alternatives Considered

Considered adding a tags column with a junction table for flexible categorization. Discarded because the primary writers are AI agents, which are inconsistent at tagging (e.g., auth vs authentication vs login). Without a human curation loop, tags would add noise rather than improve search. FTS5 + topic_key already covers the search use case without the extra schema complexity.

📎 Additional Context

Recommendation

I recommend implementing both solutions together — they complement each other well:

  • A (sorted segments) eliminates silent duplicates at the source
  • B (sub-paths) adds the organizational layer that makes prefix queries useful

Implementing them separately would work, but together they deliver a cohesive improvement to topic_key reliability and discoverability.

Volunteer

I'm happy to take this on and submit a PR for both changes. The scope is well-defined and contained.

Impact

  • Reduces duplicate observations caused by inconsistent agent phrasing
  • Improves memory organization with hierarchical grouping
  • Zero schema changes — works within existing TEXT column
  • Low implementation effort — core change is ~10 lines in normalizeTopicSegment() + convention updates
