Multi repo hardening by jarmen423 · Pull Request #1 · jarmen423/codememory

jarmen423 · 2026-04-08T01:30:20Z

No description provided.

gemini-code-assist

Code Review

This pull request implements multi-repository support for CodeMemory by introducing repo_id partitioning within a shared Neo4j database. Key changes include migrating global uniqueness constraints to composite ones, updating ingestion passes to stamp nodes with repository identifiers, and enhancing search functionality with over-fetching and structural reranking to prevent cross-repo result pollution. Additionally, a new agent-authored memory graph system is introduced, allowing for the creation, search, and management of memory nodes and relations. Feedback focuses on improving maintainability by replacing magic numbers with named constants in the reranking logic and ensuring documentation portability by using relative file paths.

gemini-code-assist · 2026-04-08T01:34:10Z

+  - `D:\code\codememory\src\codememory\ingestion\graph.py`
+  - `D:\code\codememory\src\codememory\server\app.py`
+  - `D:\code\codememory\src\codememory\server\tools.py`


The documentation contains absolute local file paths. For better portability and to make it useful for all contributors, these should be replaced with relative paths from the repository root. For example, D:\code\codememory\src\codememory\ingestion\graph.py should be src/codememory/ingestion/graph.py. This applies to other absolute paths in this document as well (lines 86, 560-565).

Suggested change

- `D:\code\codememory\src\codememory\ingestion\graph.py`

- `D:\code\codememory\src\codememory\server\app.py`

- `D:\code\codememory\src\codememory\server\tools.py`

- `src/codememory/ingestion/graph.py`

- `src/codememory/server/app.py`

- `src/codememory/server/tools.py`

gemini-code-assist · 2026-04-08T01:34:10Z

+            calls_out, called_by, methods, file_imports, siblings
        """
+        active_repo = repo_id or self.repo_id
+        overfetch = limit * 3 if active_repo else limit


The overfetch multiplier 3 is a magic number. It would be better to define it as a named constant at the class level (e.g., RERANK_OVERFETCH_MULTIPLIER = 3) for better readability and easier configuration in the future.

Suggested change

overfetch = limit * 3 if active_repo else limit

overfetch = limit * self.RERANK_OVERFETCH_MULTIPLIER if active_repo else limit

gemini-code-assist · 2026-04-08T01:34:10Z

+        for r in results:
+            structural = 0.0
+            if r.get("calls_out"):
+                structural += 0.05  # has outgoing calls
+            if r.get("called_by"):
+                structural += 0.05  # is called by others — more central
+            if r.get("methods"):
+                structural += 0.03  # is a class with methods
+            r["final_score"] = r.get("score", 0.0) * 0.9 + structural
+        results.sort(key=lambda x: x["final_score"], reverse=True)
+        return results[:limit]


The reranking logic uses several magic numbers for calculating the structural bonus and final score (e.g., 0.9, 0.05, 0.03). To improve readability and maintainability, it would be better to define these as named constants at the class level (e.g., RERANK_VECTOR_WEIGHT = 0.9, RERANK_BONUS_CALLS_OUT = 0.05, etc.).

Suggested change

for r in results:

structural = 0.0

if r.get("calls_out"):

structural += 0.05 # has outgoing calls

if r.get("called_by"):

structural += 0.05 # is called by others — more central

if r.get("methods"):

structural += 0.03 # is a class with methods

r["final_score"] = r.get("score", 0.0) * 0.9 + structural

results.sort(key=lambda x: x["final_score"], reverse=True)

return results[:limit]

for r in results:

structural = 0.0

if r.get("calls_out"):

structural += self.RERANK_BONUS_CALLS_OUT

if r.get("called_by"):

structural += self.RERANK_BONUS_CALLED_BY

if r.get("methods"):

structural += self.RERANK_BONUS_METHODS

r["final_score"] = r.get("score", 0.0) * self.RERANK_VECTOR_WEIGHT + structural

results.sort(key=lambda x: x["final_score"], reverse=True)

return results[:limit]

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4dd83129bb

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-08T01:37:01Z

+            entity_names: List[str] = []
+            with self.driver.session() as session:
+                for entity in normalized_entities:
+                    embedding = self._get_memory_embedding_or_none(entity["embedding_text"])


Preserve embeddings when observations are omitted

When create_memory_entities() updates an existing node with observations omitted/empty, the ON MATCH branch intentionally keeps prior m.observations and m.observation_text, but this line still computes a new embedding from the empty payload and then overwrites m.embedding. In that metadata-only update path, vector search will rank the entity using an embedding that no longer matches its stored observations, causing silent relevance regressions until a manual backfill is run.

Useful? React with 👍 / 👎.

Copilot

Pull request overview

This PR hardens CodeMemory for multi-repository usage within a single Neo4j database by introducing repo_id partitioning, repo-scoped constraints, and repo-filtered retrieval (including graph-enriched semantic search and new memory CRUD/search MCP tools).

Changes:

Add repo_id scoping across ingestion passes, composite uniqueness constraints, and repo-filtered code search with over-fetch + reranking.
Introduce memory graph CRUD/search/backfill capabilities exposed via MCP tools and support them with Neo4j schema (constraints/indexes).
Update CLI/status queries, tests, and documentation to propagate/describe repo scoping and new tools.

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
tests/test_server.py	Expands semantic search formatting expectations and adds MCP memory tool tests.
tests/test_graph.py	Adds repo-scoped builder fixture and tests for repo-filtering + memory graph operations.
tests/test_cli.py	Verifies repo-scoped `status` queries and ensures repo_root is passed into builder in CLI commands.
src/codememory/server/tools.py	Enhances semantic search output to include graph context fields and adds memory method passthroughs.
src/codememory/server/app.py	Passes `repo_root` into graph initialization, initializes memory schema, and adds MCP memory tools + formatting helpers.
src/codememory/ingestion/graph.py	Core multi-repo partitioning: `repo_id`, composite constraints, repo-scoped ingestion passes, graph-enriched semantic search + rerank, and full memory CRUD/search/backfill implementation.
src/codememory/cli.py	Makes `status/search/deps/impact` propagate `repo_root` and scopes `status` counts by `repo_id`.
README.md	Updates git command examples and lists new memory MCP tools.
docs/TROUBLESHOOTING.md	Updates git command troubleshooting steps to match new flags.
docs/plans/ultraplan-4.5.26.md	Adds a detailed implementation plan for repo partitioning/migration and retrieval changes.
docs/MULTI_CODEBASE_RESEARCH.md	Adds research notes on multi-codebase GraphRAG approaches and repo partitioning rationale.
docs/MULTI_CODEBASE_RESEARCH_REPORT.md	Adds distilled research report with recommendations and validated environment constraints.
docs/MCP_INTEGRATION.md	Updates git-sync example usage.
docs/GIT_GRAPH.md	Aligns git-init/git-sync examples and option lists with current CLI.
docs/FIELD_TEST_TEMPLATE.md	Updates field-test checklist to new git-sync workflow/flags.
docs/ARCHITECTURE.md	Documents repo_id partitioning identity model and retrieval behavior.
docs/API.md	Updates API docs for `semantic_search()` repo filtering, reranking, and memory search behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-08T01:37:17Z

+    def setup_memory_schema(self):
+        """Create constraints and indexes for agent-authored memory entities."""
+        if self.repo_id:
+            name_constraint = (
+                "DROP CONSTRAINT memory_name_unique IF EXISTS",
+                "CREATE CONSTRAINT memory_repo_name_unique IF NOT EXISTS "
+                f"FOR (m:{self.MEMORY_LABEL}) REQUIRE (m.repo_id, m.name) IS UNIQUE",
+            )
+        else:
+            name_constraint = (
+                "CREATE CONSTRAINT memory_name_unique IF NOT EXISTS "
+                f"FOR (m:{self.MEMORY_LABEL}) REQUIRE m.name IS UNIQUE",
+            )


When running with repo_id enabled, setup_memory_schema() drops the legacy unique constraint and creates the composite (repo_id, name) constraint, but it never backfills repo_id onto existing :Memory nodes. This can leave pre-migration Memory nodes unreachable by repo-scoped reads (search_memory_nodes/read_memory_graph filter by repo_id) and can also allow duplicate names to coexist (old node without repo_id + new node with repo_id). Consider invoking the existing _backfill_repo_id() (or a memory-only backfill) before applying the composite constraint when repo_id is set.

Copilot · 2026-04-08T01:37:17Z

+        "score": 0.92,        # raw vector similarity (0–1)
+        "final_score": 0.94,  # 0.9×vector_score + structural_bonus
        "text": "def authenticate(username, password):..."
    },
    ...
 ]
 ```

+- `final_score` is always present when `repo_id` filtering is active (via `_rerank_results()`).
+


The documentation claims final_score is only present when repo_id filtering is active, and that score is always raw vector similarity (0–1). In the implementation (KnowledgeGraphBuilder._rerank_results), final_score is added for every result (including when repo_id is not set), and when the vector path falls back to fulltext search the returned score comes from Neo4j fulltext scoring (not guaranteed to be 0–1). Please update the contract here to match actual behavior, or adjust the code to only add final_score / only rerank when repo filtering is active.

Copilot · 2026-04-08T01:37:17Z

 """Tests for the KnowledgeGraphBuilder module."""

-import pytest
+from pathlib import Path
 from unittest.mock import Mock, patch

+import pytest
+


Path is imported from pathlib but not used anywhere in this test module. Removing the unused import will keep the test file clean and avoid lint failures if unused-import checks are enabled.

jarmen423 added 2 commits April 7, 2026 21:02

Add native memory graph tooling and research notes

7f37d3f

Harden multi-repo repo_id scoping

4dd8312

Copilot AI review requested due to automatic review settings April 8, 2026 01:30

Copilot started reviewing on behalf of jarmen423 April 8, 2026 01:30 View session

jarmen423 merged commit e12bfe9 into main Apr 8, 2026
3 of 7 checks passed

gemini-code-assist Bot reviewed Apr 8, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed Apr 8, 2026

View reviewed changes

Copilot AI reviewed Apr 8, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Multi repo hardening#1

Multi repo hardening#1
jarmen423 merged 2 commits into
mainfrom
multi-repo-hardening

jarmen423 commented Apr 8, 2026

Uh oh!

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Apr 8, 2026

Uh oh!

gemini-code-assist Bot Apr 8, 2026

Uh oh!

gemini-code-assist Bot Apr 8, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Apr 8, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 8, 2026

Uh oh!

Copilot AI Apr 8, 2026

Uh oh!

Copilot AI Apr 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	overfetch = limit * 3 if active_repo else limit
	overfetch = limit * self.RERANK_OVERFETCH_MULTIPLIER if active_repo else limit

Conversation

jarmen423 commented Apr 8, 2026

Uh oh!

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants