Skip to content

Multi repo hardening#1

Merged
jarmen423 merged 2 commits into
mainfrom
multi-repo-hardening
Apr 8, 2026
Merged

Multi repo hardening#1
jarmen423 merged 2 commits into
mainfrom
multi-repo-hardening

Conversation

@jarmen423
Copy link
Copy Markdown
Owner

No description provided.

Copilot AI review requested due to automatic review settings April 8, 2026 01:30
@jarmen423 jarmen423 merged commit e12bfe9 into main Apr 8, 2026
3 of 7 checks passed
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements multi-repository support for CodeMemory by introducing repo_id partitioning within a shared Neo4j database. Key changes include migrating global uniqueness constraints to composite ones, updating ingestion passes to stamp nodes with repository identifiers, and enhancing search functionality with over-fetching and structural reranking to prevent cross-repo result pollution. Additionally, a new agent-authored memory graph system is introduced, allowing for the creation, search, and management of memory nodes and relations. Feedback focuses on improving maintainability by replacing magic numbers with named constants in the reranking logic and ensuring documentation portability by using relative file paths.

Comment on lines +35 to +37
- `D:\code\codememory\src\codememory\ingestion\graph.py`
- `D:\code\codememory\src\codememory\server\app.py`
- `D:\code\codememory\src\codememory\server\tools.py`
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The documentation contains absolute local file paths. For better portability and to make it useful for all contributors, these should be replaced with relative paths from the repository root. For example, D:\code\codememory\src\codememory\ingestion\graph.py should be src/codememory/ingestion/graph.py. This applies to other absolute paths in this document as well (lines 86, 560-565).

Suggested change
- `D:\code\codememory\src\codememory\ingestion\graph.py`
- `D:\code\codememory\src\codememory\server\app.py`
- `D:\code\codememory\src\codememory\server\tools.py`
- `src/codememory/ingestion/graph.py`
- `src/codememory/server/app.py`
- `src/codememory/server/tools.py`

calls_out, called_by, methods, file_imports, siblings
"""
active_repo = repo_id or self.repo_id
overfetch = limit * 3 if active_repo else limit
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The overfetch multiplier 3 is a magic number. It would be better to define it as a named constant at the class level (e.g., RERANK_OVERFETCH_MULTIPLIER = 3) for better readability and easier configuration in the future.

Suggested change
overfetch = limit * 3 if active_repo else limit
overfetch = limit * self.RERANK_OVERFETCH_MULTIPLIER if active_repo else limit

Comment on lines +1478 to +1488
for r in results:
structural = 0.0
if r.get("calls_out"):
structural += 0.05 # has outgoing calls
if r.get("called_by"):
structural += 0.05 # is called by others — more central
if r.get("methods"):
structural += 0.03 # is a class with methods
r["final_score"] = r.get("score", 0.0) * 0.9 + structural
results.sort(key=lambda x: x["final_score"], reverse=True)
return results[:limit]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The reranking logic uses several magic numbers for calculating the structural bonus and final score (e.g., 0.9, 0.05, 0.03). To improve readability and maintainability, it would be better to define these as named constants at the class level (e.g., RERANK_VECTOR_WEIGHT = 0.9, RERANK_BONUS_CALLS_OUT = 0.05, etc.).

Suggested change
for r in results:
structural = 0.0
if r.get("calls_out"):
structural += 0.05 # has outgoing calls
if r.get("called_by"):
structural += 0.05 # is called by others — more central
if r.get("methods"):
structural += 0.03 # is a class with methods
r["final_score"] = r.get("score", 0.0) * 0.9 + structural
results.sort(key=lambda x: x["final_score"], reverse=True)
return results[:limit]
for r in results:
structural = 0.0
if r.get("calls_out"):
structural += self.RERANK_BONUS_CALLS_OUT
if r.get("called_by"):
structural += self.RERANK_BONUS_CALLED_BY
if r.get("methods"):
structural += self.RERANK_BONUS_METHODS
r["final_score"] = r.get("score", 0.0) * self.RERANK_VECTOR_WEIGHT + structural
results.sort(key=lambda x: x["final_score"], reverse=True)
return results[:limit]

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4dd83129bb

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

entity_names: List[str] = []
with self.driver.session() as session:
for entity in normalized_entities:
embedding = self._get_memory_embedding_or_none(entity["embedding_text"])
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve embeddings when observations are omitted

When create_memory_entities() updates an existing node with observations omitted/empty, the ON MATCH branch intentionally keeps prior m.observations and m.observation_text, but this line still computes a new embedding from the empty payload and then overwrites m.embedding. In that metadata-only update path, vector search will rank the entity using an embedding that no longer matches its stored observations, causing silent relevance regressions until a manual backfill is run.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens CodeMemory for multi-repository usage within a single Neo4j database by introducing repo_id partitioning, repo-scoped constraints, and repo-filtered retrieval (including graph-enriched semantic search and new memory CRUD/search MCP tools).

Changes:

  • Add repo_id scoping across ingestion passes, composite uniqueness constraints, and repo-filtered code search with over-fetch + reranking.
  • Introduce memory graph CRUD/search/backfill capabilities exposed via MCP tools and support them with Neo4j schema (constraints/indexes).
  • Update CLI/status queries, tests, and documentation to propagate/describe repo scoping and new tools.

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
tests/test_server.py Expands semantic search formatting expectations and adds MCP memory tool tests.
tests/test_graph.py Adds repo-scoped builder fixture and tests for repo-filtering + memory graph operations.
tests/test_cli.py Verifies repo-scoped status queries and ensures repo_root is passed into builder in CLI commands.
src/codememory/server/tools.py Enhances semantic search output to include graph context fields and adds memory method passthroughs.
src/codememory/server/app.py Passes repo_root into graph initialization, initializes memory schema, and adds MCP memory tools + formatting helpers.
src/codememory/ingestion/graph.py Core multi-repo partitioning: repo_id, composite constraints, repo-scoped ingestion passes, graph-enriched semantic search + rerank, and full memory CRUD/search/backfill implementation.
src/codememory/cli.py Makes status/search/deps/impact propagate repo_root and scopes status counts by repo_id.
README.md Updates git command examples and lists new memory MCP tools.
docs/TROUBLESHOOTING.md Updates git command troubleshooting steps to match new flags.
docs/plans/ultraplan-4.5.26.md Adds a detailed implementation plan for repo partitioning/migration and retrieval changes.
docs/MULTI_CODEBASE_RESEARCH.md Adds research notes on multi-codebase GraphRAG approaches and repo partitioning rationale.
docs/MULTI_CODEBASE_RESEARCH_REPORT.md Adds distilled research report with recommendations and validated environment constraints.
docs/MCP_INTEGRATION.md Updates git-sync example usage.
docs/GIT_GRAPH.md Aligns git-init/git-sync examples and option lists with current CLI.
docs/FIELD_TEST_TEMPLATE.md Updates field-test checklist to new git-sync workflow/flags.
docs/ARCHITECTURE.md Documents repo_id partitioning identity model and retrieval behavior.
docs/API.md Updates API docs for semantic_search() repo filtering, reranking, and memory search behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +525 to +537
def setup_memory_schema(self):
"""Create constraints and indexes for agent-authored memory entities."""
if self.repo_id:
name_constraint = (
"DROP CONSTRAINT memory_name_unique IF EXISTS",
"CREATE CONSTRAINT memory_repo_name_unique IF NOT EXISTS "
f"FOR (m:{self.MEMORY_LABEL}) REQUIRE (m.repo_id, m.name) IS UNIQUE",
)
else:
name_constraint = (
"CREATE CONSTRAINT memory_name_unique IF NOT EXISTS "
f"FOR (m:{self.MEMORY_LABEL}) REQUIRE m.name IS UNIQUE",
)
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When running with repo_id enabled, setup_memory_schema() drops the legacy unique constraint and creates the composite (repo_id, name) constraint, but it never backfills repo_id onto existing :Memory nodes. This can leave pre-migration Memory nodes unreachable by repo-scoped reads (search_memory_nodes/read_memory_graph filter by repo_id) and can also allow duplicate names to coexist (old node without repo_id + new node with repo_id). Consider invoking the existing _backfill_repo_id() (or a memory-only backfill) before applying the composite constraint when repo_id is set.

Copilot uses AI. Check for mistakes.
Comment thread docs/API.md
Comment on lines +878 to +887
"score": 0.92, # raw vector similarity (0–1)
"final_score": 0.94, # 0.9×vector_score + structural_bonus
"text": "def authenticate(username, password):..."
},
...
]
```

- `final_score` is always present when `repo_id` filtering is active (via `_rerank_results()`).

Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation claims final_score is only present when repo_id filtering is active, and that score is always raw vector similarity (0–1). In the implementation (KnowledgeGraphBuilder._rerank_results), final_score is added for every result (including when repo_id is not set), and when the vector path falls back to fulltext search the returned score comes from Neo4j fulltext scoring (not guaranteed to be 0–1). Please update the contract here to match actual behavior, or adjust the code to only add final_score / only rerank when repo filtering is active.

Copilot uses AI. Check for mistakes.
Comment thread tests/test_graph.py
Comment on lines 1 to +7
"""Tests for the KnowledgeGraphBuilder module."""

import pytest
from pathlib import Path
from unittest.mock import Mock, patch

import pytest

Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Path is imported from pathlib but not used anywhere in this test module. Removing the unused import will keep the test file clean and avoid lint failures if unused-import checks are enabled.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants