Skip to content

fix: make entity_registry.research() local-only by default#811

Merged
bensig merged 2 commits intoMemPalace:developfrom
Kesshite:fix/security-wikipedia-ssrf
Apr 15, 2026
Merged

fix: make entity_registry.research() local-only by default#811
bensig merged 2 commits intoMemPalace:developfrom
Kesshite:fix/security-wikipedia-ssrf

Conversation

@Kesshite
Copy link
Copy Markdown
Contributor

Summary

  • research() previously called _wikipedia_lookup() unconditionally, sending entity names to en.wikipedia.org on every uncached lookup — violating the project's local-first and privacy by architecture principles (CLAUDE.md)
  • research() is now local-only by default: returns "unknown" for uncached words without any network call
  • New allow_network=True parameter required to explicitly opt in to Wikipedia lookups
  • Wikipedia 404 now returns "unknown" (confidence 0.3) instead of asserting "person" (confidence 0.70), preventing entity registry poisoning with false positives

Refs: #809 (Finding 1)

What changed

mempalace/entity_registry.py

  • research() — added allow_network: bool = False parameter; uncached words return "unknown" immediately when allow_network is False
  • _wikipedia_lookup() — 404 handler now returns inferred_type: "unknown" instead of "person"; added privacy warning docstring

tests/test_entity_registry.py

  • test_research_local_only_by_default — verifies no network call happens by default
  • test_research_with_allow_network — verifies Wikipedia is called when opted in
  • test_research_local_only_not_cached — verifies local-only results are not persisted to wiki_cache
  • test_wikipedia_404_returns_unknown — verifies 404 returns "unknown", not "person"
  • Updated existing tests to pass allow_network=True where they intentionally test the Wikipedia path

Test plan

  • pytest tests/test_entity_registry.py -v — 29/29 passed
  • pytest tests/ -v --ignore=tests/benchmarks — 689 passed, 3 failed (pre-existing: version mismatch + KG re-add bug)
  • ruff check — all checks passed
  • ruff format --check — all files formatted
  • No new dependencies added
  • Verify no other module calls research() without allow_network (confirmed: only tests call it, and they already mock _wikipedia_lookup)

research() previously called _wikipedia_lookup() unconditionally,
sending entity names to en.wikipedia.org on every uncached lookup.
This violates the project's local-first and privacy-by-architecture
principles documented in CLAUDE.md.

Changes:
- research() now returns "unknown" for uncached words by default
- New allow_network=True parameter required for Wikipedia lookups
- Wikipedia 404 now returns "unknown" instead of asserting "person"
  with 0.70 confidence, preventing entity registry poisoning
- Added privacy warning docstring to _wikipedia_lookup()
- Added tests for local-only default, opt-in network, 404 handling,
  and cache-not-persisted-on-local-only behaviour

Refs: MemPalace#809
- Use .get() instead of .setdefault() for cache reads in research()
  so the local-only path never mutates _data unnecessarily
- Move .setdefault() to the network-write path only
- Use result.setdefault() for word/confirmed keys to ensure
  consistent return shape across all _wikipedia_lookup error paths
- Extract duplicated mock_result dict into _MOCK_SAOIRSE_PERSON
  constant shared by 3 test functions
@Kesshite Kesshite force-pushed the fix/security-wikipedia-ssrf branch from 53da579 to 01ead58 Compare April 14, 2026 11:03
Copy link
Copy Markdown
Collaborator

@bensig bensig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code reviewed — no issues found. CLAUDE.md compliance verified.

@bensig bensig merged commit f20f45a into MemPalace:develop Apr 15, 2026
6 checks passed
igorls added a commit that referenced this pull request Apr 16, 2026
Advisor caught: initial boundary (962776c..develop) skipped PRs that
landed on develop after v3.3.0 tag but before the sync-back merge.
Adds entries for #871 MEMPAL_VERBOSE, #811 research() local-only
default, #866 init .gitignore, #864 MCP stdout redirect, #863
precompact hook, #865 searcher empty results, #831 cold-start palace,
#862 init help, #815 Slack provenance, #840 save hook auto-mine.
Also drops the awkward caveat on #846 created_at — it's post-v3.3.0.
@igorls igorls mentioned this pull request Apr 16, 2026
8 tasks
shafdev pushed a commit to shafdev/mempalace that referenced this pull request Apr 17, 2026
Advisor caught: initial boundary (962776c..develop) skipped PRs that
landed on develop after v3.3.0 tag but before the sync-back merge.
Adds entries for MemPalace#871 MEMPAL_VERBOSE, MemPalace#811 research() local-only
default, MemPalace#866 init .gitignore, MemPalace#864 MCP stdout redirect, MemPalace#863
precompact hook, MemPalace#865 searcher empty results, MemPalace#831 cold-start palace,
MemPalace#862 init help, MemPalace#815 Slack provenance, MemPalace#840 save hook auto-mine.
Also drops the awkward caveat on MemPalace#846 created_at — it's post-v3.3.0.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/kg Knowledge graph bug Something isn't working security Security related

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants