From 52ac7c4d8f9c3045abe4c759d4d0b590e01deba8 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Victor=20Gonz=C3=A1lez-Pacheco?= <victor@fitizens.io>
Date: Sat, 23 May 2026 10:42:43 +0200
Subject: [PATCH 1/2] [#24] narrow bare except in ApiExecutor + topic_synth;
 doc xbrain diff in ARCHITECTURE
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Closes #24. Two changes in one PR:

1) ARCHITECTURE.md — new "Snapshot diffing" section explaining xbrain diff
(#18) end-to-end. The module was listed in "Where things live" but had no
narrative. Now covers: what the report contains (Items / Topics / Vocab /
Summary), the pure-Python TF cosine choice and rationale (no new deps,
IDF degenerates on N=2), the 5-member growth floor, JSON vs text output,
and how diff is the foundation for drift monitoring + WS3 eval (#8).

2) ApiExecutor.enrich_items + synthesize_overviews_api — narrow the
catch-everything except, count failures, raise on total-failure batches.

The pre-fix behaviour:
- `except Exception as exc:` swallowed every exception including
  AttributeError, KeyboardInterrupt, anthropic.AuthenticationError, etc.
- A 100%-broken run printed N warn-lines and exited 0. The CLI's
  downstream commands (xbrain topics, xbrain generate) silently consumed
  the empty result. The user got a stale wiki without a signal.

The fix:
- New `_recoverable_errors()` tuple per module:
  (anthropic.APIError, ValueError, json.JSONDecodeError, KeyError).
  `pydantic.ValidationError` is a ValueError subclass, covered.
  Lazy-imported because `anthropic` is optional in some test setups.
- Per-item / per-topic failure counter.
- `RuntimeError` raised when every input failed (`successes == 0 and
  failures > 0`) — CLI's _handle_cli_errors catches RuntimeError →
  clean exit-1.
- Partial-failure summary line printed to stderr (`enriched: N, failed: M`
  / `synthesized: N, failed: M`).
- Programmer bugs (AttributeError, ...) and KeyboardInterrupt now
  propagate — Ctrl-C works again during development, tracebacks surface.

Tests (4 new):
- `test_api_executor_skips_item_on_api_failure` (updated): uses
  anthropic.APIError + asserts the partial-failure summary line.
- `test_api_executor_raises_when_all_items_fail` (new): every call raises
  APIError → RuntimeError("All N items failed enrichment").
- `test_api_executor_propagates_programmer_bugs` (new): AttributeError
  propagates instead of being swallowed.
- Same three for synthesize_overviews_api in test_topic_synth.py.

Total: 352 tests (up from 348), coverage 89%, `uv run poe check` all-green.

PRD: vault/zz-support-files/docs/prds/2026-05-23-xbrain-24-narrow-bare-except.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 ARCHITECTURE.md             | 21 +++++++++++++++
 src/xbrain/executors/api.py | 37 +++++++++++++++++++++++--
 src/xbrain/topic_synth.py   | 45 ++++++++++++++++++++++++++++---
 tests/test_executors_api.py | 43 +++++++++++++++++++++++++++--
 tests/test_topic_synth.py   | 54 +++++++++++++++++++++++++++++++++++--
 5 files changed, 190 insertions(+), 10 deletions(-)
diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md
index b62164d..12ea400 100644
--- a/ARCHITECTURE.md
+++ b/ARCHITECTURE.md
@@ -383,6 +383,27 @@ The same executor model is used by `vocab` (with [`vocab.py`](src/xbrain/vocab.p
 
 ---
 
+## Snapshot diffing
+
+`xbrain diff <snap-a> [snap-b]` (default `snap-b` = live `data/`) compares two snapshot data directories and answers one question: **what moved between these two states?** Built on top of the snapshot lifecycle from issue #17 — without snapshots there is nothing to diff.
+
+The module ([`src/xbrain/diff.py`](src/xbrain/diff.py)) is a **pure orchestrator**: the only I/O is the three loader calls at entry (`load_store`, `load_vocab`, `load_topic_pages`); everything else is in-memory pydantic dataclasses. The CLI is the only thing that touches `typer.echo` — `diff_snapshots` returns a `DiffReport` and lets the caller render it.
+
+The report has four sections, each pinning a different axis of change:
+
+- **Items** — how many items were reassigned (`primary_topic` differs between A and B, both sides enriched), top N most-frequent transitions (`ai-coding → software-engineering: 12 items`), `None → topic` rows when an item gained enrichment between the two snapshots.
+- **Topics** — per-topic membership delta (added / removed / unchanged item ids), plus an overview-drift classification (`identical` / `similar` / `different` / `not_comparable`) using a pure-Python TF cosine over the two topic-page overview texts.
+- **Vocab** — slugs added, slugs removed, count of unchanged slugs. Rename detection is out of scope for v1 (a `delta` of `+1` added and `+1` removed is the user's cue).
+- **Summary** — top-level counts (items in both, enriched in both, reassigned, reassigned_pct, vocab churn, topic-page counts) — same fields the JSON-format consumers anchor on.
+
+**Pure-Python TF cosine**, not embeddings, not TF-IDF. Two reasons: (1) zero new dependencies (no scikit-learn, no sentence-transformers); (2) IDF degenerates on N=2 documents anyway, so plain TF gives the same `identical / similar / different` bucketing without the noise. The tokenizer covers Latin-1 accented characters (`à-ÿ`) so Spanish / French overviews compare correctly. Topics with fewer than 5 members never trigger a growth flag — a 2→3 jump is 50% growth but statistically meaningless on a tiny topic.
+
+**Output:** `--format text` (default, human-readable section blocks) or `--format json` (pydantic `model_dump_json`, stable schema for downstream consumers — the WS3 eval harness in issue #8 will read this).
+
+`xbrain diff` is also the foundation for **drift monitoring** between runs: take a snapshot, re-enrich, diff. A jump in `reassigned_pct` on a small corpus change is a signal that the prompt or model output is unstable; that is the eval-by-comparison question WS3 will formalise.
+
+---
+
 ## Invariants
 
 These are the rules the rest of the architecture rests on. Breaking any of them produces silent data corruption or makes the system unreproducible.
diff --git a/src/xbrain/executors/api.py b/src/xbrain/executors/api.py
index c6175e8..3936edc 100644
--- a/src/xbrain/executors/api.py
+++ b/src/xbrain/executors/api.py
@@ -8,6 +8,7 @@
 
 from __future__ import annotations
 
+import json
 import sys
 
 from xbrain.executors.base import EnrichmentJudgment
@@ -18,6 +19,26 @@
 _MAX_TOKENS = 600
 
 
+def _recoverable_errors() -> tuple[type[Exception], ...]:
+    """Exception classes a per-item failure should swallow + log + continue on.
+
+    `anthropic.APIError` covers auth, rate-limit, server-side and network
+    errors the SDK normalises. `ValueError` covers validator rejections and
+    `pydantic.ValidationError` (a `ValueError` subclass in pydantic v2).
+    `json.JSONDecodeError` covers a malformed LLM response. `KeyError` covers
+    a response missing an expected field.
+
+    Lazy-imported because `anthropic` is an optional dependency in the test
+    environment (the client is faked).
+    """
+    try:
+        from anthropic import APIError
+
+        return (APIError, ValueError, json.JSONDecodeError, KeyError)
+    except ImportError:
+        return (ValueError, json.JSONDecodeError, KeyError)
+
+
 def _vocab_block(vocab: list[Topic]) -> str:
     return "\n".join(f"- {t.slug}: {t.description}" for t in vocab)
 
@@ -82,7 +103,9 @@ def __init__(self, model: str, output_language: str, client=None):
 
     def enrich_items(self, items: list[Item], vocab: list[Topic]) -> list[EnrichmentJudgment]:
         system = _system_prompt(self._output_language)
+        recoverable = _recoverable_errors()
         results: list[EnrichmentJudgment] = []
+        failures = 0
         for item in items:
             try:
                 response = self._client.messages.create(
@@ -105,12 +128,22 @@ def enrich_items(self, items: list[Item], vocab: list[Topic]) -> list[Enrichment
                         topics=list(judgment["topics"]),
                     )
                 )
-            except Exception as exc:  # noqa: BLE001
+            except recoverable as exc:
                 # One transient/malformed response must not abort the batch:
-                # the item stays pending and is retried on the next run.
+                # the item stays pending and is retried on the next run. Note:
+                # programmer bugs (`AttributeError`, …) and `KeyboardInterrupt`
+                # are NOT in `recoverable` — they propagate so the developer
+                # sees the traceback and Ctrl-C still works.
+                failures += 1
                 print(
                     f"warn: enrichment failed for item {item.id}: {exc}",
                     file=sys.stderr,
                 )
                 continue
+        if items and not results and failures > 0:
+            raise RuntimeError(
+                f"All {failures} items failed enrichment; see warnings above for details."
+            )
+        if failures > 0:
+            print(f"enriched: {len(results)}, failed: {failures}", file=sys.stderr)
         return results
diff --git a/src/xbrain/topic_synth.py b/src/xbrain/topic_synth.py
index 0546ec2..2c68fec 100644
--- a/src/xbrain/topic_synth.py
+++ b/src/xbrain/topic_synth.py
@@ -77,19 +77,49 @@ def _judgment_from_response(response, slug: str) -> OverviewJudgment:
     )
 
 
+def _recoverable_errors() -> tuple[type[Exception], ...]:
+    """Exception classes a per-topic failure should swallow + log + continue on.
+
+    Mirrors `xbrain.executors.api._recoverable_errors`. See that helper for
+    rationale. `anthropic.APIError` is lazy-imported because `anthropic` is
+    optional in the test environment (the client is faked).
+    """
+    try:
+        import json as _json
+        from anthropic import APIError
+
+        return (APIError, ValueError, _json.JSONDecodeError, KeyError)
+    except ImportError:
+        import json as _json
+
+        return (ValueError, _json.JSONDecodeError, KeyError)
+
+
 def synthesize_overviews_api(
     inputs: list[TopicInput],
     model: str,
     output_language: str,
     client=None,
 ) -> list[OverviewJudgment]:
-    """Synthesize topic overviews via the Anthropic API — one call per topic."""
+    """Synthesize topic overviews via the Anthropic API — one call per topic.
+
+    A `RuntimeError` is raised when EVERY input topic fails — exposed as a
+    non-zero exit through the CLI's `_handle_cli_errors` wrapper. Partial
+    success (some topics synthesise, some fail) returns the successes and
+    prints a summary line to stderr.
+
+    Programmer bugs (`AttributeError`, ...) and `KeyboardInterrupt` are not
+    in the recoverable set — they propagate so the developer sees the
+    traceback and Ctrl-C still works.
+    """
     if client is None:
         from anthropic import Anthropic  # lazy: tests inject a fake
 
         client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
     system = _system_prompt(output_language)
+    recoverable = _recoverable_errors()
     results: list[OverviewJudgment] = []
+    failures = 0
     for topic in inputs:
         try:
             response = client.messages.create(
@@ -99,14 +129,21 @@ def synthesize_overviews_api(
                 messages=[{"role": "user", "content": _user_prompt(topic)}],
             )
             results.append(_judgment_from_response(response, topic.slug))
-        except Exception as exc:  # noqa: BLE001
-            # One failed topic must not abort the batch — it stays unsynthesized
-            # and is retried on the next run.
+        except recoverable as exc:
+            # One failed topic must not abort the batch — it stays
+            # unsynthesized and is retried on the next run.
+            failures += 1
             print(
                 f"warn: topic synthesis failed for {topic.slug}: {exc}",
                 file=sys.stderr,
             )
             continue
+    if inputs and not results and failures > 0:
+        raise RuntimeError(
+            f"All {failures} topics failed synthesis; see warnings above for details."
+        )
+    if failures > 0:
+        print(f"synthesized: {len(results)}, failed: {failures}", file=sys.stderr)
     return results
 
 
diff --git a/tests/test_executors_api.py b/tests/test_executors_api.py
index 63c8f31..c090994 100644
--- a/tests/test_executors_api.py
+++ b/tests/test_executors_api.py
@@ -101,13 +101,52 @@ def test_api_executor_skips_wrong_shape_response(capsys):
 
 def test_api_executor_skips_item_on_api_failure(capsys):
     # A transient API failure on one item must not abort the whole batch.
+    from anthropic import APIError
+
     client = FakeAnthropic(
         [
-            RuntimeError("503 service unavailable"),
+            APIError("503 service unavailable", request=None, body=None),
             {"summary": "r", "primary_topic": "misc", "topics": ["misc"]},
         ]
     )
     ex = ApiExecutor(model="m", output_language="English", client=client)
     out = ex.enrich_items([_item("1"), _item("2")], VOCAB)
     assert {j.item_id for j in out} == {"2"}
-    assert "enrichment failed for item 1" in capsys.readouterr().err
+    captured = capsys.readouterr().err
+    assert "enrichment failed for item 1" in captured
+    # Partial-failure summary line is visible on stderr
+    assert "enriched: 1, failed: 1" in captured
+
+
+def test_api_executor_raises_when_all_items_fail():
+    """An API key revocation / total outage must surface as non-zero exit, not
+    silent empty result. The CLI's _handle_cli_errors catches RuntimeError."""
+    import pytest
+    from anthropic import APIError
+
+    client = FakeAnthropic(
+        [
+            APIError("401 unauthorized", request=None, body=None),
+            APIError("401 unauthorized", request=None, body=None),
+        ]
+    )
+    ex = ApiExecutor(model="m", output_language="English", client=client)
+    with pytest.raises(RuntimeError, match="All 2 items failed enrichment"):
+        ex.enrich_items([_item("1"), _item("2")], VOCAB)
+
+
+def test_api_executor_propagates_programmer_bugs():
+    """`AttributeError` and other non-recoverable exceptions must NOT be
+    swallowed — the developer needs to see the traceback.
+    """
+    import pytest
+
+    class _Boom:
+        class messages:  # noqa: N801
+            @staticmethod
+            def create(*_args, **_kwargs):
+                raise AttributeError("programmer bug — undefined attribute")
+
+    ex = ApiExecutor(model="m", output_language="English", client=_Boom())
+    with pytest.raises(AttributeError, match="programmer bug"):
+        ex.enrich_items([_item("1")], VOCAB)
diff --git a/tests/test_topic_synth.py b/tests/test_topic_synth.py
index d2e0a40..4cc102f 100644
--- a/tests/test_topic_synth.py
+++ b/tests/test_topic_synth.py
@@ -54,14 +54,64 @@ def test_synthesize_overviews_api_skips_an_invalid_judgment():
     assert [r.slug for r in results] == ["good"]
 
 
-def test_synthesize_overviews_api_isolates_an_api_error():
-    client = FakeAnthropic([RuntimeError("API caída"), {"overview": "ok", "notes": []}])
+def test_synthesize_overviews_api_isolates_an_api_error(capsys):
+    from anthropic import APIError
+
+    client = FakeAnthropic(
+        [
+            APIError("API caída", request=None, body=None),
+            {"overview": "ok", "notes": []},
+        ]
+    )
     inputs = [
         TopicInput(slug="first", description="d", summaries=["s"]),
         TopicInput(slug="second", description="d", summaries=["s"]),
     ]
     results = synthesize_overviews_api(inputs, model="m", output_language="English", client=client)
     assert [r.slug for r in results] == ["second"]
+    # Partial-failure summary line is visible on stderr
+    assert "synthesized: 1, failed: 1" in capsys.readouterr().err
+
+
+def test_synthesize_overviews_api_raises_when_all_topics_fail():
+    """Total failure (every API call raises) must surface non-zero exit, not
+    silent empty result. The CLI's _handle_cli_errors catches RuntimeError."""
+    import pytest
+    from anthropic import APIError
+
+    client = FakeAnthropic(
+        [
+            APIError("401 unauthorized", request=None, body=None),
+            APIError("401 unauthorized", request=None, body=None),
+        ]
+    )
+    inputs = [
+        TopicInput(slug="a", description="d", summaries=["s"]),
+        TopicInput(slug="b", description="d", summaries=["s"]),
+    ]
+    with pytest.raises(RuntimeError, match="All 2 topics failed synthesis"):
+        synthesize_overviews_api(inputs, model="m", output_language="English", client=client)
+
+
+def test_synthesize_overviews_api_propagates_programmer_bugs():
+    """`AttributeError` and other non-recoverable exceptions must NOT be
+    swallowed — the developer needs to see the traceback.
+    """
+    import pytest
+
+    class _Boom:
+        class messages:  # noqa: N801
+            @staticmethod
+            def create(*_args, **_kwargs):
+                raise AttributeError("programmer bug — undefined attribute")
+
+    with pytest.raises(AttributeError, match="programmer bug"):
+        synthesize_overviews_api(
+            [TopicInput(slug="x", description="d", summaries=["s"])],
+            model="m",
+            output_language="English",
+            client=_Boom(),
+        )
 
 
 def test_overview_judgment_is_a_model():

From d19b171abf22940497af17a126de197af57009fb Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Victor=20Gonz=C3=A1lez-Pacheco?= <victor@fitizens.io>
Date: Sat, 23 May 2026 10:45:37 +0200
Subject: [PATCH 2/2] [#24] address review pipeline: SUMMARY prefix + 6 missing
 tests
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Addresses the actionable findings from PR #31's 3-reviewer panel
(code-reviewer SHIP IT; silent-failure-hunter + test-analyzer
flagged improvements):

silent-failure-hunter M2 — partial-failure summary line was
indistinguishable from the per-item `warn:` lines that precede it. A
1000-item batch with 50 failures buried the single summary at the end
of 50 noise lines.
- Prefix with `SUMMARY:` so the line stands out. Trivial.
- Existing partial-failure asserts updated accordingly.

test-analyzer gap #1 (rating 7) — KeyboardInterrupt propagation was
implicitly correct (KeyboardInterrupt is a BaseException, not Exception,
so the narrow tuple naturally excludes it) but no regression test
pinned it. A future refactor that widens the catch to BaseException
would silently break Ctrl-C.
- New `test_*_propagates_keyboard_interrupt` for both modules.

test-analyzer improvement #2 (rating 5) — no negative assert that the
all-failed branch does NOT print the summary line (it raises first; a
future refactor that moves the print above the raise would slip).
- New `test_*_emits_no_summary_on_total_failure` for both modules.

test-analyzer improvement #3 (rating 4) — no negative assert that the
all-succeed path stays silent on stderr.
- New `test_*_emits_no_summary_when_all_succeed` for both modules.

Skipped: silent-failure M1 (`except ImportError` opacity for a
misinstalled anthropic SDK). The reviewer themselves noted it is
"actually safe (loud failure)" since downstream code would propagate
the SDK error as a bug rather than catch it silently. Borderline,
non-blocker.

Total: 358 tests (up from 352), coverage 89%, `uv run poe check`
all-green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 src/xbrain/executors/api.py |  7 ++++-
 src/xbrain/topic_synth.py   |  7 ++++-
 tests/test_executors_api.py | 46 +++++++++++++++++++++++++++++++-
 tests/test_topic_synth.py   | 52 ++++++++++++++++++++++++++++++++++++-
 4 files changed, 108 insertions(+), 4 deletions(-)

diff --git a/src/xbrain/executors/api.py b/src/xbrain/executors/api.py
index 3936edc..2b18138 100644
--- a/src/xbrain/executors/api.py
+++ b/src/xbrain/executors/api.py
@@ -145,5 +145,10 @@ def enrich_items(self, items: list[Item], vocab: list[Topic]) -> list[Enrichment
                 f"All {failures} items failed enrichment; see warnings above for details."
             )
         if failures > 0:
-            print(f"enriched: {len(results)}, failed: {failures}", file=sys.stderr)
+            # SUMMARY prefix so the line is distinguishable from the per-item
+            # `warn:` lines that precede it in a partial-failure batch.
+            print(
+                f"SUMMARY: enriched: {len(results)}, failed: {failures}",
+                file=sys.stderr,
+            )
         return results
diff --git a/src/xbrain/topic_synth.py b/src/xbrain/topic_synth.py
index 2c68fec..94a7b41 100644
--- a/src/xbrain/topic_synth.py
+++ b/src/xbrain/topic_synth.py
@@ -143,7 +143,12 @@ def synthesize_overviews_api(
             f"All {failures} topics failed synthesis; see warnings above for details."
         )
     if failures > 0:
-        print(f"synthesized: {len(results)}, failed: {failures}", file=sys.stderr)
+        # SUMMARY prefix so the line stands out from the per-topic `warn:`
+        # lines that precede it in a partial-failure batch.
+        print(
+            f"SUMMARY: synthesized: {len(results)}, failed: {failures}",
+            file=sys.stderr,
+        )
     return results
 
 
diff --git a/tests/test_executors_api.py b/tests/test_executors_api.py
index c090994..c022aad 100644
--- a/tests/test_executors_api.py
+++ b/tests/test_executors_api.py
@@ -115,7 +115,7 @@ def test_api_executor_skips_item_on_api_failure(capsys):
     captured = capsys.readouterr().err
     assert "enrichment failed for item 1" in captured
     # Partial-failure summary line is visible on stderr
-    assert "enriched: 1, failed: 1" in captured
+    assert "SUMMARY: enriched: 1, failed: 1" in captured
 
 
 def test_api_executor_raises_when_all_items_fail():
@@ -150,3 +150,47 @@ def create(*_args, **_kwargs):
     ex = ApiExecutor(model="m", output_language="English", client=_Boom())
     with pytest.raises(AttributeError, match="programmer bug"):
         ex.enrich_items([_item("1")], VOCAB)
+
+
+def test_api_executor_propagates_keyboard_interrupt():
+    """Ctrl-C must NOT be swallowed by the recoverable-errors tuple. The
+    narrow catch uses Exception subclasses; KeyboardInterrupt inherits from
+    BaseException and falls through. This is the property of Python, but
+    pin it as a regression test — a future refactor that switches the
+    tuple to BaseException would silently break Ctrl-C without any failing
+    test.
+    """
+    import pytest
+
+    class _CtrlC:
+        class messages:  # noqa: N801
+            @staticmethod
+            def create(*_args, **_kwargs):
+                raise KeyboardInterrupt
+
+    ex = ApiExecutor(model="m", output_language="English", client=_CtrlC())
+    with pytest.raises(KeyboardInterrupt):
+        ex.enrich_items([_item("1")], VOCAB)
+
+
+def test_api_executor_emits_no_summary_on_total_failure(capsys):
+    """The all-failed branch raises before printing the summary — there is
+    no `SUMMARY: enriched: 0, ...` line on a total-failure run. The raised
+    RuntimeError is the signal."""
+    import pytest
+    from anthropic import APIError
+
+    client = FakeAnthropic([APIError("503", request=None, body=None)])
+    ex = ApiExecutor(model="m", output_language="English", client=client)
+    with pytest.raises(RuntimeError):
+        ex.enrich_items([_item("1")], VOCAB)
+    assert "SUMMARY:" not in capsys.readouterr().err
+
+
+def test_api_executor_emits_no_summary_when_all_succeed(capsys):
+    """No failures, no noise: a clean batch stays silent on stderr."""
+    payload = {"summary": "r", "primary_topic": "misc", "topics": ["misc"]}
+    client = FakeAnthropic([payload])
+    ex = ApiExecutor(model="m", output_language="English", client=client)
+    ex.enrich_items([_item("1")], VOCAB)
+    assert "SUMMARY:" not in capsys.readouterr().err
diff --git a/tests/test_topic_synth.py b/tests/test_topic_synth.py
index 4cc102f..67d4fdd 100644
--- a/tests/test_topic_synth.py
+++ b/tests/test_topic_synth.py
@@ -70,7 +70,7 @@ def test_synthesize_overviews_api_isolates_an_api_error(capsys):
     results = synthesize_overviews_api(inputs, model="m", output_language="English", client=client)
     assert [r.slug for r in results] == ["second"]
     # Partial-failure summary line is visible on stderr
-    assert "synthesized: 1, failed: 1" in capsys.readouterr().err
+    assert "SUMMARY: synthesized: 1, failed: 1" in capsys.readouterr().err
 
 
 def test_synthesize_overviews_api_raises_when_all_topics_fail():
@@ -179,3 +179,53 @@ def test_apply_overview_judgments_rejects_a_non_dict_entry():
     valid, invalid = apply_overview_judgments(judgments)
     assert [j.slug for j in valid] == ["good"]
     assert any("not a JSON object" in e for e in invalid[0][1])
+
+
+def test_synthesize_overviews_api_propagates_keyboard_interrupt():
+    """Ctrl-C must propagate — falls through the narrow catch because
+    KeyboardInterrupt inherits from BaseException, not Exception. Regression
+    test against a future refactor that widens the catch.
+    """
+    import pytest
+
+    class _CtrlC:
+        class messages:  # noqa: N801
+            @staticmethod
+            def create(*_args, **_kwargs):
+                raise KeyboardInterrupt
+
+    with pytest.raises(KeyboardInterrupt):
+        synthesize_overviews_api(
+            [TopicInput(slug="x", description="d", summaries=["s"])],
+            model="m",
+            output_language="English",
+            client=_CtrlC(),
+        )
+
+
+def test_synthesize_overviews_api_emits_no_summary_on_total_failure(capsys):
+    """The all-failed branch raises before printing the summary."""
+    import pytest
+    from anthropic import APIError
+
+    client = FakeAnthropic([APIError("503", request=None, body=None)])
+    with pytest.raises(RuntimeError):
+        synthesize_overviews_api(
+            [TopicInput(slug="x", description="d", summaries=["s"])],
+            model="m",
+            output_language="English",
+            client=client,
+        )
+    assert "SUMMARY:" not in capsys.readouterr().err
+
+
+def test_synthesize_overviews_api_emits_no_summary_when_all_succeed(capsys):
+    """No failures, no noise: a clean batch stays silent on stderr."""
+    client = FakeAnthropic([{"overview": "ok", "notes": []}])
+    synthesize_overviews_api(
+        [TopicInput(slug="x", description="d", summaries=["s"])],
+        model="m",
+        output_language="English",
+        client=client,
+    )
+    assert "SUMMARY:" not in capsys.readouterr().err