dofliu · dofliu · Jun 15, 2026 · Jun 15, 2026
diff --git a/docs/C3_NARRATION_AB_PROPOSAL.md b/docs/C3_NARRATION_AB_PROPOSAL.md
@@ -0,0 +1,76 @@
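+# C-3 Narration Model Migration A/B Proposal (2.5-flash → 3.x)
+
+> Corresponds to C-3 in `docs/PRODUCT_READINESS.md` (🟡 GATE, requires quota to verify quality).
+> Status: **The A/B tool is ready (offline). Waiting for Teacher Liu to run it locally with quota → review quality → decide whether to switch.**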
+
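+## 1. Why change
+
+- `slide_ingest.py:43` hardcodes the narration model as `MODEL = "gemini-2.5-flash"` (**slated for deprecation**, EOL risk).
+- On the M axis (role registry `core/models.py`), `text.fast` already defaults to `gemini-3.5-flash`, but the narration chokepoint
+  **does not go through `resolve()` yet** (M-2 deliberately deferred this part to C-3, because switching directly would silently migrate narration from 2.5 to 3.5,
+  and shipping without quality verification is unacceptable).
+- `3.5-flash` was verified to accept `thinking_budget=0` (narration generation already runs with thinking off). The switch is technically feasible; **only quality verification is missing**.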
+
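+## 2. Why this is a GATE (the routine does not switch on its own)
+
+Switching models **burns your Gemini quota** and **affects narration quality in every video** (correctness / fluency / complete endings /
+depth of explanation). This is a subjective quality call, so **you must run it locally with quota and read the output yourself**. The routine only prepares the "runnable tool +
+switch steps"; it does not call the real API or change the default on its own.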
+
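+## 3. A/B tool (ready, offline)
+
+`tools/ab_narration.py`: for **the same pages of the same deck**, it generates narration **once with the old model and once with the candidate model, and outputs them side by side**.
+
+- **Runs narration generation only**: no chapter detection / TTS / ffmpeg / full render, so it uses **far less quota** than rendering two full videos.
+- **Does not change the production pipeline default**: it only injects a different `model` for the same page and calls the **real** `narrate_page_with_gemini`
+  (prompt, three-stage retry, `thinking_budget=0`, and `_clean_narration` all match production, so there is **no prompt drift**).
+  To support this, `narrate_page_with_gemini` gained an **optional `model` parameter** (default remains `MODEL`; zero impact on the production pipeline).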
+
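+### How to run (on your machine, with `GEMINI_API_KEY` set)
+
+```bash
+export GEMINI_API_KEY=...          # do not paste this into any file that gets committed
+python tools/ab_narration.py your_deck.pdf \
+    --pages 1,3,5 \
+    --models gemini-2.5-flash,gemini-3.5-flash \
+    --out ab_narration_report.md
+```
+
+Pick **a few representative pages** (a text-only page + a page with formulas/charts + a chapter-transition page); there is no need to run the whole deck.
+The output `ab_narration_report.md` shows the two models side by side plus a per-model character-usage summary.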
+
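+## 4. Decision criteria (check against the report)
+
+Compare the two narration columns page by page; 3.x must be **no worse than** 2.5 to justify migrating:
+
+| Aspect | What to look for |
+|---|---|
+| **Correctness** | Whether formulas / numbers / technical terms are stated wrongly (most important; ties in with the review-gate selling point) |
+| **Complete endings** | Whether sentences are cut off, and how often a retry is needed (fewer retries on 3.x = cheaper) |
+| **Fluency** | Whether the Chinese reads naturally, with a spoken, explanatory tone |
+| **Depth/length** | Whether the level of detail is comparable to 2.5 (not so short or so long that it drags the video) |
+| **Cost** | Relative magnitude from the character-usage summary (exact unit prices pending C-2 alignment with official pricing) |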
+
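+## 5. How to switch once verified (follow-up change, offline)
+
+Once the A/B result is satisfactory, change the narration chokepoint from hardcoded to registry-driven (the part M-2 deferred):
+
+1. `slide_ingest.py`: change `MODEL` to `resolve("text.fast")` (or add a `narration` role and resolve that).
+2. Switch chapter detection (`detect_chapters_with_gemini`) at the same time.
+3. Run `pytest tests/` (hard rule #7).
+
+After the switch, "changing the model = changing one value in the registry / one dropdown on the settings page" (M-3 already provides a per-role settings UI).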
+
+### Rollback
+
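+If quality regresses after release: on the settings page, override `text.fast` (or the `narration` role) back to `gemini-2.5-flash`
+to roll back immediately, with **no code change and no restart** (M-3 `model_roles` takes highest priority).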
+
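+## 6. Open questions for your decision
+
+1. **Migration scope**: migrate only per-page narration, or chapter detection as well? (Both take image input; the recommendation is to verify and switch them together.)
+2. **Use `text.fast` or add a `narration` role?** Using `text.fast` is cheapest (it reuses an existing role); adding a `narration`
+   role lets narration and other fast-text uses **pick models independently** (more flexible, but one more role to maintain). Recommendation: **start with `text.fast`**
+   and split later if needed.
+3. **Candidate models**: the default comparison is `gemini-2.5-flash` vs `gemini-3.5-flash`. Should a stronger model such as `gemini-3-pro` be compared as well?
+   (pro is expensive and narration may not need it; the recommendation is to compare flash against flash first.)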
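Reviewer sketch for sections 5 and "Rollback" of the proposal above: one way the narration chokepoint could pick its model once it goes through the registry. The real `core/models.py` is not part of this diff, so `DEFAULT_ROLES`, the `model_roles` argument, and `narration_model` are hypothetical names used only for illustration. The only facts taken from the proposal are that `text.fast` defaults to `gemini-3.5-flash` and that the settings override takes priority.

```python
from __future__ import annotations

# Hypothetical stand-in for the role registry; the real core/models.py is not shown in this diff.
DEFAULT_ROLES = {"text.fast": "gemini-3.5-flash"}


def resolve(role: str, model_roles: dict[str, str] | None = None) -> str:
    """Return the model id for a role; a non-empty settings override wins over the default."""
    override = (model_roles or {}).get(role)
    return override or DEFAULT_ROLES[role]


def narration_model(model: str | None = None,
                    model_roles: dict[str, str] | None = None) -> str:
    """Explicit per-call model (A/B tool) > settings override > registry default."""
    return model or resolve("text.fast", model_roles)


if __name__ == "__main__":
    print(narration_model())                                                # registry default
    print(narration_model(model_roles={"text.fast": "gemini-2.5-flash"}))  # rollback via settings
    print(narration_model(model="gemini-3-pro"))                            # A/B override
```

Under these assumptions, rollback is a data change only: the override dictionary changes and the call site stays the same.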
diff --git a/docs/PRODUCT_READINESS.md b/docs/PRODUCT_READINESS.md
@@ -268,10 +268,23 @@
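  char-based approximation model.
 - [ ] 🟡 **C-2 Align unit prices with reality** (GATE, requires checking official pricing): current unit prices are estimates. Align them with the real unit prices for the Gemini 3 series +
  GCP TTS + (future) image. Pricing changes over time → extract into config constants + note in the docs that "official pricing prevails".
-- [ ] 🟡 **C-3 Migrate narration model to 3.x** (GATE, requires quota to verify quality): `slide_ingest.py:43`
+- [~] 🟡 **C-3 Migrate narration model to 3.x** (GATE, requires quota to verify quality): `slide_ingest.py:43`
  `MODEL = "gemini-2.5-flash"` (slated for deprecation). **Once the M axis is done, this is just changing one value, `text.fast`, in the role table.**
  3.5-flash was verified to accept `thinking_budget=0`, but **narration quality must be verified first** before switching. Written up as an A/B proposal; Teacher Liu
-  runs it with quota before switching. (Teacher Liu, 2026-06-07: will grant access when quota is needed.)
+  runs it with quota before switching. (Teacher Liu, 2026-06-07: will grant access when quota is needed; 2026-06-15: quota granted.)
+  - ✅ 2026-06-15 **A/B tool + proposal done (offline groundwork; the actual run = you, locally, with quota).** Teacher Liu granted quota on
+    2026-06-15. Because this routine environment has **no `GEMINI_API_KEY`** (your key lives on your machine) and the key should not be brought into the session,
+    the quality A/B must run on **your machine**, so the routine prepared the "runnable tool + switch steps": ① `tools/ab_narration.py`
+    generates narration for the same pages of the same deck **once with the old model and once with the candidate, side by side** (**narration generation only, no TTS/
+    ffmpeg/full render = saves quota**; it injects a different `model` into the **real** `narrate_page_with_gemini`, with prompt/
+    retry/`thinking_budget=0` all matching production = no drift); to support this, `slide_ingest.narrate_page_with_gemini`
+    gained an **optional `model` parameter** (default remains `MODEL`; zero impact on the production pipeline). ② `docs/C3_NARRATION_AB_PROPOSAL.md`:
+    why change / how to run it locally (command) / decision criteria table (correctness, complete endings, fluency, depth, cost) / how to switch once verified
+    (chokepoint goes through `resolve("text.fast")`) + rollback (override back to 2.5 on the settings page, no code change) / 3 open questions for decision
+    (scope / `text.fast` vs a new `narration` role / candidate models). Added `tests/test_ab_narration.py` with
+    11 tests (page parsing / run_ab passes the model through for every page and model / side-by-side report + usage / missing key raises SystemExit, **all with a fake client,
+    no API calls**). Relevant local subset: 142 passed. **Next step = you run the A/B locally → review quality → report back whether to switch**; if switching,
+    open the follow-up change to swap the chokepoint.
 - [ ] 🟢 **C-4 Switch to `gemini-3.1-pro-image` etc. once available** (GATE): Teacher Liu wants to use it but the API is not open yet. Once it opens,
  switch from `gemini-3-pro-image` (`core/infocards/models.py`). Tracked.
 - [x] 🟢 **C-5 Model id self-check** (offline): ✅ completed 2026-06-09. Added `tools/check_models.py`: collects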

diff --git a/slide_ingest.py b/slide_ingest.py
@@ -267,11 +267,17 @@ def _truncate_at_sentence(text: str, target: int = NARRATION_TARGET_CHARS,
 
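 def narrate_page_with_gemini(client, page_png: bytes, chapter_title: str,
                               chapter_pages: int, page_in_chapter: int,
-                              prev_narration: str, *, brief: bool = False) -> str:
+                              prev_narration: str, *, brief: bool = False,
+                              model: str | None = None) -> str:
     """Single page → narration draft. Gemini occasionally emits STOP early in the middle of a Chinese sentence, cutting it off,
-    so if the ending is not sentence-final punctuation, retry once with a higher temperature + a prompt that stresses completeness."""
+    so if the ending is not sentence-final punctuation, retry once with a higher temperature + a prompt that stresses completeness.
+
+    model: overrides the narration model id (defaults to the module-level MODEL). Used for the C-3 narration model A/B comparison
+    (tools/ab_narration.py runs 2.5 vs 3.x on the same page to compare quality); does not affect the production pipeline default."""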
     from google.genai import types
 
+    model = model or MODEL
+
     template = NARRATION_PROMPT_BRIEF if brief else NARRATION_PROMPT_DETAILED
     base_prompt = template.format(
         chapter_title=chapter_title,
@@ -306,7 +312,7 @@ def narrate_page_with_gemini(client, page_png: bytes, chapter_title: str,
 
         try:
             resp = client.models.generate_content(
-                model=MODEL,
+                model=model,
                 contents=parts + [prompt],
                 config=types.GenerateContentConfig(
                     temperature=temp,
@@ -316,7 +322,7 @@ def narrate_page_with_gemini(client, page_png: bytes, chapter_title: str,
                 ),
             )
             from core import usage
-            usage.record_text_now("video", MODEL, prompt, resp.text or "",
+            usage.record_text_now("video", model, prompt, resp.text or "",
                                   label="narration")
             text = _clean_narration(resp.text)
             if text and text.endswith(_SENTENCE_END):
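
Reviewer sketch of the retry idea described in the `narrate_page_with_gemini` docstring above: accept the text only when it ends on a sentence-final mark, otherwise try again with a higher temperature and a prompt that stresses completeness. This is not the production loop. `generate_complete`, the `generate` callable, the temperature values, the reminder wording, and the contents of `SENTENCE_END` are all assumptions for illustration. The diff only shows that the real code checks `text.endswith(_SENTENCE_END)`.

```python
from __future__ import annotations

from typing import Callable

# Assumed set of sentence-final marks; the real _SENTENCE_END is not shown in this diff.
SENTENCE_END = ("。", "！", "？", ".", "!", "?")


def generate_complete(generate: Callable[[str, float], str], prompt: str,
                      temps: tuple[float, ...] = (0.4, 0.7)) -> str:
    """Call `generate(prompt, temperature)` until the text ends on a sentence-final mark.

    The first attempt uses the plain prompt; later attempts raise the temperature
    and append a completeness reminder (both values are illustrative).
    """
    last = ""
    for attempt, temp in enumerate(temps):
        attempt_prompt = prompt if attempt == 0 else prompt + "\nEnd on a complete sentence."
        text = (generate(attempt_prompt, temp) or "").strip()
        if text.endswith(SENTENCE_END):
            return text
        last = text or last
    return last  # best effort: may still be truncated


if __name__ == "__main__":
    replies = iter(["這是一句被截斷的", "這是一句完整的話。"])
    print(generate_complete(lambda p, t: next(replies), "describe this slide"))
```

Passing the generator in as a callable keeps the sketch testable offline, in the same spirit as the fake client used by `tests/test_ab_narration.py`.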

diff --git a/tests/test_ab_narration.py b/tests/test_ab_narration.py
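@@ -0,0 +1,119 @@
+"""Offline tests for tools/ab_narration (C-3 narration A/B comparison tool).
+
+Uses a fake narrate_fn / fake client throughout: **no real Gemini calls, no real PDF rendering**. Checks:
+- Page parsing (commas / ranges / dedup and sort / out-of-range pages dropped).
+- run_ab calls once per page × per model and passes model through correctly.
+- The report shows both models' output side by side + a character-usage summary.
+- _build_client raises SystemExit when the key is missing (no silent failure).
+"""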
+from __future__ import annotations
+
+import sys
+from pathlib import Path
+
+import pytest
+
+# tools/ is not a package; following test_check_models.py, put the tools directory on sys.path and import it as a top-level module.
+_TOOLS_DIR = Path(__file__).resolve().parent.parent / "tools"
+if str(_TOOLS_DIR) not in sys.path:
+    sys.path.append(str(_TOOLS_DIR))
+
+import ab_narration as ab  # noqa: E402
+
+
+class TestParsePages:
+    def test_comma(self):
+        assert ab.parse_pages("1,3,5", 10) == [1, 3, 5]
+
+    def test_range(self):
+        assert ab.parse_pages("2-4", 10) == [2, 3, 4]
+
+    def test_mixed_dedup_sorted(self):
+        assert ab.parse_pages("5,1-3,3", 10) == [1, 2, 3, 5]
+
+    def test_clamps_to_total(self):
+        assert ab.parse_pages("1,8,99", 5) == [1]
+
+    def test_blank_chunks_ignored(self):
+        assert ab.parse_pages("1,,2,", 5) == [1, 2]
+
+
+class TestRunAb:
+    def test_each_page_each_model_called_with_model(self):
+        calls = []
+
+        def fake_narrate(client, png, title, ch_pages, p_in_ch, prev,
+                         *, brief=False, model=None):
+            calls.append((png, model, brief))
+            return f"[{model}] 旁白 for {png.decode()}"
+
+        pages = {1: b"p1", 2: b"p2"}
+        models = ["gemini-2.5-flash", "gemini-3.5-flash"]
+        results = ab.run_ab(object(), fake_narrate, pages, models)
+
+        # 2 pages × 2 models = 4 calls, each with the correct model passed through
+        assert len(calls) == 4
+        assert {c[1] for c in calls} == set(models)
+        assert results[0]["page"] == 1 and results[1]["page"] == 2
+        cell = results[0]["models"]["gemini-3.5-flash"]
+        assert cell["text"] == "[gemini-3.5-flash] 旁白 for p1"
+        assert cell["chars"] == len(cell["text"])
+
+    def test_pages_sorted(self):
+        def fake_narrate(c, png, *a, model=None, **k):
+            return "x"
+
+        results = ab.run_ab(object(), fake_narrate, {3: b"c", 1: b"a"},
+                            ["m1", "m2"])
+        assert [r["page"] for r in results] == [1, 3]
+
+    def test_brief_flag_forwarded(self):
+        seen = {}
+
+        def fake_narrate(c, png, *a, brief=False, model=None, **k):
+            seen["brief"] = brief
+            return "y"
+
+        ab.run_ab(object(), fake_narrate, {1: b"a"}, ["m1"], brief=True)
+        assert seen["brief"] is True
+
+
+class TestReport:
+    def test_report_has_both_models_and_usage(self):
+        results = [
+            {"page": 1, "models": {
+                "m-old": {"text": "舊版旁白", "chars": 4},
+                "m-new": {"text": "新版旁白較長一點", "chars": 8},
+            }},
+        ]
+        out = ab.render_report(results, ["m-old", "m-new"], "deck.pdf")
+        assert "deck.pdf" in out
+        assert "第 1 頁" in out
+        assert "舊版旁白" in out and "新版旁白較長一點" in out
+        assert "`m-old`" in out and "`m-new`" in out
+        # the usage summary includes both models' character totals
+        assert "字元用量小結" in out
+        assert "輸出共 4 字" in out and "輸出共 8 字" in out
+
+
+class TestBuildClient:
+    def test_missing_key_exits(self, monkeypatch):
+        monkeypatch.delenv("GEMINI_API_KEY", raising=False)
+        with pytest.raises(SystemExit):
+            ab._build_client()
+
+
+class TestNarrateOnePage:
+    def test_passes_model_and_context(self):
+        captured = {}
+
+        def fake_narrate(client, png, title, ch_pages, p_in_ch, prev,
+                         *, brief=False, model=None):
+            captured.update(model=model, page_in_chapter=p_in_ch, png=png)
+            return "ok"
+
+        out = ab.narrate_one_page(object(), fake_narrate, b"png", "gemini-3.5-flash",
+                                  page_in_chapter=7)
+        assert out == "ok"
+        assert captured["model"] == "gemini-3.5-flash"
+        assert captured["page_in_chapter"] == 7
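
Reviewer sketch: `tools/ab_narration.py` itself is not included in this diff, so the following is a minimal guess at `parse_pages` and `run_ab` that would satisfy the behaviour pinned down by the tests above (comma and range parsing, dedup and sort, out-of-range pages dropped, one call per page per model, `brief` and `model` forwarded as keywords). The placeholder chapter arguments passed to `narrate_fn` are assumptions; the real tool may supply different context.

```python
from __future__ import annotations

from typing import Callable


def parse_pages(spec: str, total: int) -> list[int]:
    """Parse "1,3,5" / "2-4" style specs into sorted, unique, in-range page numbers."""
    pages: set[int] = set()
    for chunk in spec.split(","):
        chunk = chunk.strip()
        if not chunk:
            continue  # tolerate "1,,2," style input
        if "-" in chunk:
            lo, hi = (int(part) for part in chunk.split("-", 1))
        else:
            lo = hi = int(chunk)
        # Pages outside 1..total are dropped rather than moved to the nearest bound.
        pages.update(range(max(lo, 1), min(hi, total) + 1))
    return sorted(pages)


def run_ab(client, narrate_fn: Callable[..., str], pages: dict[int, bytes],
           models: list[str], *, brief: bool = False) -> list[dict]:
    """Narrate every page once per model; return one result row per page, in page order."""
    results = []
    for page in sorted(pages):
        cells = {}
        for model in models:
            # Placeholder chapter context (title, chapter size, position, previous text).
            text = narrate_fn(client, pages[page], "", 1, 1, "",
                              brief=brief, model=model)
            cells[model] = {"text": text, "chars": len(text)}
        results.append({"page": page, "models": cells})
    return results
```

Note that `test_clamps_to_total` expects `parse_pages("1,8,99", 5) == [1]`, so out-of-range pages are discarded, not clamped to the last page; the sketch follows the assertion rather than the test name.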