Form Fill M4.5/M4.6 + security / CI / repo hygiene improvements #19
Merged
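Form Fill:
- M4.5: Layer 2 adds page.find_tables() table-cell detection (a label cell plus an adjacent empty cell), fixing a real bug where detection failed on table-style forms
- M4.6: when Layer 2 detects 0 fields, automatically fall back to the image backend (Gemini)
- Add a table-form fixture and M4.5/M4.6 tests

Security and settings:
- The encryption key now comes from Settings as the single source; a warning is issued at startup if the public default value is still in use
- CORS is now configurable through ALLOWED_ORIGINS; fixes the invalid "*" + credentials combination
- Document the trust model in which X-User-Id is not verified

Repo hygiene / CI:
- Remove scanBot.zip, the db journal, and real invoice PDFs that were committed by mistake; tighten .gitignore
- Add pyproject.toml pytest settings and GitHub Actions CI (Python 3.10/3.11)
- Unify the version number at 3.3.0 and correct the endpoint count

https://claude.ai/code/session_013AjD6vwG7qyojWtZD3VvD9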
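The M4.5 and M4.6 items in this commit can be illustrated with a minimal sketch. It is not the project's code: it assumes the table has already been turned into rows of cell text (the shape that pdfplumber's `Table.extract()` returns for each table from `page.find_tables()`), and the keyword list, function names, and the `image_backend` callable are hypothetical placeholders.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical label keywords; the project's real keyword list is not shown in this PR.
LABEL_KEYWORDS = ("Name", "Date", "Address")

Row = Sequence[Optional[str]]
Field = Tuple[str, int, int]  # (label text, row index, column index of the empty cell)


def is_empty(cell: Optional[str]) -> bool:
    """A cell counts as empty when it is None or contains only whitespace."""
    return cell is None or cell.strip() == ""


def detect_table_fields(rows: Sequence[Row],
                        keywords: Sequence[str] = LABEL_KEYWORDS) -> List[Field]:
    """Treat an empty cell to the right of (or below) a label cell as a fill-in field."""
    fields: List[Field] = []
    for r, row in enumerate(rows):
        for c, cell in enumerate(row):
            if is_empty(cell) or not any(k in cell for k in keywords):
                continue
            # Prefer the right-hand neighbour, then the cell directly below.
            for rr, cc in ((r, c + 1), (r + 1, c)):
                if rr < len(rows) and cc < len(rows[rr]) and is_empty(rows[rr][cc]):
                    fields.append((cell.strip(), rr, cc))
                    break
    return fields


def detect_with_fallback(rows: Sequence[Row],
                         image_backend: Callable[[], List[Field]],
                         keywords: Sequence[str] = LABEL_KEYWORDS) -> List[Field]:
    """Call the (hypothetical) image backend only when table detection finds nothing."""
    fields = detect_table_fields(rows, keywords)
    return fields if fields else image_backend()


side_by_side = [["Name", "", "Date", None],
                ["Address", "", "Notes", "n/a"]]
stacked = [["Name", "Date"],
           ["", ""]]
print(detect_table_fields(side_by_side))
print(detect_table_fields(stacked))
```

The sketch does not deduplicate an empty cell that sits both to the right of one label and below another; a real detector would likely need to.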
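test_scanmail_plus replaces get_settings with a stub that contains only DATABASE_PATH, so the module-level read of cors_origins in main.py raises AttributeError under that stub because of import order. Switch to a getattr fallback; production still uses the real Settings (which reads .env).

https://claude.ai/code/session_013AjD6vwG7qyojWtZD3VvD9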
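A minimal sketch of the getattr fallback described in this commit, combined with the wildcard rule from the CORS change. The helper name `resolve_cors`, the `DEFAULT_ORIGINS` value, and the example origin are assumptions for illustration; only the `cors_origins` and `DATABASE_PATH` attribute names come from the commit messages.

```python
from types import SimpleNamespace
from typing import List, Tuple

# Hypothetical default used when the settings object has no cors_origins attribute.
DEFAULT_ORIGINS = ["*"]


def resolve_cors(settings: object) -> Tuple[List[str], bool]:
    """Return (allow_origins, allow_credentials) without assuming the attribute exists."""
    # getattr with a default avoids AttributeError on a stub that lacks cors_origins.
    origins = list(getattr(settings, "cors_origins", DEFAULT_ORIGINS))
    # Credentials are switched off whenever the wildcard origin is present.
    allow_credentials = "*" not in origins
    return origins, allow_credentials


# A test stub carrying only DATABASE_PATH, like the one the commit describes.
stub = SimpleNamespace(DATABASE_PATH=":memory:")
# A fuller settings object with an explicit origin list (example value).
configured = SimpleNamespace(DATABASE_PATH="app.db",
                             cors_origins=["https://app.example.com"])

print(resolve_cors(stub))
print(resolve_cors(configured))
```

The two return values map onto the `allow_origins` and `allow_credentials` options named in the PR description.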
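The test suite contains no video-encoding tests, and moviepy uses the bundled imageio-ffmpeg, so everything still passes without a system ffmpeg (verified in a clean environment: 112 passed). apt-get update/install is a step that tends to fail fast on runners because of network or permission issues; remove it to stabilize CI.

https://claude.ai/code/session_013AjD6vwG7qyojWtZD3VvD9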
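The 3 CI failures (test_pdfplumber_fill_with_cjk / test_c1_image_input_fill_works / test_pdf_to_markdown_preserves_cjk) all happen because the Ubuntu runner has no CJK font, so ReportLab renders Chinese text as \x00.
- ci.yml: add an install step for fonts-noto-cjk / fonts-wqy-zenhei
- doc_converter._register_cjk_font:
  - add Windows (PMingLiU / MingLiU / Microsoft JhengHei / DFKai-SB) and macOS font paths (this also fixes rendering on local Windows machines)
  - add glob patterns to tolerate file-name differences between package versions
  - remove DejaVu: it has no Chinese glyphs, yet it "registers successfully but renders blank", which masks the problem

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>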
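The glob-based lookup mentioned for `_register_cjk_font` might look roughly like the sketch below. The pattern list and the function name `find_cjk_font` are illustrative assumptions, not the paths the project actually uses; the point is only that a wildcard tolerates file names that differ between package versions and that a missing font yields `None` instead of a silent blank rendering.

```python
import glob
from typing import Iterable, Optional

# Illustrative patterns only; the real candidate list in doc_converter is not shown in this PR.
CJK_FONT_PATTERNS = (
    "/usr/share/fonts/opentype/noto/NotoSansCJK*.ttc",
    "/usr/share/fonts/truetype/wqy/wqy-zenhei*.ttc",
    "C:/Windows/Fonts/msjh*.ttc",
    "/System/Library/Fonts/PingFang*.ttc",
)


def find_cjk_font(patterns: Iterable[str] = CJK_FONT_PATTERNS) -> Optional[str]:
    """Return the first existing font file matching any pattern, in priority order."""
    for pattern in patterns:
        # sorted() makes the choice deterministic when several files match one pattern.
        matches = sorted(glob.glob(pattern))
        if matches:
            return matches[0]
    # No CJK-capable font found: the caller should fail loudly rather than
    # register a font without Chinese glyphs.
    return None


font_path = find_cjk_font()
print(font_path if font_path else "no CJK font found")
```

Returning `None` rather than falling back to a Latin-only font keeps the failure visible, which is the reason the commit gives for dropping DejaVu.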
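The original test simulated "key not set" with delenv(GEMINI_API_KEY) + cache_clear, but Settings.model_config sets env_file=.env, so the rebuilt Settings read the key back from .env, and a local machine (with a key in .env) ended up calling the real API. Clear GEMINI_API_KEY directly on the cached Settings instance instead: the patch sits at the right level and is unaffected by the environment or .env.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>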
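The idea of patching the cached instance can be shown with a small stand-in. This is a sketch under assumptions: the `Settings` class below is a plain class rather than the project's pydantic settings, the hard-coded key simulates a value read from .env, and `gemini_available` and `check_without_key` are hypothetical helpers.

```python
from functools import lru_cache
from unittest.mock import patch


class Settings:
    """Simplified stand-in for the project's Settings class (hypothetical)."""

    def __init__(self) -> None:
        # Simulates a key that is picked up again from .env on every rebuild,
        # which is why clearing the environment and the cache was not enough.
        self.GEMINI_API_KEY = "key-from-dotenv"


@lru_cache
def get_settings() -> Settings:
    """Return one shared Settings instance, built on first use."""
    return Settings()


def gemini_available() -> bool:
    """Hypothetical check that the code under test would perform."""
    return bool(get_settings().GEMINI_API_KEY)


def check_without_key() -> bool:
    """Blank the key on the cached instance itself, then run the check."""
    with patch.object(get_settings(), "GEMINI_API_KEY", ""):
        return gemini_available()


print(check_without_key())
print(gemini_available())
```

Because the patch targets the object that `get_settings()` already returns, nothing is rebuilt during the test, and the original value is restored when the `with` block exits.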
Summary
A round of cleanup following a review of possible project improvements, covering four areas: features, security, CI, and repo hygiene.
Auto Form Fill
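- M4.5: pdfplumber_extract.detect() adds page.find_tables() table-cell detection. When a cell matches a label keyword and the adjacent cell to its right or below is empty, that empty cell is treated as a fill-in field. This fixes the real bug already recorded in the docs: Layer 2 fails on table-style forms (the student off-campus housing visit form).
- M4.6: when Layer 2 detects 0 fields, dispatcher.detect_fields() automatically converts the page to an image and goes through Layer 3/4 (Gemini).
- Added a table-form fixture (tests/fixtures/forms/table_visit_form.pdf) and 3 corresponding tests.

Security and settings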
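- Settings.ENCRYPTION_KEY is now the single source for the encryption key (crypto.py no longer carries its own default); if the public default value is still in use at startup, a logger.warning is emitted. The default value itself is unchanged, so existing encrypted data is unaffected.
- CORS is now configurable through ALLOWED_ORIGINS, and the invalid allow_origins=["*"] + allow_credentials=True combination is fixed (credentials are disabled when the wildcard is used).
- Documented the trust model in which X-User-Id is not verified (comment in sessions.py + security notes in the README).

CI / repo hygiene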
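- Added pyproject.toml (pytest settings) and .github/workflows/ci.yml (Python 3.10 / 3.11, install ffmpeg, generate fixtures, run pytest).
- Removed scanBot.zip, test_scanmail.db-journal, and two real invoice PDFs containing sensitive information from version control; tightened .gitignore (*.db-journal/-wal/-shm, *.zip).
- Unified the version number at 3.3.0 (app / health endpoint) and corrected the endpoint counts in README / STATUS.

Test plan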
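- python tests/generate_test_forms.py generates the fixture containing tables
- pytest tests/test_form_fill_tables.py (M4.5 detects 8 fields, M4.6 fallback path, no fallback when fields are found) → 3 passed
- pytest tests/test_form_fill.py tests/test_form_fill_review_fixes.py and the other existing tests show no regressions → all green
- is_default_key() + cors_origins smoke check
- Verified that /health returns 3.3.0 and that the CORS headers are correct

Out of scope (follow-up TODO)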
Full login / multi-user authentication, rate limiting, PWA, dark mode, Form Fill M5 Mapping UI / M7 integration / PaddleOCR.
https://claude.ai/code/session_013AjD6vwG7qyojWtZD3VvD9
Generated by Claude Code