Context
Surfaced from a deep-research audit of patina's evaluator paths. Three audit P0s already shipped (#96, #103). This is the remaining one and was deferred because it requires native-language curation, not a pure code change.
Problem
.patina.default.yaml currently restricts both stylometry and AI-lexicon overlap to en/ko:
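The config excerpt is reproduced here as a sketch; the two key names (`lexicon.languages`, `stylometry.languages`) come from this issue, but the exact nesting is an assumption:

```yaml
# Assumed shape of the relevant .patina.default.yaml keys; other keys omitted.
stylometry:
  languages: [en, ko]   # zh/ja excluded until thresholds are calibrated
lexicon:
  languages: [en, ko]   # zh/ja excluded until ai-zh.md / ai-ja.md exist
```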
Pattern packs already cover all four languages (ko/en/zh/ja: 6 packs per language, ~28 patterns each). But for zh/ja runs:
lexicon/ai-zh.md — does not exist
lexicon/ai-ja.md — does not exist
Stylometry burstiness/MATTR thresholds are ko/en-calibrated; zh/ja word segmentation may need different bands
This is the asymmetry called out in the audit:
The pattern catalog supports four languages, but the README's stylometric/AI-lexicon description only specifies ~108 entries for EN and ~102 for KO. In other words, even where zh/ja have pattern support, stylometry/lexicon calibration is likely weak.
In rewrite/ouroboros runs targeting zh or ja text, this means the LLM does most of the detection work alone — there's no statistical floor signal from lexicon overlap to back it up.
Scope
Lexicon files (this issue)
lexicon/ai-zh.md — 50–100 high-precision AI-tell phrases in Mandarin (e.g., 总而言之, 综上所述, 在数字时代, 让我们一起, 不仅...而且, 至关重要的是, etc.)
lexicon/ai-ja.md — 50–100 high-precision AI-tell phrases in Japanese (e.g., まとめると, 結論として, 〜することが重要です, 〜と言えるでしょう, デジタル時代において, etc.)
.patina.default.yaml — flip lexicon.languages to [en, ko, zh, ja] after files exist
Optional: lexicon/README.md curation guide (entry format, severity, profile context)
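As a reference point for that guide, one hypothetical entry shape (every field name here is illustrative; the real schema is for the curation guide to decide):

```markdown
## 综上所述 (severity: high, profile: conclusion-marker)
- example: 综上所述，这三种方案各有优劣。
- counterexample: 报告原文引用了"综上所述"一词来批评套话写作。
```

Each entry pairs an AI-typical usage with a human usage that must not be flagged, matching the example + counterexample requirement in the acceptance criteria.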
Stylometry calibration (follow-up issue, not this one)
Decide if zh/ja need different burstiness/MATTR thresholds (CJK tokenization affects sentence-length CV and TTR window)
Calibration corpus for zh/ja (HC3 has Chinese pairs; ja needs sourcing)
Flip stylometry.languages only after thresholds are validated
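To make the calibration question concrete, here is a minimal sketch of the two statistics as this issue describes them (MATTR as a sliding-window type-token ratio, burstiness as the coefficient of variation of sentence lengths); the function names and default window size are assumptions, not patina's actual code:

```typescript
// Illustrative only: assumed definitions of the two stylometry signals.
// MATTR = mean type-token ratio over a sliding window of tokens.
function mattr(tokens: string[], window = 50): number {
  if (tokens.length <= window) return new Set(tokens).size / tokens.length;
  let sum = 0;
  for (let i = 0; i + window <= tokens.length; i++) {
    sum += new Set(tokens.slice(i, i + window)).size / window;
  }
  return sum / (tokens.length - window + 1);
}

// Burstiness = coefficient of variation (stddev / mean) of sentence lengths.
function burstiness(sentenceLengths: number[]): number {
  const n = sentenceLengths.length;
  const mean = sentenceLengths.reduce((a, b) => a + b, 0) / n;
  const variance =
    sentenceLengths.reduce((a, b) => a + (b - mean) ** 2, 0) / n;
  return Math.sqrt(variance) / mean;
}
```

Both statistics shift under CJK segmentation (a character-level vs word-level tokenizer changes token counts and the sentence-length distribution), which is why the en/ko bands cannot be assumed to transfer.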
Acceptance criteria
For each language:
≥ 50 lexicon entries, each with example + counterexample
High precision over recall — Wikipedia FP rate stays under 25% (ko/en boundary)
Calibrated against ≥ 200 paragraphs (HC3 zh / curated ja AI-vs-human pairs)
Documented in README's stylometry section
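The Wikipedia FP-rate criterion can be checked mechanically. A sketch, where flagging a paragraph on any substring hit is an assumption about how lexicon overlap is scored (the 25% bound is from this issue; function names are illustrative):

```typescript
// Illustrative acceptance check: a paragraph is "flagged" if it contains
// at least one lexicon phrase.
function flagged(paragraph: string, lexicon: string[]): boolean {
  return lexicon.some((phrase) => paragraph.includes(phrase));
}

// FP rate = fraction of human-written paragraphs that get flagged (lower is better).
function falsePositiveRate(humanParagraphs: string[], lexicon: string[]): number {
  const hits = humanParagraphs.filter((p) => flagged(p, lexicon)).length;
  return hits / humanParagraphs.length;
}

// A candidate lexicon ships only if falsePositiveRate(wikipediaSample, lexicon) < 0.25.
```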
Why this needs a human
LLM-drafted lexicons fail in two predictable ways:
Too-common words → every Wikipedia article gets flagged as AI
Too-narrow phrases → no detection lift beyond existing patterns
Korean lexicon (90 entries) and English (108 entries) were hand-curated by the maintainer against a 400-paragraph calibration corpus. zh/ja deserve the same bar.
Suggested approach
Draft 30 high-confidence starter entries per language (LLM-assisted, manually filtered)
Run against HC3-zh sample + author-collected ja AI/human pairs
Iterate based on FP rate, expand to 50–100
Land lexicon files first, flip config flag in a follow-up PR after corpus validation
Out of scope
ONNX-based detector (defer; the audit's L-difficulty item, lower ROI)
Multilingual Intl.Segmenter integration (handled separately when threshold calibration starts)