
Add zh/ja AI-lexicon files for stylometry overlap detection #104


Context

Surfaced from a deep-research audit of patina's evaluator paths. Two of the audit's P0s have already shipped (#96, #103); this is the remaining one, deferred because it requires native-language curation rather than a pure code change.

Problem

.patina.default.yaml currently restricts both stylometry and AI-lexicon overlap to en/ko:

```yaml
stylometry:
  languages: [ko, en]
lexicon:
  languages: [en, ko]   # zh/ja deferred (no curated lexicon yet)
```

Pattern packs already cover all four languages (ko/en/zh/ja, 6 packs each, ~28 patterns each). But for zh/ja runs:

  • lexicon/ai-zh.md — does not exist
  • lexicon/ai-ja.md — does not exist
  • Stylometry burstiness/MATTR thresholds are ko/en-calibrated; zh/ja word segmentation may need different bands

This is the asymmetry called out in the audit:

> The pattern catalog supports four languages, but the README's stylometric/AI-lexicon description lists only ~108 entries for EN and ~102 for KO. In other words, even though zh/ja have pattern support, their stylometry/lexicon calibration is likely weak.

In rewrite/ouroboros runs targeting zh or ja text, this means the LLM does most of the detection work alone — there's no statistical floor signal from lexicon overlap to back it up.
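
To make the gap concrete, here is a minimal sketch of that missing floor signal (illustrative only; `lexiconOverlap` and the loading path are assumptions, not patina's actual API):

```ts
// Minimal sketch of the "statistical floor" the LLM verdict would lean on:
// count curated AI-tell phrase hits per paragraph. `lexiconOverlap` is an
// illustration, not patina's actual scoring function.
function lexiconOverlap(paragraph: string, phrases: string[]): string[] {
  return phrases.filter((phrase) => paragraph.includes(phrase));
}

// Until lexicon/ai-zh.md exists, the zh phrase list is empty, so this signal
// is constantly zero and can neither corroborate nor contradict the LLM.
const zhPhrases: string[] = []; // would be loaded from lexicon/ai-zh.md
console.log(lexiconOverlap("总而言之，这项技术至关重要。", zhPhrases)); // [] today
```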

Scope

Lexicon files (this issue)

  • lexicon/ai-zh.md — 50–100 high-precision AI-tell phrases in Mandarin (e.g., 总而言之, 综上所述, 在数字时代, 让我们一起, 不仅...而且, 至关重要的是, etc.)
  • lexicon/ai-ja.md — 50–100 high-precision AI-tell phrases in Japanese (e.g., まとめると, 結論として, 〜することが重要です, 〜と言えるでしょう, デジタル時代において, etc.)
  • .patina.default.yaml — flip lexicon.languages to [en, ko, zh, ja] after files exist
  • Optional: lexicon/README.md curation guide (entry format, severity, profile context; a strawman entry shape follows this list)
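
The entry format is not pinned down by this issue; as a strawman for that curation guide, a parsed entry might carry roughly these fields (all names are placeholders, not an existing patina schema):

```ts
// Hypothetical parsed shape of one lexicon/ai-zh.md entry; every field
// name here is a placeholder, not a committed format.
interface LexiconEntry {
  phrase: string;            // e.g. "总而言之" or "結論として"
  severity: "low" | "medium" | "high";
  example: string;           // AI-typical usage the phrase should catch
  counterexample: string;    // legitimate human usage it must tolerate
  profiles?: string[];       // optional profile context for the entry
}
```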

Stylometry calibration (follow-up issue, not this one)

  • Decide if zh/ja need different burstiness/MATTR thresholds (CJK tokenization affects sentence-length CV and the TTR window; see the sketch after this list)
  • Calibration corpus for zh/ja (HC3 has Chinese pairs; ja needs sourcing)
  • Flip stylometry.languages only after thresholds are validated
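
A rough sketch of why the bands may not transfer (not patina's implementation; the 50-token MATTR window is an assumption):

```ts
// Word counts depend entirely on the segmenter, and zh/ja have no
// whitespace word boundaries, so ko/en-calibrated MATTR bands may shift.
function wordTokens(text: string, locale: string): string[] {
  const seg = new Intl.Segmenter(locale, { granularity: "word" });
  return [...seg.segment(text)]
    .filter((s) => s.isWordLike)
    .map((s) => s.segment);
}

// Moving-average type-token ratio (MATTR) over a fixed window.
function mattr(words: string[], window = 50): number {
  if (words.length === 0) return 0;
  if (words.length <= window) return new Set(words).size / words.length;
  let sum = 0;
  for (let i = 0; i + window <= words.length; i++) {
    sum += new Set(words.slice(i, i + window)).size / window;
  }
  return sum / (words.length - window + 1);
}

// Naive whitespace splitting sees one "word" here; Intl.Segmenter sees many.
// Burstiness (sentence-length CV) shifts the same way once sentence lengths
// are measured in segmenter tokens instead of whitespace tokens.
const ja = "結論として、デジタル時代においてこれは重要だと言えるでしょう。";
console.log(wordTokens(ja, "ja").length, ja.split(/\s+/).filter(Boolean).length);
```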

Acceptance criteria

For each language:

  • ≥ 50 lexicon entries, each with example + counterexample
  • High precision over recall: Wikipedia FP rate stays under 25% (the same bound enforced for ko/en)
  • Calibrated against ≥ 200 paragraphs (HC3 zh / curated ja AI-vs-human pairs)
  • Documented in README's stylometry section

Why this needs a human

LLM-drafted lexicons fail in two predictable ways:

  • Too-common words → every Wikipedia article gets flagged as AI
  • Too-narrow phrases → no detection lift beyond existing patterns

The Korean lexicon (90 entries) and the English lexicon (108 entries) were hand-curated by the maintainer against a 400-paragraph calibration corpus; zh/ja deserve the same bar.

Suggested approach

  1. Draft 30 high-confidence starter entries per language (LLM-assisted, manually filtered)
  2. Run against HC3-zh sample + author-collected ja AI/human pairs
  3. Iterate based on FP rate, then expand to 50–100 (see the sketch after this list)
  4. Land lexicon files first, flip config flag in a follow-up PR after corpus validation
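
A rough sketch of steps 2–3 (all names and the per-phrase cap are assumptions; only the 25% Wikipedia FP bound comes from the acceptance criteria above):

```ts
// Illustrative calibration loop, not patina tooling.
interface LabeledParagraph {
  text: string;
  aiGenerated: boolean; // true for HC3-zh / collected ja AI halves
}

// Share of human paragraphs that accumulate at least `minHits` phrase hits.
function falsePositiveRate(
  corpus: LabeledParagraph[],
  phrases: string[],
  minHits = 1,
): number {
  const human = corpus.filter((p) => !p.aiGenerated);
  const flagged = human.filter(
    (p) => phrases.filter((ph) => p.text.includes(ph)).length >= minHits,
  );
  return human.length === 0 ? 0 : flagged.length / human.length;
}

// Step 3: drop any single phrase that flags too much human text, re-measure,
// and only then expand the list toward 50-100 entries.
function pruneCommonPhrases(
  corpus: LabeledParagraph[],
  phrases: string[],
  perPhraseCap = 0.05, // assumption; the issue only fixes the aggregate 25% bound
): string[] {
  return phrases.filter((ph) => falsePositiveRate(corpus, [ph]) <= perPhraseCap);
}
```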

Out of scope

  • ONNX-based detector (defer; the audit's L-difficulty item, lower ROI)
  • Multilingual Intl.Segmenter integration (handled separately when threshold calibration starts)
  • Profile-specific zh/ja tone overrides
