lemur47 · Feb 16, 2026
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -47,6 +47,7 @@ jobs:
           python-version: ${{ matrix.python-version }}
       - run: uv sync --all-extras
       - run: uv pip install en_core_web_sm@https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.8.0/en_core_web_sm-3.8.0-py3-none-any.whl
+      - run: uv pip install ja_core_news_sm@https://github.com/explosion/spacy-models/releases/download/ja_core_news_sm-3.8.0/ja_core_news_sm-3.8.0-py3-none-any.whl
       - run: uv run pytest --cov=si_protocols --cov-report=term-missing
 
   site:

diff --git a/CLAUDE.md b/CLAUDE.md
@@ -10,7 +10,8 @@ Hybrid tech-psychic protocols for **Spiritual Intelligence** — open-source too
 
 ```bash
 uv sync --all-extras                          # Install all deps
-uv pip install en_core_web_sm@https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.8.0/en_core_web_sm-3.8.0-py3-none-any.whl  # Required NLP model
+uv pip install en_core_web_sm@https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.8.0/en_core_web_sm-3.8.0-py3-none-any.whl  # Required English NLP model
+uv pip install ja_core_news_sm@https://github.com/explosion/spacy-models/releases/download/ja_core_news_sm-3.8.0/ja_core_news_sm-3.8.0-py3-none-any.whl  # Required Japanese NLP model
 uv run pytest                                 # Run all tests
 uv run pytest tests/test_markers.py           # Run a single test file
 uv run pytest -k "test_deterministic"         # Run tests matching a name
@@ -25,7 +26,9 @@ uvicorn app.main:app --host 127.0.0.1 --port 8000  # Run API server (local-only)
 
 CLI entry point: `uv run si-threat-filter examples/synthetic_suspicious.txt`
 
-The CLI supports `--format rich` (default, colour-coded) and `--format json` (machine-readable). Rich output respects the `NO_COLOR` env var automatically.
+The CLI supports:
+- `--format rich` (default, colour-coded) and `--format json` (machine-readable). Rich output respects the `NO_COLOR` env var automatically.
+- `--lang en` (default) or `--lang ja` for Japanese language analysis.
 
 ### Site (Astro)
 
@@ -36,23 +39,32 @@ cd site && npm run build         # Production build
 
 ## Architecture
 
-The threat filter produces a 0–100 score by combining two analysis layers:
+The threat filter produces a 0–100 score by combining two analysis layers. The pipeline is **multi-language**: a `lang` parameter (`"en"` | `"ja"`, default `"en"`) flows through the CLI, API, and core library. Each language has its own spaCy model and marker set.
 
-1. **Tech layer** (`threat_filter.py:tech_analysis`) — spaCy NLP pipeline that scores text across seven dimensions: vagueness (adjective density against `markers.VAGUE_ADJECTIVES`), authority claims (phrase matching against `markers.AUTHORITY_PHRASES`), urgency/fear patterns (`markers.URGENCY_PATTERNS`), emotional manipulation (lemma-based matching against `markers.FEAR_WORDS` and `markers.EUPHORIA_WORDS` with a contrast bonus when both polarities appear), logical contradictions (detecting when both poles of `markers.CONTRADICTION_PAIRS` appear in the same text — e.g. empowerment alongside dependency), source attribution analysis (detecting unfalsifiable sources via `markers.UNFALSIFIABLE_SOURCE_PHRASES`, unnamed authorities via `markers.UNNAMED_AUTHORITY_PHRASES`, offset by verifiable citations via `markers.VERIFIABLE_CITATION_MARKERS`), and commitment escalation (detecting foot-in-the-door progression via `markers.COMMITMENT_ESCALATION_MARKERS` — splits text into thirds using spaCy sentence boundaries and measures whether tiered commitment intensity increases from early to late segments). Weighted composite: 17% vagueness + 17% authority + 13% urgency + 13% emotion + 13% contradiction + 13% source attribution + 14% escalation.
+### Multi-language support
+
+- **Marker registry** (`marker_registry.py`) — `MarkerSet` frozen dataclass bundles all 12 marker categories. `get_markers(lang)` dispatches to the correct language module with lazy loading and caching.
+- **English markers** (`markers.py`) — original marker definitions (unchanged).
+- **Japanese markers** (`markers_ja.py`) — culturally adapted markers for Japanese spiritual contexts (スピリチュアル, 霊感商法, カルト, etc.).
+- **NLP models** — `_nlp_cache` dict in `threat_filter.py` lazily loads the appropriate spaCy model per language: `en_core_web_sm` for English, `ja_core_news_sm` for Japanese.
+
+### Analysis pipeline
+
+1. **Tech layer** (`threat_filter.py:tech_analysis`) — spaCy NLP pipeline that scores text across seven dimensions: vagueness (adjective density against `markers.vague_adjectives`), authority claims (phrase matching against `markers.authority_phrases`), urgency/fear patterns (`markers.urgency_patterns`), emotional manipulation (lemma-based matching against `markers.fear_words` and `markers.euphoria_words` with a contrast bonus when both polarities appear), logical contradictions (detecting when both poles of `markers.contradiction_pairs` appear in the same text — e.g. empowerment alongside dependency), source attribution analysis (detecting unfalsifiable sources via `markers.unfalsifiable_source_phrases`, unnamed authorities via `markers.unnamed_authority_phrases`, offset by verifiable citations via `markers.verifiable_citation_markers`), and commitment escalation (detecting foot-in-the-door progression via `markers.commitment_escalation_markers` — splits text into thirds using spaCy sentence boundaries and measures whether tiered commitment intensity increases from early to late segments). Weighted composite: 17% vagueness + 17% authority + 13% urgency + 13% emotion + 13% contradiction + 13% source attribution + 14% escalation.
 
 2. **Heuristic layer** (`threat_filter.py:psychic_heuristic`) — probabilistic dissonance scanner using `random.Random` (intentional — placeholder for future biofeedback integration). Accepts a `seed` param for deterministic testing.
 
-3. **Hybrid scoring** (`threat_filter.py:hybrid_score`) — combines the two: 60% tech + 40% heuristic. Returns a `ThreatResult` frozen dataclass.
+3. **Hybrid scoring** (`threat_filter.py:hybrid_score`) — combines the two: 60% tech + 40% heuristic. Returns a `ThreatResult` frozen dataclass. Accepts `lang` keyword-only param.
 
-The spaCy model (`_nlp`) is lazy-loaded via `_get_nlp()` to avoid import-time side effects in tests. Tests that exercise the NLP pipeline are marked `@pytest.mark.slow`.
+The spaCy models are lazy-loaded via `_get_nlp(lang)` to avoid import-time side effects in tests. Tests that exercise the NLP pipeline are marked `@pytest.mark.slow`.
 
-4. **Output formatting** (`output.py`) — `render_rich()` produces colour-coded terminal output (green/yellow/red by threat level) with Rich panels and tables; `render_json()` emits `dataclasses.asdict()` as indented JSON. The `_threat_style()` helper maps score bands: 0-33 green, 34-66 yellow, 67-100 red bold.
+4. **Output formatting** (`output.py`) — `render_rich()` produces colour-coded terminal output (green/yellow/red by threat level) with Rich panels and tables; `render_json()` emits `dataclasses.asdict()` as indented JSON. The `_threat_style()` helper maps score bands: 0-33 green, 34-66 yellow, 67-100 red bold. Output is language-agnostic — it renders whatever strings are in the ThreatResult.
 
 `ThreatResult` frozen dataclass fields: `overall_threat_score`, `tech_contribution`, `intuition_contribution`, `detected_entities`, `authority_hits`, `urgency_hits`, `emotion_hits`, `contradiction_hits`, `source_attribution_hits`, `escalation_hits`, `message`.
 
-5. **REST API** (`app/main.py`) — FastAPI application providing `POST /analyse` (wraps `hybrid_score()`) and `GET /health`. The `app/` package is a dev/deployment artifact separate from the core library — it is not included in the wheel. Pydantic request/response schemas live in `app/schemas.py`. The `/analyse` endpoint is a sync `def` so FastAPI runs CPU-bound spaCy work in a thread pool. Run with `uvicorn app.main:app` on `127.0.0.1:8000` (local-only).
+5. **REST API** (`app/main.py`) — FastAPI application providing `POST /analyse` (wraps `hybrid_score()`) and `GET /health`. The `app/` package is a dev/deployment artifact separate from the core library — it is not included in the wheel. Pydantic request/response schemas live in `app/schemas.py`. The `/analyse` endpoint accepts a `lang` field (`"en"` | `"ja"`, default `"en"`). The endpoint is a sync `def` so FastAPI runs CPU-bound spaCy work in a thread pool. Run with `uvicorn app.main:app` on `127.0.0.1:8000` (local-only).
 
-Marker definitions in `markers.py` are static word/phrase lists (frozenset for adjectives, lists for phrases/patterns). All markers must be lowercase. Markers span tradition-specific categories: generic New Age, prosperity gospel, conspirituality, New Age commercial exploitation, high-demand group (cult) rhetoric, and fraternal/secret society traditions.
+Marker definitions are static word/phrase lists (frozenset for adjectives, lists for phrases/patterns). English markers must be lowercase; Japanese markers use standard full-width forms. Markers span tradition-specific categories: generic New Age / スピリチュアル, prosperity gospel / 繁栄の福音, conspirituality / 陰謀論スピ, commercial exploitation / 霊感商法, high-demand group (cult) / カルト, and fraternal/secret society / 秘密結社 traditions.
 
 ## Git workflow
 
@@ -68,3 +80,4 @@ Marker definitions in `markers.py` are static word/phrase lists (frozenset for a
 - Ruff line length: 99. Ruff rules include isort (`I`), pyupgrade (`UP`), bugbear (`B`), bandit (`S`)
 - Pre-commit hooks run ruff, gitleaks, bandit, and pytest on every commit
 - Coverage threshold: 70% (`fail_under` in pyproject.toml)
+- Adding a new language requires: a `markers_<lang>.py` file, a loader in `marker_registry.py`, a model entry in `_LANG_MODELS`, and the `SupportedLang` Literal updated
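
The marker registry described in CLAUDE.md (a frozen `MarkerSet` plus a `get_markers(lang)` dispatcher with lazy loading and caching) could be sketched roughly as follows. This is a minimal illustration, not the contents of `marker_registry.py`: the two-field `MarkerSet`, the `_load_en` / `_load_ja` loaders, the `_LOADERS` table, and the placeholder marker values are all assumptions made for the example.

```python
from __future__ import annotations

from collections.abc import Callable
from dataclasses import dataclass
from functools import cache


@dataclass(frozen=True)
class MarkerSet:
    """Cut-down stand-in; the real MarkerSet bundles more categories."""

    vague_adjectives: frozenset[str]
    authority_phrases: tuple[str, ...]


def _load_en() -> MarkerSet:
    # Hypothetical loader: the real one would import markers.py here,
    # so the module is only loaded when English is first requested.
    return MarkerSet(frozenset({"cosmic"}), ("the masters say",))


def _load_ja() -> MarkerSet:
    # Hypothetical loader: the real one would import markers_ja.py here.
    return MarkerSet(frozenset({"宇宙的"}), ("アセンデッドマスターが言う",))


# One entry per language (assumed layout); a new language adds a loader here.
_LOADERS: dict[str, Callable[[], MarkerSet]] = {"en": _load_en, "ja": _load_ja}


@cache
def get_markers(lang: str = "en") -> MarkerSet:
    """Build the marker set for `lang` on first use, then reuse it."""
    try:
        loader = _LOADERS[lang]
    except KeyError:
        raise ValueError(f"unsupported language: {lang!r}") from None
    return loader()
```

With this shape, repeated calls such as `get_markers("ja")` return the same frozen object, and an unknown code fails fast with a `ValueError` instead of silently falling back to English.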
diff --git a/app/main.py b/app/main.py
@@ -103,7 +103,7 @@ def analyse(request: AnalyseRequest) -> AnalyseResponse:
 
     Sync endpoint — FastAPI runs CPU-bound spaCy work in a thread pool.
     """
-    result = hybrid_score(request.text, request.density_bias, seed=request.seed)
+    result = hybrid_score(request.text, request.density_bias, seed=request.seed, lang=request.lang)
     return AnalyseResponse(**dataclasses.asdict(result))
 
 

diff --git a/app/schemas.py b/app/schemas.py
@@ -2,6 +2,8 @@
 
 from __future__ import annotations
 
+from typing import Literal
+
 from pydantic import BaseModel, Field
 
 
@@ -11,6 +13,7 @@ class AnalyseRequest(BaseModel):
     text: str = Field(min_length=1, max_length=100_000)
     density_bias: float = Field(default=0.75, ge=0.0, le=1.0)
     seed: int | None = None
+    lang: Literal["en", "ja"] = Field(default="en")
 
 
 class AnalyseResponse(BaseModel):
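
The `Literal["en", "ja"]` on `AnalyseRequest.lang` has to stay in step with the `SupportedLang` Literal and the `_LANG_MODELS` table that CLAUDE.md mentions. A stdlib-only sketch of one way to keep them aligned is shown below. The exact shapes of `SupportedLang` and `_LANG_MODELS` are assumptions, and `model_name_for` and `check_lang_tables` are hypothetical helpers; only the model names come from the diff.

```python
from typing import Literal, get_args

# Assumed shape of the Literal named in CLAUDE.md.
SupportedLang = Literal["en", "ja"]

# Assumed shape of the model table; model names are the ones CI installs.
_LANG_MODELS: dict[str, str] = {
    "en": "en_core_web_sm",
    "ja": "ja_core_news_sm",
}


def check_lang_tables() -> None:
    """Raise if the Literal and the model table list different languages."""
    declared = set(get_args(SupportedLang))
    if declared != set(_LANG_MODELS):
        raise RuntimeError(f"language tables out of sync: {declared} vs {set(_LANG_MODELS)}")


def model_name_for(lang: str) -> str:
    """Return the spaCy model name for a supported language code."""
    if lang not in get_args(SupportedLang):
        raise ValueError(f"unsupported language: {lang!r}")
    return _LANG_MODELS[lang]


check_lang_tables()
```

Running the consistency check at import time (or in a unit test) means a language added to one table but not the other is caught before any request reaches the spaCy loader.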

diff --git a/docs/STRATEGY.md b/docs/STRATEGY.md
@@ -0,0 +1,75 @@
+# Strategy
+
+## Mission
+
+**Spiritual Intelligence** — cybersecurity for the soul and planetary defence.
+
+si-protocols provides open-source tools that detect disinformation patterns in metaphysical and spiritual content. The project operates at the intersection of natural language processing, pattern recognition, and spiritual literacy — giving individuals and communities the means to evaluate the content they consume.
+
+## What Is Spiritual Intelligence?
+
+Spiritual Intelligence (SI / 霊的知能) is the capacity to discern, evaluate, and navigate spiritual and metaphysical information with clarity. It encompasses:
+
+- **Critical discernment** — the ability to distinguish authentic spiritual teaching from manipulative rhetoric
+- **Pattern recognition** — identifying recurring disinformation tactics across traditions, cultures, and languages
+- **Emotional sovereignty** — maintaining independent judgement when confronted with fear-based or euphoria-driven persuasion
+- **Systemic awareness** — understanding how spiritual narratives interact with social, economic, and political systems
+
+SI is not about debunking spirituality. It is about raising the standard of spiritual discourse by making manipulation patterns visible.
+
+## Problem Statement
+
+Spiritual and metaphysical content is uniquely vulnerable to disinformation because:
+
+1. **Emotion-driven persuasion** — spiritual claims often bypass rational evaluation by targeting fear, hope, and identity
+2. **Unfalsifiable authority** — appeals to channelled messages, ancient wisdom, or divine revelation resist fact-checking
+3. **Escalating commitment** — manipulative content progressively demands greater financial, emotional, and social investment
+4. **Covert inter-systemic threats** — deceptive spiritual narratives can serve as vectors for financial exploitation, political radicalisation, and cult recruitment
+5. **Cross-cultural exploitation** — the same manipulation patterns appear in English-language New Age content, Japanese スピリチュアル商法, prosperity gospel, and conspirituality movements worldwide
+
+No mainstream content moderation tool addresses these patterns. si-protocols fills that gap.
+
+## Approach
+
+### Technology
+
+- **NLP-based pattern detection** — spaCy pipelines analyse text across seven dimensions: vagueness, authority claims, urgency/fear, emotional manipulation, logical contradictions, source attribution, and commitment escalation
+- **Heuristic layer** — probabilistic dissonance scanner (placeholder for future biofeedback integration)
+- **Multi-language architecture** — extensible marker registry supports English and Japanese, with a clean path to additional languages
+- **Hybrid scoring** — weighted combination of tech and heuristic layers produces a 0–100 threat score
+
+### Principles
+
+- **Open source** — MIT-licenced, fully transparent
+- **Local-only** — never hosts, collects, or analyses third-party content
+- **Non-judgemental** — detects manipulation patterns without evaluating spiritual beliefs themselves
+- **Culturally aware** — markers are adapted to specific spiritual traditions and linguistic contexts, not naively translated
+
+## Audience
+
+- **Researchers** studying spiritual disinformation, cult dynamics, and online radicalisation
+- **Content creators** (YouTubers, bloggers, podcasters) who want to evaluate their own content for inadvertent manipulation patterns
+- **Spiritual and thought leaders** seeking to maintain integrity in their communications
+- **Engineers** interested in the intersection of NLP, spirituality, and information integrity
+- **Individuals** navigating the spiritual marketplace who want analytical tools to complement their own discernment
+
+## Roadmap
+
+1. **Multi-language analysis** — Japanese language support (markers, NLP pipeline, CLI, API) ✓
+2. **Web demo** — Svelte-based interactive interface for browser-based analysis
+3. **note.com launch** — Japanese-language articles and community engagement
+4. **Community building** — contributor programme, marker refinement, tradition-specific modules
+5. **Consulting and SaaS** — professional services for organisations dealing with spiritual disinformation
+
+## Two-Channel Strategy
+
+- **English channel** — neutral, professional tone. Focused on the technology, research applications, and open-source community. Primary platforms: GitHub and the project website.
+- **Japanese channel** — nerdy, techie tone. Emphasises the engineering craft and Japanese spiritual context (スピリチュアル, 霊感商法, カルト対策). Primary platform: note.com.
+
+Both channels share the same codebase and core technology. The difference is editorial voice, not capability.
+
+## Tone
+
+Professional. Not provocative. Let the technology speak.
+
+si-protocols does not attack spiritual traditions or practitioners. It provides analytical tools. The project's credibility comes from technical rigour, transparent methodology, and respect for the complexity of spiritual experience.
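
The scoring arithmetic behind the 0–100 threat score can be illustrated with the weights quoted in CLAUDE.md (17/17/13/13/13/13/14 for the tech layer, then 60% tech + 40% heuristic). The sketch below assumes every dimension and the heuristic are already on a 0–100 scale; `TECH_WEIGHTS`, `tech_composite`, and `blend` are hypothetical names for illustration and are not taken from `threat_filter.py`.

```python
# Tech-layer weights as listed in CLAUDE.md; they sum to 1.0.
TECH_WEIGHTS: dict[str, float] = {
    "vagueness": 0.17,
    "authority": 0.17,
    "urgency": 0.13,
    "emotion": 0.13,
    "contradiction": 0.13,
    "source_attribution": 0.13,
    "escalation": 0.14,
}


def tech_composite(scores: dict[str, float]) -> float:
    """Weighted sum of per-dimension scores (each assumed to be 0-100)."""
    return sum(weight * scores[name] for name, weight in TECH_WEIGHTS.items())


def blend(tech: float, heuristic: float) -> float:
    """Hybrid score: 60% tech layer + 40% heuristic layer."""
    return 0.6 * tech + 0.4 * heuristic


if __name__ == "__main__":
    # Example input: only authority and escalation fire at full strength.
    example = dict.fromkeys(TECH_WEIGHTS, 0.0)
    example["authority"] = 100.0
    example["escalation"] = 100.0
    print(blend(tech_composite(example), heuristic=50.0))
```

Because the weights sum to 1.0 and both layers are bounded by 100, the blended result stays within the 0–100 range without any extra clamping under these assumptions.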
diff --git a/examples/synthetic_suspicious_ja.txt b/examples/synthetic_suspicious_ja.txt
@@ -0,0 +1,18 @@
+アセンデッドマスターが言う、この宇宙的な真実は長い間秘められたものでした。
+銀河連合が確認した、今すぐ行動しなければ窓が閉じようとしている。
+選ばれた者だけがアセンションの道を歩むことができる。
+
+古代の叡智が教える量子場のエネルギーが、あなたのDNAを活性化する。
+科学者が言う、この波動を上げる方法は科学的に証明されている。
+隠蔽された研究が示す、彼らが隠していることを知る時が来た。
+
+崩壊と破滅が目覚めない者を待ち受けている。
+しかし覚醒と至福が選ばれし者に訪れる。奇跡と救済の時代が始まる。
+
+あなたには力がある。内なる力は無限だ。
+しかしこれが必要、導きがなければあなたは迷い続ける。
+直感を信じて。しかし疑いは恐れである。
+
+考えてみて、この道を探求して。心を開いて。
+すべきです、コミットして。次のステップへ進む時が来た。
+しなければならない。完全な降伏が必要。すべてを捧げよ。他に道はない。