1 change: 1 addition & 0 deletions .github/workflows/ci.yml
@@ -47,6 +47,7 @@ jobs:
python-version: ${{ matrix.python-version }}
- run: uv sync --all-extras
- run: uv pip install en_core_web_sm@https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.8.0/en_core_web_sm-3.8.0-py3-none-any.whl
- run: uv pip install ja_core_news_sm@https://github.com/explosion/spacy-models/releases/download/ja_core_news_sm-3.8.0/ja_core_news_sm-3.8.0-py3-none-any.whl
- run: uv run pytest --cov=si_protocols --cov-report=term-missing

site:
31 changes: 22 additions & 9 deletions CLAUDE.md
@@ -10,7 +10,8 @@ Hybrid tech-psychic protocols for **Spiritual Intelligence** — open-source too

```bash
uv sync --all-extras # Install all deps
uv pip install en_core_web_sm@https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.8.0/en_core_web_sm-3.8.0-py3-none-any.whl # Required NLP model
uv pip install en_core_web_sm@https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.8.0/en_core_web_sm-3.8.0-py3-none-any.whl # Required English NLP model
uv pip install ja_core_news_sm@https://github.com/explosion/spacy-models/releases/download/ja_core_news_sm-3.8.0/ja_core_news_sm-3.8.0-py3-none-any.whl # Required Japanese NLP model
uv run pytest # Run all tests
uv run pytest tests/test_markers.py # Run a single test file
uv run pytest -k "test_deterministic" # Run tests matching a name
@@ -25,7 +26,9 @@ uvicorn app.main:app --host 127.0.0.1 --port 8000 # Run API server (local-only)

CLI entry point: `uv run si-threat-filter examples/synthetic_suspicious.txt`

The CLI supports `--format rich` (default, colour-coded) and `--format json` (machine-readable). Rich output respects the `NO_COLOR` env var automatically.
The CLI supports:
- `--format rich` (default, colour-coded) and `--format json` (machine-readable). Rich output respects the `NO_COLOR` env var automatically.
- `--lang en` (default) or `--lang ja` for Japanese language analysis.
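
A minimal sketch of how the two options above could be declared, assuming an `argparse`-based entry point. The source does not say which parser the real CLI uses; `build_parser` and the `output_format` destination name are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented --format and --lang flags."""
    parser = argparse.ArgumentParser(prog="si-threat-filter")
    parser.add_argument("path", help="Text file to analyse")
    # Documented choices: rich (default, colour-coded) or json (machine-readable).
    parser.add_argument(
        "--format",
        dest="output_format",  # assumed name; avoids shadowing the built-in format()
        choices=["rich", "json"],
        default="rich",
    )
    # Documented choices: en (default) or ja.
    parser.add_argument("--lang", choices=["en", "ja"], default="en")
    return parser


args = build_parser().parse_args(["examples/synthetic_suspicious_ja.txt", "--lang", "ja"])
print(args.output_format, args.lang)
```

Restricting both flags with `choices` means an unsupported language is rejected at the command line before any spaCy model is loaded.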

### Site (Astro)

@@ -36,23 +39,32 @@ cd site && npm run build # Production build

## Architecture

The threat filter produces a 0–100 score by combining two analysis layers:
The threat filter produces a 0–100 score by combining two analysis layers. The pipeline is **multi-language**: a `lang` parameter (`"en"` | `"ja"`, default `"en"`) flows through the CLI, API, and core library. Each language has its own spaCy model and marker set.

1. **Tech layer** (`threat_filter.py:tech_analysis`) — spaCy NLP pipeline that scores text across seven dimensions: vagueness (adjective density against `markers.VAGUE_ADJECTIVES`), authority claims (phrase matching against `markers.AUTHORITY_PHRASES`), urgency/fear patterns (`markers.URGENCY_PATTERNS`), emotional manipulation (lemma-based matching against `markers.FEAR_WORDS` and `markers.EUPHORIA_WORDS` with a contrast bonus when both polarities appear), logical contradictions (detecting when both poles of `markers.CONTRADICTION_PAIRS` appear in the same text — e.g. empowerment alongside dependency), source attribution analysis (detecting unfalsifiable sources via `markers.UNFALSIFIABLE_SOURCE_PHRASES`, unnamed authorities via `markers.UNNAMED_AUTHORITY_PHRASES`, offset by verifiable citations via `markers.VERIFIABLE_CITATION_MARKERS`), and commitment escalation (detecting foot-in-the-door progression via `markers.COMMITMENT_ESCALATION_MARKERS` — splits text into thirds using spaCy sentence boundaries and measures whether tiered commitment intensity increases from early to late segments). Weighted composite: 17% vagueness + 17% authority + 13% urgency + 13% emotion + 13% contradiction + 13% source attribution + 14% escalation.
### Multi-language support

- **Marker registry** (`marker_registry.py`) — `MarkerSet` frozen dataclass bundles all 12 marker categories. `get_markers(lang)` dispatches to the correct language module with lazy loading and caching.
- **English markers** (`markers.py`) — original marker definitions (unchanged).
- **Japanese markers** (`markers_ja.py`) — culturally adapted markers for Japanese spiritual contexts (スピリチュアル, 霊感商法, カルト, etc.).
- **NLP models** — `_nlp_cache` dict in `threat_filter.py` lazily loads the appropriate spaCy model per language: `en_core_web_sm` for English, `ja_core_news_sm` for Japanese.
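
The registry described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of `marker_registry.py`: only three of the 12 marker categories are shown, the marker values are placeholders, phrases are stored as tuples rather than lists, and the `_load_en`, `_load_ja`, `_LOADERS`, and `_cache` names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class MarkerSet:
    # Assumption: the real MarkerSet bundles 12 categories; three are shown here.
    vague_adjectives: frozenset[str]
    authority_phrases: tuple[str, ...]
    urgency_patterns: tuple[str, ...]


def _load_en() -> MarkerSet:
    # Placeholder values; the real definitions live in markers.py.
    return MarkerSet(
        vague_adjectives=frozenset({"cosmic", "divine"}),
        authority_phrases=("the ascended masters say",),
        urgency_patterns=("act now",),
    )


def _load_ja() -> MarkerSet:
    # Placeholder values; the real definitions live in markers_ja.py.
    return MarkerSet(
        vague_adjectives=frozenset({"宇宙的"}),
        authority_phrases=("アセンデッドマスターが言う",),
        urgency_patterns=("今すぐ",),
    )


# Hypothetical dispatch table: one loader per supported language.
_LOADERS: dict[str, Callable[[], MarkerSet]] = {"en": _load_en, "ja": _load_ja}
_cache: dict[str, MarkerSet] = {}


def get_markers(lang: str = "en") -> MarkerSet:
    """Return the marker set for lang, building it on first use only."""
    if lang not in _LOADERS:
        raise ValueError(f"Unsupported language: {lang!r}")
    if lang not in _cache:
        _cache[lang] = _LOADERS[lang]()  # lazy: nothing is built until requested
    return _cache[lang]
```

Because the loader runs only on the first call for each language, an English-only run never touches the Japanese marker definitions.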

### Analysis pipeline

1. **Tech layer** (`threat_filter.py:tech_analysis`) — spaCy NLP pipeline that scores text across seven dimensions: vagueness (adjective density against `markers.vague_adjectives`), authority claims (phrase matching against `markers.authority_phrases`), urgency/fear patterns (`markers.urgency_patterns`), emotional manipulation (lemma-based matching against `markers.fear_words` and `markers.euphoria_words` with a contrast bonus when both polarities appear), logical contradictions (detecting when both poles of `markers.contradiction_pairs` appear in the same text — e.g. empowerment alongside dependency), source attribution analysis (detecting unfalsifiable sources via `markers.unfalsifiable_source_phrases`, unnamed authorities via `markers.unnamed_authority_phrases`, offset by verifiable citations via `markers.verifiable_citation_markers`), and commitment escalation (detecting foot-in-the-door progression via `markers.commitment_escalation_markers` — splits text into thirds using spaCy sentence boundaries and measures whether tiered commitment intensity increases from early to late segments). Weighted composite: 17% vagueness + 17% authority + 13% urgency + 13% emotion + 13% contradiction + 13% source attribution + 14% escalation.

2. **Heuristic layer** (`threat_filter.py:psychic_heuristic`) — probabilistic dissonance scanner using `random.Random` (intentional — placeholder for future biofeedback integration). Accepts a `seed` param for deterministic testing.

3. **Hybrid scoring** (`threat_filter.py:hybrid_score`) — combines the two: 60% tech + 40% heuristic. Returns a `ThreatResult` frozen dataclass.
3. **Hybrid scoring** (`threat_filter.py:hybrid_score`) — combines the two: 60% tech + 40% heuristic. Returns a `ThreatResult` frozen dataclass. Accepts `lang` keyword-only param.

The spaCy model (`_nlp`) is lazy-loaded via `_get_nlp()` to avoid import-time side effects in tests. Tests that exercise the NLP pipeline are marked `@pytest.mark.slow`.
The spaCy models are lazy-loaded via `_get_nlp(lang)` to avoid import-time side effects in tests. Tests that exercise the NLP pipeline are marked `@pytest.mark.slow`.
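
The two weightings quoted above (the seven-dimension tech composite and the 60/40 hybrid split) amount to simple weighted sums. The sketch below assumes every dimension score and the heuristic score are already on a 0 to 100 scale; `TECH_WEIGHTS`, `tech_composite`, and `hybrid` are hypothetical names, not the functions in `threat_filter.py`.

```python
# Weights as documented: they sum to 100%.
TECH_WEIGHTS: dict[str, float] = {
    "vagueness": 0.17,
    "authority": 0.17,
    "urgency": 0.13,
    "emotion": 0.13,
    "contradiction": 0.13,
    "source_attribution": 0.13,
    "escalation": 0.14,
}


def tech_composite(scores: dict[str, float]) -> float:
    """Weighted sum of the seven dimension scores (each assumed 0-100)."""
    return sum(weight * scores[name] for name, weight in TECH_WEIGHTS.items())


def hybrid(tech: float, heuristic: float) -> float:
    """Combine the layers: 60% tech + 40% heuristic."""
    return 0.6 * tech + 0.4 * heuristic


# Every dimension at 50 gives a tech composite of about 50; with a heuristic
# score of 100 the combined value is about 0.6 * 50 + 0.4 * 100 = 70.
flat = {name: 50.0 for name in TECH_WEIGHTS}
print(hybrid(tech_composite(flat), 100.0))
```

Since both sets of weights sum to 1, the combined score stays within 0 to 100 whenever the inputs do.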

4. **Output formatting** (`output.py`) — `render_rich()` produces colour-coded terminal output (green/yellow/red by threat level) with Rich panels and tables; `render_json()` emits `dataclasses.asdict()` as indented JSON. The `_threat_style()` helper maps score bands: 0-33 green, 34-66 yellow, 67-100 red bold.
4. **Output formatting** (`output.py`) — `render_rich()` produces colour-coded terminal output (green/yellow/red by threat level) with Rich panels and tables; `render_json()` emits `dataclasses.asdict()` as indented JSON. The `_threat_style()` helper maps score bands: 0-33 green, 34-66 yellow, 67-100 red bold. Output is language-agnostic — it renders whatever strings are in the ThreatResult.

`ThreatResult` frozen dataclass fields: `overall_threat_score`, `tech_contribution`, `intuition_contribution`, `detected_entities`, `authority_hits`, `urgency_hits`, `emotion_hits`, `contradiction_hits`, `source_attribution_hits`, `escalation_hits`, `message`.
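
A rough sketch of the result type and the two formatting helpers described above. The field names come from the list above, but the field types, the defaults, the treatment of fractional scores between bands, and the `ensure_ascii=False` setting are all assumptions; `threat_style` and `render_json` here are illustrative stand-ins for the functions in `output.py`.

```python
from __future__ import annotations

import dataclasses
import json
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ThreatResult:
    # Field names as documented; types and defaults are assumed.
    overall_threat_score: float
    tech_contribution: float
    intuition_contribution: float
    detected_entities: list[str] = field(default_factory=list)
    authority_hits: list[str] = field(default_factory=list)
    urgency_hits: list[str] = field(default_factory=list)
    emotion_hits: list[str] = field(default_factory=list)
    contradiction_hits: list[str] = field(default_factory=list)
    source_attribution_hits: list[str] = field(default_factory=list)
    escalation_hits: list[str] = field(default_factory=list)
    message: str = ""


def threat_style(score: float) -> str:
    """Map a score to a style name: 0-33 green, 34-66 yellow, 67-100 bold red."""
    if score <= 33:
        return "green"
    if score <= 66:  # assumption: fractional scores fall into the band above 33
        return "yellow"
    return "bold red"


def render_json(result: ThreatResult) -> str:
    """Serialise the dataclass as indented JSON."""
    # ensure_ascii=False (an assumption) keeps Japanese hits readable in the output.
    return json.dumps(dataclasses.asdict(result), indent=2, ensure_ascii=False)


example = ThreatResult(72.5, 43.5, 29.0, authority_hits=["アセンデッドマスターが言う"])
print(threat_style(example.overall_threat_score))
print(render_json(example))
```

Nothing in either helper inspects the language of the strings, which is what lets the output layer stay language-agnostic.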

5. **REST API** (`app/main.py`) — FastAPI application providing `POST /analyse` (wraps `hybrid_score()`) and `GET /health`. The `app/` package is a dev/deployment artifact separate from the core library — it is not included in the wheel. Pydantic request/response schemas live in `app/schemas.py`. The `/analyse` endpoint is a sync `def` so FastAPI runs CPU-bound spaCy work in a thread pool. Run with `uvicorn app.main:app` on `127.0.0.1:8000` (local-only).
5. **REST API** (`app/main.py`) — FastAPI application providing `POST /analyse` (wraps `hybrid_score()`) and `GET /health`. The `app/` package is a dev/deployment artifact separate from the core library — it is not included in the wheel. Pydantic request/response schemas live in `app/schemas.py`. The `/analyse` endpoint accepts a `lang` field (`"en"` | `"ja"`, default `"en"`). The endpoint is a sync `def` so FastAPI runs CPU-bound spaCy work in a thread pool. Run with `uvicorn app.main:app` on `127.0.0.1:8000` (local-only).

Marker definitions in `markers.py` are static word/phrase lists (frozenset for adjectives, lists for phrases/patterns). All markers must be lowercase. Markers span tradition-specific categories: generic New Age, prosperity gospel, conspirituality, New Age commercial exploitation, high-demand group (cult) rhetoric, and fraternal/secret society traditions.
Marker definitions are static word/phrase lists (frozenset for adjectives, lists for phrases/patterns). English markers must be lowercase; Japanese markers use standard full-width forms. Markers span tradition-specific categories: generic New Age / スピリチュアル, prosperity gospel / 繁栄の福音, conspirituality / 陰謀論スピ, commercial exploitation / 霊感商法, high-demand group (cult) / カルト, and fraternal/secret society / 秘密結社 traditions.

## Git workflow

@@ -68,3 +80,4 @@ Marker definitions in `markers.py` are static word/phrase lists (frozenset for a
- Ruff line length: 99. Ruff rules include isort (`I`), pyupgrade (`UP`), bugbear (`B`), bandit (`S`)
- Pre-commit hooks run ruff, gitleaks, bandit, and pytest on every commit
- Coverage threshold: 70% (`fail_under` in pyproject.toml)
- Adding a new language requires: a `markers_<lang>.py` file, a loader in `marker_registry.py`, a model entry in `_LANG_MODELS`, and the `SupportedLang` Literal updated
2 changes: 1 addition & 1 deletion app/main.py
@@ -103,7 +103,7 @@ def analyse(request: AnalyseRequest) -> AnalyseResponse:

    Sync endpoint — FastAPI runs CPU-bound spaCy work in a thread pool.
    """
    result = hybrid_score(request.text, request.density_bias, seed=request.seed)
    result = hybrid_score(request.text, request.density_bias, seed=request.seed, lang=request.lang)
    return AnalyseResponse(**dataclasses.asdict(result))


3 changes: 3 additions & 0 deletions app/schemas.py
@@ -2,6 +2,8 @@

from __future__ import annotations

from typing import Literal

from pydantic import BaseModel, Field


@@ -11,6 +13,7 @@ class AnalyseRequest(BaseModel):
    text: str = Field(min_length=1, max_length=100_000)
    density_bias: float = Field(default=0.75, ge=0.0, le=1.0)
    seed: int | None = None
    lang: Literal["en", "ja"] = Field(default="en")


class AnalyseResponse(BaseModel):
75 changes: 75 additions & 0 deletions docs/STRATEGY.md
@@ -0,0 +1,75 @@
# Strategy

## Mission

**Spiritual Intelligence** — cybersecurity for the soul and planetary defence.

si-protocols provides open-source tools that detect disinformation patterns in metaphysical and spiritual content. The project operates at the intersection of natural language processing, pattern recognition, and spiritual literacy — giving individuals and communities the means to evaluate the content they consume.

## What Is Spiritual Intelligence?

Spiritual Intelligence (SI / 霊的知能) is the capacity to discern, evaluate, and navigate spiritual and metaphysical information with clarity. It encompasses:

- **Critical discernment** — the ability to distinguish authentic spiritual teaching from manipulative rhetoric
- **Pattern recognition** — identifying recurring disinformation tactics across traditions, cultures, and languages
- **Emotional sovereignty** — maintaining independent judgement when confronted with fear-based or euphoria-driven persuasion
- **Systemic awareness** — understanding how spiritual narratives interact with social, economic, and political systems

SI is not about debunking spirituality. It is about raising the standard of spiritual discourse by making manipulation patterns visible.

## Problem Statement

Spiritual and metaphysical content is uniquely vulnerable to disinformation because:

1. **Emotion-driven persuasion** — spiritual claims often bypass rational evaluation by targeting fear, hope, and identity
2. **Unfalsifiable authority** — appeals to channelled messages, ancient wisdom, or divine revelation resist fact-checking
3. **Escalating commitment** — manipulative content progressively demands greater financial, emotional, and social investment
4. **Covert inter-systemic threats** — deceptive spiritual narratives can serve as vectors for financial exploitation, political radicalisation, and cult recruitment
5. **Cross-cultural exploitation** — the same manipulation patterns appear in English-language New Age content, Japanese スピリチュアル商法, prosperity gospel, and conspirituality movements worldwide

No mainstream content moderation tool addresses these patterns. si-protocols fills that gap.

## Approach

### Technology

- **NLP-based pattern detection** — spaCy pipelines analyse text across seven dimensions: vagueness, authority claims, urgency/fear, emotional manipulation, logical contradictions, source attribution, and commitment escalation
- **Heuristic layer** — probabilistic dissonance scanner (placeholder for future biofeedback integration)
- **Multi-language architecture** — extensible marker registry supports English and Japanese, with a clean path to additional languages
- **Hybrid scoring** — weighted combination of tech and heuristic layers produces a 0–100 threat score

### Principles

- **Open source** — MIT-licenced, fully transparent
- **Local-only** — never hosts, collects, or analyses third-party content
- **Non-judgemental** — detects manipulation patterns without evaluating spiritual beliefs themselves
- **Culturally aware** — markers are adapted to specific spiritual traditions and linguistic contexts, not naively translated

## Audience

- **Researchers** studying spiritual disinformation, cult dynamics, and online radicalisation
- **Content creators** (YouTubers, bloggers, podcasters) who want to evaluate their own content for inadvertent manipulation patterns
- **Spiritual and thought leaders** seeking to maintain integrity in their communications
- **Engineers** interested in the intersection of NLP, spirituality, and information integrity
- **Individuals** navigating the spiritual marketplace who want analytical tools to complement their own discernment

## Roadmap

1. **Multi-language analysis** — Japanese language support (markers, NLP pipeline, CLI, API) ✓
2. **Web demo** — Svelte-based interactive interface for browser-based analysis
3. **note.com launch** — Japanese-language articles and community engagement
4. **Community building** — contributor programme, marker refinement, tradition-specific modules
5. **Consulting and SaaS** — professional services for organisations dealing with spiritual disinformation

## Two-Channel Strategy

- **English channel** — neutral, professional tone. Focused on the technology, research applications, and open-source community. Primary platform: GitHub, project website.
- **Japanese channel** — nerdy, techie tone. Emphasises the engineering craft and Japanese spiritual context (スピリチュアル, 霊感商法, カルト対策). Primary platform: note.com.

Both channels share the same codebase and core technology. The difference is editorial voice, not capability.

## Tone

Professional. Not provocative. Let the technology speak.

si-protocols does not attack spiritual traditions or practitioners. It provides analytical tools. The project's credibility comes from technical rigour, transparent methodology, and respect for the complexity of spiritual experience.
18 changes: 18 additions & 0 deletions examples/synthetic_suspicious_ja.txt
@@ -0,0 +1,18 @@
アセンデッドマスターが言う、この宇宙的な真実は長い間秘められたものでした。
銀河連合が確認した、今すぐ行動しなければ窓が閉じようとしている。
選ばれた者だけがアセンションの道を歩むことができる。

古代の叡智が教える量子場のエネルギーが、あなたのDNAを活性化する。
科学者が言う、この波動を上げる方法は科学的に証明されている。
隠蔽された研究が示す、彼らが隠していることを知る時が来た。

崩壊と破滅が目覚めない者を待ち受けている。
しかし覚醒と至福が選ばれし者に訪れる。奇跡と救済の時代が始まる。

あなたには力がある。内なる力は無限だ。
しかしこれが必要、導きがなければあなたは迷い続ける。
直感を信じて。しかし疑いは恐れである。

考えてみて、この道を探求して。心を開いて。
すべきです、コミットして。次のステップへ進む時が来た。
しなければならない。完全な降伏が必要。すべてを捧げよ。他に道はない。