From 8c9675f495a3abe29d2fecfde374420d467a4a49 Mon Sep 17 00:00:00 2001 From: dancinlife Date: Thu, 4 Jun 2026 14:28:03 +0900 Subject: [PATCH] =?UTF-8?q?spec(kosmos/2.1):=202.0=20deferrals=20=EC=A0=84?= =?UTF-8?q?=EB=B6=80=20landed=20=EA=B8=B0=EB=A1=9D=20+=20LSP=20@corpus=20b?= =?UTF-8?q?yte-parity=20=EB=B3=B5=EC=9B=90?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit kosmos 저장소를 coherent release-ready 상태로 마감. 문법(§1–§6)은 byte-무변경 — spec/format/tooling 정합성만 정직하게 정리. kosmos/2.1 (grammar-identical minor): - spec/kosmos.md §8에 `kosmos/2.1` 항목 append (§8 append-only 준수, 2.0 항목 불변) + SPEC VERSION 헤더 2.0→2.1, README 뱃지 lockstep. kosmos/2.0이 "2.x minors로 deferred"라 적어둔 4개 layer — ① `.limen` 포맷+코덱(spec/limen.md, impl/limen.hexa, 14/14 self-test) ② merkle 구성(spec/limen.md §3) ③ HF export(tool/corpus_to_hf.hexa, docs/hf_export.md) ④ LSP/tree-sitter `@corpus` 인식 — 전부 landed 상태를 정직하게 기록. deprecated `.py` LSP @corpus 동기화 (byte-parity 복원): - canonical lsp/kosmos_lsp.hexa는 `@corpus` 최상위 entry를 인식하나 라이브 에디터 stdio 서버 lsp/kosmos_lsp.py는 미인식 → 유효한 @corpus 파일을 "exactly one @anchor"로 오탐하고 진단 5건이 drift. `.py` validate()/hover()를 `.hexa`에 맞춰 @corpus top-level + nested member + corpus meta + corpus hover 미러. PARITY_VERIFY.md 갱신. stale 참조 교정: - .kanchors→.limen rename(#14)이 놓친 lsp/kosmos_lsp.hexa member hover 1건 → `*.limen`. repo 전체 kanchors 잔존 0. verify (verbatim): - LSP py↔hexa parity: 26/26 byte-equal (11 anchors + 11 fixtures + 4 examples, @corpus example exit 0 포함) — stdout+exit 동일. - limen 코덱 self-test: 14/14 PASS (FIPS SHA-256+CRC vectors, round-trip, tamper, merkle edge, disk I/O). - HF export: examples/04 → datasets manifest emit exit 0. - residual `kanchors`: 0. 남은 정직한 ⏳ (날조 없음, 본 마감 범위 밖): - corpus-of-shards merkle root (모든 shard 위 root) = spec/limen.md §4 future-minor. - anima-emergence-trace profile = draft (Phase B observability fire 대기, 명시됨). - tree-sitter CLI 로컬 미설치 → grammar.js의 @corpus rule 존재만 정적 확인(파싱 미실행). - 실제 .limen shard 미동봉(scale corpus 부재) — example는 placeholder sha256 pointer. Co-Authored-By: Claude Opus 4.8 (1M context) --- CHANGELOG.md | 18 +++++++ README.md | 7 +-- lsp/PARITY_VERIFY.md | 42 ++++++++++----- lsp/kosmos_lsp.hexa | 2 +- lsp/kosmos_lsp.py | 120 ++++++++++++++++++++++++++++++++++--------- spec/kosmos.md | 10 +++- spec/limen.md | 2 +- 7 files changed, 160 insertions(+), 41 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 58de882..3a2bc9e 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -6,6 +6,24 @@ For the full audit trail, see `git log`. --- +## 2026-06-04 + +- **`kosmos/2.1` — `kosmos/2.0` 미뤄둔 4개 layer 전부 LANDED 기록 (minor · grammar 무변경)** — + `spec/kosmos.md` §8에 `kosmos/2.1` 항목 추가 + `SPEC VERSION` 헤더/README 뱃지를 + `2.0`→`2.1`로 lockstep 갱신. `kosmos/2.0` 항목이 "2.x minors로 deferred"라 적어둔 + ① `.limen` packed-shard 바이너리 포맷(`spec/limen.md` + `impl/limen.hexa` 코덱, 14/14 self-test) + ② merkle 트리 구성(`spec/limen.md` §3) ③ HF-dataset export(`tool/corpus_to_hf.hexa` + `docs/hf_export.md`) + ④ LSP/tree-sitter `@corpus` 인식 — 네 가지가 모두 구현 완료된 상태를 §8에 정직하게 기록. + 문법(§1–§6)은 byte-동일 — 2.0 파일 = 2.1 파일. +- **deprecated `.py` LSP `@corpus` 동기화 — byte-parity 복원.** canonical `lsp/kosmos_lsp.hexa`는 + kosmos/2.0 `@corpus` 최상위 entry를 인식하는데 deprecated `lsp/kosmos_lsp.py`(라이브 에디터 stdio + 서버)는 인식하지 못해 유효한 `@corpus` 파일을 "must contain exactly one @anchor entry"로 **오탐**하고 + 진단 문자열 5개가 drift된 상태였음. `.py`의 `validate()`/`hover()`를 `.hexa`에 맞춰 `@corpus` + top-level + nested member + corpus meta field + corpus hover까지 미러 → `lsp/PARITY_VERIFY.md` + **26/26 파일 byte-equal**(stdout+exit) 복원, `@corpus` example exit 0 확인. +- **stale `.kanchors` 참조 1건 수정.** `.kanchors`→`.limen` rename(#14)이 놓친 canonical + `lsp/kosmos_lsp.hexa` `member` hover 문자열을 `*.limen`으로 교정 (repo 전체 `kanchors` 잔존 0). + ## 2026-05-31 - **`.limen` reference codec landed** (`impl/limen.hexa`) — the pure-hexa pack/unpack diff --git a/README.md b/README.md index dba7c79..972864f 100644 --- a/README.md +++ b/README.md @@ -8,7 +8,7 @@

License - Spec + Spec Entry-types Payload-forms Sibling @@ -135,8 +135,9 @@ interactive stdio JSON-RPC server still runs `lsp/kosmos_lsp.py` (**DEPRECATED**, retained only because hexa 0.1.0-dispatch has no incremental raw N-byte stdin read for LSP Content-Length framing on a live editor pipe — see the `.hexa` header). The hexa `validate()` / `hover()` are byte-parity -with the `.py` over 22 corpus files + 45 hover lines -(`lsp/PARITY_VERIFY.log`). +with the `.py` over 26 files (11 anchors + 11 fixtures + 4 examples, +incl. the kosmos/2.0 `@corpus` example) + 45 hover lines +(`lsp/PARITY_VERIFY.md`). Both recognise the `@corpus` top-level entry. Verified against all clean anchors (0 errors) and broken fixtures (`lsp/test_fixtures/`). A Claude Code plugin can wire it via `.lsp.json`: diff --git a/lsp/PARITY_VERIFY.md b/lsp/PARITY_VERIFY.md index c0407b5..d2ac1ed 100644 --- a/lsp/PARITY_VERIFY.md +++ b/lsp/PARITY_VERIFY.md @@ -1,13 +1,17 @@ ================================================================================ kosmos LSP — hexa-native port PARITY VERIFICATION (G2) ================================================================================ - reference : lsp/kosmos_lsp.py (Python 3.9.6, DEPRECATED — kept for live LSP) + reference : lsp/kosmos_lsp.py (Python 3.x, DEPRECATED — kept for live LSP) port : lsp/kosmos_lsp.hexa (hexa 0.1.0-dispatch — canonical --check) - date (UTC): 2026-05-25T18:25:08Z - git rev : 1690345 (branch migrate/anima-kosmos-assets-2026-05-25) + date (UTC): 2026-06-04 (re-verified after kosmos/2.0 @corpus parity sync) + git rev : lane/kosmos-finalize (off origin/main b329901) method : for every file, compare `python3 lsp/kosmos_lsp.py --check F` against `hexa run --no-sentinel lsp/kosmos_lsp.hexa --check F` on BOTH stdout (verbatim diagnostic lines) AND exit code. + note : 2026-06-04 — the canonical .hexa learned the kosmos/2.0 @corpus + top-level entry (§5.6); the DEPRECATED .py is now synced to it so + both accept a @corpus file and emit identical "top-level entry + (@anchor or @corpus)" diagnostics. 26/26 files byte-equal. ================================================================================ ── FALSIFIER 1 — clean anchors must yield 0 error (exit 0) ───────────────────── @@ -24,17 +28,29 @@ PASS anchors/anima/knuth_095_unity.kosmos (exit=0, 0 diag) PASS anchors/anima/knuth_100_big_bang.kosmos (exit=0, 0 diag) ── FALSIFIER 2 — deliberately broken anchors must flag + exit 1 ──────────────── -PASS broken_two_anchor.kosmos (exit=1) → "exactly one @anchor per file" +PASS broken_two_anchor.kosmos (exit=1) → "exactly one top-level entry per file (@anchor XOR @corpus, kosmos/2.0)" PASS broken_missing_coord.kosmos (exit=0) → hint "missing ... coord" (sev2 = no exit-fail, matches .py) PASS broken_indent_malformed.kosmos (exit=1) → malformed @anchor + bad indent + 2 hints -PASS broken_payload_col0.kosmos (exit=1) → "only @anchor is a column-0 entry" + no-anchor -PASS broken_bom_crlf.kosmos (exit=1) → BOM + bad indent + malformed @payload + no-anchor +PASS broken_payload_col0.kosmos (exit=1) → "only @anchor / @corpus are column-0 entries" + no-top-entry +PASS broken_bom_crlf.kosmos (exit=1) → BOM + bad indent + malformed @payload + no-top-entry (note: CRLF does NOT raise the LF diag under --check — Python text-mode universal-newlines normalize CRLF→LF before validate(); the hexa port replicates this in k_normalize_newlines() so --check stays byte-parity.) ── FALSIFIER 3 — diagnostic PARITY (identical stdout + exit, .py == .hexa) ────── - 16/16 corpus files (11 clean + 5 broken) : stdout verbatim-equal AND exit-equal. + 16/16 fixture files (11 clean + 5 broken) : stdout verbatim-equal AND exit-equal. + +── FALSIFIER 4 — kosmos/2.0 @corpus top-level entry accepted by BOTH ─────────── +PASS examples/01_text_only.kosmos (exit=0) @anchor (1.x) — unchanged +PASS examples/02_multimodal.kosmos (exit=0) @anchor (1.x) — unchanged +PASS examples/03_anima_knuth_077_mandala… (exit=0) @anchor (1.x) — unchanged +PASS examples/04_corpus_clm_byte.kosmos (exit=0) @corpus (2.0) — nested @anchor + members + member=ref shards + accepted; pre-sync the .py + false-rejected this file + ("must contain exactly one + @anchor entry"), now fixed. + 4/4 example files : stdout verbatim-equal AND exit-equal (.py == .hexa). ── EXTRA edge-case fixtures (regex-port faithfulness, not coincidental exit) ──── PASS edge_heredoc_edges.kosmos (exit=0) heredoc body skipped (inner @anchor not counted); @@ -50,9 +66,11 @@ PASS edge_payload_malformed.kosmos (exit=1) `@payload := ...` (no modality) + 45/45 lines of knuth_077_mandala.kosmos : hover string verbatim-equal. ================================================================================ - RESULT: 22/22 --check files + 6/6 edge files + 45/45 hover lines → ALL PASS - validate(text) + hover(line) + --check FILE are byte-parity with the - reference .py. exit codes identical (error→1, hint-only/clean→0). + RESULT: 26/26 --check files (11 anchors + 11 fixtures + 4 examples, + incl. the kosmos/2.0 @corpus example) + 45/45 hover lines → ALL PASS + validate(text) + hover(line) + --check FILE are byte-parity between + the canonical .hexa and the deprecated .py. exit codes identical + (error→1, hint-only/clean→0). SCOPE : stdio JSON-RPC server NOT ported — hexa 0.1.0-dispatch has no incremental raw N-byte stdin read for Content-Length framing on a live (never-EOF) editor pipe. See kosmos_lsp.hexa header HONEST TODO. @@ -62,8 +80,8 @@ PASS edge_payload_malformed.kosmos (exit=1) `@payload := ...` (no modality) + ================================================================================ ── reproduce ─────────────────────────────────────────────────────────────────── - for f in anchors/anima/*.kosmos lsp/test_fixtures/*.kosmos; do + for f in anchors/anima/*.kosmos lsp/test_fixtures/*.kosmos examples/*.kosmos; do py=$(python3 lsp/kosmos_lsp.py --check "$f" 2>&1); pr=$? - hx=$(/Users/ghost/.hx/bin/hexa run --no-sentinel lsp/kosmos_lsp.hexa --check "$f" 2>&1); hr=$? + hx=$(hexa run --no-sentinel lsp/kosmos_lsp.hexa --check "$f" 2>&1); hr=$? [ "$py" = "$hx" ] && [ "$pr" = "$hr" ] && echo "PASS $f" || echo "FAIL $f" done diff --git a/lsp/kosmos_lsp.hexa b/lsp/kosmos_lsp.hexa index df8d1ba..6403f7f 100644 --- a/lsp/kosmos_lsp.hexa +++ b/lsp/kosmos_lsp.hexa @@ -576,7 +576,7 @@ fn k_hover(line: string) -> string { } if k_starts_with(s, "anchor_level") { return "**anchor_level** — member granularity: sample | topic | 2tier (default 2tier)" } if k_starts_with(s, "lane_mix") { return "**lane_mix** — per-lane mixing fractions, Σ = 1.0 (e.g. \"web=0.8, register=0.2\")" } - if k_starts_with(s, "member") { return "**member** — a packed anchor-pack shard: `member = ref \"*.kanchors\" sha256=… frac=…`" } + if k_starts_with(s, "member") { return "**member** — a packed anchor-pack shard: `member = ref \"*.limen\" sha256=… frac=…`" } if k_starts_with(s, "closed_corpus"){ return "**closed_corpus** — corpus integrity (Σ frac = 1.0 ∧ ∀ member sha256 verified)" } if k_starts_with(s, "coord") { return "**coord** — anchor placement (float vector, dim ≥ 1)" } if k_starts_with(s, "lane") { return "**lane** — partition / lane id (quoted string)" } diff --git a/lsp/kosmos_lsp.py b/lsp/kosmos_lsp.py index 0daafb7..a2e1a81 100755 --- a/lsp/kosmos_lsp.py +++ b/lsp/kosmos_lsp.py @@ -1,10 +1,20 @@ #!/usr/bin/env python3 -# kosmos-lsp — canonical LSP server for the .kosmos manifest grammar. +# kosmos-lsp — LSP server for the .kosmos manifest grammar (DEPRECATED). +# +# The CANONICAL linter is the hexa-native lsp/kosmos_lsp.hexa (project.tape +# @D k_hexa_native). This .py is retained ONLY as the live-editor stdio +# JSON-RPC server, because hexa 0.1.0-dispatch has no incremental raw N-byte +# stdin read for LSP Content-Length framing on a never-EOF editor pipe. The +# `--check` (CI) path routes to the .hexa via bin/kosmos-lsp; this file backs +# only interactive sessions. validate()/hover() are kept byte-parity with the +# canonical .hexa (see lsp/PARITY_VERIFY.md). +# # Stdio JSON-RPC, zero deps (Python 3.8+). Spec-grounded diagnostics + # hover from spec/kosmos.md. .kosmos is a superset of tape v1.2 adding -# exactly two entry types: @anchor (exactly one per file, at top) and -# @payload (2-space body). Required placement triple: coord/lane/radius. -# Conservative — missing optional structure is a hint (sev 2). +# three entry types (kosmos/2.0): @anchor / @corpus (exactly one top-level +# entry per file, at column 0) and @payload (2-space body). Required +# placement triple: coord/lane/radius. Conservative — missing optional +# structure is a hint (sev 2). # # kosmos-lsp # speak LSP on stdin/stdout # kosmos-lsp --check FILE # one-shot lint (exit 1 on any error) @@ -14,6 +24,13 @@ ANCHOR = re.compile( r'^@anchor\s+\S+\s*:=\s*".*?"\s*::\s*\S+\s*(?:\[[^\]\n]*\])?\s*$') +# @corpus has the same header shape as @anchor (kosmos/2.0 §5.6); the kind +# token \S+ accepts kosmos-corpus. ANCHOR_BODY matches a nested @anchor +# member's header at any indent (validated against the column-0-stripped form). +CORPUS = re.compile( + r'^@corpus\s+\S+\s*:=\s*".*?"\s*::\s*\S+\s*(?:\[[^\]\n]*\])?\s*$') +ANCHOR_BODY = re.compile( + r'^@anchor\s+\S+\s*:=\s*".*?"\s*::\s*\S+\s*(?:\[[^\]\n]*\])?\s*$') PAYLOAD = re.compile(r"^@payload\s+\S+\s*:=") KV = re.compile(r"^\S+\s*=") EDGES = ("<-", "->", "=>", "==", "~>", "|>", "!!", @@ -35,7 +52,8 @@ def validate(text): if "\r" in text: out.append(diag(0, 0, 1, "byte-canonical: LF line endings only")) lines = text.split("\n") - anchors, anchor_ln = 0, -1 + anchors, corpora, anchor_ln = 0, 0, -1 + corpus_mode = False # True once a @corpus top-level entry is seen seen = set() heredoc = None for i, raw in enumerate(lines): @@ -48,23 +66,60 @@ def validate(text): if t == "" or t.startswith("#"): continue indent = len(s) - len(s.lstrip(" ")) - if s.startswith("@anchor"): - anchors += 1 - anchor_ln = i - if indent: + if indent == 0: + # column-0 = the single top-level entry: @anchor XOR @corpus + if t.startswith("@anchor"): + anchors += 1 + anchor_ln = i + if not ANCHOR.match(s): + out.append(diag(i, 0, len(s), + 'malformed @anchor — expected `@anchor := ' + '"" :: kosmos-anchor [tier= active]`')) + elif t.startswith("@corpus"): + corpora += 1 + corpus_mode = True + anchor_ln = i + if not CORPUS.match(s): + out.append(diag(i, 0, len(s), + 'malformed @corpus — expected `@corpus := ' + '"" :: kosmos-corpus [tier= active]`')) + else: out.append(diag(i, 0, len(s), - "@anchor must be at column 0")) - elif not ANCHOR.match(s): - out.append(diag(i, 0, len(s), - 'malformed @anchor — expected `@anchor := ' - '"" :: kosmos-anchor [tier= active]`')) + "only @anchor / @corpus are column-0 entries; @payload " + "and fields are indented body lines")) continue - if s.startswith("@"): - out.append(diag(i, 0, len(s), - "only @anchor is a column-0 entry; @payload and fields " - "are 2-space body lines")) + # indented body line (indent >= 2) + if corpus_mode: + # §5.6 corpus body: nested @anchor members / member refs / corpus + # fields / member payloads — accepted at any indent >= 2. + m = HEREDOC.search(t) + if m: + heredoc = m.group(1) + continue + if t.startswith("@anchor"): + if not ANCHOR_BODY.match(t): + out.append(diag(i, indent, len(s), + 'malformed @corpus member — expected `@anchor ' + ':= "" :: kosmos-anchor [...]`')) + continue + if t.startswith("@payload"): + if not PAYLOAD.match(t): + out.append(diag(i, indent, len(s), + "malformed @payload — expected " + "`@payload := ...`")) + continue + key = t.split("=", 1)[0].strip() if "=" in t else "" + if key in TRIPLE: + seen.add(key) + tok = t.split(" ", 1)[0] + if not (tok in EDGES or t.startswith('"') or t.startswith("`") + or KV.match(t)): + out.append(diag(i, indent, len(s), + "unrecognised @corpus body — member @anchor/@payload, " + "`key = val` (anchor_level/count/lane_mix/vocab/encoding/" + "merkle/member/closed_corpus), edge, or \"prose\"", sev=2)) continue - # body line + # @anchor body — strict 2-space (legacy 1.x path, unchanged) if indent != 2: out.append(diag(i, 0, len(s), "body line indented exactly 2 spaces (under @anchor)")) @@ -88,12 +143,15 @@ def validate(text): out.append(diag(i, 2, len(s), "unrecognised body form — coord/lane/radius/tier/tags, " "an edge, `key = val`, or \"prose\"", sev=2)) - if anchors == 0: + top = anchors + corpora + if top == 0: out.append(diag(0, 0, 1, - "a .kosmos file must contain exactly one @anchor entry")) - elif anchors > 1: + "a .kosmos file must contain exactly one top-level entry " + "(@anchor or @corpus)")) + elif top > 1: out.append(diag(anchor_ln, 0, 1, - "exactly one @anchor per file (one anchor = one file)")) + "exactly one top-level entry per file " + "(@anchor XOR @corpus, kosmos/2.0)")) else: miss = [k for k in TRIPLE if k not in seen] if miss: @@ -105,11 +163,27 @@ def validate(text): def hover(line): + # byte-parity with lsp/kosmos_lsp.hexa k_hover (same order, same strings). s = line.strip() if s.startswith("@anchor"): return "**@anchor** — the one knowledge anchor (placement basin)" + if s.startswith("@corpus"): + return ("**@corpus** — a dataset: an ordered collection of member " + "anchors, itself a meta-anchor (kosmos/2.0 §5.6)") if s.startswith("@payload"): return "**@payload** — one sensory channel into the anchor" + if s.startswith("anchor_level"): + return ("**anchor_level** — member granularity: sample | topic | " + "2tier (default 2tier)") + if s.startswith("lane_mix"): + return ("**lane_mix** — per-lane mixing fractions, Σ = 1.0 (e.g. " + "\"web=0.8, register=0.2\")") + if s.startswith("member"): + return ("**member** — a packed anchor-pack shard: `member = ref " + "\"*.limen\" sha256=… frac=…`") + if s.startswith("closed_corpus"): + return ("**closed_corpus** — corpus integrity (Σ frac = 1.0 ∧ ∀ " + "member sha256 verified)") for k, d in (("coord", "anchor placement (float vector, dim ≥ 1)"), ("lane", "partition / lane id (quoted string)"), ("radius", "influence radius in coord space (float > 0)"), diff --git a/spec/kosmos.md b/spec/kosmos.md index a568c02..1ec01be 100644 --- a/spec/kosmos.md +++ b/spec/kosmos.md @@ -1,9 +1,10 @@ # kosmos.md — `.kosmos` multimodal knowledge-anchor manifest (canonical spec) -> **SPEC VERSION: `kosmos/2.0`** (status: active · 2026-05-30) +> **SPEC VERSION: `kosmos/2.1`** (status: active · 2026-06-04) > This document is the **canonical, versioned grammar** of the `.kosmos` format. It is *substrate-independent*: it defines placement coordinates and sensory payloads abstractly. Concrete field semantics are bound by a **profile** (see `spec/profiles/`). > > **kosmos/2.0** adds the `@corpus` collection entry (§5.6) — a `.kosmos` file is now EITHER one anchor (1.x, unchanged) OR one corpus (a dataset = an ordered collection of member anchors). Backward-compatible for single-anchor files; **major** because the top-level "exactly one `@anchor`" invariant is generalised to "exactly one top-level entry (`@anchor` XOR `@corpus`)". Migration note in §8. +> **kosmos/2.1** (§8) is a **grammar-identical** minor that records the four `kosmos/2.0` deferrals landing — the `.limen` packed-shard format + its reference codec, the merkle construction, the HF-dataset export tool, and LSP/tree-sitter `@corpus` recognition. No grammar delta; a 2.0 file is a 2.1 file and vice-versa. > > Spec changes append a new version entry to §8 (version history) and update the `SPEC VERSION` header. Semver: **major** = incompatible change (migration note required) · **minor** = backward-compatible extension · **patch** = clarification / typo. @@ -433,3 +434,10 @@ This section consolidates parser obligations. "MUST" / "SHOULD" / "MAY" are used - **`closed_corpus` (§5.6.4)** — corpus-level integrity (Σ frac = 1.0 ∧ ∀ ref member sha256 verified ∧ merkle root recomputes). The §4.2 cross-modal rule applies **per member**, not corpus-wide. - **MIGRATION NOTE (1.x → 2.0)**: every `kosmos/1.x` single-anchor file remains valid **unchanged** — it is one column-0 `@anchor`, which 2.0 accepts as one of the two top-level forms. The breaking change is only at the parser/spec contract level: a 1.x parser hard-coding "exactly one `@anchor`" will reject a `@corpus` file (it must learn the `@corpus` top-level form). No existing file is rewritten; no field removed. Producers emitting only `@anchor` need no change. Semver: **major** (top-level invariant generalised). The entry-type count badge moves 2 → 3 (`@anchor`, `@payload`, `@corpus`). - **Out of scope (deferred to 2.x minors)**: the `.limen` packed-shard binary format, the `merkle` tree construction detail, HF-dataset export, and LSP/tree-sitter `@corpus` recognition — each a follow-on layer; this entry defines the *grammar* only. + +### `kosmos/2.1` — 2026-06-04 (the 2.0 deferrals land · backward-compatible · active) +- **No grammar change.** The `.kosmos` grammar (§1–§6) is byte-identical to `kosmos/2.0`; a 2.0 file is a 2.1 file and a 2.1 file is a 2.0 file. This entry only records that the four layers the `kosmos/2.0` entry deferred to "2.x minors" have all **landed** — the format is now fully implemented around its 2.0 grammar, not just specified. Semver: **minor** (companion-layer completion, zero grammar delta). +- **`.limen` packed-shard binary format — SPEC'D + reference codec LANDED.** [`spec/limen.md`](limen.md) defines the wire format (`magic` + LE header + CRC-32 header guard + length-prefixed `@anchor` records + trailing SHA-256 merkle root) the `member = ref "*.limen"` form (§5.6.3 (b)) points at. The pure-hexa reference codec is `impl/limen.hexa` (`limen_pack`/`limen_unpack`/`limen_verify` + byte-array SHA-256 (FIPS 180-4) + CRC-32/IEEE + RFC-6962-style merkle); 14/14 self-test (`impl/test_limen_roundtrip.hexa`: FIPS+CRC vectors, round-trip, tamper, merkle edges, disk I/O). +- **merkle tree construction — SPEC'D.** [`spec/limen.md`](limen.md) §3 fixes the construction (binary tree over per-record leaf hashes · odd-level last-node duplication · `count==0`→SHA-256(empty), `count==1`→leaf). The within-shard root is authoritative now; the corpus-of-shards root (a `merkle` field over *all* shards) stays a future minor (`spec/limen.md` §4 note). +- **HF-dataset export — LANDED.** `tool/corpus_to_hf.hexa` projects a `@corpus` file to a HuggingFace `datasets` manifest (repo_id + card carrying the corpus meta `anchor_level`/`vocab`/`encoding`/`lane_mix`/`count`/`merkle` + a `data_files` array of the `member = ref` shards with `sha256`/`num_rows`/`frac`/`lane`). Docs: [`docs/hf_export.md`](../docs/hf_export.md). +- **LSP / tree-sitter `@corpus` recognition — LANDED.** The canonical hexa-native linter `lsp/kosmos_lsp.hexa` accepts `@corpus` as a column-0 top-level entry, lints nested `@anchor` members + `member = ref` shards + the corpus meta fields, and hovers `@corpus`/`anchor_level`/`lane_mix`/`member`/`closed_corpus`. The `tree-sitter-kosmos/` grammar recognises the `@corpus` rule. The deprecated `.py` editor server is synced to the same `@corpus` behaviour (`lsp/PARITY_VERIFY.md`: 26/26 files byte-equal between `.py` and `.hexa`, incl. the `@corpus` example). `.limen` shard *bytes* stay out of LSP scope (binary, not a `.kosmos` text grammar — `spec/limen.md` §6). diff --git a/spec/limen.md b/spec/limen.md index 853153e..12f3afc 100644 --- a/spec/limen.md +++ b/spec/limen.md @@ -1,6 +1,6 @@ # limen.md — `.limen` packed anchor-harbor binary format (spec) -> **STATUS: spec + reference codec** (codec landed 2026-05-31) · a `kosmos/2.x` follow-on layer. +> **STATUS: spec + reference codec** (codec landed 2026-05-31; folded into `kosmos/2.1`, `spec/kosmos.md` §8) · a `kosmos/2.x` follow-on layer. > This document defines the **binary container** that a `@corpus` member `ref` > points at (`member = ref "shards/web.limen" sha256=… count=… frac=… lane=…`, > see `spec/kosmos.md` §5.6.3 form (b)). The `kosmos/2.0` entry (`spec/kosmos.md`