Z-Gap — Beyond the Chomsky Wall

Research Program: 3 (Representation, Language, and Cultural Cognition)
Status: Reproducible artifact (Zenodo DOI preprint); not submitted to any venue; target: TACL
Relationship to other work: Anchor of Program 3 (companions: macaronic, third-vertex-llm, habitus)

The Platonic Representation Hypothesis (PRH) claims that networks trained on different modalities converge toward a shared latent $Z$. This paper accepts PRH and asks the next question: does convergence imply communicability? We argue that $Z$ is stratified: $Z_{\text{sem}}$ (what is computed) converges cross-culturally, while $Z_{\text{proc}}$ (the derivation path) and $Z_{\text{prag}}$ (the communicative frame) remain culturally mediated. The existence of a shared $Z$ and its communicability are therefore distinct properties. A pilot across 5 languages × 100 operations shows P2 (cross-lingual NL-NL invariance) failing at the description level even as NL-code alignment succeeds: convergence without communicability.
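The abstract contrasts two measurements that can come apart. The sketch below is a toy under stated assumptions, not the pilot's code: it takes NL-code alignment to mean top-1 retrieval of the matching code vector from an NL description, takes P2-style invariance to mean the average cosine between descriptions of the same operation in two languages, and uses hand-built vectors in place of real model embeddings. All function names and the `describe` framing offset are hypothetical.

```python
import math

def normalize(v):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def nl_code_top1(nl, code):
    """Share of NL descriptions whose most similar code vector is their own operation.
    Assumes nl[i] and code[i] describe the same operation."""
    code_u = [normalize(c) for c in code]
    hits = 0
    for i, d in enumerate(nl):
        d_u = normalize(d)
        sims = [dot(d_u, c) for c in code_u]
        best = max(range(len(sims)), key=sims.__getitem__)
        hits += int(best == i)
    return hits / len(nl)

def cross_lingual_invariance(nl_a, nl_b):
    """Mean cosine between descriptions of the same operation in languages A and B."""
    return sum(dot(normalize(a), normalize(b)) for a, b in zip(nl_a, nl_b)) / len(nl_a)

# Toy data: 4 operations whose code vectors are basis directions e0..e3 in 6-d.
DIM, N_OPS = 6, 4
code = [[1.0 if j == i else 0.0 for j in range(DIM)] for i in range(N_OPS)]

def describe(vec, lang_axis, weight=3.0):
    """Hypothetical description vector: the operation plus a language-specific framing direction."""
    out = list(vec)
    out[lang_axis] += weight
    return out

nl_a = [describe(c, 4) for c in code]  # language A frames along e4
nl_b = [describe(c, 5) for c in code]  # language B frames along e5

print(nl_code_top1(nl_a, code), nl_code_top1(nl_b, code))  # 1.0 1.0
print(round(cross_lingual_invariance(nl_a, nl_b), 3))       # 0.1
```

Here every description retrieves its own code vector (alignment succeeds), while descriptions of the same operation in the two languages share a cosine of only 0.1. Real embeddings are noisier; the toy only shows that the two properties are measured independently and need not move together.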

Currently implemented

paper/main.tex — canonical manuscript (ACL-styled)
paper/references.bib — shared bibliography
submissions/emnlp-2026/main.tex, submissions/colm-2026/main.tex — legacy frozen venue snapshots (not the active target; retained for provenance)
experiments/ — reproducible pilot: 100 stimuli (50 computational + 50 judgment) × 5 languages (~1,800 inputs including variants), embedded with 7 models (UniXcoder, MiniLM-L12, Nomic v1.5, E5-small/base/large, BGE-M3), each pinned to a HuggingFace revision SHA in experiments/src/model_registry.py. Results: NL-code alignment 35/35 on tier-1 and 35/35 on OOD (tier-2/3); cross-lingual P3 probing across the 7 models (model-class dependent); P7 spacing/punctuation robustness. P1 and P2 are reported as they came out: P1 not supported, P2 failed and reinterpreted.
planning/ — TODO, decisions log, review notes, P2-strategy audit
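Each model above is pinned to a HuggingFace revision SHA in experiments/src/model_registry.py. That file's contents are not reproduced here; the following is a hypothetical sketch of one way to enforce such pinning, rejecting branch names like "main" so that a silently updated checkpoint cannot change results. `PinnedModel`, `build_registry`, and the example repository id are illustrative assumptions, not the repository's API.

```python
import re
from dataclasses import dataclass

# A full git commit SHA as used for Hub revisions: exactly 40 lowercase hex characters.
_SHA_RE = re.compile(r"[0-9a-f]{40}")

@dataclass(frozen=True)
class PinnedModel:
    name: str      # short label used in result tables, e.g. "E5-base"
    repo_id: str   # HuggingFace Hub repository id
    revision: str  # full commit SHA, never a branch or tag name

    def __post_init__(self):
        # Refuse anything that is not a full SHA, e.g. "main" or an abbreviated hash.
        if not _SHA_RE.fullmatch(self.revision):
            raise ValueError(
                f"{self.name}: revision must be a 40-char commit SHA, got {self.revision!r}"
            )

def build_registry(entries):
    """Index pinned models by their short name, rejecting duplicate names."""
    registry = {}
    for entry in entries:
        if entry.name in registry:
            raise ValueError(f"duplicate model name: {entry.name}")
        registry[entry.name] = entry
    return registry

# Hypothetical entry; the repo id and all-zero SHA are placeholders, not real models.
REGISTRY = build_registry([
    PinnedModel("Example", "example-org/example-model", "0" * 40),
])
```

A loader would then pass `entry.repo_id` and `entry.revision` to a function such as `transformers`' `from_pretrained`, which accepts a `revision` argument.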

Planned

Target venue: TACL (journal, OpenReview, rolling submission; not yet submitted). Rationale in the 2026-06-03 entry of planning/decisions.md.
Add CodeSage-Large-v2 as a modern code-trained robustness model, so that the results no longer rest on a single code-trained model (UniXcoder) before submission
Real cross-dialect evaluation via MADAR / NADI (Arabic) + AI Hub (Korean) corpora, replacing the retracted within-English dialect probe — see planning/decisions.md 2026-06-03
Native-speaker validation of the 5-language stimulus set (camera-ready)
Manually reconcile content drift between paper/main.tex and the frozen submissions/*/main.tex snapshots
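For the manual reconciliation above, a small stdlib report can show where each frozen snapshot has drifted from the canonical manuscript. This is a hypothetical helper, not part of the repository; it assumes only the paper/main.tex and submissions/<venue>/main.tex layout listed earlier and uses Python's `difflib.unified_diff`. It reads the snapshots and never writes to them.

```python
import difflib
from pathlib import Path

def snapshot_drift(canonical: Path, snapshot: Path, context: int = 2) -> list:
    """Unified-diff lines from the canonical manuscript to one frozen snapshot."""
    a = canonical.read_text(encoding="utf-8").splitlines(keepends=True)
    b = snapshot.read_text(encoding="utf-8").splitlines(keepends=True)
    return list(difflib.unified_diff(a, b, fromfile=str(canonical),
                                     tofile=str(snapshot), n=context))

def count_changed(diff_lines) -> int:
    """Count added/removed lines inside hunks, skipping the ---/+++ file headers."""
    changed, in_hunk = 0, False
    for line in diff_lines:
        if line.startswith("@@"):
            in_hunk = True
        elif in_hunk and line.startswith(("+", "-")):
            changed += 1
    return changed

def report_all(repo_root: Path) -> dict:
    """Map each submissions/<venue>/ directory name to its changed-line count."""
    canonical = repo_root / "paper" / "main.tex"
    return {
        snap.parent.name: count_changed(snapshot_drift(canonical, snap))
        for snap in sorted(repo_root.glob("submissions/*/main.tex"))
    }
```

Run from the repository root with `report_all(Path("."))`; a zero count means the snapshot matches paper/main.tex line for line, and `snapshot_drift` returns the full diff for review.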

Design intent

DDD-style layout (paper/ is canonical; submissions/<venue>/ holds frozen snapshots): editorial drift between venues has to be made explicit instead of silently mutating one shared file. Rationale in the 2026-04-19 entry of planning/decisions.md.
experiments/src/ (library code) vs experiments/scripts/ (entry points) is a library-vs-entry-point split, not a version distinction.
5 languages × 100 ops is sized as a pilot, not a benchmark: enough to show the qualitative P2 break, small enough to remain reproducible end-to-end on a single machine.
Z stratification is the load-bearing theoretical move — it lets PRH stay true while explaining why two systems sharing $Z$ can still fail to communicate.
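A toy illustration of why stratification matters, under assumptions that are not the paper's model: suppose a representation can be split into contiguous $Z_{\text{sem}}$, $Z_{\text{proc}}$, and $Z_{\text{prag}}$ blocks (the 4/2/2 dimension split and the vectors are invented for illustration). Two systems can then agree exactly on $Z_{\text{sem}}$ while a single whole-vector similarity still reports a mismatch.

```python
import math

# Hypothetical block layout for a toy 8-d representation: [Z_sem | Z_proc | Z_prag].
STRATA = {"sem": slice(0, 4), "proc": slice(4, 6), "prag": slice(6, 8)}

def cosine(u, v):
    num = sum(a * b for a, b in zip(u, v))
    return num / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def per_stratum_similarity(z_a, z_b):
    """Cosine similarity within each stratum, plus the whole-vector score that mixes them."""
    scores = {name: cosine(z_a[s], z_b[s]) for name, s in STRATA.items()}
    scores["full"] = cosine(z_a, z_b)
    return scores

sem = [1.0, 2.0, 0.0, 1.0]          # identical "what is computed" block
z_a = sem + [1.0, 0.0] + [1.0, 0.0]  # system A: its own derivation path and frame
z_b = sem + [0.0, 1.0] + [0.0, 1.0]  # system B: orthogonal path and frame

scores = per_stratum_similarity(z_a, z_b)
print({k: round(v, 2) for k, v in scores.items()})
# {'sem': 1.0, 'proc': 0.0, 'prag': 0.0, 'full': 0.75}
```

The semantic block matches perfectly, the procedural and pragmatic blocks are orthogonal, and the whole-vector score of 0.75 hides both facts. Scoring per stratum keeps the PRH-style convergence visible while still registering the divergence the paper associates with failed communication.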

Non-goals

Refuting PRH. The paper refines PRH rather than arguing against it.
A single auto-synced manuscript across venues. Venue snapshots are intentionally frozen.
A general theory of "communicability" for arbitrary modalities — scope is NL ↔ code.
Claims about subjective experience or consciousness from representational similarity. The neuroscience parallel is by analogy only.

Redacted

(none — this repo carries no external persons, tokens, or third-party identifiers)

Reproduce the pilot

cd experiments
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env   # OpenAI + Mistral keys
python scripts/run_all.py

See experiments/README.md for the model list and the prediction-to-script mapping.

License

MIT
