# Geometric Phase Extraction from Transformer Hidden States


## What is this?

Transformer hidden states have coherent angular (phase-like) structure — but only if you use the right extraction method for the right architecture. This repository contains the paper, code, and data for a systematic investigation across 9 models (110M–2.8B parameters).

The standard signal-processing approach (PCA → bandpass → Hilbert) gives R-bar ≈ 0.12 on Transformers — indistinguishable from noise. We propose a geometric method (PCA → atan2) that achieves R-bar = 0.93–0.98 on Pre-LayerNorm architectures, an 8x improvement.

The key discovery: LayerNorm placement is the controlling variable. GPT-1 (Post-LN) and GPT-2 (Pre-LN) have the same model dimension, same depth, same parameter count — but PCA concentration differs by 6x (16.3% vs 96.3%). The only difference is where LayerNorm sits.
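As a concrete sketch, the geometric method fits in a few lines of NumPy. This is an illustration under invented naming (`geometric_phase` and `rbar` are not the repository's exact functions), run here on a synthetic planar trajectory rather than real hidden states:

```python
import numpy as np

def geometric_phase(h):
    """PCA -> atan2: phase of each hidden state in the top-2 PC plane.

    h: (T, d) array of hidden states for one sequence.
    Returns (theta, rho2): per-token phase angles and the fraction of
    variance explained by the top two principal components.
    """
    hc = h - h.mean(axis=0)
    _, s, vt = np.linalg.svd(hc, full_matrices=False)
    scores = hc @ vt[:2].T                       # project onto top-2 PCs
    theta = np.arctan2(scores[:, 1], scores[:, 0])
    rho2 = (s[:2] ** 2).sum() / (s ** 2).sum()   # variance explained at k=2
    return theta, rho2

def rbar(angles):
    """Mean resultant length of a set of angles (1 = perfectly concentrated)."""
    return np.abs(np.exp(1j * np.asarray(angles)).mean())

# Toy check: a trajectory that winds through a 2-D plane embedded in 64-D
rng = np.random.default_rng(0)
t = np.linspace(0, 4 * np.pi, 200)
h = np.stack([np.cos(t), np.sin(t)], axis=1) @ rng.standard_normal((2, 64))
theta, rho2 = geometric_phase(h)
coherence = rbar(np.diff(theta))  # coherence of the phase increments
```

On this toy input, the trajectory is exactly planar, so ρ₂ is near 1 and the phase increments are tightly concentrated; the point is only to show the shape of the PCA → atan2 pipeline.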

Paper: Geometric Phase Extraction from Transformer Hidden States: Architecture-Dependent Manifold Structure and Adaptive Observation Protocols

Kentaro Sato — info@metaclan.jp

## Key Findings

| # | Finding | Evidence |
|---|---------|----------|
| 1 | Standard Hilbert pipeline fails on Transformers | R-bar ≈ 0.12, indistinguishable from null |
| 2 | PCA + atan2 achieves R-bar = 0.93–0.98 on Pre-LN architectures | 7–8x improvement over Hilbert |
| 3 | LayerNorm placement is the controlling variable | GPT-1 vs GPT-2: 6x difference in PCA concentration (16.3% vs 96.3%), all other dimensions matched |
| 4 | Wide-bandpass Hilbert provides a universal fallback | R-bar = 0.60–0.94 across all 9 architectures |
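The wide-bandpass fallback in finding 4 rests on the analytic-signal (Hilbert) construction. A minimal NumPy sketch follows; names are invented, and the bandpass filtering step that the actual scripts presumably apply before the transform is elided here:

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via FFT: zero negative frequencies, double positives."""
    n = len(x)
    spec = np.fft.fft(x)
    gain = np.zeros(n)
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0       # Nyquist bin kept once
        gain[1:n // 2] = 2.0
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spec * gain)

def hilbert_phase(x):
    """Instantaneous phase of a (wide-band) real signal."""
    return np.angle(analytic_signal(x))

# Phase of a clean oscillation advances at a steady rate
t = np.linspace(0, 8 * np.pi, 512, endpoint=False)
phi = hilbert_phase(np.cos(t))
step_coherence = np.abs(np.exp(1j * np.diff(phi)).mean())  # near 1 when coherent
```

For a pure cosine the recovered phase increments are constant, so their mean resultant length is close to 1; on real hidden-state signals the fallback's coherence is lower (R-bar = 0.60–0.94 in finding 4).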

## Adaptive Protocol

The full protocol is simple: compute the PCA variance explained at k = 2 (ρ₂). If ρ₂ > 0.80, use geometric extraction; otherwise, use wide-bandpass Hilbert. Two methods and one decision boundary cover every model we tested.
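Sketched in NumPy (the threshold and the branch follow the prose above, but `choose_method` and the method labels are invented for illustration):

```python
import numpy as np

RHO2_THRESHOLD = 0.80  # the protocol's decision boundary

def rho2(h):
    """Fraction of variance explained by the top-2 principal components."""
    s = np.linalg.svd(h - h.mean(axis=0), compute_uv=False)
    return (s[:2] ** 2).sum() / (s ** 2).sum()

def choose_method(h):
    """Adaptive protocol: geometric extraction when the hidden-state
    trajectory concentrates in a 2-D plane, Hilbert fallback otherwise."""
    return "geometric_atan2" if rho2(h) > RHO2_THRESHOLD else "hilbert_wide"

rng = np.random.default_rng(0)
t = np.linspace(0, 4 * np.pi, 200)
# Planar trajectory (Pre-LN-like concentration) -> geometric method
planar = np.stack([np.cos(t), np.sin(t)], axis=1) @ rng.standard_normal((2, 64))
# Isotropic noise (no dominant plane) -> wide-bandpass Hilbert fallback
noise = rng.standard_normal((200, 64))
```

`choose_method(planar)` selects the geometric branch and `choose_method(noise)` falls back to Hilbert, mirroring the Pre-LN vs Post-LN split the paper reports.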

## Quick Start

```bash
git clone https://github.com/metaSATOKEN/geometric_phase_extraction.git
cd geometric_phase_extraction/paper

# Install dependencies
python3 -m venv .venv && source .venv/bin/activate
pip install -r experiments/requirements.txt

# Run all experiments (~45 min on M1 Mac)
python experiments/run_all.py

# Or run a single experiment
python experiments/run_all.py --exp 3    # Geometric phase extraction only
python experiments/run_all.py --exp 1 3  # Multiple experiments
```

All figures are written to `experiments/results/`. No GPU required.

## Related Papers

This paper provides the theoretical foundation for the Recync framework — runtime coherence control for LLMs. The phase structure discovered here explains why cosine-similarity-based detection generalizes across architectures.

| # | Paper | Role | DOI |
|---|-------|------|-----|
| 1 | Geometric Phase Extraction (this paper) | Theoretical foundation | 10.5281/zenodo.19230566 |
| 2 | From Monitoring to Intervention | Detection + token-level control limits | 10.5281/zenodo.19148449 |
| 3 | Beyond Micro-Control | Response-level checkpoint restart | 10.5281/zenodo.19148721 |

Code for Papers 2 and 3: github.com/metaSATOKEN/Recync_framework

## Experiments Overview

### Core Experiments (Exp 1-7)

| # | Script | Question | Models | Runtime |
|---|--------|----------|--------|---------|
| 1 | `exp1_phase_extraction.py` | Does the Hilbert pipeline extract phase from Transformers? | GPT-2 | ~18 s |
| 2 | `exp2_prediction_error.py` | Does prediction error differentiate text difficulty? | GPT-2 | ~13 s |
| 3 | `exp3_atan2_phase.py` | Does PCA + atan2 outperform Hilbert? | GPT-2 | ~13 s |
| 4 | `exp4_atan2_cross_arch.py` | Does geometric extraction generalize across architectures? | GPT-2, OPT-1.3B | ~25 s |
| 5 | `exp5_scaling_law.py` | How does phase coherence scale with model size? | 6 models | ~8 min |
| 6 | `exp6_layernorm_controlled.py` | Is LayerNorm placement the controlling variable? | 8 models | ~4 min |
| 7 | `exp7_hilbert_wide.py` | Can wide-bandpass Hilbert serve as a universal fallback? | 6 models | ~31 s |

### Supplementary Experiments (Exp A-F)

| # | Script | Question | Key Result |
|---|--------|----------|------------|
| A | `expA_bootstrap_ci.py` | How tight are the confidence intervals? | All CIs exclude null; widths 0.008–0.072 |
| B | `expB_shuffle_null.py` | Is R-bar driven by token order or architecture? | Shuffle drops R-bar by 9% (p < 0.01) |
| C | `expC_phase_prediction_bridge.py` | Does phase velocity correlate with prediction error? | r = 0.72 (narrative), -0.58 (technical) |
| D | `expD_opt_within_family.py` | Does the within-family OPT comparison replicate the GPT result? | OPT-350m vs OPT-125m mirrors GPT-1 vs GPT-2 |
| E | `expE_eval_vs_train.py` | Does dropout/train mode affect results? | No significant difference (eval vs train) |
| F | `expF_random_input.py` | Is phase structure input-dependent or architectural? | Random tokens yield R-bar = 0.988 vs 0.962 natural |
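As an illustration of the shuffle-null idea in Exp B (a toy reconstruction with invented names, not `expB_shuffle_null.py` itself): permute the token order before phase extraction and compare the coherence of phase increments. On a fully synthetic planar trajectory the shuffle destroys coherence almost entirely; the paper's much smaller 9% drop on real hidden states is what points to architecture, rather than token order, as the main driver.

```python
import numpy as np

def phase_increment_rbar(h):
    """Mean resultant length of phase increments in the top-2 PC plane."""
    hc = h - h.mean(axis=0)
    _, s, vt = np.linalg.svd(hc, full_matrices=False)
    theta = np.arctan2(hc @ vt[1], hc @ vt[0])
    return np.abs(np.exp(1j * np.diff(theta)).mean())

rng = np.random.default_rng(0)
t = np.linspace(0, 4 * np.pi, 300)
h = np.stack([np.cos(t), np.sin(t)], axis=1) @ rng.standard_normal((2, 64))

rbar_ordered = phase_increment_rbar(h)                     # smooth winding: near 1
rbar_shuffled = phase_increment_rbar(rng.permutation(h))   # token order destroyed
```

Note that PCA itself is order-invariant, so the shuffle leaves ρ₂ untouched and isolates the sequential component of the phase structure.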

## Models Used

Nine pre-trained models spanning 110M–2.8B parameters:

| Model | Params | LayerNorm | HuggingFace ID |
|-------|--------|-----------|----------------|
| GPT-1 | 110M | Post-LN | `openai-gpt` |
| OPT-350m | 350M | Post-LN | `facebook/opt-350m` |
| GPT-2 | 124M | Pre-LN | `gpt2` |
| OPT-125m | 125M | Pre-LN | `facebook/opt-125m` |
| OPT-1.3B | 1.3B | Pre-LN | `facebook/opt-1.3b` |
| OPT-2.7B | 2.7B | Pre-LN | `facebook/opt-2.7b` |
| Qwen2-0.5B | 0.5B | Pre-LN (RMSNorm) | `Qwen/Qwen2-0.5B` |
| Qwen2-1.5B | 1.5B | Pre-LN (RMSNorm) | `Qwen/Qwen2-1.5B` |
| Pythia-2.8B | 2.8B | Pre-LN | `EleutherAI/pythia-2.8b` |

All models are automatically downloaded from HuggingFace Hub on first run. The largest models (OPT-2.7B, Pythia-2.8B) use FP16 and require approximately 7 GB RAM.

## Repository Structure

```
paper/
├── geometric_phase_extraction.tex   # LaTeX source (arXiv-ready)
├── geometric_phase_extraction.pdf   # Compiled PDF
├── GEOMETRIC_PHASE_EXTRACTION.md    # Markdown version of the full paper
├── EXPERIMENT_REPORT.md             # Detailed per-experiment methodology and results
├── LICENSE                          # CC BY 4.0 (paper content)
├── experiments/
│   ├── run_all.py                   # CLI runner for all experiments
│   ├── exp1–exp7                    # Core experiments
│   ├── expA–expF                    # Supplementary experiments
│   ├── requirements.txt
│   └── results/                     # Generated figures (PNG)
├── ZENODO_DESCRIPTION.md            # Zenodo record description
└── README.md                        # This file
```

## Hardware Requirements

- **Minimum:** 16 GB RAM, any modern CPU with Python 3.10+
- **Tested on:** Apple M1, 16 GB (all experiments pass)
- **GPU:** Not required (CPU inference throughout)

## Generated Figures

Main figures (referenced in the paper):

| Figure | File | Section |
|--------|------|---------|
| Fig. 1 | `exp1_narrative.png` | 5.1 — Hilbert null result |
| Fig. 2 | `exp3_method_comparison.png` | 5.2 — Hilbert vs. atan2 comparison |
| Fig. 3 | `exp2_comparison.png` | 5.3 — Prediction error across difficulty |
| Fig. 4 | `exp4_atan2_cross_arch.png` | 5.4 — Cross-architecture comparison |
| Fig. 5 | `exp5_scaling_law.png` | 5.5 — Scaling analysis |
| Fig. 6 | `exp6_layernorm_controlled.png` | 5.6 — LayerNorm controlled experiment |
| Fig. 7 | `exp7_hilbert_wide.png` | 5.7 — Wide-bandpass Hilbert fallback |
| Fig. 8 | `expC_bridge_narrative.png` | B.3 — Phase-prediction error bridge |
| Fig. 9 | `expD_opt_within_family.png` | B.4 — OPT within-family replication |
| Fig. 10 | `expF_random_input.png` | B.6 — Random vs. natural input comparison |

## Contact & Collaboration

Feedback, bug reports, and collaboration proposals are welcome.

## License

- **Paper content** (`.tex`, `.pdf`, `.md`, figures): CC BY 4.0 — see `LICENSE` in this directory.
- **Experiment code** (`.py`): Apache License 2.0 — see `../LICENSE` in the repository root.

Copyright 2026 Kentaro Sato.

## Citation

```bibtex
@article{sato2026geometric,
  title={Geometric Phase Extraction from Transformer Hidden States:
         Architecture-Dependent Manifold Structure and Adaptive
         Observation Protocols},
  author={Sato, Kentaro},
  journal={arXiv preprint},
  year={2026},
  doi={10.5281/zenodo.19230566},
  url={https://github.com/metaSATOKEN/geometric_phase_extraction}
}
```
