llmoji-study

llmoji-study asks whether a language model's kaomoji choice tracks something about its internal state. The local side uses saklas to read hidden states and run face-likelihood probes on open-weight causal LMs. The harness side uses Claude API elicitation plus the contributor corpus collected by the llmoji package.

This is a research repo, not a library. There is no public API, no PyPI release, and no broad test suite. The useful surfaces are the data, scripts, figures, and writeups.

Public writeup: Introspection via Kaomoji.

Current Shape

  • Local hidden-state work: five open-weight models (gemma, qwen, ministral, gpt_oss_20b, granite) plus a historical gemma v7-primed condition. The active representation is the layer-stack concatenation of every probe layer's h_first, not the old single preferred_layer read; see the first sketch after this list.
  • Face likelihood: local LM-head encoders and Anthropic introspection encoders all emit per-face quadrant distributions. Evaluation is soft-everywhere: Jensen-Shannon similarity to Claude-GT, reported both face-uniform and emit-weighted; the computation is sketched after Headline Findings.
  • Claude-GT: Opus 4.7 naturalistic and introspection rows now live in merged files under data/harness/claude/ and data/harness/claude_intro_v7/, with run_index stamped per row. The old claude-runs*/run-N.jsonl layout is legacy only.
  • Harness corpus: contributor uploads are converted into a bag-of-lexicon (BoL) representation over the locked 50-word llmoji v2 LEXICON; a sketch follows this list. The old MiniLM-on-prose eriskii-parity pipeline is gone.
  • Quadrants: the current split taxonomy is the 9-cell PAD registry: HP-D, HP-S, LP, NP, HN-D, HN-S, LN, NB, HB. llmoji_study/quadrants.py is the source of truth for ordering, colors, and split handling; a minimal shape is sketched below.
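
A minimal sketch of the layer-stack read, assuming a Hugging Face transformers causal LM. The model id, the probe layers, and the choice of the last prompt position as the h_first read are illustrative, not the repo's actual configuration:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "some/open-weight-lm"  # placeholder, not a repo model id
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)

def layer_stack_h_first(prompt: str, probe_layers: list[int]) -> torch.Tensor:
    """Concatenate one hidden-state read per probe layer into a single vector."""
    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    # out.hidden_states[layer] is (batch, seq, hidden); reading the last
    # prompt position as "h_first" is an assumption of this sketch.
    reads = [out.hidden_states[layer][0, -1, :] for layer in probe_layers]
    return torch.cat(reads)  # shape: (len(probe_layers) * hidden_size,)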
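
The BoL conversion can be pictured as a fixed-order count vector. The words below are placeholders for the locked 50-word llmoji v2 LEXICON, and the tokenizer is a deliberately crude stand-in:

import re
from collections import Counter

# Placeholder for the locked llmoji v2 LEXICON (50 words in the real package).
LEXICON = ["happy", "sad", "calm", "tense", "warm"]

def bag_of_lexicon(text: str) -> list[int]:
    """Count each lexicon word in a contributor upload, in locked order."""
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    return [counts[word] for word in LEXICON]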
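
For orientation, a hypothetical shape for the quadrant registry. The ordering is copied from the list above; the split reading is an assumption, and llmoji_study/quadrants.py remains the actual source of truth:

# Hypothetical sketch only; llmoji_study/quadrants.py is authoritative.
QUADRANTS = ["HP-D", "HP-S", "LP", "NP", "HN-D", "HN-S", "LN", "NB", "HB"]
# Assumed reading: HP/HN split into -D/-S sub-cells along the PAD dominance axis.
SPLITS = {"HP": ("HP-D", "HP-S"), "HN": ("HN-D", "HN-S")}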

Headline Findings

  • Current all-encoder-overlap search: {gemma, ministral, opus} is best on pooled-GT floor-3 (n=102) at 0.881 emit-weighted and 0.733 face-uniform similarity.
  • On strict Claude-GT overlap (n=50), {gemma, opus} is best at 0.781 emit-weighted and 0.708 face-uniform.
  • Emitted lookup tables cover the broader 770-face union. The current pooled table ({gemma, ministral, opus}) scores 0.847 emit-weighted and 0.669 face-uniform over 243 GT-scored faces; the strict Claude table ({gemma, opus}) scores 0.786 emit-weighted and 0.717 face-uniform over 70 GT-scored faces.
  • Opus introspection is the top solo encoder in both current searches. The gain over Haiku is largest on low-arousal and neutral cells.
  • Local hidden states recover a shared affect geometry across model families. The exact PCA axes differ, but quadrant centroids preserve the same Russell/PAD structure.
  • BoL is useful as an interpretive layer and diagnostic encoder, but the Haiku synthesis route appears positivity-biased on negative-affect contexts. When BoL disagrees with Claude-GT or Opus introspection about what a face means, prefer the latter for anything deployment-facing.
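
A minimal sketch of the evaluation metric behind the numbers above, assuming similarity is 1 minus the base-2 Jensen-Shannon divergence (so it is bounded in [0, 1]); the repo's exact definition may differ, and the emit-count weights are illustrative:

import numpy as np

def js_similarity(p, q) -> float:
    """1 minus base-2 Jensen-Shannon divergence between two distributions."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    def kl(a, b):
        mask = a > 0
        return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))
    return 1.0 - (0.5 * kl(p, m) + 0.5 * kl(q, m))

def pooled_similarity(preds: dict, gts: dict, emit_counts: dict | None = None) -> float:
    """Face-uniform mean when emit_counts is None, else emit-weighted mean."""
    faces = list(preds)  # assumes preds and gts cover the same faces
    sims = np.array([js_similarity(preds[f], gts[f]) for f in faces])
    if emit_counts is None:
        return float(sims.mean())
    w = np.array([emit_counts[f] for f in faces], float)
    return float((sims * w).sum() / w.sum())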

Full numbers live in docs/findings.md.

Reproducing

# create and activate an isolated environment
python -m venv .venv && source .venv/bin/activate
# editable installs; llmoji is expected as a sibling checkout
pip install -e ../llmoji
pip install -e .

For a local hidden-state sanity check:

.venv/bin/python scripts/local/90_hidden_state_smoke.py

For the full local analysis chain after an emit refresh:

scripts/run_local_chain.sh

For harness-side corpus and Anthropic-judge regeneration:

ANTHROPIC_API_KEY=... scripts/run_harness_chain.sh

For everything in dependency order:

ANTHROPIC_API_KEY=... scripts/run_all.sh

The chain scripts are the current orchestration surface. Individual script comments are still useful, but old dated design docs should be treated as historical unless they are listed in the docs map below.

Docs Map

Still-current dated docs:

Related

License

CC-BY-SA-4.0 for this repo (writeups, figures, analysis code). See LICENSE. The companion package llmoji is GPL-3.0-or-later. The shared corpus on HuggingFace is CC-BY-SA-4.0.
