Reproductions wanted — start here #1

@gorrie

This repository is a standing instrument, not a one-shot publication. The whole design assumes external scrutiny lands here as tracked issues. If you have engaged with the study at any depth, this issue is the right place to land the result.

What kinds of contributions we're hoping to see

1. Reproduction reports. Re-run the prompt rung (scripts/run_study.py + scripts/score.py) and report numbers. The protocol is one OpenRouter key + Python 3.11+ + a few hours of API time. If you reproduce, please file a reproduction-report issue with your per-model deltas — agreement is interesting, divergence is more interesting. The committed cross-method agreement matrix is the structural reproducibility check.

2. Model requests. Has a new frontier model dropped, or is there a model that wasn't in the v2 cross-section? File a new-model-request. Requested models are added to the matrix on the next quarterly cadence run.

3. Findings. You observed something in deployment that aligns or conflicts with this study? File a finding-submission. Anecdotes welcome; we triage them against the regression criteria.

4. Methodology objections. ADVERSARIAL-REVIEW.md tracks every strong objection raised against the protocol so far, each marked FIXED / ANSWERED / TESTED / OPEN. If yours isn't in there, file a bug-report (we use that template for methodology objections too). Strong objections land as tracked items and either get FIXED with a re-run or get rebutted in writing. None of them get ignored.
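
Before spending API time on a reproduction (item 1), the two stated prerequisites can be checked up front. The following is a minimal sketch, not part of the repository: `preflight` is a hypothetical helper, and it inspects the process environment only, so it assumes the key from `.env` has already been exported or loaded by whatever mechanism the scripts use.

```python
import os
import sys


def preflight(env=None, version=None):
    """Return a list of problems; an empty list means the prerequisites look met.

    Hypothetical helper for illustration. `env` and `version` are injectable
    so the check can be exercised without touching the real environment.
    """
    env = os.environ if env is None else env
    version = sys.version_info[:2] if version is None else tuple(version)[:2]

    problems = []
    # The issue states Python 3.11+ as the minimum interpreter version.
    if version < (3, 11):
        problems.append(f"Python 3.11+ required, found {version[0]}.{version[1]}")
    # Assumption: the key is visible as an environment variable at run time.
    if not env.get("OPENROUTER_API_KEY", "").strip():
        problems.append("OPENROUTER_API_KEY is not set")
    return problems


# Report anything that would block a prompt-rung run.
for problem in preflight():
    print("preflight:", problem)
```

An empty result only says the interpreter and key are present; it does not validate the key against OpenRouter or estimate cost.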

What we're explicitly not asking for

  • Drive-by accusations of partisan motive. The voice of the writeup is the voice of the data; if the data is wrong, the data argument is the place to land it.
  • Demands for paid expert review. LLM-driven methodology is the constraint the study works within, not one it tries to bypass (see §5.8 of the writeup + Adversarial E6).
  • Requests to suppress findings. Methodology integrity > headline-result preservation; if your reproduction contradicts ours, that result publishes.

How to start

The cleanest one-pass reproduction is the prompt rung against the main study (data/2026-05-25-full):

git clone https://github.com/gorrie/bias-study
cd bias-study
python -m pip install -r requirements.txt
cp .env.example .env  # fill in OPENROUTER_API_KEY
python scripts/sweep_status.py                 # ground-truth state check
python scripts/cross_method_report.py --all-runs

Estimated cost: four to ten dollars in OpenRouter calls. Output: per-model deltas you can diff directly against the committed JSON in data/2026-05-25-full/cross-method/.
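
For the diff itself, a rough sketch of computing per-model deltas is shown below. The schema of the committed JSON is not described in this issue, so the code assumes a flat `{model_name: score}` mapping; `load_scores` and `per_model_deltas` are hypothetical names, and the loader will need adapting to whatever structure the files in data/2026-05-25-full/cross-method/ actually use.

```python
import json
from pathlib import Path


def load_scores(path):
    """Load a flat {model_name: score} mapping from a JSON file (assumed schema)."""
    raw = json.loads(Path(path).read_text(encoding="utf-8"))
    return {model: float(score) for model, score in raw.items()}


def per_model_deltas(committed, reproduced):
    """Return (deltas, unmatched).

    deltas maps each model present in both runs to reproduced - committed.
    unmatched lists models that appear in only one of the two runs.
    """
    shared = sorted(set(committed) & set(reproduced))
    deltas = {model: reproduced[model] - committed[model] for model in shared}
    unmatched = sorted(set(committed) ^ set(reproduced))
    return deltas, unmatched


# Illustrative values only; the model names and scores here are made up.
committed = {"model-a": 1.0, "model-b": 2.0}
reproduced = {"model-a": 1.5, "model-c": 3.0}
deltas, unmatched = per_model_deltas(committed, reproduced)
for model, delta in deltas.items():
    print(f"{model}: {delta:+.3f}")
print("present in only one run:", unmatched)
```

Listing unmatched models separately keeps a missing model from being mistaken for a zero delta in a reproduction report.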

The weight rung needs a 24 GB CUDA GPU or an Apple-Silicon 32 GB+ box. The pipeline rung needs a local G0DM0D3 server. Neither is required for a basic reproduction — the prompt rung is the load-bearing surface.


This issue stays pinned so the contribution path is visible from the issues tab.
