measure(H_1041): 🔴 IMAGINE-ADVANTAGE-IS-TASK-SPECIFIC — MPC reclaims deep-horizon lead on harder (nonlinear + partial-obs) tasks by dancinlife · Pull Request #1959 · dancinlab/anima

dancinlife · 2026-06-08T22:50:05Z

H_1041 — does imagine-rollout still beat MPC on a HARDER control task? (H_1034 generalization)

🔴 RED — IMAGINE-ADVANTAGE-IS-TASK-SPECIFIC (closed-negative, a_paper_negative_ok). $0 CPU-local, 0 pods, g5 CODE-measured, a_phi_iit4_tool n/a.

Pre-registered falsifier (UNIVERSE/H_1041_imagine_harder_control.md, merged #1939) filled verbatim: three planners (naive-MPC, robust scenario/SAA tube-MPC, imagine-rollout through anima's learned LDS WM), frozen depth ladder {1,2,4,8,16}, N_RUNS=40×40ep, GAP_TOL=0.05, Welch p<1e-3. PASS = imagine still leads the best MPC by >GAP_TOL at deep horizon {8,16} on ≥1 harder task.
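A minimal sketch of how the PASS rule above could be scored for one task, assuming returns where higher is better, per-run return arrays for each planner, and that leading at either deep rung is enough. The function names and data layout are hypothetical and not taken from `h1041_imagine_harder_control.py`; the data in the demo is synthetic.

```python
import numpy as np
from scipy import stats

GAP_TOL = 0.05   # frozen tolerance from the pre-registration
P_MAX = 1e-3     # Welch significance threshold from the pre-registration
DEEP = (8, 16)   # deep-horizon rungs of the depth ladder


def deep_horizon_lead(imag, naive, robust):
    """Return (lead, p) for one depth: imagine minus the better of the two MPCs.

    Each argument is a 1-D array of per-run returns (assumed: higher is better).
    """
    best_mpc = naive if np.mean(naive) >= np.mean(robust) else robust
    lead = float(np.mean(imag) - np.mean(best_mpc))
    # Welch's t-test: two-sample t-test without the equal-variance assumption.
    p = float(stats.ttest_ind(imag, best_mpc, equal_var=False).pvalue)
    return lead, p


def task_passes(returns_by_depth):
    """PASS if imagine leads by more than GAP_TOL with p < P_MAX at a deep rung.

    Assumption: one deep rung suffices; the pre-registration may require both.
    """
    for d in DEEP:
        lead, p = deep_horizon_lead(*returns_by_depth[d])
        if lead > GAP_TOL and p < P_MAX:
            return True
    return False


# Synthetic demo (made-up numbers, 40 runs per planner as in N_RUNS).
rng = np.random.default_rng(0)
mpc_wins = {d: (rng.normal(-1.0, 0.1, 40),   # imagine
                rng.normal(0.0, 0.1, 40),    # naive MPC
                rng.normal(0.0, 0.1, 40))    # robust MPC
            for d in DEEP}
print(task_passes(mpc_wins))
```

A negative lead, as reported for both tasks in this PR, can never satisfy `lead > GAP_TOL`, so the task fails no matter how small the p-value is.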

reproduce-H_1034 gate = PASS (bit-identical)

Re-ran the H_1034 planners; naive/robust/imag curves matched the stored .verdicts/1034 verdict EXACTLY before scoring (g73 honored).
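A bit-identical gate of this kind could be sketched as follows. The dictionary layout and the name `reproduce_gate` are hypothetical; the actual format of the stored `.verdicts/1034` verdict is not shown in this PR.

```python
import numpy as np


def reproduce_gate(recomputed, stored):
    """True only if every planner curve matches the stored one bit for bit.

    Both arguments map a planner name to its curve over the depth ladder.
    Raw float64 bytes are compared, so no tolerance is applied.
    """
    if recomputed.keys() != stored.keys():
        return False
    for name in stored:
        a = np.ascontiguousarray(recomputed[name], dtype=np.float64)
        b = np.ascontiguousarray(stored[name], dtype=np.float64)
        if a.shape != b.shape or a.tobytes() != b.tobytes():
            return False
    return True
```

Comparing bytes instead of using a tolerance such as `np.allclose` means that even a one-ulp drift in any planner curve blocks scoring.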

Result = FAIL on BOTH harder tasks (advantage does NOT generalize)

| task | d=8 lead (imag−bestMPC) | d=16 lead | result |
|---|---|---|---|
| A: nonlinear pendulum swing-up (angle-only obs, ω hidden) | −0.8336 (p=1.7e-61) | −0.8883 (p=3.9e-58) | MPC beats imagine |
| B: partial-obs + obs-noise station-keeping (Kalman-belief MPC) | −29.29 (p=9.1e-76) | −22.93 (p=1.4e-69) | MPC beats imagine |

Task A is the clean FAIL: both MPCs plan with the EXACT nonlinear true dynamics, while imagine's WM is a LINEAR LDS that can't capture sin(θ). Task B FAILs by a larger margin: the linear WM diverges under heavy obs-noise, while the MPCs get the optimal Kalman belief.
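The Task A mechanism can be illustrated with a small sketch: fit a linear one-step model to pendulum transitions by least squares and compare the fit error near the bottom with the error over the full swing-up range. The pendulum constants, the undamped Euler discretization, and the sampling ranges are assumptions for illustration, and this is not anima's learned LDS WM.

```python
import numpy as np

G_OVER_L = 9.81  # hypothetical g/l, not taken from the PR
DT = 0.05        # hypothetical Euler step size


def pendulum_step(theta, omega, u):
    """One explicit-Euler step of theta'' = -(g/l) * sin(theta) + u."""
    theta_next = theta + DT * omega
    omega_next = omega + DT * (-G_OVER_L * np.sin(theta) + u)
    return theta_next, omega_next


def linear_fit_rmse(theta_range, n=5000, seed=0):
    """Fit [theta', omega'] = [theta, omega, u] @ W; return the omega' RMSE."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(-theta_range, theta_range, n)
    omega = rng.uniform(-2.0, 2.0, n)
    u = rng.uniform(-1.0, 1.0, n)
    X = np.column_stack([theta, omega, u])
    Y = np.column_stack(pendulum_step(theta, omega, u))
    # Least-squares linear one-step model, the best any linear map can do here.
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ W
    return float(np.sqrt(np.mean(resid[:, 1] ** 2)))


small = linear_fit_rmse(0.1)    # near the bottom, sin(theta) is close to theta
full = linear_fit_rmse(np.pi)   # swing-up range, sin(theta) is far from linear
print(f"omega' RMSE: small-angle={small:.2e}, full-range={full:.2e}")
```

Under these assumptions the full-range error is orders of magnitude larger than the small-angle error. That is the gap a planner with the exact nonlinear dynamics does not pay.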

Read

The H_1034 finding 'imagine beats MPC at deep horizon' was SPECIFIC to stiff-LINEAR CEM-landscape difficulty. When the true dynamics are nonlinear, or when the optimal (Kalman) belief is available, a true-model MPC reclaims the deep-horizon lead. The MECHANISM (robust/expected-return planning beats a brittle noise-free landscape) transfers; the HEADLINE (imagine > MPC) does NOT generalize beyond the stiff-linear toy.

TOY single rung per task; scenario-tube + Kalman-belief variants only; scale-transfer UNVERIFIED (a_scale_honest_scope · a_toy_scale_recheck).

Artifacts: UNIVERSE/h1041_imagine_harder_control.py · .verdicts/1041_imagine_harder_control/H_1041.txt · tiered UNIVERSE/H_1041_imagine_harder_control.md.

🤖 Generated with Claude Code

…deep-horizon lead on nonlinear + partial-obs harder tasks H_1034 imagine>MPC advantage does NOT generalize: on a nonlinear pendulum swing-up (MPC plans exact nonlinear dynamics, linear-LDS WM cannot) and a partial-obs+noise station-keeping with a Kalman-belief MPC, the best MPC catches/beats imagine at deep horizon {8,16} (Task A lead -0.83/-0.89, Task B -29/-23; all Welch p<1e-3). The mechanism (robust planning vs brittle noise-free CEM landscape) transfers; the headline does not. reproduce-H_1034 = bit-identical PASS before scoring. $0 CPU-local, 0 pods, g5 CODE-measured, a_phi_iit4_tool n/a. Toy single rung per task, scale-transfer UNVERIFIED. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

dancinlife and others added 2 commits June 9, 2026 07:36

WIP(H_1041): author harder-control falsifier (nonlinear swing-up + pa…

8f1314b

…rtial-obs Kalman) + reproduce-H_1034 gate Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

dancinlife merged commit 2400661 into main Jun 8, 2026

dancinlife deleted the worktree-agent-a0e08c829dc1e0dad branch June 8, 2026 22:50

dancinlife mentioned this pull request Jun 8, 2026

matrix(MATRIX.tape): 0-pod consciousness-measurement cluster H_1037–H_1057 → axis D #1961

Merged
